This section includes considerations and best practices for configuring CloudPools.
CloudPools settings can be changed either on the CloudPools settings tab or on a per-file-pool-policy basis from the OneFS WebUI. It is highly recommended to change these settings per file pool policy. The following list includes general considerations and best practices for CloudPools settings.
- Encryption: Encryption can be enabled either on the PowerScale cluster or on ECS. The recommendation is to enable encryption on the PowerScale cluster instead of on ECS. Encryption adds load to the PowerScale cluster and can affect CloudPools archive and recall performance, so if average CPU utilization on the PowerScale cluster is high (greater than 70%), encryption can be enabled on ECS instead.
- Compression: Compression can be enabled on the PowerScale cluster so that file data is compressed before it is sent to ECS. When compression is disabled on the PowerScale cluster, ECS automatically compresses file data that has not already been compressed, to optimize space utilization. If network bandwidth is a concern, the recommendation is to enable compression on the PowerScale cluster to save network resources. Compression adds load to the PowerScale cluster, which means it might take more time to archive files from PowerScale storage to ECS.
- Data retention: The recommendation is to explicitly set the data retention for the file data being archived from the PowerScale cluster to ECS. If the SmartLink files are backed up with SyncIQ or NDMP, the data retention defines how long the cloud objects remain on ECS. Once the retention period has passed, the PowerScale cluster sends a delete command to ECS, and ECS marks the associated cloud objects for deletion. The delete process is asynchronous, and the space is not reclaimed until garbage collection completes. Garbage collection is a low-priority background process that might take days to fully reclaim the space, depending on how busy the system is.
- Local data cache: If storage space is limited on the PowerScale cluster, the recommendation is to set lower values for the Writeback Frequency and Cache Expiration. Lower values reduce the time that file data is kept in the local data cache and free up storage space sooner on the PowerScale cluster.
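The guidance in the list above can be sketched as a small decision helper. This is only an illustrative sketch: the function name `advise_settings`, the `SettingsAdvice` type, and the three input flags are hypothetical and are not part of OneFS or ECS. The only value taken from the text is the 70% CPU threshold.

```python
from dataclasses import dataclass

# Threshold from the text: average CPU above 70% is considered high.
HIGH_CPU_PERCENT = 70.0


@dataclass
class SettingsAdvice:
    encrypt_on: str            # "PowerScale" or "ECS"
    compress_on: str           # "PowerScale" or "ECS"
    lower_cache_values: bool   # lower Writeback Frequency and Cache Expiration


def advise_settings(avg_cpu_percent: float,
                    bandwidth_constrained: bool,
                    cluster_space_limited: bool) -> SettingsAdvice:
    """Hypothetical helper that mirrors the recommendations listed above."""
    # Encrypt on the cluster by default; move encryption to ECS when CPU is high.
    encrypt_on = "ECS" if avg_cpu_percent > HIGH_CPU_PERCENT else "PowerScale"
    # Compress on the cluster only when network bandwidth is the concern;
    # otherwise ECS compresses data that arrives uncompressed.
    compress_on = "PowerScale" if bandwidth_constrained else "ECS"
    return SettingsAdvice(encrypt_on, compress_on, cluster_space_limited)


if __name__ == "__main__":
    print(advise_settings(avg_cpu_percent=85.0,
                          bandwidth_constrained=False,
                          cluster_space_limited=True))
```

The helper only encodes the stated rules of thumb; actual settings should still be chosen per file pool policy after reviewing cluster load and network conditions.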
File pool policies define what data will be archived from the PowerScale cluster to ECS. Consider the following details about file pool policies:
- Ensure the priority of file pool policies is set appropriately. Multiple file pool policies can be created for the same cloud storage account. When the SmartPools job runs, it processes file pool policies in priority order.
- In terms of freeing up storage space on the PowerScale cluster, the recommendation is not to archive small files that are less than 32 KB in size.
- If the files need to be updated frequently, the recommendation is not to archive those files.
- Versions earlier than OneFS 8.2.0 support up to 128 file pool policies (SmartPools and CloudPools combined). OneFS 8.2.0 and later supports up to 256 file pool policies. The recommendation is not to exceed 30 file pool policies per PowerScale cluster.
- If the file pool policy is updated, it has no impact on the files already archived. It will only affect the files to be archived when the SmartPools job next runs.
- If multiple file pool policies are configured for CloudPools, it is recommended to configure the same number of CloudPools storage accounts, CloudPools on PowerScale, and replication groups on ECS, so that each file pool policy targets a different bucket and keeps data separated.
- Archiving based on Modified or Created times rather than Accessed time can result in frequently used files, such as applications, libraries, and scripts, being archived. Take care to exclude these types of files from being archived to the cloud, because archiving them would cause delays for clients and users loading these applications. One example is archiving users’ home directories that contain files that are created once but accessed often.
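The file pool policy guidance above can be expressed as a short sketch. The names `FileInfo`, `is_archive_candidate`, and `check_policy_count` are hypothetical helpers for illustration and do not correspond to any OneFS API; the 32 KB threshold, the 128 and 256 policy limits, and the recommended maximum of 30 policies come from the list above.

```python
from dataclasses import dataclass

MIN_ARCHIVE_SIZE_BYTES = 32 * 1024   # files smaller than 32 KB are not archived
RECOMMENDED_MAX_POLICIES = 30        # recommended ceiling per PowerScale cluster


@dataclass
class FileInfo:
    size_bytes: int
    frequently_updated: bool
    frequently_accessed: bool  # for example applications, libraries, scripts


def is_archive_candidate(f: FileInfo) -> bool:
    """Return True only if none of the exclusions in the list apply."""
    return (f.size_bytes >= MIN_ARCHIVE_SIZE_BYTES
            and not f.frequently_updated
            and not f.frequently_accessed)


def max_supported_policies(onefs_version: tuple) -> int:
    """128 policies before OneFS 8.2.0, 256 from OneFS 8.2.0 onward."""
    return 256 if onefs_version >= (8, 2, 0) else 128


def check_policy_count(count: int, onefs_version: tuple) -> str:
    """Classify a policy count against the supported and recommended limits."""
    if count > max_supported_policies(onefs_version):
        return "exceeds supported limit"
    if count > RECOMMENDED_MAX_POLICIES:
        return "above recommended limit"
    return "ok"


if __name__ == "__main__":
    print(is_archive_candidate(FileInfo(16 * 1024, False, False)))
    print(check_policy_count(40, (8, 2, 0)))
```

In practice these criteria would be expressed as file pool policy filters (size, path, and time-based rules) rather than evaluated per file in a script.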
More considerations include:
- Deduplication: CloudPools can archive deduped files from a PowerScale cluster to cloud storage. However, undeduped files will be created when recalling those files from the cloud to the PowerScale cluster. For more information about deduplication within OneFS, see the Next-Generation Storage Efficiency with Dell PowerScale SmartDedupe white paper.
- Small file storage efficiency (SFSE): CloudPools and SFSE cannot work together. On PowerScale clusters using CloudPools, SmartLink files cannot be containerized or packed. It is best practice not to archive small files that will be optimized using SFSE. The efficiencies gained from implementing SFSE for small files outweigh the storage advantages gained from archiving them to the cloud using CloudPools. For more information about the Small File Storage Efficiency feature of OneFS, see the Dell PowerScale OneFS: Data Reduction and Storage Efficiency white paper.
- Network proxy: When a PowerScale cluster cannot connect to the CloudPools storage target directly, network proxy servers can be configured as an alternate path to the cloud storage.
- SmartConnect: If users access SmartLink files regularly through a specific node, clogging the inline access path might impact client performance. You can configure PowerScale SmartConnect for load-balancing connections for the cluster. For more information about SmartConnect, see the Dell PowerScale: Network Design Considerations white paper.
- Cloud storage account: Do not delete a cloud storage account that is in use by archived files. Any attempt to open a SmartLink file associated with a deleted account will fail. In addition, NDMP backup and restore and SyncIQ failover and failback will fail when a cloud storage account has been deleted.
- Cloud objects and data retention: Cloud objects are crucial for SmartLink files. Any attempt to open a SmartLink file associated with deleted cloud objects will fail. OneFS checks data retention and the reference count for cloud objects before garbage collection. When data retention has expired and the reference count for the cloud objects is zero, the cloud objects are deleted through garbage collection. Data retention determines the Date of Death (DoD) setting for the cloud objects that support a SmartLink file. The DoD triggers garbage collection only if the reference count for the file on the cluster is zero. The reference count indicates whether cloud objects are still associated with SmartLink files, including SmartLink files in snapshots, SyncIQ backups, and NDMP backups. The considerations include:
- Data retention periods include cloud data retention period, incremental backup retention period for NDMP incremental backup and SyncIQ, and full backup retention period for NDMP only. If more than one period applies to a SmartLink file, the longest period is applied.
- If a SmartLink file is unchanged through multiple SyncIQ backups or NDMP backups, its data retention will remain unchanged.
- Data retention is set or updated on any event that changes the backed-up version of a file or the state of the SmartLink file.
- If a SmartLink file is changed and incrementally backed up, its data retention is set to the current time plus the incremental backup retention period.
- If a SmartLink file is recalled, the reference count is removed, and its data retention is set to the current time plus the cloud data retention period. Its cloud objects are deleted through garbage collection after its data retention has expired.
- If a SmartLink file is deleted, its data retention is set to the current time plus the cloud data retention period. If its cloud objects are still associated with snapshots, SyncIQ backups, or NDMP backups, they are not deleted through garbage collection after the data retention has expired.
- OneFS upgrade (CloudPools 1.0 to CloudPools 2.0): Before beginning the upgrade, check the OneFS CloudPools upgrade path information shown in the following table. See ECS configuration to ensure that the DNS, load balancer, ECS BaseURL, and CloudPools URI are configured correctly.
Table 3. OneFS CloudPools upgrade path
8.0.x or 8.1.x | Strongly discouraged | OK if needed but recommend 8.2.2 | Strongly recommended | Strongly recommended |
Note: Contact your Dell representative if you plan to upgrade OneFS to 8.2.0.
In a SyncIQ environment with unidirectional replication, the SyncIQ target cluster should be upgraded before the source cluster. The reason is that OneFS allows the CloudPools-1.0-formatted SmartLink files to be converted into CloudPools-2.0-formatted SmartLink files through a post-upgrade SmartLink conversion process. Otherwise, the SyncIQ policy needs to be reconfigured to use deep copy, but deep copy causes archived file content to be read from the cloud and replicated. In a SyncIQ environment with bi-directional replication, it is recommended to disable SyncIQ on both source and target clusters and upgrade both clusters simultaneously. You can then reenable SyncIQ on both clusters once the OneFS upgrades have been committed on both. Depending on the number of SmartLink files on the target DR cluster and the processing power of that cluster, the SmartLink conversion process can take considerable time.
Note: There is no need to stop SyncIQ and Snapshot during the upgrade in a SyncIQ environment with unidirectional replication. Because SyncIQ must resynchronize all converted stub files, it might take SyncIQ some time to catch up with all the changes.
To check the status of the SmartLink upgrade process, run the following command, substituting the appropriate job number.
# isi cloud job view 6
ID: 6
Description: Update SmartLink file formats
Effective State: running
Type: smartlink-upgrade
Operation State: running
Job State: running
Create Time: 2019-08-23T14:20:26
State Change Time: 2019-09-17T09:56:08
Completion Time: -
Job Engine Job: -
Job Engine State: -
Total Files: 21907433
Total Canceled: 0
Total Failed: 61
Total Pending: 318672
Total Staged: 0
Total Processing: 48
Total Succeeded: 21588652
Note: CloudPools recall jobs will not run while SmartLink upgrade or conversion is in progress.
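For a long-running conversion, it can help to turn the counters in the job view output into a progress figure. The following is a minimal sketch that parses text in the "Key: Value" layout shown above; the function names `parse_job_view` and `summarize_progress` are hypothetical, and the sketch assumes the output has already been captured as a string (for example, from an SSH session) rather than calling any OneFS API.

```python
SAMPLE_OUTPUT = """\
# isi cloud job view 6
ID: 6
Type: smartlink-upgrade
Job State: running
Create Time: 2019-08-23T14:20:26
Total Files: 21907433
Total Canceled: 0
Total Failed: 61
Total Pending: 318672
Total Staged: 0
Total Processing: 48
Total Succeeded: 21588652
"""


def parse_job_view(text: str) -> dict:
    """Split each 'Key: Value' line at the first colon; skip the prompt line."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def summarize_progress(fields: dict) -> dict:
    """Derive the completion percentage and the files still to be converted."""
    total = int(fields["Total Files"])
    succeeded = int(fields["Total Succeeded"])
    failed = int(fields["Total Failed"])
    canceled = int(fields["Total Canceled"])
    remaining = total - succeeded - failed - canceled
    percent = 100.0 * succeeded / total if total else 0.0
    return {"state": fields.get("Job State", "unknown"),
            "percent_succeeded": percent,
            "remaining": remaining,
            "failed": failed}


if __name__ == "__main__":
    summary = summarize_progress(parse_job_view(SAMPLE_OUTPUT))
    print(f"{summary['state']}: {summary['percent_succeeded']:.2f}% succeeded, "
          f"{summary['remaining']} remaining, {summary['failed']} failed")
```

A nonzero failed count, as in the sample, is worth investigating before the conversion is considered complete.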
For a Not All Nodes on Network (NANON) cluster, it is recommended to connect the unconnected nodes to the network before starting the SmartLink conversion. Also, you need to disable SnapDelete until the SmartLink conversion is completed.
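The data retention rules listed under "Cloud objects and data retention" earlier in this section can be illustrated with a short sketch. It assumes a simplified model in which the Date of Death (DoD) is the event time plus the longest applicable retention period, and garbage collection is allowed only when the DoD has passed and the reference count is zero. The function names and the example periods are hypothetical; OneFS performs these checks internally.

```python
from datetime import datetime, timedelta
from typing import Iterable


def date_of_death(event_time: datetime, periods: Iterable[timedelta]) -> datetime:
    """Event time plus the longest applicable retention period.

    `periods` must contain at least one value, for example the cloud data
    retention period and any backup retention periods that apply to the file.
    """
    return event_time + max(periods)


def eligible_for_gc(now: datetime, dod: datetime, reference_count: int) -> bool:
    """Cloud objects can be collected only after the DoD and with no references."""
    return now >= dod and reference_count == 0


if __name__ == "__main__":
    # Hypothetical example: a SmartLink file is deleted on 2024-01-01 with a
    # 7-day cloud data retention period and a 365-day backup retention period.
    deleted_at = datetime(2024, 1, 1)
    dod = date_of_death(deleted_at, [timedelta(days=7), timedelta(days=365)])
    print(dod)
    # Still referenced by a snapshot, so the objects are kept after expiry.
    print(eligible_for_gc(datetime(2025, 1, 1), dod, reference_count=1))
    # No remaining references and retention expired, so the objects can be collected.
    print(eligible_for_gc(datetime(2025, 1, 1), dod, reference_count=0))
```

The sketch shows why cloud objects can outlive their retention period: expiry alone is not sufficient while snapshots, SyncIQ backups, or NDMP backups still reference them.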