Data layout and tiering recommendations


OneFS SmartPools enables large multi-tier clusters to be built from high-performance nodes with SSDs for the performance tiers and high-capacity SATA-only nodes for the archive tier. For example, a file pool policy could move files from the performance tier to a more cost-effective, capacity-biased tier after a specified period of inactivity.
When using SmartPools tiering with large heterogeneous clusters, there are some cardinal guidelines to be aware of:
File pool rules need to be carefully crafted:
- Simpler is always better.
- Avoid storing snapshots on a slower tier than the head data.
- Be especially careful with any rules that promote data back up to a faster tier after it has been migrated down.
In general, slower tiers can impact the performance of faster tiers, especially with I/O sensitive workflows:
- Very fast to fast works well. For example: PowerScale F710 to H700.
- Streaming or mid-performance to Archive works well. For example: PowerScale H700 to A300.
- Archive to deep Archive works well. For example: PowerScale A300 to A3000.
For optimal large cluster performance, we recommend observing the following OneFS SmartPools best practices:
- Ensure that cluster capacity utilization (HDD and SSD) remains below 85% on each pool.
- If the cluster consists of more than one node type, direct the default file pool policy to write to the higher performing node pool. Data can then be classified and down-tiered as necessary.
- A file pool policy can have up to three ‘OR’ disjunctions, and each term joined by an ‘OR’ can contain at most five ‘AND’s.
- Define a performance and protection profile for each tier and configure it accordingly.
- File pool policy order precedence matters, as the policies are applied on a first match basis (the first file pool policy to match the expression will be the applied policy).
- By default, the SmartPools job runs only once per day. If you create a file pool policy to be run at a higher frequency, ensure the SmartPools job is configured to run multiple times per day.
- Enable SmartPools Virtual Hot Spares with a minimum of 10% space allocation. This ensures that there is space available for data reconstruction and re-protection in the event of a drive or node failure, and generally helps guard against file system full issues.
- Avoid creating hard links that would cause a file to match more than one file pool policy.
- If node pools are combined into tiers, the file pool rules should target the tiers rather than specific node pools within the tiers.
- Avoid creating tiers that combine node pools both with and without SSDs.
- The number of SmartPools tiers should not exceed 5.
- Where possible, ensure that all nodes in a cluster have at least one SSD, including nearline and high-density nodes.
- For performance workloads, SSD metadata read-write acceleration is recommended. Metadata read acceleration helps with getattr, access, and lookup operations, while write acceleration helps reduce latencies on create, delete, setattr, and mkdir operations. Ensure that sufficient SSD capacity (6-10%) is available before turning on metadata-write acceleration.
- If SmartPools takes more than a day to run, or the cluster is already running the FSAnalyze job, consider scheduling the FilePolicy (and corresponding IndexUpdate job) to run daily and reducing the frequency of the SmartPools job to monthly. The following table provides a suggested job schedule when deploying FilePolicy:
Job	Schedule	Impact	Priority
FilePolicy	Every day at 22:00	LOW	6
IndexUpdate	Every six hours, every day	LOW	5
SmartPools	Monthly – Sunday at 23:00	LOW	6
- Avoid using the ‘isi set’ command or the OneFS Filesystem Explorer to change file attributes, such as protection level, for a group of data. Instead, use SmartPools file pool policies.
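The policy limits and first-match ordering described in the list above can be illustrated with a minimal Python sketch. This is not OneFS code: the Policy class, the first_match helper, the file attributes (days_idle, ext), and the tier names are all hypothetical. The limits are read literally as a count of ‘OR’ and ‘AND’ operators, which is an assumption about the wording of the guideline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed literal reading of the guideline: at most three 'OR' operators
# per policy and at most five 'AND' operators inside each OR-ed term.
MAX_ORS = 3
MAX_ANDS_PER_TERM = 5

Condition = Callable[[Dict], bool]


@dataclass
class Policy:
    """Hypothetical model of a file pool policy (not a OneFS API)."""
    name: str
    target: str                     # tier the policy sends matching files to
    terms: List[List[Condition]]    # OR of terms; each term is an AND of conditions

    def within_limits(self) -> bool:
        # N terms are joined by N-1 'OR's; M conditions by M-1 'AND's.
        ors = len(self.terms) - 1
        return ors <= MAX_ORS and all(
            len(term) - 1 <= MAX_ANDS_PER_TERM for term in self.terms
        )

    def matches(self, file_attrs: Dict) -> bool:
        return any(all(cond(file_attrs) for cond in term) for term in self.terms)


def first_match(policies: List[Policy], file_attrs: Dict, default_target: str) -> str:
    """Return the target of the first policy that matches, in list order."""
    for policy in policies:
        if policy.matches(file_attrs):
            return policy.target
    return default_target


# Two illustrative policies that can both match the same file.
archive_old = Policy("archive-old", "archive", [[lambda f: f["days_idle"] > 90]])
keep_video = Policy("keep-video", "performance", [[lambda f: f["ext"] == ".mp4"]])

idle_video = {"days_idle": 120, "ext": ".mp4"}

# Order decides the outcome: only the first matching policy is applied.
result_archive_first = first_match([archive_old, keep_video], idle_video, "performance")
result_video_first = first_match([keep_video, archive_old], idle_video, "performance")

# A policy with five OR-ed terms uses four 'OR's and exceeds the assumed limit.
too_many_ors = Policy("too-wide", "archive", [[lambda f: True]] * 5)
```

With these inputs the same file lands on the archive tier when archive-old is listed first and on the performance tier when keep-video is listed first, which is why policy order deserves review whenever a rule is added.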
As an extension to tiering, CloudPools enables cluster data to be archived to cloud storage using the same file pool policy engine as SmartPools. Supported cloud providers include Microsoft Azure, Amazon S3, Dell ECS, and native OneFS.
For large clusters, CloudPools can be used to reduce node pool percentage utilization by transferring cold archive data to the cloud.
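As a rough illustration of the sizing arithmetic, the following Python sketch estimates how much data would have to leave a node pool to bring it back to the 85% utilization guideline given earlier. The function name, parameters, and example figures are hypothetical, and real planning would also need to account for protection overhead, Virtual Hot Spare reservations, and data growth.

```python
def tb_to_offload(used_tb: float, capacity_tb: float, target_pct: float = 85.0) -> float:
    """Return the minimum TB to move off a pool to reach target_pct utilization.

    Hypothetical helper for illustration; 85% is the per-pool guideline
    from this section, not a value read from a cluster.
    """
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    limit_tb = capacity_tb * target_pct / 100.0
    # Nothing needs to move if the pool is already at or under the limit.
    return max(0.0, used_tb - limit_tb)


# Illustrative figures only: a 1000 TB pool at 90% and at 70% utilization.
offload_when_full = tb_to_offload(900, 1000)
offload_when_ok = tb_to_offload(700, 1000)
```

Because the guideline is to remain below 85%, the returned value is best treated as a lower bound on the amount of cold data to archive.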
More information about OneFS data tiering and file pool policies is available in the Storage Tiering with Dell PowerScale SmartPools white paper.

