Dell PowerScale: Considerations and Best Practices for Large Clusters
Best practices checklist
For optimal operation and performance of large clusters, we recommend observing the following best practices. Note that much of this information is also covered elsewhere in this paper.
When architecting and scaling large OneFS-powered clusters, plan ahead and strive for simplicity.
Just because you can build a large cluster does not necessarily mean you should.
For high-performance workloads, consider a pod architecture.
Thoroughly analyze the workloads and applications that a large cluster will support.
Define, implement, and regularly test a data protection and recovery strategy and a business continuance plan.
Maintain sufficient free space and pay attention to data ingest rate. Keep cluster capacity utilization (hard drive and SSD) below 85%.
Ensure Virtual Hot Spare and SmartPools spillover both remain enabled (the default).
For large PowerScale clusters, use an Ethernet back-end network whenever possible.
When building a new heterogeneous cluster, deploy the faster nodes first so that the system files (under /ifs/.ifsvar) reside on the most performant pool.
Where possible, connect all nodes to a front-end Ethernet network; avoid NANON (not all nodes on network) configurations.
If SmartPools is licensed, ensure that spillover is enabled (default setting).
Implement and maintain a thorough cable coloring, naming, and management convention.
Keep a cluster maintenance and change log.
The key to file and directory layout is balance. Keep the directory tree structure and its file contents as uniform as possible.
Manage your data: Archive infrequently accessed data and delete unused data.
Maintain appropriate data protection levels as the cluster grows.
Record your original settings before making any configuration changes to OneFS or its data services.
Monitor cluster capacity and data ingest rate.
Ensure that all desired data services are licensed and configured.
Observe NFS and SMB connection limits.
Many cluster configuration settings are global and have cluster-wide effects. Before changing any cluster-wide setting, be sure that you fully understand it and its implications.
Manage snapshot creation and deletion schedules.
Set up SmartConnect for load balancing and use Round Robin as the balancing policy.
Use SmartQuotas to understand, predict, control, and limit storage usage.
Avoid running Job Engine jobs at ‘HIGH’ impact on large clusters.
If using SmartPools tiering, change the Storage Target field from “anywhere” to a specific tier or node pool so that ingest is directed to a performance tier or node pool.
Ensure the SmartPools job only runs during off-hours.
Add the cluster to an InsightIQ monitoring instance, provided the cluster has no more than 80 nodes.
Deploy a lab cluster or OneFS Simulator environment to test and validate any new cluster configurations before making changes that affect the production environment.
Confirm that remote support functions work correctly through Secure Remote Services and internal email/SNMP notifications.
Upgrade OneFS to a newer release at least once a year.
Configure and pay attention to cluster events and alerts and/or monitor log files and SNMP MIBs and traps.
Regularly run and review cluster health check reports.
Keep the node and drive firmware as up to date as possible. This is especially important with PowerScale Gen6 hardware.
Sign up for product updates on the Dell support site to be notified of ETAs, KBs, new releases, and breaking issues.
While the best practice is to keep clusters up to date on OneFS releases, when that is not possible, at least review the release notes for newer versions to determine whether they contain bug fixes pertinent to your workflow.
Capacity planning: Use the FSA and IIQ tools to plan ahead, and never let your cluster get too full, given the performance impact and the recovery effort that can result.
Ensure that alerting is properly configured so that connect-home notifications reach not only you (and your team) but also Dell. When support cases are opened, Dell can take quick action on them even when you are busy.
Use SupportAssist or Secure Remote Services for log uploads and remote access.
If you have multiple node types or generations, choose carefully where to tier data to avoid SmartPools job impacts and tier capacity imbalance.
Ensure your nodes are running well within temperature and humidity tolerances, not on the edges.
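To illustrate the guideline of keeping both hard drive and SSD capacity utilization below 85%, here is a minimal Python sketch that flags any tier at or above that threshold. The function name, tier names, and figures are hypothetical inputs; the sketch does not query OneFS, and on a real cluster the numbers would come from the cluster's own capacity reporting.

```python
# Minimal sketch: flag storage tiers whose utilization is at or above 85%.
# Tier names and capacities are hypothetical; nothing here calls OneFS.
THRESHOLD = 0.85  # checklist guideline: keep utilization below 85%


def over_threshold(tiers, threshold=THRESHOLD):
    """Return the names of tiers whose used/total ratio is >= threshold.

    tiers: mapping of tier name -> (used_tb, total_tb)
    """
    flagged = []
    for name, (used_tb, total_tb) in tiers.items():
        if total_tb <= 0:
            raise ValueError(f"tier {name!r} reports no capacity")
        if used_tb / total_tb >= threshold:
            flagged.append(name)
    return flagged
```

For example, `over_threshold({"hdd_pool": (700.0, 1000.0), "ssd_pool": (90.0, 100.0)})` flags only `ssd_pool`, which is at 90%.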
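For the recommendation to keep the directory tree and its file contents as uniform as possible, one simple check, sketched here as an assumption rather than a Dell-provided tool, is to count files per directory and compare the busiest directory against the mean. The function names are hypothetical. Walking a very large tree this way is slow, so point it at a representative subtree rather than all of /ifs; directories that hold only subdirectories count as zero, which a fuller analysis might exclude.

```python
import os
import statistics


def directory_file_counts(root):
    """Map each directory under root (inclusive) to its number of regular files."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        counts[dirpath] = sum(
            1 for f in filenames if os.path.isfile(os.path.join(dirpath, f))
        )
    return counts


def imbalance_ratio(counts):
    """Ratio of the busiest directory's file count to the mean count.

    1.0 means every directory holds the same number of files; larger values
    indicate a more skewed layout. Returns 0.0 for an empty or file-less tree.
    """
    values = list(counts.values())
    if not values:
        return 0.0
    mean = statistics.mean(values)
    return max(values) / mean if mean else 0.0
```

A ratio that climbs well above 1.0 over time suggests that a few directories are accumulating a disproportionate share of the files.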
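Monitoring the data ingest rate matters because it determines how long the remaining headroom below 85% lasts. Assuming a constant net ingest rate (data written minus data deleted), a hypothetical helper like the one below estimates the days left before utilization reaches the threshold. Real ingest is rarely linear, so treat the result as a rough planning figure rather than a prediction; the FSA and IIQ tools give a fuller picture.

```python
def days_until_threshold(used_tb, total_tb, daily_net_ingest_tb, threshold=0.85):
    """Estimate days until used capacity reaches threshold * total_tb.

    Assumes a constant net ingest rate in TB/day. Returns 0.0 if the cluster
    is already at or over the threshold, and None if net ingest is not positive
    (capacity is flat or shrinking, so the threshold is never reached).
    """
    if total_tb <= 0:
        raise ValueError("total capacity must be positive")
    headroom_tb = threshold * total_tb - used_tb
    if headroom_tb <= 0:
        return 0.0
    if daily_net_ingest_tb <= 0:
        return None
    return headroom_tb / daily_net_ingest_tb
```

For example, a 1,000 TB cluster holding 600 TB and growing by a net 5 TB per day has 250 TB of headroom below 85%, or roughly 50 days.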
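When managing snapshot creation and deletion schedules, it helps to estimate how many snapshots will be retained at steady state, because every retained snapshot consumes capacity and every expiring one must eventually be deleted. Assuming each schedule takes snapshots at a fixed interval and expires them after a fixed retention period, each schedule retains roughly retention divided by interval snapshots. The function name and schedule values below are hypothetical.

```python
from datetime import timedelta


def steady_state_snapshots(schedules):
    """Approximate the number of snapshots retained at steady state.

    schedules: iterable of (interval, retention) timedelta pairs, one per
    hypothetical schedule. Each contributes retention // interval snapshots.
    """
    total = 0
    for interval, retention in schedules:
        if interval <= timedelta(0):
            raise ValueError("snapshot interval must be positive")
        total += retention // interval  # timedelta // timedelta -> int
    return total
```

For example, hourly snapshots kept for one day plus daily snapshots kept for 30 days give about 24 + 30 = 54 retained snapshots per path, a figure that multiplies quickly across many paths on a large cluster.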
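SmartConnect balances clients by answering DNS lookups for the cluster's zone name with node IP addresses. The toy Python model below shows what a Round Robin balancing policy means: each successive lookup receives the next address in a fixed rotation. It is a conceptual illustration under that assumption, not SmartConnect's implementation, and the class name and IP addresses are hypothetical. Because clients may cache DNS answers, round robin balances lookups rather than guaranteeing even active load.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy model of round-robin name resolution (not SmartConnect code)."""

    def __init__(self, node_ips):
        ips = list(node_ips)  # copy so later changes to the caller's list don't matter
        if not ips:
            raise ValueError("need at least one node IP")
        self._rotation = cycle(ips)

    def resolve(self):
        """Return the node IP handed to the next client lookup."""
        return next(self._rotation)
```

With three node addresses, four consecutive lookups return the first, second, third, and then the first address again.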
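SmartQuotas limits come in advisory, soft, and hard forms: advisory limits only notify, soft limits are enforced once a grace period expires, and hard limits deny writes. The sketch below classifies a directory's usage against such limits to show how the tiers relate. The class, function, and return labels are hypothetical, and the comparison semantics (a limit counts as exceeded only when usage is strictly greater than it) are an assumption rather than documented OneFS behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaLimits:
    """Hypothetical quota thresholds in bytes; None means the limit is not set."""
    advisory: Optional[int] = None  # notification only
    soft: Optional[int] = None      # enforced once the grace period expires
    hard: Optional[int] = None      # writes denied beyond this point


def quota_state(usage: int, limits: QuotaLimits, grace_expired: bool = False) -> str:
    """Return the most severe quota state that applies to usage (bytes).

    Assumption: a limit is exceeded when usage is strictly greater than it.
    """
    if limits.hard is not None and usage > limits.hard:
        return "hard"
    if limits.soft is not None and usage > limits.soft:
        return "soft-enforced" if grace_expired else "soft-grace"
    if limits.advisory is not None and usage > limits.advisory:
        return "advisory"
    return "ok"
```

Checking the most severe limit first means a directory over both its soft and advisory limits reports the soft state, which is the one that can block writes.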