Cluster sizing and scaling are two different but related considerations. Sizing is concerned with ensuring the cluster meets the workload requirements for storage and processing throughput. Scaling is concerned with cluster growth over time as capacity needs increase.
The architecture is a parallel scale-out system with decoupled compute and storage. Some sizing requirements can be addressed through scaling, while others must be addressed through node-level sizing.
Sizing and scaling of a cluster are complex topics that require knowledge of the workloads. This section highlights the main considerations involved but does not provide detailed recommendations for workload sizing. Design guides for specific workloads running on the platform include workload-specific sizing guidance. Your Dell Technologies or authorized partner sales representative can help with detailed sizing calculations.
Many parameters are involved in cluster sizing. The primary parameters are:
- Storage capacity
- Storage capacity is usually the first parameter that is used to size a cluster. Calculating storage capacity is important and straightforward. However, storage capacity should be calculated while taking the other sizing parameters into account to maintain a balance between storage and processing capability. The use of decoupled storage and compute simplifies this consideration since the balance can be adjusted at any time during the life of the cluster.
- Data volumes and growth rates
- Data volumes and growth rates each affect cluster sizing in several ways. Data stack storage capacity should account for growth due to data ingestion and for growth of ingest volumes over time. Data ingestion also affects network utilization: because the modern data stack storage is external to the cluster nodes, network bandwidth is required to access it. Processing throughput requirements must be considered along with the data size.
- Memory and processor capacity
- Memory and processor requirements for jobs running on the cluster must be considered when sizing. Memory and processor capacity increase as nodes are added to the cluster. Workloads with specific requirements may call for heterogeneous node configurations.
- Service-level agreements
- Production cluster sizing must meet any performance requirements that SLAs specify. Critical path jobs that must meet a specific execution time or throughput may require adjusting the cluster sizing and balance between compute and storage accordingly. Overall cluster throughput is as important as storage capacity, and often influences the number of nodes independent of the required storage capacity.
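The interaction between these parameters can be illustrated with a rough calculation. The following Python sketch is not a sizing tool or a vendor formula. It assumes that ingest volume grows by a fixed percentage each year, that a flat headroom factor is applied to storage, and that the compute node count is set by whichever of throughput or memory is the tighter constraint. The function names, parameters, and example figures are all hypothetical.

```python
import math


def projected_capacity_tb(initial_tb, daily_ingest_tb, annual_ingest_growth,
                          years, headroom=0.2):
    """Estimate storage capacity (TB) needed after `years`.

    Assumption: daily ingest grows by `annual_ingest_growth` (0.25 = 25%)
    once per year, and `headroom` is a flat safety margin on the total.
    """
    total = initial_tb
    ingest = daily_ingest_tb
    for _ in range(years):
        total += ingest * 365              # data ingested during this year
        ingest *= 1 + annual_ingest_growth  # ingest rate for the next year
    return total * (1 + headroom)


def compute_nodes_needed(required_throughput_gbps, per_node_throughput_gbps,
                         required_memory_gb, per_node_memory_gb):
    """Return the node count that satisfies both throughput and memory.

    Assumption: throughput and memory scale linearly with node count.
    With decoupled storage, capacity does not drive this number.
    """
    by_throughput = math.ceil(required_throughput_gbps / per_node_throughput_gbps)
    by_memory = math.ceil(required_memory_gb / per_node_memory_gb)
    return max(by_throughput, by_memory)


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    capacity = projected_capacity_tb(100, 1, 0.25, years=3)
    nodes = compute_nodes_needed(10, 3, 2000, 256)
    print(f"Projected capacity: {capacity:.1f} TB, compute nodes: {nodes}")
```

Because storage and compute are decoupled, the two results can be revisited independently as measured ingest rates and job requirements become available. Real sizing should follow the workload-specific design guides.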