Example cluster configurations lists cluster-level starting points for possible deployments.
| Configuration | Proof of concept (POC) | Pilot | Small and medium business (SMB) | Enterprise |
|---|---|---|---|---|
| Control plane nodes | 0 | 3 | 3 | 3 |
| Worker nodes | 3 | 4 | 10 | 20 |
| Available memory | 1536 GB | 2048 GB | 5120 GB | 10,240 GB |
| Available physical cores | 192 | 256 | 640 | 1280 |
| Available server storage | 69 TB | 92 TB | 230 TB | 460 TB |
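These totals follow directly from a common per-worker building block. The short Python sketch below reproduces the cluster columns, assuming the per-worker values implied by the table (512 GB of memory, 64 physical cores, and 23 TB of server storage per worker node); those per-node figures are inferred from the table, not stated explicitly in this guide.

```python
# Sketch: derive cluster-level resources from worker counts.
# The per-worker values (512 GB, 64 cores, 23 TB) are inferred from
# the table above, not stated explicitly in this guide.
PER_WORKER = {"memory_gb": 512, "cores": 64, "storage_tb": 23}

def cluster_totals(worker_nodes: int) -> dict:
    """Aggregate per-worker resources into cluster-level totals."""
    return {k: v * worker_nodes for k, v in PER_WORKER.items()}

for name, workers in [("POC", 3), ("Pilot", 4), ("SMB", 10), ("Enterprise", 20)]:
    print(name, cluster_totals(workers))
# Pilot -> {'memory_gb': 2048, 'cores': 256, 'storage_tb': 92}, matching the table.
```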
The proof of concept (POC) configuration is a minimal configuration for basic evaluation. In this scenario, three worker nodes host the control plane, runtime services, and workloads. The worker nodes host the Symcloud Manager, Compute, and Storage roles in addition to any workloads. This configuration provides limited resources for workloads but is adequate for basic functionality evaluation. More worker nodes can be added to this configuration. For anything larger than five workers, Dell Technologies recommends starting with, or upgrading to, the pilot configuration.
Converting a POC cluster to a production-grade cluster requires redeploying the software, although the worker node hardware can be reused.
The pilot configuration is a minimal production-grade configuration. In this scenario, three control plane nodes host the Symcloud Manager roles. Four worker nodes host the Compute and Storage roles along with any workloads. This configuration provides isolation between control plane and runtime functions. Dell Technologies recommends it for preproduction or development and test usage.
The pilot configuration can be scaled up by adding worker nodes without redeploying the existing nodes.
The small and medium business (SMB) configuration is a small production-grade configuration. Three control plane nodes host the Symcloud Manager roles, and ten worker nodes are available for workloads. This configuration provides enough resources to support one or two teams running analytics workloads.
The SMB configuration can be scaled up by adding worker nodes without redeploying the existing nodes.
The enterprise configuration is a large production-grade configuration. Three control plane nodes host the Symcloud Manager roles, and 20 worker nodes are available for workloads. This configuration provides substantial resources for running analytics workloads supporting multiple teams.
The amount of expected lakehouse data primarily determines lakehouse storage sizing. This aspect of the sizing is independent of the compute cluster sizing.
The available network bandwidth between the compute and storage clusters must also be considered. Bandwidth on the storage and compute clusters scales in direct proportion to the number of nodes. However, the dense storage capacity possible with ECS and PowerScale can result in a large storage capacity without enough bandwidth to support the lakehouse data transfer requirements. An analysis of workload data transfer requirements is necessary to correctly size the storage for both capacity and bandwidth.
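As a rough illustration of that analysis, aggregate bandwidth can be estimated from node count and per-node link speed, then compared with the transfer rate a workload requires. In the sketch below, the 25 GbE links and the 200 TB, two-hour example workload are hypothetical values chosen for illustration only.

```python
# Sketch: compare aggregate compute-side bandwidth against a workload's
# required transfer rate. The 25 GbE links and the example workload
# figures are assumptions for illustration, not values from this guide.
def aggregate_bandwidth_gbps(nodes: int, link_gbps: float = 25.0) -> float:
    """Bandwidth scales in direct proportion to the number of nodes."""
    return nodes * link_gbps

def required_gbps(dataset_tb: float, window_hours: float) -> float:
    """Convert TB to gigabits and spread over the processing window."""
    return dataset_tb * 8_000 / (window_hours * 3600)

# Example: scan 200 TB within a 2-hour window on a 10-worker SMB cluster.
need = required_gbps(dataset_tb=200, window_hours=2)
have = aggregate_bandwidth_gbps(nodes=10)
print(f"required {need:.0f} Gb/s vs available {have:.0f} Gb/s")
# If required exceeds available, the storage has capacity without the
# bandwidth to match; add nodes or links rather than only capacity.
```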
The architecture is not limited to a single type of lakehouse storage. PowerScale can be used with the HDFS protocol, or ECS can be used with the S3 protocol. Any workload can reference either or both of these storage types. It is also possible to use multiple external PowerScale and ECS storage systems.
The network architecture allows both compute and storage clusters to use the same fabric. This configuration enables the network bandwidth to scale as either storage or compute nodes are added. The bandwidth available to external storage systems should also be considered when referencing external storage that is not connected to the core cluster data network.
The control plane node sizing that Dell Technologies recommends in Lakehouse control plane node is adequate for all production clusters and should not be changed. The Symcloud management services must be deployed on three individual nodes.
The control plane services also consume a small quantity of worker node resources. Sizing in this design guide allocates 49 GB and four cores to these services. Approximately 320 GB of storage for the control plane is also required on worker nodes. This space is allocated from the boot drives and does not impact available server storage for user workloads.
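A minimal sketch of netting this overhead out of the worker pool follows. Whether the 49 GB and four cores are deducted once for the cluster or once per worker node is not specified above, so the sketch leaves that accounting as a parameter rather than asserting either interpretation.

```python
# Sketch: subtract control plane service overhead from the worker pool.
# per_node=True treats the 49 GB / 4 cores as a per-worker deduction;
# the guide's exact accounting is an assumption left as a parameter.
def usable_pool(workers: int, mem_gb: int, cores: int,
                overhead_gb: int = 49, overhead_cores: int = 4,
                per_node: bool = False) -> tuple[int, int]:
    """Return (memory, cores) available to workloads after overhead."""
    n = workers if per_node else 1
    return mem_gb - overhead_gb * n, cores - overhead_cores * n

# Pilot configuration, single cluster-wide deduction: (1999 GB, 252 cores).
print(usable_pool(workers=4, mem_gb=2048, cores=256))
```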
All the available cluster resources across all worker nodes are pooled and allocated on demand. This configuration provides an abstraction where workloads can be mapped to available resources independent of the physical node used. Accelerators are considered a resource, and any workload pod that requires accelerator resources must run on a node with an available accelerator.
The recommended worker node sizes in this design are based on general-purpose usage. These worker nodes can support various analytics workloads without modification. However, there are scenarios where it is appropriate to change the configurations to match the intended workloads.
Heterogeneous node configurations are possible. A cluster can include nodes with differing memory, compute, and storage sizes. The resources from all these nodes are added to the overall resource pool.
From a resource point of view, there is little difference between many small nodes and a few large nodes. If the nodes have enough resources to handle the largest expected pod resource request, the difference between nodes is transparent. However, three additional considerations are involved in this tradeoff: network bandwidth, fault zones, and operational overhead.
Available network bandwidth is proportional to the number of nodes. A few large nodes have less bandwidth than many small nodes, even if the aggregate memory and compute resources are the same. The bandwidth requirements for workloads should be factored into cluster and node sizing.
Fault zones are important for overall reliability of the infrastructure. Although the cluster can continue running when a node fails, resources from that node are lost on failure. Large node configurations in a small cluster can have a substantial impact on available resources when a node fails, even if the failure is temporary. Sizing should ensure that the loss of a node only impacts a small proportion of the overall cluster capacity.
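A quick way to check this is to express a single node failure as the fraction of pooled capacity lost, assuming homogeneous worker nodes:

```python
# Sketch: fraction of pooled worker capacity lost when one node fails,
# assuming homogeneous worker nodes.
def capacity_lost_pct(worker_nodes: int) -> float:
    return 100.0 / worker_nodes

for name, workers in [("Pilot", 4), ("SMB", 10), ("Enterprise", 20)]:
    print(f"{name}: one node failure removes "
          f"{capacity_lost_pct(workers):.0f}% of pooled capacity")
# Pilot: 25%, SMB: 10%, Enterprise: 5%. More, smaller nodes keep the
# per-failure impact on the resource pool small.
```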
Operational overhead is another consideration for sizing. Every node entails some operational overhead in terms of maintenance and monitoring, so larger nodes can be more efficient. One larger node can also be more energy-efficient than several smaller nodes. Operational capacity should be part of the overall sizing effort.
For parallel, scale-out workloads like Apache Spark, resources are allocated based on availability at the cluster level, and multiple workload pods are launched. As a result, workload pods can run on any physical node that can meet the resource requirements. Depending on the Spark job workload, many small pods or a few large pods may be appropriate. The container platform runtime is flexible in this aspect. It is possible to deploy Spark clusters dynamically, sized for the job itself, instead of requiring a fixed Spark cluster optimized for many types of jobs.
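One way to realize this flexibility is to size the Spark cluster at job submission time. The sketch below uses PySpark with Spark's standard Kubernetes configuration properties; the API server endpoint, namespace, and container image are placeholders, and the executor shape mirrors the medium instance from the example table later in this section.

```python
# Sketch: launch a job-sized Spark cluster on Kubernetes with PySpark.
# The master URL, namespace, and image are hypothetical placeholders;
# executor sizing follows the "medium" example instance (8 pods,
# 6 cores and 16 GB each).
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("job-sized-spark")
    .master("k8s://https://kubernetes.example.com:6443")  # placeholder endpoint
    .config("spark.kubernetes.namespace", "analytics")  # placeholder namespace
    .config("spark.kubernetes.container.image", "example/spark:latest")
    .config("spark.executor.instances", "8")  # pods sized to this job only
    .config("spark.executor.cores", "6")
    .config("spark.executor.memory", "16g")
    .getOrCreate()
)
# The Spark cluster exists only for the lifetime of this job; another
# job can request a different shape without a fixed, shared cluster.
```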
Some workloads may have large memory requirements that cannot be achieved by scaling out. You may have to increase the memory size in some or all nodes to account for the largest expected memory allocation for that workload.
The platform can support up to 100 pods per worker node. Cluster and node sizing should aim for substantially fewer pods than this limit.
The resource requirements for the intended cluster workloads must be factored into node and cluster sizing. Detailed sizing of workload requirements is complex. However, once workload requirements are known, the mapping into cluster requirements is straightforward. The flexibility of the platform also allows for ongoing adjustment and fine-tuning, so the sizing does not have to be exact.
Apache Spark is used here as an example of how workload sizing should be mapped to cluster requirements. Example Spark instance requirements summarizes the resource requirements for three sample Spark clusters.
| Spark cluster resources | Small instance: Worker | Small instance: Cluster | Medium instance: Worker | Medium instance: Cluster | Large instance: Worker | Large instance: Cluster |
|---|---|---|---|---|---|---|
| Number of pods (Spark workers) | n/a | 4 | n/a | 8 | n/a | 12 |
| Memory (GB) | 8 | 32 | 16 | 128 | 32 | 384 |
| Cores | 4 | 16 | 6 | 48 | 8 | 96 |
| Lakehouse storage (GB) | 2 | 8 | 8 | 64 | 2 | 24 |
| Ephemeral storage (GB) | 8 | 32 | 16 | 128 | 32 | 384 |
The table above shows three clusters of varying resource requirements and scale. The resources for each Spark worker are specified, along with the expected number of worker pods. From these values, the total resource requirements for each cluster are calculated.
For lakehouse storage, the amount of net new lakehouse storage that is required is used in the calculation. If the jobs are expected to process existing data, no additional storage is required. If the jobs generate data, significant storage may be required. In the large instance, only 24 GB of lakehouse storage was estimated, while the medium instance requires 64 GB. The medium instance is expected to generate more data than the large instance even though it uses fewer compute resources.
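The per-instance arithmetic is simple multiplication of the worker specification by the pod count. This sketch reproduces the cluster columns of the table above:

```python
# Sketch: compute cluster-level Spark requirements from per-worker specs,
# reproducing the "Example Spark instance requirements" table.
def spark_cluster_totals(pods, mem_gb, cores, lakehouse_gb, ephemeral_gb):
    """Multiply per-worker resources by the expected pod count."""
    return {
        "pods": pods,
        "memory_gb": pods * mem_gb,
        "cores": pods * cores,
        "lakehouse_gb": pods * lakehouse_gb,
        "ephemeral_gb": pods * ephemeral_gb,
    }

print(spark_cluster_totals(4, 8, 4, 2, 8))    # small:  32 GB, 16 cores, 8 GB, 32 GB
print(spark_cluster_totals(8, 16, 6, 8, 16))  # medium: 128 GB, 48 cores, 64 GB, 128 GB
print(spark_cluster_totals(12, 32, 8, 2, 32)) # large:  384 GB, 96 cores, 24 GB, 384 GB
```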
Based on these calculations, the cluster-level resources can be determined. For this example, the pilot cluster configuration can support four medium Spark clusters before it runs out of cores, or it can support two large instances.