The configurations in this chapter, including the node specifications and storage and networking configurations, are designed to give several options for supporting CDP Private Cloud Base. They are also designed to be largely compatible with the addition of CDP Private Cloud Experiences. Other options and configurations are possible. Contact a Dell Technologies sales specialist to help with your infrastructure planning and design if:
You are specifically planning ahead to add CDP Private Cloud Experiences later.
You want to discuss any design choices unique to your situation.
The cluster network is designed to meet the needs of a high performance and scalable cluster, while providing redundancy and access to management capabilities.
The architecture is a leaf and spine model that is based on 25 GbE network technology. It uses Dell EMC PowerSwitch S5248F-ON switches for the leaves, and Dell EMC PowerSwitch Z9264F-ON switches for the spine.
IPv4 is used for the network layer. The architecture does not currently support IPv6 for network connectivity.
Dell Technologies recommends the use of PowerSwitch hardware for the cluster networking.
Dell EMC PowerSwitch networking provides:
Disaggregated hardware and software switching solutions
Support for Open Network Install Environment (ONIE), enabling zero-touch installation of alternate network operating systems
Your choice of network operating system to help simplify data-center fabric orchestration and automation
A broad ecosystem of applications and tools, both open-source and Linux-based, providing more options to optimize and manage your network
Dell EMC high-capacity network fabrics are cost-effective and easy to deploy, providing a clear path to a software-defined data center. They offer high density for 25/40/50/100 GbE in top-of-rack, middle-of-row, and end-of-row deployments.
Dell EMC Networking OS10 Enterprise Edition is a network operating system supporting multiple architectures and environments, as shown in Dell EMC Networking OS10.
OS10 allows multilayered disaggregation of network functions. OS10 contributions to open source provide users with the freedom and flexibility to pick their own third-party networking, monitoring, management, and orchestration applications. OS10 Enterprise Edition bundles an industry-hardened networking stack featuring standard L2 and L3 protocols over established northbound interfaces such as CLI and SNMP. The Switch Abstraction Interface (SAI) and Control Plane Services (CPS) abstraction layers provide disaggregation:
At the Network Processing Unit (NPU)
For the software applications written atop the Linux kernel
Network fabric architecture
The cluster network is designed to meet the needs of a high performance and scalable cluster, while providing redundancy and access to management capabilities. The architecture is a leaf and spine model that is based on 25 GbE network technology. It uses Dell EMC PowerSwitch S5248F-ON switches for the leaves, and Dell EMC PowerSwitch Z9264F-ON switches for the spine. IPv4 is used for the network layer. The architecture does not currently support IPv6 for network connectivity.
In addition, each server is connected through its iDRAC port to a 1 GbE management switch, providing out-of-band access to the iDRAC interface.
Server node connections
Server connections to the network switches for the Data network use Ethernet technology.
All data connections in the cluster use industry-standard 25 GbE networking. Dell Technologies recommends this technology when deploying on Dell EMC PowerEdge R740xd and Dell EMC PowerEdge R640 servers.
Edge Nodes have an additional network connection available. This connection facilitates high-performance cluster access between applications running on those nodes, and the optional Edge network.
Server connections to the BMC network use a single connection from the iDRAC port to an S3148 management switch in each rack.
For clusters larger than a single pod, an aggregation layer is required. The aggregation layer can be implemented at either Layer 2 (L2) or Layer 3 (L3). The choice depends on the initial size and planned scaling. Layer 2 is preferred for lower cost and medium scalability, and can support approximately 500 nodes.
Layer 3 aggregation is recommended for:
Larger initial deployments over 500 nodes
Deployments where scaling up to approximately 1,500 nodes is planned
Instances where the cluster must be co-located with other applications in different racks
The scalability depends on the switches that are used and the oversubscription ratio, and is summarized in Cluster node counts.
The following sections describe the fabric details.
Each pod uses a Dell EMC PowerSwitch S5248F-ON as the first layer switch.
Note: The pod switches are often called Top of Rack (ToR) switches. However, this architecture distinguishes between a physical rack and a logical pod.
The S5248F-ON is a disaggregated hardware and software data center fixed switch. It can provide a cumulative bandwidth of 4.0 Tbps at full duplex using high-density 25/100 GbE ports. It is configured with:
2 ports of 200 GbE (QSFP28-DD)
4 ports of 100 GbE (QSFP28)
48 ports of 25 GbE (SFP28)
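The 4.0 Tbps figure can be sanity checked from the port list above. The following is a minimal sketch of that arithmetic, assuming the QSFP28-DD ports operate at 200 GbE and that full-duplex operation doubles the aggregate port bandwidth.

```python
# Back-of-the-envelope check of the S5248F-ON cumulative bandwidth figure.
# Port speeds and counts are taken from the list above.
ports_gbe = {
    "25 GbE SFP28": (25, 48),       # node-facing ports
    "100 GbE QSFP28": (100, 4),     # uplink ports
    "200 GbE QSFP28-DD": (200, 2),  # assumption: QSFP28-DD ports run at 200 GbE
}

one_way_gbps = sum(speed * count for speed, count in ports_gbe.values())
full_duplex_tbps = one_way_gbps * 2 / 1000

print(f"Aggregate port bandwidth: {one_way_gbps} Gbps")      # 2000 Gbps
print(f"Full-duplex throughput:   {full_duplex_tbps} Tbps")  # 4.0 Tbps
```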
The Dell EMC PowerSwitch Z9264F-ON is a multirate 100 GbE, 2U spine switch optimized for high-performance, ultra-low-latency data center requirements. The PowerSwitch Z9264F-ON can provide a cumulative bandwidth of 7.4 Tbps of throughput with line-rate traffic on every port. It can be configured with up to 64 ports of 100 GbE (QSFP28).
For a single pod, the ToR switches can act as the aggregation layer for the entire cluster. For multiple-pod clusters, a spine layer is required. In this architecture, each pod is managed as a separate entity from a switching perspective. The individual pod ToR switches connect only to the spine switch.
25 GbE cluster aggregation switches
For clusters consisting of more than one pod, this architecture uses the Dell EMC PowerSwitch Z9264F-ON as the spine switch.
The PowerSwitch Z9264F-ON can be used for both Layer 2 and Layer 3 implementations.
The uplink from each S5248F-ON pod switch to the aggregation layer uses four 100 GbE interfaces in a bonded configuration, providing a collective bandwidth of 400 Gbps from each pod.
Layer 3 cluster aggregation
The Dell EMC PowerSwitch Z9264F-ON core switch can also be used for aggregation at Layer 3 in larger clusters using 25 GbE.
For Layer 3 aggregation, the cluster uses a different network topology that is based on ECMP routing in a leaf-spine organization. In this configuration, a cluster can scale to over 1,500 nodes with a low 3:1 oversubscription per pod.
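The 3:1 figure follows from the pod port counts given earlier (48 x 25 GbE node ports against 4 x 100 GbE bonded uplinks). The sketch below works through that arithmetic and adds an illustrative scale estimate; the spine port budget used in it (two 64-port spines with two ports each reserved for the core) is an assumption for illustration only, not a sizing rule from this guide.

```python
# Per-pod oversubscription: node-facing bandwidth versus bonded uplink bandwidth.
nodes_per_pod = 48        # 48 x 25 GbE node-facing ports on the S5248F-ON
node_gbps = 25
uplinks_per_pod = 4       # 4 x 100 GbE bonded uplinks per pod
uplink_gbps = 100

downlink_gbps = nodes_per_pod * node_gbps           # 1200 Gbps
uplink_total_gbps = uplinks_per_pod * uplink_gbps   # 400 Gbps
print(f"Oversubscription: {downlink_gbps / uplink_total_gbps:.0f}:1")  # 3:1

# Illustrative scale estimate. Assumption (not from this guide): two 64-port
# 100 GbE spine switches, each reserving two ports for the core connection.
spine_ports_for_pods = 2 * (64 - 2)                 # 124 spine ports for pod uplinks
max_pods = spine_ports_for_pods // uplinks_per_pod
print(f"Pods: {max_pods}, nodes: {max_pods * nodes_per_pod}")  # 31 pods, 1488 nodes
```

Actual node counts depend on the switch models, the number of spines, and the oversubscription ratio chosen, as summarized in Cluster node counts.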
In addition to the Cluster Data network, a separate network is provided for cluster management: the iDRAC (or BMC) network.
The iDRAC management ports are all aggregated into a per-rack Dell EMC PowerSwitch S3148 switch with a dedicated VLAN. This aggregation provides a dedicated iDRAC or BMC network for hardware provisioning and management. Switch management ports are also connected to this network.
If out-of-band management is required, the management switches can be connected to the core or to a dedicated Management network.
Core network integration
The aggregation layer functions as the network core for the cluster. In most instances, the cluster connects to a larger core within the enterprise, as displayed in 25 GbE multiple pod networking equipment.
With the Dell EMC PowerSwitchZ9264F-ON, two 100 GbE ports are reserved for connection to the core. Details of the connection are site-specific, and must be determined as part of the deployment planning.
Layer 2 and Layer 3 separation
The boundary between Layer 2 and Layer 3 can be placed at either the pod or the aggregation layer. Both options are equally viable. This architecture is based on Layer 2 for switching within the cluster. The blue and green colors in Multiple pod view with Layer 3 ECMP represent the Layer 2 and Layer 3 boundaries.
25 GbE network equipment summary
The cable counts that are needed for a cluster are summarized in:
Note: 25 GbE node connections typically use a QSFP28 to 4 x SFP28 breakout cable. The cable count is typically one-fourth the number of connections in Per-node network cables required.
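As a quick illustration of the note above, the following sketch estimates breakout-cable counts from raw 25 GbE connection counts; the example connection count is a placeholder, not a value taken from the cable tables.

```python
import math

def breakout_cables(connections_25gbe: int) -> int:
    """Each QSFP28-to-4x-SFP28 breakout cable carries four 25 GbE connections."""
    return math.ceil(connections_25gbe / 4)

# Example: a pod with 48 node connections needs 12 breakout cables.
print(breakout_cables(48))  # 12
```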
Dell EMC PowerEdge rack server hardware configurations
Cloudera CDP Private Cloud Base supports the Dell EMC PowerEdge R640 and Dell EMC PowerEdge R740xd servers.
Infrastructure Nodes are used to host the critical cluster services, and the configuration is optimized to reduce downtime and provide high performance. The recommended configurations for several sizing options are listed in Infrastructure Nodes configuration.
Table 15. Infrastructure Nodes configuration
Dell EMC PowerEdge R640 server
2.5" chassis with up to 10 hard drives and three PCIe slots
Riser configuration 4, 2 x 16 LP
Dual, hot-plug, redundant power supply (1+1), 750 W
Mellanox ConnectX-4 LX dual port 10/25 GbE SFP28 adapter, rNDC
Dell EMC PERC H740P RAID controller, 8 GB NV cache, mini card
Disk - HDD
8 x 2 TB 7.2 K RPM SATA 6 Gbps 512n 2.5" hot-plug hard drives
Disk - SSD
2 x 800 GB SSD SAS mix use 12 Gbps 512e 2.5" hot-plug AG drives, 3 DWPD, 4380 TBW
Disk - NVMe
From PERC controller
The Infrastructure Nodes (Master Node 1, Master Node 2, Master Node 3, and Edge Node) are configured with multiple partitions and file systems, using all available drives. Each partition is optimized for both performance and reliability.
Dell Technologies recommends the disk volume and partition layouts for this set of machines, which are listed in:
Contains BIOS boot files that must be within the first 2 GB of the disk
2 disk RAID 1
Root file system
2 disk RAID 1
Operating system swap space partition
2 disk RAID 1
User home directories
2 disk RAID 1
All available space
Contains variable data such as system logging files, databases, mail and printer spool directories, and transient and temporary files
ZooKeeper data log directory (dataLogDir). The default path is /var/lib/zookeeper, but in this architecture it is relocated to /journal/zookeeper.
NameNode edits directories (dfs.namenode.edits.dir). The default path is /data/1/dfs/nn, but in this architecture it is relocated to /journal/dfs/nn. This property defaults to the same value as dfs.name.dir, so it must be changed explicitly.
4 disk RAID 10
Operational data directory for databases. This directory primarily contains the Cloudera Manager databases, since the PostgreSQL data directory (PGDATA) is typically located at /var/lib/pgsql. Databases other than PostgreSQL should also be configured to store their data files here.
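Because the ZooKeeper and NameNode directories above are moved away from their defaults, it can be useful to confirm after provisioning that each relocated path exists and is not silently sitting on the root file system. The following is a minimal pre-flight sketch, not part of the Cloudera or Dell tooling; the path list simply mirrors the layout described above.

```python
import os

# Non-default locations described above; the corresponding configuration keys
# (ZooKeeper dataLogDir and HDFS dfs.namenode.edits.dir) must point at these paths.
RELOCATED_PATHS = [
    "/journal/zookeeper",  # ZooKeeper data log directory
    "/journal/dfs/nn",     # NameNode edits directory
]

root_device = os.stat("/").st_dev

for path in RELOCATED_PATHS:
    if not os.path.isdir(path):
        print(f"MISSING: {path}")
    elif os.stat(path).st_dev == root_device:
        # Same device as '/': the dedicated volume is probably not mounted.
        print(f"WARNING: {path} is on the root file system")
    else:
        print(f"OK: {path} is on a separate volume")
```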
Worker Nodes are the workhorses of the cluster, combining compute and storage. Depending on their intended workload they can be optimized for storage-heavy, compute-heavy, or mixed loads.
CDP Private Cloud Base supports various hybrid solutions where compute tasks are separated from data storage and where data can be accessed from remote clusters. This architecture provides the following alternative Worker Node configurations:
These options are 2U chassis configurations using large form factor (LFF) 3.5” drives for data. They provide dense storage capability with high performance compute and solid-state storage for fast caching of temporary data.
Table 18. Storage-centric server configuration
Dell EMC PowerEdge R740xd server
Chassis with up to 12 x 3.5" HDD, 4 x 3.5" HDDs on MP and 4 x 2.5" HDDs on FlexBay for 2 CPU configuration
Riser configuration 2, 3 x 8, 1 x 16 slots
Dual, hot-plug, redundant power supply (1+1), 1100 W