Dell Technologies recommends the General purpose Worker node configuration as a starting point for most clusters. This configuration is optimized for reliability, provides high performance, and is consistent with recommendations from Cloudera.
Table 14. General purpose Worker node configuration

| Component | Details |
|---|---|
| Platform | PowerEdge R7515 server |
| Chassis | 2U 3.5 in. chassis with up to 12 hot-plug hard drives |
| Chassis configuration | Riser configuration 2: two x16 full-height plus two x16 low-profile PCIe slots |
| Power supply | Dual hot-plug, 750 W redundant (1+1) power supplies |
| Processor | AMD EPYC 7413 2.65 GHz, 24C/48T, 128 MB cache (180 W), DDR4-3200 |
| Memory | Eight 32 GB RDIMM, 3200 MT/s, dual rank, 16 Gb BASE |
| Persistent memory | None |
| OCP network card | Broadcom 57414 dual-port 25 GbE OCP SFP28 LOM mezzanine card |
| Extra network card | Mellanox ConnectX-5 dual-port 10/25 GbE SFP28 PCIe full-height adapter |
| Storage controller | Dell HBA330 12 Gbps SAS HBA controller (non-RAID), minicard |
| Disk - HDD | Ten 8 TB SAS ISE 12 Gbps 7.2K 512e 3.5 in. HDD |
| Disk - SSD | Two 3.84 TB SAS 12 Gbps MU FIPS-140 PM6 512e 2.5 in. SSD with 3.5 in. hybrid carrier, 3 DWPD |
| Disk - NVMe | None |
| Boot configuration | BOSS-S2 controller card with two 480 GB M.2 SSDs (RAID 1) |
Dell Technologies recommends the disk volume and partition layouts that are listed in the following tables for this set of machines.
Table 15. General purpose Worker node volumes

| Volume | RAID | Devices | Drive slots |
|---|---|---|---|
| Operating system | RAID 1 | Two 480 GB M.2 SSD | 0 |
| HDFS data | No RAID | Ten 8 TB SAS HDD | 1-10 |
| Temporary data | No RAID | Two 3.84 TB SAS SSD | 11-12 |
Table 16. General purpose Worker node partitions

| Partition | Size | File system | Drive slots | Type | Description |
|---|---|---|---|---|---|
| boot | 1024 MB | ext4 | 0 | Primary | This partition contains BIOS start-up files, which must be within the first 2 GB of the disk. |
| / | 100 GB | ext4 | 0 | LVM | This partition contains the root file system. |
| swap | 4 GB | swap | 0 | swap | This partition contains the operating system swap space. |
| /home | 1 GB | ext4 | 0 | LVM | This partition contains the user home directories. |
| /var | ~350 GB | ext4 | 0 | LVM | This partition contains variable data such as system logging files, databases, mail and printer spool directories, and transient and temporary files. |
| /data/<n> | 8 TB | ext4 | 1-10 | Primary | These partitions are used for HDFS data as 10 individual file systems. |
| /data/ssd1 | 3.84 TB | ext4 | 11 | Primary | This partition is used for temporary files. |
| /data/ssd2 | 3.84 TB | ext4 | 12 | Primary | This partition is used for temporary files. |
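The OS portion of this layout can be expressed as an Anaconda kickstart fragment. The following is a sketch rather than a published Dell kickstart: the disk name (sda) and volume group name (sysvg) are assumptions, and /var is simply grown into the remaining space on the BOSS-S2 mirror to reach roughly 350 GB.

```text
# Hypothetical kickstart sketch of the Table 16 OS layout on the BOSS-S2 RAID 1 mirror.
zerombr
clearpart --drives=sda --all
part /boot --fstype=ext4 --size=1024 --ondisk=sda
part pv.01 --size=1 --grow --ondisk=sda
volgroup sysvg pv.01
logvol swap  --name=swap --vgname=sysvg --size=4096
logvol /     --name=root --vgname=sysvg --fstype=ext4 --size=102400
logvol /home --name=home --vgname=sysvg --fstype=ext4 --size=1024
logvol /var  --name=var  --vgname=sysvg --fstype=ext4 --size=1 --grow
```

The HDFS and SSD data drives (slots 1-12) are deliberately left out of the kickstart sketch; they are typically formatted after installation.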
The General purpose Worker node configuration is sized for a typical mix of storage and compute in a Cloudera CDP Private Cloud Base cluster.
Two network ports are included for connection to the Cluster Data network.
Two SSDs in a RAID 1 configuration using the Boot Optimized Storage Solution (BOSS) card are used for the operating system volume. The swap partition is small because swapping causes excessive latency for running jobs. The home directories are allocated a separate, small partition because user files should not be stored on Worker nodes. Most of the storage is allocated to the /var partition for runtime files. You can use LVM to adjust the storage allocation between /, /home, and /var for specific needs.
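Because /, /home, and /var are logical volumes, the split can be adjusted with standard LVM commands. A hedged sketch, using the hypothetical volume group name sysvg; note that ext4 can only be shrunk offline, so it is best to settle the layout before loading data:

```shell
# Grow /var by 50 GB from free space in the volume group
# (volume group and volume names are hypothetical).
lvextend --resizefs --size +50G /dev/sysvg/var

# Shrinking an ext4 volume requires unmounting it first, for example:
# umount /home && lvreduce --resizefs --size -512M /dev/sysvg/home
```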
Ten 3.5 in. NL-SAS hard drives are used for the primary data storage. These drives are mounted as individual partitions and provide approximately 80 TB of raw storage. Cloudera supports a maximum of 100 TB per node for HDFS storage. You can use smaller drives if less capacity is required. A larger HDFS storage capacity increases the time that is required for background scans and block reports, and it increases the recovery time if a node fails.
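Each data drive is formatted as its own ext4 file system and mounted at /data/1 through /data/10. A minimal /etc/fstab sketch, assuming the drives enumerate as sdb through sdk (in practice, use persistent identifiers such as UUIDs); noatime is a common mount option for HDFS data disks, and nofail lets the node boot with a failed drive:

```text
# Hypothetical fstab entries for the ten HDFS data file systems.
/dev/sdb1  /data/1   ext4  defaults,noatime,nofail  0 0
/dev/sdc1  /data/2   ext4  defaults,noatime,nofail  0 0
# ... entries for /data/3 through /data/9 follow the same pattern ...
/dev/sdk1  /data/10  ext4  defaults,noatime,nofail  0 0
```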
Two SSDs are used for storage of temporary files, such as MapReduce temporary and spill files and the Spark cache. You can also use these drives for HBase tiered cache or tiered HDFS storage, and you can increase their size or add a U.2 NVMe drive as necessary.
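Directing the compute engines' scratch space at the SSD file systems is a cluster-side setting. A hedged example of the relevant properties, with values taken from the SSD mount points in Table 16 (in a CDP cluster, set these through Cloudera Manager rather than editing files directly):

```text
# yarn-site.xml: local (spill and shuffle) directories for YARN containers
yarn.nodemanager.local-dirs = /data/ssd1/yarn/local,/data/ssd2/yarn/local

# spark-defaults.conf: Spark shuffle and block-manager scratch space
spark.local.dir = /data/ssd1/spark,/data/ssd2/spark
```

The subdirectory names under the mount points are illustrative; only the property names come from the Hadoop and Spark documentation.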
The recommended memory size of 256 GB is intended for Worker nodes that run jobs that benefit from additional memory, such as Spark, Impala, and HBase region servers. You can reduce the memory allocation for nodes that primarily provide storage services with minimal compute capability.