Dell Technologies recommends the General purpose Worker node configuration as a starting point for most clusters. This configuration balances reliability with high performance and is consistent with recommendations from Cloudera.
Table 13. General purpose Worker node configuration
| Component | Configuration |
|---|---|
| Platform | Dell EMC PowerEdge R750 server |
| Chassis | 2U 3.5" chassis with up to 12 HDDs (SAS or SATA), two 2.5" rear HDDs (SAS or SATA), adapter PERC |
| Chassis configuration | Riser configuration 6, four x8 slots, two x16 slots |
| Power supply | Dual hot-plug, 1400 W, mixed mode, redundant (1+1) power supplies |
| Processor | Dual Intel Xeon Gold 6348 2.6 GHz, 28 C/56 T, 11.2 GT/s, 42 MB cache, Turbo, HT (235 W), DDR4-3200 |
| Memory | 512 GB - sixteen 32 GB RDIMM, 3200 MT/s, dual rank |
| OCP network card | Intel E810-XXV dual-port 10/25 GbE SFP28 OCP NIC 3.0 |
| Extra network card | None |
| Storage controller | Dell HBA355i adapter (non-RAID host bus adapter), full height |
| Disk - HDD | Twelve 4 TB 7.2K RPM NL-SAS 12 Gbps 512n 3.5" hot-plug hard drives |
| Disk - SSD | None |
| Disk - NVMe | 3.2 TB enterprise NVMe mixed use, U.2, Gen 4, P5600, FlexBay |
| Boot configuration | BOSS-S2 controller card with two 480 GB M.2 SSDs (RAID 1) |
Dell Technologies recommends the disk volume and partition layouts that are listed in Table 14 and Table 15 for this set of machines.
Table 14. General purpose Worker node volume layout
| Volume | RAID level | Devices | Slots |
|---|---|---|---|
| Operating system | RAID 1 | Two 480 GB M.2 SSDs | 0 |
| HDFS data | No RAID | Twelve 4 TB NL-SAS | 1-12 |
| Temporary data | No RAID | One 3.2 TB NVMe | 13 |
Table 15. General purpose Worker node file system layout
| Mount point | Size | File system | Slots | Partition type | Description |
|---|---|---|---|---|---|
| /boot | 1024 MB | ext4 | 0 | Primary | Contains BIOS start-up files, which must be within the first 2 GB of the disk. |
| / | 100 GB | ext4 | 0 | LVM | Contains the root file system. |
| swap | 4 GB | swap | 0 | swap | Contains the operating system swap space. |
| /home | 1 GB | ext4 | 0 | LVM | Contains the user home directories. |
| /var | ~350 GB | ext4 | 0 | LVM | Contains variable data such as system logging files, databases, mail and printer spool directories, and transient and temporary files. |
| /data/<n> | 4 TB | ext4 | 1-12 | Primary | Used for HDFS data as 12 individual file systems. |
| /data/ssd | 3.2 TB | ext4 | 13 | Primary | Used for temporary files. |
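As an illustration of how this layout maps to mount entries, the following sketch expands the /data file systems from Table 15 into fstab-style lines. The device names and mount options are assumptions for illustration, not values prescribed by this guide.

```python
# Sketch: expand Table 15 into fstab-style mount entries.
# Device names (/dev/sdb ... /dev/sdm, /dev/nvme0n1) are illustrative
# assumptions; in practice, use stable identifiers such as UUIDs.

DATA_DISKS = [f"/dev/sd{chr(ord('b') + i)}" for i in range(12)]  # slots 1-12
NVME_DISK = "/dev/nvme0n1"                                       # slot 13

# noatime avoids a metadata write on every read, a common choice for data disks.
OPTIONS = "defaults,noatime,nofail"

def fstab_lines() -> list[str]:
    lines = [f"{dev}  /data/{n}  ext4  {OPTIONS}  0 0"
             for n, dev in enumerate(DATA_DISKS, start=1)]
    lines.append(f"{NVME_DISK}  /data/ssd  ext4  {OPTIONS}  0 0")
    return lines

print("\n".join(fstab_lines()))
```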
The General purpose Worker node configuration is sized for a typical mix of storage and compute in a Cloudera CDP Private Cloud Base cluster.
Two network ports are included for connection to the Cluster Data network.
Two SSDs in a RAID 1 configuration using the Boot Optimized Storage Solution (BOSS) are used for the operating system volume. The swap partition is small because swapping causes excessive latency for running jobs. The home directories are allocated in a separate, small partition because user files should not be stored on Worker nodes. Most of the storage is allocated to the /var partition for runtime files. You can use LVM to adjust the storage allocation among /, /home, and /var for specific needs, as sketched below.
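For example, growing /var from free space in the volume group might look like the following sketch. The volume group and logical volume names are assumptions; the commands are the standard lvextend and resize2fs utilities.

```python
# Sketch: grow the /var logical volume and its ext4 file system online.
# The volume group and logical volume names are assumptions; confirm
# the real names with `vgs` and `lvs` before running anything like this.
import subprocess

LV_VAR = "/dev/vg_root/lv_var"  # assumed VG/LV naming

def grow_var(extra: str = "+50G") -> None:
    # Extend the logical volume from free space in the volume group...
    subprocess.run(["lvextend", "-L", extra, LV_VAR], check=True)
    # ...then grow the ext4 file system to match (safe while mounted).
    subprocess.run(["resize2fs", LV_VAR], check=True)

if __name__ == "__main__":
    grow_var("+50G")
```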
Twelve 3.5" NL-SAS hard drives are used for the primary data storage. These drives are mounted as individual file systems and provide approximately 48 TB of raw storage. Cloudera supports a maximum of 100 TB per node for HDFS storage. You can increase the drive size to 8 TB for maximum storage capacity, but doing so negatively affects cluster performance: a larger HDFS storage capacity increases the time that is required for background scans and block reports, and it increases the recovery time if a node fails.
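The trade-off is straightforward to quantify. The short calculation below compares per-node raw and effective capacity for the two drive sizes, assuming the HDFS default replication factor of 3 (a cluster-specific assumption).

```python
# Sketch: per-node raw and effective HDFS capacity for 4 TB vs. 8 TB drives.
REPLICATION = 3  # HDFS default replication factor; an assumption per cluster

def node_capacity_tb(drive_tb: int, drives: int = 12) -> tuple[int, float]:
    raw = drive_tb * drives          # raw capacity per Worker node
    usable = raw / REPLICATION       # effective capacity after replication
    return raw, usable

for size in (4, 8):
    raw, usable = node_capacity_tb(size)
    print(f"{size} TB drives: {raw} TB raw, ~{usable:.0f} TB effective")
# 4 TB drives: 48 TB raw, ~16 TB effective
# 8 TB drives: 96 TB raw, ~32 TB effective
```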
A single NVMe drive is used to store temporary files such as MapReduce temporary and spill files and the Spark cache. This drive can also serve HBase tiered cache or tiered HDFS storage, and you can increase its size or add a second U.2 NVMe drive as necessary.
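As a hedged illustration, the mapping below shows standard YARN, MapReduce, and Spark properties that can direct this temporary data at the NVMe file system; the subdirectory layout is an assumption to adapt per cluster.

```python
# Sketch: framework scratch-space settings that can point at /data/ssd.
# The property names are the standard YARN, MapReduce, and Spark ones;
# the subdirectory layout is an assumption, not a value from this guide.
SCRATCH = "/data/ssd"

TEMP_DIR_SETTINGS = {
    "yarn.nodemanager.local-dirs": f"{SCRATCH}/yarn/local",    # YARN container scratch
    "mapreduce.cluster.local.dir": f"{SCRATCH}/mapred/local",  # MapReduce spill files
    "spark.local.dir": f"{SCRATCH}/spark",                     # Spark shuffle and cache
}

for prop, path in TEMP_DIR_SETTINGS.items():
    print(f"{prop}={path}")
```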
The recommended memory size of 512 GB is intended for Worker nodes that run workloads that benefit from additional memory, such as Spark and HBase RegionServers. You can reduce the memory allocation for nodes that primarily provide storage services with minimal compute capability.
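One rough way to translate installed RAM into a YARN memory grant (the standard yarn.nodemanager.resource.memory-mb setting) is sketched below; the overhead figure reserved for the OS and Hadoop daemons is an assumption.

```python
# Sketch: derive a per-node YARN memory grant from installed RAM.
# The 64 GB overhead reserved for the OS and Hadoop daemons is an
# assumption; tune it to the services actually running on the node.

def yarn_memory_mb(installed_gb: int = 512, overhead_gb: int = 64) -> int:
    # Grant everything beyond the reserved overhead to YARN containers.
    return (installed_gb - overhead_gb) * 1024

print(yarn_memory_mb(512))  # 458752 MB for a compute-heavy Worker node
print(yarn_memory_mb(256))  # 196608 MB for a storage-focused Worker node
```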