Infrastructure nodes are used to host the critical cluster infrastructure services, including:
- NameNode processes
- YARN ResourceManager
- ZooKeeper
- HBase masters
- Cloudera Manager
- Supporting databases
Dell Technologies recommends the configuration that is listed in Table 11 as a starting point. This configuration is optimized for reliability, provides high performance, and is consistent with recommendations from Cloudera.
Machine function | Component |
Platform | PowerEdge R6515 server |
Chassis | 1U 2.5 in. chassis with up to 10 hard drives, including up to eight SAS or SATA drives, or nine NVMe drives |
Chassis configuration | Riser configuration 0, two x16 LP PCIe slots |
Power supply | Two hot-plug, 550 W redundant power supplies (1+1) |
Processor | AMD EPYC 7313P 3.00 GHz, 16 C/32 T, 128 M Cache (155 W) DDR4-3200 |
Memory | Eight 16 GB RDIMM, 3200 MT/s, dual rank |
Persistent memory | None |
OCP network card | Broadcom 57414 dual port 25 GbE OCP SFP28 LOM mezzanine card |
Extra network card | Mellanox ConnectX-5 dual port 10/25 GbE SFP28 PCIe low-profile adapter |
Storage controller | Dell PERC H740P RAID controller, 8 GB NV cache, minicard |
Disk - HDD | Five 2.4 TB 10 K RPM SAS 12 Gbps 512e 2.5 in. hot-plug HDD |
Disk - SSD | One 1.92 TB SATA read-intensive 6 Gbps 512 2.5 in. hot-plug AG SSD, 1 DWPD; two 960 GB SAS mixed use 12 Gbps SSD |
Disk - NVMe | None |
Boot configuration | From PERC controller |
Dell Technologies recommends the following disk volume and partition layouts for this set of machines:
Usage | Volume type | Physical disks | Volume Id |
Operating system | RAID 1 | Two 960 GB SAS SSD | 0 |
HDFS metadata and operational databases | RAID 6 | Five 2.4 TB SAS HDD | 1 |
ZooKeeper and NameNode journal | No RAID | One 1.92 TB SATA SSD | 2 |
Mount point | Size | File system type | Volume Id | Partition type | Description |
/boot | 1024 MB | ext4 | 0 | Primary | This partition contains the BIOS start-up files, which must be within the first 2 GB of the disk. |
/ | 100 GB | ext4 | 0 | LVM | This partition contains the root file system. |
swap | 4 GB | swap | 0 | swap | This partition contains the operating system swap space partition. |
/home | 1 GB | ext4 | 0 | LVM | This partition contains user home directories. |
/var | ~850 GB | ext4 | 0 | LVM | This partition contains variable data such as system log files, databases, mail and printer spool directories, and transient and temporary files. |
/journal/zookeeper | 900 MB | ext4 | 2 | LVM | This partition is used for the ZooKeeper data log. The ZooKeeper configuration property dataLogDir must be changed to match this path at installation time. |
/journal/dfs | 900 MB | ext4 | 2 | LVM | This partition is used to store the NameNode transactions (edits) files. The configuration property dfs.namenode.edits.dir must be changed to match this path at installation time. |
/var/lib/dfs | 3 TB | ext4 | 1 | LVM | This partition is used for the NameNode file system image (fsimage). The configuration property dfs.namenode.name.dir must be changed to match this path at installation time. |
/var/lib/zookeeper | 500 MB | ext4 | 1 | LVM | This partition is used for the ZooKeeper database. The ZooKeeper configuration property dataDir must be changed to match this path at installation time. |
/var/lib/pgsql | ~5 TB | ext4 | 1 | LVM | This partition contains the operational data directory for databases. It primarily holds the Cloudera Manager databases, since PGDATA is typically /var/lib/pgsql. Alternatives to PostgreSQL should be configured to store their data files here. |
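The mapping between mount points and configuration properties in the partition table can be sketched as follows. This is a minimal illustration, not a deployment procedure: the helper names `render_zoo_cfg` and `render_hdfs_site` are hypothetical, and in a cluster managed by Cloudera Manager these values are normally set through Cloudera Manager rather than by editing files directly. Only the property names and paths come from the table.

```python
from xml.etree import ElementTree as ET

# Paths and property names are taken from the partition table.
ZOOKEEPER_PROPS = {
    "dataDir": "/var/lib/zookeeper",     # ZooKeeper database (volume 1)
    "dataLogDir": "/journal/zookeeper",  # ZooKeeper data log (volume 2)
}
HDFS_PROPS = {
    "dfs.namenode.name.dir": "/var/lib/dfs",   # NameNode fsimage (volume 1)
    "dfs.namenode.edits.dir": "/journal/dfs",  # NameNode edits (volume 2)
}


def render_zoo_cfg(props):
    """Return key=value lines in the style of a ZooKeeper zoo.cfg file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


def render_hdfs_site(props):
    """Return a <configuration> fragment in the style of hdfs-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available in Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_zoo_cfg(ZOOKEEPER_PROPS))
    print(render_hdfs_site(HDFS_PROPS))
```

Keeping the paths in one place like this makes it easier to confirm that every journal and metadata directory points at the volume intended for it.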
Dell Technologies recommends this Infrastructure node configuration for Master nodes in the cluster. The configuration is sized to support Master nodes in a production deployment.
Edge nodes and Utility nodes should use this configuration as a starting point. You can change the processor, memory, and storage recommendations to specialize those nodes.
The configuration includes four network ports: two for the Cluster Data network and two for the Edge network or other external connections.
Two SSDs in a RAID 1 configuration are used for the operating system volume. The swap partition is small since swapping causes excessive latency for critical cluster infrastructure. The home directories are allocated in a separate small partition since user files should not be stored on infrastructure nodes. Most of the storage is allocated to the /var partition for runtime files. You can use LVM to adjust the storage allocation between /, /home, and /var for specific needs.
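Before adjusting the allocation, it can help to check that a proposed layout still fits the operating system volume. The sketch below uses the nominal sizes from the partition table for volume 0. The function names `remaining_gb` and `rebalance` are hypothetical, and the 960 GB figure is the nominal drive size, so the capacity that is available after formatting is an assumption to verify on the actual hardware.

```python
# Nominal sizes in GB for volume 0 (two 960 GB SSDs in RAID 1).
OS_VOLUME_GB = 960
OS_PARTITIONS_GB = {
    "/boot": 1,   # listed as 1024 MB
    "/": 100,
    "swap": 4,
    "/home": 1,
    "/var": 850,  # listed as approximately 850 GB
}


def remaining_gb(volume_gb, partitions):
    """Return the unallocated space left on the volume, in GB."""
    return volume_gb - sum(partitions.values())


def rebalance(partitions, source, target, amount_gb):
    """Return a new layout with amount_gb moved from source to target."""
    if amount_gb < 0 or partitions[source] < amount_gb:
        raise ValueError(f"cannot move {amount_gb} GB out of {source}")
    layout = dict(partitions)
    layout[source] -= amount_gb
    layout[target] += amount_gb
    return layout


if __name__ == "__main__":
    print(remaining_gb(OS_VOLUME_GB, OS_PARTITIONS_GB))  # 4
    # Hypothetical adjustment: give the root file system 50 GB from /var.
    adjusted = rebalance(OS_PARTITIONS_GB, "/var", "/", 50)
    print(adjusted["/"], adjusted["/var"])  # 150 800
```

The check only models the size budget. The actual resize would still be carried out with the LVM tools on the node.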
A five-HDD RAID 6 volume is used for most of the Infrastructure node storage. This configuration provides a good balance between performance, data durability, storage efficiency, and administration overhead. This volume is divided into partitions for:
- NameNode file system data
- ZooKeeper data
- Cloudera Manager database storage
- Hive metastore
- Ranger database
- Any other required operational databases
You can use LVM to adjust the storage allocation for specific needs.
An alternative configuration is to set up a four HDD RAID 10 volume for database storage, and a two SSD RAID 1 for NameNode and ZooKeeper data. This configuration option provides slightly better performance for database writes, but adds administration overhead and increases recovery time if drives fail.
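The capacity side of this trade-off can be estimated with standard RAID arithmetic. The sketch below is illustrative only: `usable_gb` is a hypothetical helper, sizes are nominal drive capacities in GB, and controller or file system overhead is ignored. The disk counts and sizes are those listed in the volume table and in the alternative described above.

```python
def usable_gb(level, disks, disk_gb):
    """Return the nominal usable capacity of a RAID volume, in GB."""
    if level == "RAID 1":
        if disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        return disk_gb  # all disks mirror the same data
    if level == "RAID 6":
        if disks < 4:
            raise ValueError("RAID 6 needs at least four disks")
        return (disks - 2) * disk_gb  # two disks' worth of parity
    if level == "RAID 10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, four or more")
        return (disks // 2) * disk_gb  # half of the disks hold mirror copies
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Recommended layout: five 2.4 TB HDDs in RAID 6.
    print(usable_gb("RAID 6", 5, 2400))   # 7200
    # Alternative layout: four 2.4 TB HDDs in RAID 10.
    print(usable_gb("RAID 10", 4, 2400))  # 4800
    # Operating system volume: two 960 GB SSDs in RAID 1.
    print(usable_gb("RAID 1", 2, 960))    # 960
```

Comparing these figures with the sum of the planned partition sizes is a useful check before finalizing either layout.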
A single SSD, configured without RAID, is used to store the ZooKeeper and quorum journals.