This topic discusses:
Figure 8 depicts the Cloudera Data Platform Data Center high-level architecture.
The cluster environment consists of multiple software services running on multiple physical server nodes.
The implementation divides the server nodes into several roles, and each node has a configuration that is optimized for its role in the cluster. The physical server configurations are divided into two broad classes:
| Node class | Description |
| --- | --- |
| Worker Nodes | Worker Nodes handle the bulk of the Hadoop processing. |
| Master Nodes | Master Nodes support the services needed for cluster operation. |
A high-performance network fabric connects the cluster nodes together, and separates the core Data network from management functions.
The minimum supported configuration is eight cluster nodes, which include three Master Nodes, one Utility Node, one Edge Node, and three Worker Nodes. Starting with a ten-node cluster with five Worker Nodes is a common practice. The nodes have the roles that are described in Table 5.
Note: All of these node roles are required.
| Node role | Hardware configuration |
| --- | --- |
| Master Nodes | Infrastructure |
| Utility Nodes | Infrastructure |
| Edge Nodes | Infrastructure |
| Worker Nodes | Worker |
Table 6 defines the various nodes.
| Node | Definition |
| --- | --- |
| Master Node | Runs all the daemons that are required to manage the cluster storage and compute services |
| Worker Node | Runs all the services that are required to store blocks of data on the local hard drives and run processing tasks against that data |
| Utility Node | Runs Cloudera Manager and the Cloudera Management Services |
| Edge Node | Contains all client-facing configurations and services, including gateway configurations |
Table 7 describes recommended host role assignments for a medium-sized high availability deployment.
Table 7: CDP Data Center nodes and roles
| Node | Service |
| --- | --- |
| Master Node 1 | NameNode, JournalNode, FailoverController, YARN ResourceManager, ZooKeeper, JobHistory Server, Spark History Server, Kudu master |
| Master Node 2 | NameNode, JournalNode, FailoverController, YARN ResourceManager, ZooKeeper, Kudu master |
| Master Node 3 | JournalNode, ZooKeeper, Kudu master (all of these services require an odd number of masters for high availability) |
| Utility Node 1 | Cloudera Manager, Cloudera Management Service, Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie, ZooKeeper (requires a dedicated disk), JournalNode (requires a dedicated disk), Apache Atlas, Apache Ranger |
| Edge Nodes | Hue, HiveServer2, Gateway configuration |
| Worker Nodes | DataNode, NodeManager, Impala daemon (impalad), Kudu tablet server |
These recommendations for role assignments are intended as a starting point. Depending on the cluster size and the services that are used, the role assignments may be different. See Runtime Cluster Hosts and Role Assignments in the CDP Data Center documentation for more details.
Three distinct networks are used in the cluster.
Table 8 describes the networks and their purposes.
Table 8: Cloudera Distribution for Apache Hadoop network definitions
| Network | Description | Available services |
| --- | --- | --- |
| Cluster Data Network | The Data network carries the bulk of the traffic within the cluster. This network is aggregated within each pod, and pods are aggregated into the cluster switch. | The Cloudera Enterprise services are available on this network. Note: The Cloudera Enterprise services do not support multihoming and are accessible only on the Cluster Data Network. |
| iDRAC/BMC Network | The BMC network connects the BMC or iDRAC ports and the out-of-band management ports of the switches. It is used for hardware provisioning and management. This network is aggregated into a management switch in each rack. | This network provides access to the BMC and iDRAC functionality on the servers. It also provides access to the management ports of the cluster switches. |
| Edge Network | The Edge network provides connectivity from the Edge Nodes to an existing network, either directly or through the cluster spine switches. | SSH access to Edge Nodes is available on this network. Other application services can also be configured and made available. |
Connectivity between the cluster and existing network infrastructure can be adapted to specific installations. Common scenarios are:
Figure 9 displays cluster logical networks details.
Figure 9: Cluster logical networks
The architecture is organized into three units for sizing as the Hadoop environment grows. From smallest to largest, they are:
- Rack
- Pod
- Cluster
Each has specific characteristics and sizing considerations that are documented in this architecture guide. The design goal for the Hadoop environment is to enable you to scale the environment by adding additional capacity as needed, without replacing any existing components.
A rack is the smallest size designation for a Hadoop environment. A rack consists of the power, network cabling, and data and management switches to support a group of Worker Nodes.
A rack is a physical unit, and its capacity is defined by physical constraints including available space, power, cooling, and floor loading. A rack should use its own power within the data center, independent from other racks, and should be treated as a fault zone. In the event of a rack level failure in a multiple-rack pod or cluster, the cluster will continue to function with reduced capacity.
This architecture uses 12 nodes as the nominal size of a rack, but higher or lower densities are possible. Typically, a rack will contain about 12 nodes using scale-out servers like the Dell EMC PowerEdge R640 and Dell EMC PowerEdge R740xd. The node density of a rack does not affect overall cluster scaling and sizing, but it does affect fault zones in the cluster.
A pod is the set of nodes that are connected to the first level of network switches in the cluster, and it consists of one or more racks. A pod can start with a smaller number of nodes and expand over time, up to the pod maximum described below.
A pod is a second-level fault zone above the rack level. If a pod level failure occurs in a multiple pod cluster, the cluster continues to function with reduced capacity. A pod can support enough Hadoop server nodes and network switches for a minimum commercial scale installation.
In this architecture, a pod supports up to 36 nodes (nominally three racks). This size results in a bandwidth oversubscription of 2.25:1 between pods in a full cluster. The size of a pod can vary from this baseline recommendation. Changing the pod size affects the bandwidth oversubscription at the pod level, the size of the fault zones, and the maximum cluster size.
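As an illustration of how the 2.25:1 figure can arise, the following sketch computes pod-level oversubscription as the ratio of node-facing bandwidth to uplink bandwidth. The 25 GbE node connection and the 4 x 100 GbE of pod uplink bandwidth are assumed example values, not prescriptions from this guide.

```python
# Pod-level bandwidth oversubscription estimate; link speeds and uplink counts are assumptions.
nodes_per_pod = 36
node_link_gbps = 25       # assumed 25 GbE connection per node
uplink_count = 4          # assumed number of pod uplinks to the aggregation layer
uplink_gbps = 100         # assumed 100 GbE uplink speed

downlink_bw = nodes_per_pod * node_link_gbps   # 900 Gbps toward the nodes
uplink_bw = uplink_count * uplink_gbps         # 400 Gbps toward the aggregation layer

print(f"Oversubscription: {downlink_bw / uplink_bw:.2f}:1")   # 2.25:1
```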
A cluster is a single Hadoop environment that is attached to a pair of network switches providing an aggregation layer for the entire cluster.
A cluster can range in size from a single pod in a single rack to many pods in multiple racks. A single pod cluster is a special case and can function without an aggregation layer. This scenario is typical for smaller clusters before additional pods are added.
In this architecture, the maximum number of nodes in a cluster depends on the choice of Layer 2 or Layer 3 switching, and the switch models used. See Cluster node counts for the limits.
The minimum supported configuration is eight nodes:
- Three Master Nodes
- One Utility Node
- One Edge Node
- Three Worker Nodes
Although a minimum of one Edge Node is required per cluster, larger clusters and clusters with high ingest volumes or rates may require additional Edge Nodes. Cloudera recommends a baseline of one Edge Node for every 20 Worker Nodes.
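This baseline translates into a simple sizing rule. The helper below is an illustrative sketch of that rule; the function name is ours, not Cloudera's.

```python
import math

def edge_nodes_needed(worker_nodes: int) -> int:
    """Baseline of one Edge Node per 20 Worker Nodes, with a minimum of one per cluster."""
    return max(1, math.ceil(worker_nodes / 20))

print(edge_nodes_needed(5))     # 1 (minimum configuration)
print(edge_nodes_needed(100))   # 5
```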
Table 9 shows the recommended number of nodes per pod and pods per cluster. It also shows some alternatives for cluster sizing with different bandwidth oversubscription ratios.
Note: The network design in this guide uses a 2.25:1 oversubscription ratio.
Table 9: Recommended cluster sizing
| Nodes per rack | Nodes per pod | Pods per cluster | Nodes per cluster | Bandwidth oversubscription |
| --- | --- | --- | --- | --- |
| 12 | 36 | 8 | 288 | 2.25:1 |
| 12 | 48 | 8 | 384 | 3:1 |
| 12 | 36 | 10 | 360 | 3:1 |
| 12 | 24 | 16 | 384 | 3:1 |
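The "Nodes per cluster" column in Table 9 is simply the product of the pod size and the pod count; the short check below reproduces it.

```python
# Reproduce the "Nodes per cluster" column of Table 9.
sizing_options = [
    (36, 8),    # 2.25:1 oversubscription (baseline in this guide)
    (48, 8),    # 3:1
    (36, 10),   # 3:1
    (24, 16),   # 3:1
]
for nodes_per_pod, pods_per_cluster in sizing_options:
    print(f"{nodes_per_pod} nodes/pod x {pods_per_cluster} pods = "
          f"{nodes_per_pod * pods_per_cluster} nodes per cluster")
```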
Total cluster storage capacity is a function of the server platform and disk drives chosen, and scales with the number of Worker Nodes.
The amount of usable storage in a cluster also depends on the type of data durability that is used and the type of data compression that is used. The usable storage capacity can be calculated as:

Usable capacity = Raw capacity × Storage efficiency × Compression ratio
This calculation is straightforward but depends on estimating the storage efficiency and compression ratio.
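As an illustrative example only (the node count, drive count, drive size, and ratios below are placeholders, not recommendations from this guide), the calculation might look like this for a small cluster:

```python
# Illustrative usable-capacity estimate; every input value here is a placeholder.
worker_nodes = 10
drives_per_node = 12
drive_tb = 4                   # raw capacity per drive, in TB

storage_efficiency = 1 / 3     # HDFS replication with the default factor of 3
compression_ratio = 2.0        # assumed average compression ratio for the dataset

raw_tb = worker_nodes * drives_per_node * drive_tb            # 480 TB raw
usable_tb = raw_tb * storage_efficiency * compression_ratio   # 320 TB of user data
print(f"Raw: {raw_tb} TB, estimated usable: {usable_tb:.0f} TB")
```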
The Hadoop Distributed File System (HDFS) storage system supports two options for data durability: replication and erasure coding, and these options have different storage efficiencies.
When replication is used, HDFS creates multiple copies of data across nodes to guard against data loss. The number of replicas (the replication factor) is configurable and can be changed on a file-by-file basis. The default replication factor is three, which is the value normally used for storage capacity estimates.
HDFS replication decreases the storage efficiency by the replication factor.
When erasure coding is used, data is divided into blocks, encoded with parity, and distributed across nodes. The details of the encoding are specified in an erasure coding policy. Erasure coding policies allow a tradeoff between data durability and storage efficiency. For example, a Reed-Solomon 6-3 policy has durability of 3 and a 67% storage efficiency, while a Reed-Solomon 3-2 policy has durability of 2 and a 60% storage efficiency.
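The storage efficiencies quoted above follow directly from the replication factor and from the data-to-parity ratio of the erasure coding policy. The short sketch below shows that arithmetic.

```python
def replication_efficiency(replication_factor: int) -> float:
    """Each usable byte is stored replication_factor times."""
    return 1 / replication_factor

def erasure_coding_efficiency(data_blocks: int, parity_blocks: int) -> float:
    """Usable fraction of each stored stripe: data / (data + parity)."""
    return data_blocks / (data_blocks + parity_blocks)

print(f"{replication_efficiency(3):.0%}")        # 33% (default 3x replication)
print(f"{erasure_coding_efficiency(6, 3):.0%}")  # 67% (Reed-Solomon 6-3)
print(f"{erasure_coding_efficiency(3, 2):.0%}")  # 60% (Reed-Solomon 3-2)
```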
Compression can also be used to reduce the storage required. Compression is optional and applies to individual files. HDFS supports multiple data compression codecs, and it is possible for each compressed file to use a different codec. The compression ratio that is achieved for a given file depends on both the dataset and codec that is used and is difficult to estimate. The best approach is to:
The hardware configurations for the Master Nodes support clusters in the petabyte storage range without changes.
CDP Data Center licensing for a cluster is based on:
- The number of nodes in the cluster
- The total number of Cloudera Compute Units (CCUs) across the cluster
- The total storage capacity across the cluster
The calculation uses a base price per node, including 16 CCUs and 48 TB. Storage and CCUs are aggregated across the cluster, and variable pricing applies to CCUs and storage above the base allocation for the cluster. As of the publication date of this document, one CCU equals one physical CPU core and 8 GB of RAM.
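As an illustration only of how this aggregation might be computed (the per-node values and the CCU counting rule below are assumptions; consult Cloudera's Platform pricing documentation for the authoritative rules):

```python
# Back-of-the-envelope licensing aggregation. The CCU counting rule and all node
# values below are illustrative assumptions, not Cloudera's actual pricing formula.
nodes = 10
cores_per_node = 32           # example core count; 1 CCU per physical core
ram_gb_per_node = 256         # example RAM; 8 GB of RAM per CCU
storage_tb_per_node = 48      # example raw storage per node, in TB

base_ccus = nodes * 16        # base allocation: 16 CCUs per node
base_tb = nodes * 48          # base allocation: 48 TB per node

# Assumed rule: a node's CCUs are the larger of its core count and its RAM / 8 GB.
ccus_per_node = max(cores_per_node, ram_gb_per_node // 8)
total_ccus = nodes * ccus_per_node
total_tb = nodes * storage_tb_per_node

ccu_overage = max(0, total_ccus - base_ccus)   # CCUs subject to variable pricing
tb_overage = max(0, total_tb - base_tb)        # TB subject to variable pricing
print(ccu_overage, tb_overage)                 # 160 0
```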
The recommended configurations in this architecture guide take licensing costs into consideration as part of their designs. See the Cloudera document, Platform pricing, for more details.
The architecture implements high availability (HA) at multiple levels through a combination of hardware redundancy and software support.
| HA feature | Description |
| --- | --- |
| Hadoop redundancy | HDFS implements redundant storage for data resiliency through replication and erasure coding, and is aware of node and rack locality. |
| Network redundancy | The production network can optionally use bonded connections to pairs of switches in each pod, and switch pairs at the aggregation level. This configuration provides increased bandwidth capacity, and allows operation at reduced capacity if a network port, network cable, or switch fails. Bonded networking is not normally used when 25 GbE is the core fabric. For large clusters, Dell EMC recommends the use of Layer 3 aggregation, which provides network redundancy at the spine switch level. |
| HDFS highly available NameNodes | The architecture implements high availability for the HDFS directory through a quorum mechanism that replicates critical NameNode data across multiple physical nodes. |
| Resource manager high availability | The architecture supports high availability for the Hadoop YARN resource manager. Without resource manager HA, a resource manager failure causes running jobs to fail; with resource manager HA enabled, jobs continue running through a resource manager failure. Note: Dell EMC recommends resource manager HA for production clusters. |
| Database server high availability | The architecture supports high availability for the operational databases. The database server that is used for both the Cloudera Manager operational and metadata databases stores its data on a RAID 10 partition, providing redundancy if there is a drive failure. Note: Dell EMC's default installation uses a single PostgreSQL instance, so there is a single point of failure. Database server high availability can be implemented using: |