Architecture concepts
Cloudera CDP Private Cloud Base provides data management, enterprise analytics, and management tools for big data. The data management services include HDFS file storage and Ozone object storage. The Cloudera Runtime provides the analytics services, which include components like Hive, HBase, MapReduce, and Spark. The management tools include Cloudera Manager and the Cloudera Management Services.
Successful deployment and operation of Cloudera CDP Private Cloud Base depends on a well-designed infrastructure architecture. The architecture must provide high performance, scalability, reliability, and manageability.
The Node architecture figure depicts the nodes and services in a CDP Private Cloud Base deployment.
The cluster infrastructure consists of multiple physical server nodes. Each node has a physical configuration that is specialized for its role in the cluster. These nodes are further specialized through the software services that are assigned to them. The table below defines the various cluster nodes and their physical configuration.
| Node | Definition | Physical configuration |
|---|---|---|
| Master node | This node runs all the services that are required to manage the cluster storage and compute services. | Infrastructure node |
| Utility nodes | These nodes run Cloudera Manager and other supporting cluster services, such as Cloudera Management Services and supporting databases. | Infrastructure node |
| Gateway node | This node contains all client-facing configurations and services, including gateway configurations. | Infrastructure node |
| Worker node | This node runs all the services that are required to store blocks of data on the local hard drives, and run processing tasks against that data. | Worker node |
A high-performance network fabric connects the cluster nodes in a Cluster Data network. An additional Edge network provides an interface between the cluster and external systems and applications. See Network architecture for a detailed description of the network design.
The minimum supported configuration is eight cluster nodes, which include three Master nodes, one Utility node, one Gateway node, and three Worker nodes. Dell Technologies recommends a ten-node cluster with five Worker nodes as a starting point.
The table below describes recommended host role assignments in a medium-sized high availability deployment.
| Node | Service |
|---|---|
| Master node 1 | NameNode, JournalNode, FailoverController, YARN ResourceManager, ZooKeeper, JobHistory Server, Spark History Server, Kudu master |
| Master node 2 | NameNode, JournalNode, FailoverController, YARN ResourceManager, ZooKeeper, Kudu master |
| Master node 3 | JournalNode, ZooKeeper, Kudu master (JournalNode, ZooKeeper, and Kudu master all require an odd number of instances for high availability.) |
| Utility node 1 | Cloudera Manager, Cloudera Management Service, Impala Catalog Server, Impala StateStore, Oozie, Apache Ranger |
| Utility node 2 | Hive Metastore, relational database server (such as PostgreSQL) |
| Gateway nodes | Hue, HiveServer2, Gateway configuration |
| Worker nodes | DataNode, NodeManager, Impala Daemon (impalad), Kudu tablet server |
These recommendations for role assignments are intended as a starting point. The role assignments may differ depending on the cluster size and the services that are used. See Runtime Cluster Hosts and Role Assignments in the CDP Private Cloud Base documentation for more details.
An alternative cluster architecture uses the PowerScale OneFS HDFS protocol for storage. PowerScale nodes do not support the Ozone storage protocol. The figure below depicts the nodes and services in this deployment scenario.
Compute and storage can be scaled independently using this alternative architecture. The PowerScale storage nodes provide the HDFS NameNode and DataNode services instead of the services being assigned to the Master nodes and Worker nodes. The Worker nodes only include enough storage for runtime operations like shuffle-sort spill files and cache.
This alternative architecture reduces the HDFS bandwidth requirements for the Cluster Data network. PowerScale OneFS implements data durability internally and uses a private back-end network for internal operations. A single copy of the data is transferred to the PowerScale storage nodes when Worker nodes write to HDFS. No replication traffic occurs on the Cluster Data network. Also, HDFS recovery traffic for failed drives or nodes does not occur on the Cluster Data network.
This design implements high availability (HA) at multiple levels through a combination of hardware redundancy and software support.
Cluster sizing and scaling are two different but related considerations. Sizing is concerned with ensuring the cluster meets the workload requirements for storage and processing throughput. Scaling is concerned with growth of the cluster over time as capacity needs increase. Since Cloudera CDP Private Cloud Base is a parallel scale-out system, some sizing requirements can be addressed through scaling while others must be addressed through node level sizing.
Sizing and scaling of a CDP Private Cloud Base cluster are complex topics that require knowledge of the workloads. This section highlights the main considerations that are involved but does not provide detailed recommendations. Your Dell Technologies or authorized partner sales representative can help with detailed sizing calculations.
Many parameters are involved in cluster sizing. The primary parameters are:
The architecture is organized into three units for scaling as the CDP Private Cloud Base environment grows. From smallest to largest, they are the rack, the pod, and the cluster.
Each unit has specific characteristics and sizing considerations that are documented in this guide. The architecture design enables you to scale the CDP Private Cloud Base environment by adding additional capacity as needed, without replacing any existing components.
A rack is the smallest size designation for a CDP environment. A rack consists of the power, network cabling, and the data and management switches required to support a group of Worker nodes.
A rack is a physical unit. Physical constraints define its capacity, including available space, power, cooling, and floor loading. A rack should use its own power within the data center, independent from other racks, and should be treated as a fault zone. If a rack level failure occurs in a multiple-rack pod or cluster, the cluster continues to function with reduced capacity.
This architecture uses 12 nodes as the nominal size of a rack, which is typical for scale-out servers like the PowerEdge R660 and PowerEdge R760, although higher or lower densities are possible. Environmental constraints like power and cooling can determine the rack density. The node density of a rack does not affect overall cluster scaling and sizing, but it does affect fault zones in the cluster.
A pod is the set of nodes that are connected to the first level of network switches in the cluster. It consists of one or more racks. A pod can include a smaller number of nodes initially, and expand over time to a maximum that the available switch ports define.
A pod is a second-level fault zone above the rack level. If a pod level failure occurs in a multiple pod cluster, the cluster continues to function with reduced capacity. A pod can support enough server nodes and network switches for a minimum commercial scale installation.
In this architecture, a pod supports up to 48 nodes (nominally four racks). The recommended pod size is 36 nodes. The size of a pod can vary from this baseline recommendation. Changing the pod size affects the network bandwidth oversubscription at the pod level, the size of the fault zones, and the maximum cluster size.
A cluster is a single CDP environment that is attached to a pair of network switches providing an aggregation layer for the entire cluster.
A cluster can range in size from a single pod in a single rack to many pods in multiple racks. A single pod cluster is a special case and can function without an aggregation layer. This scenario is typical for smaller clusters before additional pods are added.
At the cluster level, pod-to-pod (or uplink) network bandwidth factors into overall cluster performance. The maximum uplink bandwidth oversubscription ratio is 4:1. For most clusters, the oversubscription ratio should be kept close to 2:1. The number of nodes in each pod defines the oversubscription ratio.
The minimum supported configuration is eight nodes: three Master nodes, one Utility node, one Gateway node, and three Worker nodes.
Although a minimum of one Gateway node is required per cluster, larger clusters and clusters with high ingest volumes or rates may require additional Gateway nodes. Cloudera recommends a baseline of one Gateway node for every 20 Worker nodes.
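As an illustration of that baseline, the following Python sketch estimates the number of Gateway nodes from the Worker node count. The function name and example Worker counts are hypothetical; only the one-Gateway-per-20-Workers ratio comes from the guidance above.

```python
import math

def gateway_nodes(worker_nodes, workers_per_gateway=20):
    """Baseline Gateway node count: at least one, plus one per 20 Worker nodes (assumed baseline)."""
    return max(1, math.ceil(worker_nodes / workers_per_gateway))

# Hypothetical Worker node counts for illustration
for workers in (3, 60, 120):
    print(f"{workers} Worker nodes -> {gateway_nodes(workers)} Gateway node(s)")
```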
The recommended configuration is 36 nodes per pod, which results in a 2.25:1 oversubscription ratio and a maximum cluster size of 1152 nodes. The Recommended cluster sizing table below shows the recommended numbers of nodes per pod and pods per cluster. Those numbers are based on the design that is described in Network architecture. The table also shows some alternatives for cluster sizing with different bandwidth oversubscription ratios.
| Nodes per rack | Nodes per pod | Pods per cluster | Nodes per cluster | Bandwidth oversubscription |
|---|---|---|---|---|
| 12 | 36 | 32 | 1152 | 2.25:1 |
| 12 | 48 | 32 | 1536 | 3:1 |
| 12 | 24 | 32 | 768 | 1.5:1 |
| 12 | 32 | 32 | 1024 | 2:1 |
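The oversubscription ratios in the table can be reproduced with a simple pod-level calculation. The sketch below assumes 25 GbE node connections and four 100 GbE uplinks per pod; these link speeds are illustrative assumptions that happen to be consistent with the table, not a statement of the reference design (see Network architecture for the actual switch and link configuration).

```python
def pod_oversubscription(nodes_per_pod, node_link_gbps=25, uplinks_per_pod=4, uplink_gbps=100):
    """Ratio of total node-facing bandwidth to total uplink bandwidth for one pod (assumed link speeds)."""
    downlink = nodes_per_pod * node_link_gbps
    uplink = uplinks_per_pod * uplink_gbps
    return downlink / uplink

for nodes in (24, 32, 36, 48):
    print(f"{nodes} nodes per pod -> {pod_oversubscription(nodes):.2f}:1 oversubscription")
```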
Total cluster storage capacity is a function of the server platform and disk drives chosen, and scales with the number of Worker nodes. Cloudera supports a maximum of 100 TB per node for HDFS storage.
The amount of usable storage in a cluster also depends on the types of data durability and data compression that are used. The usable storage capacity can be calculated as:
number of nodes x raw storage per node x storage efficiency x compression ratio
This calculation is straightforward but depends on estimating the storage efficiency and compression ratio.
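As a worked example of the formula, the following sketch estimates usable capacity for a hypothetical 20-node cluster. The per-node raw capacity, the three-way replication efficiency, and the 2:1 compression ratio are all assumptions chosen for illustration, not recommendations.

```python
def usable_capacity_tb(nodes, raw_tb_per_node, storage_efficiency, compression_ratio):
    """Usable capacity = number of nodes x raw storage per node x storage efficiency x compression ratio."""
    return nodes * raw_tb_per_node * storage_efficiency * compression_ratio

# Hypothetical example: 20 Worker nodes, 96 TB raw per node (within the 100 TB per node limit),
# three-way HDFS replication (1/3 storage efficiency), and an assumed 2:1 compression ratio.
capacity = usable_capacity_tb(nodes=20, raw_tb_per_node=96,
                              storage_efficiency=1 / 3, compression_ratio=2.0)
print(f"Approximate usable capacity: {capacity:.0f} TB")
```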
The HDFS storage system supports two options for data durability:
These options have different storage efficiencies. They also have different performance characteristics, and erasure coding is not supported for all services. For more information, see the Cloudera document, Erasure coding overview.
When replication is used, HDFS creates multiple copies of data across nodes to guard against data loss. The number of replicas (replication factor) is configurable and can be changed on a file-by-file basis. The default replication factor is three, which is the value that is typically used for storage capacity estimates.
HDFS replication reduces the storage efficiency to one divided by the replication factor; with the default factor of three, storage efficiency is approximately 33 percent.
When erasure coding is used, data is divided into blocks, encoded with parity, and distributed across nodes. The details of the encoding are specified in an erasure coding policy. Erasure coding policies allow a tradeoff between data durability and storage efficiency. For example, a Reed-Solomon 6-3 policy has durability of 3 and a 67% storage efficiency. A Reed-Solomon 3-2 policy has durability of 2 and a 60% storage efficiency.
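A minimal sketch of how those figures are derived: for a Reed-Solomon policy with k data blocks and m parity blocks, the storage efficiency is k / (k + m) and the policy tolerates the loss of m blocks.

```python
def rs_policy(data_blocks, parity_blocks):
    """Storage efficiency and failure tolerance for a Reed-Solomon erasure coding policy."""
    efficiency = data_blocks / (data_blocks + parity_blocks)
    return efficiency, parity_blocks

for k, m in ((6, 3), (3, 2)):
    efficiency, durability = rs_policy(k, m)
    print(f"RS-{k}-{m}: {efficiency:.0%} storage efficiency, tolerates {durability} lost blocks")
```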
HDFS replication and erasure coding can be used simultaneously in a cluster. Erasure coding policies are specified at the HDFS directory level, while replication is specified on a file-by-file basis.
Compression can also be used to reduce the storage required. Compression is optional and applies to individual files. HDFS supports multiple data compression codecs, and it is possible for each compressed file to use a different codec. The compression ratio that is achieved for a given file depends on both the dataset and codec that is used, and is difficult to estimate. The best approach is to:
The recommended configurations for worker nodes can support HDFS only clusters, or clusters running both HDFS and Ozone storage systems. These configurations are intended to be used during a transition from HDFS to Ozone. The storage in the recommended configurations can be divided between Ozone and HDFS on a drive-by-drive basis. The drives can also be partitioned so each drive supports both storage systems. Partitioning the drives and using logical volumes makes it easier to change the proportion of storage allocated to HDFS and Ozone over time compared to dedicated drives.
Much higher storage density is possible using dedicated Ozone storage nodes. See Ozone hardware requirements for Cloudera's recommendations on role assignment and node sizing for mixed HDFS/Ozone or dedicated Ozone clusters. Ozone architecture provides more detail on the internal implementation of Ozone. Contact your Dell Technologies sales representative for additional assistance sizing and configuring dedicated Ozone storage clusters.
Worker node configurations can be optimized for specific workloads.
The recommended configurations are designed to provide a balance of disk and network throughput for general-purpose workloads.
Changes to processor and memory size can be made to handle specific workloads. The impact on the overall cluster capacity should also be considered when changing the configurations. See Worker nodes for more details.
The hardware configurations for the Infrastructure nodes support clusters in the petabyte storage range without changes. You can add more Infrastructure nodes to distribute services across multiple nodes for increased capacity. Adding Infrastructure nodes may be necessary for operational databases or to implement dedicated servers for services like ZooKeeper and Ranger. See Infrastructure nodes for more details about specific server configuration. Use the Cloudera documentation for Runtime Cluster Hosts and Role Assignments to determine the number of utility nodes required and their roles.
If PowerScale is used for the primary HDFS storage, the sizing and scaling of the cluster is different from implementations using servers with direct attached storage.
The primary difference is that scaling of compute and storage is performed independently by adding either PowerEdge Worker nodes or PowerScale storage nodes. This difference simplifies changing the balance of compute and storage over time. Dell Technologies recommends using the PowerScale H7000. A ratio of five Worker nodes to one PowerScale H7000 storage chassis is a good starting point.
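As an illustration of that starting ratio, the sketch below estimates how many H7000 chassis a given Worker node count implies. The function and example counts are hypothetical; only the five-to-one starting ratio comes from the recommendation above, and the right balance depends on the workload.

```python
import math

def h7000_chassis(worker_nodes, workers_per_chassis=5):
    """Starting-point estimate: one PowerScale H7000 chassis per five Worker nodes (assumed ratio)."""
    return max(1, math.ceil(worker_nodes / workers_per_chassis))

# Hypothetical Worker node counts for illustration
for workers in (5, 12, 40):
    print(f"{workers} Worker nodes -> {h7000_chassis(workers)} H7000 chassis")
```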
NameNode sizing is also different. When you use PowerScale for primary HDFS storage, the NameNode services are integrated into the PowerScale nodes. Separate NameNode infrastructure servers are not required to be sized and scaled, and the NameNode services are scaled as PowerScale storage nodes are added.
Network bandwidth utilization is different with PowerScale HDFS storage. OneFS implements data durability internally over a dedicated back-end network, so it replaces HDFS replication and erasure coding. Because HDFS write replication does not consume additional Cluster Data network bandwidth, reads and writes use a uniform amount of network bandwidth. This simplifies cluster throughput calculations and increases performance.
Storage sizing calculation is different since PowerScale determines the translation from raw storage capacity to usable HDFS capacity, rather than HDFS replication and erasure coding.
Some local storage is required on the Worker nodes for intermediate storage when using PowerScale HDFS. This storage is used for MapReduce spill files, Spark cache, and other temporary files. The total intermediate storage across the cluster should be approximately 20% of the total PowerScale HDFS storage. The recommended PowerScale Worker node configurations include this storage. See PowerScale Worker nodes for more information.
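A minimal sketch of that 20 percent guideline, assuming a hypothetical total PowerScale HDFS capacity and Worker node count for illustration:

```python
def intermediate_storage_per_worker_tb(powerscale_hdfs_tb, worker_nodes, fraction=0.20):
    """Local intermediate (spill, shuffle, and cache) storage per Worker node,
    sized at about 20 percent of the total PowerScale HDFS capacity."""
    return (powerscale_hdfs_tb * fraction) / worker_nodes

# Hypothetical example: 1 PB (1000 TB) of PowerScale HDFS capacity shared by 25 Worker nodes
per_node = intermediate_storage_per_worker_tb(powerscale_hdfs_tb=1000, worker_nodes=25)
print(f"Approximately {per_node:.0f} TB of local intermediate storage per Worker node")
```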