This design implements high availability (HA) at multiple levels through a combination of hardware redundancy and software support.
- Hadoop storage resiliency
- CDP HDFS implements data resiliency through replication and erasure coding. The HDFS implementation understands node and rack locality, and distributes data to minimize the impact of a node- or rack-level failure.
- HDFS highly available NameNodes
- This design implements high availability for the HDFS directory through a quorum mechanism that replicates critical NameNode data across multiple physical nodes.
- Network resiliency
- The production network can optionally use bonded connections to pairs of switches in each pod. Pod-level switches should use redundant connections to switch pairs at the aggregation level. This configuration provides increased bandwidth capacity and allows operation at reduced capacity if a network port, network cable, or switch fails.
- Resource manager high availability
- This design supports high availability for the Hadoop YARN resource manager. Without resource manager HA, a resource manager failure causes running jobs to fail. With resource manager HA enabled, jobs can continue running after a resource manager failure.
Note: Dell Technologies recommends resource manager HA for production clusters.
- Database server high availability
- This design supports high availability for the operational databases. The database server that hosts the Cloudera Manager operational and metadata databases stores its data on a RAID 6 or RAID 10 partition, which protects the data if a drive fails. Database server high availability can be implemented using:
- One or more additional PostgreSQL instances on other nodes in the cluster
- An external, highly available database server
Note: Dell Technologies recommends implementing HA for the database server.
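
The quorum mechanism described under HDFS highly available NameNodes can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a strict-majority quorum in which a write counts as durable once more than half of the participating nodes acknowledge it. This section does not state the quorum size or the number of nodes, and the function names here are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: majority-quorum arithmetic.
# This is not Cloudera or HDFS code; all names are hypothetical.

def majority(n_nodes: int) -> int:
    """Return the smallest node count that forms a strict majority."""
    if n_nodes < 1:
        raise ValueError("need at least one node")
    return n_nodes // 2 + 1


def tolerated_failures(n_nodes: int) -> int:
    """Return how many nodes can fail while a majority stays reachable."""
    return n_nodes - majority(n_nodes)


def write_is_durable(acks: int, n_nodes: int) -> bool:
    """A write is durable once a majority of nodes has acknowledged it."""
    return acks >= majority(n_nodes)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"nodes={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this assumption, three nodes tolerate one failure and five nodes tolerate two, while a fourth node adds no fault tolerance over three. This is why quorum-based designs are usually deployed with an odd number of nodes.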