Modern big data analytics environments must operate at large scale with high performance and are usually implemented as scalable clusters with tightly coupled compute and storage. As these environments grow, operational and infrastructure complexity can limit the agility and flexibility that are required to support changing workload and storage demands.
One technique used to simplify these environments is to separate the compute and storage functions. This approach allows independent scaling and management of compute and storage but must be carefully designed to avoid performance bottlenecks.
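The sizing benefit of independent scaling can be illustrated with a small worked calculation. The sketch below is only an illustration: the node specifications and workload figures are hypothetical and are not taken from this architecture, and the helper names `coupled_nodes` and `decoupled_nodes` are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical converged node with both compute and storage."""
    cores: int
    usable_tb: float


def coupled_nodes(need_cores: int, need_tb: float, node: NodeSpec) -> int:
    """Nodes required when compute and storage scale together.

    The larger of the two requirements sets the node count, so the
    other resource is over-provisioned.
    """
    by_compute = math.ceil(need_cores / node.cores)
    by_storage = math.ceil(need_tb / node.usable_tb)
    return max(by_compute, by_storage)


def decoupled_nodes(need_cores: int, need_tb: float,
                    cores_per_compute_node: int,
                    tb_per_storage_node: float) -> tuple[int, int]:
    """(compute nodes, storage nodes) when each tier scales on its own."""
    return (math.ceil(need_cores / cores_per_compute_node),
            math.ceil(need_tb / tb_per_storage_node))


# Hypothetical storage-heavy workload: 256 cores and 2000 TB usable.
converged = coupled_nodes(256, 2000, NodeSpec(cores=32, usable_tb=50))
compute, storage = decoupled_nodes(256, 2000,
                                   cores_per_compute_node=32,
                                   tb_per_storage_node=100)
print(f"coupled: {converged} nodes, {converged * 32 - 256} idle cores")
print(f"decoupled: {compute} compute nodes + {storage} storage nodes")
```

With these assumed figures, the coupled design is sized by its storage requirement and carries unused cores, while the decoupled design sizes each tier to its own demand. The calculation ignores network bandwidth between the tiers, which is the bottleneck the design must account for.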
This architecture describes an implementation of Cloudera Data Platform (CDP) Private Cloud Base 7.1.6 on Dell EMC infrastructure with independent compute and storage. Dell EMC PowerFlex provides the compute environment and high-performance, scalable local storage. Dell EMC PowerScale provides scalable network-attached storage for large datasets. A high-performance Dell EMC PowerSwitch network fabric connects these systems.
Using PowerFlex for the compute infrastructure provides operational simplicity and flexibility for the CDP environment. The PowerFlex storage layer provides scalable, high-performance access for runtime data needed by CDP.
PowerScale is used as the primary storage for the CDP environment. The PowerScale HDFS protocol adapter is a Cloudera Certified storage system that provides scalable, high-performance storage with the durability and storage efficiency of OneFS. The Hadoop Name Node functionality that is integrated into PowerScale simplifies the deployment and scaling of CDP because dedicated Name Node servers are not required.
Unlike tightly coupled compute and storage environments, this architecture also provides the flexibility to add workloads outside the CDP environment. PowerFlex supports the deployment of additional virtual machines on the compute cluster with local storage. The multiprotocol capabilities of PowerScale support access to the CDP data over protocols such as NFS.
Note: This document may contain language from third-party content that is not under Dell's control and is not consistent with Dell's current guidelines for Dell's own content. When such third-party content is updated by the relevant third parties, this document will be revised accordingly.