Cloudera Hadoop version 6.2.1
Cloudera delivers an enterprise data platform that enables companies to build end-to-end data pipelines with true hybrid cloud features. Cloudera Data Platform can span from edge devices to public or private clouds with integrated security and governance to protect customers’ data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving toward a more modern architecture that helps leverage multiple form factors.
The latest offering from Cloudera for on-premises analytics is Cloudera Data Platform (CDP) Private Cloud. CDP is a next-generation hybrid data platform with cloud-native, self-service analytic experiences that bring the speed, scale, and economics of a true cloud experience. CDP Private Cloud is deployed as two components: the data lake cluster (CDP-PVC Base) and the compute experiences for data warehousing and machine learning, which run on an OpenShift Kubernetes cluster. This architecture allows administrators to deploy independently scalable compute environments with on-demand provisioning and autoscaling. The container-based architecture also allows organizations to manage multiple service versions concurrently while sharing a single instance of data, resource metadata, and security configuration across all the hosted compute experiences.
For this solution, we added a Cloudera Hadoop instance for loading the LINEITEM and SUPPLIER tables from the decision support benchmark into HDFS. These tables are the two largest tables in the decision support benchmark and are best positioned to show the value of Cloudera Hadoop integration into a virtualization scenario.
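As a rough illustration of the loading step, the following Python sketch builds the `hdfs dfs` commands that could stage flat-file exports of the two tables in HDFS. It is a minimal sketch under stated assumptions, not the procedure used in this solution: the local file paths, the `.tbl` file names, the HDFS base directory, and the helper `build_load_commands` are all hypothetical. The script only prints the commands and does not run them.

```python
import shlex

# Hypothetical local export files for the two tables (not from this paper).
TABLES = {
    "lineitem": "/data/dss/lineitem.tbl",
    "supplier": "/data/dss/supplier.tbl",
}

# Hypothetical HDFS directory under which each table gets its own subdirectory.
HDFS_BASE = "/user/oracle/dss"


def build_load_commands(tables, hdfs_base):
    """Return hdfs dfs commands (as argument lists) that stage each table file.

    For every table, one command creates the target directory and one
    copies the local file into it, overwriting any existing copy.
    """
    commands = []
    for name, local_path in sorted(tables.items()):
        target_dir = f"{hdfs_base}/{name}"
        # -p creates parent directories and does not fail if the directory exists.
        commands.append(["hdfs", "dfs", "-mkdir", "-p", target_dir])
        # -f overwrites the destination file if it is already present.
        commands.append(["hdfs", "dfs", "-put", "-f", local_path, target_dir])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for cmd in build_load_commands(TABLES, HDFS_BASE):
        print(shlex.join(cmd))
```

Keeping each table in its own HDFS directory is a common layout because external table definitions typically point at a directory rather than a single file. Whether that layout matches this environment is an assumption.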
We hosted three Cloudera Hadoop nodes on three separate PowerFlex HCI nodes.
The three PowerFlex HCI nodes running the Cloudera Hadoop service shared their processors with the storage service. This HCI configuration worked well because the Cloudera Hadoop nodes still had most of the processor capacity and memory available. All reads and writes were distributed across the PowerFlex nodes, which spread the I/O load and drove optimal performance.