Extracting Insights on a Scalable and Security-Enabled Data Platform from Cloudera
Fri, 14 Jul 2023 19:48:55 -0000
Summary
This joint paper briefly discusses the key hardware considerations for configuring a successful deployment and recommends configurations based on the most recent PowerEdge server portfolio.
Market positioning
Cloudera Data Platform (CDP) Private Cloud is a scalable data platform that allows data to be managed across its lifecycle, from ingestion to analysis, without leaving the data center. It comprises two products: Cloudera Private Cloud Base (the on-premises portion built on Dell PowerEdge servers) and Cloudera Private Cloud Data Services. The Data Services provide containerized compute analytics applications that scale dynamically and can be upgraded independently. The platform simplifies managing the growing volume and variety of data in your enterprise and helps unlock the business value of that data. By disaggregating compute and storage and supporting a container-based environment, CDP Private Cloud helps enhance business agility and flexibility. The platform also includes secure user access and data governance features.
Key considerations
- Data throughput - CDP Private Cloud on Dell PowerEdge servers is built on high-performing Intel architecture. Intel® Ethernet network controllers, adapters, and accessories enable agility in the data center and support high throughput. Unlike many point solutions, CDP Private Cloud is an end-to-end platform for data, from collection and engineering to reporting and AI capabilities.
- Balanced system configuration - CDP Private Cloud can handle a variety of workloads, including analytics and machine learning (ML). Its capabilities are supported by generation-over-generation improvements in the underlying Intel technologies, which offer more cores and higher memory capacity.
- Data latency - As data grows and needs to be accessed across the cluster, data-access response times are critical, especially for real-time analytics applications.
Available configurations
Table 1. Cloudera Data Platform (CDP) Private Cloud Base Cluster
Note: For a storage-only configuration (HDFS/Ozone), customers can still choose traditional high-density storage nodes with high-capacity rotational HDDs based on the PowerEdge R740xd2 platform, although external storage systems, such as Dell PowerScale or Dell ECS, are recommended. Customers should be aware that large-capacity HDDs lengthen background scans (bit-rot detection) and block report generation for HDFS. They also significantly increase recovery time after a full node failure. In addition, Cloudera does not recommend nodes with more than 100 TB of storage. Source: https://blog.cloudera.com/disk-and-datanode-size-in-hdfs/. For more information and specifications, contact a Dell representative.
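The 100 TB per-node guidance in the note above can be checked with simple arithmetic when planning a storage node. The following is a minimal sketch, not a Dell or Cloudera sizing tool: the function names and the drive layouts in the example are hypothetical, and only the 100 TB threshold comes from the note.

```python
# Threshold taken from the note above: Cloudera does not recommend
# nodes with more than 100 TB of storage.
MAX_RECOMMENDED_NODE_TB = 100.0


def node_raw_capacity_tb(drive_count: int, drive_size_tb: float) -> float:
    """Raw (pre-replication) capacity of one storage node, in TB."""
    if drive_count < 0 or drive_size_tb < 0:
        raise ValueError("drive_count and drive_size_tb must be non-negative")
    return drive_count * drive_size_tb


def exceeds_node_guidance(drive_count: int, drive_size_tb: float) -> bool:
    """Return True when the node's raw capacity is above the 100 TB guidance."""
    capacity = node_raw_capacity_tb(drive_count, drive_size_tb)
    return capacity > MAX_RECOMMENDED_NODE_TB


if __name__ == "__main__":
    # Hypothetical drive layouts, for illustration only.
    for count, size_tb in [(12, 8.0), (24, 16.0)]:
        capacity = node_raw_capacity_tb(count, size_tb)
        flag = "above" if exceeds_node_guidance(count, size_tb) else "within"
        print(f"{count} x {size_tb} TB = {capacity} TB raw ({flag} guidance)")
```

The check uses raw capacity per node only; it does not model replication, erasure coding, or recovery time, which depend on the cluster configuration.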
Table 2. CDP Private Cloud Data Services (Red Hat OpenShift Kubernetes)/Embedded Container Service (ECS) Cluster
Learn more
Contact your Dell Technologies or Intel account team for a customized quote at 1-877-289-3355.
Note: This document may contain language from third-party content that is not under Dell Technologies’ control and is not consistent with current guidelines for Dell Technologies’ own content. When such third-party content is updated by the relevant third parties, this document will be revised accordingly.