Extract Insights on a Scalable and Security-Enabled Data Platform from Cloudera
Mon, 29 Jan 2024 22:48:44 -0000
Summary
This joint paper outlines the key hardware considerations when configuring a data platform based on Dell’s most recent 16th Generation PowerEdge server portfolio.
Market positioning
Cloudera® Data Platform (CDP) Private Cloud is a scalable data platform that allows data to be managed across its life cycle, from ingestion to analysis, without leaving the data center. It consists of two products: Cloudera Private Cloud Base (the on-premises portion built on Dell PowerEdge™ servers) and Cloudera Private Cloud Data Services. The Data Services provide containerized compute analytic applications that scale dynamically and can be upgraded independently. This platform simplifies managing the growing volume and variety of data in your enterprise, unleashing the business value of that data. CDP Private Cloud helps enhance business agility and flexibility by disaggregating compute and storage and supporting a container-based environment. The platform also includes secure user access and data governance features.
Key Considerations
- Scalability and Performance: The CDP Platform is built on Dell’s 16th Generation PowerEdge servers with 4th Generation Intel® Xeon® processor architecture. It can accommodate growing enterprise data workloads and efficiently handle increasing demands for analytics and machine learning in a smaller footprint.
- Compatibility and Integration: Compatibility and seamless integration between CDP Private Cloud and the underlying hardware components are essential for a successful deployment in a cloud environment. Intel architecture-based Dell PowerEdge servers are well suited to running the CDP Platform on a private cloud, which helps deliver faster time to market and minimize total cost of ownership.
- Availability and Resilience: The reliability and resilience features of the 16th Generation PowerEdge servers (such as redundant power supplies, hardware monitoring, and failover capabilities) are critical for maintaining the availability of the CDP Platform.
Available Configurations
The new Dell PowerEdge HS5610 is a 1U, two-socket rack server purpose-built for Cloud Service Providers’ most popular IT applications; it also lends itself well to hybrid cloud edge deployments. This scalable server optimizes technology without the financial and operational burden of supporting extreme configurations. With tailored performance, I/O flexibility, and open ecosystem system management, you gain simplicity for large-scale, heterogeneous SaaS, PaaS, and IaaS data centers.
Benefits include:
- Faster performance by using 4th generation Intel® Xeon® Scalable processors with up to 32 cores per socket
- Accelerated in-memory applications with up to 16 DDR5 RDIMMs with speeds up to 4800 MT/s
- Designed to take up less space than traditional servers, which makes them a good option for data centers with limited space and for cloud service providers
- Designed to be cooled efficiently, which can help to prevent overheating and ensure the longevity of the servers in cloud and on-premises deployments
- Power efficient, which can help to reduce the overall operating costs of a data center
- Configurations that can easily scale to meet changing demand, which can help to optimize the cost of a data center
- Long-lived instances for space and cost reductions
- Validated workloads that reduce data center costs and overhead
- Resilient architecture for Zero Trust IT environments and operations
| Cloudera® Data Platform (CDP) Private Cloud Base Cluster |
| | Edge Node (1 Node) + Master Nodes (Minimum of Three Nodes Required) | Worker Nodes for Use with External Storage System (Minimum of Three Nodes Required) | Worker Nodes with Local All-Flash Storage (Minimum of Three Nodes Required) | Worker Nodes with Local HDDs (Minimum of Three Nodes Required) |
| Functions | Edge node: Apache Hadoop® clients, NameNode, Resource Manager, Apache ZooKeeper | DataNode, NodeManager, CDP DC (YARN) workloads |
| Platform | Dell PowerEdge HS5610 (1RU) Chassis with up to 10x 2.5" SAS/SATA/NVMe Direct Drives | Dell PowerEdge HS5610 (1RU) Chassis with up to 10x 2.5" SAS/SATA/NVMe Direct Drives | Dell PowerEdge HS5620 (2RU) Chassis with up to 16x 2.5" SAS/SATA and 8x 2.5" NVMe | Dell PowerEdge HS5620 (2RU) Chassis with up to 12x 3.5" Drives and 2x 2.5" rear storage (NVMe) |
| CPU | 2 x 4th Gen Intel® Xeon® Gold 6426Y processor | 2 x 4th Gen Intel® Xeon® Gold 6448Y processor |
| DRAM | 256 GB (16 x 16 GB DDR5-4800) | 512 GB (16 x 32 GB DDR5-4800) |
| Boot Device | Dell EMC™ Boot Optimized Server Storage (BOSS-N1) with 2 x 480 GB M.2 NVMe SSDs (RAID 1) |
| Storage Adapter | Dell PERC H755N NVMe RAID adapter | None | Dell HBA355i |
| Storage HDFS/Ozone | 2x (up to 4x) 3.84 TB SATA Read Intensive SSD 2.5in AG Drive, 1DWPD | Not required. Use an external storage system instead | 8x (up to 16x) 3.84 TB SATA Read Intensive SSD 2.5in AG Drive, 1DWPD | 12x 4 TB (or larger) 7.2K RPM NLSAS 12 Gbps 512n 3.5" hot plug HDD |
| Storage Fast Cache | 1 x 1.6 TB or 3.2 TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 | 1 x 3.2 TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 |
| Network Interface Controller | Intel Ethernet Network Adapter E810-XXVDA2 for OCP3 (dual-port 10/25 GbE) |
| Additional NIC | None | Intel Ethernet Network Adapter E810-XXV (dual-port 10/25 GbE), or | None | None |
Note: For storage-only configurations (Hadoop Distributed File System/Ozone), customers can still choose traditional high-density storage nodes with high-capacity rotational HDDs based on the HS5610 platform. However, external storage systems such as Dell PowerScale or ECS are recommended. Customers should be aware that using large-capacity HDDs increases the time needed for background scans (bit-rot detection) and block report generation for HDFS, and significantly increases recovery time after a full node failure. Cloudera also does not recommend using nodes with more than 100 TB of storage. Source: https://blog.cloudera.com/disk-and-datanode-size-in-hdfs/. For more information and specifications, contact a Dell representative.
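The 100 TB per-node guideline in the note can be checked with simple arithmetic against the local-storage worker configurations listed in the table. The following is a minimal sketch: the drive counts and sizes come from the table, while the function and variable names are illustrative only, and the calculation covers raw capacity without replication or file system overhead.

```python
# Raw HDFS/Ozone capacity per worker node, using drive counts and sizes from
# the Private Cloud Base table. All names are illustrative, not a Dell or
# Cloudera API, and only raw (unformatted, unreplicated) capacity is computed.

MAX_RECOMMENDED_TB = 100  # per-node guideline cited in the note

# (configuration label, number of drives, capacity per drive in TB)
WORKER_CONFIGS = [
    ("All-flash, base (8 x 3.84 TB SSD)", 8, 3.84),
    ("All-flash, maximum (16 x 3.84 TB SSD)", 16, 3.84),
    ("HDD (12 x 4 TB NL-SAS)", 12, 4.0),
]


def raw_capacity_tb(drive_count, drive_tb):
    """Return the raw capacity of one node in TB."""
    return drive_count * drive_tb


def within_guideline(drive_count, drive_tb, limit_tb=MAX_RECOMMENDED_TB):
    """Return True if the node's raw capacity does not exceed the guideline."""
    return raw_capacity_tb(drive_count, drive_tb) <= limit_tb


def max_drive_size_tb(drive_count, limit_tb=MAX_RECOMMENDED_TB):
    """Return the largest per-drive size (TB) that keeps a node at the limit."""
    return limit_tb / drive_count


for label, count, size in WORKER_CONFIGS:
    capacity = raw_capacity_tb(count, size)
    ok = within_guideline(count, size)
    print(f"{label}: {capacity:.2f} TB raw, within guideline: {ok}")
```

By this arithmetic, a 12-bay HDD node stays under the guideline as long as each drive is no larger than about 8.3 TB (100 TB divided by 12 drives), which is worth keeping in mind when choosing the "or larger" HDD option.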
| CDP Private Cloud Data Services (Red Hat® OpenShift® Kubernetes®)/Embedded Container Service (ECS) Cluster |
| | Container Services Administration Host | Master Nodes (Three Nodes Required) | Worker Nodes (10 Nodes or More) |
| Functions | OpenShift administration services | OpenShift services, Kubernetes services | Kubernetes operators, Cloudera® Data Platform (CDP) Private Cloud workload pods |
| Platform | Dell PowerEdge HS5610 (1RU) Chassis with up to 10x 2.5" SAS/SATA/NVMe Direct Drives |
| CPU | 2 x 4th Gen Intel® Xeon® Gold 6426Y processor | 2 x 4th Gen Intel® Xeon® Gold 6448Y processor |
| DRAM | 128 GB (16 x 8 GB DDR5-4800) | Standard configuration: 512 GB (16 x 32 GB DDR5-4800); large memory configuration: 1024 GB (16 x 64 GB DDR5-4800) |
| Boot Device | Dell EMC™ Boot Optimized Server Storage (BOSS-N1) with 2 x 480 GB M.2 NVMe SSDs (RAID 1) |
| Storage Adapter | Not required for all-NVMe configuration |
| Storage (NVMe) | 1 x 1.6 TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 | 1 x 3.2 TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 | 1 x 6.4 TB Enterprise NVMe Mixed Use AG Drive U.2 Gen4 |
| NIC | Intel Ethernet Network Adapter E810-XXVDA2 for OCP3 (dual-port 10/25 GbE) |
| Additional NIC | Intel Ethernet Network Adapter E810-XXV (dual-port 10/25 GbE) |
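For a rough sense of the aggregate resources a Data Services worker pool provides, the per-node worker figures from the table can be multiplied by the node count. The sketch below assumes the worker-node values listed above (512 GB or 1024 GB of DRAM and one 6.4 TB NVMe drive per node) and the 10-node minimum. The `WorkerSpec` class and `cluster_totals` function are hypothetical names, and the totals ignore memory and storage reserved by OpenShift or ECS system services.

```python
from dataclasses import dataclass

# Per-node worker figures taken from the Data Services table; class and
# function names are hypothetical and exist only for this illustration.


@dataclass(frozen=True)
class WorkerSpec:
    dram_gb: int     # installed DRAM per worker node
    nvme_tb: float   # NVMe capacity per worker node


STANDARD = WorkerSpec(dram_gb=512, nvme_tb=6.4)       # 16 x 32 GB DDR5-4800
LARGE_MEMORY = WorkerSpec(dram_gb=1024, nvme_tb=6.4)  # 16 x 64 GB DDR5-4800
MIN_WORKERS = 10  # "10 Nodes or More" in the table header


def cluster_totals(spec, workers=MIN_WORKERS):
    """Return raw aggregate DRAM (GB) and NVMe (TB) for the worker pool."""
    if workers < MIN_WORKERS:
        raise ValueError(f"at least {MIN_WORKERS} worker nodes are required")
    return {
        "dram_gb": spec.dram_gb * workers,
        "nvme_tb": round(spec.nvme_tb * workers, 2),
    }


print(cluster_totals(STANDARD))
print(cluster_totals(LARGE_MEMORY, workers=12))
```

These are raw totals only; actual capacity available to CDP workload pods depends on platform overhead and scheduling limits that this sketch does not model.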
Learn More
Contact your Dell account team for a customized quote at 1-877-289-3355, or go to the Intel and Cloudera solutions page.
- For workloads requiring high network bandwidth, customers might use an Intel Ethernet Network Adapter E810-CQDA2 with PCIe (dual-port 100 GbE) and 100 GbE top-of-rack (ToR) switches.
- An additional NIC is recommended for connectivity to an external storage system using a dedicated storage network.