Insights on Cloudera Data Platform on VMware Cloud Foundation Powered by VMware vSAN
Thu, 05 Oct 2023 19:34:38 -0000
Summary
This joint paper briefly discusses the key hardware considerations for a successful deployment of Cloudera Data Platform on VMware Cloud Foundation with vSAN, and recommends configurations based on 15th Generation Dell PowerEdge servers.
Market Positioning
VMware Cloud Foundation is built on VMware's leading hyperconverged architecture, VMware vSAN, with all-flash performance and enterprise-class storage services including deduplication, compression, and erasure coding. vSAN implements a hyperconverged storage architecture that delivers elastic storage and simplifies storage management.
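Erasure coding matters for sizing because each vSAN storage policy consumes a different amount of raw capacity per usable terabyte and requires a different minimum number of hosts. The sketch below is a rough calculator, not a VMware API: the policy labels and function names are illustrative, and it assumes the commonly documented vSAN Original Storage Architecture overheads (RAID-1 mirroring with FTT=1 at 2x, RAID-5 erasure coding at about 1.33x, RAID-6 at 1.5x) while ignoring deduplication and compression savings, slack space, and metadata.

```python
# Raw-capacity overhead and minimum host count per vSAN storage policy.
# Assumption: vSAN Original Storage Architecture; excludes dedup/compression
# savings, slack space, and metadata overhead.
POLICIES = {
    "RAID-1 (FTT=1)": (2.0, 3),      # two full mirrors plus a witness
    "RAID-5 (FTT=1)": (4 / 3, 4),    # erasure coding: 3 data + 1 parity
    "RAID-6 (FTT=2)": (1.5, 6),      # erasure coding: 4 data + 2 parity
}


def raw_needed_tb(usable_tb: float, policy: str) -> float:
    """Raw capacity (TB) consumed to store `usable_tb` of VM data under `policy`."""
    overhead, _min_hosts = POLICIES[policy]
    return usable_tb * overhead


def allowed_policies(hosts: int) -> list[str]:
    """Policies whose minimum host count is satisfied by a cluster of `hosts`."""
    return [name for name, (_, min_hosts) in POLICIES.items() if hosts >= min_hosts]


if __name__ == "__main__":
    for name in POLICIES:
        print(f"{name}: {raw_needed_tb(10, name):.2f} TB raw per 10 TB usable")
    print(allowed_policies(4))   # ['RAID-1 (FTT=1)', 'RAID-5 (FTT=1)']
```

On a 4-node cluster, RAID-5 is the most space-efficient of these three policies; RAID-6 needs at least 6 hosts under these assumptions.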
VMware vSAN is the market leader in hyperconverged infrastructure (HCI), enabling low-cost, high-performance, next-generation HCI solutions. It converges traditional IT infrastructure silos onto industry-standard servers, virtualizes physical infrastructure so customers can easily evolve their infrastructure without risk, improves TCO compared with traditional resource silos, and scales with support for new hardware, applications, and cloud strategies.
Cloudera Data Platform (CDP) Private Cloud Base supports a variety of hybrid solutions in which compute tasks are separated from data storage and data can be accessed from remote clusters, including workloads created using CDP Private Cloud Experiences. This hybrid approach provides a foundation for containerized applications by managing storage, table schema, authentication, authorization, and governance.
Key Considerations
- Enterprises often have at least three CDP clusters: development, preproduction staging, and production. With virtualization, these Hadoop clusters can share the same hardware. The development cluster is likely to run a newer CDP version than the others because developers prefer to work with the latest releases. Dedicating a set of hardware to one version of a Hadoop vendor's product does not make the best use of resources.
- Co-locating CDP VMs on host servers with VMs that support other workloads is also possible, particularly for workloads that are not performance-critical. Consolidating applications that use different kinds of hardware resources, or that use the same resources at different times of the day or night, balances load across the system and often improves overall utilization.
- Efficiency: VMware enables easy and efficient deployment of CDP on an existing virtual infrastructure as well as consolidation of otherwise dedicated CDP cluster hardware into a data center or cloud environment.
- Availability and fault tolerance: vSphere features such as VMware vSphere High Availability (vSphere HA) and VMware vSphere Fault Tolerance (vSphere FT) can protect the CDP components from server failure and improve availability. Live migration with VMware vSphere vMotion can keep CDP VMs available during planned server downtime and maintenance windows.
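The utilization argument above comes down to sizing a shared cluster for the peak of the combined demand rather than for the sum of each workload's peak. The Python sketch below illustrates this with hypothetical demand numbers and an assumed 56 physical cores per host (2x 28-core Xeon Gold 6348). It ignores HA failover headroom, hyperthreading, memory, and storage, so treat it as an illustration of the principle rather than a sizing tool; all names are illustrative.

```python
import math

# Hypothetical demand (physical cores) in four 6-hour windows:
# night, morning, afternoon, evening. Illustrative numbers only.
cdp_batch = [180, 40, 40, 120]     # e.g. nightly CDP ETL jobs
interactive = [20, 150, 160, 60]   # e.g. daytime business applications

CORES_PER_HOST = 56  # assumption: 2x 28-core Xeon Gold 6348 per host


def hosts_for(peak_cores: int, cores_per_host: int = CORES_PER_HOST) -> int:
    """Hosts needed to cover a given core demand (no HA headroom)."""
    return math.ceil(peak_cores / cores_per_host)


def dedicated_hosts(*profiles: list[int]) -> int:
    """Separate clusters: each one is sized for its own peak."""
    return sum(hosts_for(max(p)) for p in profiles)


def consolidated_hosts(*profiles: list[int]) -> int:
    """One shared cluster: sized for the peak of the combined demand."""
    combined = [sum(window) for window in zip(*profiles)]
    return hosts_for(max(combined))


if __name__ == "__main__":
    print("dedicated:", dedicated_hosts(cdp_batch, interactive))        # dedicated: 7
    print("consolidated:", consolidated_hosts(cdp_batch, interactive))  # consolidated: 4
```

With these made-up profiles, separate clusters need 7 hosts while a shared cluster needs 4, because the two workloads peak at different times of day.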
Available Configurations
Cloudera Data Platform on VMware Cloud Foundation (VCF) with vSAN

| Component | VCF Management Domain | VCF Workload Domain for Cloudera Data Platform Base |
|---|---|---|
| Nodes | 4 nodes required | 4 (minimum) up to 64 nodes per workload domain; up to 15 workload domains (including the management domain) |
| Platform | PowerEdge R650 supporting 10 NVMe drives (direct), or VxRail E660N | PowerEdge R650 supporting 10 NVMe drives (direct), or VxRail E660N |
| CPU | 2x Intel® Xeon® Gold 5318Y processor (2.1GHz, 24 cores) | 2x Intel® Xeon® Gold 6348 processor (2.6GHz, 28 cores) |
| DRAM | 256GB (16x 16GB DDR4-3200) or more | 512GB (16x 32GB DDR4-3200) or more |
| Boot device | Dell BOSS-S2 with 2x 240GB or 2x 480GB M.2 SATA SSD (RAID1) | Dell BOSS-S2 with 2x 240GB or 2x 480GB M.2 SATA SSD (RAID1) |
| Cache tier drives | 2x 400GB Intel Optane P5800X (PCIe Gen4) | 2x 400GB Intel Optane P5800X (PCIe Gen4) |
| Capacity tier drives (1) | 6x (up to 8x) 1.92TB Enterprise NVMe Read Intensive AG Drive U.2 Gen4 | 8x 1.92TB or 3.84TB Enterprise NVMe Read Intensive AG Drive U.2 Gen4 |
| Network interface controller | Intel E810-XXVDA2 for OCP3 (dual-port 25Gb) | Intel E810-XXVDA2 for OCP3 (dual-port 25Gb), or Intel E810-CQDA2 PCIe (dual-port 100Gb) |

(1) Note: For more than 7 workload domains, each node needs a minimum of 512GB DRAM (16x 32GB) and more capacity (use 3.84TB drives instead of 1.92TB).
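The limits in the configuration table and its note can be encoded as a quick pre-deployment sanity check. This is a hypothetical helper, not part of VCF or any Dell tool: `Layout`, `validate`, `mgmt_node_spec`, and `raw_capacity_tb` are illustrative names. It treats the management domain's "4 nodes required" as a minimum, and it assumes the note about more than 7 workload domains applies to management domain nodes, since the workload domain column already starts at 512GB DRAM.

```python
from dataclasses import dataclass, field

# Limits transcribed from the configuration table.
MGMT_NODES_MIN = 4              # "4 nodes required"; treated as a minimum (assumption)
WLD_NODES_MIN, WLD_NODES_MAX = 4, 64
MAX_DOMAINS_TOTAL = 15          # includes the management domain


@dataclass
class Layout:
    """Hypothetical description of a planned VCF deployment."""
    mgmt_nodes: int
    workload_nodes: list[int] = field(default_factory=list)  # nodes per CDP workload domain


def validate(layout: Layout) -> list[str]:
    """Return rule violations; an empty list means the layout fits the table."""
    problems = []
    if layout.mgmt_nodes < MGMT_NODES_MIN:
        problems.append(f"management domain needs at least {MGMT_NODES_MIN} nodes")
    if 1 + len(layout.workload_nodes) > MAX_DOMAINS_TOTAL:
        problems.append(f"at most {MAX_DOMAINS_TOTAL} domains, including management")
    for i, n in enumerate(layout.workload_nodes):
        if not WLD_NODES_MIN <= n <= WLD_NODES_MAX:
            problems.append(f"workload domain {i}: {n} nodes is outside {WLD_NODES_MIN}-{WLD_NODES_MAX}")
    return problems


def mgmt_node_spec(num_workload_domains: int) -> tuple[int, float]:
    """(minimum DRAM in GB, capacity drive size in TB) per management node.

    Assumption: the '>7 workload domains' note applies to management nodes.
    """
    if num_workload_domains > 7:
        return 16 * 32, 3.84    # 16x 32GB DIMMs, 3.84TB capacity drives
    return 16 * 16, 1.92        # 16x 16GB DIMMs, 1.92TB capacity drives


def raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw capacity-tier TB across a domain (cache drives excluded)."""
    return nodes * drives_per_node * drive_tb


if __name__ == "__main__":
    plan = Layout(mgmt_nodes=4, workload_nodes=[8, 16])
    print(validate(plan))                            # []
    print(validate(Layout(4, [3])))                  # one violation: too few nodes
    print(mgmt_node_spec(len(plan.workload_nodes)))  # (256, 1.92)
    print(raw_capacity_tb(8, 8, 3.84))               # 8 nodes x 8 drives x 3.84TB
```

Raw capacity here is before any storage-policy overhead, so usable capacity will be lower once mirroring or erasure coding is applied.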
This solution can be deployed on either Dell PowerEdge-based vSAN ReadyNodes or VxRail appliances.
Solution adapted from https://core.vmware.com/resource/cloudera-data-platform-vmware-cloud-foundation-powered-vmware-vsan.
For more information and specifications, contact a Dell representative. Alternative storage configurations can be considered.
Authors: Todd Mottershead (Dell), Seamus Jones (Dell), Esther Baldwin (Intel), Krzysztof Cieplucha (Intel), Teck Joo (Intel), Amandeep Raina (Intel), and Patryk Wolsza (Intel)