Dell PowerEdge R760 with 5th Generation Xeon CPU and Cloudera® Data Platform for AI-based Use Cases
Fri, 30 Aug 2024 19:29:46 -0000
Dell PowerEdge R760 with 5th Generation Xeon CPU and Cloudera® Data Platform Offer Up to 60% Performance Boost for AI-based Use Cases
Summary
Cloudera is a hybrid data platform designed for unmatched freedom to choose: any cloud, any analytics, any data. The solution consists of two products: Cloudera® Private Cloud Base (the on-premises portion, deployed on Dell® PowerEdge™ servers) and Cloudera Private Cloud Data Services. The data services provide containerized compute and analytic applications that scale dynamically and can be upgraded independently. By disaggregating compute and storage and supporting a container-based environment, the platform simplifies management of growing data volume and variety while improving business agility and flexibility. It also includes secure user access and data governance features.
Figure 1. Cloudera Data Platform solution overview
This document outlines the recommended configurations for the Cloudera Data Platform (CDP) Private Cloud Base cluster and the CDP Private Cloud Data Services cluster.
Configurations for Cloudera Data Platform
The Dell PowerEdge R760 is a scalable yet affordable rack server. It is best suited to businesses that want an affordable platform that can scale to handle enterprise-class workloads.
Table 1. CDP Private Cloud Base recommended configuration
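| Cloudera® Data Platform (CDP) Private Cloud Base Cluster | Edge Node (1 node) + 1 Management Node (minimum of three management nodes recommended) | Worker Nodes for Use with External Storage System (minimum of three nodes required) | Worker Nodes with Local All-Flash Storage (minimum of three nodes required) | Worker Nodes with Local HDDs (minimum of three nodes required) |
|---|---|---|---|---|
| Functions | Edge node: Hadoop® clients, NameNode, ResourceManager, ZooKeeper | DataNode, NodeManager, CDP DC (YARN) workloads | DataNode, NodeManager, CDP DC (YARN) workloads | DataNode, NodeManager, CDP DC (YARN) workloads |
| Platform | Dell PowerEdge R760 | Dell PowerEdge R760 | Dell PowerEdge R760 | Dell PowerEdge R760 |
| CPU | 2x Intel® Xeon® Gold 6542Y processor (16 cores at 2.8 GHz) or better | 2x Intel® Xeon® Gold 6548Y+ processor (32 cores at 2.5 GHz) | 2x Intel® Xeon® Gold 6548Y+ processor (32 cores at 2.5 GHz) | 2x Intel® Xeon® Gold 6548Y+ processor (32 cores at 2.5 GHz) |
| DRAM | 256 GB (16x 16 GB DDR5-4800) | 512 GB (16x 32 GB DDR5-5600 [5200 MT/s]) | 512 GB (16x 32 GB DDR5-5600 [5200 MT/s]) | 512 GB (16x 32 GB DDR5-5600 [5200 MT/s]) |
| Boot device | 2x 480 GB SATA SSD | 2x 480 GB SATA SSD | 2x 480 GB SATA SSD | 2x 480 GB SATA SSD |
| HDFS storage | 2x (up to 4x) 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | Not required; use an external storage system instead | 4x (up to 8x) 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 12x (up to 16x) 3.84 TB SATA SSD Read Intensive 2.5in AG drive, 1 DWPD |
| Fast cache storage (YARN) | 1x 1.6 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 1x 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 1x 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 1x 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive |
| Network interface controller | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) |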
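As a rough sizing aid, the worker-node drive counts in Table 1 can be turned into cluster-wide HDFS capacity. The sketch below is illustrative only and rests on assumptions not stated in the sizing guide: the minimum of three worker nodes, the HDFS default replication factor of 3 (`dfs.replication`), decimal terabytes, and no allowance for filesystem, OS, or YARN overhead.

```python
"""Rough HDFS capacity estimate for the Table 1 worker-node options.

Assumptions (not from the sizing guide): 3 worker nodes, HDFS replication
factor 3 (the dfs.replication default), decimal TB, and no overheads.
"""

# (drives per node, TB per drive) taken from the worker columns of Table 1.
WORKER_OPTIONS = {
    "all-flash, 4x 3.2 TB NVMe": (4, 3.2),
    "all-flash, 8x 3.2 TB NVMe": (8, 3.2),
    "local drives, 12x 3.84 TB SATA SSD": (12, 3.84),
}


def hdfs_capacity_tb(drives_per_node, drive_tb, nodes=3, replication=3):
    """Return (raw cluster TB, usable TB after HDFS replication)."""
    raw = drives_per_node * drive_tb * nodes
    return raw, raw / replication


if __name__ == "__main__":
    for name, (count, size) in WORKER_OPTIONS.items():
        raw, usable = hdfs_capacity_tb(count, size)
        print(f"{name}: raw {raw:.2f} TB, usable ~{usable:.2f} TB")
```

With three all-flash workers at the minimum 4x 3.2 TB, this gives 38.40 TB raw and about 12.80 TB usable; real planning typically also reserves headroom for temporary and intermediate data.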
Table 2. CDP Private Cloud Data Services recommended configuration
| CDP Private Cloud Data Services (Red Hat® OpenShift® Kubernetes®)/Embedded Container Service (ECS) Cluster | 1 Container Services Administration Host + 1 Bootstrap Node for OpenShift; 1 non-HA node for ECS | Management Nodes for OpenShift (three nodes required); 3 HA nodes for ECS | Worker Nodes (minimum of 10 nodes for OpenShift or minimum of four nodes for ECS) |
|---|---|---|---|
| Functions | OpenShift administration services | OpenShift services, Kubernetes services | Kubernetes operators, Cloudera® Data Platform (CDP) Private Cloud workload pods |
| CPU | 2x Intel® Xeon® Gold 6542Y processor (16 cores at 2.8 GHz) or better | 2x Intel® Xeon® Gold 6548Y+ processor (32 cores at 2.5 GHz) | 2x Intel® Xeon® Gold 6548Y+ processor (32 cores at 2.5 GHz) |
| DRAM | 128 GB (16x 8 GB DDR5-4800) | Standard configuration: 512 GB (16x 32 GB DDR5-5600 [5200 MT/s]); large memory configuration: 1024 GB (16x 64 GB DDR5-5600 [5200 MT/s]) | Standard configuration: 512 GB (16x 32 GB DDR5-5600 [5200 MT/s]); large memory configuration: 1024 GB (16x 64 GB DDR5-5600 [5200 MT/s]) |
| Boot device | 2x 480 GB SATA SSD | 2x 480 GB SATA SSD | 2x 480 GB SATA SSD |
| Storage adapter | Not required for all-NVMe configuration | Not required for all-NVMe configuration | Not required for all-NVMe configuration |
| Storage (NVMe) | 1x 1.6 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 1x 3.2 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive | 1x 6.4 TB Enterprise NVMe Mixed Usage Gen4/Gen5 drive |
| NIC | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) | Intel Ethernet Network Controller E810-C for QSFP (dual-port 100 GbE) |
Configurations tested
Dell Technologies® evaluated the potential benefits of moving from PowerEdge R650 servers to the newer PowerEdge R760 servers (Table 3). Two clusters were deployed to compare the performance of the two server models. To ensure an apples-to-apples comparison, the software stack was kept the same on both server generations. Customers can expect additional benefits from upgrading older CDP versions to newer ones that run on more recent operating systems with JDK 11 or later. For details, refer to the Cloudera® Support Matrix: https://supportmatrix.cloudera.com/.
The two clusters ran a set of end-to-end data science pipelines adapted from an industry-standard benchmark. These use cases include some of the most widely used machine learning and deep learning algorithms running on distributed Spark. Each workflow has different dataset characteristics, as shown in Table 4. Two performance metrics were measured for each workflow: the time required to train the specific models (Training Time) and the time taken to run inference with the model created during training (Serving Time).
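Both metrics are wall-clock measurements taken around the training and inference phases. The sketch below illustrates only that measurement pattern; the least-squares line fit, the synthetic data, and the helper names (`fit_line`, `predict`, `timed`) are hypothetical stand-ins, not the benchmark's Spark models or its harness.

```python
"""Sketch of the two benchmark metrics: Training Time and Serving Time.

Hypothetical stand-in: a least-squares line fit on synthetic data replaces
the benchmark's Spark models; only the timing pattern is illustrated.
"""
import random
import time


def fit_line(xs, ys):
    """'Training': ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def predict(model, xs):
    """'Serving': apply the trained model to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def timed(fn, *args):
    """Return (fn(*args), elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    rng = random.Random(42)
    train_xs = [rng.uniform(0.0, 100.0) for _ in range(100_000)]
    train_ys = [3.0 * x + 7.0 + rng.gauss(0.0, 1.0) for x in train_xs]
    serve_xs = [rng.uniform(0.0, 100.0) for _ in range(100_000)]

    model, training_time = timed(fit_line, train_xs, train_ys)
    _, serving_time = timed(predict, model, serve_xs)
    print(f"Training Time: {training_time:.3f} s")
    print(f"Serving Time:  {serving_time:.3f} s")
```

Keeping the timer outside the model code means the same `timed` wrapper can measure any training or inference callable without modifying it.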
Figure 2. Overview of Solution Architecture for Configurations Tested
Table 3. CDP Private Cloud Data Services test configuration
| Parameter | PowerEdge R650 with 3rd Generation Intel® Xeon® Processors | PowerEdge R760 with 5th Generation Intel® Xeon® Processors |
|---|---|---|
| Number of nodes | 1 management + 3 worker | 1 management + 3 worker |
| System | PowerEdge R650 | PowerEdge R760 |
| CPU | Intel® Xeon® Gold 6348 @ 2.60 GHz | Intel® Xeon® Gold 6548Y+ @ 2.50 GHz |
| CPUs per node | 2 | 2 |
| Cores per socket | 28 | 32 |
| Base frequency | 2.6 GHz | 2.5 GHz |
| All-core turbo frequency | 3.4 GHz | 3.5 GHz |
| Max turbo frequency | 3.5 GHz | 4.1 GHz |
| Total cores per node | 56 | 64 |
| Installed memory per node | 1024 GB (16x 64 GB DDR4-2933, running at 2933 MT/s) | 1024 GB (16x 64 GB DDR5, running at 5200 MT/s) |
| NIC | Intel Ethernet Controller E810-C for QSFP | Intel Ethernet Controller E810-C for QSFP |
| Storage per node | 4x Dell Ent NVMe P5600 MU 3.2 TB for HDFS; 1x 900 GB Dell BOSS VD | 4x Dell Ent NVMe P5600 MU 3.2 TB for HDFS; 2x Dell Ent NVMe CM6 MU 3.2 TB |
| Operating system | CentOS 7.9 | CentOS 7.9 |
| Workload | AI retail use cases | AI retail use cases |
| Hadoop distribution | Cloudera Data Platform Private Cloud Base 7.1.8 | Cloudera Data Platform Private Cloud Base 7.1.8 |
| Java | Cloudera OpenJDK 1.8 | Cloudera OpenJDK 1.8 |
| Other software | Spark 2.4, Python 3.7, Horovod 0.25, TensorFlow 2.9.1 | Spark 2.4, Python 3.7, Horovod 0.25, TensorFlow 2.9.1 |
| Workload dataset size (total) | 1 TB | 1 TB |
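The core counts in Table 3 can be compared per node and per cluster. This small sketch counts physical cores only (no Hyper-Threading) and assumes that only the three worker nodes run the benchmark workloads, so the management node is left out of the cluster total.

```python
"""Core counts from Table 3.

Assumption: only the three worker nodes run the benchmark workloads, so the
management node is excluded from the cluster total. Physical cores only.
"""
WORKER_NODES = 3
CONFIGS = {
    "R650 (Xeon Gold 6348)": {"sockets": 2, "cores_per_socket": 28},
    "R760 (Xeon Gold 6548Y+)": {"sockets": 2, "cores_per_socket": 32},
}


def cores_per_node(cfg):
    """Physical cores per node = sockets x cores per socket."""
    return cfg["sockets"] * cfg["cores_per_socket"]


def worker_cores(cfg, nodes=WORKER_NODES):
    """Physical cores summed across all worker nodes."""
    return cores_per_node(cfg) * nodes


if __name__ == "__main__":
    old = CONFIGS["R650 (Xeon Gold 6348)"]
    new = CONFIGS["R760 (Xeon Gold 6548Y+)"]
    for name, cfg in CONFIGS.items():
        print(f"{name}: {cores_per_node(cfg)} cores/node, {worker_cores(cfg)} worker cores")
    print(f"Core-count increase: {worker_cores(new) / worker_cores(old) - 1:.1%}")
```

This prints 56 cores/node (168 worker cores) for the R650, 64 cores/node (192 worker cores) for the R760, and a core-count increase of 14.3%.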
Table 4. AI use case description
| Use Case | Description | Class | Data | Algorithm |
|---|---|---|---|---|
| Customer Segmentation | Find customer segments based on behavior: cluster customers by return behavior (return frequency, return/order ratio, …) and buying behavior (purchase frequency, purchase recency, …). | Clustering | Number | K-means |
| Call Transcription | Accurately transcribe customers' audio conversations to text. | Classification | Audio | RNN |
| Sales Forecasting | Forecast weekly sales for each department of each store in a multi-store retail chain, based on a limited sales history. | Regression | Number | Holt-Winters |
| Spam Detection | Identify comments, reviews, or item descriptions in a retail business that contain spam. | Classification | Text | Naïve Bayes |
| Hardware Failure | Predict imminent hardware failure from existing logs of hardware events. | Classification | Number | Support Vector Machine |
| Product Recommendation | Improve cross-selling with "next-product-to-buy" recommendations: based on previously bought products, recommend products the customer might also be interested in, found by comparing customers (by their products) and/or products (by their customers). | Recommendation | Number | Collaborative Filtering |
| Trip Classification | Classify trip categories and types using data from existing customer shopping trips. | Classification | Number | Gradient Boosted Trees |
| Facial Recognition | Accurately recognize customer facial images. | Classification | Image | CNN + Logistic Regression |
| Fraud Detection | Detect whether a given financial transaction is fraudulent. | Classification | Number | Logistic Regression |
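To make the Customer Segmentation row concrete, the following is a minimal single-machine sketch of K-means using Lloyd's algorithm with a fixed iteration count. It is not the benchmark's implementation, which runs on distributed Spark; the three features per customer and the sample data are hypothetical.

```python
"""Single-machine K-means (Lloyd's algorithm) sketch for customer segmentation.

Hypothetical features per customer: (return frequency, return/order ratio,
purchases per month). This is NOT the benchmark's Spark implementation.
"""
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster `points` (equal-length tuples) into k groups.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    # Initialize centroids with k distinct customers chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each customer joins its nearest centroid (Euclidean).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


if __name__ == "__main__":
    customers = [
        (0.1, 0.02, 12.0),  # rarely returns, buys often
        (0.2, 0.05, 10.0),
        (3.0, 0.60, 2.0),   # returns often, buys rarely
        (2.5, 0.55, 3.0),
    ]
    centroids, labels = kmeans(customers, k=2)
    print(labels)
```

Here the first two customers end up in one segment and the last two in the other. In practice the features are usually standardized first, because K-means relies on Euclidean distance and the largest-scale feature (here, purchases per month) would otherwise dominate the clustering.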
Figures 3 and 4 show that the Dell PowerEdge platform with 5th Gen Intel® Xeon® Scalable processors delivers up to a 60% performance boost compared with the Dell PowerEdge server with 3rd Gen Intel® Xeon® Scalable processors. Gains vary by use case according to each implementation, its intrinsic runtime characteristics, and its compute requirements, which stress different system components at different stages of the data processing pipeline. Use cases such as Customer Segmentation (K-means, from the classical machine learning category) and Call Transcription (from the deep learning category) were able to take full advantage of the newer processor features and the higher-bandwidth, higher-performance memory modules.
Figure 3. AI use case training performance on Dell® PowerEdge R760
Figure 4. AI use case serving performance on Dell® PowerEdge R760
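The memory-bandwidth difference mentioned in the discussion of Figures 3 and 4 can be estimated from Table 3. This worked calculation gives theoretical peak bandwidth only, under stated assumptions: 16 DIMMs per node with one DIMM per memory channel (both processor generations provide eight memory channels per socket), 8 bytes of data per transfer per DIMM (ECC bits excluded), and the operating speeds listed in Table 3.

```python
"""Theoretical peak DRAM bandwidth per node for the two Table 3 systems.

Assumptions: one DIMM per memory channel (16 DIMMs = 2 sockets x 8 channels),
a 64-bit (8-byte) data path per DIMM, and the Table 3 operating speeds.
Measured bandwidth is always lower than this theoretical peak.
"""
BYTES_PER_TRANSFER = 8  # 64-bit data path per DIMM, ECC bits excluded


def peak_bandwidth_gbs(dimms, mega_transfers_per_s):
    """Peak bandwidth in decimal GB/s = DIMMs x MT/s x bytes per transfer / 1000."""
    return dimms * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


if __name__ == "__main__":
    r650 = peak_bandwidth_gbs(16, 2933)  # DDR4 at 2933 MT/s
    r760 = peak_bandwidth_gbs(16, 5200)  # DDR5 at 5200 MT/s
    print(f"R650: {r650:.1f} GB/s, R760: {r760:.1f} GB/s, ratio {r760 / r650:.2f}x")
```

This prints 375.4 GB/s for the R650 and 665.6 GB/s for the R760, a ratio of about 1.77x. Sustained bandwidth measured with a tool such as STREAM will be lower on both systems.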
Conclusion
When deployed on the Dell® PowerEdge R760, the Cloudera® Data Platform enables faster, more efficient, and more scalable machine learning workloads. Compared with the PowerEdge R650 with 3rd Gen Intel® Xeon® Scalable processors, training on a 1-terabyte dataset with some of the most popular AI algorithms showed a performance gain of up to 55%, and inference on a 1-terabyte dataset deployed on a distributed Cloudera® stack showed a gain of up to 60%.
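Percentage gains can be computed from runtimes in more than one way, and this document does not state which convention its figures use. The sketch below shows two common conventions side by side; the function names and the runtimes are hypothetical.

```python
"""Two common ways to express a runtime improvement as a percentage.

Assumption: the document does not define its "performance gain" formula,
so both conventions are shown. The runtimes below are hypothetical.
"""


def speedup_gain(t_old, t_new):
    """Throughput-style gain: extra work per unit time (t_old / t_new - 1)."""
    return t_old / t_new - 1


def time_reduction(t_old, t_new):
    """Fraction of the original runtime saved (1 - t_new / t_old)."""
    return 1 - t_new / t_old


if __name__ == "__main__":
    t_r650, t_r760 = 160.0, 100.0  # hypothetical training times in seconds
    print(f"gain: {speedup_gain(t_r650, t_r760):.0%}")          # gain: 60%
    print(f"time saved: {time_reduction(t_r650, t_r760):.1%}")  # time saved: 37.5%
```

With the same hypothetical runtimes, a 60% gain in throughput terms corresponds to 37.5% less wall-clock time, so the convention matters when comparing numbers across reports.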
Running Cloudera® Data Platform on the latest Dell® PowerEdge servers powered by 5th Gen Intel® Xeon® Scalable processors can increase data center performance and accommodate growing enterprise machine learning workloads, delivering faster time-to-market while reducing total cost of ownership. Separating storage from compute, a key advantage of the Cloudera® Data Platform, allows each to be scaled independently to match user needs, data growth, or usage models.
You can expect additional improvements when modernizing your data management ecosystem. The Cloudera platform has also improved over time and offers the following benefits through CDP Private Cloud Data Services:
- Simplified multitenancy and isolation: The containerized deployment of applications in CDP Private Cloud ensures that each application is isolated and can run independently of others on the same Kubernetes infrastructure, eliminating resource contention.
- Simplified deployment of applications: CDP Private Cloud enables faster application deployment with a shared Data Lake, compared with monolithic clusters, which require separate copies of security and governance data for each application.
- Better utilization of infrastructure: As with CDP Public Cloud, CDP Private Cloud lets you provision resources in real time when deploying applications. In addition, the ability to scale or suspend applications as needed ensures that your on-premises infrastructure is used optimally.
The modern Cloudera Data Platform allows you to turn any data into fuel for a digital transformation engine through modern data architectures for petabyte-scale data meshes, data fabrics, and the open data lakehouse powered by Apache Iceberg.
References
- CDP Private Cloud Data Services documentation, CDP Private Cloud (cloudera.com)
- Requirements, CDP Private Cloud (cloudera.com)
- Cloudera Support Matrix (https://supportmatrix.cloudera.com/)
Authors: Kacper Ufa (Intel), Amandeep Raina (Intel), Rodrigo D Escobar (Intel), Vijay Bandari (Intel), Esther Baldwin (Intel), Manya Rastogi (Dell), Tarun Dave (Cloudera)