Dell PowerEdge XR8000’s first MLPerf® Inference v4.0 Results – Power-efficient Edge AI Applications
Mon, 08 Jul 2024 21:23:03 -0000
Artificial intelligence is swiftly becoming an essential component of the telecommunications industry. As a result, it is increasingly important to demonstrate that telecom servers can excel at AI inference tasks, delivering not just high performance but also efficient power usage. The latest MLPerf® Inference v4.0 benchmarks have become a crucial standard for evaluating these capabilities. Among the submissions, the Dell PowerEdge XR8620t server stands out as the first entry from the XR8000 series, Dell’s edge platform for telecom. This marks a significant milestone: the platform demonstrates both strong AI inference performance and efficient power consumption.
In this blog, we delve into the results achieved by the Dell PowerEdge XR8620t server, breaking down its performance and power efficiency metrics. Through detailed analysis, we aim to illuminate how these achievements contribute to the broader goal of integrating AI seamlessly into telecommunications infrastructure, setting new benchmarks for the industry.
Dell PowerEdge XR8620t server
The PowerEdge XR8000 is Dell’s latest rugged Edge/Telco server offering, built around the XR8000r chassis. The chassis can be populated with XR8610t and XR8620t sleds, which are designed to run complex workloads using highly scalable memory, I/O, and network options.
The PowerEdge XR8620t system is a half-width 2U compute sled that supports:
● One 4th Generation Intel Xeon Scalable processor with up to 32 cores
● Eight DDR5 DIMM slots
● 2 x M.2 2280 or 22110 direct connect NVMe drives on dual M.2 NVMe direct riser module (non-RAID) or 2 x M.2 2280 BOSS-N1 with RAID 0/1
● 2 x M.2 2280 or 22110 on ROR-N1 (RAISER) with RAID 0/1
● Ambient operating temperature of -5°C to +55°C (with some configurations supporting -20°C to +65°C)
Figure 1: Dell PowerEdge XR8620t
Figure 2: Dell PowerEdge XR8000 shown with both configurations XR8620t and XR8610t
MLPerf® v4.0 Test Setup
MLCommons™ is an open engineering organization which focuses on machine learning innovation and collaboration. It oversees MLPerf® which is an industry-standard benchmark suite that evaluates ML performance across diverse hardware and software. Dell PowerEdge XR8620t made its debut submission to the latest MLPerf® Inference v4.0, for both performance and power evaluations. This milestone shows Dell’s commitment to transparency and readiness in handling ML workloads. Below are key features of the NVIDIA L4 GPUs utilized in the test, followed by the detailed configuration of the XR8620t system under test.
NVIDIA L4 Tensor Core GPU
The NVIDIA L4 Tensor Core GPU is a versatile accelerator suited to machine learning inferencing, graphics, and video processing tasks both in the cloud and at the edge. It features a half-height (low profile), half-length, single-slot card form factor and is equipped with 24 GB of GDDR6 memory. The GPU connects via x16 PCIe Gen4 and operates at a maximum power of 72 W. The card is passively cooled and relies on system airflow. For this submission, the system under test used a single L4 card.
Figure 3: L4 Technical Specifications
Figure 4: Installation of the GPU PCIe card in Dell PowerEdge XR8620t
Dell PowerEdge XR8620t Tested Configuration
Table 1: System under Test Configuration
Server Model | Dell PowerEdge XR8620t (1x L4, TensorRT) |
CPU | Intel(R) Xeon(R) Gold 6433N |
Memory | 128GB |
Storage | 2x 1.92 TB |
GPU | 1x NVIDIA L4 |
OS | Ubuntu 22.04.3 |
Software Stack | TensorRT 9.3.0, CUDA 12.2, cuDNN 8.8.0, Driver 535.154.05, DALI 1.28.0 |
MLPerf® Public ID | 4.0-0036 (Power submission) 4.0-0037 (Performance submission) |
For PowerEdge XR8620t configuration details, see XR8620_L4x1_TRT.json and XR8620_L4x1_MaxQ_TRT_MaxQ.json.
Beyond Telco, Ready for Edge Inferencing
Performance Highlights:
The Dell PowerEdge XR8620t performance results submitted to MLPerf® Inference v4.0 with an NVIDIA L4 GPU showcase its strong potential for edge AI applications. Evaluated across a diverse set of standardized benchmarks, the server handled a range of machine learning workloads, covering essential tasks such as image classification (ResNet), object detection (RetinaNet), speech recognition (RNN-T), and image segmentation (3D-UNet). Together, these results highlight the XR8620t’s versatility and its performance in the image and video market segments. The inclusion of text-to-image (Stable Diffusion XL) performance data further shows the server’s capability to handle emergent AI applications, expanding possibilities for edge deployments.
The table below presents the performance results of the XR8620t showing the system throughput and latency under real-world scenarios: Offline, SingleStream, and MultiStream.
Table 2: Dell PowerEdge XR8620t MLPerf® Inference v4.0 performance results
Scenario | Metric | 3d-unet-99 | 3d-unet-99.9 | resnet | retinanet | rnnt | stable-diffusion-xl |
SingleStream | Latency (ms) | 1,842.59 | 1,842.59 | 0.34 | 4.90 | 19.34 | 5,299.91 |
MultiStream | Latency (ms) | N/A | N/A | 0.83 | 42.16 | N/A | N/A |
Offline | Samples/s | 1.06 | 1.06 | 12,097.30 | 220.50 | 3,875.63 | 0.19 |
Note: Table provided for reference only. The XR8620t results are available at MLCommons Inference v4.0 Edge Results with Public ID 4.0-0037
Verified MLPerf® v4.0 Inference Closed Submissions. Results verified by MLCommons™ Association.
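One way to read Table 2 is to compare each Offline throughput with what the SingleStream latency alone would suggest. The sketch below is our own back-of-the-envelope illustration, not part of the MLPerf® methodology: it assumes queries run strictly one after another and treats the reported SingleStream latency as a typical per-query time, even though MLPerf® reports a tail-latency figure, so the estimate is rough. The function names are hypothetical; the numbers are copied from Table 2.

```python
# Values copied from Table 2 (XR8620t, MLPerf Inference v4.0, Public ID 4.0-0037).
TABLE_2 = {
    # benchmark: (SingleStream latency in ms, Offline samples/s)
    "3d-unet-99": (1842.59, 1.06),
    "resnet": (0.34, 12097.30),
    "retinanet": (4.90, 220.50),
    "rnnt": (19.34, 3875.63),
    "stable-diffusion-xl": (5299.91, 0.19),
}


def sequential_rate(latency_ms: float) -> float:
    """Approximate samples/s if queries ran strictly one at a time (assumption)."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def offline_gain(latency_ms: float, offline_samples_per_s: float) -> float:
    """Ratio of reported Offline throughput to the one-at-a-time estimate."""
    return offline_samples_per_s / sequential_rate(latency_ms)


for name, (latency_ms, offline) in TABLE_2.items():
    print(
        f"{name:20s} sequential ~{sequential_rate(latency_ms):9.2f}/s  "
        f"offline {offline:9.2f}/s  gain {offline_gain(latency_ms, offline):6.2f}x"
    )
```

On these figures, the Offline throughput for resnet is roughly 4.1 times the one-at-a-time estimate, while for retinanet it is only about 1.08 times, which suggests how much each workload gains from batching in the Offline scenario.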
The normalized chart in Figure 5 compares the single-socket XR8620t with the dual-socket Dell PowerEdge XR7620, a server purpose-built for edge computing. Both systems were submitted to MLPerf® Inference v4.0 using a single NVIDIA L4 GPU. Despite its smaller, more compact design, the XR8620t achieves performance within -16% to +8% of the XR7620. This makes the XR8620t a viable and compelling alternative for edge computing applications, with minimal performance trade-offs and power-saving advantages.
Figure 5: Normalized performance comparison between Dell PowerEdge XR8620t with 1x NVIDIA L4 GPU and Dell PowerEdge XR7620 with 1x NVIDIA L4 GPU
Note: The results are available at MLCommons Inference v4.0 Edge Results with Public IDs: XR8620t (ID 4.0-0037) and XR7620 (ID 4.0-0035)
Verified MLPerf® v4.0 Inference Closed Submissions. Results verified by MLCommons™ Association
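A normalized comparison such as Figure 5 can be sketched as follows. This is a minimal illustration under our own assumptions: the blog does not state how latency metrics were normalized, so we assume here that latency is inverted (baseline divided by value) so that a score above 1.0 always means "better than baseline". The function names are hypothetical, and the sample numbers are placeholders, not MLPerf® results.

```python
def normalized_score(value: float, baseline: float, higher_is_better: bool = True) -> float:
    """Return a score relative to the baseline; above 1.0 means better than baseline.

    Throughput metrics (e.g. samples/s) use value / baseline.
    Latency metrics (e.g. ms) are inverted: baseline / value (our assumption).
    """
    if value <= 0 or baseline <= 0:
        raise ValueError("metrics must be positive")
    return value / baseline if higher_is_better else baseline / value


def percent_delta(score: float) -> float:
    """Convert a normalized score into a signed percentage difference."""
    return (score - 1.0) * 100.0


# Placeholder numbers for illustration only; these are NOT MLPerf results.
throughput_score = normalized_score(92.0, 100.0, higher_is_better=True)
latency_score = normalized_score(4.0, 5.0, higher_is_better=False)

print(f"throughput: {percent_delta(throughput_score):+.1f}%")
print(f"latency:    {percent_delta(latency_score):+.1f}%")
```

Expressing every benchmark as a signed percentage against one baseline is what allows a range such as -16% to +8% to be read directly across metrics with different units.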
Power Efficiency at the Edge:
The raw power results of the Dell PowerEdge XR8620t with an NVIDIA L4 GPU are highlighted in the table below. The data demonstrates the server’s ability to handle different machine learning inference workloads efficiently. This capability is crucial for edge environments where efficiency directly impacts feasibility of the task and cost.
Table 3: Dell PowerEdge XR8620t MLPerf® Inference v4.0 power efficiency results
Scenario | Metric | resnet | retinanet | 3d-unet-99 | 3d-unet-99.9 | rnnt |
SingleStream | Efficiency (perf/watt) | 0.0038 | 0.0037 | 0.0072 | 0.0072 | 0.005 |
MultiStream | Efficiency (perf/watt) | 0.0037 | 0.0038 | N/A | N/A | N/A |
Offline | Efficiency (perf/watt) | 45.703 | 0.7173 | 0.0039 | 0.0039 | 13.361 |
Note: Table provided for reference only. The results are available at MLCommons Inference v4.0 Edge Results under the closed power division with Public IDs: XR8620t (ID 4.0-0036)
Verified MLPerf® v4.0 Inference Closed Submissions. Results verified by MLCommons™ Association
The comparison chart below shows the efficiency of the Dell PowerEdge XR8620t relative to the Dell PowerEdge XR5610, with both systems having the same GPU count and type. In the chart, the blue line represents the XR8620t’s efficiency, calculated as performance divided by power consumption and then normalized. The efficiency of the XR5610, shown as a gray line, provides the baseline for comparison. The XR8620t is up to 26% more efficient than the XR5610 across the different benchmarks.
By achieving higher performance per watt, the Dell PowerEdge XR8620t supports more sustainable edge AI solutions and helps keep operations reliable and cost-effective. This advantage translates to lower operational costs, making the XR8620t well suited to edge AI applications.
Figure 6: Normalized power efficiency comparison between Dell PowerEdge XR8620t with 1x NVIDIA L4 GPU (MLPerf® Inference v4.0) and Dell PowerEdge XR5610 with 1x NVIDIA L4 GPU (MLPerf® Inference v3.1)
Note: The XR8620t results are available at MLCommons Inference v4.0 Edge Results with Public ID 4.0-0036, and the XR5610 results are available at MLCommons Inference v3.1 Edge Results with Public ID 3.1-0073, both under the closed power division.
Verified MLPerf® v4.0 Inference Closed Submissions. Results verified by MLCommons™ Association
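The efficiency calculation described for Figure 6 (performance divided by power consumption, then normalized against a baseline) can be sketched as below. This is an illustrative sketch, not the script used to produce the chart: the function names are hypothetical, we assume power is an average in watts over the run, and the sample inputs are placeholders rather than measured results.

```python
def perf_per_watt(performance: float, avg_power_w: float) -> float:
    """Efficiency as performance (e.g. samples/s) divided by average power in watts."""
    if avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return performance / avg_power_w


def relative_efficiency(sut_eff: float, baseline_eff: float) -> float:
    """Normalize a system's efficiency against the baseline (baseline = 1.0)."""
    if baseline_eff <= 0:
        raise ValueError("baseline efficiency must be positive")
    return sut_eff / baseline_eff


# Placeholder inputs for illustration only; these are NOT MLPerf results.
sut = perf_per_watt(performance=1200.0, avg_power_w=240.0)       # 5.0 samples/s per watt
baseline = perf_per_watt(performance=1000.0, avg_power_w=250.0)  # 4.0 samples/s per watt

ratio = relative_efficiency(sut, baseline)
print(f"normalized efficiency: {ratio:.2f} ({(ratio - 1.0) * 100.0:+.0f}% vs baseline)")
```

Because the baseline is fixed at 1.0, a value such as 1.25 in this sketch reads directly as a 25% efficiency advantage, which is the same reading applied to the lines in Figure 6.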
Real-World Use Cases:
The integration of edge computing in the telecommunications sector opens new avenues for optimizing network performance and delivering innovative services. One interesting use case is network optimization, where edge computing enables real-time analysis of network traffic patterns. Improving network efficiency reduces connectivity latency and therefore increases quality of service for end users. Beyond Telco applications, the Dell PowerEdge XR8620t’s versatility extends to Edge AI. For example, the XR8620t can be used in a retail environment, where its Edge AI capabilities can be leveraged for customer behavior analytics, inventory management, and marketing through intelligent image and video-based workloads.
Similarly, in manufacturing, the XR8620t can run AI workloads for predictive maintenance of equipment, anomaly detection in production processes, and optimization of supply chain operations. By harnessing the power of Edge AI on the Telco Dell PowerEdge server, industries can unlock new levels of efficiency and innovation.
Example of a Dell rugged server used for Edge AI – OpenBrew, OpenBrew pdf
Conclusion
The Dell PowerEdge XR8620t’s strong performance in MLPerf® Inference v4.0, along with its power efficiency, makes it a good choice for edge AI applications. Its rugged form factor, robust configuration support, and effective use of GPU capabilities ensure that the XR8620t can handle workloads efficiently, paving the way for future innovation in telecommunications and edge computing.
References
MLPerf® name and logo are trademarks of MLCommons™ Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.