MLPerf™ Inference v4.0 Performance on Dell PowerEdge R760 with NVIDIA L40S GPUs
Fri, 17 May 2024
Summary
Artificial intelligence is rapidly transforming a wide range of industries, with new applications emerging every day. As this technology becomes more pervasive, the right infrastructure is necessary to support its growth.
This Direct from Development (DfD) tech note describes the new capabilities you can expect from the PowerEdge R760 coupled with NVIDIA L40S GPUs. It covers the product features, the MLPerf benchmark, and the test configuration and results, to help enterprises looking to invest in this mainstream rack server determine which artificial intelligence use cases it suits best.
Market positioning
Organizations in multiple industries are adopting server accelerators to outpace the competition: honing product and service offerings with data-gleaned insights, enhancing productivity with better application performance, optimizing operations with fast and powerful analytics, and shortening time to market by doing it all faster than ever before. Dell Technologies offers a choice of server accelerators in Dell PowerEdge servers, so you can turbo-charge your applications.
PowerEdge R760 Rack Server
The Dell PowerEdge R760 is Dell's latest two-socket rack server, designed to run complex workloads using highly scalable memory, I/O, and network options. Gain the performance you need with this full-featured enterprise server, built to optimize even the most demanding workloads, such as artificial intelligence and machine learning.
It is powered by up to two 4th Gen Intel® Xeon® Scalable or Intel® Xeon® Max processors with up to 56 cores, or up to two 5th Gen Intel® Xeon® Scalable processors with up to 64 cores.
Figure 1. Dell PowerEdge R760 server
These R760 servers can support up to two double-wide 350 W accelerators or six single-wide 75 W accelerators. For this testing with MLPerf v4.0, we used NVIDIA's latest L40S GPUs, which are built to power the next generation of data center workloads: from generative AI and large language model (LLM) inference and training to 3D graphics, rendering, and video.
Figure 2. Inside the system with full length risers and GPU
NVIDIA L40S: Ada Lovelace GPU architecture
NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units: an Opacity Micromap Engine and a Displaced Micro-Mesh Engine. The architecture supports Shader Execution Reordering (SER), which dynamically organizes and reorders shading workloads to improve ray-tracing shading efficiency. It also provides an Optical Flow Accelerator and AI frame generation that boost DLSS 3 frame rates up to 2x over the previous DLSS 2.0.
Figure 3. NVIDIA L40S GPU
Table 1. L40S GPU Details

| Specification | Value |
|---|---|
| Model | NVIDIA L40S |
| Form factor | PCIe Gen4 |
| GPU architecture | Ada Lovelace |
| CUDA cores | 18,176 |
| Memory size | 48 GB |
| Memory type | GDDR6 |
| Base clock | 1,110 MHz |
| Boost clock | 2,520 MHz |
| Memory clock | 2,250 MHz |
| MIG support | No |
| Peak memory bandwidth | 864 GB/s |
| Total board power | 350 W |
NVIDIA L40S specifications
- Fourth-Generation Tensor Cores: Deliver up to 4X higher inference performance over the previous generation (FP8).
- Advanced Video and Vision AI Acceleration: Can host up to 3X more video streams concurrently than the previous generation.
- Third-Generation RT Cores: Deliver up to 2X the ray-tracing performance over the previous generation.
MLPerf Benchmark
The primary output of MLCommons is its jointly developed, open-source benchmarking suite, MLPerf™. MLPerf provides benchmarking suites that cover both the training and inference aspects of ML. (For more about those topics, see the section Appendix - MLPerf workloads and scenarios.) The MLPerf benchmarking suites offer multiple processing scenarios across workloads such as image classification, object detection, speech-to-text, and natural language processing. The MLPerf benchmarking tool is free to use for vendors and end users, and for members and non-members alike. MLCommons also hosts a repository where vendors (primarily) can post reviewed results that have been submitted for formal review by MLCommons; these are available for reference by the general public. For more information, see MLPerf Inference: Datacenter Benchmark Suite Results.
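To make the benchmark flow concrete, the sketch below shows a minimal Offline-scenario harness built on MLPerf's LoadGen Python bindings (the mlperf_loadgen module). The inference step and the sample count are placeholders; a real submission wires LoadGen to an optimized backend such as TensorRT.

```python
import mlperf_loadgen as lg

SAMPLE_COUNT = 1024  # placeholder dataset size for illustration


def issue_queries(query_samples):
    # LoadGen hands us a batch of QuerySample objects; each has an
    # opaque .id (echoed back in the response) and a dataset .index.
    responses = []
    for qs in query_samples:
        # run_inference(qs.index) would go here (omitted placeholder)
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)


def flush_queries():
    pass  # nothing is buffered in this minimal sketch


settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline  # maximum-throughput scenario
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(SAMPLE_COUNT, SAMPLE_COUNT,
                      lambda idxs: None,   # load samples to memory (stubbed)
                      lambda idxs: None)   # unload samples (stubbed)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```

LoadGen itself measures throughput and latency and writes the official summary logs, so the harness only needs to answer queries.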
Test Configuration
For our testing, we used the following hardware and software configurations:
Table 2. Dell PowerEdge Server - hardware configuration

| Component | Configuration |
|---|---|
| System Name | PowerEdge R760 |
| Status | Available |
| System Type | Data Center |
| Number of Nodes | 1 |
| Host Processor Model | Intel Xeon Platinum 8580 |
| Host Processors per Node | 2 |
| Host Memory Capacity | 16 x 96 GB DDR5, 5600 MT/s |
| Host Storage Capacity | 6 TB NVMe |
| Accelerator Model Name | NVIDIA L40S |
| Accelerators per Node | 2 |
| Accelerator Memory Configuration | 48 GB GDDR6 |
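As a sanity check before benchmarking, the accelerator side of the configuration in Table 2 can be confirmed programmatically through NVML. The short sketch below uses the pynvml bindings (installable as the nvidia-ml-py package); it is a generic illustration, not part of the MLPerf harness.

```python
import pynvml  # NVML Python bindings (pip install nvidia-ml-py)

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)      # e.g. "NVIDIA L40S"
        if isinstance(name, bytes):                  # older bindings return bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)            # bytes
        power = pynvml.nvmlDeviceGetPowerManagementLimit(handle)  # milliwatts
        print(f"GPU {i}: {name}, {mem.total / 1e9:.0f} GB, "
              f"{power / 1000:.0f} W board power limit")
finally:
    pynvml.nvmlShutdown()
```

On the system under test, this should report two 48 GB L40S devices with a 350 W power limit, matching Table 2.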
Table 3. Dell PowerEdge Server - software configuration

| Component | Configuration |
|---|---|
| OS | Ubuntu 20.04.6 |
| Software Stack | TensorRT 9.3.0, CUDA 12.3, cuDNN 8.9.6, Driver 545.23.08, DALI 1.28.0 |
| Framework | TensorRT 9.3.0, CUDA 12.3 |
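The NVIDIA MLPerf harness compiles each model into an optimized TensorRT engine before measurement. As a simplified illustration of that step (not the actual submission code), the sketch below builds an FP16 engine from an ONNX model with the TensorRT Python API; the model.onnx and model.plan file names are placeholders.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:       # placeholder model file
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)     # use the L40S Tensor Cores in FP16

engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:       # serialized engine for reuse
    f.write(engine_bytes)
```

At run time, a harness deserializes the engine once and reuses it for every query, keeping per-query overhead low.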
Results
MLPerf v4.0 benchmark results are based on the Dell PowerEdge R760 server with two NVIDIA L40S GPUs and optimized software stacks. In this section, we show the performance observed in various scenarios.
With increasing demand on healthcare facilities, providers are turning toward artificial intelligence for easier and faster data management. Higher throughput on medical imaging data, such as the 3D-UNet image segmentation workload shown below, makes scalable and affordable deployments possible.
Figure 4. Medical image segmentation model
Rack servers continue to power applications such as web hosting, where AI-driven natural language processing algorithms can analyze user queries and provide real-time responses.
Figure 5. Natural Language Processing model
For compute-intensive tasks, AI algorithms and deep learning models accelerate both inference and training. Object detection and image recognition are increasingly used for video surveillance in retail and for worker safety applications in manufacturing.
Figure 6. Object detection model
Speech-to-text chatbots are gaining popularity, along with voice assistants supporting multiple languages. The R760 offers a great opportunity to support those use cases.
Figure 7. Speech-to-text model
Note: All testing was conducted in the Solutions and Performance Analysis Lab at Dell Technologies in February 2024.
Conclusion
The R760 supports various deep learning inference scenarios in the MLPerf benchmark, as well as other complex workloads such as databases and advanced analytics. It is an ideal solution for data center modernization, driving operational efficiency, raising productivity, and minimizing total cost of ownership (TCO).
Its high performance and versatility are demonstrated across natural language processing, image classification, object detection, medical imaging, and speech-to-text inference scenarios. As AI advances in every segment, Dell PowerEdge servers can help you choose the right configuration for your performance requirements.
Appendix - MLPerf workloads and scenarios
The MLPerf Inference datacenter suite measures each workload under two scenarios: Offline, where all samples are available at once and the metric is maximum throughput (samples per second), and Server, where queries arrive following a Poisson distribution and the metric is the highest arrival rate that still meets a per-workload latency bound. The v4.0 suite spans workloads including image classification (ResNet-50), object detection (Retinanet), medical image segmentation (3D-UNet), natural language processing (BERT), speech-to-text (RNN-T), recommendation (DLRMv2), text-to-image (Stable Diffusion XL), and large language models (GPT-J and Llama 2 70B).
References
- NVIDIA Ada Lovelace Professional GPU Architecture
- NVIDIA L40S - Unparalleled AI and graphics performance for the data center
- Dell Technologies - Acceleration-Optimized servers and accelerator portfolio