Overview

Dell Technologies has submitted results to the MLPerf Training benchmarking suite for the fifth round. This blog provides an overview of our submissions for the latest version, v1.1. Submission results indicate that different Dell EMC servers (Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 servers) offer promising performance for deep learning workloads. These workloads span problem types including image classification, medical image segmentation, lightweight object detection, heavyweight object detection, speech recognition, natural language processing, recommendation, and reinforcement learning.

The previous blog about MLPerf v1.0 contains an introduction to MLCommons™ and the benchmarks in the MLPerf Training benchmarking suite. We recommend reading that blog for an overview of the benchmarks. All the benchmarks and rules remain the same as for v1.0.

The following graph, which uses a logarithmically scaled y-axis, shows the time to converge for each server and benchmark:

Fig 1: All Dell Technologies submission results for MLPerf Training v1.1

Figure 1 shows the 51 results that Dell Technologies submitted in this round. These results cover the Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 servers with various NVIDIA A100 accelerator configurations: PCIe and SXM4 form factors, 40 GB and 80 GB memory variants, and 300 W, 400 W, and 500 W TDP variants.

Note: For the hardware and software specifications of the systems in the graph, see https://github.com/mlcommons/training_results_v1.1/tree/master/Dell/systems.

The submitted benchmarks span image classification, medical image segmentation, lightweight object detection, heavyweight object detection, speech recognition, natural language processing, recommendation, and reinforcement learning. In all these areas, the Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 servers deliver outstanding performance.

Highlights

Full coverage

Dell Technologies not only submitted the most results but also provided comprehensive results from a single system. The PowerEdge XE8545 x 4 A100-SXM-80GB server results cover the full spectrum of benchmarked models in the MLPerf Training v1.1 suite: BERT, DLRM, Mask R-CNN, Minigo, ResNet, SSD, RNN-T, and 3D U-Net.

Multinode results

The multinode results scale linearly or nearly linearly with the number of servers. This scaling means that adding Dell EMC servers to a multinode environment shortens training time almost proportionally, which leads to faster time to value. Furthermore, among the submitters with NVIDIA accelerator-based results, we are one of only three that submitted multinode results.
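To make "nearly linear" concrete, scaling is often summarized as a speedup and a scaling efficiency computed from time-to-train values. The following is a minimal sketch, assuming time to train is measured in minutes. The function name `scaling_efficiency` and the timings are hypothetical placeholders for illustration and are not taken from any MLPerf submission.

```python
def scaling_efficiency(single_node_minutes, multi_node_minutes, num_nodes):
    """Return (speedup, efficiency) of a multinode run relative to one node.

    speedup    = single-node time / multinode time
    efficiency = speedup / number of nodes (1.0 means perfectly linear scaling)
    """
    if single_node_minutes <= 0 or multi_node_minutes <= 0:
        raise ValueError("training times must be positive")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    speedup = single_node_minutes / multi_node_minutes
    efficiency = speedup / num_nodes
    return speedup, efficiency


# Hypothetical placeholder timings, NOT actual MLPerf results.
speedup, efficiency = scaling_efficiency(
    single_node_minutes=120.0,
    multi_node_minutes=32.0,
    num_nodes=4,
)
print(f"speedup: {speedup:.2f}x, scaling efficiency: {efficiency:.1%}")
```

An efficiency close to 1.0 corresponds to the linear or nearly linear scaling described above. Lower values indicate that communication or other overheads consume part of the added compute.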

Improvements from v1.0 to v1.1

Updates for the Dell Technologies v1.1 submission include:

The v1.1 submission includes results from the PowerEdge R750xa server. The PowerEdge R750xa server offers compelling performance, well suited for artificial intelligence, machine learning, and deep learning training and inferencing workloads.
Our results include numbers for the Dell EMC DSS8440 server with 10 A100 GPUs of the 80 GB variant. The 10-GPU results are useful because, when training is constrained to a single node, more GPUs in the server help to train the model faster.

Fig 2: Performance comparison of BERT between v1.0 and v1.1 across Dell EMC DSS8440 and PowerEdge XE8545 servers

We observed a performance improvement from v1.0 to v1.1 with the BERT model, especially on the PowerEdge XE8545 server. Many other deep learning workloads performed similarly in v1.0 and v1.1. The large number of results that we submitted helps customers understand the performance difference across versions.
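A version-to-version comparison such as the one in Figure 2 can be expressed as a relative speedup and a percentage reduction in time to train. The following is a minimal sketch, assuming both rounds report time to train in minutes for the same system and benchmark. The function name `compare_rounds` and the input values are hypothetical and do not represent the BERT results in the figure.

```python
def compare_rounds(old_minutes, new_minutes):
    """Compare time to train between two benchmark rounds.

    Returns (speedup, percent_reduction), where speedup > 1.0 and a
    positive percent_reduction both mean the newer round is faster.
    """
    if old_minutes <= 0 or new_minutes <= 0:
        raise ValueError("training times must be positive")
    speedup = old_minutes / new_minutes
    percent_reduction = 100.0 * (old_minutes - new_minutes) / old_minutes
    return speedup, percent_reduction


# Hypothetical placeholder timings, NOT actual v1.0 or v1.1 results.
speedup, reduction = compare_rounds(old_minutes=50.0, new_minutes=40.0)
print(f"speedup: {speedup:.2f}x, time-to-train reduction: {reduction:.1f}%")
```

Because MLPerf Training reports time to converge, lower is better, so the older time goes in the numerator of the speedup.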

Conclusion

We submitted 51 results, a significant number. A higher number of results helps customers understand how different Dell EMC servers perform across configuration types, benchmarks, and multinode settings, and which of them enable a faster time to solution.
Among the submitters with NVIDIA accelerator-based results, we are one of only three that submitted multinode results. Understanding scaling performance across multiple servers is essential because deep learning compute needs continue to increase with new kinds of deep learning models and parallelism techniques.
PowerEdge XE8545 x 4 A100-SXM-80GB server results include all the models in the MLPerf Training v1.1 benchmark suite.
PowerEdge R750xa server results were published for this round; they offer excellent performance.

Next steps

In future blogs, we plan to compare the performance of NVLink-bridged systems with non-NVLink-bridged systems.

Dell EMC Servers Offer Excellent Deep Learning Performance with the MLPerf™ Training v1.1 Benchmark