This whitepaper provided an overview of our MLPerf Training v2.0 results. The key takeaways include:
- Our results include a large number of submissions to the closed division. This breadth lets customers view the data in different ways and make like-to-like comparisons between vendors and OEMs to decide what best suits their workloads.
- Though this round did not introduce new hardware, we saw performance improvements across different benchmarks. These improvements demonstrate that software optimization alone can yield significant performance gains, which our customers can experience directly.
- The PowerEdge XE8545 server with the NVIDIA A100 SXM 80 GB card yielded the highest performance across different classes of workloads. PCIe-based servers, such as the PowerEdge R750xa and Dell DSS 8440 servers with NVIDIA A100 and NVIDIA A30 accelerators, are also an excellent choice for deep learning training.
- Among submitters with NVIDIA accelerator-based submissions, we are one of the few to include multinode results. Understanding scaling performance across multiple servers is imperative as deep learning compute needs continue to grow with different kinds of models and parallelism techniques.
- PowerEdge XE8545 and R750xa servers and Dell DSS 8440 servers are powerful compute engines that can handle different kinds of workloads and perform optimally in a multinode, large-scale setting. Linear scaling can be expected with these servers.
- MLPerf Training v2.0 had 258 closed-division submission results, an increase of approximately 150 percent over the previous round. This growth is a testament to MLPerf's standing as an industry-standard deep learning benchmark.