This section describes performance improvements from MLPerf Training v2.1 to MLPerf Training v3.0.
The following figure shows the performance gains that customers can expect if they upgrade to the latest generation of Dell servers. It shows the performance improvement factor using a PowerEdge XE9680 server with the previous generation PowerEdge XE8545 server as a baseline across different benchmarks. Note that the PowerEdge XE9680 server has eight NVIDIA H100 SXM GPUs; the previous generation PowerEdge XE8545 server has four NVIDIA A100 SXM GPUs.
The largest improvement, 846 percent, was observed with the SSD benchmark, followed by the BERT benchmark at 611 percent. The other benchmarks each improved by more than 230 percent. These results are significant: an improvement of more than two times in time to train frees more time for other workloads in the data center, yielding faster time to value for the business. With this acceleration, customers can expect faster prototyping, faster model training, and an expedited MLOps pipeline.
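For readers who want to derive comparable numbers from their own runs, the following Python sketch shows one common way to turn two time-to-train measurements into a speedup factor and a percent improvement. The function names and the sample times are hypothetical, and this section does not state the exact formula behind the figure, so the convention used here (percent improvement = (speedup - 1) x 100) is an assumption.

```python
def speedup_factor(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new system trains than the baseline."""
    if baseline_minutes <= 0 or new_minutes <= 0:
        raise ValueError("training times must be positive")
    return baseline_minutes / new_minutes


def percent_improvement(baseline_minutes: float, new_minutes: float) -> float:
    """Express the speedup as a percent gain over the baseline (assumed convention)."""
    return (speedup_factor(baseline_minutes, new_minutes) - 1.0) * 100.0


# Hypothetical times, not MLPerf results: 100 minutes before, 25 minutes after.
print(speedup_factor(100.0, 25.0))       # 4.0
print(percent_improvement(100.0, 25.0))  # 300.0
```

Whichever convention is used, it helps to state it next to the number, because a "300 percent improvement" and a "4x speedup" describe the same pair of measurements under this definition.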
The following figure compares PowerEdge R750xa and R760xa servers with four NVIDIA H100 PCIe GPUs. The down arrows indicate an improvement in system performance.
The figure shows the performance of PowerEdge R750xa and R760xa servers with the same accelerator: the NVIDIA H100 GPU. The Y axis shows convergence time, which is lower on the PowerEdge R760xa server. The largest improvement was with ResNet (13.1 percent), followed by RNN-T (13.06 percent), MaskRCNN (12.5 percent), and the other benchmarks.
These results also demonstrate the benefit of PCIe Gen 5 and a higher GPU TDP for multi-GPU training workloads. The GPU TDP is 310 W in PowerEdge R750xa servers and 350 W in PowerEdge R760xa servers.
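Because the down arrows refer to a lower time to converge, these percentages read most naturally as relative reductions in convergence time. The following Python sketch illustrates that reading. It is an assumption, not a formula given in this section, and the function name and sample times are hypothetical.

```python
def time_reduction_percent(old_minutes: float, new_minutes: float) -> float:
    """Percent by which time to converge dropped relative to the old system."""
    if old_minutes <= 0 or new_minutes <= 0:
        raise ValueError("convergence times must be positive")
    return (old_minutes - new_minutes) / old_minutes * 100.0


# Hypothetical times, not MLPerf results: 80 minutes before, 60 minutes after.
print(time_reduction_percent(80.0, 60.0))  # 25.0
```

Under this convention a reduction can never exceed 100 percent, so a percentage above 100 must be using a different convention, such as a ratio of training times.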
The following figure compares a PowerEdge R750xa server with four NVIDIA A100 GPUs and a PowerEdge R760xa server with four NVIDIA H100 GPUs. The results show significant improvements. The highest improvements are seen with SSD (64.56 percent), followed by ResNet (53.17 percent), U-NET3D (50.17 percent), and the other benchmarks. The down arrows indicate performance improvement with the new generation servers.
As these results indicate, upgrading to the newer generation server and GPUs gives customers substantial performance gains for their workloads.
The following figure shows that a PowerEdge XE8640 server with four NVIDIA H100 GPUs and a PowerEdge XE9680 server with eight NVIDIA A100 GPUs deliver nearly the same performance across different benchmarks. These results demonstrate that, for compute density and power, it is more effective to use four NVIDIA H100 GPUs than eight NVIDIA A100 GPUs: NVIDIA H100 GPUs can deliver faster time to value.
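The density argument can be made concrete by normalizing for GPU count. The following Python sketch compares GPU-minutes consumed by two systems. The function name and the sample times are hypothetical, and the calculation assumes that work divides evenly across GPUs, which ignores multi-GPU scaling overhead.

```python
def per_gpu_speedup(time_a_min: float, gpus_a: int,
                    time_b_min: float, gpus_b: int) -> float:
    """Ratio of GPU-minutes used by system B to GPU-minutes used by system A.

    A result above 1.0 means each GPU in system A does more work per minute
    than each GPU in system B, under the even-scaling assumption.
    """
    if min(time_a_min, time_b_min) <= 0 or min(gpus_a, gpus_b) <= 0:
        raise ValueError("times and GPU counts must be positive")
    return (time_b_min * gpus_b) / (time_a_min * gpus_a)


# Hypothetical case, not MLPerf results: a four-GPU and an eight-GPU system
# that both converge in 30 minutes.
print(per_gpu_speedup(30.0, 4, 30.0, 8))  # 2.0
```

When the two systems converge in about the same time, the ratio reduces to the ratio of GPU counts, so each GPU in the four-GPU system is doing roughly twice the work.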
The following figure shows the improvement that the NVIDIA H100 GPU offers compared to the NVIDIA A100 GPU using the latest PowerEdge XE9680 server. These results show a significant improvement. The highest improvement was seen with BERT (65.6 percent), followed by SSD (53.1 percent), ResNet (50.2 percent), and the other benchmarks. The NVIDIA H100 GPU offers superior performance across all workload types, and it is an excellent choice for accelerating training workloads. The down arrows represent improved performance.
For the fastest time to value, we recommend upgrading to the NVIDIA H100 GPU with the PowerEdge XE9680 server. This configuration delivers impressive performance improvements compared to older generation server and GPU configurations.
The following figure compares the older generation PowerEdge XE8545 server with new generation servers such as the PowerEdge XE8640 and XE9680 servers. As expected, the older generation server with NVIDIA A100 GPUs takes the most time to converge. The PowerEdge XE9680 server with eight NVIDIA H100 GPUs has the fastest time to converge across all the benchmarks.
The following figure shows the improvement in performance that customers can expect when they use the NVIDIA H100 GPU in the SXM form factor compared to the PCIe form factor. The down arrows indicate better performance.
The highest gains were seen with RNN-T (39.12 percent), followed by ResNet (33.55 percent) and the other benchmarks. The SXM form factor offers considerable acceleration with multi-GPU training. The TDP for an NVIDIA H100 PCIe form factor GPU is 350 W, whereas the TDP for the SXM form factor is 700 W. TDP can be an important data point to consider when choosing a server and GPUs. Power-constrained data centers can benefit from the acceleration that the PCIe form factor provides, while taking a comparatively low performance hit.
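One rough way to weigh the SXM gain against its higher TDP is to compare performance per watt of GPU TDP. The following Python sketch uses the RNN-T figure and the two TDP values quoted above. It assumes that the 39.12 percent is a relative reduction in time to converge, which this section does not state explicitly, and the function names are hypothetical.

```python
def reduction_to_speedup(reduction_percent: float) -> float:
    """Convert a percent reduction in time to converge into a speedup factor."""
    if not 0.0 <= reduction_percent < 100.0:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


def relative_perf_per_tdp_watt(speedup: float, tdp_fast_w: float,
                               tdp_slow_w: float) -> float:
    """Speedup of the faster GPU divided by how much more TDP it is rated for."""
    if tdp_fast_w <= 0 or tdp_slow_w <= 0:
        raise ValueError("TDP values must be positive")
    return speedup / (tdp_fast_w / tdp_slow_w)


# RNN-T: 39.12 percent, assumed to be a time reduction for SXM over PCIe.
rnnt_speedup = reduction_to_speedup(39.12)
print(round(rnnt_speedup, 2))  # 1.64

# H100 SXM TDP is 700 W and H100 PCIe TDP is 350 W, as quoted above.
print(round(relative_perf_per_tdp_watt(rnnt_speedup, 700.0, 350.0), 2))  # 0.82
```

Under these assumptions, a value below 1.0 means the SXM GPU does less work per watt of GPU TDP than the PCIe GPU on that benchmark, even though it converges sooner. TDP is a rated ceiling, not measured power draw, and the calculation ignores the rest of the server, so this is an estimate for orientation only.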
The new generation four-GPU servers, such as the PowerEdge R760xa and XE8640 servers, are both good choices for training deep learning workloads. The PowerEdge R760xa server is an excellent choice for delivering high performance under strict power constraints.
NVIDIA A100 GPUs with the SXM form factor offer faster time to convergence compared to the PCIe form factor. The PowerEdge R750xa and XE8545 servers with NVIDIA A100 GPUs offer faster time to convergence compared to previous generation hardware systems.
Customers can also benefit from software-only improvements. However, although software-only improvements help, they do not match the gains from combined hardware and software improvements, as seen in the following figure:
If your business needs faster time to convergence, it is easier than ever to upgrade to new generation servers with NVIDIA GPUs. These configurations provide high-performance computing, support for newer model architectures, and innovations that accelerate deep learning workloads. These benefits serve business needs and help put AI at the forefront of modern computing.