Our MLPerf Training v2.0 submission includes both multinode and single-node systems under test (SUTs).
The following general syntax is used for the system name:
<number of servers> x <Dell server name> x <number of accelerators> x <NVIDIA accelerator name>
A system is single-node if the number of servers is omitted from its name or is equal to one. For example, the following are single-node systems:
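As a rough illustration of this naming rule, the following Python sketch splits a system name into its fields and treats a missing server count as one server. It assumes that the fields are separated by " x " (with spaces), exactly as written in the general syntax above. The `SystemName` class, the `parse_system_name` function, and the sample names are hypothetical and are not taken from the submission.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemName:
    """Fields of a system name (hypothetical helper, for illustration only)."""
    servers: int            # number of servers; 1 when omitted from the name
    server_model: str       # Dell server name
    accelerators: int       # number of accelerators as written in the name
    accelerator_model: str  # NVIDIA accelerator name

    @property
    def is_single_node(self) -> bool:
        # Single-node: server count was omitted (defaulted to 1) or equals one.
        return self.servers == 1


def parse_system_name(name: str) -> SystemName:
    """Parse '<servers> x <server> x <accelerators> x <accelerator>'.

    Assumption: fields are separated by ' x ' with surrounding spaces,
    so an 'x' inside a model name (for example 'R750xa') is not a separator.
    """
    parts = [part.strip() for part in name.split(" x ")]
    if len(parts) == 4:
        servers = int(parts[0])
        server_model, accel_count, accel_model = parts[1:]
    elif len(parts) == 3:
        # No server count given: treat the system as a single server.
        servers = 1
        server_model, accel_count, accel_model = parts
    else:
        raise ValueError(f"Unrecognized system name: {name!r}")
    return SystemName(servers, server_model, int(accel_count), accel_model)


if __name__ == "__main__":
    # Sample names are made up to exercise both forms of the syntax.
    for sample in ("XE8545 x 4 x A100-SXM-80GB",
                   "2 x XE8545 x 4 x A100-SXM-40GB"):
        parsed = parse_system_name(sample)
        kind = "single-node" if parsed.is_single_node else "multinode"
        print(f"{sample}: {kind}")
```

Names written without spaces around the separators would need a stricter pattern, because server model names can themselves contain the letter "x".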
The following figure shows the performance of different single-node systems:
Figure 12. Performance of single-node systems on ResNet50 with NVIDIA A100 and A30 accelerators
With the ResNet50 model, the time to converge with the NVIDIA 4xA100-PCIe-80GB accelerator is less than that with the NVIDIA 4xA100-SXM-40GB accelerator, as shown in the following figures:
Figure 13. Performance of single-node systems on BERT with NVIDIA A100 and A30 accelerators
Figure 14. Performance of single-node systems on RNN-T with NVIDIA A100 and A30 accelerators
Figure 15. Performance of single-node systems on RetinaNet with NVIDIA A100 and A30 accelerators
The following figure compares the performance of the NVIDIA A30 accelerator with the NVIDIA A100 accelerator:
Figure 16. 8xA30 compared to 4xA100 performance
The previous graphs show that single-node PowerEdge R750xa, Dell DSS 8440, and PowerEdge XE8545 systems performed well with NVIDIA A100 and A30 accelerators across different workload categories. The NVIDIA A100-SXM-80GB accelerator with the PowerEdge XE8545 server delivers the fastest time to value across different workload types.