For comparison, we consider the PowerEdge R740 server results, which were submitted with 2nd Generation Intel Xeon Scalable processors.
The following table lists the PowerEdge R740 server configuration:
Table 4. PowerEdge R740 configuration
Component | Description |
Processor | 2x Intel Xeon Gold 6248R @ 3.00 GHz |
Memory | 384 GB (24x 16 GB, 3200 MT/s) |
Local disk | 2x 1.8 TB SSD (No RAID) |
Operating system | CentOS 8.2.2004 |
GPU | 3x NVIDIA A100-PCIe-40G |
CUDA Driver | 460.32.03 |
Other software versions | TensorRT 7.2.3, CUDA 11.1, cuDNN 8.1.1, Driver 460.32.03, DALI 0.30.0, Triton 21.02 |
System profiles | Performance |
PCIe | Gen 3 |
ECC on GPU | ON |
MLPerf v1.0 System ID | R740_A100-PCIe-40GBx3_TRT |
Both servers use the same A100 PCIe GPUs. The new PowerEdge R750xa server supports up to four A100 PCIe GPUs, whereas the previous-generation PowerEdge R740 server supports up to three. This difference is the main reason that the PowerEdge R750xa server significantly outperforms the PowerEdge R740 server in the overall performance that a single system can deliver. See the following figure for performance results:
Figure 6. DLRM 99 percent and 99.9 percent Offline and Server performance for PowerEdge R750xa and R740 servers
For a more uniform comparison, we divide the whole-system result by the number of GPUs in the server. The following figure shows the per-GPU results for the DLRM models:
Figure 7. Per-GPU DLRM 99 percent and 99.9 percent Offline and Server performance for PowerEdge R750xa and R740 servers
The figure shows that, per GPU, the PowerEdge R750xa server is 5.8 percent faster in the Offline scenario and approximately 37 percent faster in the Server scenario. The DLRM model delivers extremely high throughput, with thousands of queries occurring per second, so the CPU and the PCIe interconnect have a significant impact on server performance. These results demonstrate that for some high-throughput models like DLRM, higher performance gains can be realized with 3rd Generation Intel Xeon Scalable processors even when using the same accelerators. Furthermore, PCIe Gen 4 also helps to improve performance for workloads such as DLRM in the Server scenario.
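The per-GPU normalization and the percentage gains discussed above can be reproduced with a short calculation. The following Python sketch is illustrative only: the function names `per_gpu` and `percent_faster` are hypothetical, the throughput values are placeholders rather than the submitted MLPerf results, and the GPU counts assume that the PowerEdge R750xa server was tested with four GPUs and the PowerEdge R740 server with three.

```python
def per_gpu(system_throughput: float, gpu_count: int) -> float:
    """Divide a whole-system result by the number of GPUs in the server."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return system_throughput / gpu_count


def percent_faster(new: float, baseline: float) -> float:
    """Return how much faster `new` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new / baseline - 1.0) * 100.0


# Placeholder whole-system throughputs (samples per second).
# These are NOT the published results; substitute the submitted numbers.
r750xa_system, r750xa_gpus = 4_000_000.0, 4  # assumed four-GPU configuration
r740_system, r740_gpus = 2_700_000.0, 3      # three GPUs, as listed in Table 4

r750xa_per_gpu = per_gpu(r750xa_system, r750xa_gpus)
r740_per_gpu = per_gpu(r740_system, r740_gpus)
gain = percent_faster(r750xa_per_gpu, r740_per_gpu)

print(f"R750xa per GPU: {r750xa_per_gpu:,.0f} samples/s")
print(f"R740 per GPU:   {r740_per_gpu:,.0f} samples/s")
print(f"Per-GPU gain:   {gain:.1f} percent")
```

Comparing per-GPU values removes the effect of the different GPU counts, so any remaining gap reflects the platform (CPU and PCIe generation) rather than the number of accelerators.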