The following table provides details about the PowerEdge R750xa server configuration and software environment for the MLPerf Inference v1.0 submission:
Table 1. Test bed configuration

| Component | Description |
|-----------|-------------|
| Processor | 2 x Intel Xeon Gold 6338 (32 cores @ 2.00 GHz) |
| Memory | 256 GB (16 x 16 GB @ 3200 MT/s) |
| Local disk | 2 x 1.8 TB SSD (no RAID) |
| Operating system | CentOS 8.2.2004 |
| GPU | 4 x NVIDIA A100-PCIe-40GB |
| CUDA driver | 460.32.03 |
| Other software versions | NVIDIA TensorRT 7.2.3, NVIDIA CUDA 11.1, NVIDIA cuDNN 8.1.1, driver 460.32.03, NVIDIA DALI 0.30.0, NVIDIA Triton 21.02 |
| System profile | Performance |
| ECC on GPU | On |
| MLPerf v1.0 system ID | R750xa_A100-PCIE-40GBx4_TRT |
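Before running the benchmark, the GPU inventory, driver version, and ECC state in Table 1 can be read back through NVML. The following is a minimal sketch using the pynvml Python bindings (an assumption on our part; the same information is available from `nvidia-smi`):

```python
# Minimal test bed sanity check via NVML (assumes the pynvml package,
# that is, nvidia-ml-py, is installed).
import pynvml

pynvml.nvmlInit()
print("Driver version:", pynvml.nvmlSystemGetDriverVersion())

count = pynvml.nvmlDeviceGetCount()
print("GPU count:", count)  # expect 4 on this test bed

for i in range(count):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):  # older pynvml versions return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    ecc_current, _ecc_pending = pynvml.nvmlDeviceGetEccMode(handle)
    print(f"GPU {i}: {name}, {mem.total / 1024**3:.0f} GiB, "
          f"ECC {'on' if ecc_current else 'off'}")

pynvml.nvmlShutdown()
```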
The PowerEdge R750xa server is designed specifically for accelerated workloads, which makes it ideal for cutting-edge machine learning models, high-performance computing (HPC), and GPU virtualization. It uses Ice Lake processors and supports up to four double-wide GPUs. The PowerEdge R750xa server supports all PCIe data center GPUs in the PowerEdge portfolio, including the NVIDIA and AMD GPU product lines.
The PowerEdge R750xa server has undergone NVIDIA’s comprehensive certification program and is NVIDIA-Certified for enterprise AI (https://www.nvidia.com/en-us/data-center/products/certified-systems/). It supports the newest NVIDIA GPUs (such as the A100, A40, A30, and A10 GPUs) as well as previous-generation PCIe Gen 3 GPUs (such as the M10 and T4 GPUs). The PowerEdge R750xa server is also flexible: it can be configured with either two or four GPUs, so the data center administrator is not required to populate all four GPU slots. The server also supports the NVIDIA NVLink Bridge.
The PowerEdge R750xa server also supports the NVIDIA Multi-Instance GPU (MIG) feature on GPUs such as the NVIDIA A100. This feature enables the user to partition an A100 GPU into as many as seven individual instances. Each instance can be assigned to a different user, workload, or application, which helps to increase GPU utilization.
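As an illustration of how such a partition is created, the following sketch drives the standard `nvidia-smi` MIG commands from Python (run as root; confirm the profile names available on your system with `nvidia-smi mig -lgip`):

```python
# Hedged sketch of MIG provisioning on an A100-40GB via nvidia-smi.
import subprocess

def run(cmd):
    print("$", cmd)
    subprocess.run(cmd.split(), check=True)

# Enable MIG mode on GPU 0 (a GPU reset may be required afterwards).
run("nvidia-smi -i 0 -mig 1")

# Carve the 40 GB A100 into seven 1g.5gb instances; -C also creates the
# matching compute instance inside each GPU instance.
run("nvidia-smi mig -i 0 -cgi "
    "1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb,1g.5gb -C")

# List the GPU instances that now exist.
run("nvidia-smi mig -i 0 -lgi")
```

Each resulting 1g.5gb instance is then exposed as a separate device that can be handed to a different user or workload.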
The PowerEdge R750xa server is designed to be future-proof. The GPUs are placed in the front of the server, together with up to eight storage drives, to enable higher airflow through the server. This configuration makes it easier to support GPUs with higher thermal design power (TDP) as they are released.
Ice Lake is the successor to Intel’s Cascade Lake processor. The Ice Lake processor has up to 40 cores and supports up to six terabytes of system memory per socket, up to eight channels of DDR4-3200 memory per socket, and up to 64 PCIe Gen4 lanes per socket. It is also the first Xeon processor that Intel has released with PCIe Gen4 support; Gen4 doubles the per-lane bit rate of Gen3, which is ideal for transferring data between CPUs and GPUs.
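To make the doubling concrete, here is a back-of-the-envelope calculation (assuming the standard 128b/130b encoding that both Gen3 and Gen4 use); it also shows where the "PCIe Gen4: 64 GB/s" figure quoted for the A100 below comes from:

```python
# Back-of-the-envelope PCIe link bandwidth. Gen4 doubles the per-lane
# transfer rate of Gen3 (16 GT/s vs. 8 GT/s); both use 128b/130b encoding.
def pcie_gbps(gt_per_s, lanes, encoding=128 / 130):
    """Usable one-direction bandwidth in GB/s for a PCIe link."""
    return gt_per_s * encoding * lanes / 8  # one transfer = 1 bit per lane

gen3 = pcie_gbps(8.0, 16)   # ~15.8 GB/s per direction
gen4 = pcie_gbps(16.0, 16)  # ~31.5 GB/s per direction
print(f"PCIe Gen3 x16: {gen3:.1f} GB/s per direction")
print(f"PCIe Gen4 x16: {gen4:.1f} GB/s per direction")
# The "PCIe Gen4: 64 GB/s" figure quoted for the A100 is the raw
# bidirectional rate: 16 GT/s x 16 lanes x 2 directions / 8 bits = 64 GB/s.
```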
The following table lists the Ice Lake specifications:
Table 2. Intel Xeon Gold 6338 specifications

| Component | Description |
|-----------|-------------|
| Product collection | 3rd Generation Intel Xeon Scalable processors |
| Code name | Ice Lake |
| Processor name | Gold 6338 |
| Status | Launched |
| Number of CPU cores | 32 |
| Number of threads | 64 |
| Processor base frequency | 2.00 GHz |
| Maximum turbo frequency | 3.20 GHz |
| L3 cache | 48 MB |
| Memory type | DDR4-3200 |
| ECC memory supported | Yes |
| PCI Express revision | 4.0 |
| Maximum number of PCIe lanes | 64 |
The following figure shows the Ice Lake processor:
Figure 2. Ice Lake processor
Four NVIDIA A100 PCIe GPUs were used with the PowerEdge R750xa server for the MLPerf Inference v1.0 submission. The A100 GPU has many features that optimize inference workloads: it supports acceleration at multiple precisions (from FP32 down to INT4), enables structural sparsity, and as a result delivers substantial performance gains.
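The sketch below shows how reduced precision is typically enabled through the builder API of TensorRT (version 7.2.3 in this submission, per Table 1). It is illustrative only, not the MLPerf harness; `model.onnx` and the calibrator are placeholders:

```python
# Hedged sketch: building a reduced-precision TensorRT engine from ONNX.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder model path
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30    # 1 GiB of builder workspace
config.set_flag(trt.BuilderFlag.FP16)  # allow FP16 kernels
config.set_flag(trt.BuilderFlag.INT8)  # allow INT8 kernels
# INT8 also needs calibration data or per-tensor dynamic ranges:
# config.int8_calibrator = my_calibrator  # placeholder calibrator

engine = builder.build_engine(network, config)
```

With both flags set, TensorRT chooses the fastest available kernel for each layer and falls back to FP32 where a lower-precision implementation is unavailable.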
The following table lists the NVIDIA A100 PCIe GPU specifications:
Table 3. NVIDIA A100 PCIe GPU specifications

| Component | Description |
|-----------|-------------|
| GPU architecture | NVIDIA Ampere |
| NVIDIA Tensor Cores | 432 |
| NVIDIA CUDA cores | 6,912 |
| Single precision (FP32) | 19.5 TFLOPS |
| Double precision (FP64) | 9.7 TFLOPS |
| INT8 | 1,248 TOPS (with structural sparsity) |
| INT4 | 2,496 TOPS (with structural sparsity) |
| GPU memory | 40 GB HBM2 |
| ECC | Yes |
| Memory bandwidth | 1,555 GB/s |
| Interconnect interface | PCIe Gen4: 64 GB/s |
| Form factor | PCIe |
| Thermal solution | Passive |
| Compute APIs | NVIDIA CUDA, DirectCompute, OpenCL, OpenACC |
| MIG | Various instance sizes, up to 7 MIG instances @ 5 GB each |
| TDP | 250 W |
The following figure shows the NVIDIA A100 PCIe GPU:
Figure 3. NVIDIA A100 PCIe GPU