Dell EMC Servers Offer Excellent Deep Learning Performance with the MLPerf™ Training v1.1 Benchmark
Wed, 01 Dec 2021 19:20:14 -0000
Dell Technologies has submitted results to the MLPerf Training benchmarking suite for the fifth round. This blog provides an overview of our submissions for the latest version, v1.1. The results indicate that Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 servers offer promising performance for deep learning workloads. These workloads span problem types such as image classification, medical image segmentation, lightweight object detection, heavyweight object detection, speech recognition, natural language processing, recommendation, and reinforcement learning.
The previous blog about MLPerf v1.0 contains an introduction to MLCommons™ and the benchmarks in the MLPerf training benchmarking suite. We recommend that you read this blog for an overview of the benchmarks. All the benchmarks and rules remain the same as for v1.0.
The following graph with an exponentially scaled y axis indicates time to converge for the servers and benchmarks in question:
Fig 1: All Dell Technologies submission results for MLPerf Training v1.1
Figure 1 shows that this round of Dell Technologies submissions includes 51 results. These results cover Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 servers with various NVIDIA A100 accelerator configurations: PCIe and SXM4 form factors, 40 GB and 80 GB memory variants, and 300 W, 400 W, and 500 W TDP variants.
Note: For the hardware and software specifications of the systems in the graph, see https://github.com/mlcommons/training_results_v1.1/tree/master/Dell/systems.
The submitted benchmarks span image classification, medical image segmentation, lightweight object detection, heavyweight object detection, speech recognition, natural language processing, recommendation, and reinforcement learning. In all these areas, the Dell EMC DSS8440, PowerEdge R750xa, and PowerEdge XE8545 server performance is outstanding.
Dell Technologies not only submitted the most results but also comprehensive results from a single system. PowerEdge XE8545 x 4 A100-SXM-80GB server results include submissions across the full spectrum of benchmarked models in the MLPerf Training v1.1 suite: BERT, DLRM, Mask R-CNN, Minigo, ResNet, SSD, RNN-T, and 3D U-Net.
The multinode results scale linearly or nearly linearly. This scaling makes Dell EMC servers in a multinode environment more conducive to faster time to value. Furthermore, among the submitters with NVIDIA accelerator-based submissions, we are one of three that include multinode results.
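One way to quantify how close multinode scaling is to linear is scaling efficiency: the observed speedup divided by the ideal speedup. The following is a minimal sketch. The function name and the timing values are hypothetical illustrations, not numbers from our submission.

```python
def scaling_efficiency(base_minutes: float, base_nodes: int,
                       scaled_minutes: float, scaled_nodes: int) -> float:
    """Return observed speedup divided by ideal speedup (1.0 means linear)."""
    observed_speedup = base_minutes / scaled_minutes
    ideal_speedup = scaled_nodes / base_nodes
    return observed_speedup / ideal_speedup


# Hypothetical times to converge, for illustration only.
one_node_minutes = 40.0
two_node_minutes = 21.0
efficiency = scaling_efficiency(one_node_minutes, 1, two_node_minutes, 2)
print(f"Scaling efficiency: {efficiency:.1%}")
```

An efficiency of 1.0 corresponds to perfectly linear scaling. Values slightly below 1.0 correspond to nearly linear scaling.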
Improvements from v1.0 to v1.1
Updates for the Dell Technologies v1.1 submission include:
- The v1.1 submission includes results from the PowerEdge R750xa server. The PowerEdge R750xa server offers compelling performance, well suited for artificial intelligence, machine learning, and deep learning training and inferencing workloads.
- Our results include numbers for 10 GPUs with 80 GB A100 variants on the Dell EMC DSS8440 server. The results for 10 GPUs are useful because more GPUs in a server help to train the model faster when training is constrained to a single node.
Fig 2: Performance comparison of BERT between v1.0 and v1.1 across Dell EMC DSS8440 and PowerEdge XE8545 servers
We noticed the performance improvement of v1.1 over v1.0 with the BERT model, especially with the PowerEdge XE8545 server. While many deep learning workloads were similar in performance between v1.0 and v1.1, the many results that we submitted help customers understand the performance difference across versions.
- Our number of submissions was significant (51 submissions). They help customers observe performance with different Dell EMC servers across various configurations. A higher number of results helps customers understand server performance that enables a faster time to solution across different configuration types, benchmarks, and multinode settings.
- Among the submitters with NVIDIA accelerator-based submissions, we are one of three that include multinode results. It is imperative to understand scaling performance across multiple servers as deep learning compute needs continue to increase with different kinds of deep learning models and parallelism techniques.
- PowerEdge XE8545 x 4 A100-SXM-80GB server results include all the models in the MLPerf v1.1 benchmark.
- PowerEdge R750xa server results were published for this round; they offer excellent performance.
In future blogs, we plan to compare the performance of NVLINK Bridged systems with non-NVLINK Bridged systems.
Related Blog Posts
Quantifying Performance of Dell EMC PowerEdge R7525 Servers with NVIDIA A100 GPUs for Deep Learning Inference
Tue, 17 Nov 2020 18:30:15 -0000
The Dell EMC PowerEdge R7525 server provides exceptional MLPerf Inference v0.7 Results, which indicate that:
- Dell Technologies holds the #1 spot in performance per GPU with the NVIDIA A100-PCIe GPU on the DLRM-99 Server scenario
- Dell Technologies holds the #1 spot in performance per GPU with the NVIDIA A100-PCIe on the DLRM-99.9 Server scenario
- Dell Technologies holds the #1 spot in performance per GPU with the NVIDIA A100-PCIe on the ResNet-50 Server scenario
In this blog, we provide the performance numbers of our recently released Dell EMC PowerEdge R7525 server with two NVIDIA A100 GPUs across all the MLPerf Inference v0.7 benchmarks. Our results indicate that the PowerEdge R7525 server is an excellent choice for inference workloads. It delivers optimal performance for the different tasks in the MLPerf Inference v0.7 benchmark: image classification, object detection, medical image segmentation, speech to text, language processing, and recommendation.
The PowerEdge R7525 server is a two-socket, 2U rack server that is designed to run workloads using flexible I/O and network configurations. The PowerEdge R7525 server features the 2nd Gen AMD EPYC processor, supports up to 32 DIMMs, has PCI Express (PCIe) Gen 4.0-enabled expansion slots, and provides a choice of network interface technologies to cover networking options.
The following figure shows the front view of the PowerEdge R7525 server:
Figure 1. Dell EMC PowerEdge R7525 server
The PowerEdge R7525 server is designed to handle demanding workloads and AI applications, such as training for different kinds of models and inference for different deployment scenarios. The PowerEdge R7525 server supports various accelerators such as NVIDIA T4, NVIDIA V100S, NVIDIA RTX, and NVIDIA A100 GPUs. The following sections compare the performance of NVIDIA A100 GPUs with NVIDIA T4 and NVIDIA RTX GPUs using MLPerf Inference v0.7 as a benchmark.
The following table provides details of the PowerEdge R7525 server configuration and software environment for MLPerf Inference v0.7:
Processor: AMD EPYC 7502 32-Core Processor
Memory: 512 GB (32 GB 3200 MT/s * 16)
Local disk: 2x 1.8 TB SSD (No RAID)
Operating system: CentOS Linux release 8.1
GPU: NVIDIA A100-PCIe-40G, T4-16G, and RTX8000
Other CUDA-related libraries: TensorRT 7.2, CUDA 11.0, cuDNN 8.0.2, cuBLAS 11.2.0, libjemalloc2, cub 1.8.0, tensorrt-laboratory mlperf branch
Other software stack: Docker 19.03.12, Python 3.6.8, GCC 5.5.0, ONNX 1.3.0, TensorFlow 1.13.1, PyTorch 1.1.0, torchvision 0.3.0, PyCUDA 2019.1, SacreBLEU 1.3.3, simplejson, OpenCV 4.1.1
For more information about how to run the benchmark, see Running the MLPerf Inference v0.7 Benchmark on Dell EMC Systems.
MLPerf Inference v0.7 performance results
The MLPerf inference benchmark measures how fast a system can perform machine learning (ML) inference using a trained model in various deployment scenarios. The following results represent the Offline and Server scenarios of the MLPerf Inference benchmark. For more information about different scenarios, models, datasets, accuracy targets, and latency constraints in MLPerf Inference v0.7, see Deep Learning Performance with MLPerf Inference v0.7 Benchmark.
In the MLPerf inference evaluation framework, the LoadGen load generator sends inference queries to the system under test, in our case, the PowerEdge R7525 server with various GPU configurations. The system under test uses a backend (for example, TensorRT, TensorFlow, or PyTorch) to perform inferencing and sends the results back to LoadGen.
MLPerf has identified four different scenarios that enable representative testing of a wide variety of inference platforms and use cases. In this blog, we discuss the Offline and Server scenario performance. The main differences between these scenarios are based on how the queries are sent and received:
- Offline—One query with all samples is sent to the system under test. The system under test can send the results back once or multiple times in any order. The performance metric is samples per second.
- Server—Queries are sent to the system under test following a Poisson distribution (to model real-world random events). One query has one sample. The performance metric is queries per second (QPS) within latency bound.
Note: The performance metrics for both the Offline and Server scenarios represent the throughput of the system.
In all the benchmarks, two NVIDIA A100 GPUs outperform eight NVIDIA T4 GPUs and three NVIDIA RTX8000 GPUs for the following models:
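To illustrate the Server scenario arrival pattern: in a Poisson process, the gaps between consecutive queries are exponentially distributed. The following minimal sketch generates such send times with the Python standard library. It is not the LoadGen implementation; the function name, target rate, and seed are assumptions for illustration.

```python
import random
from typing import List


def poisson_arrival_times(target_qps: float, num_queries: int, seed: int = 0) -> List[float]:
    """Generate query send times (in seconds) for a Poisson process at target_qps."""
    rng = random.Random(seed)
    times = []
    now = 0.0
    for _ in range(num_queries):
        # Inter-arrival gaps of a Poisson process are exponential with rate target_qps.
        now += rng.expovariate(target_qps)
        times.append(now)
    return times


# Hypothetical target rate; each query would carry one sample.
arrivals = poisson_arrival_times(target_qps=100.0, num_queries=10_000)
average_qps = len(arrivals) / arrivals[-1]
print(f"Average arrival rate: {average_qps:.1f} queries/s")
```

The average rate converges to the target rate, while individual gaps vary randomly. That variation is why the Server scenario metric is tied to a latency bound.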
- ResNet-50 image classification model
- SSD-ResNet34 object detection model
- RNN-T speech recognition model
- BERT language processing model
- DLRM recommender model
- 3D U-Net medical image segmentation model
The following graphs show PowerEdge R7525 server performance with two NVIDIA A100 GPUs, eight NVIDIA T4 GPUs, and three NVIDIA RTX8000 GPUs with 99% accuracy target benchmarks and 99.9% accuracy targets for applicable benchmarks:
- 99% accuracy (default accuracy) target benchmarks: ResNet-50, SSD-Resnet34, and RNN-T
- 99% and 99.9% accuracy (high accuracy) target benchmarks: DLRM, BERT, and 3D-Unet
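As we understand the MLPerf Inference rules, the 99% and 99.9% targets are expressed relative to the accuracy of the reference model. The following minimal sketch shows that check. The function name and the accuracy values are hypothetical and are not taken from our results.

```python
def meets_accuracy_target(measured: float, reference: float, fraction: float) -> bool:
    """Return True if measured accuracy is at least `fraction` of the reference accuracy."""
    return measured >= fraction * reference


# Hypothetical accuracy values, for illustration only.
reference_accuracy = 90.0
measured_accuracy = 89.5

default_ok = meets_accuracy_target(measured_accuracy, reference_accuracy, 0.99)
high_ok = meets_accuracy_target(measured_accuracy, reference_accuracy, 0.999)
print(f"Meets 99% target: {default_ok}, meets 99.9% target: {high_ok}")
```

In this made-up case, the same model passes the default target but not the high accuracy target, which is why the two targets are reported separately.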
99% accuracy target benchmarks
The following figure shows results for the ResNet-50 model:
Figure 2. ResNet-50 Offline and Server inference performance
From the graph, we can derive per-GPU values. Because throughput scales linearly with the number of GPUs, we divide the system throughput (across all the GPUs) by the number of GPUs to get the per-GPU results.
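The per-GPU calculation can be sketched as follows. The throughput values are hypothetical placeholders, not the published results, and the function name is ours.

```python
def per_gpu_throughput(system_throughput: float, gpu_count: int) -> float:
    """Divide whole-system throughput evenly across the GPUs in the system."""
    if gpu_count <= 0:
        raise ValueError("gpu_count must be positive")
    return system_throughput / gpu_count


# Hypothetical system-level throughput (samples/s) and GPU counts.
systems = {
    "2 x A100": (60_000.0, 2),
    "8 x T4": (48_000.0, 8),
    "3 x RTX8000": (45_000.0, 3),
}
for name, (throughput, gpus) in systems.items():
    print(f"{name}: {per_gpu_throughput(throughput, gpus):.0f} samples/s per GPU")
```

This normalization is only meaningful under the linear scaling assumption stated above.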
The following figure shows the results for the SSD-Resnet34 model:
Figure 3. SSD-Resnet34 Offline and Server inference performance
The following figure shows the results for the RNN-T model:
Figure 4. RNN-T Offline and Server inference performance
99.9% accuracy target benchmarks
The following figures show the results for the DLRM model with 99% and 99.9% accuracy:
Figure 5. DLRM Offline and Server Scenario inference performance – 99% and 99.9% accuracy
For the DLRM recommender and 3D U-Net medical image segmentation (see Figure 7) models, both 99% and 99.9% accuracy have the same throughput. The 99.9% accuracy benchmark also satisfies the required accuracy constraints with the same throughput as that of 99%.
The following figures show the results for the BERT model with 99% and 99.9% accuracy:
Figure 6. BERT Offline and Server inference performance – 99% and 99.9% accuracy
For the BERT language processing model, two NVIDIA A100 GPUs outperform eight NVIDIA T4 GPUs and three NVIDIA RTX8000 GPUs. However, the performance of three NVIDIA RTX8000 GPUs is a little better than that of eight NVIDIA T4 GPUs.
For the 3D-Unet medical image segmentation model, only the Offline scenario benchmark is available.
The following figure shows the results for the 3D U-Net model Offline scenario:
Figure 7. 3D U-Net Offline inference performance
Because only an Offline scenario benchmark exists for the 3D U-Net model, the preceding graph shows only the Offline scenario.
The following table compares the throughput between two NVIDIA A100 GPUs, eight NVIDIA T4 GPUs, and three NVIDIA RTX8000 GPUs with 99% accuracy target benchmarks and 99.9% accuracy targets:
2 x A100 GPUs vs 8 x T4 GPUs
2 x A100 GPUs vs 3 x RTX8000 GPUs
With support for NVIDIA A100, NVIDIA T4, or NVIDIA RTX8000 GPUs, the Dell EMC PowerEdge R7525 server is an exceptional choice for various workloads that involve deep learning inference. However, the higher throughput that we observed with NVIDIA A100 GPUs translates to performance gains and faster business value for inference applications.
The Dell EMC PowerEdge R7525 server with two NVIDIA A100 GPUs delivers optimal performance for various inference workloads, whether in a batch inference setting such as the Offline scenario or an online inference setting such as the Server scenario.
In future blogs, we will discuss sizing the system (server and GPU configurations) correctly based on the type of workload (area and task).
Inference Results Comparison of Dell Technologies Submissions for MLPerf™ v1.0 and MLPerf™ v1.1
Wed, 17 Nov 2021 17:51:05 -0000
The Dell Technologies HPC & AI Innovation Lab recently submitted results to the MLPerf Inference v1.1 benchmark suite. These results provide our customers with transparent information about the performance of Dell EMC servers. This blog highlights the enhancements between the MLPerf™ Inference v1.0 and MLPerf Inference v1.1 submissions from Dell Technologies. These enhancements include improved GPU performance and new software to extract performance. Also, this blog compares server and GPU configurations from the MLPerf Inference v1.0 and v1.1 submissions.
The focus of the MLPerf Inference submissions was on outperforming the expectations outlined by MLPerf. For an introduction to the MLPerf Inference v1.0 performance results, we recommend that you read this blog published by Dell Technologies.
The following table provides the software stack configurations from the two submissions for the closed division benchmarks:
Table 1: MLPerf Inference v1.0 and v1.1 software stacks
The following table shows the Dell EMC servers used for the MLPerf Inference v1.0 and v1.1 submissions:
Table 2: Servers used for the MLPerf Inference v1.0 and v1.1 submissions
10 x A100-PCIe-40GB
10 x A40
10 x NVIDIA A100-PCIE-80GB
8 x A30 (TensorRT)
8 x A30 (Triton)
3 x Quadro RTX 8000
2 x A100-PCIe-40GB
3 x A100-PCIe-40GB
3 x A100-PCIE-40GB
3 x A30
3 x GRID A100-40C
3 x NVIDIA A100-PCIe-40GB
4 x A100-PCIe-40GB
4 x A100-PCIE-40GB, MaxQ
4 x A100-PCIE-80GB-MIG-7x1g.10gb
4 x A100-PCIE-80GB (TensorRT)
4 x A100-PCIE-80GB (Triton)
4 x T4
2 x A10
4 x A100-SXM-40GB
4 x A100-SXM-80GB
4 x A100-SXM-80GB-7x1g.10gb
4 x A100-SXM-80GB (TensorRT)
4 x A100-SXM-80GB (Triton)
2 x A10
Besides the upgrades in the software stack that are detailed in the preceding table and the results from the latest hardware, differences between the MLPerf Inference v1.0 and v1.1 submissions include:
- The Multistream scenario has been deprecated in MLPerf v1.1.
- The total number of submitters increased from 17 to 21.
- There were 1725 total submissions to MLCommons™ in v1.1.
MLPerf Inference v1.0 compared to MLPerf Inference v1.1
We compared the MLPerf v1.0 and v1.1 submissions by looking at results from an identical server and the same GPU configurations used in both rounds of submission. For both submissions, Dell Technologies submitted results for the Dell EMC PowerEdge XE8545 server configured with four A100 SXM 80 GB GPUs. The PowerEdge XE8545 servers used a combination of the latest AMD CPUs and powerful NVIDIA A100 Tensor Core GPUs. The PowerEdge XE8545 Spec Sheet provides additional details about the server.
The following figure shows nearly level performance across the two submissions, which allows for a fair comparison between the submissions. Also, it shows that we need to be aware of the software upgrades listed in Table 1, no matter how minimal.
Figure 1: Relative performance comparison of PowerEdge XE8545 4 x A100 SXM 80 GB in MLPerf v1.0 and v1.1
Dell EMC systems improvements for MLPerf Inference v1.1
This section provides detailed comparisons of various GPUs across the MLPerf Inference v1.0 and v1.1 submissions to show an expansion of Dell EMC server and GPU configurations that are available.
A100 40 GB GPU compared with A100 80 GB GPU
Dell EMC DSS 8440 server
The Dell EMC DSS 8440 server delivers high performance at a lower cost compared to our competitors. By offering support for four, eight, or 10 GPUs, this server excels in processing capacity along with a flexible infrastructure. The DSS 8440 server delivers high performance for machine learning workloads. The DSS 8440 Spec Sheet provides more details about the server.
The following figure compares two DSS 8440 servers configured with NVIDIA A100 Tensor Core GPUs. For the v1.0 submission, the DSS 8440 server was configured with the A100 40 GB GPU (shown in blue). For the v1.1 submission, the DSS 8440 server was configured with the A100 80 GB GPU (shown in orange). Across the different models, the performance improvement was between 3 percent and 20 percent, favoring the system with the A100 80 GB GPU. Improvements of more than 10 percent can be attributed to the operating frequency of each card: the A100 80 GB GPU is a 300 W card, whereas the A100 40 GB GPU is a 250 W card.
Figure 2: Relative performance comparison of DSS 8440 10 x A100 PCIe 40 GB and 80 GB in MLPerf v1.0 and v1.1
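The percentage improvements quoted in these comparisons are relative throughput gains over a baseline. The following is a minimal sketch of that arithmetic. The throughput values are hypothetical, chosen only to show the calculation, and the function name is ours.

```python
def percent_improvement(baseline: float, candidate: float) -> float:
    """Percentage by which candidate throughput exceeds baseline throughput."""
    return (candidate / baseline - 1.0) * 100.0


# Hypothetical throughput values (samples/s), not published results.
a100_40gb_throughput = 100_000.0
a100_80gb_throughput = 111_000.0
gain = percent_improvement(a100_40gb_throughput, a100_80gb_throughput)
print(f"Improvement: {gain:.1f} percent")
```

A negative result indicates that the candidate is slower than the baseline.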
Dell EMC PowerEdge R750xa server
The PowerEdge R750xa server is ideal for Artificial Intelligence (AI)/Machine Learning (ML)/Deep Learning (DL) training and inferencing, high performance computing, and virtualization. See the Dell EMC PowerEdge R750xa Spec Sheet for more information about the server.
For this comparison, the server was the same for both submissions. For the MLPerf v1.0 submission, the PowerEdge R750xa server was configured with four A100 40 GB GPUs. For the MLPerf v1.1 submission, the PowerEdge R750xa server was configured with four A100 80 GB GPUs. The following figure shows that for the MLPerf v1.1 submission, extra performance was extracted from the system. Across the various models, the MLPerf v1.1 results are 7 percent to 22 percent better than the results from the MLPerf v1.0 submission. In the ResNet50 benchmark, the MLPerf v1.1 results are an impressive 15 percent and 19 percent better in the Offline and Server scenarios, respectively.
Figure 3: Relative performance of PowerEdge R750xa 4 x A100 40 GB GPU and 80 GB in MLPerf v1.0 and v1.1 respectively
Dell EMC PowerEdge XE8545 server
For the MLPerf v1.0 submission, the PowerEdge XE8545 server was configured with the A100 SXM4 40 GB GPU (shown in blue in figures 4 and 5) and the A100 SXM4 80 GB GPU (shown in orange in figures 4 and 5). For the MLPerf v1.1 submission, the PowerEdge XE8545 server was configured with the A100 SXM4 80 GB GPU (shown in gray in figures 4 and 5). As expected, in the MLPerf v1.0 submission, the A100 SXM4 80 GB GPU outperformed the A100 SXM4 40 GB GPU. Across the models in the MLPerf v1.1 submission, the performance of the A100 SXM4 80 GB GPU ranged from one percent worse to eight percent better than the identical system in the MLPerf v1.0 submission. The one deficit was for the SSD ResNet34 benchmark (see Figure 5), where the A100 GPU in the MLPerf v1.0 submission slightly outperformed the A100 GPU in the MLPerf v1.1 submission.
Figure 4: Performance of PowerEdge XE8545 4 x A100 40 GB and 80 GB in MLPerf v1.0 and 80 GB in MLPerf v1.1 for ResNet50 and RNNT
Figure 5: Performance of PowerEdge XE8545 4 x A100 40 GB and 80 GB in MLPerf v1.0 and 80 GB in MLPerf v1.1 for BERT and SSD ResNet34
NVIDIA A30 GPU compared with NVIDIA A40 GPU
This comparison considers the NVIDIA A40 and NVIDIA A30 Tensor Core GPUs. For a fair comparison between the two GPUs, the DSS 8440 server configuration was consistent across the two submissions. For the MLPerf v1.0 submission, the DSS 8440 server was configured with ten A40 GPUs. For the MLPerf v1.1 submission, the server was configured with eight A30 GPUs. For a clear comparison of the two GPUs, the results in Figure 6 are presented as per-card performance numbers, which means that the throughput results from the A40 GPU have been divided by ten and the results from the A30 GPU have been divided by eight.
The system configured with the A30 GPU performed 15 to 111 percent better than the A40 GPU across the various benchmarks. The A30 GPU is ideal for inference because it is equipped with High Bandwidth Memory (HBM2) and a higher GPU frequency. The A40 GPU is positioned more for Virtual Desktop Infrastructure (VDI) and other workloads.
Figure 6: Per card relative performance comparison of the DSS 8440 server with A30 and A40 GPUs in MLPerf v1.0 and v1.1
Comparison of NVIDIA T4, A30, and A10 GPUs
This comparison considers three submissions on three different servers. The numbers are divided to display per card performance.
The Dell EMC PowerEdge XE2420 server is a specialty edge server that supports demanding applications at the edge, retail applications and analytics, manufacturing and logistics applications, and 5G cell processing. See the PowerEdge XE2420 Spec Sheet for more information. Our lab configured the system with four NVIDIA Tesla T4 GPUs that have been optimized for high utilization while also performing in an energy-efficient manner. The results from this system were published in the MLPerf Inference v1.0 Results.
The second server in this comparison is the DSS 8440 server, which was configured with eight NVIDIA A30 GPUs. The final server in this comparison is the PowerEdge XE2420 server, which was configured with two NVIDIA A10 GPUs.
The three cards in this comparison have different form factors; the A10 and A30 GPUs are larger than the T4 GPU. The following figure shows that the A30 GPU performed better than the other two GPUs. Across the various benchmarks, the A30 GPU performed between 204 and 360 percent better than the T4 GPU and between five percent and 57 percent better than the A10 GPU.
Figure 7: Comparison of T4, A30, and A10 GPUs for DLRM
Figure 8: Comparison of T4, A30, and A10 GPUs for ResNet50, RNNT, and SSD ResNet34
Comparison of NVIDIA T4 GPU, A30 Multi-Instance GPU (MIG), and A100 MIG
This comparison also considers three submissions on three different servers. The results from the Resnet50 and SSD Resnet34 benchmarks have been divided to display per card performance.
The PowerEdge XE2420 server was configured with four NVIDIA Tesla T4 GPUs. The results for this system are from the MLPerf v1.0 submission. The PowerEdge R7525 server was configured with three NVIDIA A30 GPUs. MIG was enabled on all these GPUs with a profile of 1g.6gb. We did not publish the A30 MIG results on the PowerEdge R7525 server to MLCommons, but the results are compliant.
The PowerEdge R750xa server was configured with four NVIDIA A100 80 GB PCIe GPUs, which support Multi-Instance GPU (MIG). MIG is a feature of NVIDIA GPUs with the Ampere architecture that allows a GPU to be partitioned into up to seven secure GPU instances. This partitioning is beneficial because it allows for increased parallelism. The results from this system were submitted in the MLPerf Inference v1.1 submission. There are different sizes of MIG slices. The configuration for the A30 and A100 GPUs used the smallest slice possible: the A100 GPU was divided into seven slices and the A30 GPU into four slices.
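The slice counts above determine how many GPU instances each system exposes and how per-instance results roll up to a per physical GPU number. The following minimal sketch shows that arithmetic. The helper names and the per-instance throughput value are hypothetical.

```python
# Smallest-slice MIG instance counts per physical GPU, as stated above.
MIG_SLICES_PER_GPU = {"A100": 7, "A30": 4}


def total_mig_instances(gpu_model: str, gpu_count: int) -> int:
    """Number of MIG instances when every GPU is split into its smallest slices."""
    return MIG_SLICES_PER_GPU[gpu_model] * gpu_count


def per_physical_gpu(instance_throughputs, gpu_count: int) -> float:
    """Sum per-instance throughput and normalize by the number of physical GPUs."""
    return sum(instance_throughputs) / gpu_count


a100_instances = total_mig_instances("A100", 4)  # PowerEdge R750xa, four A100 GPUs
a30_instances = total_mig_instances("A30", 3)    # PowerEdge R7525, three A30 GPUs
print(a100_instances)  # 28
print(a30_instances)   # 12

# Hypothetical per-instance throughput (samples/s), for illustration only.
hypothetical_throughputs = [1_000.0] * a100_instances
a100_per_gpu = per_physical_gpu(hypothetical_throughputs, 4)
print(f"Per physical GPU: {a100_per_gpu:.0f} samples/s")
```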
The following figures show results across the MLPerf v1.0 and v1.1 submissions from Dell Technologies for ResNet50 and SSD ResNet34. Figure 9 shows per physical GPU results. For the ResNet50 Offline benchmark, the A30 GPU performed 232 percent better than the T4 GPU, while the A100 GPU performed 76 percent better than the A30 GPU. In the ResNet50 Server mode, the A30 GPU outperformed the T4 GPU by 50 percent and the A100 GPU outperformed the A30 GPU by 23 percent. We observed a similar trend across the Offline and Server modes where the A100 GPU outperformed the A30 GPU, which outperformed the T4 GPU.
Figure 9: Per card performance of the T4 GPU, A30 MIG, and A100 MIG for ResNet50
In the SSD ResNet34 benchmark, we observed a similar trend where the performance of the A100 GPU was better than the performance of the A30 GPU, which performed better than the T4 GPU. In the Offline mode of the SSD ResNet34 benchmark, the A30 GPU performed 243 percent better than the T4 GPU, and the A100 GPU performed 77 percent better than the A30 GPU. In the Server mode, the A100 GPU outperformed the A30 GPU by 93 percent and the A30 GPU performed 198 percent better than the T4 GPU.
Figure 10: Per card performance of the T4 GPU, A30 MIG, and A100 MIG for SSD ResNet34
This blog has provided a brief introduction to MLPerf Inference benchmarking and a summary of the Dell Technologies submission from MLPerf Inference v1.0. Also, it highlighted the differences in the software stack between the MLPerf v1.0 and v1.1 submissions. This blog quantified results from various server and GPU configurations across the two rounds of MLPerf submissions and displayed noteworthy and relevant performance comparisons.
When comparing the A100 40 GB to the A100 80 GB GPUs on the Dell EMC DSS 8440 server, the latter exhibited an 11 percent increase in performance. On the Dell EMC PowerEdge R750xa server, the A100 PCIe 80 GB GPU performed 12 percent better than the A100 PCIe 40 GB GPU. The Dell EMC PowerEdge XE8545 server confirmed this result for the MLPerf v1.1 submission; the A100 SXM 80 GB GPU performed three percent better than an identical system from the MLPerf v1.0 submission.
The A30 and A40 GPU comparison, run on the same Dell EMC DSS 8440 server, showed that the A30 GPU achieved a notable 42 percent performance improvement.
The comparison between the T4, A30, and A10 GPUs revealed that the A30 GPU performed significantly better than the T4 GPU and is considered a good upgrade for your ML workloads. The T4 GPU, A30 MIG, and A100 MIG were compared based on results from the ResNet50 and SSD-ResNet34 benchmarks.