Performance scales almost linearly with the number of GPU cards. Examine Figures 1 through 4 and compare the performance of the R7515_T4x4 with the R7525_T4x8, or the R7525_A100x2 with the R7525_A100x3.
Performance closely tracks the number of GPU cards. The relative performance of the R7525_T4x8 is about 2.0 for most benchmarks, and it has twice as many T4 GPUs as the reference system. The number of GPUs therefore has a significant impact on performance.
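As a rough illustration of this scaling comparison, the sketch below normalizes a system's throughput to the reference R7515_T4x4 and divides by the ratio of GPU counts. The function names and input values are placeholders for illustration, not actual MLPerf results.

```python
# A minimal sketch, assuming relative performance is a system's measured
# throughput divided by the reference R7515_T4x4 throughput. The numbers
# below are illustrative placeholders, not actual MLPerf results.

def relative_performance(throughput, ref_throughput):
    """Throughput normalized to the reference system."""
    return throughput / ref_throughput

def scaling_efficiency(throughput, num_gpus, ref_throughput, ref_num_gpus):
    """1.0 means performance grew exactly in proportion to the GPU count."""
    return relative_performance(throughput, ref_throughput) / (num_gpus / ref_num_gpus)

# Example: a 2.0x relative result from a system with twice the GPUs of the
# reference (8 x T4 vs. 4 x T4) corresponds to near-linear scaling.
print(scaling_efficiency(throughput=2.0, num_gpus=8, ref_throughput=1.0, ref_num_gpus=4))  # 1.0
```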
The more expensive GPUs provide better price/performance. From Figure 5, the cost of the R7525_A100x3 configuration is 3x that of the reference configuration R7515_T4x4, but its relative price/performance is 0.61x.
The price of the RTX8000 is 2.22x that of the Tesla T4, as listed on the Dell website. Fewer RTX8000 cards are needed at a lower overall cost: three RTX8000s compared to eight T4s. From Figure 5, the R7525_RTX8000x3 configuration costs 0.8333x as much as the R7525_T4x8, yet it posts better performance and price/performance.
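The price/performance comparisons above follow the same arithmetic. The sketch below assumes relative price/performance is defined as relative cost divided by relative performance, so values under 1.0 beat the reference; the roughly 4.9x throughput input is only what that assumed definition would imply from the 3x cost and 0.61x ratio cited above, not a published result.

```python
# A minimal sketch, assuming relative price/performance is defined as
# (cost / reference cost) / (performance / reference performance), so a
# value below 1.0 is a better deal than the reference R7515_T4x4.
# Inputs are illustrative placeholders, not published prices or results.

def relative_price_performance(cost, perf, ref_cost=1.0, ref_perf=1.0):
    relative_cost = cost / ref_cost
    relative_perf = perf / ref_perf
    return relative_cost / relative_perf

# Example: a configuration that costs 3x the reference but delivers roughly
# 4.9x its throughput works out to a relative price/performance of ~0.61.
print(round(relative_price_performance(cost=3.0, perf=4.9), 2))  # 0.61
```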
Dell Technologies gives customers the flexibility to deploy their inference workloads on systems that match their requirements.
The NVIDIA T4 is a low-profile, low-power GPU that is widely deployed for inference due to its superior power efficiency and economic value.
With 48 GB of GDDR6 memory, the NVIDIA Quadro RTX 8000 is designed for memory-intensive workloads such as creating the most complex models, building massive architectural datasets, and visualizing immense data science workloads. Dell Technologies is the only vendor that submitted results using NVIDIA Quadro RTX GPUs.
The NVIDIA A100-PCIe-40G is a powerful platform that is popular for training state-of-the-art deep learning models. For customers with heavy inference computational requirements and fewer budget constraints, its high initial cost is more than offset by its better price/performance.