Based on the architecture described in this white paper, the Integrated Solutions team used the Dell Technologies HPC & AI Innovation Lab to conduct performance studies of virtualized GPUs on VMware vSphere. The goal of the study was to understand the performance difference between bare-metal and virtualized GPUs, and to characterize the benefits of the Multi-Instance GPU (MIG) capability of NVIDIA Ampere GPUs in VMware vSphere.
We compared the performance of bare-metal GPUs with virtualized GPUs by training a ResNet model. The results show a model training performance penalty of less than five percent on virtualized GPUs compared to bare metal. This penalty is small relative to the gains in operational efficiency that virtualization provides.
We also characterized the performance of MIG partitions using both ResNet model training and inference. The results showed that each MIG partition delivered predictable performance, regardless of how the other partitions were being used. The results also showed increased GPU utilization when workloads that do not require an entire GPU, such as inference, run on GPU partitions.
For more information about the performance studies, see the Virtualizing GPUs for AI with VMware and NVIDIA design guide.