Our results and observations are as follows:
For our recommendations, we explicitly assessed both performance and operational flexibility. Our performance comparisons were completed with the MIG feature enabled on the A100 GPUs, which reduced workload performance by at most 5 percent. The advantage of a MIG-enabled GPU is that you can configure and reconfigure vGPU profiles on one or more VMs with no operational downtime for the host server. Profiles assigned to VMs that are not running jobs can be reassigned or modified while other VMs continue to run workloads on other partitions of the shared GPU.
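As an illustration of this flexibility, the following minimal sketch shows how the MIG state and the available GPU instance profiles can be inspected with nvidia-smi. This is illustrative only and assumes direct access to nvidia-smi on a system with the A100 driver installed; in the vSphere environment described here, the grid_a100-* vGPU profile assignment itself is typically performed through vCenter when the VM is configured.

```python
# Minimal sketch (assumption: run on a host or VM where the NVIDIA driver and
# nvidia-smi are available; in the vSphere setup described here, the actual
# grid_a100-* profile assignment is typically done through vCenter, not here).
import subprocess

def run(cmd):
    """Run a command and return its stdout as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Confirm whether MIG mode is currently enabled on GPU 0.
print(run(["nvidia-smi", "-i", "0",
           "--query-gpu=mig.mode.current", "--format=csv,noheader"]))

# List the MIG GPU instance profiles the device supports (7g.40gb, 4g.20gb, ...),
# which correspond to the grid_a100-7-40c, grid_a100-4-20c, ... vGPU profiles.
print(run(["nvidia-smi", "mig", "-lgip", "-i", "0"]))
```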
Each partition performs in proportion to the dedicated resources assigned to it. We were unable to run ResNet training on the grid_a100-2-10c and grid_a100-1-5c partitions; these partitions are suited for inference only and are not recommended for neural network training.
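Because the grid_a100-2-10c and grid_a100-1-5c profiles expose only 10 GB and 5 GB of framebuffer, a practical precaution is to confirm how much GPU memory a profile makes visible inside the guest before scheduling a job. The following is a minimal sketch, assuming PyTorch is installed in the VM; it is not part of the validated test procedure.

```python
# Minimal sketch (assumption: PyTorch is installed in the guest VM; this is
# illustrative and not part of the validated test setup).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024 ** 3
    print(f"{props.name}: {total_gb:.1f} GB visible to this VM")
    # A profile such as grid_a100-1-5c exposes roughly 5 GB, which is why the
    # smaller partitions are better suited to inference than to training.
else:
    print("No CUDA device visible in this VM")
```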
The following list describes the eight scenarios (the sketch after the list shows how each profile name maps to MIG compute slices):
Scenario 1: A single VM configured with the grid_a100-7-40c profile
Scenario 2: A single VM configured with the grid_a100-4-20c profile
Scenario 3: A single VM configured with the grid_a100-3-20c profile
Scenario 4: A single VM configured with the grid_a100-2-10c profile
Scenario 5: A single VM configured with the grid_a100-1-5c profile
Scenario 6: Three VMs configured with the grid_a100-4-20c, grid_a100-2-10c, and grid_a100-1-5c profiles, respectively
Scenario 7: Three VMs, each configured with the grid_a100-2-10c profile, and a fourth VM configured with the grid_a100-1-5c profile
Scenario 8: Seven VMs, each configured with the grid_a100-1-5c profile
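To make the scenario mix easier to check, the following sketch maps each vGPU profile name to a compute-slice count and framebuffer size inferred from the profile name (grid_a100-<slices>-<memory>c), and verifies that a scenario's profiles fit within the seven compute slices of an A100 40 GB GPU. This is an illustrative accounting aid only; actual MIG placement rules constrain which combinations are valid beyond a simple slice total.

```python
# Minimal sketch (assumption: the first number in a grid_a100-<slices>-<mem>c
# profile name is the MIG compute-slice count and the second is the framebuffer
# in GB; an A100 40 GB GPU provides 7 compute slices in total).
PROFILES = {
    "grid_a100-7-40c": (7, 40),
    "grid_a100-4-20c": (4, 20),
    "grid_a100-3-20c": (3, 20),
    "grid_a100-2-10c": (2, 10),
    "grid_a100-1-5c": (1, 5),
}

def fits_on_one_gpu(profile_names, total_slices=7):
    """Return True if the requested profiles fit within one GPU's compute slices.

    Note: real MIG placement rules also restrict which combinations are valid;
    this only checks the slice total.
    """
    used = sum(PROFILES[name][0] for name in profile_names)
    return used <= total_slices

# Scenario 6: one VM each with 4-20c, 2-10c, and 1-5c -> 4 + 2 + 1 = 7 slices.
print(fits_on_one_gpu(["grid_a100-4-20c", "grid_a100-2-10c", "grid_a100-1-5c"]))
# Scenario 7: three VMs with 2-10c plus one VM with 1-5c -> 2*3 + 1 = 7 slices.
print(fits_on_one_gpu(["grid_a100-2-10c"] * 3 + ["grid_a100-1-5c"]))
```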
The following figure shows the results of the eight scenarios:
The results show that the ResNet inference job cannot fully use the large grid_a100-7-40c and grid_a100-4-20c partitions. As a result, the images per second observed for these profiles are not much higher than those of the smaller profiles.
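The benchmark code itself is not included in this document. The following is a minimal sketch of how images per second can be measured for ResNet-50 inference, assuming TensorFlow with the Keras ResNet50 model is available in the guest VM; the actual test harness, batch size, and precision used in the study may differ.

```python
# Minimal sketch (assumptions: TensorFlow is installed in the guest VM and the
# Keras ResNet50 model stands in for the benchmark used in the study).
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)  # random weights suffice for throughput
batch_size = 64
images = np.random.rand(batch_size, 224, 224, 3).astype("float32")

# Warm up so one-time graph and kernel setup does not skew the measurement.
model.predict(images, verbose=0)

iterations = 20
start = time.time()
for _ in range(iterations):
    model.predict(images, verbose=0)
elapsed = time.time() - start

print(f"Throughput: {iterations * batch_size / elapsed:.1f} images/second")
```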
These results show the benefit of partitioning a GPU. Machine learning jobs such as inference do not consume all of the GPU's resources, which is why scenarios 1 to 3 yield low images per second compared to scenario 8. Partitioning the GPU and assigning the partitions to VMs enables administrators to run multiple, mixed workloads, greatly increasing GPU utilization. In our study, scenario 8 provides the highest combined throughput for the inference workload.
The MIG partitions provide predictable performance, regardless of how the other partitions are used. For example, VMs configured with the grid_a100-2-10c profile show similar results whether the other partitions are idle or in use.