Our results and observations are as follows:
Figure 10. Performance comparison between virtualized GPUs and bare-metal GPUs
For our recommendations, we explicitly assessed both performance and operational flexibility. We ran our performance comparisons with the MIG feature enabled on the A100 GPUs, and virtualization reduced workload performance by at most 5 percent compared with bare metal. A MIG-enabled GPU lets you configure and reconfigure vGPU profiles on one or more VMs without any downtime for the host server. While some VMs run workloads on partitions of a shared GPU, you can modify or reassign the profiles of other VMs that are not running jobs.
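The "at most 5 percent" figure is a relative difference between a virtualized run and a bare-metal run. The following minimal sketch shows one way to compute that difference. It assumes the metric is a throughput where higher is better (for example, images/second). The function name `percent_impact` and the input values are hypothetical and are not measured results from this study.

```python
def percent_impact(vgpu_throughput: float, bare_metal_throughput: float) -> float:
    """Return the relative change of a vGPU run versus bare metal, in percent.

    A negative value means the virtualized run was slower than bare metal.
    Assumes a higher-is-better throughput metric such as images/second.
    """
    if bare_metal_throughput <= 0:
        raise ValueError("bare-metal throughput must be positive")
    return (vgpu_throughput - bare_metal_throughput) / bare_metal_throughput * 100.0


# Hypothetical throughputs (images/second), for illustration only.
bare_metal = 1000.0
vgpu = 960.0
impact = percent_impact(vgpu, bare_metal)
print(f"Impact: {impact:.1f} percent")
print("Within the 5 percent bound:", impact >= -5.0)
```

A result of -5.0 or higher (closer to zero) would satisfy the "at most 5 percent" bound described above.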
Figure 11. MIG profiles performance comparison using ResNet training
Each partition's performance scaled with the size of the dedicated resources available to it. We were unable to run ResNet training on the grid_a100-2-10c and grid_a100-1-5c partitions. These partitions are suited only for inference and are not recommended for neural network training.
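Because each partition's performance tracks its dedicated resources, it can help to read the allocation directly from the profile name. The following sketch assumes the names follow the pattern `grid_a100-<compute slices>-<framebuffer GB>c`, which matches the profiles used in this section, and that a MIG-enabled A100 exposes seven compute slices in total. The helper names `parse_profile` and `MigProfile` are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed naming convention: grid_a100-<compute slices>-<framebuffer GB>c.
PROFILE_RE = re.compile(r"^grid_a100-(\d+)-(\d+)c$")
TOTAL_COMPUTE_SLICES = 7  # A MIG-enabled A100 exposes up to seven compute slices


class MigProfile(NamedTuple):
    name: str
    compute_slices: int
    framebuffer_gb: int

    @property
    def gpu_fraction(self) -> float:
        """Share of the GPU's compute slices dedicated to this profile."""
        return self.compute_slices / TOTAL_COMPUTE_SLICES


def parse_profile(name: str) -> MigProfile:
    """Split a vGPU profile name into compute slices and framebuffer size."""
    match = PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized profile name: {name}")
    return MigProfile(name, int(match.group(1)), int(match.group(2)))


for name in ["grid_a100-7-40c", "grid_a100-4-20c", "grid_a100-3-20c",
             "grid_a100-2-10c", "grid_a100-1-5c"]:
    p = parse_profile(name)
    print(f"{p.name}: {p.compute_slices}/7 compute slices "
          f"({p.gpu_fraction:.0%}), {p.framebuffer_gb} GB")
```

Under this reading, grid_a100-2-10c and grid_a100-1-5c hold only two sevenths and one seventh of the GPU's compute, which is consistent with their being limited to inference.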
The following table describes the eight scenarios:
Table 9. Scenarios tested and vGPU profiles used
| Scenario number | Description |
|---|---|
| Scenario 1 | A single VM configured with the grid_a100-7-40c profile |
| Scenario 2 | A single VM configured with the grid_a100-4-20c profile |
| Scenario 3 | A single VM configured with the grid_a100-3-20c profile |
| Scenario 4 | A single VM configured with the grid_a100-2-10c profile |
| Scenario 5 | A single VM configured with the grid_a100-1-5c profile |
| Scenario 6 | Three VMs configured with the grid_a100-4-20c, grid_a100-2-10c, and grid_a100-1-5c profiles, respectively |
| Scenario 7 | Three VMs, each configured with the grid_a100-2-10c profile, and a fourth VM configured with the grid_a100-1-5c profile |
| Scenario 8 | Seven VMs, each configured with the grid_a100-1-5c profile |
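As a sanity check on the scenario table, the following sketch sums the resources each scenario requests and compares them with the capacity of one GPU. It assumes a simplified model of a 40 GB A100 in MIG mode with seven compute slices and 40 GB of framebuffer. Real MIG placement rules are stricter than a simple sum, so this checks aggregate capacity only. The names `PROFILES`, `SCENARIOS`, and `usage` are hypothetical.

```python
# Simplified capacity model of one A100 40 GB GPU in MIG mode (assumption):
# seven compute slices and 40 GB of framebuffer. Real MIG placement rules are
# stricter than a simple sum, so this checks aggregate capacity only.
TOTAL_COMPUTE_SLICES = 7
TOTAL_FRAMEBUFFER_GB = 40

# (compute slices, framebuffer GB) for each profile used in the scenarios.
PROFILES = {
    "grid_a100-7-40c": (7, 40),
    "grid_a100-4-20c": (4, 20),
    "grid_a100-3-20c": (3, 20),
    "grid_a100-2-10c": (2, 10),
    "grid_a100-1-5c": (1, 5),
}

# One list entry per VM, transcribed from the scenario table.
SCENARIOS = {
    1: ["grid_a100-7-40c"],
    2: ["grid_a100-4-20c"],
    3: ["grid_a100-3-20c"],
    4: ["grid_a100-2-10c"],
    5: ["grid_a100-1-5c"],
    6: ["grid_a100-4-20c", "grid_a100-2-10c", "grid_a100-1-5c"],
    7: ["grid_a100-2-10c"] * 3 + ["grid_a100-1-5c"],
    8: ["grid_a100-1-5c"] * 7,
}


def usage(vms: list[str]) -> tuple[int, int]:
    """Sum the compute slices and framebuffer requested by a set of VMs."""
    slices = sum(PROFILES[p][0] for p in vms)
    memory = sum(PROFILES[p][1] for p in vms)
    return slices, memory


for number, vms in SCENARIOS.items():
    slices, memory = usage(vms)
    fits = slices <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_FRAMEBUFFER_GB
    print(f"Scenario {number}: {len(vms)} VM(s), "
          f"{slices}/7 slices, {memory}/40 GB, fits={fits}")
```

In this model, scenarios 6, 7, and 8 each use all seven compute slices and 35 GB of framebuffer, so they exercise a fully partitioned GPU.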
The following figure shows the results of the eight scenarios:
Figure 12. Tested scenario results
The results show that the ResNet inference job cannot fully use the grid_a100-7-40c and grid_a100-4-20c profiles. Despite their larger resource allocations, the observed throughput (images/second) on these profiles is not proportionally higher than on the other profiles.
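One way to quantify "not using the GPU fully" is to normalize throughput by compute slices and compare each profile against the smallest one. The following sketch assumes that metric. The function `scaling_efficiency` is hypothetical, and the throughput numbers are placeholders for illustration, not values read from Figure 12.

```python
def scaling_efficiency(throughput: dict[str, float],
                       slices: dict[str, int],
                       baseline: str) -> dict[str, float]:
    """Throughput per compute slice, relative to the baseline profile.

    1.0 means the profile delivers as much work per slice as the baseline;
    values well below 1.0 suggest the workload is not using the extra slices.
    """
    base_per_slice = throughput[baseline] / slices[baseline]
    return {
        profile: (throughput[profile] / slices[profile]) / base_per_slice
        for profile in throughput
    }


# Compute slices per profile, taken from the profile names.
SLICES = {"grid_a100-7-40c": 7, "grid_a100-4-20c": 4, "grid_a100-1-5c": 1}

# Hypothetical images/second placeholders; substitute measured values.
measured = {"grid_a100-7-40c": 2100.0, "grid_a100-4-20c": 1600.0,
            "grid_a100-1-5c": 600.0}

for profile, eff in scaling_efficiency(measured, SLICES, "grid_a100-1-5c").items():
    print(f"{profile}: {eff:.2f}")
```

With these placeholder inputs, the largest profile would deliver half the per-slice throughput of grid_a100-1-5c. That is the kind of gap the metric is designed to expose when a single inference job cannot saturate a large partition.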