These results show the benefit of using partitions with a GPU. Machine learning jobs such as inference do not consume all of a GPU's resources, which is clearly seen in scenarios 1 to 3: because the GPUs are underused, these scenarios yield low images per second compared to scenario 8. Partitioning the GPU and assigning the partitions to VMs enables administrators to run multiple, mixed workloads, greatly increasing GPU utilization. In our study, scenario 8 provides the highest combined throughput for the inference workload.
The MIG partitions provide predictable performance regardless of how the other partitions are used. For example, VMs configured with the grid_a100-2-10c profile show similar results whether the other partitions are idle or busy.
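As an illustration of the partitioning described above, MIG setup on an A100 can be sketched with `nvidia-smi`. This is a minimal sketch, not the study's exact configuration: the GPU index and the `2g.10gb` profile layout below are assumptions. In a vSphere deployment, MIG-backed vGPU profiles such as grid_a100-2-10c are assigned to VMs through the hypervisor, and the host driver manages the underlying GPU instances.

```shell
# Enable MIG mode on GPU 0 (assumed index; may require a GPU reset or host reboot)
nvidia-smi -i 0 -mig 1

# Create three 2g.10gb GPU instances and their default compute instances (-C).
# The profile names here are illustrative, not the study's exact layout.
nvidia-smi mig -i 0 -cgi 2g.10gb,2g.10gb,2g.10gb -C

# List the resulting GPU instances to verify the partition layout
nvidia-smi mig -i 0 -lgi
```

Each GPU instance has its own dedicated memory and compute slices, which is what makes the per-partition performance independent of activity in neighboring partitions.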