Microsoft offers two options for using graphics processing units (GPUs) with clustered virtual machines (VMs) running on the Azure Stack HCI operating system to provide GPU acceleration to workloads:
- Discrete Device Assignment (DDA), also referred to as GPU pass-through
- GPU partitioning
In GPU pass-through mode, an entire physical GPU is directly assigned to one VM, bypassing any host drivers. In this mode of operation, the GPU is accessed exclusively by the GPU driver running in the VM to which it is assigned. The GPU is not shared among VMs.
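As a rough illustration of how a pass-through assignment is made, the sketch below shells out from Python to the Hyper-V PowerShell cmdlets Dismount-VMHostAssignableDevice and Add-VMAssignableDevice. The PCIe location path and VM name are placeholders, and required VM preparation steps (such as configuring MMIO space and the automatic stop action) are omitted; treat this as a minimal sketch rather than a complete procedure.

```python
import subprocess

# Placeholders: substitute the GPU's PCIe location path (from its device
# properties in Device Manager) and the name of the powered-off target VM.
location_path = "PCIROOT(40)#PCI(0100)#PCI(0000)"
vm_name = "gpu-vm-01"

for command in (
    # Dismount the GPU from the host so that no host driver owns it.
    f'Dismount-VMHostAssignableDevice -LocationPath "{location_path}" -Force',
    # Assign the entire physical device to the VM.
    f'Add-VMAssignableDevice -LocationPath "{location_path}" -VMName "{vm_name}"',
):
    subprocess.run(["powershell.exe", "-NoProfile", "-Command", command], check=True)
```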
GPU partitioning, also known as GPU virtualization, allows you to share a physical GPU device among multiple VMs. Each VM gets a dedicated fraction of the GPU instead of the entire device.
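The partitioning workflow can be sketched the same way. The snippet below assumes a host whose Hyper-V PowerShell module exposes the GPU partitioning cmdlets (Get-VMHostPartitionableGpu, Set-VMHostPartitionableGpu, Add-VMGpuPartitionAdapter); the VM name and the partition count of 4 are hypothetical examples, not recommendations.

```python
import subprocess

def run_ps(command: str) -> str:
    """Run a PowerShell command on the Hyper-V host and return its output."""
    result = subprocess.run(
        ["powershell.exe", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

vm_name = "gpu-vm-01"  # hypothetical VM name; the VM must be powered off

# List the partitionable GPUs on the host and the partition counts they support.
print(run_ps("Get-VMHostPartitionableGpu | Format-List Name, ValidPartitionCounts"))

# Example only: split the first partitionable GPU into 4 partitions,
# then attach one partition to the VM and confirm the assignment.
gpu_name = run_ps("(Get-VMHostPartitionableGpu | Select-Object -First 1).Name")
run_ps(f'Set-VMHostPartitionableGpu -Name "{gpu_name}" -PartitionCount 4')
run_ps(f'Add-VMGpuPartitionAdapter -VMName "{vm_name}"')
print(run_ps(f'Get-VMGpuPartitionAdapter -VMName "{vm_name}"'))
```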
The NVIDIA A16 GPUs installed in our test cluster support both DDA and GPU partitioning configurations. Neither option currently supports live migration: during cluster maintenance or in the event of a failure, VMs configured with GPUs must be restarted on a host where GPU resources are available.
We chose to allocate GPUs to VMs using GPU partitioning. We used NVIDIA vGPU software, a graphics virtualization platform that gives VMs access to NVIDIA GPU technology. The releases of the NVIDIA vGPU Manager and the guest VM drivers that you install must be compatible: installing a guest VM driver release that is incompatible with the installed vGPU Manager release causes the NVIDIA vGPU to fail to load.
The following combinations of NVIDIA vGPU Manager and guest VM driver releases are compatible (a version-check sketch follows the list):
- NVIDIA vGPU Manager with guest VM drivers from the same release
- NVIDIA vGPU Manager with guest VM drivers from different releases within the same major release branch
- NVIDIA vGPU Manager from a later major release branch with guest VM drivers from the previous branch
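These rules can be captured in a small version check. The sketch below is a minimal illustration, assuming releases are identified by major.minor version strings (for example, 16.2) in which the major number names the release branch, and interpreting "previous branch" as the immediately preceding major branch; the function name and the example version numbers are hypothetical.

```python
def vgpu_versions_compatible(manager_version: str, guest_driver_version: str) -> bool:
    """Return True if an NVIDIA vGPU Manager release and a guest VM driver
    release are compatible under the rules listed above.

    Versions are assumed to be "major.minor" strings (e.g. "16.2"),
    where the major number identifies the release branch.
    """
    manager_major = int(manager_version.split(".")[0])
    driver_major = int(guest_driver_version.split(".")[0])

    # Same release, or different releases within the same major release branch.
    if manager_major == driver_major:
        return True
    # vGPU Manager from a later major branch with guest drivers from the
    # previous branch (read here as the immediately preceding major branch).
    return manager_major == driver_major + 1


# Hypothetical release numbers:
print(vgpu_versions_compatible("16.2", "16.0"))  # True: same major release branch
print(vgpu_versions_compatible("17.0", "16.3"))  # True: drivers from the previous branch
print(vgpu_versions_compatible("16.1", "17.0"))  # False: guest driver from a later branch
```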
You must also use a version of the NVIDIA License System that is compatible with the chosen release of NVIDIA vGPU software.