Download and install the latest NVIDIA driver package on each GPU-equipped cluster node. For the CUDA drivers, see NVIDIA Driver Downloads for the latest versions.
Create the cluster from a machine running the Azure Stack HCI operating system at version 21H2 or later, equal to or greater than the version on the nodes. It is recommended to create the cluster from one of the nodes.
Download and install the latest INF package from NVIDIA GPU Passthrough Support. Place the correct INF for the GPU on the system and ensure that it is installed on all cluster nodes. To install it, run the PNPUTIL command. For example:
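The following is a minimal example; the INF file name is a placeholder, so substitute the file that NVIDIA supplies for your GPU model:
pnputil /add-driver .\<nvidia-gpu-passthrough>.inf /install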
Note: This step is applicable only if you are still running Azure Stack HCI 21H2. The INF file is no longer required on versions 22H2 or later.
Dell Technologies supports GPU integration through Discrete Device Assignment (DDA) on Azure Stack HCI. DDA allows the GPU to be assigned to a virtual machine (VM) directly as a hardware component.
Note: This technology does not support GPU partitioning.
DDA is accomplished either through the Windows Admin Center (WAC) or PowerShell command line on the node.
Create one VM per physical GPU on the cluster. Install a GPU-supported operating system (see the supported operating system list on the NVIDIA website), complete the installation, and power off the VMs.
Example with 16 GB A2s (1 per server): 2 x 16 x 1 = 32 GB = 32768 MB
Example with 24 GB A30s (2 per server): 2 x 24 x 2 = 96 GB = 98304 MB
Locate the correct PCI location for the card.
Disable the GPU.
Dismount the GPU.
This removes the PCI device from the host server.
Assign PCI resource (GPU) to VM.
Warning: Multiple cards mean multiple device locations. Be careful not to mix them up.
Set the VM configuration (MMIO limits).
Log in to the node through RDP or iDRAC.
Run the following script one line at a time.
#Configure the VM for a Discrete Device Assignment
$vm = "ddatest1"
#Set automatic stop action to TurnOff
Set-VM -Name $vm -AutomaticStopAction TurnOff
#Enable Write-Combining on the CPU
Set-VM -GuestControlledCacheTypes $true -VMName $vm
#Configure 32 bit MMIO space
Set-VM -LowMemoryMappedIoSpace 3Gb -VMName $vm
#Configure Greater than 32 bit MMIO space
Set-VM -HighMemoryMappedIoSpace 33280Mb -VMName $vm
#Find the Location Path and disable the Device
#Enumerate all PNP Devices on the system
$pnpdevs = Get-PnpDevice -presentOnly
#Select only those devices that are Display devices manufactured by NVIDIA
$gpudevs = $pnpdevs | Where-Object {$_.Class -like "Display" -and $_.Manufacturer -like "NVIDIA"}
#Select the location path of the first device that's available to be dismounted by the host.
$locationPath = ($gpudevs | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).data[0]
#Disable the PNP Device
Disable-PnpDevice -InstanceId $gpudevs[0].InstanceId
#Dismount the Device from the Host
Dismount-VMHostAssignableDevice -force -LocationPath $locationPath
#Assign the device to the guest VM.
Add-VMAssignableDevice -LocationPath $locationPath -VMName $vm
It is critical that you increment the device index and use its corresponding location path when you assign a second GPU.
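The following is a minimal sketch of assigning a second GPU, assuming the variables from the script above and a hypothetical second VM named ddatest2; note the [1] index used for both the instance ID and the location path:
#Assign the second NVIDIA GPU (index [1]) to a second VM (name is an example)
$vm2 = "ddatest2"
#Select the location path of the second device
$locationPath2 = ($gpudevs[1] | Get-PnpDeviceProperty DEVPKEY_Device_LocationPaths).Data[0]
#Disable and dismount the second PNP Device
Disable-PnpDevice -InstanceId $gpudevs[1].InstanceId
Dismount-VMHostAssignableDevice -force -LocationPath $locationPath2
#Assign the device to the second guest VM
Add-VMAssignableDevice -LocationPath $locationPath2 -VMName $vm2
#Confirm which devices are now assigned to the VM
Get-VMAssignableDevice -VMName $vm2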
Mapping a GPU through WAC (recommended).
Go to the main settings in the upper-right corner and click Extensions.
The Extensions page is displayed.
Select GPUs and click Install.
This installs the GPU controls at the cluster level.
Go to the cluster and perform the following:
Ensure that the VM is turned off.
Under Extensions, click the GPUs tab.
Under the GPUs tab, all the GPUs are listed per node.
Note: The NVIDIA driver package that is installed on each server does not include the '3D Video Controller' INF driver. After the GPU is dismounted from the host, the NVIDIA driver package that is installed on the VM properly recognizes the full NVIDIA device ID.
Click the GPU Pools tab to create the GPU pools. Enter the following details:
Servers – Enter the server details.
GPU pool name – Enter the name of the GPU pool.
Select GPUs – Only one GPU per node per pool is allowed.
Assign without mitigation driver (not recommended) – Select this option because the mitigation driver is not available in the current release.
Select Assign VM to GPU pool to assign the GPU to the VM.
Select the server, pool, and virtual machine.
Click Advanced and enter the memory requirements:
Low memory mapped I/O space (in MB)
High memory mapped I/O space (in MB) – Adjust the maximum memory mapped I/O space to match your particular GPU. The formula is 2 x GPU RAM per attached GPU. For example, a VM with two 16 GB A2s needs (2 x 16 GB) x 2 = 64 GB.
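As a quick check of the formula, the following sketch computes the value to enter in the High memory mapped I/O space field; the GPU size and count are example values:
#Compute High memory mapped I/O space: 2 x GPU RAM x number of attached GPUs, in MB
$gpuRamGB = 16                                   #memory per GPU (example: 16 GB A2)
$gpuCount = 2                                    #GPUs attached to the VM
$highMmioMB = 2 * $gpuRamGB * $gpuCount * 1024   #65536 MB (64 GB)
$highMmioMB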
There must be a free GPU on the destination node to migrate a GPU VM; otherwise, the VM stops.
This works only with the February 2022 Windows Update; earlier versions are not supported.
Note: The Microsoft February 8, 2022 Security update (KB5010354) or later is required for proper VM migration functionality. For more information, see February 8, 2022 Security update (KB5010354).
Perform the following steps to migrate a VM between nodes when there is a per-node-pool:
Turn off the VM.
Remove the GPU pool assignment inside WAC.
Quick or live migrate the VM to the new node.
Assign the GPU pool on the destination node to the VM (if none exists, create one).
Power on the VM.
Log in to ensure that the VM migrated successfully.
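If the GPU was attached with the DDA PowerShell cmdlets instead of a WAC pool, the following is a rough sketch of the equivalent flow; the destination node name and $destLocationPath (the location path of a free GPU on that node, found as shown earlier) are placeholders:
#On the source node: stop the VM and return the GPU to the host
Stop-VM -Name $vm
Remove-VMAssignableDevice -LocationPath $locationPath -VMName $vm
Mount-VMHostAssignableDevice -LocationPath $locationPath
#Move the VM to the destination node (quick migration shown)
Move-ClusterVirtualMachineRole -Name $vm -Node "node2" -MigrationType Quick
#On the destination node: dismount a free GPU, assign it to the VM, and start the VM
Dismount-VMHostAssignableDevice -force -LocationPath $destLocationPath
Add-VMAssignableDevice -LocationPath $destLocationPath -VMName $vm
Start-VM -Name $vm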
Perform the following steps to migrate a VM if the GPUs per node are all combined into one pool:
Turn off the VM.
Remove the GPU pool assignment inside WAC.
Quick or live migrate the VM; the GPU assignment must change (a GPU on the correct node is assigned automatically).
This triggers a driver failure in the VM.
Unassign the VM, and then reassign the VM to the correct GPU in the pool.
Power on the VM.
Log in to ensure that the VM migrated successfully.
Migrating VMs with an attached GPU is not currently supported – only failover is supported.
Linux VMs
Linux VMs behave in the same way both through WAC and DDA.
Run the lspci command to reveal the PCI device that is attached to the VM.