With VMware vSphere support for virtualized GPUs, IT administrators can run AI workloads such as neural network training, inference, or model development alongside their standard data center applications.
The following figure shows the high-level architecture for this validated design with PowerEdge R760 and R7625 servers, each with two NVIDIA A100 GPUs and a ConnectX network adapter, as part of a VMware vSphere cluster:
Figure 1. High-level architecture showing two scenarios for running AI workloads: virtual machines (VMs) and pods
Customers can run AI workloads on VMware vSphere and access virtualized NVIDIA GPUs in two ways: as virtual machines (VMs) or as Kubernetes pods on vSphere with Tanzu.
The following figure shows the high-level steps for deploying AI workloads as VMs:
Figure 2. Steps in deploying AI workloads as VMs
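In the VM scenario, a common final verification step is confirming that the guest operating system sees the virtualized GPU. The sketch below, which assumes the NVIDIA vGPU guest driver is installed and that `nvidia-smi -L` is available in the VM, parses the driver's device listing; the profile name `GRID A100-20C` is only an illustrative example of an A100 vGPU profile.

```python
# Sketch: confirm a virtualized NVIDIA GPU is visible inside a guest VM.
# Assumes the NVIDIA vGPU guest driver is installed, so `nvidia-smi -L`
# prints one line per GPU, e.g. "GPU 0: GRID A100-20C (UUID: GPU-...)".
import shutil
import subprocess

def list_gpus(smi_output=None):
    """Return GPU names parsed from `nvidia-smi -L` style output.

    If smi_output is None, invoke nvidia-smi directly (returns an empty
    list when the tool is not on PATH, e.g. no guest driver installed).
    """
    if smi_output is None:
        if shutil.which("nvidia-smi") is None:
            return []
        smi_output = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        ).stdout
    # Keep the device name between "GPU N:" and the "(UUID: ...)" suffix.
    return [line.split(":", 1)[1].split("(")[0].strip()
            for line in smi_output.splitlines() if line.startswith("GPU")]

# Example with captured output (hypothetical vGPU profile name):
sample = "GPU 0: GRID A100-20C (UUID: GPU-0000)"
print(list_gpus(sample))
```

Running `list_gpus()` with no argument inside the VM queries the driver directly; an empty result suggests the vGPU device or guest driver is not configured.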
The following figure shows the high-level steps for deploying AI workloads as Kubernetes pods:
Figure 3. Steps in deploying AI workloads as Kubernetes pods
vSphere with Tanzu also supports running VMs in Tanzu namespaces. Implementing this scenario requires combining steps from the two preceding methods and is outside the scope of this implementation guide.
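For the pod scenario, the workload ultimately requests a virtualized GPU through the standard Kubernetes extended-resource mechanism. The fragment below is a minimal sketch, assuming the NVIDIA device plugin (deployed, for example, by the NVIDIA GPU Operator) advertises GPUs under the `nvidia.com/gpu` resource name; the pod name and container image are illustrative placeholders only.

```yaml
# Minimal sketch of a pod requesting one virtualized NVIDIA GPU.
# Assumes the NVIDIA device plugin exposes the nvidia.com/gpu resource
# on the Tanzu Kubernetes cluster nodes.
apiVersion: v1
kind: Pod
metadata:
  name: ai-workload-example   # hypothetical name
spec:
  containers:
    - name: training
      image: nvcr.io/nvidia/pytorch:24.01-py3   # illustrative image tag
      resources:
        limits:
          nvidia.com/gpu: 1   # schedule onto a node with a free vGPU
```

The scheduler places the pod only on a worker node with an unallocated GPU, so no node selectors are strictly required for this basic case.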