Some of the key features of this validated design include:
- GPU virtualization and allocation—VMware vSphere 7 and later support virtualization of NVIDIA Ampere GPUs. The virtualized GPUs can be assigned to virtual machines (VMs) and containers through Single Root I/O Virtualization (SR-IOV). vSphere also supports:
- Partitioning of GPUs using NVIDIA Multi-Instance GPU (MIG) technology, which increases GPU utilization. MIG-partitioned virtual GPU (vGPU) instances are fully isolated, each with an exclusive allocation of high-bandwidth memory, cache, and compute. A common use case is for administrators to partition the available GPUs into multiple instances and allocate them to individual data scientists through VMs or containers. Each data scientist is assured predictable performance because of the isolation and Quality of Service guarantees of the vGPU partitioning technology (see the MIG enumeration sketch that follows this list).
- GPU aggregation, which allows multiple virtual GPUs to be assigned to a VM or container to run compute-intensive deep learning jobs. NVIDIA GPUDirect RDMA provides more efficient data exchange between GPUs that perform multinode training at scale; it enables a direct peer-to-peer data path between the memory resources of two or more GPUs using ConnectX network adapter ports on the host.
- Support for GPU virtualization with Tanzu container orchestration—Virtualized GPUs can now be made available to enterprise-grade Kubernetes container orchestration through Tanzu. Administrators can provision AI workloads that consume the virtualized GPUs as Kubernetes pods or through Helm deployments (see the pod specification sketch that follows this list).
- Availability and continuous maintenance using VMware vSphere vMotion—vSphere enables live migration (using vSphere vMotion) for NVIDIA vGPU-powered VMs, simplifying infrastructure maintenance such as consolidation, expansion, or upgrades, and enabling nondisruptive operations.
With the Distributed Resource Scheduler (DRS), vSphere provides automatic initial workload placement for AI infrastructure at scale, optimizing resource consumption and avoiding performance bottlenecks.
- Support for suspend and resume operations on VMs configured with virtual GPUs.
- Multinode training—NVIDIA GPUDirect RDMA enables a direct peer-to-peer data path between GPU memory and ConnectX network adapters. This path significantly decreases GPU-to-GPU communication latency and completely offloads the CPU, removing it from all GPU-to-GPU communications across the network. GPUDirect RDMA enables near bare-metal performance for multinode training (see the distributed training sketch that follows this list).
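As a minimal sketch of how a data scientist might confirm, from inside a VM or container, which GPU or MIG-backed vGPU instance has been allocated, the following Python snippet queries the NVIDIA Management Library through the nvidia-ml-py bindings. The package choice and the fields printed are illustrative assumptions; the validated design does not prescribe a particular verification step.

```python
# Minimal sketch: enumerate the GPU devices visible inside a VM or container
# and report their MIG mode and memory allocation. Assumes the nvidia-ml-py
# package (imported as pynvml) is installed; not part of the validated design.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        try:
            # Returns (current_mode, pending_mode); 1 means MIG is enabled.
            current_mig, _ = pynvml.nvmlDeviceGetMigMode(handle)
        except pynvml.NVMLError:
            current_mig = 0  # GPUs or drivers without MIG support
        print(f"GPU {i}: {name}, MIG enabled: {bool(current_mig)}, "
              f"memory: {mem.total // (1024 ** 2)} MiB")
finally:
    pynvml.nvmlShutdown()
```

The memory total reported inside a MIG-backed vGPU reflects only the slice allocated to that instance, which is one practical way to see the isolation described above.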
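The next sketch shows one way an administrator could provision a GPU-backed workload programmatically on a Tanzu Kubernetes cluster, using the official Kubernetes Python client to create a pod that requests one GPU through the nvidia.com/gpu resource exposed by the NVIDIA device plugin. The pod name, namespace, and container image are illustrative assumptions; Helm charts or plain YAML manifests are equally valid paths.

```python
# Minimal sketch: create a pod that requests one virtualized GPU on a Tanzu
# Kubernetes cluster. Assumes kubeconfig access is already set up and the
# NVIDIA device plugin advertises the nvidia.com/gpu resource.
from kubernetes import client, config

config.load_kube_config()  # Uses the current kubeconfig context

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),  # illustrative name
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda-check",
                image="nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04",  # example image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # one vGPU or MIG instance
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```

Scheduling, isolation, and accounting for the requested GPU are handled by Kubernetes and the device plugin, so the data scientist's container sees only the vGPU or MIG instance it was granted.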
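For the multinode training path, the sketch below shows a minimal PyTorch DistributedDataParallel setup using the NCCL backend, which is the communication layer that can take advantage of GPUDirect RDMA when the driver stack and ConnectX adapters support it. The placeholder model, the torchrun launch command, and the toy training loop are assumptions for illustration, not part of the validated design.

```python
# Minimal sketch: multinode data-parallel training with the NCCL backend.
# Launched with torchrun on each node, for example:
#   torchrun --nnodes=2 --nproc_per_node=4 --rdzv_backend=c10d \
#            --rdzv_endpoint=<head-node>:29500 train.py
# NCCL uses GPUDirect RDMA automatically when the hosts expose a supported
# ConnectX adapter and the required peer-memory kernel module is loaded.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # NCCL for GPU-to-GPU traffic
    local_rank = int(os.environ["LOCAL_RANK"])       # Set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda(local_rank)  # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for step in range(10):                           # toy training loop
        inputs = torch.randn(32, 1024, device=local_rank)
        loss = ddp_model(inputs).sum()
        optimizer.zero_grad()
        loss.backward()                              # gradients all-reduced via NCCL
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The training script itself needs no GPUDirect-specific code; when the fabric supports it, NCCL moves the all-reduce traffic directly between GPU memory and the ConnectX adapters, which is where the near bare-metal multinode performance comes from.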