Tanzu Kubernetes Grid 1.4 and later support GPU instances, allowing data scientists to run AI workloads that require accelerated hardware on vSphere clusters. Virtual GPUs, either partitioned or aggregated, can be made available to VMs and containers. For containerized workloads, Kubernetes worker nodes are created with virtual GPU resources on vSphere, and AI workloads can then be deployed as pods or deployments running on Tanzu Kubernetes clusters. Administrators can create and manage the life cycle of GPU-enabled clusters in Tanzu Kubernetes Grid on PowerEdge servers.
The following figure shows these concepts:
The relevant Tanzu concepts include:
Note: GPU resource limits cannot be assigned to a namespace.
A vSphere administrator can add a VM class to one or more namespaces on a Supervisor Cluster. Adding a VM class to a namespace makes the class available to DevOps users, who can then self-service VMs in the Kubernetes namespace environment. You can add multiple VM classes to a single namespace; different VM classes indicate different levels of service. If you publish multiple VM classes, DevOps users can choose from all the custom and default classes when creating and managing virtual machines in the namespace.
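As a sketch of this self-service workflow, a DevOps user can inspect the classes published to a namespace with kubectl (the namespace name below is illustrative):

```shell
# Switch to the context of the (illustrative) namespace on the
# Supervisor Cluster.
kubectl config use-context ml-namespace

# List the VM classes the administrator has made available to this
# namespace.
kubectl get virtualmachineclassbindings

# Show all VM class definitions on the Supervisor Cluster.
kubectl get virtualmachineclasses
```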
Worker nodes in Tanzu Kubernetes clusters run as VMs in vSphere. The worker nodes that make up Tanzu Kubernetes clusters also use the VM classes that you assign to the namespace. vSphere with Tanzu offers several default VM classes. However, administrators who want to provision virtual GPUs must create custom VM classes.
One or more NVIDIA GPUs are added as PCI devices to the VM Class and are configured with options for GPU sharing, either using MIG or temporal partitioning. For multinode training using GPUDirect RDMA, NVIDIA ConnectX-6 Network adapters can be added to the VM Class.
We recommend only one VM Class with associated GPU resources per Tanzu Kubernetes cluster.
When administrators create a VM Class and a namespace, they can provision a Tanzu Kubernetes cluster. Administrators can specify the number of worker nodes on the Kubernetes cluster. vSphere automatically deploys and configures the worker nodes.
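For illustration, a GPU-enabled cluster request might look like the following manifest, assuming the v1alpha2 TanzuKubernetesCluster API; the cluster, namespace, VM class, and storage class names are placeholders that must match what is published in your environment:

```yaml
apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: tkc-gpu                       # illustrative cluster name
  namespace: ml-namespace             # namespace with the GPU VM class
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: best-effort-medium     # non-GPU class for control plane
      storageClass: vsan-default-storage-policy
    nodePools:
      - name: gpu-workers
        replicas: 4                   # number of GPU worker nodes
        vmClass: vmclass-a100-gpu     # custom VM class with vGPU PCI device
        storageClass: vsan-default-storage-policy
```

Applying this manifest with kubectl in the namespace context triggers vSphere to deploy and configure the control plane and worker node VMs automatically.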
When the worker nodes are deployed, administrators deploy the NVIDIA GPU Operator. The NVIDIA Network Operator is optional but is required for GPUDirect RDMA. The GPU Operator uses the operator framework within Kubernetes to automate the deployment and management of all NVIDIA software components needed to provision the GPU. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plug-in for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others. The Network Operator deploys and manages networking-related components to enable fast networking, RDMA, and GPUDirect for workloads in a Kubernetes cluster.
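The GPU Operator is typically installed from NVIDIA's Helm repository; a minimal sketch (the release and namespace names are illustrative, and the chart's default values are assumed):

```shell
# Add NVIDIA's Helm repository and refresh the local chart index.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# Install the GPU Operator into its own namespace; --wait blocks until
# the operator's components are ready.
helm install --wait gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace
```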
Scaling up the Kubernetes cluster is easily accomplished through the command-line interface. Administrators request additional worker nodes, vSphere automatically provisions them, and the NVIDIA operators automatically configure the new worker nodes and make them available to AI applications.
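As a sketch, assuming a cluster named tkc-gpu in a namespace called ml-namespace (both hypothetical), scaling up might look like:

```shell
# Open the cluster manifest for editing and increase the worker node
# pool's replica count (for example, from 4 to 6).
kubectl edit tanzukubernetescluster tkc-gpu -n ml-namespace
# After saving, vSphere provisions the additional worker VMs and the
# NVIDIA operators configure them with no further manual steps.
```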
AI workloads requiring GPU resources can be provisioned on the Tanzu Kubernetes Cluster using the standard Kubernetes API or through the Helm package manager. Applications request GPU resources as part of their specification attributes, and Tanzu provisions the application if the resource requirements are met.
The following example shows a GPU application that adds two vectors:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vectoradd
      image: "nvidia/samples:vectoradd-cuda11.2.1"
      resources:
        limits:
          nvidia.com/gpu: 1
```
In the preceding example, the application requests one GPU resource (nvidia.com/gpu: 1), and the pod is scheduled on a worker node with an available GPU. The application cannot specify the GPU profile (MIG profile); the type of GPU profile is abstracted from the worker nodes and applications.
We recommend only one VM Class with associated GPU resources per Tanzu Kubernetes cluster. When a single namespace has multiple VM Classes, each with different GPU resources, the worker nodes and the applications cannot differentiate between the GPU profiles.
With this validated design for AI, data scientists and administrators can run AI workloads such as neural network training, inference, or model development alongside their standard data center applications. Resource requirements, specifically GPU resource requirements, vary between these applications. For each of these heterogeneous workloads, we recommend provisioning a separate Tanzu Kubernetes cluster, each with the appropriate GPU resources configured in its VM Class.
The following figure shows an example deployment for training, model development, and inference scenarios:
Each workload has different GPU requirements:
To support this scenario, the administrator must create three VM Classes:
These VM Classes are assigned to three namespaces, and a Tanzu Kubernetes cluster is provisioned in each. The three workloads can then be deployed on their corresponding Tanzu Kubernetes clusters.
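One way to confirm that each cluster received the intended GPU resources is to check what the worker nodes advertise after the GPU Operator has configured them (a sketch; run in each cluster's context):

```shell
# Each GPU-enabled worker node should report nvidia.com/gpu in its
# Capacity and Allocatable sections.
kubectl describe nodes | grep -i "nvidia.com/gpu"
```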
GPU aggregation on a single system requires that the GPUs be connected through NVLink. While you can assign more than one vGPU when creating a VM class, vSphere does not enforce that these vGPUs are connected through NVLink. Therefore, GPU aggregation is not supported with VMware Tanzu.
Similarly, multinode training with GPUDirect RDMA requires that both the GPU and ConnectX adapter reside on the same NUMA node. While you can assign a vGPU and ConnectX adapter when creating a VM class, vSphere does not enforce that they are connected to the same NUMA node. Therefore, multinode training is not supported with VMware Tanzu.