Tanzu Kubernetes Grid 1.4 or later supports GPU instances, allowing data scientists to run AI workloads that require accelerated hardware on vSphere clusters. Virtual GPUs, either partitioned or aggregated, can be made available to VMs and containers. For containerized workloads, Kubernetes worker nodes are created with virtual GPU resources on vSphere, and AI workloads can then be deployed as pods through standard Kubernetes constructs such as deployments and services on Tanzu Kubernetes clusters. Administrators can create and manage the life cycle of GPU-enabled clusters in Tanzu Kubernetes Grid on PowerEdge servers.
In this section, we briefly cover some of the relevant concepts of VMware Tanzu. For more detailed information, see the VMware Tanzu documentation.
The following figure shows these concepts:
Figure 2. Tanzu concepts
The relevant Tanzu concepts include:
- VM classes: You can add multiple VM classes to a single namespace. Different VM classes serve as indicators of different levels of service. If you publish multiple VM classes, DevOps users can select from all custom and default classes when creating and managing virtual machines in the namespace.
- Namespaces: A vSphere administrator can add a VM class to one or more namespaces on a Supervisor Cluster. Adding a VM class to a namespace makes the class available to DevOps users so that they can start self-servicing VMs in the Kubernetes namespace environment.
- Worker nodes: Worker nodes in Tanzu Kubernetes clusters run as VMs in vSphere and use the VM classes that you assign to the namespace. vSphere with Tanzu offers several default VM classes, but administrators who want to provision virtual GPUs must create custom VM classes, as sketched after this list.
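As a minimal sketch, a custom VM class with a vGPU device might resemble the following, assuming the vmoperator.vmware.com/v1alpha1 VirtualMachineClass API; the class name and the vGPU profile name (grid_a100-40c) are placeholders that depend on your GPU hardware and licensing, so verify the fields against the VMware documentation:

apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineClass
metadata:
  name: gpu-vmclass
spec:
  hardware:
    cpus: 8
    memory: 64Gi
    devices:
      vgpuDevices:
      - profileName: grid_a100-40c   # vGPU profile exposed by the NVIDIA host driver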
After administrators create a VM class and a namespace, they can provision a Tanzu Kubernetes cluster. Administrators specify the number of worker nodes in the cluster specification, and vSphere automatically deploys and configures the worker nodes.
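For example, a cluster specification might look like the following sketch, assuming the run.tanzu.vmware.com/v1alpha1 TanzuKubernetesCluster API; the cluster name, namespace, storage class, Kubernetes version, and the gpu-vmclass VM class are placeholders for your environment:

cat << EOF | kubectl create -f -
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TanzuKubernetesCluster
metadata:
  name: tkc-gpu
  namespace: ml-namespace
spec:
  distribution:
    version: v1.21                         # Tanzu Kubernetes release to deploy
  topology:
    controlPlane:
      count: 3
      class: best-effort-medium            # default VM class for control-plane nodes
      storageClass: vsan-default-storage-policy
    workers:
      count: 4
      class: gpu-vmclass                   # custom GPU-enabled VM class
      storageClass: vsan-default-storage-policy
EOF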
When the worker nodes are deployed, administrators deploy the NVIDIA GPU Operator. The NVIDIA Network Operator is optional but is required for GPUDirect RDMA. NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the deployment and management of all NVIDIA software components needed to provision the GPU. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plug-in for GPUs, the NVIDIA Container Runtime, automatic node labeling, DCGM-based monitoring, and others. Network Operator deploys and manages networking-related components to enable fast networking, RDMA, and GPUDirect for workloads in a Kubernetes cluster. GPUDirect RDMA is outside the scope of this guide.
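As a sketch, GPU Operator is typically installed from the NVIDIA Helm repository along these lines; chart options vary by release, so check the NVIDIA GPU Operator documentation for the options that apply to your environment:

# Add the NVIDIA Helm repository and install GPU Operator into its own namespace
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace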
Scaling out the Kubernetes cluster can be accomplished through the command-line interface. Administrators request additional worker nodes, vSphere automatically provisions them, and the NVIDIA operators configure the new nodes and make them available to AI applications.
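For example, assuming the tkc-gpu cluster sketched earlier, the worker pool could be grown with a one-line patch (the cluster and namespace names are placeholders):

# Scale the worker pool from 4 to 6 nodes; vSphere provisions the additional VMs
kubectl patch tanzukubernetescluster tkc-gpu -n ml-namespace \
  --type merge -p '{"spec":{"topology":{"workers":{"count":6}}}}'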
AI workloads requiring GPU resources can be provisioned on the Tanzu Kubernetes cluster through the standard Kubernetes API or through the Helm package manager. Applications request GPU resources in their pod specifications, and Tanzu provisions the application if the resource requirements are met.
The following example, available in the NGC catalog, shows a GPU application that adds two vectors:
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1
EOF
In the preceding example, the pod requests one GPU resource (nvidia.com/gpu: 1), and Tanzu provisions it accordingly. The application cannot specify the GPU profile (for example, a MIG profile); the type of GPU profile is abstracted from the worker nodes and applications.
When a single namespace has multiple VM classes, each associated with different GPU resources, the worker nodes and the applications cannot differentiate between the GPU profiles. We recommend using node labels to identify and differentiate worker nodes with different GPU resources. Application pods can then select on these labels so that they are scheduled on worker nodes with the requested GPU profile.
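As a hypothetical sketch, an administrator could label each worker node with its GPU profile and pods could target that label through a node selector; the node name and the gpu.example.com/profile label key and value below are illustrative conventions, not a VMware or NVIDIA standard:

# Label a worker node that backs a specific vGPU profile (illustrative label)
kubectl label node tkc-gpu-workers-abc12 gpu.example.com/profile=a100-2-10c

# Deploy a pod that is scheduled only on nodes carrying that label
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd-mig
spec:
  restartPolicy: OnFailure
  nodeSelector:
    gpu.example.com/profile: a100-2-10c
  containers:
  - name: vectoradd-mig
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda10.2
    resources:
      limits:
        nvidia.com/gpu: 1
EOF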