VMware vSphere 8
VMware vSphere 8 includes the following features to support AI and machine learning workloads:
- Support for the latest generation of GPUs from NVIDIA, including support for spatial partitioning through NVIDIA Multi-Instance GPU (MIG).
- Enhanced performance of device-to-device communication, building on the existing NVIDIA GPUDirect functionality by enabling Address Translation Services (ATS) and Access Control Services (ACS) at the PCIe bus layer in the ESXi kernel.
- Support for device groups for multi-GPU and multinode training. Device groups enable virtual machines to consume complementary hardware devices more easily. NVIDIA Smart NICs and GPU devices are supported in vSphere 8. Device groups are added to virtual machines by using the existing Add New PCI Device workflows. Device groups aid in the automatic configuration of virtual machines and the creation of worker nodes in Tanzu. vSphere DRS and vSphere HA are aware of device groups and place VMs appropriately to satisfy the device group.
- VMware vSphere is licensed per CPU socket. Licensing is available for the following editions:
  - vSphere Standard
  - vSphere Enterprise Plus
  - vSphere Essentials
  - vSphere Essentials Plus
This validated design requires the vSphere Enterprise Plus edition. Both NVIDIA vGPU and distributed virtual switches (required for load balancing in Tanzu) require the Enterprise Plus edition.
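The spatial partitioning that MIG provides can be illustrated with a small capacity check. The sketch below is a simplified model, not NVIDIA tooling: it assumes a single NVIDIA A100 40GB GPU with 7 compute slices and 8 memory slices, and it ignores the hardware placement rules that further restrict which combinations are valid. The `MigProfile` class and the `fits_on_one_gpu` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

# Assumed slice budget for one NVIDIA A100 40GB GPU in MIG mode.
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


@dataclass(frozen=True)
class MigProfile:
    name: str
    compute_slices: int
    memory_slices: int


# A100 40GB MIG profile names with a simplified slice model (assumption).
PROFILES = {
    p.name: p
    for p in (
        MigProfile("1g.5gb", 1, 1),
        MigProfile("2g.10gb", 2, 2),
        MigProfile("3g.20gb", 3, 4),
        MigProfile("4g.20gb", 4, 4),
        MigProfile("7g.40gb", 7, 8),
    )
}


def fits_on_one_gpu(requested):
    """Return True if the requested profiles fit within the slice budget.

    This is a necessary condition only: real MIG placement rules are
    not modeled here.
    """
    compute = sum(PROFILES[name].compute_slices for name in requested)
    memory = sum(PROFILES[name].memory_slices for name in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES
```

Under these assumptions, a request for one `4g.20gb` and one `3g.20gb` instance fits on a single GPU, while two `4g.20gb` instances do not, because together they would need eight compute slices.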
VMware vSphere with Tanzu
vSphere with Tanzu enables administrators to transform vSphere into a platform for running Kubernetes workloads natively on the hypervisor layer. When enabled on a vSphere cluster, vSphere with Tanzu provides the capability to run Kubernetes workloads directly on ESXi hosts and to create upstream Kubernetes clusters in dedicated resource pools.
vSphere administrators can enable existing vSphere clusters for Workload Management to create Tanzu Kubernetes clusters on the ESXi hosts that are part of the cluster. A Tanzu Kubernetes cluster is a full distribution of the open-source Kubernetes container orchestration platform that is built, signed, and supported by VMware. The Tanzu Kubernetes Grid (TKG) Service provisions and operates Tanzu Kubernetes clusters on vSphere.
Tanzu Kubernetes Grid (TKG), available with VMware vSphere 8, supports virtualizing NVIDIA GPUs through NVIDIA AI Enterprise. With TKG, virtual GPUs are automatically provisioned and configured on the Tanzu Kubernetes cluster worker nodes and made available to AI workload containers.
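As an illustration of how a container might consume a GPU once it is available on a worker node, the following sketch builds a Kubernetes Pod manifest that requests one GPU. It assumes that the NVIDIA device plugin (for example, as deployed by the NVIDIA GPU Operator) advertises GPUs under the `nvidia.com/gpu` resource name. The pod name and the container image are hypothetical placeholders, and `gpu_pod_manifest` is a helper introduced only for this example.

```python
import json


def gpu_pod_manifest(name, image, gpus=1):
    """Build a Pod manifest whose single container requests `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Assumes the NVIDIA device plugin exposes this resource name.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, used only for illustration.
    manifest = gpu_pod_manifest(
        "cuda-sample", "registry.example.com/ai/cuda-sample:latest"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, because Kubernetes accepts manifests in JSON as well as YAML.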
VMware vSphere with Tanzu can be licensed through vSphere+ or VMware Tanzu for Kubernetes Operations. For more information, see the VMware vSphere Product Line Comparison and the VMware Tanzu for Kubernetes Operations Documentation.
VMware Kubernetes ecosystem
VMware offers several products under the Tanzu portfolio to enhance the capabilities of vSphere with Tanzu. These products enable administrators to build, run, and manage AI workloads along with modern applications and to continuously deliver value to customers. Depending on the Tanzu edition, these software products are bundled with VMware vSphere with Tanzu and are fully supported by VMware. Some key products that are applicable to this validated design include:
- Harbor—An open-source, trusted, cloud native container registry that stores, signs, and scans content. Harbor extends the open-source Docker distribution by adding functionalities such as security, identity control, and management. Tanzu Kubernetes Grid includes signed binaries for Harbor that you can deploy on a shared services cluster to provide container registry services for other Tanzu Kubernetes clusters.
- Prometheus—An open-source systems monitoring and alerting toolkit. Prometheus collects and stores metrics as time series data, that is, metrics information is stored with the timestamp at which it was recorded, along with optional key-value pairs. Tanzu Kubernetes Grid includes signed binaries for Prometheus that you can deploy on Tanzu Kubernetes clusters to monitor cluster health and services.
- Grafana—Open-source software that allows you to visualize and analyze metrics data collected by Prometheus on Tanzu Kubernetes clusters. Tanzu Kubernetes Grid includes a Grafana package that you can deploy on the clusters.
- VMware NSX Advanced Load Balancer—NSX Advanced Load Balancer (formerly known as Avi Networks) with Cloud Services has multicloud load balancing, web application firewall, and container ingress services. The software-defined, scale-out architecture of NSX Advanced Load Balancer provides on-demand autoscaling of elastic load balancers. The distributed software load balancers and the backend applications can scale up or down in response to real-time traffic monitoring.
NSX Advanced Load Balancer provides network access and load balancing for Tanzu Kubernetes clusters. You can use it to load balance AI use cases such as machine learning operations (MLOps) applications or inference workloads.
- Tanzu Mission Control—A centralized hub for simplified, multicloud, multicluster Kubernetes management. Tanzu Mission Control provides centralized policy management that enables administrators to apply consistent policies, such as for access and security, to a fleet of clusters and namespaces at scale. It also provides life cycle management for Kubernetes clusters, enabling administrators to provision, scale, upgrade, and delete Tanzu Kubernetes Grid clusters.
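The Prometheus data model described in the list above (a metric value stored with its timestamp and optional key-value pairs) can be sketched as follows. This is a minimal illustration rather than Prometheus client code: the `Sample` class and the metric name `gpu_utilization_ratio` are hypothetical, and escaping of special characters in label values is omitted. The output line follows the Prometheus text exposition format, in which the optional timestamp is given in milliseconds since the Unix epoch.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    metric: str
    value: float
    timestamp_ms: int  # milliseconds since the Unix epoch
    labels: dict = field(default_factory=dict)  # optional key-value pairs

    def to_exposition(self):
        """Render the sample as one line of the text exposition format."""
        if self.labels:
            # Sort labels so the output is deterministic.
            pairs = ",".join(
                f'{key}="{val}"' for key, val in sorted(self.labels.items())
            )
            series = f"{self.metric}{{{pairs}}}"
        else:
            series = self.metric
        return f"{series} {self.value} {self.timestamp_ms}"
```

For example, `Sample("gpu_utilization_ratio", 0.87, 1700000000000, {"node": "worker-1", "gpu": "0"}).to_exposition()` yields `gpu_utilization_ratio{gpu="0",node="worker-1"} 0.87 1700000000000`. Each distinct combination of metric name and labels identifies one time series.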
The following additional software is available from VMware to manage and orchestrate container workloads. These software tools address general-purpose application development and are not validated as part of this validated design.
- VMware Tanzu Application Platform is a modular, application-aware platform that provides a rich set of developer tools and a path to production to build and deploy software quickly and securely on any compliant public cloud or on-premises Kubernetes cluster.
- Tanzu Observability enables Kubernetes monitoring with full-stack visibility of nodes, pods, and containers. It provides instant insight into Tanzu Application Service platform health across foundations and the impact of code in production.
- Tanzu Service Mesh provides advanced, end-to-end connectivity, security, and insights for modern applications—across application end-users, microservices, APIs, and data—enabling compliance with Service Level Objectives and data protection and privacy regulations.
- VMware Application Catalog is a customizable selection of trusted, prepackaged open-source application components that are continuously maintained and verifiably tested for use in production environments.
- Tanzu Build Service automates container creation, management, and governance at enterprise scale while boosting security and reducing risk from Common Vulnerabilities and Exposures (CVEs).
- Tanzu Data Services is a portfolio of on-demand caching, messaging, and database software on VMware Tanzu for development teams building modern applications.
VMware vSAN 8
vSAN is a software-defined storage solution from VMware, built from the ground up for vSphere VMs. It abstracts and aggregates locally attached disks in a vSphere cluster to create a storage solution that you can provision and manage from vCenter and the vSphere client. Because vSAN is embedded in the hypervisor, storage and compute for VMs are delivered from the same x86 server platform running the hypervisor.
vSAN is a leading hyperconverged infrastructure (HCI) solution. Traditional applications such as Microsoft SQL Server and SAP HANA, and next-generation applications such as AI workloads, can run on vSAN. Traditional infrastructure deployment, operations, and maintenance often rely on various disaggregated tools and specialized skill sets. The hyperconverged approach of vSphere and vSAN simplifies these tasks by using familiar tools to deploy, operate, and manage private-cloud infrastructure.
vSAN 8 Express Storage Architecture (ESA) is the latest major enhancement available for vSphere 8 clusters. vSAN 8 ESA uses a file system that is optimized to take full advantage of certified NVMe storage devices and networking of 25 Gbps or faster to greatly improve performance and capacity over previous versions. The architecture used in vSAN 7 and earlier releases is now referred to as Original Storage Architecture (OSA).
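The two ESA prerequisites named above (NVMe storage devices and networking of 25 Gbps or faster) can be expressed as a simple readiness check. The sketch below is illustrative only and is not a VMware tool: `HostInventory` and `esa_blockers` are hypothetical names, the inventory values are assumed to be supplied by the reader, and the check does not verify that devices are on the vSAN ESA certified hardware list.

```python
from dataclasses import dataclass

MIN_LINK_GBPS = 25  # from the text: 25 Gbps or faster networking


@dataclass
class HostInventory:
    name: str
    vsan_device_types: list  # device types intended for vSAN, e.g. ["NVMe", "NVMe"]
    vsan_link_gbps: int      # speed of the link that carries vSAN traffic


def esa_blockers(host):
    """Return the reasons a host misses the two prerequisites named in the text.

    An empty list means no blocker was found by this simplified check;
    it does not confirm hardware certification.
    """
    reasons = []
    if not host.vsan_device_types:
        reasons.append("no storage devices listed")
    elif any(kind.upper() != "NVME" for kind in host.vsan_device_types):
        reasons.append("non-NVMe storage device present")
    if host.vsan_link_gbps < MIN_LINK_GBPS:
        reasons.append(
            f"vSAN link is {host.vsan_link_gbps} Gbps, below {MIN_LINK_GBPS} Gbps"
        )
    return reasons
```

A host described as `HostInventory("esx01", ["NVMe", "SSD"], 10)` would return two blockers, one for the non-NVMe device and one for the 10 Gbps link, which suggests that such a host is a candidate for OSA rather than ESA.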
VMware vSAN is licensed per CPU socket. It is available in the following editions: Standard, Advanced, Enterprise, and Enterprise Plus. For this validated design, we recommend the vSAN Enterprise license. vSphere Enterprise Plus and VMware Tanzu Standard are required to use the Data Persistence platform, which is available only in vSAN Enterprise and Enterprise Plus.