VMware vSphere 7
VMware vSphere 7 includes the following features to support AI and machine learning workloads:
- Support for the latest generation of GPUs from the NVIDIA Ampere family, including A100 and A30 GPUs. The A100 GPU delivers up to 20 times the performance of the previous GPU generation.
- Support for NVIDIA Multi-Instance GPU (MIG), the latest spatial partitioning technology for NVIDIA GPUs:
- vSphere is the only virtualization platform that enables live migration (using vSphere vMotion) for NVIDIA MIG vGPU-powered VMs. Live migration enables nondisruptive operations during maintenance operations such as consolidation, expansion, or upgrades.
- With vSphere Distributed Resource Scheduler (DRS), vSphere provides automatic initial workload placement for AI infrastructure at scale, which optimizes resource consumption and helps prevent performance bottlenecks.
- Enhanced performance of device-to-device communication, building on the existing NVIDIA GPUDirect functionality by enabling Address Translation Services (ATS) and Access Control Services (ACS) at the PCIe bus layer in the ESXi kernel.
- VMware vSphere is licensed per CPU socket. Licensing is available in the following editions: vSphere Standard, Enterprise Plus, Essentials, and Essentials Plus. This validated design requires vSphere Enterprise Plus because both NVIDIA vGPU and the distributed virtual switch (required for load balancing in Tanzu) are available only in that edition.
VMware vSphere with Tanzu
vSphere with Tanzu enables administrators to transform vSphere into a platform for running Kubernetes workloads natively on the hypervisor layer. When enabled on a vSphere cluster, vSphere with Tanzu provides the capability to run Kubernetes workloads directly on ESXi hosts and to create upstream Kubernetes clusters in dedicated resource pools.
vSphere administrators can enable Workload Management on existing vSphere clusters and then create Tanzu Kubernetes clusters on the ESXi hosts that are part of the cluster. A Tanzu Kubernetes cluster is a full distribution of the open-source Kubernetes container orchestration platform that is built, signed, and supported by VMware. The Tanzu Kubernetes Grid (TKG) Service provisions and operates Tanzu Kubernetes clusters on vSphere.
Tanzu Kubernetes Grid 1.3, available with VMware vSphere 7.0 U3, supports virtualizing NVIDIA GPUs through NVIDIA AI Enterprise. With TKG 1.3, virtual GPUs are automatically provisioned and configured on the Tanzu Kubernetes cluster worker nodes and made available to AI workload containers.
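As an illustration of how an AI workload container might consume a virtual GPU on a worker node, the following minimal Python sketch builds a Kubernetes Pod manifest that requests one GPU. It assumes that the GPU is advertised to Kubernetes under the standard NVIDIA device plugin resource name `nvidia.com/gpu`. The pod name and the image URL are hypothetical placeholders, not part of this validated design.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `gpus` NVIDIA GPUs.

    Assumption: the worker nodes expose GPUs under the extended
    resource name "nvidia.com/gpu" (the NVIDIA device plugin default).
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are specified under
                    # "limits"; Kubernetes uses the same value as the request.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = gpu_pod_manifest(
        "inference-demo", "registry.example.com/ai/inference:latest"
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, because Kubernetes accepts manifests in JSON as well as YAML.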
VMware Kubernetes ecosystem
VMware offers several products under the Tanzu portfolio to enhance the capabilities of vSphere with Tanzu. These products enable administrators to build, run, and manage AI workloads along with modern applications and to continuously deliver value to customers. Depending on the Tanzu edition, these software products are bundled with VMware vSphere with Tanzu and are fully supported by VMware. Some key products that are applicable to this validated design include:
- Harbor—An open-source, trusted, cloud native container registry that stores, signs, and scans content. Harbor extends the open-source Docker distribution by adding functionalities such as security, identity control, and management.
Tanzu Kubernetes Grid includes signed binaries for Harbor that you can deploy on a shared services cluster to provide container registry services for other Tanzu Kubernetes clusters.
- Prometheus—An open-source systems monitoring and alerting toolkit. Prometheus collects and stores metrics as time series data, that is, metrics information is stored with the timestamp at which it was recorded, along with optional key-value pairs. Tanzu Kubernetes Grid includes signed binaries for Prometheus that you can deploy on Tanzu Kubernetes clusters to monitor cluster health and services.
- Grafana—Open-source software that allows you to visualize and analyze metrics data collected by Prometheus on Tanzu Kubernetes clusters. Tanzu Kubernetes Grid includes a Grafana package that you can deploy on the clusters.
- VMware NSX Advanced Load Balancer—NSX Advanced Load Balancer (formerly known as Avi Networks) with Cloud Services has multicloud load balancing, web application firewall, and container ingress services. The software-defined, scale-out architecture of NSX Advanced Load Balancer provides on-demand autoscaling of elastic load balancers. The distributed software load balancers and the backend applications can scale up or down in response to real-time traffic monitoring.
NSX Advanced Load Balancer provides network access and load balancing for Tanzu Kubernetes clusters. You can use it to load balance AI use cases such as machine learning operations (MLOps) applications or inference workloads.
- Tanzu Mission Control—A centralized hub for simplified, multicloud, multicluster Kubernetes management. Tanzu Mission Control provides centralized policy management that enables administrators to apply consistent policies, such as for access and security, to a fleet of clusters and namespaces at scale. It provides life cycle management for Kubernetes clusters enabling administrators to provision, scale, upgrade, and delete Tanzu Kubernetes Grid clusters.
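The Prometheus entry in the list above describes a time series sample as a value stored with its timestamp and optional key-value pairs (labels). The following Python sketch models one such sample and renders it in the Prometheus text exposition format. The `Sample` class, the metric name, and the label are hypothetical and only illustrate the data model; they are not metrics that this design collects.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Sample:
    """One time series sample: metric name, value, timestamp, labels."""

    metric: str
    value: float
    timestamp_ms: int  # milliseconds since the Unix epoch
    labels: Dict[str, str] = field(default_factory=dict)

    def to_exposition(self) -> str:
        """Render as: name{label="value",...} value timestamp.

        Simplification: label values are not escaped, so they must not
        contain quotes, backslashes, or newlines.
        """
        if self.labels:
            pairs = ",".join(
                f'{key}="{val}"' for key, val in sorted(self.labels.items())
            )
            series = f"{self.metric}{{{pairs}}}"
        else:
            series = self.metric
        return f"{series} {self.value} {self.timestamp_ms}"


if __name__ == "__main__":
    # Hypothetical metric and label, for illustration only.
    sample = Sample(
        metric="gpu_utilization_ratio",
        value=0.75,
        timestamp_ms=1700000000000,
        labels={"node": "worker-1"},
    )
    print(sample.to_exposition())
```

Samples that share the same metric name and label set belong to the same time series, which is what allows tools such as Grafana to plot them over time.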
The following additional software is available from VMware to manage and orchestrate container workloads. These software tools address general-purpose application development and are not validated as part of this validated design.
- VMware Tanzu Application Platform is a modular, application-aware platform that provides a rich set of developer tools and a path to production to build and deploy software quickly and securely on any compliant public cloud or on-premises Kubernetes cluster.
- Tanzu Observability enables Kubernetes monitoring with full-stack visibility of nodes, pods, and containers. It provides instant insight into Tanzu Application Service platform health across foundations and the impact of code in production.
- Tanzu Service Mesh provides advanced, end-to-end connectivity, security, and insights for modern applications—across application end-users, microservices, APIs, and data—enabling compliance with Service Level Objectives and data protection and privacy regulations.
- VMware Application Catalog is a customizable selection of trusted, prepackaged open-source application components that are continuously maintained and verifiably tested for use in production environments.
- Tanzu Build Service automates container creation, management, and governance at enterprise scale while boosting security and reducing risk from Common Vulnerabilities and Exposures (CVEs).
- Tanzu Data Services is a portfolio of on-demand caching, messaging, and database software on VMware Tanzu for development teams building modern applications.
Licensing
VMware Tanzu is available in three editions that enable administrators to run Kubernetes workloads across different cloud providers, or to create an enterprise-grade environment for application deployment. The three editions—Basic, Standard, and Advanced—are described in the following table:
Table 1. VMware Tanzu editions
| Capability | Basic | Standard | Advanced |
| --- | --- | --- | --- |
| Kubernetes Runtime | VMware Tanzu Kubernetes Grid Service | VMware Tanzu Kubernetes Grid Service | VMware Tanzu Kubernetes Grid Service |
| Observability | Fluent Bit | Fluent Bit; Prometheus; Grafana | Fluent Bit; Prometheus; Grafana; VMware Tanzu Observability by Wavefront |
| Networking | VMware NSX Advanced Load Balancer Essentials; Container Networking with VMware Antrea CNI or Calico | VMware NSX Advanced Load Balancer Essentials; Container Networking with Antrea CNI or Calico | VMware NSX Advanced Load Balancer Enterprise; Container Networking with Antrea CNI or Calico; VMware Tanzu Service Mesh |
| Image Management | Harbor; vSphere Registry Service with NSX-T Data Center | Harbor; vSphere Registry Service with NSX-T Data Center | Harbor; vSphere Registry Service with NSX-T Data Center |
| Management with Tanzu Mission Control | None | VMware Tanzu Mission Control Standard | VMware Tanzu Mission Control Advanced |
| Container Build | None | None | VMware Tanzu Build Service |
| Developer Framework | None | None | VMware Tanzu Application Platform; VMware Spring Runtime |
| Data Services | None | None | VMware Tanzu SQL |
VMware Tanzu Community Edition is a full-featured, easy-to-manage Kubernetes platform for learners and users. It is a freely available, community supported, open-source distribution of VMware Tanzu.
VMware vSAN 7
vSAN is a software-defined storage solution from VMware, built from the ground up for vSphere VMs. It abstracts and aggregates locally attached disks in a vSphere cluster to create a storage solution that you can provision and manage from vCenter and the vSphere client. Because vSAN is embedded in the hypervisor, storage and compute for VMs are delivered from the same x86 server platform that runs the hypervisor.
vSAN is the market leader in hyperconverged infrastructure (HCI). It runs traditional applications such as Microsoft SQL Server and SAP HANA as well as next-generation applications such as AI workloads. Traditional infrastructure deployment, operations, and maintenance typically involve various disaggregated tools and often require specialized skill sets. The hyperconverged approach of vSphere and vSAN simplifies these tasks by using familiar tools to deploy, operate, and manage private-cloud infrastructure.
VMware vSAN is licensed per CPU socket. It is available in the following editions: Standard, Advanced, Enterprise, and Enterprise Plus. For this validated design, we recommend the vSAN Enterprise license. vSphere Enterprise Plus and VMware Tanzu Standard are required to use the Data Persistence platform, which is available only in vSAN Enterprise and Enterprise Plus.
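Because both vSphere and vSAN are licensed per CPU socket, the number of licenses for a cluster can be estimated by counting sockets. The sketch below shows that arithmetic for a hypothetical cluster of four dual-socket hosts. It assumes one license per populated socket and ignores any per-core limits or other terms that the actual license agreement may impose.

```python
def socket_licenses(hosts: int, sockets_per_host: int) -> int:
    """Return the number of per-socket licenses for a uniform cluster.

    Assumption: one license per populated CPU socket, with every host
    having the same socket count.
    """
    if hosts < 0 or sockets_per_host < 0:
        raise ValueError("hosts and sockets_per_host must be non-negative")
    return hosts * sockets_per_host


if __name__ == "__main__":
    # Hypothetical cluster size, for illustration only.
    count = socket_licenses(hosts=4, sockets_per_host=2)
    # The same socket count applies to each per-socket product.
    print(f"vSphere Enterprise Plus licenses: {count}")
    print(f"vSAN Enterprise licenses: {count}")
```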