The following NVIDIA GPUs are among the NVIDIA acceleration components used in this generative AI solution architecture.
The NVIDIA H100 Tensor Core GPU delivers unprecedented performance, scalability, and security for every workload. With the fourth-generation NVIDIA NVLink Switch System, the NVIDIA H100 GPU accelerates AI workloads with a dedicated Transformer Engine for trillion-parameter language models. The NVIDIA H100 GPU uses breakthrough innovations in the NVIDIA Hopper architecture to deliver industry-leading conversational AI, speeding up large language models by 30 times over the previous generation.
For small jobs, the NVIDIA H100 GPU can be partitioned into right-sized Multi-Instance GPU (MIG) partitions. With Hopper Confidential Computing, this scalable compute power can secure sensitive applications on shared data center infrastructure. The inclusion of the NVIDIA AI Enterprise software suite reduces time to development, simplifies deployment of AI workloads, and makes the NVIDIA H100 GPU the most powerful end-to-end AI and HPC data center platform.
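As a rough illustration of what "right-sized" partitioning means, the sketch below divides an H100's memory among MIG instances. It assumes an 80 GB SKU and the seven-instance maximum that MIG supports on Hopper; the actual MIG profiles (1g.10gb, 2g.20gb, and so on) are fixed by the driver, so this only shows the sizing arithmetic, not real profile names.

```python
# Hedged sketch: dividing an 80 GB H100 (an assumed SKU; others differ)
# among up to seven MIG instances. Real MIG profiles are driver-defined;
# this illustrates only the capacity arithmetic behind right-sizing.
TOTAL_MEMORY_GB = 80
MAX_INSTANCES = 7  # MIG on Hopper supports up to 7 compute instances

def per_instance_memory(n_instances: int) -> float:
    """Raw memory capacity available per instance at a given partition count."""
    if not 1 <= n_instances <= MAX_INSTANCES:
        raise ValueError("MIG on H100 supports 1 to 7 instances")
    return TOTAL_MEMORY_GB / n_instances

# Seven small jobs can each get their own isolated slice of the GPU.
print(round(per_instance_memory(7), 1))  # ≈ 11.4 GB of raw capacity per slice
```

Because each instance is hardware-isolated, a small inference job on one slice cannot contend for memory bandwidth with a job on another slice.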
The NVIDIA L40 GPU accelerator is a full-height, full-length (FHFL), dual-slot 10.5-inch PCI Express Gen4 graphics solution based on the latest NVIDIA Ada Lovelace architecture. The card is passively cooled and capable of 300 W maximum board power.
The NVIDIA L40 GPU supports the latest hardware-accelerated ray tracing, revolutionary AI features, advanced shading, and powerful simulation capabilities for a wide range of graphics and compute use cases in data center and edge server deployments. This support includes NVIDIA Omniverse, cloud gaming, batch rendering, virtual workstations, and deep learning training as well as inference workloads.
As part of the NVIDIA OVX server platform, the NVIDIA L40 GPU delivers the highest level of graphics, ray tracing, and simulation performance for NVIDIA Omniverse. With 48 GB of GDDR6 memory, even the most intense graphics applications run with the highest level of performance.
The NVIDIA Ada Lovelace L4 Tensor Core GPU delivers universal acceleration and energy efficiency for video, AI, virtualized desktop and graphics applications in the enterprise, in the cloud, and at the edge. With NVIDIA’s AI platform and full-stack approach, the NVIDIA L4 GPU is optimized for inference at scale for a broad range of AI applications, including recommendations, voice-based AI avatar assistants, generative AI, visual search, and contact center automation to deliver the best personalized experiences.
The NVIDIA L4 GPU is the most efficient NVIDIA accelerator for mainstream use. Servers equipped with the NVIDIA L4 GPU deliver up to 120 times higher AI video performance and 2.5 times higher generative AI performance than CPU solutions, as well as over four times the graphics performance of the previous GPU generation. The NVIDIA L4 GPU's versatility and energy-efficient, single-slot, low-profile form factor make it ideal for global deployments, including edge locations.
NVIDIA NVLink is a fast, scalable interconnect that enables multinode, multi-GPU systems with seamless, high-speed communication between every GPU. The fourth generation of NVIDIA NVLink technology provides 1.5 times higher bandwidth and improved scalability for multi-GPU system configurations. A single NVIDIA H100 Tensor Core GPU supports up to 18 NVLink connections for a total bandwidth of 900 gigabytes per second (GB/s), over seven times the bandwidth of PCIe Gen5.
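The bandwidth figures above can be checked with simple arithmetic. The per-link rate of 50 GB/s and the PCIe Gen5 x16 figure of roughly 128 GB/s (bidirectional) are assumptions consistent with the totals quoted in this section, not numbers stated in it directly.

```python
# Fourth-generation NVLink: 18 links per H100 GPU, each assumed to carry
# 50 GB/s of total (bidirectional) bandwidth, matching the 900 GB/s figure.
NVLINK_LINKS = 18
GB_PER_LINK = 50
nvlink_total = NVLINK_LINKS * GB_PER_LINK

# PCIe Gen5 x16: ~64 GB/s each direction, ~128 GB/s bidirectional (assumption).
PCIE_GEN5_X16 = 128

print(nvlink_total)                            # 900 GB/s aggregate
print(round(nvlink_total / PCIE_GEN5_X16, 1))  # ~7x, i.e. "over seven times"
```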
For even greater scalability, NVIDIA NVSwitch builds on the advanced communication capability of NVIDIA NVLink to deliver higher bandwidth and reduced latency for compute-intensive workloads. To enable high-speed, collective operations, each NVIDIA NVSwitch has 64 NVIDIA NVLink ports equipped with engines for NVIDIA Scalable Hierarchical Aggregation Reduction Protocol (SHARP) for in-network reductions and multicast acceleration.
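To make the idea of an in-network reduction concrete, the sketch below shows the operation conceptually in plain Python (no NVIDIA APIs): instead of GPUs exchanging full buffers with one another, a switch-resident engine sums each element across all participants once and multicasts the reduced result back. This is an illustration of the reduction-plus-multicast pattern, not the SHARP protocol itself.

```python
# Conceptual sketch of the in-network reduction NVLink SHARP performs:
# the switch computes an element-wise sum across all GPU buffers, then
# multicasts the reduced buffer back to every GPU. Illustration only.
from typing import List

def switch_allreduce(gpu_buffers: List[List[float]]) -> List[List[float]]:
    # The "reduction": element-wise sum across all participating GPUs.
    reduced = [sum(vals) for vals in zip(*gpu_buffers)]
    # The "multicast": every GPU receives the same reduced buffer.
    return [list(reduced) for _ in gpu_buffers]

buffers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(switch_allreduce(buffers))  # every "GPU" ends up holding [9.0, 12.0]
```

Offloading this sum into the switch means each GPU sends its buffer once rather than once per peer, which is where the bandwidth and latency savings for collective operations come from.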