Dell PowerEdge servers and NVIDIA GPUs
Dell Technologies provides a diverse selection of acceleration-optimized servers with an extensive portfolio of NVIDIA GPU accelerators. In this design, we showcase three Dell PowerEdge servers with several GPU options tailored for generative AI workloads.
In this section, we describe the configuration and connectivity options for NVIDIA GPUs, and how these server-GPU combinations can be applied to various LLM use cases.
NVIDIA GPUs support several options for connecting two or more GPUs, each offering different bandwidth. High-speed GPU-to-GPU connectivity is often required for model customization, which is compute-intensive and typically spans multiple GPUs, making high performance and low latency between GPUs crucial.
NVIDIA NVLink is a high-speed interconnect technology developed by NVIDIA for connecting multiple NVIDIA GPUs to work in parallel. It allows for direct communication between the GPUs with high bandwidth and low latency, enabling them to share data and work collaboratively on compute-intensive tasks.
NVIDIA NVSwitch is a high-performance and scalable switch technology that connects multiple NVLinks to provide all-to-all GPU communication at full NVLink speed within a single node and between nodes. Both are designed to facilitate high-bandwidth and low-latency data transfers, ideal for large-scale AI applications. The NVSwitch technology provides a bandwidth of 900 GB/s between any two GPUs.
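As a concrete check on the quoted figure, aggregate NVLink bandwidth is simply the per-link rate multiplied by the number of links. The sketch below assumes the NVLink 4 values used by the NVIDIA H100 (18 links at 50 GB/s each, bidirectional); the constants are illustrative rather than queried from hardware:

```python
# Hedged sketch: estimate aggregate NVLink bandwidth between two GPUs.
# Assumes NVLink 4 figures (NVIDIA H100): 18 links per GPU at 50 GB/s
# per link (bidirectional). Constants are illustrative assumptions.

NVLINK4_LINKS_PER_GPU = 18   # NVLink 4 links on an H100 SXM GPU
NVLINK4_GBPS_PER_LINK = 50   # bidirectional bandwidth per link, in GB/s

def aggregate_nvlink_bandwidth(links: int, gbps_per_link: int) -> int:
    """Total GPU-to-GPU bandwidth when all links are used in parallel."""
    return links * gbps_per_link

print(aggregate_nvlink_bandwidth(NVLINK4_LINKS_PER_GPU, NVLINK4_GBPS_PER_LINK))
# 18 links x 50 GB/s = 900 GB/s, matching the NVSwitch figure above
```

The same multiplication explains why fewer links (as in a bridged PCIe configuration) yield a lower aggregate figure than the full SXM/NVSwitch fabric.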
The following figure illustrates the NVIDIA GPU connectivity options for the PowerEdge servers used in this design:
Figure 3. NVIDIA GPU connectivity in PowerEdge servers
The connectivity options include:
The PowerEdge R760xa server supports four NVIDIA H100 GPUs; an NVLink bridge can connect each pair of GPUs. The NVIDIA H100 GPU supports an NVLink bridge connection with a single adjacent NVIDIA H100 GPU. Each of the three attached bridges spans two PCIe slots, for a total maximum NVLink bridge bandwidth of 600 GB/s.
The NVIDIA SXM form factor enables multiple GPUs to be tightly interconnected in a server, providing high-bandwidth, low-latency communication between the GPUs. NVLink technology, which allows faster data transfers than traditional PCIe connections, facilitates this direct GPU-to-GPU communication. The NVLink technology provides a bandwidth of 900 GB/s between any two GPUs.
The PowerEdge R760xa server with NVIDIA L40S GPUs does not support NVLink. Communication between GPUs occurs over the PCIe bus.
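On a running system, the `nvidia-smi topo -m` command reports how each GPU pair is connected (NV# for NVLink with # links; PIX, PHB, or SYS for PCIe paths). As a minimal sketch, assuming a hypothetical four-GPU topology matrix rather than output captured from real hardware, the connection class for each pair can be determined like this:

```python
# Hedged sketch: classify GPU pairs as NVLink- or PCIe-connected from an
# `nvidia-smi topo -m`-style matrix. SAMPLE_TOPO is an illustrative
# assumption, not real captured output.

# Legend (from nvidia-smi): NV#  = NVLink with # links
#                           PIX  = PCIe via a single switch
#                           PHB  = PCIe via the host bridge
#                           SYS  = PCIe crossing NUMA nodes

SAMPLE_TOPO = {
    ("GPU0", "GPU1"): "NV4",   # NVLink-bridged pair
    ("GPU0", "GPU2"): "PHB",   # PCIe through the host bridge
    ("GPU2", "GPU3"): "NV4",
    ("GPU1", "GPU3"): "SYS",   # PCIe across NUMA nodes
}

def link_class(entry: str) -> str:
    """Map a topology-matrix entry to 'nvlink' or 'pcie'."""
    return "nvlink" if entry.startswith("NV") else "pcie"

for pair, entry in SAMPLE_TOPO.items():
    print(pair, entry, "->", link_class(entry))
```

In a configuration like the L40S example above, every pair would report a PCIe path (PIX, PHB, or SYS), whereas NVLink-bridged H100 pairs report NV# entries.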