# System configurations for validation
The following tables list the system configurations and the software stack used for the validation efforts in this design. The system configurations cover both InfiniBand High Data Rate (HDR) and Next Data Rate (NDR) fabrics.
| Component | Configuration 1 | Configuration 2 | Configuration 3 | Configuration 4 |
|---|---|---|---|---|
| Compute server for model customization | 2 x PowerEdge XE9680 servers | 2 x PowerEdge XE9680 servers | 2 x PowerEdge XE8640 servers | 4 x PowerEdge R760xa servers |
| GPUs per server | 8 x NVIDIA H100 SXM GPUs | 8 x NVIDIA H100 SXM GPUs | 4 x NVIDIA H100 SXM GPUs | 4 x NVIDIA L40S PCIe GPUs |
| Ethernet network adapters | 2 x NVIDIA ConnectX-6 DX Dual Port 100 GbE | 2 x NVIDIA ConnectX-6 DX Dual Port 100 GbE | 1 x NVIDIA ConnectX-6 DX Dual Port 100 GbE, OCP NIC 3.0 | 1 x NVIDIA ConnectX-6 DX Dual Port 100 GbE |
| Ethernet network switch | 2 x PowerSwitch S5232F-ON | 2 x PowerSwitch S5232F-ON | 2 x PowerSwitch S5232F-ON | 2 x PowerSwitch S5232F-ON |
| InfiniBand network adapter | 8 x NVIDIA ConnectX-6 Single Port HDR200 VPI InfiniBand Adapter PCIe (for HDR) | 8 x NVIDIA ConnectX-7 Single Port NDR OSFP PCIe, No Crypto, Full Height | 4 x NVIDIA ConnectX-6 Single Port HDR200 VPI InfiniBand Adapter PCIe (for HDR); 2 x NVIDIA ConnectX-7 Single Port NDR OSFP PCIe, No Crypto, Full Height (for NDR) | N/A |
| InfiniBand network switch | QM8790 (for HDR) | QM9790 (for NDR) | QM8790 (for HDR); QM9790 | N/A |
The following table lists the software stack used for validation:

| Component | Details |
|---|---|
| Operating system | Ubuntu 22.04.1 LTS |
| Cluster management | NVIDIA Base Command Manager Essentials 10.23.12 |
| Slurm cluster | Slurm 23.02.4 |
| Kubernetes cluster | Version 1.27.6 |
| AI framework | NVIDIA NeMo Framework v23.11 |
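As a worked example, Configuration 1 (two PowerEdge XE9680 servers with eight H100 GPUs each) maps directly onto a Slurm batch request on the validated Slurm 23.02.4 cluster. The sketch below is illustrative only: the partition name, job name, and training script are placeholders, not values taken from this guide.

```shell
#!/bin/bash
# Illustrative Slurm request for 2 x PowerEdge XE9680 (8 x H100 SXM each).
# Partition and script names are placeholders, not values from this design.
#SBATCH --job-name=nemo-customization
#SBATCH --nodes=2                 # two XE9680 compute servers
#SBATCH --ntasks-per-node=8       # one task per GPU
#SBATCH --gres=gpu:8              # 8 x H100 SXM GPUs per node
#SBATCH --exclusive               # reserve the nodes for this job
#SBATCH --time=04:00:00
#SBATCH --partition=defq          # placeholder partition name

# Launch one process per GPU across both nodes (16 ranks total).
srun python train.py              # placeholder for a NeMo training script
```

With this layout, Slurm starts 16 ranks (2 nodes x 8 tasks), matching the total GPU count of the configuration, which is the usual one-rank-per-GPU pattern for distributed training.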