The following table provides the recommended configurations for customer usage scenarios:
Table 5. Recommended configurations
Configuration | Mainstream performance | Mainstream performance with HCI appliance | High performance |
Compute server | PowerEdge R7525 or PowerEdge R750 | VxRail 670F | PowerEdge R750xa |
GPUs | 1 x NVIDIA A30 | 2 x NVIDIA A30 | 4 x NVIDIA A100 |
Number of nodes in a cluster | Minimum of 3 hosts. When using vSAN, 4 ESXi hosts are recommended for resiliency during patching and upgrading. | ||
Network adapter | ConnectX-5 Dx 25 GbE or ConnectX-6 Lx 25 GbE | ConnectX-5 Dx 25 GbE or ConnectX-6 Lx 25 GbE | ConnectX-5 Dx 25 GbE or ConnectX-6 Lx 25 GbE |
Network switch | 2 x Dell S5248F-ON | 2 x Dell S5248F-ON | 2 x Dell S5248F-ON; for multinode training: 2 x Dell S5232F-ON or 2 x Mellanox SN3700 (data) |
Out-of-band (OOB) switch | 1 x Dell PowerSwitch N3248TE-ON or 1 x Dell PowerSwitch S4148T-ON | ||
VMware vSphere | | ||
Internal storage for ESXi | BOSS controller card with 2 x 480 GB M.2 sticks (RAID 1) | ||
Storage controller or hard drive configuration for vSAN | | ||
Storage | | ||
NVIDIA software | NVIDIA AI Enterprise 1.1 |
We recommend two configurations for mainstream performance: one with PowerEdge servers and the other with an HCI appliance. Mainstream performance covers model development and inference, with use cases such as conversational AI, recommendation systems, and language processing. The high-performance configuration uses the PowerEdge R750xa server, which is designed for accelerators such as NVIDIA A100 GPUs, and targets neural network training of large and complex models. It also includes a network architecture for multinode training, for customers who want to harness GPU resources across multiple servers.
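The selection logic described above can be sketched as a small lookup. This is only an illustrative sketch, not a sizing tool: the names `Config`, `CONFIGS`, `pick_config`, and `cluster_gpus` are hypothetical, and the code assumes that the GPU counts in Table 5 apply per server and that every node in a cluster is populated identically.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Config:
    """One column of Table 5 (hypothetical container for illustration)."""
    name: str
    servers: Tuple[str, ...]
    gpus_per_node: int  # assumption: the table's GPU count is per server
    gpu_model: str


# Values copied from Table 5.
CONFIGS = {
    "mainstream": Config(
        "Mainstream performance",
        ("PowerEdge R7525", "PowerEdge R750"), 1, "NVIDIA A30"),
    "mainstream_hci": Config(
        "Mainstream performance with HCI appliance",
        ("VxRail 670F",), 2, "NVIDIA A30"),
    "high": Config(
        "High performance",
        ("PowerEdge R750xa",), 4, "NVIDIA A100"),
}

MIN_NODES = 3               # minimum hosts per cluster (Table 5)
RECOMMENDED_VSAN_NODES = 4  # recommended with vSAN (Table 5)


def pick_config(large_model_training: bool, hci_appliance: bool = False) -> Config:
    """Map a workload to a Table 5 column.

    Training of large, complex models maps to the high-performance
    configuration; model development and inference map to one of the
    two mainstream configurations.
    """
    if large_model_training:
        return CONFIGS["high"]
    return CONFIGS["mainstream_hci" if hci_appliance else "mainstream"]


def cluster_gpus(config: Config, nodes: int = MIN_NODES) -> int:
    """Total GPUs in a cluster, assuming identically populated nodes."""
    if nodes < MIN_NODES:
        raise ValueError(f"a cluster needs at least {MIN_NODES} hosts")
    return config.gpus_per_node * nodes
```

For example, `pick_config(large_model_training=False, hci_appliance=True)` returns the VxRail 670F entry, and `cluster_gpus(CONFIGS["high"], RECOMMENDED_VSAN_NODES)` gives the GPU total for a four-node high-performance cluster under the per-server assumption stated above.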