Design considerations for the machine learning compute servers include:
- PowerEdge server models—PowerEdge R7625 or PowerEdge R760 servers are ideal for workloads that need mainstream performance. These servers are NVIDIA-Certified Systems, which means that they have been validated to provide optimal performance, scalability, and security for accelerated workloads. The R760 and R7625 servers are also certified as vSAN ReadyNodes for vSphere 8. See the VMware Compatibility Guide for the latest information about vSAN certification for PowerEdge servers.
- Number of compute servers—We have validated up to four servers per VMware vSphere cluster. VMware supports a maximum of 64 nodes per cluster.
- Processor and memory—We recommend Intel Xeon Platinum or Gold processors for the PowerEdge R760 and AMD EPYC processors for the PowerEdge R7625. We recommend at least 512 GB of memory per server for memory-intensive AI workloads.
- GPUs—The PowerEdge servers can be configured with NVIDIA A100 or A30 GPUs. The A100 GPU is recommended for deep learning and training of complex neural networks. The A30 GPU is recommended for AI inference, natural language processing, conversational AI, and recommendation systems.
- Storage for VMware ESXi—For better reliability, higher performance, and greater isolation from VM data, we recommend that you install VMware ESXi on the Dell Boot Optimized Storage Solution (BOSS) controller.
- Storage controller and hard drives for vSAN—A Dell HBA355i controller is used as the storage controller for vSAN. The number and capacity of drives depend on customer workloads and VM requirements. We recommend 800 GB SAS Mixed-Use SSDs for the vSAN cache tier and 960 GB SAS Read-Intensive SSDs for the vSAN capacity tier.
- Network adapters—As shown in Table 5, the 25 GbE design uses the NVIDIA ConnectX-6 network adapter.
VMware vCenter Server can be deployed as a VM in your data center. Because vCenter is critical to the deployment, operation, and maintenance of a vSAN environment, Dell Technologies recommends that vCenter be deployed on a highly available management cluster that is separate from the compute cluster.
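The processor, memory, GPU, and cluster-size recommendations above can be captured in a small pre-deployment check. The following is a minimal sketch, not a Dell or VMware tool: `NodeSpec`, `check_cluster`, and the `"training"`/`"inference"` workload labels are hypothetical names chosen for illustration, and the thresholds (four validated nodes, 64-node cluster maximum, 512 GB of memory) are taken from the list above. The GPU mapping simplifies the guidance to A100 for training and A30 for inference-style workloads.

```python
from dataclasses import dataclass

# Thresholds taken from the design considerations above.
VALIDATED_MAX_NODES = 4    # nodes per cluster validated in this design
SUPPORTED_MAX_NODES = 64   # per-cluster maximum stated above
MIN_MEMORY_GB = 512        # recommended floor for memory-intensive AI workloads

# Recommended CPU families per server model (prefix match on the CPU name).
CPU_FOR_MODEL = {
    "R760": ("Intel Xeon Platinum", "Intel Xeon Gold"),
    "R7625": ("AMD EPYC",),
}

# Simplified GPU guidance; the workload labels are hypothetical.
GPU_FOR_WORKLOAD = {
    "training": "A100",
    "inference": "A30",
}


@dataclass
class NodeSpec:
    """Hypothetical description of one compute server."""
    model: str      # "R760" or "R7625"
    cpu: str        # for example "Intel Xeon Gold 6430"
    memory_gb: int
    gpu: str        # "A100" or "A30"
    workload: str   # "training" or "inference"


def check_cluster(nodes: list[NodeSpec]) -> list[str]:
    """Return deviations from the recommendations; an empty list means none were found."""
    issues = []
    n = len(nodes)
    if n > SUPPORTED_MAX_NODES:
        issues.append(f"{n} nodes exceeds the {SUPPORTED_MAX_NODES}-node cluster maximum")
    elif n > VALIDATED_MAX_NODES:
        issues.append(f"{n} nodes is beyond the {VALIDATED_MAX_NODES} nodes validated in this design")

    for i, node in enumerate(nodes):
        families = CPU_FOR_MODEL.get(node.model)
        if families is None:
            issues.append(f"node {i}: model {node.model} is not covered by this design")
        elif not node.cpu.startswith(families):
            issues.append(f"node {i}: {node.cpu} is not a recommended CPU for the {node.model}")
        if node.memory_gb < MIN_MEMORY_GB:
            issues.append(f"node {i}: {node.memory_gb} GB is below the {MIN_MEMORY_GB} GB recommendation")
        expected_gpu = GPU_FOR_WORKLOAD.get(node.workload)
        if expected_gpu is not None and node.gpu != expected_gpu:
            issues.append(f"node {i}: {expected_gpu} is recommended for {node.workload}, found {node.gpu}")
    return issues


if __name__ == "__main__":
    cluster = [NodeSpec("R760", "Intel Xeon Gold 6430", 512, "A100", "training") for _ in range(4)]
    print(check_cluster(cluster))  # []
```

A cluster larger than four nodes is reported as outside the validated configuration rather than as an error, because the source distinguishes what was validated from what VMware supports.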