Selecting the appropriate server configuration for generative AI inference is crucial to ensure adequate resources are allocated for both management and inference tasks. The following tables provide example configurations for both management and compute workloads:
Table 3. PowerEdge R660 head node and Kubernetes control plane configuration
| Component | Head node | Kubernetes control plane |
|---|---|---|
| Server Model | 1 x PowerEdge R660 | 1 x PowerEdge R660 |
| CPU | 1 x Intel Xeon Gold 5416S 2G, 16C/32T | 1 x Intel Xeon Gold 5416S 2G, 16C/32T |
| Memory | 8 x 16 GB DDR5 4800 MT/s RDIMM | 8 x 16 GB DDR5 4800 MT/s RDIMM |
| RAID Controller | PERC H755 with rear-load brackets | PERC H755 with rear-load brackets |
| Storage | 4 x 960 GB SSD SATA Read Intensive 6 Gbps 512 2.5-in Hot-plug AG Drive, 1 DWPD (RAID 10) | 2 x 960 GB SSD SATA Read Intensive 6 Gbps 512 2.5-in Hot-plug AG Drive, 1 DWPD (RAID 1) |
| PXE Network | Broadcom 5720 Dual Port 1 GbE Optional LOM | Broadcom 5720 Dual Port 1 GbE Optional LOM |
| PXE/K8S Network | NVIDIA ConnectX-6 Lx Dual Port 10/25 GbE SFP28, OCP NIC 3.0 | NVIDIA ConnectX-6 Lx Dual Port 10/25 GbE SFP28, OCP NIC 3.0 |
| K8S/Storage Network | 1 x NVIDIA ConnectX-6 Lx Dual Port 10/25 GbE SFP28 Adapter, PCIe (optional) | 1 x NVIDIA ConnectX-6 Lx Dual Port 10/25 GbE SFP28 Adapter, PCIe (optional) |
Because neither the head node nor the Kubernetes control plane node requires heavy compute, a single-processor server is sufficient for each. For the head node, we recommend a storage-rich configuration, which provides convenient space for storing images and other essential tools.
Table 4. PowerEdge R760xa GPU worker node
| Component | Details |
|---|---|
| Server Model | PowerEdge R760xa 2.5-in chassis with up to 8 SAS/SATA drives, front PERC 11 |
| CPU | 2 x Intel Xeon Gold 6430 2.1G, 32C/64T |
| Memory | 16 x 32 GB DDR5 4800 MT/s RDIMM |
| RAID Controller | PERC H755 with rear-load brackets |
| Storage | 2 x 960 GB SSD SATA Read Intensive 6 Gbps 512 2.5-in Hot-plug AG Drive, 1 DWPD (RAID 1) |
| PXE Network | Broadcom 5720 Dual Port 1 GbE Optional LOM |
| K8S/Storage Network | 1 x Mellanox ConnectX-6 DX Dual Port 100 GbE QSFP56 Network Adapter (optional) |
| GPU | 2 x or 4 x NVIDIA L40 48 GB PCIe GPU, or 2 x or 4 x NVIDIA Hopper H100 80 GB PCIe GPU with bridge boards |
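After the GPU worker nodes are deployed and the NVIDIA GPU Operator (or device plugin) is installed, it is worth confirming that each worker advertises the expected number of GPUs to Kubernetes. The following is a minimal sketch using the Kubernetes Python client; the expected per-node GPU count and the use of a local kubeconfig are assumptions to adjust for your environment.

```python
# Minimal sketch: confirm that each worker node exposes the expected
# nvidia.com/gpu resources to Kubernetes. Assumes the NVIDIA GPU Operator
# (or device plugin) is installed and a kubeconfig is available locally.
from kubernetes import client, config

EXPECTED_GPUS_PER_NODE = 4  # assumption: adjust to 2 for a two-GPU R760xa configuration

def check_gpu_capacity():
    config.load_kube_config()  # or config.load_incluster_config() when run inside the cluster
    v1 = client.CoreV1Api()
    for node in v1.list_node().items:
        name = node.metadata.name
        capacity = node.status.capacity or {}
        gpus = int(capacity.get("nvidia.com/gpu", 0))
        if gpus == 0:
            print(f"{name}: no GPUs advertised (head node or control plane node)")
        elif gpus < EXPECTED_GPUS_PER_NODE:
            print(f"{name}: only {gpus} GPU(s) visible, expected {EXPECTED_GPUS_PER_NODE}")
        else:
            print(f"{name}: {gpus} GPU(s) available for inference workloads")

if __name__ == "__main__":
    check_gpu_capacity()
```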
The following figure shows the network architecture, including the network connectivity for one compute server. The compute cluster can consist of multiple PowerEdge servers, all with similar network connectivity. The figure also shows one PowerEdge head node, which runs NVIDIA's cluster manager, and the network connectivity for the Kubernetes control plane, which can be deployed on a PowerEdge R660 server.
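Before deploying workloads, it can also be useful to confirm that each node's physical links match the design, for example the 1 GbE PXE LOM, the 10/25 GbE OCP NIC, and the optional 100 GbE K8S/storage adapter. The snippet below is a minimal sketch using psutil to report link state and speed on a node; the expected speed values are illustrative assumptions and depend on the node role and which optional adapters are installed.

```python
# Minimal sketch: report link state and speed for each network interface
# on a node so it can be compared against the design (1 GbE PXE, 10/25 GbE
# PXE/K8S, optional 100 GbE K8S/storage). Speed values are assumptions.
import psutil

EXPECTED_SPEEDS_MBPS = {1000, 25000, 100000}  # 1 GbE, 25 GbE, 100 GbE

def report_interfaces():
    for name, stats in psutil.net_if_stats().items():
        if name == "lo":
            continue  # skip the loopback interface
        status = "up" if stats.isup else "down"
        match = "expected" if stats.speed in EXPECTED_SPEEDS_MBPS else "unexpected"
        print(f"{name}: {stats.speed} Mb/s ({status}, {match} link speed)")

if __name__ == "__main__":
    report_interfaces()
```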
This validated design requires the following networks to manage the cluster and facilitate communication and coordination between different components and nodes within the cluster: