The industry state of the art for host and GPU connectivity is the RDMA (Remote Direct Memory Access) protocol, which can be carried natively over InfiniBand or over Ethernet using the RoCE (RDMA over Converged Ethernet) standard.
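As a minimal illustration of the common programming model behind both transports, the sketch below uses the standard libibverbs API to enumerate local RDMA devices and report whether each runs over a native InfiniBand or an Ethernet (RoCE) link layer. The fixed port number 1 and the 10 GB figure used later are illustrative assumptions, not recommendations.

    #include <stdio.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num_devices = 0;
        struct ibv_device **devs = ibv_get_device_list(&num_devices);
        if (!devs || num_devices == 0) {
            fprintf(stderr, "no RDMA devices found\n");
            return 1;
        }

        for (int i = 0; i < num_devices; i++) {
            struct ibv_context *ctx = ibv_open_device(devs[i]);
            if (!ctx)
                continue;

            struct ibv_port_attr port;
            /* Port 1 assumed for brevity; production code iterates ports. */
            if (ibv_query_port(ctx, 1, &port) == 0) {
                /* link_layer indicates whether this RDMA device runs over
                 * native InfiniBand or Ethernet (RoCE). */
                printf("%s: %s, state=%d\n",
                       ibv_get_device_name(devs[i]),
                       port.link_layer == IBV_LINK_LAYER_ETHERNET
                           ? "Ethernet (RoCE)" : "InfiniBand",
                       (int)port.state);
            }
            ibv_close_device(ctx);
        }
        ibv_free_device_list(devs);
        return 0;
    }

Compiled with -libverbs, this runs on any host with the rdma-core user-space stack installed; the same verbs interface is used regardless of which fabric sits underneath.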
This is supported by high-speed switching, with 800G port speeds as the current benchmark, combined with sophisticated load balancing and congestion management techniques. Together these deliver lossless, low-latency data transfers that keep compute resources fully utilized, yielding the best training times and inference responses.
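To see why port speed and losslessness translate directly into GPU utilization, the back-of-envelope sketch below computes the best-case serialization time for a single gradient exchange at several line rates. The 10 GB per-GPU payload is a hypothetical figure chosen for illustration; the key point is that any packet loss adds retransmission delay on top of these ideal times, during which the GPUs sit idle.

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical per-GPU payload for one gradient exchange: 10 GB.
         * This figure is an illustrative assumption, not a measurement. */
        const double payload_bytes = 10e9;
        const double line_rates_gbps[] = { 100.0, 400.0, 800.0 };
        const int n = sizeof line_rates_gbps / sizeof line_rates_gbps[0];

        for (int i = 0; i < n; i++) {
            /* Best-case serialization time: payload / line rate.
             * Assumes a lossless fabric; a single drop stalls the
             * collective operation while data is retransmitted. */
            double bytes_per_s = line_rates_gbps[i] * 1e9 / 8.0;
            printf("%4.0fG port: %.2f s per exchange (ideal, lossless)\n",
                   line_rates_gbps[i], payload_bytes / bytes_per_s);
        }
        return 0;
    }

Under these assumptions the exchange takes 0.8 s at 100G, 0.2 s at 400G, and 0.1 s at 800G, which is why both raw port speed and lossless behavior matter for training throughput.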
Dell Technologies currently supports and recommends three fabric solutions, each with its own trade-offs relative to specific customer goals and requirements.
Factors that influence the final selection of a fabric technology can include the desire for multi-vendor support, open standards, cost and supply constraints, or specific performance or compatibility concerns.
These solutions are Dell PowerSwitch Ethernet (based on the SONiC NOS), NVIDIA Spectrum-X switching, and NVIDIA InfiniBand. All three options support 400G bandwidth, with the Ethernet-based solutions extending to 800G speeds, and all support RDMA transmission along with the load balancing and congestion management techniques that GenAI traffic requires.
The final recommendation of a given solution must be made in the context of the specific customer's requirements. Refer to vendor documentation and related whitepapers for detailed discussions of each solution option.