Training complex AI models, such as ResNet, often requires the processing power of multiple GPUs to finish in a reasonable time. When multiple GPUs are available, data scientists can use technologies such as Horovod with TensorFlow to perform distributed training of neural networks. For customers that perform multinode training at scale, GPUDirect RDMA from NVIDIA provides more efficient data exchange between GPUs. It enables a direct peer-to-peer data path between the memory resources of two or more GPUs through the ConnectX network adapter ports on the host. This direct path significantly reduces GPU-to-GPU communication latency and eliminates the extra data transfer overhead incurred when CPU resources are used to stage GPU-to-GPU communications across the network. As a result, GPUDirect RDMA from NVIDIA enables near bare-metal performance for multinode training in virtualized environments.
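As a minimal sketch, a multinode Horovod training job of the kind described above is commonly launched with mpirun. The hostnames (node1, node2), per-node GPU counts, and training script name (train_resnet.py) below are placeholders, and the NCCL environment variables shown are one common way to steer NCCL toward GPUDirect RDMA over ConnectX adapters; exact settings depend on the cluster.

```shell
# Hypothetical launch of a 16-process Horovod job across two 8-GPU nodes.
# node1, node2, and train_resnet.py are placeholders for this sketch.
# NCCL_NET_GDR_LEVEL controls when NCCL uses GPUDirect RDMA, based on the
# PCIe distance between GPU and NIC; NCCL_IB_HCA selects the ConnectX
# (mlx5) adapters for InfiniBand/RoCE traffic.
mpirun -np 16 -H node1:8,node2:8 \
    -bind-to none -map-by slot \
    -x NCCL_NET_GDR_LEVEL=2 -x NCCL_IB_HCA=mlx5 \
    python train_resnet.py
```

One process is started per GPU (16 in total), and Horovod averages gradients across all of them each step, so the NCCL transport settings directly affect end-to-end training throughput.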
The following are key requirements of GPUDirect RDMA:
For more detailed requirements, see the NVIDIA Multi-Node Deep Learning Training with TensorFlow – AI Practitioner's Guide.