NVIDIA V100 GPU
The NVIDIA V100 GPU is the most advanced data center GPU ever built to accelerate AI, high performance computing (HPC), data science, and graphics. Powered by the NVIDIA Volta™ GPU architecture, V100 GPUs enable data scientists, researchers, and engineers to tackle challenges that were once considered intractable. With 640 Tensor Cores, the V100 is the first GPU to break the 100 teraflops (TFLOPS) barrier of deep learning (DL) performance.
In the PCIe version of the V100 GPU, all GPUs communicate with each other over the PCIe bus. In the V100-SXM2 model, the GPUs are connected by NVIDIA NVLink™ technology. In use cases that require multiple GPUs, the NVLink interconnect used by V100-SXM2 cards provides faster GPU-to-GPU communication than PCIe. Each V100 GPU provides six NVLink lanes for bidirectional communication, and each lane delivers 25 GB/s per direction. Because all four GPUs within a node can communicate simultaneously, the theoretical peak aggregate bandwidth is 6 x 25 x 4 = 600 GB/s. By contrast, a PCIe 3.0 x16 slot provides 16 GB/s per direction, for a theoretical peak of only 16 x 2 = 32 GB/s, and GPUs sharing the bus must communicate serially rather than in parallel. In theory, data communication over NVLink can therefore be up to 600/32 = 18.75x faster than over PCIe. Because of this advantage, the PowerEdge C4140 compute node in this DL solution uses NVLink-based V100-SXM2 GPUs instead of PCIe-based V100 GPUs. Both the 16 GB and 32 GB V100 GPU models are supported in this solution.
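The bandwidth comparison above can be reproduced with a short calculation. The following is a minimal sketch; the lane counts and per-lane rates are the theoretical-peak figures quoted in this section, and real-world throughput will be lower:

```python
# Theoretical peak bandwidth: NVLink (V100-SXM2) vs. a PCIe 3.0 x16 bus.
# Constants mirror the figures quoted in the text above.

NVLINK_LANES_PER_GPU = 6      # NVLink lanes on each V100-SXM2 GPU
NVLINK_GBPS_PER_LANE = 25     # GB/s per lane, per direction
GPUS_PER_NODE = 4             # V100 GPUs in a PowerEdge C4140 node

PCIE_GBPS_PER_DIRECTION = 16  # PCIe 3.0 x16, per direction
PCIE_DIRECTIONS = 2           # transmit + receive


def nvlink_peak_gbps(lanes: int, gbps_per_lane: int, gpus: int) -> int:
    """Aggregate peak bandwidth when all GPUs in the node communicate at once."""
    return lanes * gbps_per_lane * gpus


def pcie_peak_gbps(gbps_per_direction: int, directions: int) -> int:
    """Peak bandwidth of a single PCIe x16 link, both directions combined."""
    return gbps_per_direction * directions


nvlink = nvlink_peak_gbps(NVLINK_LANES_PER_GPU, NVLINK_GBPS_PER_LANE, GPUS_PER_NODE)
pcie = pcie_peak_gbps(PCIE_GBPS_PER_DIRECTION, PCIE_DIRECTIONS)

print(f"NVLink aggregate peak: {nvlink} GB/s")       # 600 GB/s
print(f"PCIe peak:             {pcie} GB/s")         # 32 GB/s
print(f"Theoretical speedup:   {nvlink / pcie}x")    # 18.75x
```

Note that these are upper bounds: PCIe transfers share the bus serially, while NVLink allows all GPUs to transfer in parallel, which is where the large aggregate advantage comes from.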