Training results


Overall, our MLPerf Training results show that the PowerEdge C4140 server with four NVIDIA V100 GPUs performs well when compared with an NVIDIA DGX-1 with eight V100 GPUs. Further, running the benchmarks on OpenShift Container Platform does not add significant overhead.

The following table shows the results that we achieved when we ran MLPerf Training. The logs are available in our GitHub repository.

Table 2. MLPerf Training v0.6 results: C4140 worker node with four V100 GPUs

Model types	Object detection—heavy weight	Object detection—light weight
Data source	COCO	COCO
Dataset size	21 GB	21 GB
Framework	PyTorch	PyTorch
Model	Mask-RCNN	SSD w/ ResNet-34
Result (minutes)	383.48	45.92

The following figure compares our results with the results that NVIDIA published. The relative performance varies by benchmark. As expected for a system with half the number of GPUs, our times are approximately double the NVIDIA times.

Figure 9. MLPerf Training v0.6 results: PowerEdge C4140 with four V100 GPUs compared to NVIDIA DGX-1 with eight V100 GPUs

For Mask-RCNN, our time is less than double the NVIDIA time, and for SSD w/ ResNet-34 it is nearly double. The Transformer benchmark is a little slower than expected, and the GNMT benchmark is slower still, although it completes within a respectable timeframe. These results indicate that the relative efficiency of the hardware is replicated and that adding OpenShift to the software stack has little impact on the overall performance of these benchmarks.

Because our system and the NVIDIA DGX-1 system have different numbers of GPUs, we also compare results on a per-GPU basis to assess cost effectiveness. The following figure shows the per-GPU performance of the two systems:

Figure 10. Per-GPU results for MLPerf Training v0.6: PowerEdge C4140 compared to NVIDIA DGX-1

As shown, the C4140 server performs better for the Mask-RCNN benchmark but is slightly less efficient for the other three benchmarks. The GNMT benchmark is relatively slower, possibly due to the use of an initial random seed in the benchmark that can result in time variation.
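The per-GPU normalization behind this comparison can be sketched as follows. This is a minimal illustration, not the method used to produce the figures. It assumes that per-GPU cost is measured in GPU-minutes (wall-clock minutes multiplied by GPU count) and that scaling with GPU count is linear. The function names are hypothetical. The only measured values used are the C4140 times from Table 2; no NVIDIA times are included because they are not listed in this section.

```python
# Measured C4140 results from Table 2 (minutes to train on four V100 GPUs).
C4140_GPUS = 4
DGX1_GPUS = 8
c4140_minutes = {"Mask-RCNN": 383.48, "SSD w/ ResNet-34": 45.92}


def gpu_minutes(minutes, gpus):
    """Total GPU time consumed: wall-clock minutes times GPU count (assumed metric)."""
    return minutes * gpus


def break_even_minutes(minutes, gpus, other_gpus):
    """Wall-clock time at which a system with `other_gpus` GPUs would use the
    same number of GPU-minutes, assuming linear scaling with GPU count."""
    return gpu_minutes(minutes, gpus) / other_gpus


for model, minutes in c4140_minutes.items():
    total = gpu_minutes(minutes, C4140_GPUS)
    parity = break_even_minutes(minutes, C4140_GPUS, DGX1_GPUS)
    print(f"{model}: {total:.2f} GPU-minutes; "
          f"per-GPU parity with an 8-GPU system at {parity:.2f} minutes")
```

Under these assumptions, an eight-GPU time above the parity value means that the C4140 is more efficient per GPU for that benchmark, and a time below it means that the eight-GPU system is more efficient.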
