MLCommons™ is an open engineering consortium of ML experts focused on improving the end-to-end components of the machine learning process. It was founded by experts from large-scale companies, startups, academia, and research institutes. MLCommons hosts MLPerf with the objective of producing fair and effective benchmarks. These benchmarks aim to map to the real-world use cases that our customers take on most often.
The MLPerf Training benchmark aims to represent workloads that not only require high throughput and heavy compute, but must also reach a target convergence metric. It measures the time to convergence.
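The following minimal sketch illustrates the idea of measuring time to convergence: training proceeds epoch by epoch until the evaluation metric reaches the quality target, and the result is the elapsed wall-clock time. The function and callable names are hypothetical placeholders, not part of the MLPerf reference code.

```python
import time

def time_to_convergence(train_one_epoch, evaluate, quality_target, max_epochs=100):
    """Train until evaluate() reaches quality_target; return elapsed seconds.

    train_one_epoch and evaluate are hypothetical callables standing in for a
    real training loop and its evaluation step.
    """
    start = time.time()
    for _epoch in range(1, max_epochs + 1):
        train_one_epoch()        # one pass over the training data
        metric = evaluate()      # for example, top-1 accuracy or mAP on the eval set
        if metric >= quality_target:
            # Converged: the benchmark result is the time taken to reach the target.
            return time.time() - start
    return None                  # never reached the quality target
```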
MLPerf Training has two divisions: the closed division and the open division. The open division allows new model architectures, model update mechanisms, and other research efforts, as long as the submission reaches the target quality.
Our submission to the MLPerf Training benchmarks is in the closed division, which ensures that like-for-like comparisons can be made among closed-division submitters. The closed division requires using the same dataset preprocessing, model, training method, and quality target as the reference implementation. For example, the hyperparameters must match the reference implementation, including the choice of optimizer and values such as regularization norms and weight decays.
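As a simplified illustration of that constraint, a closed-division style check can compare a submission's hyperparameters against the reference implementation's values. The names and values below are hypothetical, for illustration only; they are not the actual MLPerf compliance tooling.

```python
# Hypothetical reference hyperparameters for illustration only; the actual
# values are fixed by each benchmark's reference implementation.
REFERENCE_HPARAMS = {
    "optimizer": "lamb",
    "weight_decay": 0.01,
    "learning_rate_decay": "polynomial",
}

def closed_division_violations(submission_hparams):
    """Return the names of hyperparameters that deviate from the reference."""
    return [
        name
        for name, ref_value in REFERENCE_HPARAMS.items()
        if submission_hparams.get(name) != ref_value
    ]

# Example: a submission that swaps the optimizer would not be a valid
# closed-division entry under this simplified rule.
print(closed_division_violations({
    "optimizer": "adam",
    "weight_decay": 0.01,
    "learning_rate_decay": "polynomial",
}))  # ['optimizer']
```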
The following table lists the available benchmarks to which you can submit and the corresponding expected quality target. For a submission to be valid, the accuracy for that specific model must converge to the specified quality target.
| Area | Benchmark | Dataset | Quality target | Reference implementation model |
|---|---|---|---|---|
| Vision | Image classification | ImageNet | 75.90% classification | ResNet-50 v1.5 |
| Vision | Image segmentation (medical) | KiTS19 | 0.908 Mean DICE score | 3D U-Net |
| Vision | Object detection (lightweight) | Open Images | 34.0% mAP | RetinaNet |
| Vision | Object detection (heavy weight) | COCO | 0.377 Box min AP and 0.339 Mask min AP | Mask R-CNN |
| Language | Speech recognition | LibriSpeech | 0.058 Word Error Rate | RNN-T |
| Language | NLP | Wikipedia 2020/01/01 | 0.72 Mask-LM accuracy | BERT-large |
| Commerce | Recommendation | 1TB Click Logs | 0.8025 AUC | DLRM |
| Research | Reinforcement learning | Go | 50% win rate compared to checkpoint | Mini Go (based on the AlphaGo paper) |
Source: https://mlcommons.org/en/training-normal-20/
Submissions require a different number of runs for each benchmark. The required number of runs reflects the variance of the benchmark result, the cost of each run, and the likelihood of convergence. The following table lists the minimum number of runs for each benchmark.
| Area | Problem | Minimum number of runs |
|---|---|---|
| Vision | Image classification | 5 |
| Vision | Image segmentation (medical) | 40 |
| Vision | Object detection (lightweight) | 5 |
| Vision | Object detection (heavy weight) | 5 |
| Language | NLP | 10 |
| Language | Speech recognition | 10 |
| Commerce | Recommendation | 5 |
| Research | Reinforcement learning | 10 |
The MLPerf Training suite uses a compliance checker, the reference convergence point (RCP) checker, to ensure that each submitter is benchmarking fairly. Reference convergence points ensure that the convergence of a submission does not deviate from the convergence of the reference; in particular, a submission must not converge faster than the reference. The reference implementation's convergence sets a lower bound on the number of epochs to converge, and a valid submission must not converge in fewer epochs than this bound.
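The sketch below captures the idea behind an RCP-style check under simplified assumptions: it compares the submission's mean epochs to converge against the mean of the reference runs and flags submissions that converge in fewer epochs than the reference allows. The function, data, and fixed tolerance are hypothetical; the actual RCP checker in the MLPerf tooling derives its allowance from the reference runs themselves.

```python
def rcp_check(submission_epochs, reference_epochs, tolerance=0.0):
    """Flag a submission whose convergence is faster than the reference.

    submission_epochs / reference_epochs: epochs-to-converge from multiple runs.
    tolerance: slack on the reference mean (a fixed value here only to keep the
    sketch short; the real checker computes this from the reference runs).
    """
    submission_mean = sum(submission_epochs) / len(submission_epochs)
    reference_mean = sum(reference_epochs) / len(reference_epochs)
    # The reference mean acts as a lower bound: converging in fewer epochs than
    # the reference (beyond the allowed tolerance) fails the check.
    return submission_mean >= reference_mean - tolerance

# Example: a submission converging in noticeably fewer epochs than the
# reference would be rejected by this style of check.
print(rcp_check([38, 39, 40], [42, 43, 44], tolerance=1.0))  # False
```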