MLCommons™ is an open engineering consortium of ML experts focused on improving the end-to-end components of the machine learning process. It was founded by experts from large-scale companies, startups, academia, and research institutes. MLCommons hosts MLPerf with the objective of producing fair and effective benchmarks. These benchmarks aim to map to the real-world use cases that our customers take on most often.
The MLPerf Training benchmark aims to represent workloads that not only require high throughput and heavy compute, but must also reach a target convergence metric. It measures the time to convergence.
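The following minimal sketch illustrates the idea of measuring time to convergence: training proceeds epoch by epoch until the evaluation metric reaches the quality target, and the result is the elapsed wall-clock time. The function and callable names are hypothetical placeholders, not part of the MLPerf reference code.

```python
import time

def time_to_convergence(train_one_epoch, evaluate, quality_target, max_epochs=100):
    """Train until evaluate() reaches quality_target; return elapsed seconds.

    train_one_epoch and evaluate are hypothetical callables standing in for a
    real training loop and its evaluation step.
    """
    start = time.time()
    for _epoch in range(1, max_epochs + 1):
        train_one_epoch()        # one pass over the training data
        metric = evaluate()      # for example, top-1 accuracy or mAP on the eval set
        if metric >= quality_target:
            # Converged: the benchmark result is the time taken to reach the target.
            return time.time() - start
    return None                  # never reached the quality target
```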
MLPerf Training has two divisions: the closed division and the open division. The open division allows new model architectures, model update mechanisms, and other research efforts, as long as the submission reaches the target quality.
Our submission to the MLPerf Training benchmarks is in the closed division, which ensures that like-for-like comparisons can be made among closed-division submitters. The closed division requires using the same dataset preprocessing, model, training method, and quality target as the reference implementation. For example, the hyperparameters must match the reference implementation, including the choice of optimizer and values such as regularization norms and weight decays.
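As a simplified illustration of that constraint, a closed-division style check can compare a submission's hyperparameters against the reference implementation's values. The names and values below are hypothetical, for illustration only; they are not the actual MLPerf compliance tooling.

```python
# Hypothetical reference hyperparameters for illustration only; the actual
# values are fixed by each benchmark's reference implementation.
REFERENCE_HPARAMS = {
    "optimizer": "lamb",
    "weight_decay": 0.01,
    "learning_rate_decay": "polynomial",
}

def closed_division_violations(submission_hparams):
    """Return the names of hyperparameters that deviate from the reference."""
    return [
        name
        for name, ref_value in REFERENCE_HPARAMS.items()
        if submission_hparams.get(name) != ref_value
    ]

# Example: a submission that swaps the optimizer would not be a valid
# closed-division entry under this simplified rule.
print(closed_division_violations({
    "optimizer": "adam",
    "weight_decay": 0.01,
    "learning_rate_decay": "polynomial",
}))  # ['optimizer']
```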
The following table lists the available benchmarks to which you can submit and the corresponding expected quality target. For a submission to be valid, the accuracy for that specific model must converge to the specified quality target.
| Area | Benchmark | Dataset | Quality target | Reference implementation model |
|---|---|---|---|---|
| Vision | Image classification | ImageNet | 75.90% classification | ResNet-50 v1.5 |
| Vision | Image segmentation (medical) | KiTS19 | 0.908 Mean DICE score | 3D U-Net |
| Vision | Object detection (lightweight) | Open Images | 34.0% mAP | RetinaNet |
| Vision | Object detection (heavy weight) | COCO | 0.377 Box min AP and 0.339 Mask min AP | Mask R-CNN |
| Language | Speech recognition | LibriSpeech | 0.058 Word Error Rate | RNN-T |
| Language | NLP | Wikipedia 2020/01/01 | 0.72 Mask-LM accuracy | BERT-large |
| Commerce | Recommendation | 1TB Click Logs | 0.8025 AUC | DLRM |
| Research | Reinforcement learning | Go | 50% win rate compared to checkpoint | Mini Go (based on the AlphaGo paper) |
Source: https://mlcommons.org/en/training-normal-20/
Submissions require a different number of runs for each benchmark. The required number of runs reflects the variance of the benchmark result, the cost of each run, and the likelihood of convergence. The following table lists the minimum number of runs for each benchmark.
| Area | Problem | Minimum number of runs |
|---|---|---|
| Vision | Image classification | 5 |
| Vision | Image segmentation (medical) | 40 |
| Vision | Object detection (lightweight) | 5 |
| Vision | Object detection (heavy weight) | 5 |
| Language | NLP | 10 |
| Language | Speech recognition | 10 |
| Commerce | Recommendation | 5 |
| Research | Reinforcement learning | 10 |
The MLPerf Training suite uses a compliance checker, the reference convergence point (RCP) checker, to ensure that each submitter is benchmarking fairly. Reference convergence points ensure that the convergence of a submission does not deviate from the convergence of the reference; in particular, a submission must not converge faster than the reference. The reference implementation's convergence sets a lower bound on the number of epochs to converge, and a valid submission must not converge in fewer epochs than this bound.
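The sketch below captures the idea behind an RCP-style check under simplified assumptions: it compares the submission's mean epochs to converge against the mean of the reference runs and flags submissions that converge in fewer epochs than the reference allows. The function, data, and fixed tolerance are hypothetical; the actual RCP checker in the MLPerf tooling derives its allowance from the reference runs themselves.

```python
def rcp_check(submission_epochs, reference_epochs, tolerance=0.0):
    """Flag a submission whose convergence is faster than the reference.

    submission_epochs / reference_epochs: epochs-to-converge from multiple runs.
    tolerance: slack on the reference mean (a fixed value here only to keep the
    sketch short; the real checker computes this from the reference runs).
    """
    submission_mean = sum(submission_epochs) / len(submission_epochs)
    reference_mean = sum(reference_epochs) / len(reference_epochs)
    # The reference mean acts as a lower bound: converging in fewer epochs than
    # the reference (beyond the allowed tolerance) fails the check.
    return submission_mean >= reference_mean - tolerance

# Example: a submission converging in noticeably fewer epochs than the
# reference would be rejected by this style of check.
print(rcp_check([38, 39, 40], [42, 43, 44], tolerance=1.0))  # False
```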