The MLPerf inference benchmark measures how fast a system performs ML inference using a trained model on new data across various deployment scenarios. See Table 1 below for the list of seven mature models included in the official v0.7 release.
Table 1 Inference Suite v0.7

| Model | Reference application | Dataset |
| --- | --- | --- |
| resnet50-v1.5 | vision/classification and detection | ImageNet (224 x 224) |
| ssd-mobilenet 300 x 300 | vision/classification and detection | COCO (300 x 300) |
| ssd-resnet34 1200 x 1200 | vision/classification and detection | COCO (1200 x 1200) |
| bert | language | squad-1.1 |
| dlrm | recommendation | Criteo Terabyte |
| 3d-unet | vision/medical imaging | BraTS 2019 |
| rnnt | speech recognition | OpenSLR LibriSpeech Corpus |
Each of the above models is evaluated under one or more deployment "scenarios," which represent common ways inference is used in production. Each scenario is measured with a different metric, reflecting how a production system would actually be judged in that setting. MLPerf Inference defines four evaluation scenarios: single-stream, multi-stream, server, and offline.
Table 2 Deployment scenarios

| Scenario | Sample use case | Metric |
| --- | --- | --- |
| Single-stream | Cell phone augmented reality | Latency in milliseconds |
| Multi-stream | Multi-camera driving assistance | Number of streams |
| Server | Translation sites | Queries per second (QPS) |
| Offline | Photo sorting | Inputs per second |
These scenarios illustrate some of the different conditions under which a hardware platform may be deployed. For more details, refer to the official MLPerf Inference research paper, available at https://arxiv.org/abs/1911.02549, and the MLPerf Inference website.
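To make the difference between scenario metrics concrete, the sketch below shows, in plain Python, how a single-stream measurement (per-query latency, one sample in flight) differs from an offline measurement (all samples available up front, throughput in inputs per second). This is an illustrative toy, not the official LoadGen harness; the dummy `infer` function and the timing loops are assumptions made for the example.

```python
import time
import statistics


def infer(sample):
    """Placeholder for a real model's forward pass (illustrative only)."""
    time.sleep(0.002)  # pretend each sample takes ~2 ms
    return sample


def single_stream(samples):
    """Single-stream: issue one query at a time; report per-query latency (ms)."""
    latencies = []
    for s in samples:
        start = time.perf_counter()
        infer(s)
        latencies.append((time.perf_counter() - start) * 1000.0)
    # MLPerf reports a high percentile of latency (e.g. 90th), not the mean.
    return statistics.quantiles(latencies, n=10)[-1]


def offline(samples):
    """Offline: the whole dataset is available up front; report throughput."""
    start = time.perf_counter()
    for s in samples:  # a real system would batch these aggressively
        infer(s)
    elapsed = time.perf_counter() - start
    return len(samples) / elapsed  # inputs per second


if __name__ == "__main__":
    data = list(range(100))
    print(f"single-stream 90th-percentile latency: {single_stream(data):.2f} ms")
    print(f"offline throughput: {offline(data):.1f} inputs/s")
```

The server and multi-stream scenarios add constraints the toy above omits: server issues queries at random (Poisson-distributed) arrival times and reports the QPS achievable within a latency bound, while multi-stream reports how many concurrent streams can be served at a fixed arrival interval.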