MLPerf performance
MLPerf is a deep learning benchmarking suite developed by the MLCommons community. More information about MLPerf is available from MLCommons.
AI inference models in production are typically run as single-node instances. For example, Large Language Models (LLMs) are typically deployed as a single instance per node, with a GPU memory requirement that depends on the parameter count and the data type (integer or floating point, and its precision). Popular LLMs have their parameter counts specifically tuned (such as 7B or 13B) so that their memory footprint fits within the capacity of a single consumer-grade GPU.
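As a rough illustration of this sizing relationship (not part of the MLPerf suite), the following Python sketch estimates the weights-only memory footprint from parameter count and data type. The bytes-per-parameter values are the standard sizes for each precision; the model sizes are the examples from the text, and the estimate deliberately ignores KV cache, activations, and framework overhead.

```python
# Weights-only GPU memory estimate for an LLM. Excludes KV cache,
# activations, and framework overhead, so real requirements are higher.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(params_billions: float, dtype: str = "fp16") -> float:
    """Approximate weight footprint in GB for a given parameter count."""
    return params_billions * 1e9 * BYTES_PER_PARAM[dtype] / 1e9

for size in (7, 13, 70):
    print(f"{size}B @ fp16 ~ {weight_memory_gb(size):.0f} GB")
# 7B @ fp16 ~ 14 GB fits a single 24 GB consumer GPU; 13B (~26 GB)
# needs int8/int4 quantization to fit; 70B (~140 GB) needs multiple
# GPUs or a server-class accelerator.
```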
Multiple users are typically time-sliced, with queries served first in, first out. As user counts grow (typically to 10-50 concurrent users per instance before performance is deemed unacceptable), the LLM service is typically scaled out by deploying additional isolated instances of the same model to support additional users, rather than scaling up by adding nodes to the existing instance(s). This limits latency penalties, and an additional advantage of this approach is that failed instances can be discarded and restarted without affecting other users, as the sketch below illustrates. As a result, the MLPerf Inference Suite run in a single-node configuration was selected as a representative benchmark.
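A minimal Python sketch of this scale-out pattern follows, using a shared FIFO queue drained by isolated worker instances. The `run_inference` function is a hypothetical stand-in for a real model call, and the instance count is an assumption for illustration; capacity is added by starting more instances, not by enlarging an existing one.

```python
import queue
import threading

requests: "queue.Queue[str]" = queue.Queue()

def run_inference(prompt: str) -> str:
    # Hypothetical stand-in for a single-node LLM instance's model call.
    return f"response to {prompt!r}"

def instance_worker(instance_id: int) -> None:
    # Each isolated instance drains the shared queue independently.
    while True:
        prompt = requests.get()      # FIFO: oldest query is served first
        try:
            print(instance_id, run_inference(prompt))
        finally:
            requests.task_done()     # a failed instance can simply be
                                     # restarted; other instances and
                                     # their users are unaffected

NUM_INSTANCES = 2                    # scale out by raising this count
for i in range(NUM_INSTANCES):
    threading.Thread(target=instance_worker, args=(i,), daemon=True).start()

for n in range(4):
    requests.put(f"query {n}")
requests.join()                      # wait until all queries are served
```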
The MLPerf Inference Suite is a set of benchmark models assumed to be in a previously trained state; inferencing is therefore primarily concerned with output performance and accuracy. The different scenarios and use cases include: