Unveiling the World’s First MLPerf 4.1 Performance Results for AMD Instinct MI300X on PowerEdge XE9680
Tue, 01 Oct 2024 18:27:22 -0000
Introduction
The artificial intelligence (AI) landscape is evolving at an unprecedented pace, driven by groundbreaking advancements in machine learning (ML). As researchers, developers, and companies race to push the boundaries of what's possible, there is a growing need for standardized benchmarks to objectively measure and compare the performance of ML models and hardware. This is where MLPerf comes in.
MLPerf is a comprehensive benchmarking suite designed to evaluate the performance of ML hardware, software, and algorithms. It provides a standardized set of tests that cover a wide range of tasks, from image recognition and natural language processing (NLP) to reinforcement learning. These benchmarks are crucial for helping the AI community understand how different systems perform under various conditions, ensuring that innovations are practical and efficient.
The Dell PowerEdge XE9680 is a leader in high-performance computing, delivering power, efficiency, and versatility. Equipped with eight AMD Instinct MI300X GPUs and AMD ROCm AI software, this server is purpose-built to maximize AI throughput. Enterprises can depend on this highly refined, systematized, and scalable platform to achieve breakthroughs in NLP, recommender systems, data analytics, and generative AI. With support for large CPU RAM volumes (up to 4 TB), the XE9680 provides a competitive edge in handling AI workloads. The AMD Instinct MI300X accelerator, with its unparalleled memory footprint, empowers the Dell PowerEdge XE9680 to train and fine-tune large AI models, delivering exceptional performance and accelerated outcomes for our customers. Let’s dive in.
Why MLPerf Matters
In the world of AI, performance is critical. Whether it's training a model faster, reducing the energy consumption of a neural network, or improving the accuracy of predictions, every improvement counts. However, comparing the performance of different AI systems has traditionally been a challenge. Different models, datasets, and hardware configurations can lead to inconsistent results, making it difficult to determine which system excels.
MLPerf addresses this challenge by providing a level playing field. Its benchmarks are designed to be fair, representative, and applicable to real-world scenarios. By using MLPerf, organizations can make informed decisions about which hardware and software solutions best suit their needs, whether developing new AI applications or scaling existing ones.
The Latest MLPerf Results
Dell’s inaugural MLPerf 4.1 inference benchmarking on the Dell PowerEdge XE9680 server equipped with AMD Instinct MI300X Accelerators demonstrated exceptional LLM performance. This cutting-edge hardware configuration signifies a leap in AMD's AI computational power, and Dell is the first OEM to publish these results.
The PowerEdge XE9680’s integration of AMD MI300X accelerators showcases the potential of next-generation AI accelerators to advance neural network capabilities, particularly in generative AI workloads.
PowerEdge XE9680 AMD Instinct MI300X Platform
The following results from MLPerf 4.1 Datacenter Inference offer a glimpse into the future of open ecosystem AI computing.
The graph includes unverified MLPerf 4.1 results collected after the MLPerf submission deadline. Verified results are available under ID 4.1-0022 and are as follows: Llama2-70B Server is 19,886.10 queries/s and Offline is 22,677.60 queries/s.
Performance Drivers for AMD Instinct MI300X
MI300X’s Large Memory Size: With 192 GB of HBM3 memory, the MI300X has the highest memory capacity of any GPU on the market, enabling:
- Running the Llama2-70B chat model with tensor parallelism of 1 (TP1) on a single GPU, eliminating the need for model parallelism and reducing inter-GPU communication overhead.
- Leveraging larger batch sizes for more efficient processing.
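To see why the 192 GB of HBM3 matters, a rough back-of-the-envelope estimate (illustrative arithmetic only, not figures from the MLPerf submission) compares the weight footprint of Llama2-70B at FP16 and FP8 against the memory of a single MI300X:

```python
# Back-of-the-envelope memory estimate for Llama2-70B on one MI300X.
# Hypothetical simplification: counts weight bytes only; a real serving
# stack also needs memory for the KV cache, activations, and runtime.
HBM3_PER_GPU_GB = 192   # MI300X HBM3 capacity
PARAMS_BILLION = 70     # Llama2-70B parameter count

def weight_footprint_gb(params_billion, bytes_per_param):
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param

fp16_gb = weight_footprint_gb(PARAMS_BILLION, 2)  # 2 bytes/param in FP16
fp8_gb = weight_footprint_gb(PARAMS_BILLION, 1)   # 1 byte/param in FP8

# FP16 weights alone nearly fill one GPU, while FP8 weights leave well
# over 100 GB of headroom for the KV cache and larger batches -- which
# is what makes single-GPU (TP1) serving practical.
print(f"FP16 weights: {fp16_gb} GB, FP8 weights: {fp8_gb} GB")
print(f"FP8 headroom on one MI300X: {HBM3_PER_GPU_GB - fp8_gb} GB")
```

The same arithmetic shows why GPUs with smaller memory capacities must shard the model across devices, incurring the inter-GPU communication overhead that TP1 avoids.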
FP8 Quantization
- Quark, developed by AMD, is a toolkit that offers easy-to-use APIs for quantizing models.
- FP8 support extends to LLM weights, activations, and the KV cache, reducing the need for costly FP16 calculations.
- In addition to FP8 support in the AMD repository, vLLM has been enhanced with an upgraded scheduler, improving overall performance.
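For intuition, the E4M3 variant of FP8 has 1 sign bit, 4 exponent bits, and 3 mantissa bits, with a maximum representable magnitude of 448. The sketch below simulates rounding to E4M3 with a per-tensor scale in plain Python; it illustrates the idea behind FP8 quantization and is not Quark's actual API:

```python
import math

E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3

def round_to_e4m3(v):
    """Round a float to the nearest E4M3 value (1 sign, 4 exponent,
    3 mantissa bits). Out-of-range magnitudes saturate at +/-448."""
    if v == 0:
        return 0.0
    sign = math.copysign(1.0, v)
    a = min(abs(v), E4M3_MAX)              # saturate at the format max
    e = max(math.floor(math.log2(a)), -6)  # -6 is the subnormal exponent
    step = 2.0 ** (e - 3)                  # 3 mantissa bits -> 8 steps/binade
    return sign * round(a / step) * step

def quantize_dequantize(values):
    """Per-tensor scaling into the E4M3 range, round, then rescale.
    The round-trip shows the quantization error FP8 introduces."""
    scale = max(abs(v) for v in values) / E4M3_MAX
    return [round_to_e4m3(v / scale) * scale for v in values]

weights = [0.02, -1.3, 0.5, 3.75]
print(quantize_dequantize(weights))
```

A production toolkit additionally chooses scales per channel or per block and calibrates them on sample data, but the scale-round-rescale core is the same.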
Improved Kernels and Libraries
- Significant effort has been invested in profiling and optimizing essential computational kernels.
- The latest hipBLASLt library now includes FP8 support, further boosting performance.
For more detailed technical information, visit the AMD blog.
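Pulling the drivers above together, a hypothetical single-GPU FP8 serving launch with vLLM might look like the following. This is a sketch only: the exact harness, model path, and flags used in the MLPerf submission are not published here.

```shell
# Hypothetical vLLM launch illustrating the TP1 + FP8 serving path
# described above (not the actual MLPerf submission command).
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-2-70b-chat-hf \
  --tensor-parallel-size 1 \
  --quantization fp8
```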
Key Takeaways
Performance Excellence: The PowerEdge XE9680’s performance in the MLPerf 4.1 benchmark underscores the server’s ability to efficiently handle large language model (LLM) generative AI workloads, marking a significant milestone for Dell’s AI infrastructure.
Hardware Synergy: Dell and AMD's collaboration highlights the synergy possible when combining powerful servers with advanced accelerators, paving the way for rapid AI research and application advancements.
Open Source Software: Open-source software plays a crucial role in the AI ecosystem, fostering collaboration, transparency, and rapid innovation. Deploying AMD ROCm and AI frameworks on Instinct MI300X is seamless, allowing researchers and developers to leverage the full potential of open-source frameworks with optimized performance. This ease of deployment ensures that organizations can quickly integrate and scale their AI solutions, driving faster progress in AI research and development.
Benchmarking Leadership: As a leader in MLPerf benchmarking, Dell Technologies and the PowerEdge XE9680 with AMD Instinct MI300X Accelerators set a high bar for competitors and drive the industry toward greater innovation.
Conclusion
The MLPerf 4.1 benchmarks are more than just numbers; they testify to our commitment to advancing AI technology and giving our customers the insights they need to make informed AI infrastructure decisions. The PowerEdge XE9680 server’s impressive results reflect our dedication to providing cutting-edge solutions that meet the ever-growing demands of AI computing.
Server Configuration
| Server Model | GPU | CPU | Software |
|---|---|---|---|
| PowerEdge XE9680 | 8x AMD Instinct MI300X | 2x Intel Xeon Platinum 8460Y+ | ROCm 6.1 |
Resources
- https://www.delltechnologies.com/asset/en-us/products/servers/briefs-summaries/dell-poweredge-amd-rocm-ebook.pdf
- https://www.dell.com/en-us/shop/ipovw/poweredge-xe9680
- https://mlcommons.org/benchmarks/inference-datacenter/
Author(s): Delmar Hernandez, Frank Han
Testing conducted by Dell in July 2024 on a PowerEdge XE9680 with 8x AMD Instinct MI300X GPUs. MLPerf v4.1 Inference results on the Llama2-70B model; result ID 4.1-0022. The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information. Individual results may vary.