Memory bandwidth

BabelStream is a synthetic GPU benchmark that measures memory transfer rates to and from global device memory. Formerly known as GPU-STREAM, it is modeled on the STREAM benchmark for CPUs and is maintained by the High-Performance Computing Group at the University of Bristol. See the official repository at https://github.com/UoB-HPC/BabelStream.
Running BabelStream
The following commands clone the repository and build BabelStream with the HIP backend:
git clone https://github.com/UoB-HPC/BabelStream.git
cd BabelStream
CXX=hipcc cmake -Bbuild -H. -DMODEL=hip -DHIP_ROOT_DIR=/opt/rocm-6.1.0 -DUSE_CUDA=OFF -DUSE_ROCM=ON -DHIP_PATH=/opt/rocm-6.1.0
cmake --build build
The following command runs the benchmark:
./build/hip-stream
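Each rate that the benchmark reports follows the STREAM convention: the bytes a kernel moves divided by its run time. The Python sketch below illustrates that arithmetic. The function name `kernel_bandwidth_tbs`, the array size, and the example timing are assumptions chosen for illustration, not measurements from this study.

```python
# Arrays touched per kernel, following the STREAM convention:
# Copy and Mul read one array and write one; Add and Triad read two and
# write one; Dot reads two arrays and produces only a scalar result.
ARRAYS_PER_KERNEL = {"Copy": 2, "Mul": 2, "Add": 3, "Triad": 3, "Dot": 2}


def kernel_bandwidth_tbs(kernel, array_elements, seconds, bytes_per_element=8):
    """Return the effective bandwidth in TB/s (1 TB = 1e12 bytes)."""
    bytes_moved = ARRAYS_PER_KERNEL[kernel] * array_elements * bytes_per_element
    return bytes_moved / seconds / 1e12


if __name__ == "__main__":
    # Hypothetical inputs: 2**28 double-precision elements per array and a
    # 1 ms run time for the Copy kernel.
    n = 2**28
    print(f"Copy: {kernel_bandwidth_tbs('Copy', n, 1e-3):.2f} TB/s")
```

Because Add and Triad touch three arrays instead of two, they move 1.5 times as many bytes as Copy for the same array size, so their run times must be compared on a bytes-per-second basis instead of directly.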
Interpreting results
The HBM3 memory in the MI300X accelerator primarily drives its memory performance, whereas the MI210 accelerator uses HBM2e memory. Memory capacity also increases from 64 GB on the MI210 to 192 GB on the MI300X. We observed 2.9 to 3.4 times higher memory throughput on the MI300X, which aligns with the 3.3 times improvement in the peak bandwidth specification. The increased memory capacity and throughput are crucial for training, inference, and fine-tuning LLMs and other generative AI models such as Stable Diffusion, because they allow larger parameter counts, datasets, and caches to be handled more efficiently.
Table 9. Comparison of BabelStream benchmarking on MI300X and MI210 accelerators
Benchmark Function	R760xa_MI210 (in TB/s)	XE9680_MI300X (in TB/s)	Speedup of MI300X over MI210
Peak bandwidth	1.6	5.3	3.31x
Copy	1.36	4.01	2.95x
Mul	1.37	3.99	2.91x
Add	1.25	4.09	3.27x
Triad	1.24	4.03	3.25x
Dot	1.27	4.32	3.40x
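
The speedup column, and the share of peak bandwidth that each kernel reaches, can be recomputed directly from Table 9. The following Python sketch does so using the table values as given; the variable and function names are illustrative only.

```python
# Values copied from Table 9, in TB/s: (MI210, MI300X).
PEAK = (1.6, 5.3)
MEASURED = {
    "Copy": (1.36, 4.01),
    "Mul": (1.37, 3.99),
    "Add": (1.25, 4.09),
    "Triad": (1.24, 4.03),
    "Dot": (1.27, 4.32),
}


def speedup(mi210, mi300x):
    """Ratio of MI300X bandwidth to MI210 bandwidth."""
    return mi300x / mi210


def fraction_of_peak(measured, peak):
    """Share of the peak bandwidth that a kernel reached."""
    return measured / peak


if __name__ == "__main__":
    print(f"Peak  speedup {speedup(*PEAK):.2f}x")
    for name, (mi210, mi300x) in MEASURED.items():
        print(
            f"{name:5s} speedup {speedup(mi210, mi300x):.2f}x, "
            f"MI210 {fraction_of_peak(mi210, PEAK[0]):.0%} of peak, "
            f"MI300X {fraction_of_peak(mi300x, PEAK[1]):.0%} of peak"
        )
```

With the table's values, every kernel reaches roughly 75% to 86% of the listed peak bandwidth on both accelerators.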
