Memory bandwidth
BabelStream is a synthetic GPU benchmark that measures memory transfer rates to and from global device memory. Formerly known as GPU-STREAM, it is based on the STREAM benchmark for CPUs and is maintained by the High-Performance Computing Group at the University of Bristol. See https://github.com/UoB-HPC/BabelStream for the official repository.
Running BabelStream
The following commands clone the repository and build BabelStream with the HIP backend:
git clone https://github.com/UoB-HPC/BabelStream.git
cd BabelStream
CXX=hipcc cmake -Bbuild -H. -DMODEL=hip -DHIP_ROOT_DIR=/opt/rocm-6.1.0 -DUSE_CUDA=OFF -DUSE_ROCM=ON -DHIP_PATH=/opt/rocm-6.1.0
cmake --build build
The following command runs the benchmark:
./build/hip-stream
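The benchmark reports a bandwidth figure for each of five kernels (Copy, Mul, Add, Triad, and Dot, the rows of Table 9). As a rough illustration of what each kernel computes and how a bandwidth figure follows from a timing, here is a minimal CPU-side sketch in Python. It is not BabelStream's HIP code: the function names, the `ARRAYS_MOVED` table, and the `bandwidth_tb_per_s` helper are hypothetical, and the per-kernel array counts assume the usual STREAM convention of counting every array that a kernel reads or writes once per element.

```python
# Illustrative model of the five kernels; names here are hypothetical,
# not taken from the BabelStream sources.

# Assumed number of arrays streamed per element (reads + writes).
ARRAYS_MOVED = {"Copy": 2, "Mul": 2, "Add": 3, "Triad": 3, "Dot": 2}


def copy(a, b, c):
    c[:] = a  # c[i] = a[i]


def mul(a, b, c, scalar):
    b[:] = [scalar * x for x in c]  # b[i] = scalar * c[i]


def add(a, b, c):
    c[:] = [x + y for x, y in zip(a, b)]  # c[i] = a[i] + b[i]


def triad(a, b, c, scalar):
    a[:] = [y + scalar * z for y, z in zip(b, c)]  # a[i] = b[i] + scalar * c[i]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))  # sum of a[i] * b[i]


def bandwidth_tb_per_s(kernel, n, elem_bytes, seconds):
    """Bytes streamed by one kernel launch divided by its run time, in TB/s."""
    bytes_moved = ARRAYS_MOVED[kernel] * n * elem_bytes
    return bytes_moved / seconds / 1e12
```

For example, under these assumptions a Triad pass over 2**25 double-precision elements that takes 0.2 ms corresponds to roughly 4 TB/s, the same order as the MI300X figures in Table 9.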
Interpreting results
The HBM3 memory in the MI300X accelerator primarily drives the memory performance. The MI210 accelerator uses HBM2e memory, and memory capacity increases from 64 GB on the MI210 to 192 GB on the MI300X. We observed 2.9 to 3.4 times higher memory throughput on the MI300X, which aligns with the 3.3 times improvement in the specified peak bandwidth. The increased memory capacity and throughput are crucial for training, inference, and fine-tuning LLMs and other generative AI models such as Stable Diffusion, because they allow larger parameter counts, datasets, and caches to be handled more efficiently.
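To give a feel for what the capacity increase means for model size, the sketch below estimates an upper bound on the number of parameters whose weights alone fit in device memory. It is a back-of-the-envelope calculation under stated assumptions, not a sizing rule from this guide: it assumes 2 bytes per parameter (for example FP16 or BF16), decimal gigabytes, and it ignores activations, optimizer state, and KV caches, all of which reduce the usable figure in practice. The function name is hypothetical.

```python
def max_params_billion(capacity_gb, bytes_per_param=2):
    """Upper bound on parameters (in billions) whose weights alone fit.

    Assumes decimal GB and counts weights only; real workloads also need
    room for activations, caches, and (for training) optimizer state.
    """
    capacity_bytes = capacity_gb * 1e9
    return capacity_bytes / bytes_per_param / 1e9


for name, capacity_gb in [("MI210", 64), ("MI300X", 192)]:
    print(f"{name}: up to ~{max_params_billion(capacity_gb):.0f}B parameters at 2 bytes each")
```

Under these assumptions, the weights-only ceiling scales linearly with capacity, so the MI300X bound is three times the MI210 bound.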
Table 9. Comparison of BabelStream benchmarking on MI300X and MI210 accelerators
Benchmark Function | R760xa_MI210 (in TB/s) | XE9680_MI300X (in TB/s) | Speedup of MI300X over MI210 |
Peak bandwidth | 1.6 | 5.3 | 3.31x |
Copy | 1.36 | 4.01 | 2.95x |
Mul | 1.37 | 3.99 | 2.91x |
Add | 1.25 | 4.09 | 3.27x |
Triad | 1.24 | 4.03 | 3.25x |
Dot | 1.27 | 4.32 | 3.40x |
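The speedup column can be recomputed directly from the two bandwidth columns, and the same values show how much of the specified peak each kernel sustains. The following short Python sketch does both; the numbers are copied from Table 9, while the variable and function names are hypothetical.

```python
# Bandwidth values from Table 9 in TB/s: (MI210, MI300X).
PEAK = (1.6, 5.3)
MEASURED = {
    "Copy": (1.36, 4.01),
    "Mul": (1.37, 3.99),
    "Add": (1.25, 4.09),
    "Triad": (1.24, 4.03),
    "Dot": (1.27, 4.32),
}


def speedup(mi210, mi300x):
    """Ratio of MI300X bandwidth to MI210 bandwidth."""
    return mi300x / mi210


def fraction_of_peak(measured, peak):
    """Share of the specified peak bandwidth that a kernel sustains."""
    return measured / peak


print(f"Peak: {speedup(*PEAK):.2f}x")
for kernel, (mi210, mi300x) in MEASURED.items():
    print(
        f"{kernel}: {speedup(mi210, mi300x):.2f}x, "
        f"{fraction_of_peak(mi210, PEAK[0]):.0%} of peak on MI210, "
        f"{fraction_of_peak(mi300x, PEAK[1]):.0%} of peak on MI300X"
    )
```

Comparing sustained bandwidth against peak in this way helps separate what the memory technology can deliver on paper from what a streaming kernel actually achieves on each accelerator.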