HPL-AI

The HPL-MxP benchmark (also known as HPL-AI) highlights the convergence of HPC and AI workloads by solving a system of linear equations with mixed-precision algorithms. See HPL-MxP for more information.
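To make the mixed-precision idea concrete, the following is a minimal sketch in plain Python and is not the benchmark's implementation. It assumes a small, diagonally dominant matrix (so pivoting can be skipped), emulates the low-precision factorization by rounding to FP32 with the standard `struct` module, and uses classical iterative refinement with FP64 residuals. All function names are hypothetical, and the real benchmark runs on GPUs with its own precision formats and refinement scheme.

```python
import struct


def to_fp32(value):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def lu_factor_fp32(a):
    """LU-factor a copy of `a` without pivoting, rounding each result to FP32."""
    n = len(a)
    lu = [[to_fp32(v) for v in row] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] = to_fp32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = to_fp32(lu[i][j] - lu[i][k] * lu[k][j])
    return lu


def lu_solve(lu, b):
    """Solve L U x = b using the packed factors (L has a unit diagonal)."""
    n = len(lu)
    x = list(b)
    for i in range(n):  # forward substitution with L
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # backward substitution with U
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def residual_fp64(a, x, b):
    """Compute r = b - A x in full FP64 precision."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(a, b)]


def solve_mixed_precision(a, b, refinements=5):
    """Low-precision factorization followed by FP64 iterative refinement."""
    lu = lu_factor_fp32(a)           # the expensive O(n^3) step, in low precision
    x = lu_solve(lu, b)              # initial, low-accuracy solution
    for _ in range(refinements):
        r = residual_fp64(a, x, b)   # cheap O(n^2) step, in FP64
        correction = lu_solve(lu, r)
        x = [xi + ci for xi, ci in zip(x, correction)]
    return x


if __name__ == "__main__":
    # Illustrative, diagonally dominant test system (values are made up).
    a = [[4.1, 1.0, 0.3], [0.7, 5.2, 1.1], [0.2, 1.3, 6.3]]
    x_true = [1.0, -2.0, 3.0]
    b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in a]
    x = solve_mixed_precision(a, b)
    print("max error:", max(abs(xi - ti) for xi, ti in zip(x, x_true)))
```

The design point is that the cubic-cost factorization runs in the cheaper precision, while only the quadratic-cost residual and update steps use FP64 to recover accuracy.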

HPL-MxP performance generally improves as the matrix size grows, because larger problems make fuller use of the available compute resources. In most cases, the benchmark is configured to maximize GPU memory utilization by adjusting the global matrix size. The best parameter values depend on the specific hardware setup. For instance, on NVIDIA GPUs, tuning the matrix dimensions, block size, and process grid dimensions is crucial to achieving performance that matches the system's architecture.
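As a rough illustration of how such parameters might be derived, the sketch below estimates a global matrix size from the aggregate GPU memory and picks a near-square process grid. The helper names, the 90% memory fraction, the 8 bytes per element, and the block size of 1024 are all assumptions chosen for illustration; they are not values taken from this document or from the benchmark, and real runs should be tuned empirically.

```python
import math


def estimate_matrix_size(num_gpus, mem_per_gpu_gib, bytes_per_element=8,
                         mem_fraction=0.9, block_size=1024):
    """Largest N (a multiple of block_size) whose N x N matrix fits in memory.

    Assumes the matrix dominates memory use and is spread evenly over all GPUs.
    """
    total_bytes = num_gpus * mem_per_gpu_gib * 1024**3
    usable_elements = int(total_bytes * mem_fraction) // bytes_per_element
    n = math.isqrt(usable_elements)          # side of the largest square matrix
    return (n // block_size) * block_size    # round down to a block multiple


def near_square_grid(num_processes):
    """Return (P, Q) with P * Q == num_processes and P <= Q, as square as possible."""
    p = math.isqrt(num_processes)
    while num_processes % p != 0:
        p -= 1
    return p, num_processes // p


if __name__ == "__main__":
    # Hypothetical node: 8 GPUs with 80 GiB each, one process per GPU.
    n = estimate_matrix_size(num_gpus=8, mem_per_gpu_gib=80)
    p, q = near_square_grid(8)
    print(f"N = {n}, grid = {p} x {q}")
```

A near-square grid is only a common starting point for HPL-style benchmarks; the best shape also depends on the interconnect and node topology.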

Figure 9. HPL-MxP Node Scaling Performance Using 1, 2, and 4 Nodes

Table 4. Speedup comparison between mixed-precision workload and HPC FP64 precision workload
Benchmark (TFLOPS)	One node	Two nodes	Four nodes
NVIDIA HPL	368	752	1489
NVIDIA HPL-MxP	1435	3494	8199
Speedup, HPL-MxP/HPL (ratio)	3.90	4.65	5.51

Because it combines HPC and AI workloads in a single mixed-precision linear solve, the HPL-MxP benchmark provides a more realistic representation of the mixed-precision tasks found in modern deep learning and large language models. Table 4 shows the speedup achieved when moving from a pure 64-bit (FP64) workload to the HPL-MxP mixed-precision workload: the ratio rises from 3.90 on one node to 5.51 on four nodes.
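The speedup row of Table 4 can be reproduced directly from the two throughput rows. The short sketch below does that and also derives a scaling efficiency relative to one node, which is an extra metric not reported in the table; the variable and function names are illustrative only.

```python
# Throughput in TFLOPS, copied from Table 4 (keys are node counts).
hpl_tflops = {1: 368, 2: 752, 4: 1489}
mxp_tflops = {1: 1435, 2: 3494, 4: 8199}


def speedup(mixed, fp64):
    """Ratio of mixed-precision throughput to FP64 throughput."""
    return mixed / fp64


def scaling_efficiency(results, nodes):
    """Measured throughput divided by ideal linear scaling from one node."""
    return results[nodes] / (nodes * results[1])


if __name__ == "__main__":
    for nodes in sorted(hpl_tflops):
        s = speedup(mxp_tflops[nodes], hpl_tflops[nodes])
        eff = scaling_efficiency(mxp_tflops, nodes)
        print(f"{nodes} node(s): speedup {s:.2f}, HPL-MxP scaling efficiency {eff:.2f}")
```

An efficiency above 1.0 means throughput grew faster than the node count, which is consistent with larger matrices using the hardware more effectively.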
