Streaming read and write throughput


Streaming reads and streaming writes are critical for large AI infrastructures. Streaming reads supply data to the GPUs during training and fine-tuning operations. Streaming writes are critical for saving checkpoints to protect against unexpected failures during long training runs of large AI models.
To demonstrate the solution's streaming read and write capabilities, two tests were run. The first used 16 Dell PowerEdge servers to show scale; the second used two NVIDIA DGX systems to show RDMA performance to the DGX. Both tests used FIO (Flexible I/O Tester), a tool widely used across the industry to exercise different I/O profiles in compute and storage infrastructures. It has proven useful for characterizing compute-to-storage and storage-to-compute performance.
The following settings were used for this testing:
Ioengine = posixio
Blocksize = 1024
Direct = 1
Iodepth = 64
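As a rough illustration, the settings listed above could be rendered into a fio-style job section with a small helper. This is only a sketch: the helper `render_fio_job`, the job name `streaming-read`, and the `rw=read` mode are assumptions made for illustration, not details from the tests. The setting values are copied verbatim from the list. fio's documentation lists engine names such as `psync` and `posixaio`, so the `ioengine` value may need adjusting before fio accepts it, and the block size is given without a unit. A real job would also need a target (for example `directory` or `filename`) and a `size`, which are not stated here.

```python
# Hypothetical helper: renders one fio job section as INI-style text.
def render_fio_job(name, rw, settings):
    """Return a fio job section with one key=value line per setting."""
    lines = [f"[{name}]", f"rw={rw}"]
    lines += [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


# Values copied verbatim from the settings list; not validated against fio.
SETTINGS = {
    "ioengine": "posixio",  # as written in the source; may not be a valid fio engine name
    "blocksize": "1024",    # unit not stated in the source
    "direct": "1",
    "iodepth": "64",
}

# Job name and rw mode are assumptions for illustration only.
job_text = render_fio_job("streaming-read", "read", SETTINGS)
print(job_text)
```

Keeping the settings in one mapping makes it easy to emit a matching write job by changing only the `rw` argument.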
The following graph shows the read and write performance as the clients are scaled from one to 16 Dell PowerEdge servers. The results show linear scaling for both reads and writes.
Note: Each client was limited by its two 100 Gb network ports. The scaling shows that the workload fully saturated these ports.
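The port limit in the note above implies a simple line-rate ceiling per client and for the whole client set. The sketch below works through that arithmetic, assuming 100 Gb means 100 gigabits per second per port and ignoring protocol and encoding overhead, so achievable throughput would be somewhat lower. The function names are hypothetical.

```python
# Line-rate ceiling implied by two 100 Gb ports per client.
# Assumption: 100 Gb = 100 gigabits per second, with no protocol overhead.
PORTS_PER_CLIENT = 2
PORT_SPEED_GBIT = 100


def client_ceiling_gbytes(ports=PORTS_PER_CLIENT, port_speed_gbit=PORT_SPEED_GBIT):
    """Per-client ceiling in gigabytes per second (8 bits per byte)."""
    return ports * port_speed_gbit / 8


def aggregate_ceiling_gbytes(clients):
    """Ceiling for a number of clients, assuming perfectly linear scaling."""
    return clients * client_ceiling_gbytes()


for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} clients: {aggregate_ceiling_gbytes(n):.1f} GB/s ceiling")
```

Under these assumptions, each client tops out at 25 GB/s, and 16 clients at 400 GB/s. These are network ceilings, not measured results.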
The next test measured NFSoRDMA performance with two DGX systems. The results in the graph show linear scaling from one to two DGX systems as the PowerScale nodes and DGX systems are scaled during the FIO tests.
Note: For this testing, the maximum bandwidth to a DGX H100 is achieved by using the 4xQSFP ports configured as north-south storage fabric ports. The maximum bandwidth that GPUs require during inferencing, fine-tuning, and training varies significantly and depends on the model and development method used.
