Storage Performance Requirements
Performance requirements for high-speed storage greatly depend on the types of AI models and data formats to be used. DGX SuperPOD is designed to manage a variety of large-scale AI workloads both today and in the future. However, if systems are going to focus on a specific workload, such as natural language processing (NLP), it may be possible to better estimate the performance needs of the storage system.
To enable customers to characterize their own performance requirements, some general guidance on common workloads and datasets is shown in Table 3.
| Storage performance level required | Example workloads | Dataset size |
|---|---|---|
| Good | NLP | Most to all of the dataset fits in cache |
| Better | Image processing with compressed images (for example, ImageNet/ResNet-50) | Many to most datasets fit within the local system's cache |
| Best | Training with 1080p, 4K, or uncompressed images; offline inference; ETL | Datasets too large to fit into cache; massive first-epoch I/O requirements; workflows that read the dataset only once |
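The sizing logic behind these tiers can be illustrated with a short sketch that compares dataset size to the local system's cache capacity. The function name and the ratio cutoffs below are illustrative assumptions for this example only, not Dell or NVIDIA sizing guidance.

```python
def storage_tier(dataset_bytes: int, cache_bytes: int) -> str:
    """Map a dataset-to-cache ratio onto the Good/Better/Best tiers of Table 3.

    The ratio cutoffs (1.0 and 2.0) are illustrative assumptions;
    real sizing should follow vendor guidance for the specific workload.
    """
    ratio = dataset_bytes / cache_bytes
    if ratio <= 1.0:
        # Most to all of the dataset fits in cache (for example, NLP).
        return "Good"
    if ratio <= 2.0:
        # Many to most datasets fit in the local cache (compressed images).
        return "Better"
    # Dataset far exceeds cache: heavy first-epoch I/O, read-once workflows.
    return "Best"


TiB = 1024**4
print(storage_tier(2 * TiB, 4 * TiB))   # → Good: dataset fits in cache
print(storage_tier(30 * TiB, 4 * TiB))  # → Best: dataset far exceeds cache
```

In practice, the cache capacity to use in such an estimate is the aggregate local NVMe/RAM cache available to each node, and the dataset size is the working set touched per epoch.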
Performance estimates for a storage system that meets the guidelines in Table 3 are provided in:
Achieving these performance characteristics may require the use of optimized file formats such as TFRecord, RecordIO, or HDF5.
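These formats improve throughput by packing many small samples into large files that can be read sequentially. The sketch below illustrates the underlying idea with a simple length-prefixed record file; it is not the actual TFRecord or RecordIO wire format (which also carries CRC checksums and other metadata), just a minimal demonstration of the packing concept.

```python
import struct


def write_records(path, records):
    """Pack byte records into one file, each prefixed with an 8-byte length.

    Sequential packing turns many small file opens and reads into a few
    large streaming reads, which is the core benefit of formats such as
    TFRecord and RecordIO.
    """
    with open(path, "wb") as f:
        for rec in records:
            f.write(struct.pack("<Q", len(rec)))  # little-endian u64 length
            f.write(rec)


def read_records(path):
    """Stream the records back by reading length prefixes sequentially."""
    out = []
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            (length,) = struct.unpack("<Q", header)
            out.append(f.read(length))
    return out


samples = [b"sample-%d" % i for i in range(3)]
write_records("/tmp/demo.records", samples)
assert read_records("/tmp/demo.records") == samples
```

Reading one large packed file sequentially is far friendlier to a shared file system than opening millions of individual sample files, which is why these formats are recommended for large training datasets.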
The high-speed storage provides a shared view of an organization’s data to all systems. It needs to be optimized for small, random I/O patterns, and provide high peak system performance. The storage must also provide high aggregate file system performance to meet the variety of training workloads an organization may encounter.
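A quick way to sanity-check a file system's behavior under small, random I/O is a micro-benchmark that issues fixed-size reads at random aligned offsets. The sketch below is an illustrative, single-threaded example using `os.pread`; validating an actual DGX SuperPOD storage fabric would instead use purpose-built tools such as fio with many parallel workers and direct I/O.

```python
import os
import random
import time


def random_read_benchmark(path, block_size=4096, num_reads=1000, seed=0):
    """Issue fixed-size reads at random aligned offsets; return (bytes, secs).

    Illustrative only: a real storage benchmark would run many threads or
    processes and bypass the page cache (for example, with O_DIRECT).
    """
    rng = random.Random(seed)
    size = os.path.getsize(path)
    max_block = max(size // block_size, 1)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        total = 0
        for _ in range(num_reads):
            offset = rng.randrange(max_block) * block_size
            total += len(os.pread(fd, block_size, offset))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return total, elapsed


# Example: random 4 KiB reads against a small scratch file.
with open("/tmp/scratch.bin", "wb") as f:
    f.write(os.urandom(8 * 1024 * 1024))  # 8 MiB test file
read_bytes, secs = random_read_benchmark("/tmp/scratch.bin")
print(f"{read_bytes / secs / 1e6:.1f} MB/s over {secs:.3f} s")
```

Results from a single client only bound per-node performance; aggregate file system throughput under many concurrent training jobs must be measured with all nodes driving load at once.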