NVIDIA DGX SuperPOD™ with NVIDIA DGX™ H100, DGX™ A100, or DGX™ B200 systems is an artificial intelligence (AI) supercomputing infrastructure that provides the computational power necessary to train today's state-of-the-art deep learning (DL) models and to fuel future innovation. DGX SuperPOD delivers groundbreaking performance, deploys as a fully integrated system, and is designed to solve the world's most challenging computational problems.
This storage reference architecture (RA) for NVIDIA DGX SuperPOD is the result of collaboration between AI scientists, application performance engineers, and system architects to build a system capable of supporting the widest range of DL workloads. The performance delivered by DGX SuperPOD enables the rapid training of DL models at scale. The integrated approach to provisioning, management, compute, networking, and fast storage enables a versatile system that can span data analytics, model development, and AI inference.
PowerScale OneFS network-attached storage (NAS) was evaluated for suitability in supporting AI workloads when connected to NVIDIA DGX SuperPOD. PowerScale OneFS is a clustered NAS system consisting of nodes that each provide storage, compute, and memory. The nodes cluster together to create a massively performant file system with single-pane-of-glass management and robust data services. NVMe-based PowerScale F710 nodes were used in this qualification. At every level, PowerScale makes data management and access simple and robust for powering AI training and inferencing at scale.
PowerScale storage appliances support the NVIDIA DGX SuperPOD architecture with benefits such as: