This document presents a high-performance architecture for DL that combines Dell PowerEdge R7525 systems with NVIDIA A100 Tensor Core GPUs, Dell S5248F-ON and Z9332F-ON switches, and Dell ECS EXF900 all-flash object storage. We discuss the key features of ECS that make it a powerful persistent storage layer for DL solutions. This architecture extends the commitment of Dell Technologies to make AI simple and accessible to every organization, and offers customers informed choice and flexibility in how they deploy high-performance DL at scale. Throughout the benchmarks, we validated that the ECS EXF900 object storage could keep pace with NVIDIA A100 Tensor Core GPUs and scale performance linearly.
It is important to note that DL algorithms have diverse requirements, with varying compute, memory, I/O, and disk capacity profiles. Nevertheless, the architecture and the performance data points presented in this white paper can serve as a starting point for building DL solutions tailored to different resource requirements. More importantly, all components of this architecture scale linearly and can be expanded independently, enabling DL solutions that manage tens of PBs of data.
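As a brief illustration of how such a scale-out design is typically consumed (the object-key names below are hypothetical, not taken from the paper): ECS exposes an S3-compatible API, so a DL data pipeline can shard the training set's object keys across GPU workers and let each worker issue its own GET requests concurrently, which is what allows read throughput to grow with the number of GPUs.

```python
# Sketch: round-robin sharding of training-object keys across GPU
# workers, so each worker fetches a disjoint subset from an ECS
# bucket via its S3-compatible API. Key names are illustrative.

def shard_keys(keys, num_workers, worker_id):
    """Assign every num_workers-th key to the given worker."""
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in [0, num_workers)")
    return keys[worker_id::num_workers]

# Example: 8 training objects split across 4 workers, 2 keys each.
keys = [f"train/sample-{i:05d}.tfrecord" for i in range(8)]
shards = [shard_keys(keys, 4, w) for w in range(4)]

# Each worker could then fetch its shard in parallel, for example
# with an S3 client pointed at the ECS endpoint:
#   boto3.client("s3", endpoint_url="https://ecs.example.com")
#        .get_object(Bucket="training-data", Key=key)
```

Because the shards are disjoint, adding workers adds independent I/O streams rather than contention, which is the access pattern behind the linear scaling observed in the benchmarks.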
While the solution presented here provides several performance data points and highlights the effectiveness of ECS in handling large-scale DL workloads, persisting DL data on ECS also brings several other operational benefits.
In summary, ECS-based DL solutions deliver the capacity, performance, and high concurrency needed to eliminate storage I/O bottlenecks for AI. This provides a solid foundation for large-scale, enterprise-grade DL solutions, with a future-proof scale-out architecture that meets your AI needs today and scales for the future.