In the solution architecture, the functional modules can be scaled to match the use case and capacity requirements. For example, the minimum training module unit for large model training consists of eight PowerEdge XE9680 servers with 64 NVIDIA H100 GPUs (eight GPUs per server).
As a theoretical example, a single training module paired with an InfiniBand module could train a 175B-parameter model in 112 days. To illustrate the scalability, six copies of these modules could train the same model in approximately 19 days (112 / 6 ≈ 18.7, assuming near-linear scaling). As another example, for a 40B-parameter model, two copies of the training module are sufficient to train the model in 14 days.
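As a rough illustration of this near-linear scaling, the following sketch estimates training time from the single-module baseline figure quoted above. The function and the scaling_efficiency parameter are illustrative assumptions, not part of the validated design; real-world efficiency depends on the model, framework, and interconnect.

```python
def estimated_training_days(baseline_days: float,
                            num_modules: int,
                            scaling_efficiency: float = 1.0) -> float:
    """Back-of-the-envelope training time across num_modules training modules.

    scaling_efficiency < 1.0 models real-world deviation from the
    idealized linear scaling used in the example figures above.
    """
    return baseline_days / (num_modules * scaling_efficiency)

# 175B-parameter example from the text: one module takes ~112 days.
print(round(estimated_training_days(112, 1)))  # 112
print(round(estimated_training_days(112, 6)))  # 19  (112 / 6 ~= 18.7)
```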
There is a similar scalability concept for the InfiniBand module. For example, one module with two NVIDIA QM9700 switches can support up to 24 PowerEdge XE9680 servers. Doubling the InfiniBand module in a fat-tree architecture scales support to 48 PowerEdge XE9680 servers. The Ethernet and Inference modules scale similarly.
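A simplified sizing helper under the ratios stated above (two QM9700 switches per InfiniBand module, up to 24 XE9680 servers per module) can make this concrete; the constant and function names below are illustrative, and real fat-tree designs involve additional cabling and oversubscription considerations.

```python
import math

# Ratios from the text: one InfiniBand module (two QM9700 switches)
# supports up to 24 PowerEdge XE9680 servers; modules combine in a fat-tree.
SERVERS_PER_IB_MODULE = 24
SWITCHES_PER_IB_MODULE = 2

def infiniband_modules_needed(num_servers: int) -> int:
    """Minimum number of InfiniBand modules for a given server count."""
    return math.ceil(num_servers / SERVERS_PER_IB_MODULE)

for servers in (8, 24, 48):
    modules = infiniband_modules_needed(servers)
    print(f"{servers} servers -> {modules} IB module(s), "
          f"{modules * SWITCHES_PER_IB_MODULE} QM9700 switches")
```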
The Data module is powered by scale-out storage solutions that scale linearly to meet performance and capacity requirements as you increase the number of servers and GPUs in your Training and Inference modules.
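To illustrate this linear scale-out behavior, the sketch below sizes aggregate storage read throughput against GPU count. The per-GPU bandwidth figure is a placeholder assumption for demonstration only, not a published specification; actual requirements depend on the model, data pipeline, and checkpointing strategy.

```python
# Illustrative capacity-planning sketch for the Data module.
PER_GPU_READ_GBPS = 2.0   # placeholder assumption, not a published figure
GPUS_PER_XE9680 = 8       # eight H100 GPUs per PowerEdge XE9680

def required_storage_throughput_gbps(num_servers: int) -> float:
    """Aggregate read throughput the scale-out storage must sustain."""
    return num_servers * GPUS_PER_XE9680 * PER_GPU_READ_GBPS

# One training module (8 servers) vs. six modules (48 servers):
print(required_storage_throughput_gbps(8))   # 128.0 GB/s
print(required_storage_throughput_gbps(48))  # 768.0 GB/s
```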
Scalability and modularity are intrinsic to the Dell and NVIDIA design for generative AI.