Summary

GPU acceleration is pivotal for powering a broad range of demanding applications, particularly AI. The Liqid SmartStack MX for Dell PowerEdge MX7000 enables on-demand allocation of enterprise-grade PCIe GPUs to Dell PowerEdge compute sleds. This integration provides a modular infrastructure that accelerates AI-driven workloads and lets organizations adopt new GPU technology within their chosen server form factor.

GPU Acceleration for PowerEdge MX7000

The Dell PowerEdge MX7000 Modular Chassis stands at the forefront of modernizing dynamic data centers, streamlining the deployment and management of complex workloads. Pivotal to this modernization is the Liqid SmartStack MX, which introduces GPU acceleration capabilities essential for compute-intensive environments, including AI and graphics-intensive applications.

By adding SmartStack MX, customers can dynamically connect and scale enterprise-grade GPUs from NVIDIA, AMD, and Intel to the Dell PowerEdge compute sleds at the bare-metal level. The sleds connect directly to pools of 10, 20, or 30 GPUs via PCIe, and the Liqid Matrix software can dynamically attach up to 20 enterprise GPUs, such as the NVIDIA L40S, to a single MX760c compute sled. The solution also allows GPUs to be reallocated between compute sleds as workload demands evolve, and it supports provisioning to Dell PowerEdge R- and C-series rackmount servers. This enables migration between platforms for better utilization, agility, and investment protection.

Figure 1. SmartStack MX20 Complete System

Liqid SmartStack MX systems are fully validated, composable solutions designed to meet your most challenging GPU requirements. Each system includes several key components. The Liqid EX-4410 PCIe expansion chassis holds up to 10 FHFL double-width GPUs. The Liqid Director houses the Liqid Matrix software used for GPU provisioning, also known as composing. The Liqid 48-port PCIe Gen 4.0 switch is used in the SmartStack MX20, MX30, and MX30+ systems. Depending on the system configuration, either four or eight PCIe HBAs are housed within the CoreModuleXL, an expansion module provided by Amulet Hotkey. This module occupies the B1 and B2 fabrics in the MX7000 and serves as the direct connection between the Dell compute sleds and the Liqid PCIe fabric and GPUs. The SmartStack MX supports attaching GPUs to the following Dell PowerEdge compute sleds: MX760c, MX750c, and MX740c.

Liqid SmartStack MX Series Technical Specifications


	SmartStack MX10	SmartStack MX20	SmartStack MX30	SmartStack MX30+
Description	10 GPU / 4 Host Capacity	20 GPU / 8 Host Capacity	30 GPU / 8 Host Capacity	30 GPU / 16 Host Capacity
No. of MX7000 Chassis	1x MX7000	1x MX7000	1x MX7000	2x MX7000
Max GPUs per MX7000 Enclosure	10x Full-height, full-length (FHFL) 10.5”, dual-slot	20x Full-height, full-length (FHFL) 10.5”, dual-slot	30x Full-height, full-length (FHFL) 10.5”, dual-slot	30x Full-height, full-length (FHFL) 10.5”, dual-slot
Supported Device Types	GPU, NVMe, FPGA, DPU	GPU, NVMe, FPGA, DPU	GPU, NVMe, FPGA, DPU	GPU, NVMe, FPGA, DPU
Max Hosts	4x Compute Sleds	8x Compute Sleds	8x Compute Sleds	16x Compute Sleds
PCIe Expansion Chassis	1x Liqid EX-4410 - 10-slot	2x Liqid EX-4410 - 10-slot	3x Liqid EX-4410 - 10-slot	3x Liqid EX-4410 - 10-slot
PCIe Fabric Switch	Integrated Switch	1x 48-Port Switch	1x 48-Port Switch	2x 48-Port Switches
PCIe Fabric HBA	1x Fabric B CoreModuleXL w/ 4x PCIe Gen4 x16 HBAs	1x Fabric B CoreModuleXL w/ 8x PCIe Gen4 x16 HBAs	1x Fabric B CoreModuleXL w/ 8x PCIe Gen4 x16 HBAs	2x Fabric B CoreModuleXL w/ 16x PCIe Gen4 x16 HBAs
Rack Units	5U	10U	14U	15U
Composable Devices	See liqid.com/resources/library for the current hardware compatibility list of composable PCIe devices

Table 1. Liqid SmartStack Solutions
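
As a rough sizing aid, the capacities in Table 1 can be expressed as a small lookup. The following is only a sketch: the class name, field names, and the `smallest_config` helper are hypothetical and not part of any Liqid or Dell tool, and the selection rule (fewest rack units that fits) is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SmartStackConfig:
    name: str
    max_gpus: int    # GPU capacity, from Table 1
    max_hosts: int   # compute sled capacity, from Table 1
    chassis: int     # number of MX7000 chassis, from Table 1
    rack_units: int  # rack units as listed in Table 1


# Values transcribed from Table 1.
CONFIGS = [
    SmartStackConfig("SmartStack MX10", 10, 4, 1, 5),
    SmartStackConfig("SmartStack MX20", 20, 8, 1, 10),
    SmartStackConfig("SmartStack MX30", 30, 8, 1, 14),
    SmartStackConfig("SmartStack MX30+", 30, 16, 2, 15),
]


def smallest_config(gpus: int, hosts: int) -> Optional[SmartStackConfig]:
    """Return the fitting configuration with the fewest rack units, or None.

    Hypothetical helper: the 'fewest rack units' rule is an assumption,
    not a vendor sizing recommendation.
    """
    candidates = [
        c for c in CONFIGS if c.max_gpus >= gpus and c.max_hosts >= hosts
    ]
    return min(candidates, key=lambda c: c.rack_units, default=None)
```

For example, `smallest_config(12, 6)` selects the SmartStack MX20, while a request for more than 30 GPUs returns `None` because no configuration in Table 1 covers it.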

Implementing GPU Expansion for MX

Figure 2. Liqid Matrix User Interface

First, install GPUs into the PCIe expansion chassis. Supported GPUs are listed on the Liqid HCL (Hardware Compatibility List). Then connect the HBAs in the CoreModuleXL, which occupies MX7000 Fabrics B1 and B2, to the Liqid PCIe fabric. Liqid Matrix software connects to the fabric via the Liqid Director and is used to provision resources. Liqid also supports provisioning other PCIe resources to Dell PowerEdge compute sleds, including Liqid NVMe SSDs, NICs, and DPUs.

Software Defined GPU Deployment

Once PCIe devices are connected to the MX7000, Liqid Matrix software enables the dynamic allocation of GPUs to MX compute sleds at the bare-metal level and supports GPU hot-plug. Up to 20 GPUs can be added to a single MX760c compute sled via the Liqid UI or a RESTful API to meet end-user workload requirements. The MX750c supports up to 20 GPUs and the MX740c supports up to 16 GPUs per compute sled. To the operating system, the GPUs are presented as local resources directly connected to the MX compute sled over PCIe. Most operating systems are supported, including Linux, Windows, and VMware. Liqid also provides plug-ins for SLURM and Kubernetes. As workload needs change, resources such as GPUs, NVMe SSDs, and FPGAs can be added or removed on the fly via software.
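
Because composed GPUs appear to the operating system as local PCIe devices, one way to confirm a composition on a Linux sled is to count display-class devices reported by `lspci -nn`. The sketch below is illustrative only and is not a Liqid tool: the function names are hypothetical, the per-sled limits are taken from the paragraph above, and the vendor filter assumes that only NVIDIA, AMD, and Intel GPUs are of interest (it would also count any integrated graphics from those vendors).

```python
import re
import subprocess

# PCI class codes for display devices: 0300 = VGA controller, 0302 = 3D controller.
GPU_CLASS_CODES = {"0300", "0302"}
# PCI vendor IDs for the three GPU vendors named in this document.
GPU_VENDORS = {"10de": "NVIDIA", "1002": "AMD", "8086": "Intel"}
# Per-sled GPU limits as stated in the text above.
MAX_GPUS_PER_SLED = {"MX760c": 20, "MX750c": 20, "MX740c": 16}

# Captures the "[class]" and trailing "[vendor:device]" fields of `lspci -nn` lines.
LINE_RE = re.compile(r"\[([0-9a-f]{4})\]:.*\[([0-9a-f]{4}):([0-9a-f]{4})\]")


def count_gpus(lspci_text: str) -> dict:
    """Count display-class devices per GPU vendor in `lspci -nn` output."""
    counts = {}
    for line in lspci_text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        class_code, vendor_id, _device_id = match.groups()
        if class_code in GPU_CLASS_CODES and vendor_id in GPU_VENDORS:
            vendor = GPU_VENDORS[vendor_id]
            counts[vendor] = counts.get(vendor, 0) + 1
    return counts


def within_sled_limit(sled_model: str, counts: dict) -> bool:
    """Check the total GPU count against the documented per-sled maximum."""
    return sum(counts.values()) <= MAX_GPUS_PER_SLED[sled_model]


def gpus_on_this_host() -> dict:
    """Run `lspci -nn` locally (Linux with pciutils installed) and count GPUs."""
    result = subprocess.run(
        ["lspci", "-nn"], capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)
```

Running `gpus_on_this_host()` before and after a compose operation should show the count change if the hot-plug succeeded. The parsing is kept separate from the `lspci` call so it can be exercised against captured output.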

Enabling GPU Peer-2-Peer Capability

A key feature of the SmartStack MX is support for RDMA Peer-2-Peer (P2P) communication between GPUs in a single chassis, across multiple Liqid expansion chassis, and between GPUs and NVMe SSDs. P2P transfers bypass the x86 processor so that devices communicate directly, which improves both throughput and response time (latency) for the most demanding GPU-centric applications, even in complex, multi-chassis configurations. Performance improvements include up to a tenfold increase in throughput. Because the Liqid GPU expansion chassis is PCIe Gen4, P2P traffic for the MX760c, MX750c, and MX740c runs at PCIe Gen4 speeds. The accompanying chart (Figure 3) and table (Table 2) provide an overview of how PCIe Peer-2-Peer functions are enabled, and show the expected performance improvement when GPUs are composed to a single node with GPU RDMA Peer-2-Peer enabled.
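
To put "PCIe Gen4 levels" in context, the theoretical ceiling of a Gen4 x16 link can be worked out from the PCIe signaling rate (16 GT/s per lane) and its 128b/130b line encoding. This is a back-of-the-envelope sketch, not a measured result from this document; the function name is hypothetical, and real P2P throughput will be lower once packet and protocol overhead is included.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int,
                      payload_bits: int = 128, encoded_bits: int = 130) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s.

    gt_per_s: per-lane signaling rate in gigatransfers per second.
    payload_bits / encoded_bits: line-encoding efficiency
    (128b/130b for PCIe Gen3 and Gen4).
    """
    raw_gbits = gt_per_s * lanes                       # raw bits on the wire
    usable_gbits = raw_gbits * payload_bits / encoded_bits
    return usable_gbits / 8                            # bits to bytes


# PCIe Gen4 signals at 16 GT/s per lane; the HBAs in Table 1 are x16.
GEN4_X16_GBYTES_PER_S = pcie_gbytes_per_s(16.0, 16)
print(f"PCIe Gen4 x16: {GEN4_X16_GBYTES_PER_S:.1f} GB/s per direction")
```

The result, roughly 31.5 GB/s in each direction, is an upper bound for a single x16 link rather than an expected application-level figure.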

Figure 3. Peer-2-Peer Modes and Performance

Table 2. Comparing Performance with Peer-2-Peer Disabled vs. Enabled

Application Performance

RDMA Peer-2-Peer is a crucial enhancement in GPU scaling for Artificial Intelligence, particularly for machine learning-based applications. Figure 4 presents performance data obtained using MLPerf on the MX7000 equipped with SmartStack MX. It showcases strong scalability from 4-GPU to 20-GPU configurations on a single compute sled. This data, represented in queries/second, demonstrates high scaling efficiency across a variety of MLPerf 3.1 workloads, achieved with the implementation of composable PCIe GPUs and Peer-2-Peer technology. The results illustrate a near-linear growth pattern in performance, highlighting the robust capabilities of Liqid's technology, which can allocate up to 20 GPUs to an application running on a single compute sled. Such scalability ensures optimal performance and resource utilization, critical for demanding AI computations.

Figure 4. GPU Performance Scaling Comparison – MX7000 with SmartStack MX (NVIDIA L40S), with Peer-2-Peer enabled.
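
The "near-linear" scaling described in this section can be quantified as scaling efficiency: the measured speedup divided by the ideal speedup for the added GPUs. The sketch below shows the arithmetic only. The function name is hypothetical and the throughput numbers are made-up placeholders, not values from Figure 4; substitute measured queries/second to evaluate a real run.

```python
def scaling_efficiency(base_gpus: int, base_qps: float,
                       gpus: int, qps: float) -> float:
    """Return measured speedup divided by ideal (linear) speedup.

    1.0 means perfectly linear scaling; lower values mean diminishing returns.
    """
    speedup = qps / base_qps   # how much faster the larger configuration ran
    ideal = gpus / base_gpus   # how much faster it would run if scaling were linear
    return speedup / ideal


# Placeholder numbers for illustration only (NOT taken from Figure 4):
# 4 GPUs at 1000 queries/second, 20 GPUs at 4750 queries/second.
efficiency = scaling_efficiency(4, 1000.0, 20, 4750.0)
print(f"Scaling efficiency: {efficiency:.0%}")
```

With these placeholder inputs the 20-GPU configuration delivers a 4.75x speedup against an ideal 5x, which corresponds to 95% scaling efficiency.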

Conclusion

The Liqid SmartStack MX enables advanced AI and graphics-intensive workloads to run efficiently on Dell PowerEdge servers. Through a strategic collaboration with Dell Technologies Design Solutions, Liqid has extended the PowerEdge MX compute sleds with composable GPU capacity. This partnership not only accelerates applications but also lets enterprises scale AI capabilities flexibly as requirements change. The integration of Liqid's technology with Dell Technologies' infrastructure reflects a shared commitment to advancing data center performance for enterprise computing.


This reference architecture is available as part of the Dell Technologies Design Solutions.

Ask your Dell account team for more details, or contact a Liqid expert.
