At its core, the next generation PowerMax is a multi-node, high-performance computing cluster for data processing and storage operations. An efficient cluster architecture shares resources between the nodes rather than leaving critical resources such as memory and physical capacity stranded and unused; unused cluster resources still require power to operate. Sharing resources results in far more efficient operation, as it saves the cluster from having to duplicate and allocate additional resources for node operations, saving the power those additional resources would require.
The next generation PowerMax platform is a true share everything architecture: each node can access and share every other node's memory and storage resources. The key enabler for this share everything architecture is that both the PowerMax 2500 and 8500 use full end-to-end internal NVMe (Non-Volatile Memory Express) over Fabrics (NVMe-oF) topologies between the compute nodes and DMEs. These NVMe-oF topologies are referred to as the PowerMax Dynamic Fabric.
The Dynamic Fabric turns the compute and backend storage elements into individual, independent endpoints on a large internal storage fabric. Each endpoint is dual ported, with each port connected to one of two physically isolated fabrics (Fabric A and Fabric B) for redundancy. These compute and storage endpoints can be placed into shared resource pools, which in turn disaggregates the storage and compute in the system. In this architecture, every compute node endpoint can access the shared memory of every other compute node using the system's high-speed Remote Direct Memory Access (RDMA) protocol, and every compute node endpoint can access all storage endpoints and SSDs in the DMEs using the system's high-speed NVMe-oF protocol, creating a true active / active, share everything system architecture. This disaggregation decouples compute and storage so that they can be scaled and provisioned independently of each other to meet application requirements rather than adhering to strict system architecture constraints. To enable this shared everything, decoupled architecture, the Dynamic Fabric provides several key elements to the system:
High-speed interconnection between cluster components
Ability to share memory access between the nodes
End-to-end data consistency checking based on SCSI T10-DIF Protection Information, which protects against erroneous data transmission and data corruption
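The consistency check in the last item above rests on a guard CRC carried with each data block. As a minimal sketch, the following computes a T10-DIF-style guard tag using the standard CRC-16/T10-DIF polynomial (0x8BB7); the field layout is simplified to just the guard tag, and none of this reflects how PowerMaxOS actually implements Protection Information:

```python
def crc16_t10dif(data: bytes) -> int:
    """CRC-16 with the T10-DIF polynomial 0x8BB7 (MSB-first, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def protect(block: bytes) -> tuple[bytes, int]:
    """Attach a guard tag to a block before it crosses the fabric."""
    return block, crc16_t10dif(block)

def verify(block: bytes, guard: int) -> bool:
    """Re-check the guard tag at the receiving endpoint."""
    return crc16_t10dif(block) == guard

block = bytes(512)                    # one 512-byte sector (all zeros)
blk, guard = protect(block)
assert verify(blk, guard)             # clean transfer passes
corrupted = b"\x01" + blk[1:]         # a byte flipped in flight
assert not verify(corrupted, guard)   # corruption is detected
```

Because the check travels end to end with the data, corruption introduced anywhere along the path is caught at the endpoint, and the redundant copy can be used instead.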
The next generation PowerMax uses an active / active, share everything architecture that extends to node memory through Remote Direct Memory Access (RDMA). RDMA is the ability to read from and write to memory on another compute node without involving the CPU(s) of the node that owns that memory. RDMA permits high-throughput, low-latency data transfer between the memory of two compute nodes, which is essential in high-performing clustered computing systems like PowerMax. Using RDMA, every PowerMax node can access the memory contents of any other node in the system as if it were its own. This ability turns the total cache capacity of the PowerMax system into a truly shared resource pool called PowerMax Global Cache.
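The Global Cache behavior can be pictured with a toy model: each node owns a slice of one shared pool, and a read simply goes to whichever node's memory holds the slot. The class names, the hash-based placement policy, and the key format below are all invented for illustration and are not PowerMaxOS constructs; remote access standing in for RDMA is modeled by reading the owning node's memory directly, without calling into that node:

```python
class Node:
    """One compute node with its local slice of the shared cache."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.local_cache: dict[str, bytes] = {}

class GlobalCache:
    """Toy model: the union of all nodes' segments is one shared pool."""
    def __init__(self, nodes: list[Node]):
        self.nodes = nodes

    def _owner(self, key: str) -> Node:
        # Hash-distribute cache slots across nodes (placement policy is made up).
        return self.nodes[hash(key) % len(self.nodes)]

    def write(self, key: str, data: bytes) -> None:
        self._owner(key).local_cache[key] = data

    def read(self, reader: Node, key: str) -> bytes:
        # Local hit or remote "RDMA" read -- either way, the owner's CPU is
        # never invoked; we just access its memory.
        return self._owner(key).local_cache[key]

nodes = [Node(i) for i in range(4)]
cache = GlobalCache(nodes)
cache.write("track:42", b"dirty page")
# Every node sees the same pool, regardless of which segment holds the data.
assert all(cache.read(n, "track:42") == b"dirty page" for n in nodes)
```

The point of the sketch is the addressing model: no node needs its own private copy of the data, which is what lets the whole cache behave as one pool.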
PowerMax RDMA communications are subject to fabric latency thereby making it a Non-Uniform Memory Access (NUMA) system. A key requirement for PowerMax Global Cache (and NUMA systems in general) is that the fabric which transports the internode RDMA communications must be extremely low latency and have high bandwidth. For this reason, PowerMax uses the InfiniBand (IB) protocol as its primary internode fabric. InfiniBand is a fabric technology and set of protocols which natively provides:
A highly energy efficient, high bandwidth, and low latency fabric: In the next generation PowerMax, the IB fabric runs in connected mode, where a single lane can transmit 100 Gbps with MTU sizes of 2 MB. Larger single-lane data transfers are more efficient than smaller multilane transfers because more data is moved per clock cycle, consuming far fewer compute resources to transfer the same data. Studies have shown that when run in connected mode, IB is over 70% more energy efficient per byte sent than other RDMA fabric choices such as Ethernet: IEEE Study - Evaluating Energy Efficiency of GbE and IB in Data Centers
Scalability: IB can support tens of thousands of fabric endpoints in a flat, single-subnet network. This allows the next generation PowerMax's disaggregated architecture to scale by adding independent compute and storage endpoints into the fabric without having to add more subnets and incur additional latency penalties.
High Security: The protocol is implemented in hardware and the communication attributes are configured centrally in a way that does not enable software applications to gain control over them and maliciously spoof or change those attributes.
Resiliency: RDMA data transfers can be protected with SCSI T10-DIF protection information. This allows detection of data corruption from many sources, and with the PowerMax redundant architecture, the correct data can always be referenced.
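The connected-mode efficiency claim in the first item above can be made concrete with a little arithmetic: compare how many frames per second it takes to saturate a 100 Gbps lane at the 2 MB connected-mode MTU versus a standard 1500-byte Ethernet MTU. The Ethernet MTU and the per-frame cost framing are our assumptions for contrast, not figures from the text:

```python
LINE_RATE_BPS = 100e9        # 100 Gbps single IB lane (from the text)
IB_MTU = 2 * 2**20           # 2 MB connected-mode MTU (from the text)
ETH_MTU = 1500               # standard Ethernet MTU, assumed for contrast

def frames_per_second(line_rate_bps: float, mtu_bytes: int) -> float:
    """Frames needed per second to keep the link saturated at a given MTU."""
    return line_rate_bps / 8 / mtu_bytes

ib_fps = frames_per_second(LINE_RATE_BPS, IB_MTU)    # roughly 6,000 frames/s
eth_fps = frames_per_second(LINE_RATE_BPS, ETH_MTU)  # roughly 8.3 million frames/s
# Each frame carries per-packet work (headers, completions), so roughly three
# orders of magnitude fewer frames means far fewer compute cycles per byte moved.
print(f"{eth_fps / ib_fps:.0f}x fewer frames with the 2 MB MTU")
```

The ratio is just the ratio of the MTUs, which is why larger single-lane transfers translate directly into less per-byte overhead.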
Using RDMA and NVMe-oF over an IB fabric (NVMe/IB), the PowerMax Dynamic Fabric delivers the high levels of performance, scalability, efficiency, and security that enterprise customers require. How the Dynamic Fabric is implemented to achieve this differs between the PowerMax 2500 and PowerMax 8500.
A PowerMax 2500 can scale up to two node pairs. Each node pair comes with its own direct-attached PCIe (Peripheral Component Interconnect Express) DME. Starting with PowerMaxOS 10.1, the PowerMax 2500 can be configured with two node pairs and a single DME. This configuration is called the 2x1.
A key architectural part of the PowerMax 2500 Dynamic Fabric is the use of PCIe Multihost technology. In the case of the PowerMax 2500, PCIe Multihost allows both nodes to share each other's internal PCIe IB 100 Gb Host Channel Adapter (HCA), creating a high-speed, switchless fabric between the nodes for low-latency RDMA communication. Data transfer between the nodes and their direct-attached PCIe DMEs uses a x32-lane NVMe/PCIe fabric.
The Dynamic Fabric configuration on the PowerMax 2500 allows it to deliver performance and scalability in a compact package. It allows for higher levels of efficiency as the PowerMax 2500 can store up to 7x more capacity in half the rack space (over 4 PBe in 5U) compared with the earlier generation PowerMax 2000. Along with its compact design, the 2500 supports the full complement of rich data services for open systems, mainframe, file, and virtual environments.
The PowerMax 8500 Dynamic Fabric is a fully switched, 100 Gbps per lane NVMe/IB fabric used for all RDMA communications and NVMe data transfers. This differs from the PowerMax 2500, where NVMe data transfers from a node to its PCIe-connected DME use NVMe/PCIe. On the PowerMax 8500, each compute node and storage DME is treated as a unique endpoint on the fabric, allowing endpoints to be added independently of each other while retaining access to all other endpoints. The nodes and DMEs of the system are fully disaggregated and dynamically connected, allowing the system to scale up compute to 8 node pairs while scaling out storage to 8 DMEs - providing over 18 PBe in a single system.
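The independent scaling of compute and storage endpoints on the dual fabrics can be sketched as a toy model. Class names, endpoint names, and the registration logic below are hypothetical illustrations of the disaggregation idea, not PowerMaxOS constructs:

```python
class Fabric:
    """One of the two physically isolated fabrics (A or B)."""
    def __init__(self, name: str):
        self.name = name
        self.endpoints: set[str] = set()

class DynamicFabric:
    """Toy model: nodes and DMEs join as independent, dual-ported endpoints."""
    def __init__(self):
        self.fabric_a = Fabric("A")
        self.fabric_b = Fabric("B")
        self.compute: list[str] = []
        self.storage: list[str] = []

    def _attach(self, endpoint: str) -> None:
        # Every endpoint is dual ported: one port on each isolated fabric.
        self.fabric_a.endpoints.add(endpoint)
        self.fabric_b.endpoints.add(endpoint)

    def add_node_pair(self) -> None:          # scale UP compute...
        for _ in range(2):
            node = f"node{len(self.compute)}"
            self.compute.append(node)
            self._attach(node)

    def add_dme(self) -> None:                # ...independently of scaling OUT storage
        dme = f"dme{len(self.storage)}"
        self.storage.append(dme)
        self._attach(dme)

sys_8500 = DynamicFabric()
for _ in range(8):
    sys_8500.add_node_pair()   # up to 8 node pairs
for _ in range(8):
    sys_8500.add_dme()         # up to 8 DMEs
# Every compute endpoint can reach every storage endpoint over either fabric.
assert all(d in sys_8500.fabric_a.endpoints for d in sys_8500.storage)
assert len(sys_8500.compute) == 16 and len(sys_8500.storage) == 8
```

Because compute and storage register on the fabric separately, either side can grow without forcing growth on the other, which is the essence of the disaggregated design.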
Another difference with the Dynamic Fabric on the PowerMax 8500 is the use of intelligent DMEs. The DMEs connect into the NVMe/IB fabric using dual Link Control Cards (LCCs), each with dual 100 Gb IB ports. Each LCC board has its own NVIDIA BlueField Data Processing Unit (DPU), allowing it to perform critical storage management functions such as fabric offload. The LCCs with their DPUs essentially make each PowerMax 8500 DME a unique active / active, dual-controller NVMe storage subsystem on the fabric.
In summary, the Dynamic Fabric is what enables the PowerMax 2500 and 8500 to function as a true active / active, share everything architecture for enterprise storage. This native architecture provides customers with the following benefits:
High levels of fault tolerance and resiliency achieved natively, without the need for costly additional hardware and software
Less power and fewer resources required to meet workload requirements
Easier scale-up and scale-out, with no manual intervention needed for load balancing or other backend clustering operations, since PowerMaxOS performs these automatically
The Dynamic Fabric is the key component which allows for the low latency, high bandwidth connections required for RDMA communication between the nodes and for the extensive NVMe data processing and movement between the nodes and DMEs in the system. The Dynamic Fabric can be considered the backbone of the PowerMax 2500 and 8500 systems which allows them to deliver the kind of performance, security, efficiency, and scalability required by the modern data center.