Unlock the Power of RoCEv2 with Dell Enterprise SONiC: High-Speed Storage Transfer without Cost and Complexity
Mon, 22 May 2023 21:20:56 -0000
In today's data-driven world, the demand for high-speed, low-latency data transfers is increasing exponentially. Traditional Ethernet networks are not designed to meet these demands, as they were primarily built for data integrity and networking reliability. As a result, organizations are turning to more advanced technologies like Remote Direct Memory Access (RDMA) to achieve faster data transfers.
Although RDMA is typically used in InfiniBand (IB) networks, which are complex and expensive to deploy, RDMA over Converged Ethernet (RoCE) has emerged as a more accessible solution.
In this blog, we will explore the benefits of RoCE and how Dell Enterprise SONiC can help you take advantage of this powerful technology. Keep reading if you want to:
- Understand what RoCEv2 is and how it can benefit your organization
- Find ways to optimize your storage networking and achieve faster data transfer speeds
- Learn how the combination of RoCEv2 and Dell Enterprise SONiC can help you achieve high data transfer speeds and reliability
What is RoCE?
The RDMA Consortium, a group of industry leaders that includes Dell Technologies, developed RoCE to address the need for faster, more efficient data transfers in modern data centers. The first version, RoCEv1, was released in 2010 and supported Layer 2 Ethernet encapsulation for RDMA.
RoCEv2 was introduced in 2014 and builds on the original RoCEv1 protocol. It added key features such as support for routing and congestion control, making RoCEv2 a more robust and scalable solution for modern data centers.
How does RoCEv2 work?
RoCEv2 enables RDMA over Ethernet networks by encapsulating RDMA messages in UDP/IP packets, which are carried in standard Ethernet frames and can therefore be routed across Layer 3 boundaries.
Below is an example of two RDMA-aware servers connected using an Ethernet fabric enabled with RoCEv2:
- The sending NIC prepares an RDMA message by copying data directly from the local memory into the NIC's send buffer. The message can, for example, initiate an RDMA write operation, specifying the destination server’s memory address and the amount of data to be transferred.
- The RDMA message is encapsulated in an Ethernet frame and sent over the Ethernet fabric to the receiving NIC.
- The receiving NIC receives the packet, decapsulates it from the Ethernet frame, and copies the data from the RDMA message directly to the remote memory.
Because RoCEv2 bypasses the CPU and operates directly on the memory, it can achieve lower latency and overhead than traditional network communication. This makes RoCEv2 ideal for high-performance computing applications that require fast data transfers such as machine learning, big data analytics, and scientific computing.
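The encapsulation step described above can be sketched in a few lines of Python. This is only an illustration of the header layout for a single RDMA write, not how a NIC implements it. The function names, queue pair number, remote address, and key values are hypothetical, and the invariant CRC that real hardware computes is replaced by a zero placeholder.

```python
import struct

ROCEV2_UDP_PORT = 4791            # IANA-assigned UDP destination port for RoCEv2
OPCODE_RC_RDMA_WRITE_ONLY = 0x0A  # Reliable Connection, "RDMA WRITE Only"


def build_bth(opcode: int, dest_qp: int, psn: int,
              pad_count: int = 0, pkey: int = 0xFFFF) -> bytes:
    """Pack the 12-byte Base Transport Header (BTH)."""
    # Second byte: SE (1 bit), MigReq (1 bit), PadCnt (2 bits), TVer (4 bits).
    flags = (pad_count & 0x3) << 4
    # Destination QP and PSN are 24-bit fields carried in 32-bit words.
    return struct.pack("!BBHII", opcode, flags, pkey,
                       dest_qp & 0xFFFFFF, psn & 0xFFFFFF)


def build_reth(remote_addr: int, rkey: int, length: int) -> bytes:
    """Pack the 16-byte RDMA Extended Transport Header (RETH)."""
    return struct.pack("!QII", remote_addr, rkey, length)


def build_rocev2_udp_datagram(src_port: int, dest_qp: int, psn: int,
                              remote_addr: int, rkey: int, data: bytes) -> bytes:
    """Return the UDP header plus payload for one RDMA write (sketch only)."""
    pad = (-len(data)) % 4  # payload is padded to a 4-byte boundary
    bth = build_bth(OPCODE_RC_RDMA_WRITE_ONLY, dest_qp, psn, pad_count=pad)
    reth = build_reth(remote_addr, rkey, len(data))
    icrc_placeholder = b"\x00" * 4  # assumption: real NICs compute this CRC
    udp_payload = bth + reth + data + b"\x00" * pad + icrc_placeholder
    # UDP checksum is left at zero in this sketch.
    udp_header = struct.pack("!HHHH", src_port, ROCEV2_UDP_PORT,
                             8 + len(udp_payload), 0)
    return udp_header + udp_payload


datagram = build_rocev2_udp_datagram(
    src_port=49152, dest_qp=0x12, psn=1,
    remote_addr=0x1000, rkey=0xABCD, data=b"hello",
)
print(len(datagram), "bytes, UDP destination port",
      struct.unpack("!H", datagram[2:4])[0])
```

The resulting datagram would then be wrapped in an IP header and an Ethernet frame by the sender, which is what allows RoCEv2 traffic to cross routed networks.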
How does RoCE help with storage networking?
RoCEv2 is a technology that provides high-speed, low-latency data transfers by leveraging RDMA over Ethernet networks. As such, it is well suited for storage networking applications, where fast and reliable data transfers are critical.
Additionally, RoCEv2 is designed to be compatible with existing Ethernet infrastructure. This makes it a powerful tool for organizations looking to improve storage networking performance without significant infrastructure upgrades or changes to their existing storage environment.
What is Dell Enterprise SONiC?
Dell Enterprise SONiC is based on an open-source network operating system, designed to provide a flexible and scalable platform for building and managing modern data center networks. SONiC can be used to build efficient and customizable network infrastructures and supports various networking features and protocols such as Layer 2 and Layer 3 forwarding, BGP, OSPF, VXLAN, and EVPN. Its functionality can also be extended with custom modules through third-party container management.
A typical RoCEv2 deployment with Enterprise SONiC
In a typical data center topology based on a leaf/spine architecture, the S-series S52xx-ON family of switches plays the role of leaf, while the Z-series Z93xx-ON family plays the role of spine.
Leaf switches are connected to servers and storage devices, while Spine switches are used to interconnect the Leaf switches.
Servers and storage devices have RoCEv2-enabled NICs.
The Layer 2 connectivity of server and storage devices terminates at the leaf. Layer 3 routing is enabled between leaf and spine and uses a dynamic routing protocol such as BGP to learn all the endpoints in the network.
The following protocols enable the switch fabric to provide the quality of service (QoS) needed for zero packet loss and to schedule higher-priority switching for storage traffic:
- Priority Flow Control (PFC) – provides congestion management by avoiding buffer overflow and achieves zero packet loss by sending priority-based pause frames toward the upstream sender
- Enhanced Transmission Selection (ETS) – allocates specific bandwidth to each class of service to prevent a single class of traffic from hogging the bandwidth
- Explicit Congestion Notification (ECN) – marks packets when congestion is detected, before buffers overflow; end hosts check for marked packets and slow down transmission
- Data Center Bridging Exchange (DCBX) protocol – operates over Link Layer Discovery Protocol (LLDP) to negotiate QoS capabilities between endpoints or switches
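To make the interplay between ECN and PFC more concrete, here is a minimal Python sketch of a single lossless queue. It is a simplified model, not a description of how any switch implements these features. The class name, the byte thresholds, and the single fixed ECN threshold are all assumptions (real switches commonly mark probabilistically between a minimum and a maximum threshold).

```python
from dataclasses import dataclass


@dataclass
class LosslessQueue:
    """Toy model of one lossless egress queue (all thresholds hypothetical)."""
    ecn_threshold: int   # bytes; above this depth, ECN-capable packets are marked
    pfc_xoff: int        # bytes; at or above this depth, pause the upstream sender
    pfc_xon: int         # bytes; at or below this depth, resume the upstream sender
    depth: int = 0
    paused_upstream: bool = False

    def enqueue(self, size: int, ecn_capable: bool = True) -> bool:
        """Add a packet; return True if it would be ECN-marked."""
        self.depth += size
        marked = ecn_capable and self.depth > self.ecn_threshold
        if not self.paused_upstream and self.depth >= self.pfc_xoff:
            self.paused_upstream = True   # PFC pause sent for this priority
        return marked

    def dequeue(self, size: int) -> None:
        """Drain bytes; resume the sender once the queue falls to XON."""
        self.depth = max(0, self.depth - size)
        if self.paused_upstream and self.depth <= self.pfc_xon:
            self.paused_upstream = False  # PFC resume sent


queue = LosslessQueue(ecn_threshold=3000, pfc_xoff=6000, pfc_xon=2000)
for _ in range(3):
    marked = queue.enqueue(2000)
    print(f"depth={queue.depth} marked={marked} paused={queue.paused_upstream}")
```

In this model the ECN threshold sits below the PFC threshold, so end hosts are asked to slow down first and the pause acts as a last resort against packet loss.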
In terms of operations, packets are classified and prioritized at the ingress leaf based on whether or not they carry storage traffic. They are then checked for congestion at each subsequent switch on the path toward the destination server.
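The classification step can be illustrated with a small Python sketch. The DSCP value 26 and priority 3 used here for storage traffic are assumptions chosen for illustration only; the actual values depend on how the NICs and switches in a given deployment are configured.

```python
# Hypothetical mapping: storage (RoCEv2) traffic arrives with DSCP 26 and is
# placed in priority 3, which is treated as lossless. Everything else falls
# back to best-effort priority 0.
DSCP_TO_PRIORITY = {26: 3}
LOSSLESS_PRIORITIES = {3}


def classify(dscp: int) -> tuple[int, bool]:
    """Return (priority, is_lossless) for a packet's DSCP value."""
    priority = DSCP_TO_PRIORITY.get(dscp, 0)
    return priority, priority in LOSSLESS_PRIORITIES


for dscp in (26, 0):
    priority, lossless = classify(dscp)
    print(f"DSCP {dscp}: priority {priority}, lossless={lossless}")
```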
How does Enterprise SONiC take advantage of RoCEv2?
Enterprise SONiC supports a wide range of capabilities, including a modular architecture based on Docker container technology, easy automation, third-party container management (TPCM), and other comprehensive features that allow deployment in enterprise, edge, and provider networks. Networks that combine Enterprise SONiC and RoCEv2 can help organizations build high-performing, reliable, and scalable network infrastructures.
The most important benefit of RoCEv2 with SONiC is that it reduces the complexity of configuring RoCEv2 to a single CLI command (roce enable). This addresses one of the major complaints about RoCEv2: that it is difficult to configure because several features must be handled, including buffering, traffic classification, mapping, queuing, scheduling, priority flow control, and congestion management.
These features and their parameters must also align with the switch hardware resources. The single CLI command addresses this concern as well, which makes it a significant differentiator for the solution.
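To give a sense of the kind of consistency checks that manual RoCEv2 configuration involves, here is a hedged Python sketch. It does not describe what the roce enable command does internally. The function, its parameters, and all numeric values are hypothetical and only illustrate how QoS settings have to fit together and fit the available buffer.

```python
def validate_qos_plan(ets_weights: dict[str, int],
                      headroom_per_port_kb: int,
                      lossless_ports: int,
                      total_buffer_kb: int) -> list[str]:
    """Return a list of problems found in a hand-written QoS plan."""
    problems = []
    # ETS bandwidth shares are percentages and should add up to 100.
    total_weight = sum(ets_weights.values())
    if total_weight != 100:
        problems.append(f"ETS weights sum to {total_weight}, expected 100")
    # PFC headroom reserved for lossless ports must fit in the shared buffer.
    needed_kb = headroom_per_port_kb * lossless_ports
    if needed_kb > total_buffer_kb:
        problems.append(
            f"PFC headroom needs {needed_kb} KB, only {total_buffer_kb} KB available"
        )
    return problems


# Hypothetical plan: 50% storage, 50% default traffic, 32 lossless ports.
issues = validate_qos_plan({"storage": 50, "default": 50},
                           headroom_per_port_kb=100,
                           lossless_ports=32,
                           total_buffer_kb=32000)
print(issues or "plan is consistent")
```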
TPCM provides the flexibility to deploy third-party management tools to monitor RoCEv2 or any other application.
Why Dell Technologies?
With its expertise in networking and storage, Dell Technologies is uniquely positioned to help customers unlock the full potential of RoCEv2. Its end-to-end solution supports a seamless and reliable deployment, from the server to the storage array, leveraging advanced features like ECN and PFC to minimize congestion and maximize performance.
As data centers continue to evolve, the demand for faster and more efficient data transfers will only increase. RoCEv2 is a catalyst for this transformation, and Dell Technologies is at the forefront. With its combination of technology, expertise, and support, Dell Technologies is well positioned to help customers meet their business goals, today and in the future.
How do I learn more?
- See Enterprise SONiC Spec Sheet for more information about Dell Technologies Enterprise SONiC features.
- See Enterprise SONiC Networking Solutions to learn more about Enterprise SONiC Distribution by Dell Technologies.