Ethernet is a mature and familiar technology that underpins other key technologies. These include local area networks (LANs) and wide area networks (WANs), on which further innovations such as the Internet of Things (IoT) and wireless communication have been built.
Ethernet has been adopted by most large enterprises as the de facto network technology, and it is now fully embedded in our lives through the Internet.
An Ethernet-based fabric accommodates various workloads that provide various services. Virtualization, high-performance computing, video, voice, and GenAI are all use cases that leverage Ethernet's ease of use and deployment.
An Ethernet fabric, depending on the use case, assumes different characteristics to deliver the services it has been designed for. For example, a data center providing virtualization service requires some level of performance and service assurance, but it does not need to deliver lossless data performance.
On the other hand, the composition of an AI workload (specifically GenAI) predefines what the fabric or network needs to be and how it must behave. This benefits the network architect, because there is no ambiguity about what must be achieved.
To deploy a GenAI environment, the fabric needs to be:
- Lossless
  - RDMA over Converged Ethernet (RoCEv2)
  - Priority-based Flow Control (PFC)
  - Data Center Quantized Congestion Notification (DCQCN)
To achieve a lossless fabric, RoCEv2 relies on several mechanisms. The first (in no specific order) is priority-based flow control (PFC). In a high-performance environment such as GenAI, assigning priority to certain types of data traffic, such as training, inferencing, or fine-tuning, makes a difference to the overall results.
With PFC, the network can pause individual traffic classes so that GenAI traffic is protected from loss and receives priority over other data traffic in the fabric. However, the paused traffic continues to occupy ingress switch buffers, and the resulting congestion can spread through the network.
This is where DCQCN comes in as the congestion-prevention mechanism. DCQCN combines traditional explicit congestion notification (ECN) with PFC to give lossless, high-performance, latency-sensitive applications balanced treatment.
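The ECN half of DCQCN is commonly described as a RED-style marking curve on the switch queue. The following is a minimal sketch of that curve, not a vendor implementation. The parameter names (`kmin_kb`, `kmax_kb`, `pmax`) and the example values are assumptions chosen for illustration.

```python
def ecn_mark_probability(queue_kb: float, kmin_kb: float,
                         kmax_kb: float, pmax: float) -> float:
    """Probability that a packet is ECN-marked at a given queue depth.

    Below kmin_kb nothing is marked, above kmax_kb everything is marked,
    and in between the probability rises linearly from 0 to pmax.
    """
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


if __name__ == "__main__":
    # Illustrative thresholds only; real values depend on the platform.
    for depth in (50, 250, 500):
        p = ecn_mark_probability(depth, kmin_kb=100, kmax_kb=400, pmax=0.2)
        print(f"queue={depth} KB -> mark probability {p:.2f}")
```

Senders that receive congestion notifications triggered by these marks reduce their rate, which is what lets the queue drain before PFC has to pause the link.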
There are two scenarios where tuning the PFC configuration is required:
- If PFC is triggered too early, ECN cannot respond fast enough to slow the offending traffic type before it is paused.
- If PFC is triggered too late, congestion can overflow the buffers and cause packet drops.
To avoid both cases, specific thresholds need to be monitored and properly configured within the network operating system (NOS).
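The two scenarios can be expressed as ordering constraints on the queue thresholds. The sketch below checks a hypothetical set of values. The `QueueThresholds` fields and the `check_thresholds` helper are illustrative names, not settings from any particular NOS, and the rule that ECN should start marking before the PFC pause threshold is stated here as a simplifying assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueueThresholds:
    ecn_kmin_kb: int   # queue depth where ECN marking starts
    ecn_kmax_kb: int   # queue depth where every packet is marked
    pfc_xoff_kb: int   # queue depth where a PFC pause is sent
    headroom_kb: int   # space reserved for data in flight after the pause
    buffer_kb: int     # total buffer available to this queue


def check_thresholds(t: QueueThresholds) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if t.ecn_kmin_kb >= t.ecn_kmax_kb:
        problems.append("ECN Kmin must be below ECN Kmax")
    if t.ecn_kmax_kb >= t.pfc_xoff_kb:
        # PFC too early: the link is paused before ECN can slow senders.
        problems.append("PFC may trigger before ECN can react")
    if t.pfc_xoff_kb + t.headroom_kb > t.buffer_kb:
        # PFC too late: in-flight packets can overflow the buffer.
        problems.append("PFC triggers too late; buffer may overflow")
    return problems


if __name__ == "__main__":
    good = QueueThresholds(100, 400, 600, 200, 1000)
    late = QueueThresholds(100, 400, 900, 200, 1000)
    print(check_thresholds(good))  # []
    print(check_thresholds(late))
```

A check like this could run against exported configuration before a change is pushed, so that an inconsistent threshold set is caught before it reaches the fabric.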
- High-performance
  - Enhanced hashing or data packet spray
  - Cut-through switching
  - Multi-path routing
  - 1:1 non-blocking bandwidth
  - RoCEv2
Achieving a lossless fabric is meaningless if the fabric is constantly congested due to poor fabric performance. To implement a high-performance fabric, several features should be considered, if not implemented outright.
Enhanced hashing allows for better traffic load balancing. It spreads flows more evenly across the fabric links, which reduces collisions in which several flows land on the same link.
Cut-through switching is a switching mode in which the switch begins forwarding a packet before the full packet has been stored in the switch ASIC buffer. This mode lowers per-hop latency and improves fabric throughput, especially with GenAI workloads.
Multi-path routing and non-blocking bandwidth provide multiple simultaneous alternative paths through the network and full-bandwidth access from the workloads into the switching fabric, respectively.
RoCEv2 allows applications to bypass the CPU and access memory directly during packet processing. Because the CPU is bypassed, throughput increases while CPU utilization and latency decrease.
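To illustrate the difference between static per-flow hashing and packet spray, the sketch below maps traffic onto a set of equal-cost uplinks. It is a simplified model under stated assumptions: the CRC32-based `ecmp_link` function and the round-robin `spray_links` function are hypothetical stand-ins for what a switch ASIC does in hardware.

```python
import zlib
from collections import Counter
from typing import List, Tuple

Flow = Tuple[str, str, int, int, str]  # src IP, dst IP, src port, dst port, protocol


def ecmp_link(flow: Flow, num_links: int) -> int:
    """Static hashing: every packet of a flow takes the same uplink."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_links


def spray_links(num_packets: int, num_links: int) -> List[int]:
    """Packet spray: packets of one flow are spread round-robin."""
    return [i % num_links for i in range(num_packets)]


if __name__ == "__main__":
    # A handful of large flows, as is typical for GPU-to-GPU traffic.
    # 4791 is the UDP destination port used by RoCEv2.
    flows = [("10.0.0.%d" % i, "10.0.1.%d" % i, 49152 + i, 4791, "udp")
             for i in range(4)]
    per_flow = Counter(ecmp_link(f, 4) for f in flows)
    print("flows per link with static hashing:", dict(per_flow))
    print("packets per link with spray:", dict(Counter(spray_links(8, 4))))
```

With only a few large flows, static hashing can place two of them on the same uplink while another uplink stays idle. Spraying evens out link use, but packets of one flow can then arrive out of order, which the receiving side has to tolerate.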
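A rough way to see the benefit is to compare how long a switch must wait before it can start forwarding a packet. The sketch below is back-of-the-envelope arithmetic only. The 4096-byte packet, the 64-byte header lookup window, the 400 Gb/s link speed, and the three-hop path are illustrative assumptions, and fixed ASIC pipeline latency is ignored.

```python
def serialization_delay_ns(num_bytes: int, link_gbps: float) -> float:
    """Time to receive num_bytes on a link; bits / (Gbit/s) gives nanoseconds."""
    return num_bytes * 8 / link_gbps


def path_wait_ns(bytes_before_forwarding: int, link_gbps: float, hops: int) -> float:
    """Total time spent waiting for data before forwarding, over all hops."""
    return hops * serialization_delay_ns(bytes_before_forwarding, link_gbps)


if __name__ == "__main__":
    packet_bytes, header_bytes, gbps, hops = 4096, 64, 400.0, 3
    # Store-and-forward waits for the whole packet at every hop.
    store_and_forward = path_wait_ns(packet_bytes, gbps, hops)
    # Cut-through waits only for enough of the header to pick an egress port.
    cut_through = path_wait_ns(header_bytes, gbps, hops)
    print(f"store-and-forward: {store_and_forward:.2f} ns")  # 245.76 ns
    print(f"cut-through:       {cut_through:.2f} ns")        # 3.84 ns
```

The gap grows with packet size and hop count, which is why the mode matters for workloads that move large messages across several switches.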
- Scalable
  - Leaf and spine architecture
One advantage offered by Ethernet is scalability. The technology is ubiquitous and has existed for several decades. It is an open-standards body of work created and maintained by all major technology providers.
Ethernet's scalability allows deployments that range from a single switch to hundreds of switches. Many architectures have been based on Ethernet, one of which is the leaf and spine Clos distributed architecture (see Figure 8). This architecture was named after Charles Clos, who created the multi-stage circuit-switching network architecture.
In a leaf and spine architecture, three aspects stand out:
- Nonblocking and oversubscription architectures are possible, providing greater flexibility.
- Inter-GPU communication never traverses more than three stages (leaf, spine, leaf), providing greater control when latency is key.
- Fabric expansion is easily achieved by adding a spine when needed, providing scalability.
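The first and third points can be made concrete with some simple sizing arithmetic. The sketch below assumes a two-tier fabric in which every switch has the same radix and port speed and every leaf has one uplink to each spine. The function names are hypothetical and the numbers are examples, not recommendations.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Leaf downlink bandwidth divided by uplink bandwidth; 1.0 is non-blocking."""
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)


def max_nonblocking_endpoints(radix: int) -> int:
    """Largest endpoint count for a 1:1 two-tier fabric of same-radix switches.

    Each leaf splits its ports evenly: radix // 2 down and radix // 2 up.
    Each spine port connects one leaf, so at most `radix` leaves fit.
    """
    downlinks_per_leaf = radix // 2
    max_leaves = radix
    return downlinks_per_leaf * max_leaves


if __name__ == "__main__":
    # 32 x 400GE down and 32 x 400GE up per leaf: non-blocking.
    print(oversubscription_ratio(32, 400, 32, 400))  # 1.0
    # 48 x 100GE down and 8 x 400GE up per leaf: oversubscribed.
    print(oversubscription_ratio(48, 100, 8, 400))   # 1.5
    # 64-port switches throughout.
    print(max_nonblocking_endpoints(64))             # 2048
```

Under these assumptions a higher switch radix raises the endpoint ceiling quadratically, which is one reason high-radix platforms matter for large GPU clusters.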
Figure 9 shows a summary of the three key Ethernet characteristics that facilitate the deployment of a successful GenAI solution.
The following list shows that as Ethernet continues to progress, the industry seems to have reached an inflection point where Ethernet is the preferred choice for GenAI:
- Availability of high-radix switching platforms at 400GE, 800GE, and higher
- Improved congestion monitoring, flow control, and transport (RoCEv2)
- Community-driven adoption, such as the Ultra Ethernet Consortium
- Elimination of vendor lock-in infrastructure
- Lower Total Cost of Ownership (TCO)
- Latency improvements with next-generation silicon, from 800 nanoseconds (ns) to 200 ns
GenAI is a revolutionary application, and Ethernet, with its current and enhanced quality-of-service feature set, will play a key role in its adoption and evolution.