March 20, 2024

Founder & Chief Analyst, Futuriom

Why DriveNets Can Lead in Ethernet-based AI Networking

AI is dominating discussions in the cloud infrastructure markets, and for good reason: it may be the largest new technology cycle since the arrival of the smartphone.

This secular trend will have a large impact on technology infrastructure, because the demands of AI are different. AI workloads and data exchange increase the demand for lossless, low-jitter, low-latency, high-bandwidth networking. They also drive changes to networking architectures, as AI workloads change the dynamics of how data flows among servers and AI processors.


How do you support the high-performance needs of AI?

InfiniBand, the specialized high-bandwidth interconnect technology controlled by Nvidia, is frequently used with AI systems but is considered almost proprietary. Ethernet is emerging as the preferred choice for AI networking (even Nvidia has hedged its bets by building an Ethernet-based solution), but it will require some optimization to support the high-performance needs of AI. That was the goal behind the Ultra Ethernet Consortium (UEC), established by leading industry players such as AMD, Broadcom, Cisco, Intel, Meta, and others to pave a path toward an Ethernet-based, congestion-free fabric. But if you are building AI infrastructure today and you are looking for an Ethernet-based solution, the number of relevant options available right now is very small.

DriveNets offers one of those options. Its Ethernet-based solution works today in high-scale AI infrastructures and is based on the company's field-proven scheduled fabric, which is part of the DriveNets Network Cloud solution.

Ethernet has a big opportunity in AI

One of the biggest potential impacts of AI on networks is the need for new architectures. The tried-and-true leaf-and-spine architecture popularized by webscale providers will not be abandoned, but it is not optimized for AI.

Expect this debate on AI networking architectures to expand and draw demand for a wider variety of solutions. NVIDIA, the leader in AI GPUs and systems, has developed an advanced, vertically integrated system for Large Language Models (LLMs), and it favors integrating this with systems it controls, such as the NVLink interconnect and InfiniBand technology, for connecting AI processors and servers.

But these systems address just one application for AI, and the reality is we don’t know how the market will evolve. As additional use cases develop, some customers will look at alternative solutions with more open characteristics based on open standards and commercial off-the-shelf (COTS) technology. As Ethernet standards evolve for AI, there will be many opportunities to support specialized LLMs as well as distributed edge AI.

The history of networking and other markets has always presented see-saw battles between integrated, proprietary systems and standards-based solutions. Standards-based solutions often win because of their characteristics of driving better economics and creating larger markets.

Wall Street analyst Simon Leopold of financial firm Raymond James recently wrote a comprehensive research note on AI system evolution titled "AI Bandwidth Opportunity: Impact on the Optical Market." In that note, Leopold estimates that scale-out architectures are likely to play a greater role as Ethernet solutions mature, and that they could become the go-to architecture for AI inferencing. Inferencing, which is especially relevant to the edge and access markets, holds the potential to be a market equal to or even larger than the market for LLMs.

Ethernet has a long history of success, so it would probably be a mistake to discount it.

DriveNets' Ethernet-based AI scheduled fabric

Ethernet will need to be adapted to support AI, and this is where it gets interesting. DriveNets has already started addressing the market with Network Cloud-AI, initially announced in May 2023. Network Cloud-AI has the advantage of being based on open networking standards while being optimized for AI and, most importantly, being ready for field deployment today.

DriveNets says that using a traditional Ethernet Clos architecture, which aggregates switching ports into a networking backbone, results in performance hits. Instead, it uses a scheduled, cell-based fabric on its infrastructure backbone (i.e., the connectivity between leaf and spine nodes). This architecture is based on the Distributed Disaggregated Chassis (DDC) specification supported by the Open Compute Project (OCP), on which DriveNets has built its Network Cloud solution. The OCP DDC offers an open architecture for a massively scalable software router or switch that isn't dependent on a single proprietary hardware chassis. The DriveNets Network Cloud-AI series runs on white boxes based on Broadcom's Jericho2C+ Ethernet switch ASIC, creating a distributed, cloud-based network for demanding generative AI workloads.
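To illustrate why a scheduled, cell-based fabric balances traffic better than hash-based Ethernet forwarding, here is a minimal, hypothetical toy model (not DriveNets code; all parameters are made up). It compares per-flow ECMP-style hashing, typical of a Clos Ethernet fabric carrying a few large AI "elephant" flows, with spraying fixed-size cells evenly across the fabric links:

```python
import random
from collections import Counter

# Toy parameters (illustrative only, not vendor specifications)
NUM_UPLINKS = 8        # fabric-facing links on one leaf node
NUM_FLOWS = 16         # large, long-lived AI training flows
CELLS_PER_FLOW = 1000  # each flow split into equal-size cells

random.seed(7)

# Per-flow ECMP: every cell of a flow is pinned to one hashed uplink,
# so hash collisions between elephant flows overload some links.
ecmp_load = Counter()
for flow in range(NUM_FLOWS):
    uplink = random.randrange(NUM_UPLINKS)  # stand-in for a flow hash
    ecmp_load[uplink] += CELLS_PER_FLOW

# Cell spraying: each cell is distributed round-robin across all uplinks,
# so load stays nearly even regardless of flow sizes.
spray_load = Counter()
for cell_id in range(NUM_FLOWS * CELLS_PER_FLOW):
    spray_load[cell_id % NUM_UPLINKS] += 1

def imbalance(load):
    """Max-to-mean uplink load ratio: 1.0 means perfectly balanced."""
    values = [load.get(i, 0) for i in range(NUM_UPLINKS)]
    return max(values) / (sum(values) / NUM_UPLINKS)

print(f"Per-flow ECMP max/mean uplink load: {imbalance(ecmp_load):.2f}")
print(f"Cell spraying max/mean uplink load: {imbalance(spray_load):.2f}")
```

In this sketch the hashed approach typically leaves some uplinks carrying well above the average load while others sit idle, whereas cell spraying keeps the ratio at 1.0; the hot links are what show up as congestion, jitter, and longer job completion times in real fabrics.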

DriveNets has also employed Broadcom's newer Jericho3-AI chip and Ramon 3 cell-based switching component, giving DriveNets' DDC the ability to support 32,000 GPUs in a single cluster, each connected with an 800-Gb/s Ethernet networking port. The scheduled fabric provides perfect load balancing and scheduling across the cluster, which yields optimized performance, in terms of Job Completion Time (JCT), for deep AI training workloads. This fabric performs better than traditional Ethernet fabric switches, with lower latency, lower jitter, and better reliability. Furthermore, Network Cloud-AI doesn't need to be adjusted for specific LLMs, which DriveNets says can be a requirement with InfiniBand.
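As a rough sanity check on the scale those numbers imply, the aggregate access bandwidth of such a cluster can be worked out directly (simple arithmetic using only the GPU count and port speed cited above):

```python
# Back-of-envelope scale check using the figures cited above:
# 32,000 GPUs, each attached via one 800-Gb/s Ethernet port.
gpus = 32_000
port_gbps = 800

aggregate_tbps = gpus * port_gbps / 1_000  # total access bandwidth in Tb/s
print(f"Aggregate access bandwidth: {aggregate_tbps:,.0f} Tb/s "
      f"(~{aggregate_tbps / 1_000:.1f} Pb/s)")
```

That works out to roughly 25,600 Tb/s (about 25.6 Pb/s) of access bandwidth that the fabric must carry without congestion.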

An independent datacenter simulation lab, Scala Computing, set up a range of scenarios comparing Network Cloud-AI's DDC with a leaf-spine Ethernet fabric. The results: DriveNets' solution showed 10% to 30% better job completion time (JCT) in a simulation of an AI training cluster with 2,000 GPUs. The improved JCT, DriveNets said, can translate into roughly 100% return on investment (ROI) on the networking spend: since networking represents about 10% of total system cost, a roughly 10% JCT improvement frees up compute capacity worth about as much as the entire network.
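A short worked example of that ROI logic, as I read it (the dollar figure is a made-up assumption; the 10% shares come from the claims above):

```python
# Illustrative ROI arithmetic; the total cost is an arbitrary assumption,
# the 10% networking share and 10% JCT gain are the figures cited above.
total_cluster_cost = 100_000_000   # assumed total cluster spend, in dollars
networking_share = 0.10            # networking as a fraction of total cost
jct_improvement = 0.10             # low end of the 10-30% JCT improvement

networking_cost = total_cluster_cost * networking_share
# Finishing jobs ~10% faster is roughly equivalent to gaining ~10% more
# usable cluster capacity for the same spend.
value_of_extra_capacity = total_cluster_cost * jct_improvement

roi_on_network = value_of_extra_capacity / networking_cost
print(f"Networking cost:          ${networking_cost:,.0f}")
print(f"Value of 10% faster JCT:  ${value_of_extra_capacity:,.0f}")
print(f"ROI on networking spend:  {roi_on_network:.0%}")  # -> 100%
```

Under these assumptions, the value of the reclaimed compute capacity equals the entire networking investment, which is where the "100% ROI" framing comes from; at the 30% end of the JCT range the implied return would be correspondingly higher.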

These test results show that Ethernet can indeed become a viable standard for AI networking implementations. With AI infrastructure still at an early stage of growth, one should expect Ethernet to play an expanding role in supporting AI.

This ongoing evolution of AI networking technologies and architectures will be one of the most exciting elements to watch as the AI wave unfolds.

FAQs for AI Networking

  • What technology is emerging to address the challenges of AI networking?
    Ethernet is emerging as the preferred choice for AI networking (even Nvidia has hedged its bets by building an Ethernet-based solution), but it will require some optimization to support the high-performance needs of AI.
  • What Ethernet solution does DriveNets offer?
    DriveNets offers an Ethernet-based solution that works today in high-scale AI infrastructures. It is based on the company's field-proven scheduled fabric, which is part of the DriveNets Network Cloud solution.
  • What chipset does DriveNets use?
    DriveNets employs Broadcom's Jericho3-AI chip and Ramon 3 cell-based switching component, giving DriveNets' DDC the ability to support 32,000 GPUs in a single cluster, each connected with an 800-Gb/s Ethernet networking port.

Additional Resources for AI Networking

White paper: Distributed Disaggregated Chassis (DDC) as an Effective Interconnect for Large AI Compute Clusters