June 7, 2023

Director of Product Marketing

Why InfiniBand Falls Short of Ethernet for AI Networking

Artificial intelligence (AI) has revolutionized various industries, driving the need for efficient networking solutions to support the massive data demands of AI applications. DriveNets’ new Network Cloud-AI solution utilizes Ethernet for AI networking. In this blog post, we will explore and compare InfiniBand and Ethernet, highlighting their respective strengths and weaknesses within the context of AI networking.

What is InfiniBand?

InfiniBand is a high-speed networking technology designed primarily for high-performance computing (HPC) environments. It offers extremely low latency and high bandwidth, making it suitable for applications that demand a predictable, lossless fabric. Though InfiniBand is a powerful networking technology, it is effectively a proprietary protocol that comes with a hefty price tag and vendor lock-in at both the networking and GPU levels. As a result, the industry is looking for alternatives that are more cost-effective and free of vendor lock-in.

What is Ethernet?

Ethernet, on the other hand, is a widely adopted networking technology that has evolved over the years to meet the growing demands of data centers. It is the de facto networking standard today, with over 600M ports shipped annually. It offers flexibility, scalability and ease of use, making it a popular choice for various applications, including AI networking. With advancements such as data center bridging (DCB) for use with clustering and storage area networks, Ethernet has improved its performance characteristics, including reduced latency and enhanced quality of service (QoS). 

AI networking: Let’s Compare InfiniBand and Ethernet  

Until now, AI networks have been built on either traditional Ethernet or semi-proprietary solutions. The traditional Ethernet leaf-and-spine architecture was not designed to support high-performance AI workloads at scale. Semi-proprietary solutions such as Nvidia’s InfiniBand do not support network interoperability and offer little flexibility to hyperscalers looking to avoid vendor lock-in.

Bandwidth and latency

InfiniBand excels in raw bandwidth, with its latest generations offering speeds of 200 Gbps and beyond. This high throughput benefits AI workloads that involve massive data transfers. Ethernet has also made significant strides, however: modern Ethernet offers 800 Gbps interfaces, which InfiniBand is not expected to support for another two years, and provides enough bandwidth for most AI applications. While InfiniBand has traditionally offered lower latency, advances in Ethernet have significantly narrowed the gap, making Ethernet a viable option for low-latency AI workloads.
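To make the bandwidth comparison concrete, here is a minimal sketch of the ideal time needed to move a payload over a single link at different line rates. The 100 GB payload and the function name are hypothetical and chosen only for illustration. The calculation ignores protocol overhead, congestion and latency, so real transfers will take longer.

```python
# Idealized transfer time: payload size divided by link rate.
# Assumption: no protocol overhead, congestion, or latency is modeled.

def transfer_seconds(payload_gigabytes: float, link_gbps: float) -> float:
    """Return the ideal time in seconds to move a payload over one link."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    payload_gigabits = payload_gigabytes * 8  # 1 byte = 8 bits
    return payload_gigabits / link_gbps


if __name__ == "__main__":
    payload_gb = 100.0  # hypothetical payload, not a figure from this post
    for rate in (200, 400, 800):
        seconds = transfer_seconds(payload_gb, rate)
        print(f"{rate} Gbps: {seconds:.2f} s")
```

Under these simplified assumptions, quadrupling the link rate from 200 Gbps to 800 Gbps cuts the ideal transfer time to a quarter.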

Scalability and flexibility

Ethernet’s widespread adoption and compatibility make it highly scalable and flexible. It is compatible with existing data center infrastructure and supports a broad range of devices, making it easier to integrate into diverse network environments. In contrast, InfiniBand may require specific hardware and software configurations, limiting its scalability and interoperability. Ethernet’s compatibility and cost advantages over time give it an edge in traditional infrastructure clusters. 

Security and management

InfiniBand lacks Ethernet’s breadth of security and management features, which have been built by Ethernet vendors over multiple decades. Ethernet’s long-standing presence in enterprise and service networks has enabled the development of robust security protocols and comprehensive management capabilities. This makes Ethernet a more favorable choice for organizations that prioritize security and efficient network management in their AI deployments.  

Cost-effectiveness and industry adoption

Ethernet’s popularity and mass production have made it more cost-effective than InfiniBand. The widespread use of Ethernet components and equipment lowers deployment and maintenance costs, making it an attractive choice for organizations with budget constraints. Furthermore, IDC research suggests that Ethernet remains the protocol of choice for the vast majority of AI workloads, estimating that 90% of AI workloads will run on Ethernet in 2025. While InfiniBand may have some niche use cases in HPC-like workloads, Ethernet is well-positioned for both external connectivity and internal compute networks, across a variety of online applications and AI workload types.

Best of both worlds: DriveNets Network Cloud-AI 

DriveNets Network Cloud-AI offers the best of both worlds. It is an innovative AI networking solution designed to maximize the utilization of AI infrastructure and improve the performance of large-scale AI workloads. It delivers up to a 30% improvement in job completion time (JCT) for large-scale AI workloads compared to other Ethernet solutions, substantially improving resource utilization. It also supports standard Ethernet, which allows for vendor interoperability and choice. Built on DriveNets Network Cloud, which is deployed in the world’s largest networks, DriveNets Network Cloud-AI has been validated by leading hyperscalers in recent trials as the most cost-effective Ethernet solution for AI networking. By utilizing Ethernet, DriveNets offers a cost-effective and scalable solution that integrates seamlessly with existing network infrastructure for both internal compute and external connectivity.
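As a rough illustration of what a JCT improvement means for utilization, the sketch below converts a fractional JCT reduction into a new completion time and an equivalent throughput gain. It assumes that "up to 30% improvement" means the job finishes in 30% less time. The 10-hour baseline job and the function names are hypothetical.

```python
# Assumption: a "30% improvement in JCT" means the job takes 30% less time.

def improved_jct(baseline_hours: float, reduction: float = 0.30) -> float:
    """Return the job completion time after a fractional reduction."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return baseline_hours * (1 - reduction)


def throughput_gain(reduction: float = 0.30) -> float:
    """Return how many times more jobs fit in the same wall-clock time."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return 1 / (1 - reduction)


if __name__ == "__main__":
    baseline = 10.0  # hypothetical 10-hour training job
    print(f"New JCT: {improved_jct(baseline):.2f} h")
    print(f"Throughput gain: {throughput_gain():.2f}x")
```

Under this reading, a hypothetical 10-hour job would finish in about 7 hours, and the same cluster could complete roughly 1.43 times as many such jobs in a given period.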

The solution offers lossless connectivity, low latency and low jitter by collapsing the multi-tier Clos architecture (ToR/leaf, spine and super-spine) into a flat, single-switch architecture, with inter-rack connectivity running over a scheduled fabric rather than an Ethernet-based multi-hop interconnect.

Effectively leveraging Ethernet for AI networking with scheduled fabric

InfiniBand and Ethernet are both powerful networking technologies with unique strengths in different contexts. While InfiniBand traditionally excelled in high-performance computing environments, Ethernet has evolved to meet the demands of modern data centers and AI applications. The DriveNets Network Cloud-AI solution demonstrates how Ethernet can be leveraged effectively for AI networking with scheduled fabric. DriveNets Network Cloud-AI presents a unique and innovative architecture that offers the high performance and scale of a fabric interconnect solution with the cost-effectiveness of an open, disaggregated cloud solution. 
