Blog
Artificial intelligence (AI) has revolutionized various industries, driving the need for efficient networking solutions to support the massive data demands...
Discover the best alternative to InfiniBand: Ethernet with InfiniBand-level performance. DriveNets Network Cloud-AI is the highest-performance Ethernet-based DDC scheduled fabric for the back-end network of large-scale GPU clusters.
DriveNets Network Cloud-AI offers the highest-performance lossless Ethernet solution for the AI networking back-end fabric. Testing has shown its performance to equal that of InfiniBand, while using standard Ethernet.
DriveNets Network Cloud-AI:
In July 2024, ByteDance deployed the world’s first 1K-GPGPU production cluster powered by a DDC scheduled Ethernet fabric. The cluster handles a mixture of inference and training traffic from various applications. ByteDance’s existing operational toolkits, designed for non-scheduled fabrics, were easily ported to this cluster. As expected, the cluster has delivered excellent performance and a smooth user experience.
The AI fabric is the network that connects the graphics processing units (GPUs) in a training or inference GPU cluster.
Traditionally, InfiniBand has been the technology of choice for AI fabric as it provides excellent performance for these kinds of applications.
InfiniBand drawbacks:
The obvious alternative to InfiniBand is Ethernet. Yet standard Ethernet is lossy by nature: under congestion it drops packets and adds latency, so on its own it cannot provide adequate performance for large clusters.
Ultra Ethernet, however, relies on algorithms running on the edges of the fabric, specifically on the smart network interface cards / controllers (SmartNICs) that reside in the GPU servers.
This means:
For instance, moving from the ConnectX-7 NIC (a more basic NIC, although it is considered a SmartNIC) to the BlueField-3 SmartNIC (also called a data processing unit, or DPU) translates into roughly 50% higher cost per end device and a threefold increase in power consumption.
The same applies to Spectrum-X, another alternative to InfiniBand from Nvidia, based on Nvidia’s Spectrum-4 and future Spectrum-6 ASICs.
The best solution, in terms of both performance and cost, is the Distributed Disaggregated Chassis (DDC) scheduled fabric:
DriveNets Network Cloud-AI supports up to 32,000 GPUs (with 800Gbps connections) in a single cluster. Using DDC cell-based, scheduled fabric technology, it delivers InfiniBand-level reliable connectivity, low latency, and practically zero jitter. This maximizes network utilization and improves job completion time (JCT) by up to 30% compared to other Ethernet solutions. Moreover, it does not require expensive, power-hungry DPUs.
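Why a cell-based fabric helps utilization can be illustrated with a toy model. The Python sketch below is a hypothetical simplification, not DriveNets’ implementation: it compares per-flow hashing, where each flow is pinned to one link (as in conventional ECMP-based Ethernet), with cell spraying, where each flow is chopped into fixed-size cells distributed round-robin over all fabric links. The function names, the equal-size-flow assumption, and the cell count are illustrative assumptions, and the scheduling mechanism that gives the fabric its "scheduled" name is not modeled.

```python
import random


def ecmp_flow_placement(num_flows, num_links, seed=0):
    """Hypothetical per-flow hashing model: each flow lands on one link
    and all of its traffic stays there."""
    rng = random.Random(seed)
    load = [0] * num_links
    for _ in range(num_flows):
        link = rng.randrange(num_links)  # stand-in for a header hash
        load[link] += 1  # one unit of traffic per flow (equal-size flows assumed)
    return load


def cell_spray_placement(num_flows, num_links, cells_per_flow=64):
    """Hypothetical cell-spraying model: each flow is split into equal
    cells that are sent round-robin over all links."""
    load = [0.0] * num_links
    next_link = 0
    for _ in range(num_flows):
        for _ in range(cells_per_flow):
            load[next_link] += 1 / cells_per_flow  # one cell's share of the flow
            next_link = (next_link + 1) % num_links
    return load


def max_to_mean(load):
    """Ratio of the busiest link to the average link; 1.0 means perfect balance."""
    return max(load) / (sum(load) / len(load))


if __name__ == "__main__":
    flows, links = 16, 16
    print("per-flow hashing max/mean:", max_to_mean(ecmp_flow_placement(flows, links)))
    print("cell spraying    max/mean:", max_to_mean(cell_spray_placement(flows, links)))
```

With 16 equal flows over 16 links, cell spraying loads every link equally (a max-to-mean ratio of exactly 1.0), while random per-flow placement almost always stacks two or more flows on some link. That hotspot is the kind of congestion a cell-based fabric is designed to avoid.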
DriveNets Network Cloud-AI offers an open architecture with high performance, adapting to changing models and network requirements. It ensures interoperability through Ethernet and remains vendor-agnostic across all hardware domains, allowing the use of any GPU and SmartNIC/DPU.
DriveNets is the sole vendor with a proven implementation of a large-scale scheduled fabric. Both Arista (DES) and Cisco (DSF) recognize scheduled fabric as the highest-performance solution. While their solutions are at early stages of deployment, DriveNets Network Cloud has powered the world’s largest DDC network for more than five years. DriveNets has also demonstrated remarkable AI workload performance and scalability in production implementations and field trials conducted with top hyperscalers.
Chassis or Clos AI fabric? What about both?
Distributed Disaggregated Chassis (DDC-AI) is the most proven architecture for building Ethernet-based open and congestion-free fabric for high-scale AI clusters.
DDC-AI offers: