InfiniBand Alternative: DriveNets Network Cloud-AI

DriveNets Network Cloud-AI offers the highest-performance lossless Ethernet solution for AI networking back-end fabrics. In testing, its performance equaled that of InfiniBand while using standard Ethernet and deploying faster.

DriveNets Network Cloud-AI:

  • Scheduled fabric based on a Distributed Disaggregated Chassis (DDC) technology
  • Boosts job completion time (JCT) performance by up to 30% compared to standard Ethernet Clos
  • Pre-scheduled data flows ensure predictable performance and low latency – cell-based load balancing across all spine layers and end-to-end virtual output queue (VOQ) traffic management
  • High Radix Spine supporting up to 32,000 GPU connections in a single AI cluster
  • Standard Ethernet – ensures interoperability and supports any optics and any NIC
  • Fast deployment – plug-and-play with no fine-tuning during bring-up and workload changes
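The difference between cell-based load balancing and per-flow placement can be illustrated with a toy model. The sketch below is not DriveNets' implementation: the function names, flow sizes, spine count, and cell size are all hypothetical, and a seeded random choice stands in for a hash-based spine selection. It only shows why cutting traffic into equal cells tends to even out the load on the spines.

```python
import random

def flow_hash_load(flow_sizes, num_spines, rng):
    """Per-flow placement: pin each whole flow to one pseudo-randomly chosen
    spine (an assumed stand-in for a hash-based choice)."""
    load = [0] * num_spines
    for size in flow_sizes:
        load[rng.randrange(num_spines)] += size
    return load

def cell_spray_load(flow_sizes, num_spines, cell_size):
    """Cell-based placement: cut each flow into fixed-size cells and hand
    them to the spines in round-robin order."""
    load = [0] * num_spines
    next_spine = 0
    for size in flow_sizes:
        remaining = size
        while remaining > 0:
            cell = min(cell_size, remaining)
            load[next_spine] += cell
            next_spine = (next_spine + 1) % num_spines
            remaining -= cell
    return load

def imbalance(load):
    """Busiest spine divided by the mean spine load (1.0 is perfectly even)."""
    return max(load) / (sum(load) / len(load))

# Hypothetical workload: 16 equal large flows over 8 spines (arbitrary units).
rng = random.Random(0)
flows = [1000] * 16
num_spines = 8

hashed = flow_hash_load(flows, num_spines, rng)
sprayed = cell_spray_load(flows, num_spines, cell_size=10)

print("per-flow placement imbalance:", round(imbalance(hashed), 2))
print("cell spraying imbalance:     ", round(imbalance(sprayed), 2))
```

In this model the sprayed load is even by construction, while the per-flow result depends on where whole flows happen to land. A real scheduled fabric also has to handle reassembly and queueing, which the sketch ignores.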

Network Cloud AI Cluster

AI Fabric and JCT

AI fabric is the networking instance that connects the graphics processing units (GPUs) in a training or inference GPU cluster.

  • Needs to be predictable and lossless to avoid GPU idle time
  • Any hiccup in connectivity between GPUs significantly degrades the cluster and its workload performance in terms of job completion time (JCT)
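A simplified model helps show why a single slow path can dominate JCT. The sketch below assumes a fully synchronous job in which every step waits for the slowest GPU-to-GPU transfer and compute does not overlap with communication. The function names and all timing values are hypothetical, chosen only for illustration.

```python
def step_time(compute_s, transfer_times_s):
    """One synchronous step: compute, then wait for the slowest transfer."""
    return compute_s + max(transfer_times_s)

def job_completion_time(steps, compute_s, transfer_times_s):
    """JCT under the assumption that every step behaves identically."""
    return steps * step_time(compute_s, transfer_times_s)

# Hypothetical numbers: 1000 steps, 0.20 s compute, 8 transfers per step.
even_transfers = [0.10] * 8              # all paths equally fast
one_slow_path = [0.10] * 7 + [0.30]      # a single congested path

baseline = job_completion_time(1000, 0.20, even_transfers)
degraded = job_completion_time(1000, 0.20, one_slow_path)
slowdown = degraded / baseline

print(f"baseline JCT: {baseline:.0f} s")
print(f"degraded JCT: {degraded:.0f} s")
print(f"slowdown:     {slowdown:.2f}x")
```

Under these assumptions one slow transfer out of eight stretches every step, so the GPUs on the seven fast paths sit idle while they wait.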

World's first scheduled AI fabric deployment

  • 1,280-GPU production cluster
  • DDC scheduled Ethernet fabric – 20 NCPs and 20 NCFs, 400Gbps ports
  • Ethernet-based endpoints, cell-based fabric

8,192 GPU cluster example

DriveNets Network Cloud-AI, combined with the breakout capabilities of NCP5-AI leaf switches, creates a highly scalable network foundation for an 8,192-GPU cluster:

  • Each GPU has a dedicated 400Gbps connection for efficient communication.
  • The 256 leaf switches connect to 36 spine switches, providing redundancy and efficient traffic routing across the entire cluster.
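The leaf-level arithmetic behind this example can be checked with a short calculation. The sketch assumes the leaf is the NCP described in the hardware specifications on this page (18 x 800G network ports and 20 x 800G fabric ports) and that each 800G network port is broken out into 2 x 400G. The variable names are illustrative only.

```python
# Figures taken from this page.
GPUS = 8192
LEAF_SWITCHES = 256
GPU_LINK_GBPS = 400
NCP_NETWORK_PORTS_800G = 18
NCP_FABRIC_PORTS_800G = 20

# Assumption: every 800G network port is split into two 400G ports.
BREAKOUT_FACTOR = 2

gpus_per_leaf = GPUS // LEAF_SWITCHES
access_ports_400g = NCP_NETWORK_PORTS_800G * BREAKOUT_FACTOR

# The GPUs assigned to one leaf must fit in its 400G access ports.
assert GPUS % LEAF_SWITCHES == 0
assert gpus_per_leaf <= access_ports_400g

# Compare fabric-facing bandwidth with the GPU-facing bandwidth in use.
gpu_facing_gbps = gpus_per_leaf * GPU_LINK_GBPS
fabric_facing_gbps = NCP_FABRIC_PORTS_800G * 800
fabric_to_access_ratio = fabric_facing_gbps / gpu_facing_gbps

print("GPUs per leaf:           ", gpus_per_leaf)
print("400G access ports / leaf:", access_ports_400g)
print("fabric-to-access ratio:  ", fabric_to_access_ratio)
```

Under these assumptions each leaf has more fabric-facing bandwidth than its GPUs can offer, which is consistent with a fabric that carries extra per-cell overhead.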

AI Fabric building blocks

Network Cloud Packet Forwarder (NCP)

Supplied by a variety of original design manufacturers (ODMs)

38x800G (30.4T): 18x800GE + 20x800GE Fabric

  • High scale: 30.4T @ 2RU, 18x800G NIF + 20x800G fabric
  • Low power: single ASIC per white box, 100G SerDes
  • Native 800G: native 800G OSFP for NIF and fabric (supports 2x400G OSFP)

Hardware Specifications

Interfaces
  • Network: 18 x 800G OSFP
  • Fabric: 20 x 800G OSFP
  • Inband Mgmt.: 2 x 25G SFP28
  • OOB Mgmt.: 2 x 10G SFP, 1 x 1G RJ45

Performance
  • Switching Capacity: 30.4 Tbps
  • HBM Deep Buffer: 16GB

Physical
  • ASIC: Broadcom JAI (BCM88892)
  • Processor: 8 Cores (Intel Xeon D-1734NT)
  • Memory: 64GB DDR4 SODIMM
  • Storage: 240GB (2 x 120GB)
  • Chassis: 2RU
  • Typical / Max power (with optics): 1350W / 1900W (14.5W per port)

Network Cloud Fabric (NCF)

Supplied by a variety of original design manufacturers (ODMs)

128x800G (102.4T)

  • High scale: 102.4T @ 6RU, 128x800G / 256x400G
  • Low power, supports up to 256 leaf switches
  • Native 800G: native 800G OSFP (supports 2x400G OSFP)

Hardware Specifications

Interfaces
  • Fabric: 128 x 800G OSFP
  • Inband Mgmt.: 2 x 25G SFP28
  • OOB Mgmt.: 2 x 10G SFP, 1 x 1G RJ45

Performance
  • Switching Capacity: 102.4 Tbps

Physical
  • ASIC: 2x Broadcom R3 (BCM88920)
  • Processor: 4 Cores (Intel Xeon D-1713NT)
  • Memory: 32GB DDR4 SODIMM
  • Storage: 240GB (2 x 120GB)
  • Chassis: 6RU
  • Typical / Max power (with optics): 3150W / 4600W (14.5W per port)
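The headline capacities of both building blocks follow directly from their port counts. As a quick sanity check, the sketch below multiplies the number of 800G ports by the port speed. The helper name `capacity_tbps` is illustrative and not part of any DriveNets tooling.

```python
def capacity_tbps(port_count, gbps_per_port=800):
    """Aggregate port bandwidth in Tbps for a box with identical ports."""
    return port_count * gbps_per_port / 1000

# NCP: 18 network ports + 20 fabric ports, all 800G.
ncp_tbps = capacity_tbps(18 + 20)

# NCF: 128 fabric ports, all 800G.
ncf_tbps = capacity_tbps(128)

print("NCP aggregate port bandwidth:", ncp_tbps, "Tbps")
print("NCF aggregate port bandwidth:", ncf_tbps, "Tbps")
```

Both results match the switching capacities listed in the specifications, which also confirms the 18 + 20 port split for the NCP.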

AI Networking alternatives

Network Cloud-AI Solution Benefits

DriveNets Network Cloud-AI supports up to 32,000 GPUs (with 800Gbps connections) in a single cluster. Using the DDC cell-based, scheduled fabric technology, it delivers InfiniBand-level reliable connectivity, low latency, and practically zero jitter. The solution maximizes network utilization and improves JCT performance by up to 30% compared to other Ethernet solutions. Moreover, it does not require expensive and power-hungry DPUs.

DriveNets Network Cloud-AI offers an open architecture with high performance, adapting to changing models and network requirements. It ensures interoperability through Ethernet and remains vendor-agnostic across all hardware domains, allowing the use of any GPU and SmartNIC/DPU.

DriveNets is the sole vendor with a proven implementation of a large-scale scheduled fabric. Scheduled fabric is recognized as the highest-performance solution by both Arista (DES) and Cisco (DSF). While their solutions are at early stages of deployment, DriveNets Network Cloud has powered the world’s largest DDC network for more than five years. DriveNets has also demonstrated remarkable AI workload performance and scalability in production implementations and field trials conducted with top hyperscalers.
