AI Fabric Overview
For large clusters bundling hundreds or thousands of GPUs, the back-end network (or fabric) is a crucial element of the AI cluster, impacting overall performance and the efficient utilization of compute resources. DriveNets’ AI fabric offers the highest-performance Ethernet-based DDC scheduled fabric as a strong alternative to InfiniBand for the back-end network of large-scale GPU clusters.

InfiniBand Drawbacks

Traditionally, InfiniBand has been the technology of choice for AI fabrics, as it provides excellent performance for these kinds of applications.

However, InfiniBand has notable drawbacks:

  • Practically, a vendor-locked solution (controlled by Nvidia)
  • Relatively expensive
  • Requires a specific skillset, as well as fine-tuning for each type of workload running on the cluster

Ethernet Challenges

The obvious alternative to InfiniBand is Ethernet. Yet Ethernet is, by nature, a lossy technology; it suffers from higher latency and packet loss and, on its own, cannot provide adequate performance for large clusters.

  • The Ultra Ethernet Consortium (UEC) aims to resolve Ethernet’s drawbacks by adding congestion control and quality-of-service mechanisms to the Ethernet standards.
  • The emerging Ultra Ethernet standard, whose first version is expected to be released in late 2024, will allow hyperscalers and enterprises to use Ethernet with less of a performance compromise.

The Costs of NIC-Based Solutions

Ultra Ethernet, however, relies on algorithms running on the edges of the fabric, specifically on the smart network interface cards / controllers (SmartNICs) that reside in the GPU servers.

This means:

  • A heavier compute burden on those SmartNICs, driving higher costs
  • Greater power consumption

For instance, moving from the ConnectX-7 NIC (a more basic NIC, even though it is considered a SmartNIC) to the BlueField-3 SmartNIC (also called a data processing unit, or DPU) translates into roughly 50% higher cost per end device and a threefold increase in power consumption.
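
To put these per-device deltas in cluster-scale terms, below is a minimal back-of-the-envelope sketch in Python. Only the ~50% cost uplift and the threefold power increase come from the text above; the baseline per-NIC cost and power figures are hypothetical placeholders.

```python
# Back-of-the-envelope impact of moving every GPU server NIC from a basic
# SmartNIC to a DPU-class card. Only the ratios (+50% cost, 3x power) come
# from the text; the absolute baseline figures are hypothetical placeholders.

BASELINE_NIC_COST_USD = 2_000   # hypothetical per-NIC cost (placeholder)
BASELINE_NIC_POWER_W = 25       # hypothetical per-NIC power draw (placeholder)

COST_UPLIFT = 1.5               # ~50% higher cost per end device
POWER_UPLIFT = 3.0              # ~3x power consumption

def nic_upgrade_delta(num_nics: int) -> tuple[float, float]:
    """Return (extra cost in USD, extra power in W) for upgrading num_nics NICs."""
    extra_cost = num_nics * BASELINE_NIC_COST_USD * (COST_UPLIFT - 1.0)
    extra_power = num_nics * BASELINE_NIC_POWER_W * (POWER_UPLIFT - 1.0)
    return extra_cost, extra_power

if __name__ == "__main__":
    # Example: an 8K-GPU cluster with one NIC per GPU
    cost, power = nic_upgrade_delta(8_000)
    print(f"Extra cost:  ${cost:,.0f}")
    print(f"Extra power: {power / 1_000:,.1f} kW")
```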

This is also the case with another alternative to InfiniBand coming from Nvidia, the Spectrum-X solution (based on their Spectrum-4 and future Spectrum-6 ASICs).

  • Another Ethernet-based solution (like that of the UEC) that resolves congestion at the end devices
  • Also locked to Nvidia as a vendor

DDC Scheduled Fabric – The Best-Performing Solution

The best solution, in terms of both performance and cost, is the Distributed Disaggregated Chassis (DDC) scheduled fabric:

  • Not vendor-locked
  • Does not require heavy lifting from SmartNICs
  • Makes the AI infrastructure lossless and predictable, without requiring additional technologies to mitigate congestion

AI Fabric Building Blocks

Network Cloud Packet Forwarder (NCP)

Supplied by a variety of original design manufacturers (ODMs)

38 x 800G (30.4T): 18 x 800GE NIF + 20 x 800GE Fabric

  • High Scale: 30.4T @ 2RU, 18 x 800G NIF + 20 x 800G Fabric
  • Low Power: single ASIC per white box, 100G SerDes
  • Native 800G: 800G OSFP for NIF and Fabric (supports 2 x 400G OSFP)
Hardware Specifications

Interfaces
  • Network: 18 x 800G OSFP
  • Fabric: 20 x 800G OSFP
  • Inband Mgmt.: 2 x 25G SFP28
  • OOB Mgmt.: 2 x 10G SFP, 1 x 1G RJ45

Performance
  • Switching Capacity: 30.4 Tbps
  • HBM Deep Buffer: 16GB

Physical
  • ASIC: Broadcom Jericho3-AI (BCM88892)
  • Processor: 8 cores (Intel Xeon D-1734NT)
  • Memory: 64GB DDR4 SODIMM
  • Storage: 240GB (2 x 120GB)
  • Chassis: 2RU
  • Power, Typical / Max (with optics): 1350W / 1900W (14.5W per port)
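
As a quick consistency check on these figures, the short Python sketch below uses only the port counts and speeds from the spec above to confirm the stated switching capacity and compare network-facing versus fabric-facing bandwidth.

```python
# Consistency check for the NCP spec above: 18 network + 20 fabric ports at 800G.
NETWORK_PORTS = 18
FABRIC_PORTS = 20
PORT_SPEED_TBPS = 0.8  # 800G per port

network_bw = NETWORK_PORTS * PORT_SPEED_TBPS  # 14.4 Tbps toward GPU servers
fabric_bw = FABRIC_PORTS * PORT_SPEED_TBPS    # 16.0 Tbps toward the NCF spine
total_bw = network_bw + fabric_bw             # 30.4 Tbps, matching the spec

print(f"Network-facing: {network_bw:.1f} Tbps")
print(f"Fabric-facing:  {fabric_bw:.1f} Tbps (~{fabric_bw / network_bw:.2f}x the network side)")
print(f"Total:          {total_bw:.1f} Tbps")
```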

Network Cloud Fabric (NCF)

Supplied by a variety of original design manufacturers (ODMs)

128 x 800G (102.4T)

  • High Scale: 102.4T @ 6RU, 128 x 800G / 256 x 400G
  • Low Power: supports up to 256 leaf devices
  • Native 800G: 800G OSFP (supports 2 x 400G OSFP)
Hardware Specifications

Interfaces
  • Fabric: 128 x 800G OSFP
  • Inband Mgmt.: 2 x 25G SFP28
  • OOB Mgmt.: 2 x 10G SFP, 1 x 1G RJ45

Performance
  • Switching Capacity: 102.4 Tbps

Physical
  • ASIC: 2 x Broadcom Ramon3 (BCM88920)
  • Processor: 4 cores (Intel Xeon D-1713NT)
  • Memory: 32GB DDR4 SODIMM
  • Storage: 240GB (2 x 120GB)
  • Chassis: 6RU
  • Power, Typical / Max (with optics): 3150W / 4600W (14.5W per port)
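
The same kind of arithmetic applies to the NCF. The sketch below is based only on the figures above; the 256 x 400G option assumes each 800G OSFP port is broken out into 2 x 400G, as noted in the spec.

```python
# Consistency check for the NCF spec above: 128 fabric ports at 800G,
# each optionally broken out into 2 x 400G (hence 256 x 400G / up to 256 leaves).
FABRIC_PORTS_800G = 128
PORT_SPEED_TBPS = 0.8

total_capacity = FABRIC_PORTS_800G * PORT_SPEED_TBPS  # 102.4 Tbps, as stated
breakout_400g = FABRIC_PORTS_800G * 2                 # 256 x 400G connections

print(f"Switching capacity:        {total_capacity:.1f} Tbps")
print(f"400G breakout connections: {breakout_400g} (one per leaf, up to 256 leaves)")
```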

Distributed Disaggregated Chassis for AI (DDC-AI)


Chassis or Clos AI fabric? What about both?

Distributed Disaggregated Chassis for AI (DDC-AI) is the most proven architecture for building an Ethernet-based, open and congestion-free fabric for high-scale AI clusters.

DDC-AI offers:

  • Standard Ethernet
  • AI scheduled fabric – predictable lossless cluster back-end network
  • Proven performance at par with InfiniBand
  • Highest AI scale (up to 32K GPUs at 800Gbps)
  • Up to 30% job completion time (JCT) improvement compared to standard Ethernet Clos (see the illustrative calculation after this list)
  • Over $50M total cost of ownership (TCO) reduction for an 8K-GPU cluster
  • No vendor lock – supports any GPU, any NIC
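
To make the JCT claim above concrete, here is a small illustrative calculation. Only the 30% improvement figure comes from the list above; the cluster size, job duration, and per-GPU-hour cost are hypothetical placeholders chosen to show the arithmetic.

```python
# Illustrative (not measured) savings from a 30% job completion time (JCT)
# improvement. Only the 30% figure comes from the text; all other numbers
# are hypothetical placeholders.
JCT_IMPROVEMENT = 0.30

def gpu_hours_saved(num_gpus: int, baseline_job_hours: float) -> float:
    """GPU-hours saved per training job if JCT drops by JCT_IMPROVEMENT."""
    return num_gpus * baseline_job_hours * JCT_IMPROVEMENT

if __name__ == "__main__":
    gpus = 8_000               # hypothetical 8K-GPU cluster
    job_hours = 240.0          # hypothetical 10-day training job
    usd_per_gpu_hour = 2.0     # hypothetical fully loaded cost per GPU-hour
    saved = gpu_hours_saved(gpus, job_hours)
    print(f"GPU-hours saved per job: {saved:,.0f}")
    print(f"Approximate value:       ${saved * usd_per_gpu_hour:,.0f}")
```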
