Blog
Let’s go under the hood of DriveNets Network Cloud-AI…When building a large GPU cluster for artificial intelligence (AI) training purposes,...
Connecting large GPU clusters for massive AI training is not simple. DriveNets’ AI fabric brings clear performance and scale advantages.
Traditionally, InfiniBand has been the technology of choice for AI fabric as it provides excellent performance for these kinds of applications.
InfiniBand drawbacks:
The obvious alternative to InfiniBand is Ethernet. Yet Ethernet is, by nature, a lossy technology that results in higher latency and packet loss, and cannot provide adequate performance for large clusters.
Ultra Ethernet, however, relies on algorithms running on the edges of the fabric, specifically on the smart network interface cards / controllers (SmartNICs) that reside in the GPU servers.
This means:
For instance, moving from the ConnectX-7 NIC (a more basic NIC, though it is still considered a SmartNIC) to the BlueField-3 SmartNIC (also called a data processing unit, or DPU) translates into a ~50% higher cost per end device and a threefold increase in power consumption.
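To get a feel for how those per-device ratios add up across a cluster, here is a minimal sketch. The two ratios (1.5x cost, 3x power) come from the comparison above; the function name, the NIC count, and the baseline price and power draw are hypothetical placeholders chosen for illustration, not vendor figures.

```python
# Ratios from the text: ~50% higher cost and 3x power per end device.
COST_RATIO = 1.5
POWER_RATIO = 3.0


def nic_upgrade_delta(num_nics, base_cost_usd, base_power_w):
    """Return (extra cost in USD, extra power in watts) across the cluster
    when every baseline NIC is replaced by the more expensive one."""
    extra_cost = num_nics * base_cost_usd * (COST_RATIO - 1.0)
    extra_power = num_nics * base_power_w * (POWER_RATIO - 1.0)
    return extra_cost, extra_power


# Hypothetical inputs, for illustration only (not vendor figures).
extra_cost, extra_power = nic_upgrade_delta(
    num_nics=1000, base_cost_usd=1000.0, base_power_w=25.0
)
print(f"extra cost: ${extra_cost:,.0f}, extra power: {extra_power / 1000:.1f} kW")
```

Because both deltas scale linearly with the number of end devices, the penalty grows in step with cluster size, whatever the real baseline values turn out to be.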
This is also the case with another alternative to InfiniBand coming from Nvidia, the Spectrum-X solution (based on their Spectrum-4 and future Spectrum-6 ASICs).
The best solution, in terms of both performance and cost, is the Distributed Disaggregated Chassis (DDC) scheduled fabric:
Supplied by a variety of original design manufacturers (ODMs)
38x800G (30.4T): 18x800GE + 20x800GE Fabric

Supplied by a variety of original design manufacturers (ODMs)
128x800G (102.4T)
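The aggregate capacities quoted for the two hardware options listed above follow directly from port count times port speed. A minimal sketch of that arithmetic, where the helper name `aggregate_tbps` is a hypothetical one introduced only for this illustration:

```python
def aggregate_tbps(ports, port_speed_gbps=800):
    """Aggregate capacity in Tbps for `ports` ports at `port_speed_gbps` each."""
    return ports * port_speed_gbps / 1000


# Port counts as quoted in the listing above.
for ports in (38, 128):
    print(f"{ports}x800G = {aggregate_tbps(ports)}T")
```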
Chassis or Clos AI fabric? What about both?
Distributed Disaggregated Chassis (DDC-AI) is the most proven architecture for building an open, congestion-free, Ethernet-based fabric for high-scale AI clusters.
DDC-AI offers:
White Papers
Independent testing by the leading scalable data center simulation lab Scala Computing validates that Network Cloud-AI improves Job Completion Time...
CloudNets
What's the difference between DDC, DES, and DSF? ...