Resources

Podcasts | May 30, 2023

DriveNets Aims To Make Ethernet AI-Friendly

Packet Pushers Network Break discusses DriveNets’ announcement of Network Cloud-AI, which is designed to support networking for cloud-based AI workloads. DriveNets is targeting hyperscalers running massive GPU clusters, where the goal is to minimize latency and GPU idle time as data moves around the cluster.


Source: Packet Pushers Network Break 432: DriveNets Aims to Make Ethernet AI-Friendly

Full Transcript

All right, let’s dive into some news.
The startup DriveNets, which provides virtualized networking for clouds, has announced Network Cloud-AI.
This is designed to support the networking for cloud-based AI workloads. So DriveNets is targeting hyperscalers that are running these massive GPU clusters for AI workloads, where you want to minimize latency and GPU idle time as data gets moved around these clusters.
So sometimes Ethernet isn’t the best approach, but hyperscalers may also not want to rely solely on InfiniBand, because it means specialized hardware and specialized talent.
So DriveNets is letting you use Ethernet NICs and Broadcom-based white boxes, and essentially what they’re doing is breaking Ethernet frames into cells that can then be evenly distributed across fabric ports in a leaf-spine architecture.
So you get highly precise scheduling to ship data box to box.
DriveNets is taking advantage of an open compute project called Distributed Disaggregated Chassis to make this happen.
This is weird, Drew, because I spent a couple of days last week thinking that there’s no way InfiniBand can survive.
Surely not.
Then I went off and read a bunch of articles about InfiniBand, and I dug into Nvidia’s AI solution, which is heavily into InfiniBand.
But what I realized was that ultimately InfiniBand only scales to a few hundred nodes at most.
I don’t think the lack of experience with InfiniBand is a big deal, because InfiniBand is a bit like Fibre Channel.
It’s mostly plug and play.
You don’t need to fuss with it.
It’s not like an Ethernet network, which needs constant operation and administration.
It was actually designed well at the start.
It’s a bit like the old FDDI and Token Ring.
IB has self-healing and a whole bunch of features that are way, way better than stupid Ethernet and dumb IP by comparison.
But guess where the market is?
It’s not making InfiniBand chipsets, and it’s not making InfiniBand NICs anymore.
Everybody chases volume, and volume is where the price is.
And so DriveNets has tapped into this.
And their approach is to basically use cells.
So instead of transmitting Ethernet frames across the backbone, you receive the Ethernet frames from the server and cut them up into cells. Now any jitter as you move across the network, whether from overload, from incast, or from multicast, is obviated, because the cells have a fixed length. So as a cell comes in, it sits in the buffers of the switch fabric for a known time.
And so you can never oversubscribe the fabric.
You can move to something that’s approximating a synchronous type of an architecture.
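To make that concrete, here’s a minimal Python sketch of the general cell-fabric technique, not DriveNets’ actual implementation; the 256-byte cell size and four fabric ports are assumptions for illustration only.

```python
# Toy sketch of a cell-based fabric: chop variable-length Ethernet frames
# into fixed-size cells, then spray the cells evenly across fabric ports.
# Fixed-size cells occupy a fabric buffer for a predictable time, which is
# what makes tight, box-to-box scheduling possible.

CELL_SIZE = 256        # bytes per cell -- assumed value for illustration
NUM_FABRIC_PORTS = 4   # leaf-to-spine uplinks in this toy leaf-spine topology

def segment_into_cells(frame: bytes, cell_size: int = CELL_SIZE) -> list[bytes]:
    """Split one Ethernet frame into fixed-size cells, padding the last one."""
    cells = []
    for offset in range(0, len(frame), cell_size):
        cell = frame[offset:offset + cell_size]
        if len(cell) < cell_size:
            cell = cell.ljust(cell_size, b"\x00")  # pad to the fixed length
        cells.append(cell)
    return cells

def spray_across_ports(cells: list[bytes], num_ports: int = NUM_FABRIC_PORTS) -> list[list[bytes]]:
    """Round-robin cells across fabric ports so the load is evenly spread."""
    ports: list[list[bytes]] = [[] for _ in range(num_ports)]
    for i, cell in enumerate(cells):
        ports[i % num_ports].append(cell)
    return ports

if __name__ == "__main__":
    frame = bytes(1500)  # one full-size Ethernet payload
    cells = segment_into_cells(frame)
    ports = spray_across_ports(cells)
    print(f"{len(cells)} cells of {CELL_SIZE} B each,",
          [len(p) for p in ports], "cells per fabric port")
```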
And this is really important, because in AI training and model generation you’re using massively parallel processing to hand tasks to GPUs.
And we’ve reached the limits of what’s inside the server.
So, for example, if you look at an Nvidia A100 chassis, it fits eight massive GPUs inside the server.
The whole of the inside of the server is GPUs plugged into a motherboard.
And not for display, just for processing.
Right?
A special type of GPU sort of optimized for training models and stuff like that.
Even to put in a small-scale AI setup, you need a dozen servers in a row.
I think they sell them in clusters of five.
You buy a set of five racks, and you can build InfiniBand out to that sort of scale.
And because these things are so power hungry, you’re only putting like four or five of them in each rack.
But we’re now at the point where AI needs to scale up. Entire second-tier cloud providers are buying AI hardware so that they can offer AI services via the Nvidia software stack on their clusters.
And Nvidia, I don’t know if you followed the financial news this week, but Nvidia reset their sales projections for the next quarter upwards by 35%.
So their sales went from something like a billion to a final number of 1.3 billion, or something like that; I can’t remember the exact figures.
The market went bonkers.
They actually added something like $200 billion in market value in the space of 30 minutes, because the share price shot up.
Well, don’t quote me on the numbers; go and look it up.
But it was just bonkers amount.
Right.
So Nvidia’s, you know, got this position, but InfiniBand is not going to scale much beyond clusters of 100 or 200 nodes.
You can make it happen, but you have to do a lot of work to be able to do it.
Now, it’s true that InfiniBand has some specific features for direct memory transfers.
So you can actually transfer from a memory location on this server to a memory location on that server.
And InfiniBand has a specific set of software features that say: I’m not going to transmit by passing the data down to the operating system, down to the NIC, over the network, back up the NIC, reassembling it, and handing it back up, which is very processor-intensive and very latent, very slow.
So InfiniBand reduces all of that; you can actually use that direct memory mode in software.
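As a rough illustration of why that matters, here’s a toy Python sketch, not real RDMA code (real implementations go through verbs libraries and registered memory regions): the kernel path copies the payload at every layer, while the RDMA-style path does a single direct placement into a buffer standing in for remote memory.

```python
# Toy contrast between the normal socket path and an RDMA-style transfer.
# This only simulates the copy behavior; it does not touch real NICs.

def kernel_path(payload: bytes) -> bytes:
    """Simulate app -> OS -> NIC -> wire -> NIC -> OS -> app, copying per hop."""
    buf = payload
    for _hop in ("tx_socket", "tx_nic", "rx_nic", "rx_socket"):
        buf = bytes(buf)  # each hop makes another copy and burns CPU
    return buf

def rdma_write(payload: bytes, remote_memory: bytearray, offset: int) -> None:
    """Simulate an RDMA write: one direct placement into remote memory."""
    remote_memory[offset:offset + len(payload)] = payload  # no extra copies

if __name__ == "__main__":
    data = b"gradient shard"
    remote = bytearray(64)       # stands in for a pre-registered memory region
    kernel_path(data)            # four extra copies plus per-hop stack work
    rdma_write(data, remote, 0)  # single placement, no remote CPU involved
    print(bytes(remote[:len(data)]))
```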
So I reckon we’re going to see a lot more of this AI Ethernet, these AI data center networks.
So what DriveNets has done here is beat most people to the market.
They’ve actually produced a cellular style of technology across the backbone using Jericho3s, I think, in the announcement, is that right?
Yes, Jericho3s and also the Ramon chipset.
Right.
So they’re using a specific set of white box equipment here to run their software on, using a cellular backbone to give you a much more jitter-free Ethernet network.
And this is roughly equivalent to what Google is doing with the Aquila protocol.
So I put a link to that in the show notes, where you can go and see what Google is doing there.
Google published a white paper, “Aquila: A Unified Low-Latency Fabric for Data Center Networks.”
And I mean all power to DriveNets for getting something out of the box here really quickly and getting into the competition long before Arista, Juniper, and Cisco have even talked about this; they’ve got a solution that’s shipping.
Well, I will say I’ve seen a Tech Field Day presentation from Arista talking about their ability to support AI workloads on an Ethernet fabric using protocols like RDMA and RoCE.
So I’ll see if I can find that link and drop it in the show notes.
So Arista is also making efforts here.
But yes, kudos to DriveNets for, I mean, obviously targeting hyperscalers at the outset because they’re the ones building these networks to process their own workloads and workloads for customers.
RoCE is one of those “if your only tool is a hammer, every problem looks like an owl” things.
RoCE has pretty much failed for everybody who’s used it.
And at the end of the day, the challenge is that if you build a network that carries only RoCE and no other traffic, then in theory you can make it work.
But in practice RoCE doesn’t do it quite so well; it’s very difficult to sustain, and it requires you to do a whole bunch of tuning.
RDMA over Ethernet is what you use to write from memory location to memory location: remote direct memory access, RDMA. RoCE is RDMA over Converged Ethernet.
But that doesn’t remove the jitter, it doesn’t remove the latency that happens.
It doesn’t stop the buffers from pooping themselves and dropping frames or dropping packets.
Right, well, of course Arista would argue that it has the deep buffers to prevent those kinds of jitter issues.
But I’ll let you argue with Arista.
I think the whole deep buffer thing is overblown.
There is some value there, but that’s a bit like saying when you go to the petrol station, you buy the 95 octane, not the 92.
Ultimately, I think this is incredibly clever of DriveNets.
I can see there’s plenty of room here for a better solution to come out, because they’re not requiring any special Ethernet NICs here.
Right.
So you can just go in, and a lot of AI data center fabrics are actually going to be pots of gold inside the brownfield.
You’re not going to wipe out your entire data center network and then put this in to replace it.
You’re just going to put in a bunch of switches and a bunch of AI servers, they’ll be over there, and there’ll be a cable going over to the AI setup.
And this network won’t be used for anything else because you just want to maximize the performance of this.
So it’s a pot of gold in the middle of a brownfield.
So there is definitely room here for DriveNets to get traction with customers who want an AI-grade, highest-performing Ethernet network.
It’s very different from HFT.
We’ve talked a lot about low-latency trading, or high-frequency trading networks, which absolutely want the lowest latency, but this is very different to that.
AI actually has to send traffic.
Those HFT networks are very low volume.
All they want is fast transmission or fast propagation.
These networks need to be running at 100% throughput with consistent jitter.
Even better, it should be the fastest possible network that you can build.
And that’s where InfiniBand falls down, I think. To my uncertain knowledge, because I wasn’t able to get 100% confirmation, InfiniBand caps out at about 200 gigabits per second, while Ethernet is already at 800, and they’re already talking about 1.6 terabit Ethernet in the next two or three years.
So which one are you going to do?
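For a back-of-the-envelope feel for that gap, here’s a quick calculation using the line rates quoted above; the InfiniBand figure is the hedged number from the discussion, and the 100 GB payload is an arbitrary assumption.

```python
# Transfer time for a 100 GB exchange at each line rate, ignoring protocol
# overhead and assuming the link itself is the bottleneck.

PAYLOAD_GB = 100  # assumed size of one bulk data exchange
LINE_RATES_GBPS = {
    "InfiniBand (as quoted)": 200,
    "Ethernet today": 800,
    "Ethernet roadmap": 1600,
}

for name, gbps in LINE_RATES_GBPS.items():
    seconds = PAYLOAD_GB * 8 / gbps  # GB -> gigabits, then divide by Gb/s
    print(f"{name:24s} {gbps:5d} Gb/s -> {seconds:.1f} s per {PAYLOAD_GB} GB")
```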
Yeah.
Don’t bet against Ethernet.
That’s the rule.
It may be cheap and nasty, but that’s why everybody’s buying it.
Cheap and nasty.
Cheap and nasty.