
๐—ง๐—ต๐—ฒ ๐—œ๐—ป๐—ณ๐—ฟ๐—ฎ๐˜€๐˜๐—ฟ๐˜‚๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ ๐—ฅ๐—ฎ๐—ฐ๐—ฒ

At The AI Summit in New York, DriveNets' Inbar Lasser-Raab (Chief Marketing Officer) discussed how the role of the network is expanding as AI workload demand grows.

The Infrastructure Race

Transcript
My name is Inbar Lasser-Raab. I'm the CMO for DriveNets. I'm probably the odd bird on the panel here.
So DriveNets builds networking infrastructure, and we started by working with service providers building high-scale networks. If you're using AT&T today, you're probably running on a DriveNets network; the majority of AT&T traffic now runs on DriveNets. What we've done, and why it is relevant for today: we took traditional routers and broke them into cloud-based software running on white boxes in clusters, and then we can connect any port to any port in the cluster.

If you think about an AI world where you build large GPU clusters, you need to connect those GPUs any-to-any. So we use the same technology that we used to connect networks to now build AI clusters. Why is this relevant to you? Some of you probably use cloud-based solutions, right? Some of you build your own on-prem AI infrastructure. It's relevant to you because 20% to 40% of the performance of the GPU actually depends on the network. So, how many of you use on-prem? How many of you are building your own clusters? So you use cloud-based, right? Just nod.

So when you buy GPU hours from your cloud provider, you need to check what network that cloud provider has, because you may be paying for thousands of GPUs while only 40% or 60% of them are utilized. You have to check what network those providers are using. So that is our story. Just one analogy: if you think about a car, the car has an engine, and that's your GPU. But your transmission system, your fuel system, your roads, that's the network. The car will not go far if you only have a good engine; you need the full system to be good. And that's what networking brings to the story.
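To make the utilization point above concrete, here is a back-of-the-envelope sketch. The cluster size, hourly rate, and utilization figures are illustrative assumptions, not DriveNets or AT&T numbers.

```python
# Illustrative only: how network-limited GPU utilization inflates effective cost.
# The cluster size, hourly rate, and utilization figures below are assumptions.

gpus = 1000          # GPUs rented from a cloud provider
hourly_rate = 2.50   # assumed price per GPU-hour, in dollars
hours = 24 * 30      # one month

for utilization in (0.4, 0.6, 1.0):
    paid = gpus * hourly_rate * hours
    effective_cost = hourly_rate / utilization  # dollars per *useful* GPU-hour
    print(f"utilization {utilization:.0%}: paid ${paid:,.0f}, "
          f"effective ${effective_cost:.2f} per useful GPU-hour")
```

At 40% utilization, the effective price per useful GPU-hour is 2.5 times the list price, which is the speaker's argument for checking a provider's network before buying GPU hours.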

Throwing more GPUs at the problem?

Transcript
So it’s beautiful because both of you talked about throwing more GPUs at the problem and trying to optimize the application.
And the other point, to add on to that, is really optimizing the network, and that is an additional 40% of savings or optimization that you can get.
And I think what most people don't understand is that if you use the full solution from Nvidia and you have InfiniBand, InfiniBand is pretty good; I mean, that's the highest standard for optimization of the network.
But a lot of companies are moving to Ethernet, and Ethernet is a lossy protocol, so it really impacts the performance of the cluster. So there are ways to optimize that Ethernet, or to use an Ethernet that is built to be lossless, with 100% utilization.
But there's also work you can do on optimizing the application for the network itself.
And there are multiple layers there: on AMD it's RCCL and ROCm, and on Nvidia it's NCCL and CUDA. So it's actually optimization across the full software stack, from the NIC and the switch all the way up to your application, that a lot of people are not aware of. There is a lot of fine-tuning that should happen; you can throw more GPUs at the problem, but you will not improve, because you don't have that full-stack optimization.
And that's something I actually don't expect every developer to do. That's something your solution provider should do, collaborating between the GPU provider and the network provider to make sure it is optimized for you.
So when you talk about bottlenecks, I think that's something a lot of people don't talk about. It's the time to first token; it's how long it takes you to get optimized. You spend hours or days or weeks fine-tuning, and that should be something much shorter, because you're spending GPU time and money just doing that optimization. You want to make sure that it's truly optimized, and that comes when you have that full-stack optimization across the board.
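For readers who have not seen where NCCL or RCCL sits, here is a minimal sketch of the collective call that the whole stack below it serves. It assumes PyTorch with the NCCL backend and a torchrun launch; it is illustrative, not a DriveNets recipe.

```python
# Minimal sketch of the collective layer the panel describes: the application
# calls NCCL (via PyTorch) and NCCL drives the NIC and switch fabric beneath it.
# Assumes PyTorch with CUDA GPUs, launched with `torchrun --nproc_per_node=N`.
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # NCCL on Nvidia; RCCL fills this role on AMD
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a gradient-sized tensor; all-reduce sums it across
    # all ranks. How quickly this call completes depends on the network fabric,
    # not only on the GPUs.
    grad = torch.ones(64 * 1024 * 1024, device="cuda")  # ~256 MB of fp32
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)
    torch.cuda.synchronize()

    if rank == 0:
        print(f"all-reduce completed across {dist.get_world_size()} ranks")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Everything beneath that single all_reduce call, meaning the collective library, the NIC, and the switch fabric, determines how long it blocks; that path is the "full stack optimization" the speaker describes.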

A lot of NeoClouds are now coming into the market

Transcript
He's right. There are so many parameters, there is no one way to do things, and there are so many moving parts.
And you talk about inference; it's the same with training, which is what we're more involved with.
But I have one comment before I answer: you have a lot of NeoClouds that are now coming into the market, so it's not a hyperscalers' game anymore. You, as buyers, have more choices; evaluate them and look at the statistics that Nadav was talking about. You do have more options, so that's one thing. But in terms of evolving standards to optimize performance, I can only speak from the network perspective.
There are a couple of elements here that are important.
One element is how you move around between an Nvidia GPU, an AMD GPU, or another accelerator, because your application needs to be adapted as well. Moving around between different providers becomes more complicated, so you have to plan for it: when you develop your application, you have to build one that will be able to move across different GPUs. That's a complication that did not exist in the cloud world; now it adds another layer. So from our perspective, Ethernet is evolving; even Nvidia is now betting on Ethernet versus InfiniBand.
And you have multiple standards around it. There is the Ultra Ethernet Consortium, if you've heard about it, which optimizes Ethernet itself, handling congestion and ensuring that there is end-to-end telemetry and optimization. But as you know, those standards evolve, and there are a lot of proprietary enhancements.
So you still need to check different providers to ensure that you get the most optimized solution. That's what we're trying to do: we support the consortium's endpoint-scheduled Ethernet, but we also have a solution called Fabric Scheduled Ethernet that sprays packets, or cells, equally across all the GPUs. So we actually remove a lot of that fine-tuning for you; it works efficiently out of the box with your application, because we make sure that the load balancing across the GPUs is optimized and the GPUs are utilized 100% of the time. So if you run on a Fabric Scheduled Ethernet network for your GPUs, you'll actually spend much less time optimizing your application. That's something to really understand. And you don't mention the network a lot; I think it's almost like an unknown secret in this industry how much it will impact your work and save you time.
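The load-balancing contrast the speaker draws can be made concrete with a toy simulation. The sketch below compares per-flow hashed placement (how classic Ethernet ECMP behaves) with uniform cell spraying; the link and flow counts are made up, so it illustrates the principle rather than DriveNets' Fabric Scheduled Ethernet itself.

```python
# Toy model of the load-balancing contrast: per-flow hashing can stack several
# large flows onto one link, while spraying fixed-size cells over every link
# keeps the fabric nearly even. All numbers are illustrative assumptions.
import hashlib
import random

LINKS = 8
FLOWS = 32
random.seed(0)
flow_sizes = [random.randint(1, 100) for _ in range(FLOWS)]  # arbitrary units

# Per-flow hashing: each flow is pinned to a single link.
hashed = [0] * LINKS
for flow_id, size in enumerate(flow_sizes):
    link = int(hashlib.md5(f"flow-{flow_id}".encode()).hexdigest(), 16) % LINKS
    hashed[link] += size

# Cell spraying: every flow is chopped into cells spread across all links.
sprayed = [sum(flow_sizes) / LINKS] * LINKS

total = sum(flow_sizes)
print(f"per-flow hashing, busiest link carries {max(hashed) / total:.1%} of traffic")
print(f"cell spraying,    busiest link carries {max(sprayed) / total:.1%} of traffic")
print(f"perfect balance would be {1 / LINKS:.1%}")
```

Because spraying spreads every flow across every link, no single large flow can saturate one path, which is the mechanism behind the claim that GPUs stay fully utilized without per-application tuning.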

Want to learn how DriveNets is reshaping AI networking infrastructure?

Explore DriveNets AI Networking Solution