Next-Gen System Design for AI Networking
Maximize ML Performance
Recently, at the AI Hardware and Edge AI Summit, DriveNets appeared on a panel discussing how to maximize ML performance with next-gen system design for AI networking. Listen to some of the highlights below.
From the Networking side
Full Transcript
I want to add another aspect to that, because when we talk about technologies, I think across the board, as we see the AI market evolving and taking shape, it is doing so so fast that multiple technologies are running in parallel.
From the networking side, we see a lot of hyperscalers and cloud providers building multiple clusters with different technologies, simply testing the different technologies as they go and then selecting which one will lead the pack. From the networking perspective, we see major dominance of Nvidia at this stage. But unlike HPC, which is a siloed infrastructure, where you build an HPC system and it is closed, AI is very much connected to the cloud, the data centers and the Internet, because inference, for instance, needs to be heavily connected to the outside world.
And therefore, if you look at Nvidia’s InfiniBand, for instance, which is the dominant networking technology for HPC, it lacks this openness and standardization when it comes to an open infrastructure like AI. So there is a race to take an open approach, which is probably based on Ethernet, because that is the dominant solution for anything that runs over the Internet, and to bring Ethernet to the performance required for an AI infrastructure.
And there are multiple initiatives to do it. We represent one of them, one that actually works and provides InfiniBand-like, high-performance connectivity, allowing operators to use Ethernet without being held back by its lossy nature. But as I mentioned, everything is running in parallel, and there are multiple initiatives and multiple tests going on. And I think, as you mentioned,
we live in a very exciting time, because everything is happening super fast and it’s exploding. And I think if we meet a year from now, we will look back and say, wow, this was a crazy ride, but we will be much more certain as to which technologies will prevail and take us forward.
If you’re not visiting now, you’re probably in the wrong career. Yeah, we see that on the cold plate side a lot as well.
So we have chip manufacturers and designers coming to us, we have server manufacturers and users coming to us, and we even have rack and CDU (coolant distribution unit) partners coming to us.
Networking resources are lagging
Full Transcript
Yeah. What I wish both the application guys and the infrastructure guys would know, or at least bear in mind, is something that has to do with a very famous slide by Alexis from Meta, and we saw a similar version of it from Microsoft here this morning: the fact that compute power, or compute capability, is growing very rapidly tends to make them forget that the underlying infrastructure, specifically networking resources, power, cooling, et cetera, is lagging behind.
And if they don’t put their minds to it, they will have huge processing power standing idle, or even shut down, because the underlying infrastructure is not there to support it and enable it.
So this is something that they need to keep in mind. I think it’s common to all of us.
From the cold plate side, the thing I wish clients and designers out there knew is that, of course, it’s not just TDP that matters.
It’s heat per unit area. It’s heat flux.
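As a rough back-of-the-envelope illustration of that distinction (the numbers below are assumed for the example, not figures from the panel), heat flux is simply power divided by the area it flows through, so the same TDP can be easy or hard to cool depending on how concentrated it is:

```python
# Minimal sketch: average vs. hotspot heat flux for an assumed accelerator.
# All values here are illustrative assumptions, not data from the panel.

def heat_flux_w_per_cm2(power_w: float, area_cm2: float) -> float:
    """Heat flux through a given area, in W/cm^2."""
    return power_w / area_cm2

tdp_w = 700.0            # assumed package TDP
die_area_cm2 = 8.0       # assumed die area (~800 mm^2)
hotspot_area_cm2 = 0.5   # assumed region where high-power transistors are clumped
hotspot_share = 0.4      # assumed fraction of TDP dissipated in that region

print(f"die-average flux: {heat_flux_w_per_cm2(tdp_w, die_area_cm2):.1f} W/cm^2")
print(f"hotspot flux:     {heat_flux_w_per_cm2(hotspot_share * tdp_w, hotspot_area_cm2):.1f} W/cm^2")
```

The die-average number can look manageable while the flux through a clumped hotspot is several times higher, which is exactly the cooling problem described next.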
So we actually see lots of different chip designs come through, and we’ve seen some designs where all of the high-power transistors are clumped together.
It’s really hard to cool that hotspot.
The rest of it is great, but you’ve kind of compromised the performance with a hotspot.
If you can intersperse compute pockets that are higher power with lower-power areas around them, you can actually get some heat spreading in the chip. And that means that by the time the heat gets to the top surface of the chip’s case, it’s easier to dissipate, even though it’s still hard.
So on a design level, even where you place the compute banks as you’re drawing the circuits, think about thermals. It matters.
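To make the layout point concrete, here is a toy sketch comparing a clumped floorplan with an interspersed one at the same total power. The tile sizes, power values and the crude "neighborhood flux" metric (a stand-in for how much lateral heat spreading the surrounding silicon can provide) are all invented for illustration; this is not a real thermal model.

```python
# Toy comparison of two floorplans with the same assumed total power (224 W):
# one clumps the high-power tiles, the other intersperses them with cool tiles.
# "Neighborhood flux" = power of a tile plus its neighbors over their combined
# area, a crude proxy for local heat concentration after some lateral spreading.

TILE_AREA_CM2 = 0.25  # assumed area of one floorplan tile

def worst_neighborhood_flux(power_map):
    """Max over tiles of (tile + neighbor power) / (tile + neighbor area), in W/cm^2."""
    rows, cols = len(power_map), len(power_map[0])
    worst = 0.0
    for r in range(rows):
        for c in range(cols):
            neigh = [
                power_map[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            worst = max(worst, sum(neigh) / (len(neigh) * TILE_AREA_CM2))
    return worst

# Clumped: the four 50 W compute tiles sit together in one corner.
clumped = [
    [50, 50, 2, 2],
    [50, 50, 2, 2],
    [ 2,  2, 2, 2],
    [ 2,  2, 2, 2],
]

# Interspersed: the same four 50 W tiles pushed apart, same total power.
interspersed = [
    [50, 2, 2, 50],
    [ 2, 2, 2,  2],
    [ 2, 2, 2,  2],
    [50, 2, 2, 50],
]

print("clumped worst local flux:      %.0f W/cm^2" % worst_neighborhood_flux(clumped))
print("interspersed worst local flux: %.0f W/cm^2" % worst_neighborhood_flux(interspersed))
```

On this toy map the interspersed layout cuts the worst local flux by several times, which is the heat-spreading effect described above: the heat arrives at the top of the case already smoothed out, so it is easier to dissipate.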