It was an exhausting week here in San Jose at Nvidia GTC 2025. The conference is definitely becoming the main event for everything AI, and it is no longer just the “GPU Technology Conference” (from which it derived its initials).
This week was so busy with meetings, conversations, and walking the floor that I actually had very little time left for attending sessions. My recap focuses on Nvidia in general, and on networking infrastructure for AI and HPC in particular.
Here are my thoughts on each subject…
Nvidia #1 – Dominance
Not news, I know. But the fact that this vendor-specific event is practically the industry’s main event this year (comparable to MWC in the service provider domain) further proves Nvidia’s dominance in the AI infrastructure space. I believe that this is the first time since the happy days of IBM (in the second half of the 20th century) that a single company has ruled an important technology domain at this magnitude.
You could argue that this is not a healthy market situation, and you would be absolutely right. But it is, nonetheless, very impressive to experience.
Nvidia #2 – Jensen
You cannot deny it – Nvidia CEO Jensen Huang is a rock star!
- First, in terms of onstage performance, he delivered a great ad-lib – “May I talk to a human?” – when a video clip got stuck during his keynote.
- Second, and most interestingly, there was a level of admiration towards him that I have never, ever, seen directed at anyone in the hi-tech industry.
The bottom line is that Jensen is as dominant within Nvidia as Nvidia is dominant within the AI market. So Nvidia is, practically, Jensen!
Once again, you could argue that this is not a healthy market situation, and you would be absolutely right. But it is, yet again, very impressive to experience.
Nvidia #3 – It’s the network, stupid!
One thing Jensen mentioned in his keynote address really caught my attention – Nvidia’s move from InfiniBand to Ethernet-based solutions for AI infrastructure. There are two things to note about this announcement:
- The first is that networking was a key topic in the keynote (and across the entire event) because, well, it’s the network, stupid! Even though networking is a small part (~10%) of the entire infrastructure cost, it can account for as much as 80% of infrastructure complexity. That’s why Jensen talked about Ethernet, co-packaged optics and other networking topics in his keynote.
- The second, and perhaps more important, thing to notice is that Nvidia has (finally) admitted that the world is moving towards Ethernet and away from InfiniBand – semi-proprietary, yet still the high-performance benchmark.
Networking #1 – Performance
This brings us to the first and most discussed topic when it comes to networking – performance. The gold standard for AI backend fabrics in terms of performance is InfiniBand. The race is now on to achieve this level of performance in an Ethernet-based architecture.
Nvidia’s Spectrum-X is getting close, but, some would say, not close enough; it still yields lower performance than InfiniBand (though much higher than plain-vanilla Ethernet). This will most likely be the case with any endpoint-scheduled solution (e.g., Ultra Ethernet). Many people we talked to at this event came to realize that the only Ethernet-based solution that can reach the required level of performance is a fabric-scheduled solution.
Networking #2 – Cost
Next is the cost issue. While this is pretty straightforward, it’s worth mentioning, as it was the second topic many customers raised. The cost structure of an Nvidia-centric solution is, naturally, problematic. The way to resolve it is to go Ethernet and open, from the ODM to the optics level, which saves a lot of money. This is, of course, valid only if you resolve the first issue of performance.
Networking #3 – Lead time
This one, I have to admit, took me a bit by surprise, as I did not expect it to be among the top three challenges raised by customers. It all boils down to time to market (TTM) and supply-chain diversity. If you have a solution that is on par with (and, in some cases, better than) InfiniBand in terms of performance, that reduces costs significantly, and that also shortens TTM from months to weeks – you have a no-brainer decision!
It was a fascinating event. I have to wonder – will the next GTCs be as important for our industry, or will Nvidia, and thus the events, lose some of their dominance? Time will tell.
And, lastly, on a personal note. Nvidia is building something very exciting that is changing the industry, and, to some extent, humanity. As part of a growing but small vendor (well, tiny, compared to Nvidia), it feels good to be a part of this huge thing. As it turns out, building a high-performance (highest, even) Ethernet-based backend fabric is a very important part of the industry’s evolution – because “it’s the network (stupid)” and it’s Ethernet (towards which the industry is moving). Not a bad place to be…
Related content for AI networking architecture
DriveNets AI Networking Solution
Latest Resources on AI Networking: Videos, White Papers, etc.
Recent AI Networking blog posts from DriveNets AI networking infrastructure experts
White Paper
Fabric-Scheduled Ethernet as an Effective Backend Interconnect for Large AI Compute Clusters