June 18, 2025

Director of Product Marketing

Insights from ISC 2025: Evolving AI and HPC Networks

I just got back from rainy Hamburg after another great ISC conference. ISC 2025 was an event full of energy and ideas that made it clear we’re entering a new phase in how AI and HPC clusters are being built, especially when it comes to networking. After all, that’s exactly why we were there. 😊

This year’s event, as always, revolved around the industry’s top dog—Nvidia—with plenty of buzz around its latest products and services. The event also highlighted the industry’s growing interest in compute capacity trends, innovative cooling solutions, and next-gen storage technologies.

But from a networking perspective, the conversations went well beyond the usual. Ethernet had more presence than ever as an emerging interconnect solution. Real-world multi-tenancy challenges were discussed more frequently, even in the context of HPC clusters. And, of course, there was strong interest around the recent release of Ultra Ethernet Consortium (UEC) Specification 1.0.

I left the event with four key takeaways about networking:

#1 InfiniBand remains strong, so why is it no longer the only game in AI networking?

InfiniBand continues to be a dominant force in both AI and HPC environments. Its performance and familiarity make it the go-to networking solution for many. That has not changed. But what has changed is that InfiniBand is no longer the only serious option on the table. Alongside other HPC-specific networking technologies like Slingshot and Omni-Path, Ethernet is now gaining significant attention—especially in multi-tenant environments where flexibility and cost-efficiency are critical.

Once viewed as a low-cost, low-performance choice, Ethernet is now stepping up as a high-performance interconnect, narrowing the performance gap using technologies like RoCEv2, scheduled fabric, and UEC. At ISC25, it was clear that more cluster builders are looking at Ethernet as a viable alternative, especially for AI workloads that demand greater flexibility and lower costs.
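
As a rough illustration of how RoCEv2 helps narrow that gap, here is a minimal toy sketch of a DCQCN-style rate controller of the kind RoCEv2 deployments typically pair with ECN marking: the sender cuts its rate when it receives a congestion notification and gradually recovers once the fabric quiets down. The class name, parameters, and values below are illustrative assumptions, not any vendor’s implementation.

```python
# Toy, simplified DCQCN-style rate controller (illustrative only).
# RoCEv2 fabrics typically combine ECN marking with a sender-side
# reaction loop like this to keep lossless Ethernet from congesting.

class DcqcnLikeSender:
    def __init__(self, line_rate_gbps: float = 400.0):
        self.line_rate = line_rate_gbps   # physical port speed (assumed)
        self.rate = line_rate_gbps        # current sending rate
        self.target = line_rate_gbps      # rate to recover toward
        self.alpha = 1.0                  # congestion estimate (EWMA)
        self.g = 1.0 / 16                 # EWMA gain (assumed value)

    def on_cnp(self) -> None:
        """A congestion notification arrived (ECN was marked): cut the rate."""
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.target = self.rate
        self.rate *= 1 - self.alpha / 2

    def on_quiet_period(self) -> None:
        """No congestion feedback for a timer period: decay alpha, recover rate."""
        self.alpha *= 1 - self.g
        self.rate = min((self.rate + self.target) / 2, self.line_rate)


if __name__ == "__main__":
    s = DcqcnLikeSender()
    s.on_cnp()                    # congestion seen: rate drops well below 400 Gbps
    for _ in range(5):
        s.on_quiet_period()       # congestion clears: rate climbs back toward line rate
    print(f"recovered rate: {s.rate:.1f} Gbps")
```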

#2 Is the release of UEC 1.0 a game changer, transforming Ethernet for AI?

One of the key networking discussions at ISC25 was the release of UEC Specification 1.0. Designed to address Ethernet’s fragmentation and deliver more consistent performance, the standard aims to make Ethernet a more accessible choice for large-scale AI and HPC deployments.

DriveNets is proud to be a member of the UEC and is fully committed to supporting the standard within our Ethernet-based fabric solution. For organizations that may have previously dismissed Ethernet for high-performance backend networking, UEC 1.0 represents a turning point; it brings new confidence in Ethernet’s ability to support the scale, reliability, and performance demands of next-generation AI and HPC infrastructures. While UEC is still in its early stages, DriveNets’ scheduled fabric already delivers many of the key benefits the standard aims to achieve.

#3 Why is multi-tenancy still a significant challenge for AI and HPC clusters?

A recurring theme at ISC25 was the need for multi-tenant architectures. As AI and HPC clusters are increasingly shared across teams, departments, and (in the case of GPUaaS providers) enterprise customers, ensuring performance isolation and fair resource allocation has become a real challenge.

Traditional HPC fabrics like InfiniBand were not designed with multi-tenancy in mind, often requiring extensive tuning and architectural workarounds to deliver workload flexibility and isolation. These discussions reinforced our confidence in DriveNets’ approach: The deterministic nature of our scheduled fabric ensures tenant isolation and consistent performance right out of the box, without the need for complex and ongoing fine-tuning.

For HPC cluster architects at universities and research institutions, who must support diverse users and workloads, this translates to faster deployment, simplified operations, and seamless alignment with modern AI infrastructure models.
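
To make the “out of the box” point more concrete, below is a minimal toy model (not DriveNets code) of the idea behind a scheduled fabric: egress capacity is granted to tenants from a fixed schedule in proportion to configured weights, so a bursty tenant cannot crowd out the others. The tenant names, weights, and grant logic are assumptions made purely for illustration.

```python
# Toy model of weight-based egress scheduling for tenant isolation.
# Not DriveNets code; tenants and weights are illustrative assumptions.

def schedule_cells(weights: dict[str, int],
                   demand: dict[str, int],
                   total_slots: int) -> dict[str, int]:
    """Grant egress cell slots per tenant in proportion to configured weights,
    never granting more than a tenant actually requested."""
    total_weight = sum(weights.values())
    grants = {}
    for tenant, weight in weights.items():
        fair_share = total_slots * weight // total_weight
        grants[tenant] = min(fair_share, demand.get(tenant, 0))
    return grants


if __name__ == "__main__":
    weights = {"hpc-team": 1, "ai-training": 2, "gpuaas-customer": 1}          # hypothetical tenants
    demand = {"hpc-team": 500, "ai-training": 10_000, "gpuaas-customer": 300}  # requested slots
    print(schedule_cells(weights, demand, total_slots=4_000))
    # ai-training is capped at its 2,000-slot share even though it asked for 10,000
```

Because grants come from the schedule rather than from best-effort contention, nothing one tenant does changes another tenant’s share; that is the intuition behind “deterministic” isolation without ongoing fine-tuning.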

#4 xPU-to-xPU connectivity gains traction

xPU-to-xPU communication received more attention than ever. As AI models grow in size and complexity, the ability of xPUs to communicate efficiently, both within a server and across the fabric, has become a real performance challenge.

Broadcom’s new Tomahawk 6 platform introduced a key feature: Scale-Up Ethernet (SUE), designed to enable high-performance xPU-to-xPU connectivity, similar to what Nvidia’s NVSwitch delivers. Whether it’s Broadcom or Nvidia powering the cluster’s network, the trend is clear: enhancing xPU-to-xPU communication, particularly in rail-optimized architectures, is becoming a key focus for next-generation AI and HPC clusters.
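
For readers less familiar with the term, here is a hypothetical sketch of what “rail-optimized” means in practice: xPU i on every server attaches to rail (leaf) switch i, so same-index xPUs on different servers are a single switch hop apart, and only traffic between different local indices has to cross the spine. The server layout, xPU count, and hop costs below are assumptions for illustration.

```python
# Hypothetical rail-optimized topology: xPU i on every server connects to rail i.

XPUS_PER_SERVER = 8   # assumed 8-xPU servers

def rail_of(server: int, local_xpu: int) -> int:
    """The rail (leaf switch) an xPU connects to is simply its local index."""
    assert 0 <= local_xpu < XPUS_PER_SERVER
    return local_xpu

def fabric_hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Rough switch-hop count between two xPUs identified as (server, local_xpu)."""
    src_server, src_xpu = src
    dst_server, dst_xpu = dst
    if src_server == dst_server:
        return 0   # intra-server: handled by the scale-up domain (NVSwitch/SUE-style)
    if rail_of(*src) == rail_of(*dst):
        return 1   # same rail across servers: one leaf-switch hop
    return 3       # different rails: leaf -> spine -> leaf

if __name__ == "__main__":
    print(fabric_hops((0, 3), (5, 3)))   # same rail -> 1 hop
    print(fabric_hops((0, 3), (5, 4)))   # different rails -> 3 hops
```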

Final thoughts on AI and HPC networking from ISC 2025

The future of AI and HPC networking is being shaped not only by high performance, but also by increased openness, flexibility, and cost-efficiency. Whether it’s the increasing traction of Ethernet, the introduction of standards like UEC 1.0, or the growing need for multi-tenancy and improved GPU communication, it’s clear that networking solutions are evolving to meet new demands. And it’s also clear that DriveNets is well positioned to support this shift.

Key Takeaways

  • Once seen as low-performance, Ethernet is gaining traction thanks to innovations like RoCEv2, scheduled fabric, and the new Ultra Ethernet Consortium (UEC) 1.0 standard—making it a strong, flexible, and cost-efficient contender in AI and HPC networks.
  • Traditional HPC fabrics like InfiniBand struggle with multi-tenant needs. DriveNets’ scheduled fabric addresses this by offering out-of-the-box performance isolation and resource fairness—crucial for shared AI infrastructure.
  • With AI models increasing in size and complexity, the need for high-speed xPU communication—both within servers and across the fabric—is becoming a top priority. Innovations like Broadcom’s Scale-Up Ethernet (SUE) and Nvidia’s NVSwitch are setting the stage for more efficient, rail-optimized architectures that can support the scale and performance demands of future AI and HPC clusters.

Frequently Asked Questions

  • Why is Ethernet now considered a strong option for AI and HPC workloads?
    Ethernet has evolved significantly, with technologies like RoCEv2, scheduled fabric, and the introduction of the Ultra Ethernet Consortium (UEC) Specification 1.0. These innovations help reduce performance gaps and offer greater scalability, flexibility, and cost-efficiency—making Ethernet a viable alternative to InfiniBand in modern AI and HPC environments.
  • What challenges does multi-tenancy pose for AI and HPC clusters?
    Multi-tenancy introduces complexity in ensuring fair resource allocation and consistent performance across diverse users and workloads. Traditional HPC fabrics require deep tuning to manage this, while newer solutions like DriveNets’ scheduled fabric provide built-in isolation and determinism, simplifying deployment and operations.
  • Why is xPU-to-xPU connectivity becoming a critical focus in AI and HPC clusters?
    As AI models scale in complexity, fast and efficient communication between xPUs—both within servers and across the network fabric—has become a major performance priority. At ISC 2025, technologies like Broadcom’s Scale-Up Ethernet (SUE) and Nvidia’s NVSwitch highlighted how next-gen infrastructure is increasingly designed to optimize xPU-to-xPU data flow, especially in rail-optimized architectures. These advancements are key to unlocking higher throughput and better scalability in modern AI and HPC workloads.

Related content for AI networking infrastructure

DriveNets AI Networking Solution

Latest Resources on AI Networking: Videos, White Papers, etc.

Recent AI Networking blog posts from DriveNets AI networking infrastructure experts

NeoCloud Case Study

Highest-Performance Unified Fabric for Compute, Storage and Edge
