Superior Economics with Scheduled Fabrics
In this conversation with NextGenInfra.io, Dudy Cohen, VP of Product Marketing at DriveNets, explains how Ethernet is evolving through advancing speeds (800G to 3.2T), co-packaged optics (CPO) integration, and scheduling layers that address its lossiness, supporting AI infrastructure with GPU clusters scaling from 8,000 to nearly one million units. He describes how DriveNets has adapted its scheduled-fabric technology to deliver superior time-to-first-token performance and lower cost per million tokens compared with proprietary alternatives.
Full transcript
Ethernet evolution: speed, optics, and scheduling
Ethernet is evolving along multiple paths. One path is feeds and speeds: Ethernet has always been the fastest-growing technology in terms of line rates. It is now moving to 800G, the latest generation. Next year we will see 1.6 Terabit, and later on 3.2 Terabit, et cetera, outpacing any other technology in that respect.
There is also the integration of optics into switches and into end devices, with co-packaged optics (CPO)—we see that happening this year. And of course, as I mentioned, the introduction of scheduling layers, or additional layers on top of plain Ethernet, which resolve its inherent problem of being a lossy technology.
Scheduling—either at the endpoints or in the fabric—is helping Ethernet tackle new use cases that it did not fit until now. It can be scaled up, and it can be scaled out for back-end networks connecting GPUs across clusters, connecting different data centers, and so on.
Is Ethernet really suited for very large GPU clusters in AI data centers?
When it comes to cluster size, the sizes are growing. For different types of customers, we see different ballparks. For enterprise customers or neo-cloud customers we see several thousand; the sweet spot seems to be around 8,000 GPUs per cluster. But for the largest neo-clouds, hyperscalers, and the giants developing LLMs, we see exponential growth—tens of thousands, hundreds of thousands—and I think we will soon cross one million GPUs in a single cluster.
There, the need for a very scalable and simple-to-scale network technology is very evident. Ethernet, being the ruling technology in the data-center domain and the technology that scales across the entire infrastructure of the Internet, is a very easy choice.
When you go to tens or hundreds of thousands of GPUs, you need to go to endpoint scheduling, because there is basically no limit to the number of endpoints you can connect. For smaller scales—the sweet spot of 8,000—fabric scheduling will provide much better results in terms of performance. But those two technologies coexist, and each customer selects the variant of scheduled Ethernet that suits their needs.
Why is ESUN needed?
ESUN (Ethernet for Scale-Up Networking) is needed in order to scale Ethernet up to the level required by scale-up networks, because scale-up networks have very specific requirements in terms of capacity, latency, and robustness—or the absence of packet loss. ESUN tackles those points, allowing Ethernet—which was not designed to be a scale-up protocol—to be an alternative to technologies like NVLink, CXL, or UAL, which were designed from the ground up for scale-up.
I think ESUN is the first time that Ethernet is considered a valid solution and a good alternative to those very closed and very limited solutions that were designed specifically for scale-up networks.
What is the difference between fabric scheduling and endpoint scheduling?
With fabric scheduling, the scheduling is done with a cell-based fabric. Any packet that enters the system is broken into cells, sprayed across the entire fabric, and reassembled on the other side. This is controlled by the switches themselves and ensures that even at very high fabric utilization, you will not suffer from any packet loss, jitter, or delay. This results in very good job completion time and very good bus bandwidth within the server’s collective communication library.
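To make the mechanism concrete, here is a minimal, purely illustrative Python sketch of cell spraying and reassembly. The fixed cell size, the sequence-number tagging, and the round-robin spraying policy are assumptions made for the example only; they are not a description of DriveNets' actual data plane, which schedules cells with its own fabric logic.

```python
# Illustrative sketch of a cell-based scheduled fabric (assumed cell size and
# round-robin spraying; real fabrics add credit/grant scheduling between switches).

CELL_SIZE = 256  # bytes of payload per cell (assumed for the example)

def spray(packet: bytes, num_fabric_links: int):
    """Segment a packet into cells and spread them across all fabric links."""
    cells = [
        (seq, packet[i:i + CELL_SIZE])                      # (sequence number, payload)
        for seq, i in enumerate(range(0, len(packet), CELL_SIZE))
    ]
    lanes = [[] for _ in range(num_fabric_links)]
    for seq, payload in cells:
        lanes[seq % num_fabric_links].append((seq, payload))  # round-robin spray
    return lanes

def reassemble(lanes) -> bytes:
    """Egress side: reorder cells by sequence number and rebuild the packet."""
    cells = sorted((cell for lane in lanes for cell in lane), key=lambda c: c[0])
    return b"".join(payload for _, payload in cells)

packet = bytes(range(256)) * 5             # a 1280-byte example packet
lanes = spray(packet, num_fabric_links=4)  # cells spread evenly over 4 links
assert reassemble(lanes) == packet         # arrives loss-free and in order
```

Because every packet is striped evenly over all fabric links, no single link sees a disproportionate share of a large flow, which is what keeps utilization high without hotspots.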
With endpoint scheduling—as in the Ultra Ethernet standard—the management of how traffic runs through the fabric is done by the endpoint, specifically by NICs or SmartNICs in the server. They receive information from telemetry monitoring the network, find congestion hotspots, and engineer traffic to bypass those hotspots and better balance the fabric. This gets you very high fabric utilization, and from the workload side, it shortens job completion time because GPUs spend less time waiting for network resources and more time processing data.
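As a rough illustration of the endpoint-scheduling idea (not the Ultra Ethernet specification itself), the sketch below has the sending NIC pick, for each flow, the candidate path whose busiest link reports the lowest utilization in recent telemetry. The telemetry format, link names, and path model are invented for the example.

```python
# Illustrative endpoint (NIC-side) path selection from network telemetry.
# The telemetry schema and topology below are assumptions for this sketch only.

from typing import Dict, List

# Telemetry: per-link utilization (0.0 .. 1.0) as reported by the fabric.
telemetry: Dict[str, float] = {
    "leaf1-spine1": 0.92,  # congestion hotspot
    "leaf1-spine2": 0.35,
    "leaf2-spine1": 0.40,
    "leaf2-spine2": 0.30,
}

# Candidate equal-cost paths from a source NIC to a destination NIC,
# expressed as the links they traverse.
paths: List[List[str]] = [
    ["leaf1-spine1", "leaf2-spine1"],
    ["leaf1-spine2", "leaf2-spine2"],
]

def pick_path(paths: List[List[str]], telemetry: Dict[str, float]) -> List[str]:
    """Choose the path whose most-loaded link is least utilized (avoid hotspots)."""
    return min(paths, key=lambda p: max(telemetry[link] for link in p))

best = pick_path(paths, telemetry)
print("selected path:", best)  # -> the path that bypasses the 92%-utilized link
```

The design point is that the intelligence sits at the edge: the switches stay simple, and the NICs steer traffic around congestion as telemetry changes.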
DriveNets strategy: how does the scheduled fabric design fit across customer segments?
This drive started about ten years ago in the service-provider domain with customers like AT&T. We implemented the scheduled fabric to build scalable routers—using basic white-box building blocks connected to a fabric white box, rather than a chassis format—enabling very large routers deployed in service-provider core networks.
When the AI industry started growing exponentially, we found that this scheduled-fabric solution is very well suited to back-end networks connecting GPUs. This is why we were very fast to implement this solution, and we have working deployments with hyperscaler, neo-cloud, and enterprise customers.
DriveNets’ strategy is essentially to modernize or reinvent networking. We use the same technology—with tweaks to functionality and scale—for AI infrastructure. It is very successful in both markets.
How important is the fabric architecture to the success or failure of data centers?
At the end of the day, you need to connect the infrastructure to the business efficiently. This market is full of neo-clouds, hyperscalers, and enterprises rushing into AI infrastructure investment. This infrastructure needs to be very efficient in terms of time to first token—how quickly you can start producing with the infrastructure you build.
Time to first token is influenced by supply chain, hardware availability, simplicity of setup, and the amount of time and effort required for fine-tuning the architecture.
The other key metric is cost per million tokens. When you run such large-scale infrastructure projects, marginal cost at the end of the day defines your business. Going to an Ethernet-based, open, high-performance solution makes the time to first productive output—and the marginal cost of each token—much better economically than with any proprietary alternative.
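As a back-of-the-envelope illustration of how the cost-per-million-tokens metric is computed (all numbers below are hypothetical, not DriveNets or customer figures), marginal cost can be estimated from cluster operating cost and sustained token throughput:

```python
# Hypothetical example of the cost-per-million-tokens calculation.
cluster_cost_per_hour = 2_500.0   # USD/hour to run the cluster (assumed figure)
tokens_per_second = 1_200_000     # sustained cluster-wide throughput (assumed figure)

tokens_per_hour = tokens_per_second * 3600
cost_per_million_tokens = cluster_cost_per_hour / (tokens_per_hour / 1_000_000)
print(f"${cost_per_million_tokens:.4f} per million tokens")
# Higher fabric utilization means more tokens per second for the same hourly
# cost, which is what drives cost per million tokens down.
```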