AI & Networking – a Match Made in Heaven

Well, to understand how AI and networking fit together, we need to recognize that “AI networking” is actually a well-defined name for two very distinct areas in the technology world.

AI networking – when AI makes networking better

AI technology can make networking services, planning, customer experience, and (most of all) operations much better and more cost-efficient.

While the concepts of a self-healing, self-operating network and a fully automated customer lifecycle are very appealing to network operators, they have been quite elusive for many years. Yes, some customer service bots and operational scripts have been available for years, but those have not made the desired jump towards the end goal – a “driverless” network without human intervention.

Then came AI. And generative AI (or gen AI) brought some great news to two main areas – the customer lifecycle and network operations.

Gen AI increases the ability to automate many customer interaction processes. Instead of depending on preconfigured scripts, the customer interaction process can now train itself and increase the number of cases it can close without the need for human intervention. This means reduced costs (from a smaller contact center) and better customer satisfaction.

On the other side of the corridor, the network operations center can also benefit from gen AI. The ability to automate network planning, forecasting and, above all, network maintenance and troubleshooting has again significantly increased. The ability, for instance, to train the system with past events in the network shortens both root cause analysis (RCA) and mean time to repair (MTTR), while improving SLA performance. To some extent, it even allows predictive and proactive fault handling, which can improve mean time between failures (MTBF), increase customer satisfaction, reduce churn, grow average revenue per user (ARPU), and more.

AI networking – when networking makes AI better

Ask not (only) what AI can do for networking, but (also) what networking can do for AI.

AI, machine learning (ML), and specifically gen AI systems are large and complex. When looking at the AI training phase, inter-GPU (graphics processing unit) networking (often referred to as back-end fabric) is critical for training performance.

This performance is measured in job completion time (JCT), which indicates how well you utilize those very (very) expensive compute resources, predominantly general-purpose GPUs (GP-GPUs). A good networking back-end fabric (e.g., a scheduled fabric, like the one based on DDC – disaggregated distributed chassis) can (and does) improve JCT by more than 10% (and, in some cases, more than 30%). That means >10% better resource utilization across the entire GPU cluster. For an 8K GPU cluster (which is a medium-sized one when it comes to AI training), this leads to savings of up to $50M!
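To get a feel for how a utilization gain turns into dollars, here is a rough back-of-the-envelope sketch. It is not the calculation behind the figure above. It assumes that "8K" means 8,192 GPUs, that savings scale linearly with the utilization gain, and that each GPU carries a fully loaded cost of $20,000. That cost is a purely hypothetical placeholder, as is the function name estimated_savings_usd.

```python
def estimated_savings_usd(num_gpus: int, cost_per_gpu_usd: int, gain_percent: int) -> int:
    """Estimate the value of GPU capacity recovered by a better back-end fabric.

    Simplifying assumption: a utilization gain of X% is worth X% of the
    total cluster cost (linear scaling, no other overheads).
    """
    if not 0 <= gain_percent <= 100:
        raise ValueError("gain_percent must be between 0 and 100")
    cluster_cost_usd = num_gpus * cost_per_gpu_usd
    # Integer arithmetic keeps the dollar figures exact.
    return cluster_cost_usd * gain_percent // 100


if __name__ == "__main__":
    NUM_GPUS = 8192            # assumption: "8K" read as 8,192 GPUs
    COST_PER_GPU_USD = 20_000  # hypothetical placeholder, not a quoted price

    for gain in (10, 30):      # the >10% and >30% cases mentioned in the text
        savings = estimated_savings_usd(NUM_GPUS, COST_PER_GPU_USD, gain)
        print(f"{gain}% gain -> about ${savings / 1e6:.1f}M")
```

With these placeholder inputs, the sketch lands in the same order of magnitude as the figure quoted above. Plug in your own cluster size and cost per GPU to see how sensitive the result is to each assumption.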

That’s what networking can do for AI!

AI and networking – making each other better

When you think about it, this synergy is kind of cyclic. AI technology is perhaps the one most dependent on connectivity and networking – not only for training but also for inference. The better the connectivity, the higher the performance, accessibility and responsiveness of AI systems.

And such high-performance AI systems, backed by DDC-based AI networking, are exactly what’s needed for improved networking services and infrastructures – delivering a better customer experience and, most importantly, efficient operations.

Now that’s wonderful symbiosis, I’d say…
