Let’s dim the lights and look at the scary situations facing today’s networks.
Scary story #1: Traditional hardware-based networking solutions create excessively complex networks
When it comes to scalability, operations, flexibility, and cost, it is no secret that traditional hardware-based networking solutions haven’t solved the challenges holding back service providers. Years of patches and upgrades have transformed service provider networks into mazes of service-specific equipment, each requiring unique management skills, inflexible software update cycles, and a vast inventory of spare parts. Many networks even need dedicated teams for different network domains.
This approach not only increases operational costs but also restricts the service provider’s ability to innovate. Introducing new services becomes a months-long process due to the complicated operational matrix that needs realignment for each change. This leads to a pileup of unused, idle hardware resources that cannot be easily repurposed for other services.
Scary story #2: A very messy network architecture for Cable MSOs
Experiencing exponential growth in traffic across their networks, cable multiple-system operators (MSOs) have been feeling the ground shaking beneath them for a while.
Some operators run multiple generations of DOCSIS, as well as some xPON services, across their network, yielding a mix of aggregation nodes, edge sites, termination systems, and core routers. In some cases there is a single instance of each function per access technology and generation. Bottom line – it's a very messy architecture for cable MSOs.
Like other service providers, they have been watching hyperscalers – with their ability to adapt rapidly – vacuum up their margins. All cable MSOs know the burden of operating a complex and inflexible large-scale network infrastructure with dozens of different routing boxes across different domains. Operating such a diverse network, with multiple hardware types and generations, requires a wide variety of management skills, software stack updates, and a large inventory of spare parts.
How are cable MSOs simplifying their network architecture through disaggregation and virtualization?
Scary story #3: Supporting heavy workloads on the Edge
As more latency-sensitive applications emerge, networking and compute functions need to move toward the network edge – closer to the end user. Today, metro edge points must accommodate a wide range of use cases, including business, mobile, and residential services. This requires supporting high-speed broadband, VPNs, mobile backhaul, cloud services, video streaming, and generative AI – as well as the introduction of new business and industrial AI tools, advanced 5G IoT and smart devices, and virtual reality technologies. These applications are diverse, demanding real-time performance and high bandwidth. Because the traditional core-out architecture cannot deliver the performance these applications require, the demand falls on the network edge.
How are service providers overcoming challenges at the network edge?
Scary story #4: Diverse ecosystem requires constant management
Current network topologies and operation models are fragmented and complex. With organizations relying on multiple vendors, tens of platform types, and hundreds of components, the diverse ecosystem – including aging hardware equipment and thousands of software versions, each with distinct data models, bugs, and security patches – requires constant management.
The support infrastructure is also fragmented, requiring dedicated teams for design, lab operations, IT, and development. In addition, the operating model is heavily manual, relying on command-line interfaces (CLIs), third-party tools, and various manual interfaces between systems. All these issues increase the risk of human error and hamper efficiency and scalability, creating challenges for network management and optimization.
Scary story #5: Congestion in AI workloads
AI workloads are large and getting larger. Model accuracy and the race to outpace the competition are driving demand for ever larger infrastructure. AI training workloads demand high-performance, lossless, and predictable connectivity between GPUs to reach the most efficient job completion time (JCT) targets.
The challenge lies in how long messages take to traverse the network between endpoints – not only the average latency but also the variation in delay between messages. This is because GPUs sit idle until they receive full acknowledgment that a compute cycle has concluded. Long tail latency can therefore erase any benefit of minimal head latency.
Traffic patterns and workload characteristics also vary: different AI models run differently and often use different communication algorithms, producing different packet sizes and flow sizes that the network must propagate. The AI network cannot be fine-tuned to one specific traffic pattern; it must satisfy them all. And whatever network challenges are noticeable at moderate scale are multiplied as scale increases.
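To see why tail latency dominates JCT, consider that a collective step (e.g. an all-reduce) completes only when the slowest GPU's message arrives. The sketch below simulates this with a synthetic latency distribution – the numbers and function names are illustrative assumptions, not measurements – and shows how a rare slow message comes to dominate step time as GPU count grows.

```python
import random

random.seed(7)

def step_time(n_gpus, median_us=10.0, tail_us=500.0, tail_prob=0.01):
    """One collective step finishes only when the slowest GPU reports in,
    so step time is the max over all per-GPU message latencies.
    (Synthetic two-point distribution: 1% of messages hit the tail.)"""
    latencies = [
        tail_us if random.random() < tail_prob else median_us
        for _ in range(n_gpus)
    ]
    return max(latencies)

def mean_step_time(n_gpus, steps=10_000):
    """Average step time over many simulated collective steps."""
    return sum(step_time(n_gpus) for _ in range(steps)) / steps

for n in (8, 256, 4096):
    print(f"{n:>5} GPUs: mean step time ~ {mean_step_time(n):7.1f} us")
```

Even though 99% of messages take 10 µs, at 4,096 GPUs nearly every step contains at least one 500 µs straggler, so the mean step time approaches the tail value – which is why JCT-sensitive AI fabrics are engineered for predictable, low-variance latency rather than just a low median.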
How are enterprises best optimizing their networks for AI workloads?
Scary story #6: Automating a traditional monolithic approach
Maintaining large-scale networks is a highly complex task with overwhelming operational effort and cost. Service providers (SPs) juggle too many service-specific routers, thousands of different spare parts, diverse software updates and growing performance requirements — all while under pressure to meet strict service-level agreements (SLAs). Trying to implement automation on top of complex, patchwork traditional networks with dozens of different hardware boxes and software solutions is doomed to failure.