While searching for a robust and balanced solution, service providers encounter many networking solutions that claim to be carrier-grade. In theory, carrier-grade is what service providers should demand to reduce risk and gain confidence, but the label is often applied with little to back it up.
Service providers should evaluate the following key elements before choosing a carrier-grade networking vendor.
#1 High Availability and Multi-Level Redundancy
Service providers invest extensive capital in premium infrastructure to support new demands from enterprises and other businesses. Yet the ability to monetize this infrastructure within the business sector relies upon its availability and redundancy. This is especially true when service providers offer a “five nines” (99.999%) SLA, which tolerates only about 5.26 minutes of downtime per year.
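The downtime budget behind an availability target is simple arithmetic. As a minimal sketch (the function name and the 365-day year are assumptions made for illustration, not part of any SLA definition), the yearly allowance can be computed like this:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year (525,600 minutes)


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare a few common availability targets.
for target in (99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.2f} minutes per year")
```

Each additional "nine" cuts the allowance by a factor of ten, which is why a five-nines commitment leaves so little room for unplanned outages or long maintenance windows.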
However, redundancy and availability should not be measured when everything is running smoothly; rather, they should be measured based on how the solution tolerates failures. A serious carrier-grade vendor will be able to assure network performance even when failures occur.
Key parameters for high availability and redundancy include:
Failure rate (the inverse of mean time between failures, or MTBF) – How often does a networking function fail?
The more complex systems (and their components) are, the more susceptible they are to failure. Service providers, therefore, should embrace simplicity. Using simple and standard white boxes as a unified building block across all network locations and functions reduces failure rates and, as a result, improves network availability.
Failure radius (AKA blast radius) – What is the collateral damage inflicted by the failure?
“The bigger they are, the harder they fall.” This is exactly what happens in most situations when using a large-scale monolithic router – any failure can damage the whole router and the blast radius can be catastrophic.
The best way to reduce the failure radius is by completely separating the router’s hardware and software components. When each component acts as a stand-alone element, a failure in one component does not affect the others and the blast radius is minimized to practically zero.
Figure 1: Smaller failure radius with disaggregated and distributed networking architecture
Although both the failure rate and failure radius are important on their own, the two also have a “trade-off” relationship: service providers can tolerate a higher failure rate if they know that the associated failure radius is sufficiently small.
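One way to reason about this trade-off is to multiply the two parameters: the expected impact per year is roughly the number of failures per year times the share of services each failure takes down. The sketch below uses made-up figures and hypothetical names purely for illustration; they are not measurements of any real router or white box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    failures_per_year: float  # failure rate (hypothetical value)
    blast_radius: float       # fraction of services hit per failure, 0..1

    def expected_impact(self) -> float:
        """Expected service impact per year, modeled as rate x radius."""
        return self.failures_per_year * self.blast_radius


# Illustrative numbers only: the distributed design fails more often,
# but each failure touches a much smaller share of services.
monolithic = Design("monolithic chassis", failures_per_year=1.0, blast_radius=1.0)
distributed = Design("distributed cluster", failures_per_year=4.0, blast_radius=0.05)

for design in (monolithic, distributed):
    print(f"{design.name}: expected impact {design.expected_impact():.2f} per year")
```

Under these assumed figures, the design with the higher failure rate still has the lower expected impact, because its failure radius is so much smaller.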
#2 Maintenance Without Service Impact
Traditional networks are not only complicated but also fragile. They are built with a cumbersome hierarchical structure that leaves the operations team nervous about any required maintenance activities or changes – even those within a scheduled maintenance window.
Network operations teams work under the influence of “network outage anxiety.” They know that even brief downtime could affect millions of users, could in some cases be life-threatening, and could result in heavy fines. Thus, when choosing a carrier-grade vendor, it is crucial to consider the following aspects.
Maintenance Windows, ISSU and On-the-Spot Replacement
To minimize service interruptions and service impact, service providers should adopt disaggregated and distributed networking solutions that eliminate the interdependencies of traditional monolithic designs and allow for independent maintenance of specific elements (software and/or hardware).
On the hardware level, a solution should include built-in redundancies for any hardware element. If any component fails, the network cluster can continue to operate seamlessly. As such, on-the-spot replacement of any component can be done immediately without scheduling a maintenance window.
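The effect of built-in redundancy can be estimated with a standard reliability formula. As a rough sketch, assuming that redundant units fail independently and that any single surviving unit can carry the load (both simplifying assumptions, and the function name is hypothetical), the combined availability is 1 - (1 - A)^N:

```python
def redundant_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` independent replicas when any one of them suffices."""
    if not 0 <= unit_availability <= 1:
        raise ValueError("unit_availability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    # The group is down only if every unit is down at the same time.
    return 1 - (1 - unit_availability) ** units


# Illustrative value: a single unit that is available 99% of the time.
for count in (1, 2, 3):
    print(f"{count} unit(s): {redundant_availability(0.99, count):.6f}")
```

Real deployments rarely meet the independence assumption exactly (shared power, software bugs or operator error can take down several units at once), so the result is best read as an upper bound.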
On the software level, service providers should adopt a cloud-native architecture. Running every function in a dedicated container isolates it, so that a failure in one service does not affect the operation of other services on the shared infrastructure. Furthermore, containerization can make in-service software upgrade (ISSU) hassle-free.
While many networking vendors promise short maintenance windows, simple hardware replacement and ISSU, the inherent capabilities of disaggregated and distributed networking make this architecture the best choice for maintenance that is free of negative service impact.
Figure 2: Hardware disaggregation leads to redundancy, where failure in one component does not affect the other components
#3 Running on a Live Tier-1 Carrier Network
Many vendors can promise big things and “talk the talk,” but can they also “walk the walk”? Service providers do not relax their demanding SLAs for new and unvalidated technologies, since choosing the wrong technology can lead to substantial customer churn, penalties and more.
That is why validating a promising technical blueprint is extremely important. A technology or solution cannot be crowned as carrier-grade without any live network experience.
Successful implementation of a solution within a live tier-1 carrier network is the ultimate technology approval. If the solution has too many failures, and every failure has a large blast radius whose recovery takes days, it is clear that the solution does not meet the carrier-grade criteria. No carrier in its right mind would deploy such a solution into its live network.
If a carrier does implement the technological solution in its live network, you can trust that the carrier-grade promise was evaluated rigorously.
Service providers should seek extensive live network experience from their networking vendors, preferably with a tier-1 carrier network.
Simplicity Leads to Robustness and Confidence
Now let us circle back to the beginning. When choosing a real carrier-grade networking solution while considering growing bandwidth and performance requirements, service providers must embrace simplicity. Distributed and disaggregated networking solutions, if done correctly, can offer simplicity that leads to robustness and confidence. Having said that, running successfully on a tier-1 carrier’s network is what allows vendors to truly earn the “carrier-grade” title.
DriveNets Network Cloud orchestrates a distributed and disaggregated network function that turns simple “off-the-shelf” white boxes into a shared resource supporting multiple network services in the most efficient way possible.
DriveNets Network Cloud is a risk-free, battle-proven solution that credibly “owns” the carrier-grade title. Already running successfully on global tier-1 carriers’ networks, our solution architecture delivers high availability, multi-level redundancy and seamless maintenance.
In summary, key benefits of the carrier-grade DriveNets Network Cloud include:
- Field-proven performance: implemented and running successfully on tier-1 live global networks
- Reduced failure rates: using only simple off-the-shelf white boxes
- Reduced failure radius: shared compute and networking resources enabling independent upgrades of system elements
- On-the-spot replacements: separated control and data planes allowing replacement of white boxes without negative service impact
- Perfect balance of cost, performance, and scalability: incremental, automated and practically unlimited scaling