DriveNets has been very active in helping tier-1 operators around the world achieve their goals. There have been some concerns in the market about the complexity of a new cloud-based model in terms of planning, sourcing, integration, installation, growth, maintenance, and support. So it’s time to talk about our actual experience, the ins and outs of our deployments over the past two or three years, to show how disaggregated, cloud-native networks are the right path forward.
Let’s dig in!
Addressing Malfunctions on the Spot
BGP, the Border Gateway Protocol, is the exterior gateway protocol that exchanges routing information between autonomous systems (AS) through peering. Early on, during a greenfield deployment in a Tier-1 service provider customer’s production environment, we noticed several recurring BGP failures. Thanks to the built-in redundancy of DriveNets’ disaggregated solution, there were no traffic drops at all, but identifying the cause of these failures was a priority for the DriveNets team.
After analyzing the system, the team determined that an unsupported BGP community was behind these failures. Thanks to swift development and the disaggregated architecture of DriveNets’ network operating system, the issue was solved within hours, without any impact on the end-user quality of experience.
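To make the failure mode concrete, here is a minimal sketch of how a receiver might separate the communities it recognizes from the ones it does not, so that an unknown value can be logged or passed through rather than treated as an error. It is not DriveNets’ implementation. The function names and the supported set are hypothetical, and it only handles standard communities written as ASN:value, each half being 16 bits as defined in RFC 1997.

```python
def parse_community(text):
    """Parse a standard BGP community 'ASN:value' into two 16-bit integers."""
    asn, value = (int(part) for part in text.split(":"))
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError(f"community out of range: {text}")
    return asn, value


def split_communities(received, supported):
    """Partition received communities into (known, unknown) lists."""
    known, unknown = [], []
    for text in received:
        community = parse_community(text)
        (known if community in supported else unknown).append(community)
    return known, unknown


# Hypothetical policy: only these two communities have a defined action.
supported = {(65000, 100), (65000, 200)}
known, unknown = split_communities(["65000:100", "65000:999"], supported)
print("known:", known)
print("unknown:", unknown)
```

Keeping the unknown list separate lets the operator decide what to do with it (ignore, log, or pass through unchanged) instead of letting an unexpected value disrupt the session.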
Bottom line: despite claims from competitors that our solution could be more complex, that’s not the case, even when we need to solve problems. When a protocol failure is caused by a misconfiguration (as explained above), DriveNets’ routing expertise is a major factor in finding the problem quickly and solving it immediately. With DriveNets, operators can avoid maintenance window outages and involve fewer people in the process, reducing opex.
Line card failure? Is immediate replacement considered?
Not really. At least not in traditional networks.
Sometimes, with legacy monolithic chassis routers, the failure of a line card has the potential of jeopardizing the continuity of the whole chassis during its replacement, as it can interfere with the delicate alignment among chassis components. Even though line card replacement is claimed to be a non-service-impacting procedure, service providers have been avoiding the risk, and performing this operation during maintenance windows only. It is important to take into consideration that power distribution, airflow, and pin connectivity are interrupted when a line card is extracted and a new one is inserted.
The Distributed Disaggregated Chassis (DDC) white box architecture adopted by DriveNets employs separate systems for the line cards that forward packets (Network Cloud Packet Forwarder, or NCP) and for the fabric (Network Cloud Fabric, or NCF). As such, the failure of any one of these components does not affect the rest of the cluster in any way, and the failed unit can be replaced immediately, without scheduling and waiting for a maintenance window.
We have gone through this procedure with our customers, and the results were clearly reflected in significantly reduced opex over time. In addition, it reduces the need to maintain a hot spare inventory on site. While maintenance windows are a regular activity in the field, they demand resources and a support team on standby in case something goes wrong. Maintenance windows are expensive and involve several risks. With DriveNets’ DDC architecture, there is no ‘domino effect’ of failures, and a replacement can be done easily and on the fly, without a maintenance window. Since all components are separated, any issues are isolated to their specific units, and as such, the blast radius is controlled.
When There Are Problems with Zero-Touch Provisioning
As part of our Network Cloud solution, the DriveNets Network Operating System (DNOS) is cloud-native, distributed networking software built on containerized microservices, creating a unified, shared infrastructure over a distributed architecture. DNOS supports multiple service offerings at scale, including routing, from core to access, and hosting for third-party services. DNOS is highly integrated, and all provisioning is done automatically through the DriveNets Orchestrator (DNOR). DNOR, a zero-touch provisioning network orchestrator, automates the deployment, scaling, and management of the DriveNets Network Cloud solution.
In one of our deployments, our zero-touch provisioning process uncovered an incorrect configuration caused by an extra network interface controller (NIC) card that was not supposed to be there per the hardware bill of materials (BoM). In fact, DriveNets discovered that this issue was present across multiple sites, in hardware from different vendors.
DriveNets’ Support and Operations team was able to quickly and easily resolve this by coordinating with our system integrator partners, who were able to remove the extra NIC cards and ensure the right hardware configuration was delivered to each site. By supporting zero-touch provisioning, our software is able to identify such issues in the pre-deployment stage, and as such, the operating system is not blindly loaded onto unsupported or unexpected hardware.
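A pre-deployment check of this kind can be pictured as a comparison between the expected BoM and the inventory a box reports about itself. The sketch below is only an illustration under stated assumptions: the function name, the component labels, and the counts are hypothetical and do not reflect how DNOR actually performs the check.

```python
from collections import Counter


def bom_mismatches(expected, discovered):
    """Return {component: (expected_count, discovered_count)} for each difference."""
    exp, got = Counter(expected), Counter(discovered)
    return {
        part: (exp[part], got[part])
        for part in sorted(set(exp) | set(got))
        if exp[part] != got[part]
    }


# Hypothetical BoM and discovered inventory for one white box.
expected_bom = {"cpu": 1, "dimm": 4, "nic": 1, "ssd": 1}
discovered = {"cpu": 1, "dimm": 4, "nic": 2, "ssd": 1}

mismatches = bom_mismatches(expected_bom, discovered)
if mismatches:
    # Stop before loading the operating system onto unexpected hardware.
    print("Provisioning halted:", mismatches)
```

Here the extra NIC shows up as a single mismatch entry. A component that is missing altogether is reported the same way, with a discovered count of zero.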
Coming Up Next
There is no straight line when it comes to network development, and our goal is to be straightforward and transparent, highlighting the ups & downs, and how we addressed all the challenges and glitches in our customers’ networks.
DriveNets has been quite active in helping operators reach their goal of a simpler, faster, and more cost-effective cloud-native network infrastructure. In fact, KGPCo, the leading system integrator of network solutions, both legacy and disaggregated, confirmed this in a study based on their multiple deployments of the DriveNets Network Cloud solution.
Reaching the Brink of Capacity
Over the past few years, DriveNets has been able to prove – over and over again – that high-capacity networks can be built with fewer boxes, further improving the superior total cost-of-ownership model of deploying DriveNets Network Cloud. A high-capacity white box is ideal for multiservice, aggregation, core networks, and edge cloud infrastructures, an increasingly important playing field for cloud and service providers.
That is why we were shocked to receive a phone call from one of our customers within a year of production deployment at several sites. The customer told us that a few of the DriveNets-operated line cards (NCPs) from UfiSpace had started to reach their RAM (random-access memory) threshold. Based on our initial projections, we did not expect this to happen. After the call, our support team wasted no time and raced to solve the issue as quickly as possible.
It turned out that the units hitting the threshold had been shipped with fewer DIMMs (dual in-line memory modules) than expected. It was an easy fix: DriveNets’ procurement and operations teams coordinated the analysis and the return merchandise authorization (RMA) of these specific units with UfiSpace, solving the problem at once.
Speeding Up Deployments
A major challenge service providers have faced over the past 20 years is how quickly networks can be deployed, particularly with regard to installation and maintenance. One of the main benefits of DriveNets is that it has production clusters installed in datacenters around the world. Since DriveNets Network Cloud relies on generic, commercial off-the-shelf (COTS) equipment and white boxes, service providers can rely on the datacenter’s local engineers for any routine hardware maintenance, without any training specific to that hardware. This leads to lower TCO and faster recovery in case of malfunctions.
On one occasion in the past year, one of our customers in the US reached out to us with an urgent request to increase capacity by adding white boxes to the network we had set up. DriveNets was able to ship the new white boxes overnight, and the customer’s employees physically installed and remotely configured them in less than 48 hours, doubling the datacenter’s capacity.
The Challenges of New Technologies
While new technologies can bring significant benefits to the industry, sometimes they may present some hurdles or stumbling blocks.
The question is how easily (or quickly) we can address those.
DriveNets NOS (Network Operating System) is a clear example of how a disaggregated router model can simplify the upgrade and evolution towards a better network. DriveNets’ DNOS brings cutting-edge protocols to advance our customers’ networks. Our customers implement DNOS into their networks according to their specific needs, which may lead to incompatibility with previous implementations of similar protocols.
Because DriveNets NOS is modular, we can react faster and make the necessary adaptations in our system to align with legacy equipment that deviates in its own way from a loosely defined protocol.
In fact, at one point we ran into a multi-vendor incompatibility issue related to traffic engineering. Thanks to the DNOS architecture, the time needed to fix the issue was very short, which made a huge impact on the whole system. Every necessary change was delivered and applied to the router within hours, on the same day. It became clear to our customer that DriveNets’ solution enables a much faster reaction than what they expected from their incumbent vendors and their traditional, monolithic networks.
We Are Not Done Yet…
It would be easy to say that things are smooth all the time. But that is not true: there are always problems and challenges, whether we are talking about traditional networks or DriveNets’ disaggregated, software-based solution. But there is a difference. DriveNets brings two main benefits to service providers:
- One of our goals is to be as straightforward and transparent as possible regarding any issues that may arise and how to handle and address them;
- And when there are issues, the fact that our solution is cloud-based and supports zero-touch provisioning means that we can address them faster and more effectively than is possible with the proprietary, traditional hardware that has been in use for so many years.
A while ago, DriveNets decided to share some of our hands-on experience with our customers, the leading Tier-1 operators around the world. While marketing focuses mostly on properly presenting a company’s product or solution as the answer to customers’ needs, the truth is that there are always ups and downs everywhere. And to be honest, showcasing some of the “downs” and how we can take them “up” right away is the right way to clear the path ahead towards even more network evolution.
As a quick recap, in the first two blogs, we shared the following actual examples of DriveNets’ cloud-native deployments:
- Addressing malfunctions on the spot;
- Risk-free operations and optimized maintenance;
- Zero-touch provisioning issues;
- Reaching capacity;
- Speeding up deployments; and
- The challenges of new technologies
Are we done yet? Nope. Again – hands-on experience never ends. Let’s keep going.
The Benefits of Zero-Touch Provisioning… Again
In my previous blog posts, I mentioned how, by supporting zero-touch provisioning, our software is able to identify problems in the pre-deployment stage, and as such, the operating system is not blindly loaded onto unsupported or unexpected hardware.
In one instance, a Tier-1 operator that has deployed several DriveNets clusters reported a failure in one of the white boxes used in a cluster. The operator engaged in troubleshooting and debugging together with the white box hardware manufacturer, and they found that the endurance of the solid-state drive (SSD) was not sufficient for the network’s needs, so a hardware swap might be necessary.
DriveNets understood that, based on actual usage, the SSD was drastically affected. Based on our belief that zero-touch provisioning and high reliability are critical elements of a network’s success, we updated the software and released a new service pack that changed the way the drive is accessed and how information is written to it, reducing unnecessary pressure on the SSD. Mind you, it’s clear to us that (almost) anyone can deliver software and service pack updates. But thanks to our cloud-native software and our agile software methodology, we were able to do it much faster than traditional solutions. In short, we were able to extend the mean time between failures (MTBF) significantly, well beyond the manufacturer’s original specifications, bringing significant benefits to our customer. Our “fix” is risk-free, thanks to our distributed containerized software.
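We are not describing the details of the service pack here, but one common way to reduce write pressure on a drive is to buffer small records in memory and persist them in batches. The sketch below illustrates that general technique only. The class name, the batch size, and the in-memory list standing in for the drive are all hypothetical and are not taken from DNOS.

```python
class BatchedLog:
    """Buffer records in memory and hand them to storage one batch at a time."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink              # callable that persists a list of records
        self.batch_size = batch_size
        self.buffer = []
        self.flush_count = 0          # number of write operations issued to storage

    def write(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Skip empty flushes so they do not count as a storage write.
        if self.buffer:
            self.sink(self.buffer)
            self.flush_count += 1
            self.buffer = []


stored = []                           # stand-in for the drive
log = BatchedLog(sink=stored.extend, batch_size=100)
for i in range(1000):
    log.write(f"event {i}")
log.flush()                           # persist anything still buffered
print(log.flush_count, "writes for", len(stored), "records")
```

In this run, 1,000 records reach storage in 10 write operations instead of 1,000. The trade-off is that records still in the buffer are lost if the process stops before a flush, so the batch size has to be chosen with that risk in mind.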
Taking Remote to a New Level
One of the main benefits of a disaggregated router model is that not every problem must be fixed on-site. In a recent incident, one of our customers had a firmware issue in the data plane line card (NCP), which was managed through an incumbent vendor’s platform. With this firmware issue, there was a loss of connectivity between the NCP and the incumbent hardware. It could no longer be accessed in-band from the cluster/router, which was managed by the incumbent vendor’s supervisor.
Since DriveNets’ disaggregated router model provides independent console access to any router component, we were able to investigate and repair the issue remotely, without requiring an on-site presence. Our team used the console port of the NCP to access the box and recover connectivity remotely. With out-of-band console access to any component, we dramatically shortened the time to restore the NCP, significantly reducing opex. With an incumbent router model, there would have been no option to do this remotely, and an on-site presence would have been required.
When a Network Issue Becomes a Simple Hiccup
Service providers have to be particularly careful with any issues, failures, or malfunctions in their networks. Any of these can potentially jeopardize the operation of the whole network (remember the Facebook outage?), so service providers approach any kind of replacement or support activity with caution, leading to expensive maintenance windows, as we mentioned in our first blog post in this series.
In one instance, one of our customers experienced a hardware failure in the Network Cloud Fabric (NCF), which needed to be RMA-ed. Thanks to DriveNets’ fully redundant disaggregated router model, the NCF failure had no impact whatsoever on the router functionality and traffic forwarding capacity. In fact, the installation of the new NCF was done while the router was actively in-service, with no packet drops.
With a traditional, hardware-based infrastructure, this easy replacement would not have been possible. In such a case, when there is an issue in the fabric or backplane, the entire chassis needs to be replaced, a costly, time-consuming, and burdensome process.
So NOW We Are Done?
Not quite… Although in an imaginary, fantasy world, we would love to say that there are never problems in any network (no matter whether hardware or software), the truth is that real life is… well, real.
Even though we have encountered some disbelief about DriveNets’ revolutionary solution, the truth is that DriveNets has proved, over and over again, that disaggregated, cloud-native networks bring considerable benefits to service providers, enabling them to improve cost, flexibility, and scalability by building their networks like cloud.