Don’t get me wrong, I really enjoyed the hands-on part of my career. Over the past 30 years, I loved planning, deploying, configuring, maintaining, and even troubleshooting many types of network devices: routers, wireless and optical gear, SDH and PDH nodes, and, all the way back, my first node, a statistical mux for 9600 bps V.24/RS-232 signals (a long, long time ago).
My very first troubleshooting gear
“Modern” network outages
But even though I loved this work and its long hours (even staying up late at night for a complicated upgrade procedure), the one thing I hated was the call, text, alarm, or shout announcing “the network is down!” That punch to the stomach was always followed by pressure, frustration, and stress, and, at the end of the day (or night), by a root cause analysis with a maddening result. It was, and still is, maddening, because as networks evolve and become more complex, outages stem less from hardware failures and software bugs and more from human error, such as a missing exclamation mark or a rogue backslash.
A typical root cause for an outage
The thing is, in many cases a single character in a configuration can trigger a storm across your network, and that storm makes troubleshooting slow and inefficient.
Such events often overload the CPUs of network elements. When that happens, connecting to a router via CLI to run basic diagnostic commands (or to apply workarounds) can be extremely slow, if it works at all. The reason is that the router is busy with higher-priority tasks and doesn’t bother to “pick up the phone” when you call.
Your next network outage
But I’ve got some good news – your next outage can be shorter.
If you think about it, your network is probably built in a similar manner to the way it was a couple of decades ago. But what if it wasn’t?
What if, instead of a monolithic router, your network function were software running as a microservice in containers over an abstracted hardware layer?
This would mean, for instance, that even if this function (say, a router) were overloaded by a network storm exhausting its CPU resources, those resources would be only a fraction of the underlying hardware’s compute resources: the fraction allocated to this network function. In any scenario, the infrastructure hardware would remain accessible and manageable.
So instead of troubleshooting a flooded router, you could manage its surroundings without interruption. You could allocate additional resources to it and rewire its connectivity to other routers at the site (without anyone physically traveling there). You could even shut it down altogether and instantly spin up a new router with the correct configuration, reassigning all of its ports and resources to the new instance.
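To make the isolation argument concrete, here is a toy model in Python. It is a hypothetical sketch, not how DriveNets or any real orchestrator works. Every name in it (`Host`, `NetworkFunction`, `storm`, `scale_up`, `replace`) is made up for illustration. It assumes the shared hardware keeps a small reserved slice of cores for its own management and that each containerized router gets a fixed CPU quota. A storm can then max out only the router’s own quota, so there is still headroom to add resources or swap in a fresh instance.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkFunction:
    """Hypothetical containerized network function, e.g. a router."""
    name: str
    cpu_quota: float                # cores allocated to this function
    cpu_used: float = 0.0           # cores it is currently consuming
    ports: list[str] = field(default_factory=list)

    def overloaded(self) -> bool:
        return self.cpu_used >= self.cpu_quota


@dataclass
class Host:
    """Hypothetical shared hardware with a reserved management slice."""
    total_cores: float
    mgmt_cores: float               # reserved for infrastructure management
    functions: dict[str, NetworkFunction] = field(default_factory=dict)

    def free_cores(self) -> float:
        allocated = sum(f.cpu_quota for f in self.functions.values())
        return self.total_cores - self.mgmt_cores - allocated

    def deploy(self, nf: NetworkFunction) -> None:
        if nf.cpu_quota > self.free_cores():
            raise ValueError(f"not enough free cores for {nf.name}")
        self.functions[nf.name] = nf

    def storm(self, name: str) -> None:
        # A storm can exhaust only this function's quota, never the whole host.
        nf = self.functions[name]
        nf.cpu_used = nf.cpu_quota

    def scale_up(self, name: str, extra: float) -> None:
        # Give the flooded function more cores from the host's free pool.
        if extra > self.free_cores():
            raise ValueError("not enough free cores")
        self.functions[name].cpu_quota += extra

    def replace(self, old: str, new: NetworkFunction) -> None:
        # Shut down the flooded instance and hand its ports to a fresh one.
        retired = self.functions.pop(old)
        new.ports = list(retired.ports)
        try:
            self.deploy(new)
        except ValueError:
            self.functions[old] = retired  # roll back if the new one does not fit
            raise


host = Host(total_cores=32.0, mgmt_cores=2.0)
host.deploy(NetworkFunction("router-1", cpu_quota=8.0, ports=["eth0", "eth1"]))
host.storm("router-1")
print(host.functions["router-1"].overloaded(), host.free_cores())
host.replace("router-1", NetworkFunction("router-2", cpu_quota=12.0))
print(sorted(host.functions), host.functions["router-2"].ports)
```

In this model, the router hitting 100% of its quota leaves both the reserved management cores and the unallocated cores untouched. That is why the remediation calls still succeed.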
Building networks this way, like a cloud, makes all of this possible.
I’ll leave it to you to calculate how much time this architecture could shave off your service interruptions.
That is, unless you actually love network outages. I don’t – which explains why I’m now in marketing…