Is the Metro a Natural Fit for Disaggregated Device Technology?

I’ve blogged a number of times on the importance of metro in edge computing, cloud computing, function hosting, and network evolution. Metro, of course, is first a geography, and second a natural point of traffic concentration. Economically, it’s the best place to site a resource pool for new services and network features, because it’s deep enough to present economies of scale across a larger user population, and shallow enough to permit customization on a per-user basis.

In a vendor opportunity sense, metro is the brass ring. My model says that there are about a hundred thousand “metro” locations worldwide, and obviously a hundred thousand sites where we had at least mini-data-centers, server switching, and router on-ramps, not to mention whatever we needed for personalization, would be a huge opportunity. In fact, it would be the largest source of new server and network device opportunity in the market.

Traditional network vendors see this, of course. Juniper did a “Cloud Metro” announcement a couple years ago, and Cisco just announced a deal with T-Mobile for a 5G feature-host-capable 5G Core gateway that makes at least a vague promise it would be “leading to lower latency and advancing capabilities like edge computing.” The technology includes Cisco’s “cloud-native control plane” and a mixture of servers, switches, and routers. Not too different from a traditional data center, right?

Is that optimum, though? Is a “metro” a single class of device, a collection of devices, a reuse of current technology, or an on-ramp to a new generation of technology? We really need to talk about how the new missions that drive metro deployment would impact the characteristics of the infrastructure and architecture that frame that deployment. As always, we’ll start at the top.

First and foremost, metro is where computing meets networking. We know that we need servers for the computing piece, and we know that we need both switching and routing for the network side. We may also need to host virtual-network components like SD-WAN instances, if we plan to use metro locations to bridge between SD-WAN virtual networks and MPLS VPNs. Further, if we are planning to support HTTPS sessions to edge computing components, we’ll need to terminate those sessions somewhere.

The second thing we can say about metro-from-the-top is that metro is justified by edge computing, which in turn is justified by latency sensitivity. I do not believe that the mission of hosting virtual functions, whether it’s arising out of 5G deployment or through a broader use of NFV, will be enough. Everyone wants to find new applications that would drive network and cloud revenue, and those new applications would have to have a strong sensitivity to network latency to justify their deployment outward from current public-cloud hosting locations.

The third thing we can say is that metro traffic is likely to be more vertical than horizontal, which has an impact on the metro data center switching model. Latency-sensitive traffic is typically real-time event processing, and to have this processing divided into network-linked components hosted on different servers at the edge makes no sense. Think of the metro as a way station between the user and traditional cloud or enterprise data centers, a place where you do very time-sensitive and important things to an application that’s far more widely distributed.

Fourth, metro infrastructure must be readily extensible to meet growing requirements, without displacing existing gear. It’s implausible that any operator would deploy a metro center at the capacity needed for the long term, when there would be zero chance that capacity could be used immediately. You need to be able to start small, metro-wise, and grow as needed. You also need to avoid having to change out equipment to reach a higher capacity, or to change management practices radically as you grow.

The final thing we can say is that metro is very likely to be a partitioned resource. The 5G missions for metro, which might involve network slicing for MVNOs or service segregation, would at least benefit from if not require segregated metro resources. Some operators already have relationships with cloud providers that involve resource sharing, almost certainly in the metro, and many operators are at least considering that. Regulatory issues might compel the separation of “Internet” features/functions from those of basic communications.

You can see from the sum of these points that there’s a fundamental tension in metro architecture. On one hand, it’s always important to support efficient hosting and operations, so it would be helpful to have a single pool of resources and a single management framework. On the other hand, too much unification would compromise the requirement that we be able to partition resources. But if we were to build a metro infrastructure with discrete resources per mission, the result would be inefficient from a capital-equipment utilization perspective, and management complexity would be considerably higher.
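To make the utilization side of that tension concrete, here is a minimal sketch in Python. It compares sizing a separate server set for each mission's own peak against sizing one shared pool for the peak of the combined load. The mission names, the load figures, and the per-server capacity are all hypothetical values chosen for illustration; none of them come from operator data.

```python
import math

# Hypothetical load per mission across four time slots (arbitrary units).
MISSION_LOADS = {
    "5g_core": [40, 70, 90, 60],
    "edge_apps": [80, 50, 30, 40],
    "sdwan": [20, 25, 20, 60],
}

# Hypothetical capacity of one server, in the same arbitrary units.
PER_SERVER_CAPACITY = 25


def servers_needed(peak_load, per_server_capacity):
    """Whole servers required to carry a given peak load."""
    return math.ceil(peak_load / per_server_capacity)


def partitioned_servers(mission_loads, per_server_capacity):
    """Discrete resources: each mission is sized for its own peak."""
    return sum(
        servers_needed(max(loads), per_server_capacity)
        for loads in mission_loads.values()
    )


def pooled_servers(mission_loads, per_server_capacity):
    """Single pool: sized for the peak of the combined load per time slot."""
    combined = [sum(slot) for slot in zip(*mission_loads.values())]
    return servers_needed(max(combined), per_server_capacity)


if __name__ == "__main__":
    print("partitioned:", partitioned_servers(MISSION_LOADS, PER_SERVER_CAPACITY))
    print("pooled:", pooled_servers(MISSION_LOADS, PER_SERVER_CAPACITY))
```

With these made-up numbers the per-mission build needs more servers than the shared pool, because the missions don't peak at the same time. The gap shrinks if the peaks coincide, so the size of the penalty depends entirely on the traffic mix.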

A potential compromise could be achieved if we assumed that our metro deployment was connected using a cluster device rather than a single fabric or a traditional switch hierarchy. However, there are very few cluster implementations, and in any event you’d need specific features of such an implementation in order to meet the other requirements.

I’ve mentioned one cluster router/switch vendor, DriveNets, in some other blogs. The company launched in part because of AT&T’s desire to get a disaggregated open-model-hardware router, and it’s been gaining traction with other operators since. DriveNets offers three features that facilitate a metro deployment model, and none of these features are automatic properties of a cluster architecture, so we can’t be sure that other vendors (when and if they emerge, and tackle the space) will have them. Still, these features pose a baseline requirements set that anyone else will have to address to be credible.

First, you can divide the cluster into multiple router/switch instances through software control. That offers the optimum solution to our final requirement for metro. Each router instance has its own management system linkage, but the cluster as a whole is still managed by a kind of super-manager, through which the instances can be created, deleted, and resized.

Next, you can add white boxes and x64 servers to a cluster without taking the cluster down, if you need to add capacity. The maximum size of a cluster is roughly comparable to that of a big core router, and the minimum size is a nice entry point for a metro deployment. All the white boxes and servers are standardized, so they can be used in any cluster for any mission, which means that you can build metro and core switches, and even customer on-ramp routers, from the same set of spares.

Finally, you can deploy network features and functions directly on the cluster using integrated x64 devices. Everything is containerized so any containerized component could be hosted, and the result is very tightly coupled to the cluster, minimizing latency. Each hosted piece can be coupled to a specific router instance, which makes the feature hosting partitioned too.
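The three features above can be pictured with a small toy model. This is only a sketch of the concepts (software-defined instances, capacity added in place, and hosted functions bound to an instance). Every class, method, and name in it is hypothetical, and it does not represent DriveNets’ software or any vendor’s actual API.

```python
from dataclasses import dataclass, field


@dataclass
class RouterInstance:
    """A hypothetical software-defined router/switch carved from a cluster."""
    name: str
    boxes: int                                   # white boxes assigned to this instance
    hosted: list = field(default_factory=list)   # containerized functions bound to it


class Cluster:
    """Toy 'super-manager' view of a cluster of standardized white boxes."""

    def __init__(self, total_boxes):
        self.total_boxes = total_boxes
        self.instances = {}

    def free_boxes(self):
        """White boxes not yet assigned to any instance."""
        return self.total_boxes - sum(i.boxes for i in self.instances.values())

    def add_boxes(self, count):
        """Grow the cluster in place; existing instances are untouched."""
        self.total_boxes += count

    def create_instance(self, name, boxes):
        if name in self.instances:
            raise ValueError(f"instance {name!r} already exists")
        if boxes > self.free_boxes():
            raise ValueError("not enough free boxes in the cluster")
        self.instances[name] = RouterInstance(name, boxes)
        return self.instances[name]

    def resize_instance(self, name, boxes):
        inst = self.instances[name]
        if boxes - inst.boxes > self.free_boxes():
            raise ValueError("not enough free boxes in the cluster")
        inst.boxes = boxes

    def delete_instance(self, name):
        """Remove an instance, returning its boxes to the free pool."""
        del self.instances[name]

    def host(self, instance_name, container):
        """Bind a containerized function to one specific router instance."""
        self.instances[instance_name].hosted.append(container)


def demo():
    """Start small, partition, grow, then host a function on one partition."""
    cluster = Cluster(total_boxes=4)
    cluster.create_instance("mvno_slice", boxes=2)
    cluster.create_instance("internet", boxes=2)
    cluster.add_boxes(2)                      # capacity added without a rebuild
    cluster.resize_instance("internet", 4)    # one partition grows, the other is untouched
    cluster.host("mvno_slice", "upf-container")
    return cluster
```

The point of the model is that partitioning, growth, and feature hosting all go through one manager working against one pool of interchangeable boxes, which is what lets a partitioned metro keep the spares and operations of a unified one.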

I think cluster routing is a good solution for metro infrastructure, possibly the best solution available. Right now, DriveNets has an inside track in the space given their link to AT&T, their early success, and the maturity of their features beyond simply creating a router from a cluster of white boxes. I’ve not heard of specific metro successes for DriveNets yet, but I do know some operators are looking at the technology. Given that there’s not really any organized “metro deployment” at this point, the lack of current installations isn’t surprising. It will be interesting to see how this evolves in 2023, because I’m hearing that more trials and maybe even metro deployments are likely to come.

It will also be interesting to see whether traditional vendors like Cisco and Juniper decide to step up with a metro package. 5G Core gateways are a plausible path toward metro, but one that’s a bit hard to recognize among the weeds of uncertain technology steps. “Cloud Metro” positioning is a commitment to metro, but it needs a strong definition and specific technical elements. So does the cluster model, and while DriveNets has addressed a lot of the requirements, they’ve not pushed the details either. If metro is important, it will be important in 2023, and so will the steps vendors take, or fail to take, to address it.