Today’s enterprise WAN is a mix of public internet, cloud provider networks, SD-WAN overlays, containers, and CASBs. As we develop a network visibility strategy, we must go where no engineer has gone before to match how applications are delivered today.
We all know the story.
As the Enterprise entered the Mutara Nebula, Khan lost sight of his prey. Sensors were inoperable from the firefight, and finding anything in the gaseous cloud was near impossible.
Khan maneuvered, stalked, and hunted his mortal enemy with the cautious vengeance of a madman tempered by misguided intelligence and patience. But this type of encounter in space, this new application of battle strategy born of intelligence without experience, meant Khan was handicapped from the start.
His pattern suggested… two-dimensional thinking.
Spock, spotting the fatal flaw, raised his head from the familiar viewfinder to report this new advantage to his captain. Without delay, Kirk ordered the Enterprise to a full stop and Z-minus 10,000 meters. Now the ship sat close to the Reliant but beneath it, positioned so that the great Khan Noonien Singh, despite his great intellect and inimitable prowess, would likely fail to spot the famed Federation starship.
And so the battle progressed until Captain Kirk and his crew destroyed the Reliant, or more accurately, put Khan into a position where he destroyed himself with the Genesis device.
Khan, a brilliant tactician, was unable to meet the new challenge because of legacy thinking. He thought in terms of two dimensions. Of a flat universe. But the universe isn’t two-dimensional, is it? And therein lay the end of Khan (and of the greatest of Star Trek films).
Enterprise WAN in 2023
Enterprise WAN networking in 2023 is very much the same. An engineer standing in front of a console today stares at the traffic moving from their on-prem data center up and out to a CASB, receiving DNS responses from a cloud-provided DNS service, and then on through an ephemeral microservices architecture in a public cloud.
And this, of course, is just to reach the front end. On the back end is yet another series of intricate and complex traffic patterns within and among various public clouds and back to an end user, an actual human being, working on a mobile device on a train heading into a tunnel.
To succeed as an engineer in this new network, and to successfully manage the infrastructure and services that deliver applications to people, we must rid ourselves of two-dimensional thinking.
I recently had the privilege of attending the WAN and AWS Summits in London. Both events, focused by virtue of their names on very different aspects of technology, were in practice and conversation all about moving resources from the public cloud to a person anywhere in the world. In other words, both events were all about cloud networking.
Not many years ago, all my WAN projects were IPsec tunnels to a headend, dual hub DMVPN designs, coordinating MPLS handoffs from last mile providers, etc. Full or partial mesh topologies, backhauling to data centers, testing connectivity to on-prem resources.
Recently, conversations with colleagues and customers depict very different traffic patterns. Today, it’s up and out. User to the cloud. Cloud to cloud. There’s very little going on with branch to branch connectivity or backhauling traffic to centralized data centers. There are exceptions for sure, but they are not the norm.
For the most part, people access resources, usually in the form of applications, directly from the cloud — whether that’s public cloud, private cloud, or a SaaS provider. This means traffic patterns have changed, and by extension, the nature of network visibility has changed.
Today’s network is more a collection of autonomous networks of varying sizes under one administrative domain than a highly interconnected mesh of networks. Sometimes those networks are individual branch offices that talk to no other branch or private data center. Sometimes they are large campuses with minimal resources on-site, and very often they are networks of individual end users working from home.
The genesis of new enterprise WAN visibility
Enterprise network visibility in 2023 means much more than collecting flows and SNMP trap messages. Yes, those forms of telemetry are still important, but the network is very different than it used to be, so we need a new strategy to understand what’s happening.
First, we can start by collecting data from our cloud providers. This may go without saying, but if you’re using a public cloud in any way, you should collect whatever logs or flow records your cloud provider offers for visibility. This includes information about resource connectivity and traffic flow within a particular cloud provider’s environment and among different cloud providers.
AWS provides VPC Flow Logs, which allow you to gather info about the IP traffic going to and from the network interfaces in your VPC.
GCP also uses VPC Flow Logs to record a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
Azure NSG flow logs inform us about ingress and egress IP traffic through a Network Security Group.
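To make the cloud flow data concrete, here is a minimal sketch of parsing AWS VPC Flow Logs records in the default (version 2) space-separated format and totaling bytes by action. The helper names (`parse_flow_record`, `bytes_by_action`) and the sample records are illustrative assumptions, and a custom log format would need a different field list.

```python
from collections import Counter

# Field order of the AWS VPC Flow Logs default (version 2) format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)
INT_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")


def parse_flow_record(line):
    """Split one default-format record into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Fields are "-" when nothing was recorded (for example NODATA records).
    for key in INT_FIELDS:
        record[key] = None if record[key] == "-" else int(record[key])
    return record


def bytes_by_action(lines):
    """Total bytes per action (ACCEPT / REJECT), skipping records without data."""
    totals = Counter()
    for line in lines:
        record = parse_flow_record(line)
        if record["bytes"] is not None:
            totals[record["action"]] += record["bytes"]
    return dict(totals)


# Illustrative sample records, not real traffic.
SAMPLE = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]

if __name__ == "__main__":
    print(bytes_by_action(SAMPLE))
```

GCP and Azure flow logs are delivered as JSON with their own schemas, so the same idea applies but the parsing step would differ.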
Next, considering we access cloud resources mostly over the public internet, we need to collect data about the state of the internet itself. This has always been something network engineers have wanted, but it’s never been more critical than it is today.
We can start with collecting whatever telemetry our service providers will give us, especially our last-mile providers. But that’s typically lacking at best, so we should also explore other data types and sources.
First, the global routing table is freely available to ingest from various reliable sources. This allows us to see the internet weather, as it were, which can help us understand why traffic moves the way it does.
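One simple thing to do with routing table data is to compare two snapshots and flag prefixes whose origin AS changed or that disappeared. The sketch below assumes the snapshots have already been reduced to a mapping of prefix to AS path; the function names are hypothetical, and the prefixes and AS numbers come from ranges reserved for documentation.

```python
def origin_as(as_path):
    """The origin AS is the last AS number in the path."""
    return as_path[-1]


def origin_changes(before, after):
    """Return {prefix: (old_origin, new_origin)} for prefixes whose origin
    changed between snapshots. new_origin is None if the prefix was withdrawn."""
    changes = {}
    for prefix, old_path in before.items():
        new_path = after.get(prefix)
        if new_path is None:
            changes[prefix] = (origin_as(old_path), None)
        elif origin_as(new_path) != origin_as(old_path):
            changes[prefix] = (origin_as(old_path), origin_as(new_path))
    return changes


# Hypothetical snapshots: prefix -> AS path as seen from one vantage point.
BEFORE = {
    "192.0.2.0/24": [64496, 64500, 64510],
    "198.51.100.0/24": [64496, 64501],
    "203.0.113.0/24": [64496, 64502],
}
AFTER = {
    "192.0.2.0/24": [64496, 64500, 64511],   # origin changed
    "198.51.100.0/24": [64497, 64501],       # path changed, same origin
    # 203.0.113.0/24 is no longer present
}

if __name__ == "__main__":
    for prefix, (old, new) in origin_changes(BEFORE, AFTER).items():
        print(prefix, old, "->", new)
```

An unexpected origin change is not always a problem, but it is the kind of event worth correlating with a sudden shift in latency or path.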
Next, we can deploy test agents to monitor connectivity to public resources such as public DNS servers, SaaS providers, etc. More than simple up/down, we can test for latency, jitter, route and path changes, and so on.
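As a rough illustration of going beyond up/down, the sketch below summarizes a series of probe round-trip times into loss, average latency, and jitter. It assumes the test agent records one round-trip time in milliseconds per probe and `None` for a lost probe. Jitter here is the mean absolute difference between consecutive received samples, which is one common simplification rather than the only definition.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe results.

    rtts_ms: list of round-trip times in milliseconds, None for a lost probe.
    """
    if not rtts_ms:
        raise ValueError("no probe results to summarize")
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    # Differences between consecutive received samples.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": loss_pct,
        "avg_ms": mean(received) if received else None,
        "jitter_ms": mean(deltas) if deltas else None,
    }


if __name__ == "__main__":
    # Hypothetical results from five probes to a public DNS server.
    print(summarize_probes([20.0, 22.0, None, 20.0, 22.0]))
```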
Remember that many, if not most, of our end-users are accessing our applications over the internet today, so understanding what’s happening on the global network we don’t own or manage is critical to understanding application delivery.
So many applications are built on microservices architectures today that collecting metrics about container network performance is crucial. eBPF is a common way to interact with the container network stack to collect metrics such as packet loss, latency, jitter, and elements like TCP retransmissions, fragments, etc.
Monitoring container networking also means tracking the traffic among containers, between your pods deployed in the cloud and on-premises services, and to and from third-party SaaS services.
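Once a collector (eBPF-based or otherwise) exports per-pod TCP counters, turning them into a rate is straightforward. The sketch below assumes two samples of cumulative counters per pod with hypothetical field names `segs_out` and `retrans`; it does not show the eBPF collection itself.

```python
def retransmit_rates(prev, curr):
    """Percent of TCP segments retransmitted per pod between two samples.

    prev, curr: {pod_name: {"segs_out": int, "retrans": int}} cumulative counters.
    """
    rates = {}
    for pod, now in curr.items():
        before = prev.get(pod)
        if before is None:
            continue  # new pod, no baseline yet
        sent = now["segs_out"] - before["segs_out"]
        retrans = now["retrans"] - before["retrans"]
        if sent <= 0 or retrans < 0:
            continue  # idle pod or counter reset (for example a restart)
        rates[pod] = 100.0 * retrans / sent
    return rates


if __name__ == "__main__":
    # Hypothetical counter samples taken one interval apart.
    previous = {"web-1": {"segs_out": 1000, "retrans": 10}}
    current = {
        "web-1": {"segs_out": 3000, "retrans": 50},
        "web-2": {"segs_out": 500, "retrans": 1},
    }
    print(retransmit_rates(previous, current))
```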
Network overlays were once relegated only to the data center to stretch VLANs or other niche requirements. Today, network overlays form the basis of how we connect to resources over the internet.
SD-WAN, for example, is an overlay technology that abstracts an underlying physical network, usually the public internet. It provides many benefits to network operations, network security, and, above all else, cloud connectivity.
In my professional experience, a conversation about SD-WAN is usually about the cloud. So without this telemetry as part of our visibility strategy, we’re missing visibility into the forwarding and policy component of how our end users actually reach their apps.
Some SD-WAN vendors expose an API you can use to collect information, and some export flow, SNMP, and more modern streaming telemetry. Getting a clear picture of how the overlay and underlay networks interact and how the underlay affects what we see in the overlay is crucial for understanding application delivery today.
We want to know as much as possible about our end users’ experience with their applications. After all, that’s the reason the network exists in the first place. And for that, we can collect a variety of data.
There are browser plugins that can tell us how an application performs, and of course, there are locally installed agents that can tell us about a computer’s resource utilization, such as memory and CPU.
However, to be proactive about monitoring an end-user’s experience, we can deploy test agents to simulate end users interacting with an application in a deliberate and prescribed manner so we can gather the specific telemetry about each step in a digital transaction.
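A scripted transaction of this kind can be sketched as an ordered list of named steps, each timed individually. The runner below is a minimal illustration, not a real synthetic-monitoring agent: the step names are hypothetical and the step bodies are placeholders where an actual test would load a page, log in, or submit a form.

```python
import time


def run_transaction(steps):
    """Run steps in order, timing each one. Stops at the first failing step.

    steps: list of (name, callable) pairs. A step fails if it raises.
    """
    results = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            ok = True
        except Exception:
            ok = False
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results.append({"step": name, "ok": ok, "elapsed_ms": elapsed_ms})
        if not ok:
            break  # later steps depend on earlier ones
    return results


if __name__ == "__main__":
    # Placeholder steps that only sleep; real steps would drive the application.
    steps = [
        ("load login page", lambda: time.sleep(0.01)),
        ("submit credentials", lambda: time.sleep(0.02)),
        ("open dashboard", lambda: time.sleep(0.01)),
    ]
    for result in run_transaction(steps):
        print(f'{result["step"]}: ok={result["ok"]} {result["elapsed_ms"]:.1f} ms')
```

Timing each step separately is what lets you tell a slow login apart from a slow page load instead of seeing only one end-to-end number.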
The Starship Enterprise WAN
In 2023, with few exceptions, applications are consumed almost entirely over the internet. The enterprise network is more critical than ever, more complex than ever, more distributed than ever, and more impactful than ever to an end user’s experience. That means designing and maintaining a performant application delivery mechanism requires a different form of visibility than we’ve previously had.
Like Kirk facing the Kobayashi Maru, we must change the conditions of the test. We can’t afford to lose, and as enterprise network engineers, we can’t afford to believe in a no-win scenario. The network has to work, or no one has access to anything.
Network observability, built on more data, additional sources, and new data analysis workflows, is the three-dimensional thinking necessary to meet the needs of today’s network.