In the old days, it took a bunch of help desk tickets before an engineer realized something was wrong with the network. Back then, troubleshooting meant logging into network devices one by one to pore over logs.
In the late 80s, SNMP was introduced, giving engineers a way to manage network devices remotely. It quickly became a way to collect information about those devices as well. That was a big step forward, and it marked the beginning of network visibility as we know it today.
In the mid-90s, NetFlow arrived, giving us a way to send network traffic information directly from our routers and switches to a centralized flow collector. Flow data was another big step forward in the type of data we could collect, especially because it showed us application activity rather than just CPU and link utilization.
Then came the first dashboards, displaying SNMP alerts and network flow data on colorful graphs and charts. The thing is, this is where the industry stopped for a while. We got better at SNMP and analyzing flows, graphs got prettier, and menus got snappier. But behind the scenes it was still just a collection of SNMP traps and flow collectors. To be fair, it worked well enough for its time, but as the great networking guru Bob Dylan once said, “the times they are a-changin’.”
Today’s needs are just a little bit different from the needs we had in the mid-90s. OK, so maybe that’s a bit of an understatement. Only a few years ago, public cloud and SaaS took off. Network overlays like SD-WAN became standard rather than corner-case, and hipster tech like containerization went mainstream. Network visibility had to evolve because the technology we needed visibility into had changed.
So how do we get visibility into parts of the network that don’t provide us much in the way of SNMP or flows? How do we get visibility into parts of the network that we don’t even manage or own? And how does a mere mortal make sense of this enormous collection of data?
Today, we need to collect anything and everything. Old-school SNMP information and flow data, public cloud logs from AWS, Azure and Google, streaming telemetry directly from our beloved routers, and metadata gathered by looking at DNS responses make up the bulk of our visibility databases. And once we have it all in one place, we can smush it all together and analyze it programmatically.
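To make “smush it all together” a little more concrete, here is a minimal sketch, assuming two hypothetical record formats: SNMP interface-utilization samples and flow records, both tagged with a device name and a Unix timestamp. The field names (`if_util_pct`, `bytes_by_app`, and so on) and the 60-second bucket size are illustrative assumptions, not any vendor's schema. The idea is simply to key both sources on (device, time bucket) so they can be analyzed side by side.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
snmp_samples = [
    {"device": "edge-rtr-1", "ts": 1700000045, "if_util_pct": 42.0},
    {"device": "edge-rtr-1", "ts": 1700000105, "if_util_pct": 91.5},
]
flow_records = [
    {"device": "edge-rtr-1", "ts": 1700000050, "app": "https", "bytes": 1_200_000},
    {"device": "edge-rtr-1", "ts": 1700000110, "app": "backup", "bytes": 48_000_000},
    {"device": "edge-rtr-1", "ts": 1700000115, "app": "https", "bytes": 900_000},
]

BUCKET_SECONDS = 60  # assumed alignment window


def bucket(ts):
    """Round a Unix timestamp down to the start of its time bucket."""
    return ts - ts % BUCKET_SECONDS


def merge(snmp, flows):
    """Join SNMP samples and flow records on (device, time bucket)."""
    merged = defaultdict(
        lambda: {"if_util_pct": None, "bytes_by_app": defaultdict(int)}
    )
    for s in snmp:
        merged[(s["device"], bucket(s["ts"]))]["if_util_pct"] = s["if_util_pct"]
    for f in flows:
        # Sum flow bytes per application within each bucket.
        merged[(f["device"], bucket(f["ts"]))]["bytes_by_app"][f["app"]] += f["bytes"]
    return merged


if __name__ == "__main__":
    merged = merge(snmp_samples, flow_records)
    for (device, start), row in sorted(merged.items()):
        print(device, start, row["if_util_pct"], dict(row["bytes_by_app"]))
```

Once the rows line up, the minute where utilization jumps to 91.5% sits next to the large `backup` flow in the same record, which is the kind of relationship that is hard to spot when SNMP and flow data live in separate tools.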
“Programmatic” is a loaded term. It conjures thoughts of Python scripts and Ansible playbooks, but in terms of visibility, it means much more than that. The next evolution of network visibility is a programmatic approach to finding meaning in huge amounts of diverse data. “Finding meaning” may sound like a spiritual pilgrimage, but it really boils down to helping engineers find the root cause of a service delivery problem faster, and ideally automatically.
Think about how many data points exist in a database of network metrics, especially if it’s data collected over time. There are probably millions of timestamps, alerts on CPU utilization, flow logs, configuration files, AWS VPC flow logs, Google system logs, packet information, ephemeral information like interface statistics, and information derived from synthetic tests. That’s a big list. Understanding how everything relates to everything else is an overwhelming task for an engineering team.
The next step in network visibility is network observability, which means finding correlations among data points, inferring visibility where direct measurement is difficult or impossible, and deriving meaning from the data beyond associating a problem with a timestamp. A dedicated team of engineers and data scientists might be able to do this given enough time, but network observability gets us there faster and, in theory, with more insight than even a skilled engineer could provide. It addresses the problem of handling today’s huge volume and variety of network data.
Network observability doesn’t replace legacy network visibility, though. Instead, it builds on years of network visibility technology and adds the latest data types and methods. It automates correlation, performs anomaly detection, provides actionable insight, and answers why something is happening on the network, not just that it is happening.
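Anomaly detection can sound mysterious, so here is a deliberately simple sketch of one classic technique: a trailing-window z-score. This is an illustrative assumption, not how any particular observability platform works; real systems typically account for seasonality, multiple correlated metrics, and much more. The function name `detect_anomalies` and its `window` and `threshold` parameters are hypothetical.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value deviates more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical round-trip latency samples from a synthetic test.
    latency_ms = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 85.0, 20.1, 19.7]
    print(detect_anomalies(latency_ms))
```

With this sample series, only the 85 ms spike at index 7 is flagged. One trade-off of this simple design is that the spike inflates the trailing window's standard deviation for the next few samples, reducing sensitivity right after an event; that limitation is part of why observability tools go beyond a single static rule and correlate many signals.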