In part 1 of this series, I talked about the importance of network observability as our customers define it — using advances in data platforms and machine learning to supply answers to critical questions and enable teams to take critical action to keep application traffic flowing.
For most of its history, network operations has been supported by monitoring tools that are mostly standalone, closed systems. Each sees only one or a few network element and telemetry types, and most run on-prem on one or a few nodes, without modern, open-data architectures.
Networkers running enterprise and critical service provider infrastructure need infrastructure-savvy analogs of the same observability principles and practices being deployed by DevOps groups. We see these DevOps teams unifying logs, metrics, and traces into systems that can answer critical questions to support great operations and improved revenue flow.
We see the network observability platforms, teams, and tool-builders needing:
In part 2 of this series, we continue diving into what’s needed to make the network observable by tackling the first key input for network observability: which networks and network elements to get telemetry from.
To achieve observability in modern networks, it is key to gather the state of all of the networks your application traffic traverses — overlay and underlay, physical and virtual, as well as the ones you run and the ones you don’t.
The breadth of network telemetry sources we see in modern networks includes the components of network types such as:
It’s also critical to think about the forwarding and control elements and observation points:
There’s probably nothing on these lists that comes as any surprise, other than the fact that most companies can’t yet see a unified view across these networks and key elements in one place.
This highlights one of the big challenges of making the network observable. Our networks have been built up with a wide range of devices (from multiple vendors, old and new, physical and virtual), all working together. Network observability must include most or all of these to be capable of answering the questions critical to keeping application and user traffic fast and available.
The good news is that it’s possible — with modern data platforms and an inclusive, upfront design — to get started, add value, and iterate/repeat towards complete coverage.
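One lightweight way to track that iteration is to keep an inventory of the networks your traffic crosses and note which ones already feed telemetry into your platform. The sketch below is only an illustration under stated assumptions: the `Segment` type, the `coverage` function, the segment names, and the telemetry labels are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One network that application traffic traverses (hypothetical model)."""
    name: str
    layer: str             # "overlay" or "underlay"
    form: str              # "physical" or "virtual"
    operated_by_us: bool   # False for networks you depend on but don't run
    telemetry: frozenset = frozenset()  # telemetry feeds collected today


def coverage(segments):
    """Return (fraction of segments with any telemetry, names of uncovered segments)."""
    gaps = [s.name for s in segments if not s.telemetry]
    covered = len(segments) - len(gaps)
    fraction = covered / len(segments) if segments else 0.0
    return fraction, gaps


# Illustrative inventory; every name and label here is made up.
INVENTORY = [
    Segment("campus-core", "underlay", "physical", True, frozenset({"snmp", "flow"})),
    Segment("sd-wan", "overlay", "virtual", True, frozenset({"flow"})),
    Segment("cloud-vpc", "underlay", "virtual", False),
    Segment("transit-isp", "underlay", "physical", False),
]

if __name__ == "__main__":
    fraction, gaps = coverage(INVENTORY)
    print(f"covered: {fraction:.0%}; gaps: {', '.join(gaps)}")
```

Re-running a report like this as each new source is onboarded gives a simple measure of progress toward complete coverage, and the gap list shows where to look next, including the networks you don’t operate yourself.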
In the past, it may have been okay for the network to consist of interconnected islands, each with its own network monitoring tools. With the shift to DevOps and application-driven everything, we simply can’t work in this fragmented way anymore. All of our operational concerns (planning, running, and fixing) need to be coordinated across the complete variety of the networks that affect our traffic.
In my next blog, the third in this series, I will discuss the types of network telemetry data generated across this wide range of network types and devices.