In part 2 of this series, I talked about the range of network devices and observation points that generate telemetry data. Over time, this range has expanded, and networks are more diverse than ever. All of our operational concerns (planning, running, and fixing) need to be coordinated across the full variety of networks that affect our traffic.
In this blog, I discuss the telemetry data itself. Telemetry is the key to seeing, and seeing is the first step in the practice of observability.
The wonderful thing about network telemetry is that there are so many types. That same variety is what makes starting on the network observability journey a challenge!
Historically, many systems have taken one or two types of telemetry to answer a limited set of questions. However, with modern data systems and techniques, it’s possible to take in a broader set of telemetry, which opens up a much wider set of use cases and questions that can be answered.
Wire data (packets) can also serve as a type of traffic data, but we see almost all cloud-focused customers using traffic summaries (flow) because of the difficulty of scaling packet observation in distributed networks.
However you get it, traffic is the key “what is” that shows you what users and applications are up to and how they’re interacting with the network!
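To make the idea of traffic summaries concrete, here is a minimal Python sketch of one classic flow question: who is sending the most traffic? The `FlowRecord` shape and the `top_talkers` helper are hypothetical and heavily simplified; real flow exports carry many more fields, and the addresses are documentation examples rather than real data.

```python
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow summary (not a real export format)."""
    src_ip: str
    dst_ip: str
    dst_port: int
    protocol: str
    byte_count: int


def top_talkers(flows, n=3):
    """Sum bytes per source IP and return the n largest senders."""
    totals = Counter()
    for flow in flows:
        totals[flow.src_ip] += flow.byte_count
    return totals.most_common(n)


# Illustrative records only.
flows = [
    FlowRecord("10.0.0.5", "192.0.2.10", 443, "tcp", 120_000),
    FlowRecord("10.0.0.7", "192.0.2.10", 443, "tcp", 40_000),
    FlowRecord("10.0.0.5", "198.51.100.20", 53, "udp", 2_000),
    FlowRecord("10.0.0.9", "192.0.2.30", 22, "tcp", 75_000),
]

print(top_talkers(flows, n=2))
```

The same grouping pattern works for destination, port, or protocol, which is a large part of why flow summaries scale where packet capture does not.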
Questions you can answer with traffic telemetry:
Questions you can’t answer with traffic alone:
This typically covers high-level stats about both the control and forwarding planes, though usually not deep telemetry on the traffic flowing across the network. Historically, this data came from the CLI, then mostly from SNMP, later with API access added, and the more recent energy has been around streaming telemetry.
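As an illustration of what device telemetry looks like in practice, the sketch below turns two polls of an interface octet counter (for example, a 64-bit counter such as `ifHCInOctets` read over SNMP) into an average utilization figure. The function name, parameters, and sample values are assumptions made for this example, not part of any particular product.

```python
def utilization_percent(octets_t0, octets_t1, interval_seconds,
                        link_speed_bps, counter_bits=64):
    """Average link utilization between two polls of an octet counter."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Counters wrap at 2**counter_bits; modulo arithmetic absorbs one wrap.
    delta_octets = (octets_t1 - octets_t0) % (2 ** counter_bits)
    bits_per_second = delta_octets * 8 / interval_seconds
    return 100.0 * bits_per_second / link_speed_bps


# Hypothetical samples: two polls 60 seconds apart on a 1 Gbps link.
print(utilization_percent(1_000_000_000, 1_750_000_000, 60, 1_000_000_000))
```

Note that this only says how busy the interface was, not which users or applications were responsible. That gap is exactly what traffic telemetry fills.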
Questions you can answer with device telemetry alone:
Questions you can’t answer with device telemetry alone:
Questions you can answer with events alone:
Questions you can’t answer with events alone:
Questions you can answer with synthetics alone:
Questions you can’t answer with synthetics alone:
This information tells you (modulo bugs) how traffic or packets will flow through the network under different conditions. Broadly, this includes inter-domain (BGP), intra-domain (OSPF, IS-IS, RIP, BGP), and even switching (ARP and CAM) updates and tables. Routing is generally observed by participating in listen-only routing sessions or, for BGP, via BMP. Note: I also think of tables as something composed from updates in the observability data layer, but that is a topic to skip for now.
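To show how a routing table answers the question "where will this packet go?", here is a minimal longest-prefix-match sketch using Python's standard `ipaddress` module. The `routes` table and the `best_route` helper are hypothetical; a real observability platform would build its tables from BGP, BMP, or IGP updates rather than a hand-written dictionary.

```python
import ipaddress


def best_route(routes, destination):
    """Return (prefix, next_hop) for the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network:
            matches.append((network, next_hop))
    if not matches:
        return None
    # Longest prefix wins: the route with the largest prefix length.
    network, next_hop = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), next_hop


# Hypothetical table mapping prefix to next hop.
routes = {
    "0.0.0.0/0": "203.0.113.1",
    "10.0.0.0/8": "10.255.0.1",
    "10.1.0.0/16": "10.1.255.1",
}

print(best_route(routes, "10.1.2.3"))
print(best_route(routes, "8.8.8.8"))
```

A linear scan is fine for a sketch; production systems typically use a trie or radix tree for lookups at scale.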
Questions you can answer with routing alone:
Questions you can’t answer with routing alone:
Questions you can answer with configuration data alone:
Questions you can’t answer with configuration data alone:
There is a wide variety of metadata types to tap into, often already available on data buses. Examples include application orchestration from Kubernetes, VMware, and controllers; user association from IPAM, NAC, and RADIUS; threat intelligence curated by security groups; SaaS and cloud identity mapping; customer or department identification; and “business criticality” metadata including customer size or application criticality to business operations.
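A small Python sketch of what metadata enrichment can look like: flow records are tagged with department and criticality by matching the source address against subnet metadata of the kind an IPAM system might provide. The `SUBNET_METADATA` table, field names, and helper functions are all hypothetical and exist only for illustration.

```python
import ipaddress
from collections import defaultdict

# Hypothetical metadata keyed by subnet, e.g. exported from an IPAM system.
SUBNET_METADATA = {
    "10.1.0.0/16": {"department": "engineering", "criticality": "high"},
    "10.2.0.0/16": {"department": "finance", "criticality": "medium"},
}


def enrich(flow, subnet_metadata):
    """Return a copy of the flow with tags from the first matching subnet."""
    src = ipaddress.ip_address(flow["src_ip"])
    enriched = dict(flow)
    for prefix, tags in subnet_metadata.items():
        if src in ipaddress.ip_network(prefix):
            enriched.update(tags)
            break
    return enriched


def bytes_by_department(flows, subnet_metadata):
    """Total bytes per department; untagged flows are grouped as 'unknown'."""
    totals = defaultdict(int)
    for flow in flows:
        tagged = enrich(flow, subnet_metadata)
        totals[tagged.get("department", "unknown")] += flow["bytes"]
    return dict(totals)


# Illustrative flows only.
flows = [
    {"src_ip": "10.1.4.20", "bytes": 9000},
    {"src_ip": "10.2.8.15", "bytes": 2500},
    {"src_ip": "10.1.7.33", "bytes": 1000},
    {"src_ip": "172.16.0.9", "bytes": 400},
]

print(bytes_by_department(flows, SUBNET_METADATA))
```

With the tags in place, a question about IP addresses becomes a question about departments, customers, or critical applications.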
How metadata lets you ask better questions:
For example, if apps or sites are using cloud infrastructure, flow without DNS may not be able to “see” them distinctly, but adding DNS to traffic data can help you to peer better into your traffic to those properties.
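The DNS-plus-flow idea can be sketched in a few lines of Python. DNS answers seen on the network are used to build a map from IP address to the names that resolved to it, and flow bytes are then grouped by name instead of by address. The data shapes, helper names, and sample values below are assumptions for illustration only.

```python
from collections import defaultdict


def build_ip_to_names(dns_answers):
    """Map each answer IP to the set of query names that resolved to it."""
    ip_to_names = defaultdict(set)
    for query_name, answer_ip in dns_answers:
        ip_to_names[answer_ip].add(query_name)
    return ip_to_names


def bytes_by_name(flows, ip_to_names):
    """Total flow bytes per DNS name; unmatched IPs are 'unresolved'."""
    totals = defaultdict(int)
    for dst_ip, byte_count in flows:
        names = ip_to_names.get(dst_ip)
        label = ", ".join(sorted(names)) if names else "unresolved"
        totals[label] += byte_count
    return dict(totals)


# Hypothetical observations: (query name, answer IP) and (dst IP, bytes).
dns_answers = [
    ("app.example.com", "192.0.2.10"),
    ("cdn.example.net", "192.0.2.11"),
    ("app.example.com", "192.0.2.12"),
]
flows = [
    ("192.0.2.10", 5000),
    ("192.0.2.12", 3000),
    ("192.0.2.11", 1000),
    ("198.51.100.7", 700),
]

print(bytes_by_name(flows, build_ip_to_names(dns_answers)))
```

Here two different cloud addresses collapse into one property, `app.example.com`, which flow data alone could not have shown. A real implementation would also need to account for record TTLs and for addresses shared by many names.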
Questions you can answer with DNS alone:
Questions you can’t answer with DNS alone:
One explicit note that’s critical to modern network observability is that some of the richest, most real-time, granular, and valuable data to shine light on the network comes from application-layer sources. Most application-layer traffic data has performance instrumentation that is simply not available from high-speed, silicon-accelerated network elements. While network and application observability teams still have work to do to arrive at common telemetry, terminology, workflows, and platform interoperability, we see this unification as an active effort in 2021 across our customer base.
Questions you can answer with application telemetry alone:
Questions you can’t answer with application telemetry alone:
Gathering network telemetry data is the key to being able to ask questions, and is the first step in the practice of observability.
As I’ve tried to lay out in this blog, a wider and more varied set of telemetry types can answer many more questions, and this makes your network more observable! Many common questions require two or more telemetry types to answer, and generally, each telemetry type you add multiplies the combinations you can draw on to ask questions. That is what network observability is about.
Now that we see the need for many different types of network telemetry, drawn from the key network elements, how do we create a practical solution that is capable of handling all this data?
That will be the subject of my next blog in this series — the Telemetry Data Platform.