Kentik Blog

The Network Also Needs to be Observable, Part 4: Telemetry Data Platform

Avi Freedman, Co-founder & CEO

In the previous two posts, I talked about the need to gather telemetry data from all of the devices and observation points in your network, and across the different types of telemetry available.

The previous post showed, in practical terms, how gathering more telemetry data directly improves your ability to answer the questions that help you plan, run, and fix the network. It did so by framing the kinds of questions you can and can't answer with different types and sources of telemetry. If you have other good examples, please feel free to start a conversation! I'm avi at kentik.com, and @avifreedman on Twitter.

Awesome, I have data! Now what?

Network observability deals with a wide diversity of high-speed, high-cardinality, real-time data, and it requires the ability to ask both known and unknown questions. Doing this at network scale has been a hard problem for decades, and it requires a flexible, scalable, high-performance data platform.

This creates a set of requirements for data architectures to support network observability:

Standard data bus: Strictly speaking, all that's required is that data can be routed to and from different systems. In practice, though, most modern network observability data platforms have a common data bus, often based on Apache Kafka, to support real-time ingest, subscription and consumption, and export of telemetry data.
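
To make the data bus pattern concrete, here is a minimal in-process sketch in Python. It is a hypothetical stand-in, not the Kafka client API: `DataBus`, `publish`, and `subscribe` are illustrative names, and a real bus such as Kafka adds durable storage, partitioning, and consumer offsets. What it does show is the decoupling: producers write to a named topic once, and every subscriber (storage, analytics, export) receives its own copy.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Hypothetical in-process stand-in for a shared data bus such as Kafka.
Message = Dict[str, object]
Handler = Callable[[Message], None]


class DataBus:
    def __init__(self) -> None:
        # topic name -> list of consumer callbacks
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])  # .get() does not create the topic
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = DataBus()
    stored: List[Message] = []
    exported: List[Message] = []
    bus.subscribe("flows", stored.append)    # e.g. the storage tier
    bus.subscribe("flows", exported.append)  # e.g. an export to another team
    bus.publish("flows", {"src_ip": "192.0.2.10", "bytes": 1500})
    print(len(stored), len(exported))
```

Adding a new consumer means adding a subscription, with no change to the producers, which is the property that makes a common bus attractive for telemetry.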

Network primitives: Prefix, path, topology, and underlay and overlay are all concepts that are unique to networking or take on a specific meaning there. Without the ability to easily enrich, group, and query by prefix, network path, and underlay/overlay, it's difficult for networkers to ask questions and reason about their infrastructure.
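
As a small illustration of treating the prefix as a first-class primitive, the sketch below groups traffic by covering prefix using Python's standard `ipaddress` module. The flow tuples, the `bytes_by_prefix` helper, and the default /24 and /48 lengths are hypothetical choices for illustration only.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

Flow = Tuple[str, int]  # (destination IP, bytes); hypothetical record layout


def bytes_by_prefix(flows: Iterable[Flow], v4_len: int = 24, v6_len: int = 48) -> Counter:
    """Sum bytes per covering prefix, using separate lengths for IPv4 and IPv6."""
    totals: Counter = Counter()
    for dst_ip, nbytes in flows:
        addr = ipaddress.ip_address(dst_ip)
        plen = v4_len if addr.version == 4 else v6_len
        # strict=False masks off host bits: 198.51.100.7/24 -> 198.51.100.0/24
        net = ipaddress.ip_network(f"{addr}/{plen}", strict=False)
        totals[net] += nbytes
    return totals


if __name__ == "__main__":
    flows = [
        ("198.51.100.7", 1500),
        ("198.51.100.200", 500),
        ("203.0.113.9", 40),
        ("2001:db8:1:2::1", 1000),
    ]
    for net, total in bytes_by_prefix(flows).items():
        print(net, total)
```

A generic metrics store would treat "198.51.100.7" and "198.51.100.200" as two unrelated strings; prefix-aware grouping is what lets them roll up into the same network.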

Enriched data: Whether it's simpler enrichment like adding IP geography or origin ASN, or fully real-time streaming metadata such as user and application orchestration mappings and the association of live routing data with traffic data, enrichment is key to being able to ask high-level questions of your network.
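
Here is a minimal sketch of the simplest kind of enrichment mentioned above: tagging flow records with origin ASN via longest-prefix match. The `ROUTES` table is hypothetical and hand-written, using documentation prefixes and documentation-range ASNs; a real platform would learn these mappings from live BGP data and use a radix trie rather than the linear scan shown here.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical prefix -> origin ASN table (documentation prefixes and ASNs).
ROUTES: Dict[str, int] = {
    "198.51.100.0/24": 64500,
    "198.51.0.0/16": 64501,
    "203.0.113.0/24": 64502,
}
_TABLE = [(ipaddress.ip_network(p), asn) for p, asn in ROUTES.items()]


def origin_asn(ip: str) -> Optional[int]:
    """Longest-prefix match: the most specific covering route wins."""
    addr = ipaddress.ip_address(ip)
    best_len, best_asn = -1, None
    for net, asn in _TABLE:
        # Linear scan for clarity; production code would use a radix trie.
        if addr in net and net.prefixlen > best_len:
            best_len, best_asn = net.prefixlen, asn
    return best_asn


def enrich(flow: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the flow record with source and destination origin ASNs added."""
    out = dict(flow)
    out["src_asn"] = origin_asn(str(flow["src_ip"]))
    out["dst_asn"] = origin_asn(str(flow["dst_ip"]))
    return out


if __name__ == "__main__":
    print(enrich({"src_ip": "203.0.113.5", "dst_ip": "198.51.100.1", "bytes": 1500}))
```

Note that 198.51.100.1 is covered by both the /16 and the /24; longest-prefix match picks the /24, which mirrors how routers themselves select a route.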

High-cardinality storage: A modern network telemetry data platform must be able to store data at high resolution and at full cardinality. Network telemetry comes in dozens of types with hundreds of attributes, and each attribute can have millions of unique values. In modern, complex, distributed network and application infrastructures, it's not possible to know in advance all of the questions you may need to ask, so the detail has to be kept rather than summarized away.
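
A back-of-the-envelope bound shows why precomputing answers doesn't scale. If a rollup or cube pre-aggregates over k attributes with cardinalities c_1 through c_k, it can need up to the product of those cardinalities in cells. The numbers plugged in here are illustrative assumptions, not measurements: a million unique source IPs, a million unique destination IPs, and all 65,536 destination ports.

```latex
N_{\text{cells}} \le \prod_{i=1}^{k} c_i
\qquad\text{e.g.}\qquad
10^{6} \cdot 10^{6} \cdot 65{,}536 \approx 6.6 \times 10^{16}
```

The number of populated cells is also capped by the number of raw records, so at full cardinality a cube saves little over the raw data, and any rollup that does save space has to drop dimensions chosen in advance.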

Teams building their own platforms generally try several backends before settling on one. Some options we commonly see are:

  • Backends focused on log telemetry, which can typically handle the cardinality but are usually prohibitively expensive, because they are oriented toward less-structured ASCII data.
  • Time series databases, which operations teams are usually already running. These tend to fail quickly: with traffic data because of its cardinality, and with routing and other network data because they lack support for network primitives.
  • OLAP/analytics databases that rely on rollups or cubing of data, which can look fast, as long as you remove the ability to ask unanticipated questions.
  • Streaming databases, which can be useful for learning patterns across the data, but which don't support quickly asking new questions of historical data.
  • Distributed columnar databases, which are what we see most often (and what we use at Kentik) to provide high-resolution storage and querying.
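
The difference between rollups and columnar storage in the list above can be shown with a toy example. This is a single-process sketch with made-up column names and rows, not a distributed database: real columnar stores add compression, sharding, and parallel scans. The point is that keeping each attribute as its own column lets any column be filtered or grouped at query time, while a rollup built in advance has already discarded the dimensions it didn't anticipate.

```python
from collections import Counter
from typing import Dict, List, Optional

# Toy column store: one list per attribute, rows aligned by index.
columns: Dict[str, List[int]] = {
    "src_asn":  [64500, 64500, 64501, 64502],
    "dst_port": [443, 53, 443, 22],
    "bytes":    [9000, 120, 4000, 700],
}


def group_sum(cols: Dict[str, List[int]], key: str, value: str = "bytes",
              where: Optional[Dict[str, int]] = None) -> Counter:
    """Ad hoc GROUP BY `key`, SUM(`value`), with optional equality filters."""
    where = where or {}
    totals: Counter = Counter()
    for i in range(len(cols[value])):
        if all(cols[c][i] == v for c, v in where.items()):
            totals[cols[key][i]] += cols[value][i]
    return totals


# A rollup precomputed for "bytes by src_asn" ...
rollup_by_asn = group_sum(columns, "src_asn")
# ... cannot answer "bytes by dst_port for AS64500", because dst_port was
# aggregated away. The raw columns can answer it directly:
if __name__ == "__main__":
    print(group_sum(columns, "dst_port", where={"src_asn": 64500}))
```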

Immediate query results: Users need to be able to ask complex questions and get answers in seconds, not hours, to support modern "trail of thought" diagnostic workflows, again at high cardinality and resolution.

Massive multi-tenancy: Operational telemetry data platforms need to support dozens of simultaneous queries, not just from interactive users via the UI or API, but also from other operational systems within the network stack and across the organization. At Kentik, we built our own data storage and querying platform because current open-source big-data systems either require rollups or don't handle large multi-tenant queries well, often blocking on ingest and losing data.
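
One common way to keep one tenant's heavy queries from starving everyone else is per-tenant fair scheduling. The sketch below is a hypothetical round-robin dispatcher, not a description of Kentik's actual query engine: each tenant gets its own queue, and the dispatcher takes one query per tenant in turn when filling the available execution slots. `RoundRobinScheduler`, `submit`, and `next_batch` are illustrative names.

```python
from collections import OrderedDict, deque
from typing import Deque, List, Tuple


class RoundRobinScheduler:
    """Hypothetical per-tenant fair scheduler for a shared query tier."""

    def __init__(self) -> None:
        # tenant -> FIFO of pending queries; order = whose turn is next
        self._queues: "OrderedDict[str, Deque[str]]" = OrderedDict()

    def submit(self, tenant: str, query: str) -> None:
        self._queues.setdefault(tenant, deque()).append(query)

    def next_batch(self, slots: int) -> List[Tuple[str, str]]:
        """Pick up to `slots` queries, rotating across tenants with pending work."""
        batch: List[Tuple[str, str]] = []
        while len(batch) < slots and self._queues:
            tenant, queue = next(iter(self._queues.items()))
            batch.append((tenant, queue.popleft()))
            if queue:
                self._queues.move_to_end(tenant)  # back of the line
            else:
                del self._queues[tenant]          # no more work for this tenant
        return batch


if __name__ == "__main__":
    s = RoundRobinScheduler()
    for i in range(5):
        s.submit("tenant-a", f"a{i}")  # a burst from one tenant
    s.submit("tenant-b", "b0")
    s.submit("alerting", "c0")         # e.g. an automated operational system
    print(s.next_batch(4))
```

Even though tenant-a queued five queries first, tenant-b and the alerting system each get a slot in the first batch. Production schedulers typically also weight by query cost and enforce per-tenant resource limits.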

Streaming analytics: It's possible to simulate streaming analytics with sufficiently fast data storage and querying (we did this early on at Kentik). But for running modern machine learning pipelines and thousands of real-time queries on streaming telemetry data, streaming analytics is the most common architecture, and it's found in most modern and greenfield network telemetry data platforms.
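
Streaming analytics computes results as records arrive instead of querying them after they're stored. As a minimal, hypothetical illustration, the sketch below implements a tumbling-window aggregator that emits per-destination byte totals each time a window closes, plus a simple threshold check on each closed window. The record layout, window width, and threshold are assumptions, and records are assumed to arrive in timestamp order.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[int, str, int]  # (unix_seconds, dst_ip, bytes); hypothetical layout


def tumbling_windows(records: Iterable[Record], width: int) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, bytes_per_dst) each time a `width`-second window closes."""
    current_start: Optional[int] = None
    totals: Counter = Counter()
    for ts, dst, nbytes in records:
        start = ts - ts % width
        if current_start is not None and start != current_start:
            yield current_start, totals
            totals = Counter()  # fresh counter; the yielded one is left untouched
        current_start = start
        totals[dst] += nbytes
    if current_start is not None:
        yield current_start, totals  # flush the final, possibly partial, window


def alerts(records: Iterable[Record], width: int, threshold: int) -> Iterator[Tuple[int, str, int]]:
    """Flag any destination whose bytes within one window exceed `threshold`."""
    for start, totals in tumbling_windows(records, width):
        for dst, nbytes in totals.items():
            if nbytes > threshold:
                yield start, dst, nbytes


if __name__ == "__main__":
    stream = [(0, "192.0.2.1", 100), (30, "192.0.2.1", 950), (59, "192.0.2.2", 10),
              (60, "192.0.2.1", 5), (125, "192.0.2.2", 2000)]
    for alert in alerts(stream, width=60, threshold=1000):
        print(alert)
```

Because the input is consumed lazily, the same functions work on an unbounded iterator, such as messages read from a data bus.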

APIs and integrations: Modern observability and telemetry data platforms must be built on open specifications and be easy to integrate, ideally with built-in integrations to key cross-stack systems, so that application, infrastructure, and security engineers can collaborate with a common operational picture. This typically means full APIs for provisioning and querying, plus a streaming API that feeds other systems with the full volume and richness of normalized and enriched network telemetry in real time.
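
To illustrate what "normalized" means for a streaming export, here is a minimal sketch that maps records from two hypothetical collectors, each with its own field names, into one common schema and emits newline-delimited JSON for downstream consumers. `FIELD_MAP`, the collector labels, and the field names are all invented for illustration; they are not a published schema or API.

```python
import json
from typing import Dict, Iterable, Iterator

# Hypothetical per-source field mappings: native field name -> common field name.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "collector_a": {"src": "src_ip", "dst": "dst_ip", "octets": "bytes"},
    "collector_b": {"source_address": "src_ip", "dest_address": "dst_ip", "byte_count": "bytes"},
}


def normalize(source: str, raw: Dict[str, object]) -> Dict[str, object]:
    """Rename known fields to the common schema and tag the record's origin."""
    mapping = FIELD_MAP[source]
    out = {common: raw[native] for native, common in mapping.items() if native in raw}
    out["source"] = source
    return out


def to_ndjson(records: Iterable[Dict[str, object]]) -> Iterator[str]:
    """Serialize records one per line, a common format for streaming exports."""
    for rec in records:
        yield json.dumps(rec, sort_keys=True) + "\n"


if __name__ == "__main__":
    raw = [
        ("collector_a", {"src": "192.0.2.1", "dst": "198.51.100.2", "octets": 1500}),
        ("collector_b", {"source_address": "203.0.113.7", "dest_address": "192.0.2.9", "byte_count": 64}),
    ]
    for line in to_ndjson(normalize(src, rec) for src, rec in raw):
        print(line, end="")
```

Downstream tools then only need to understand one record shape, regardless of which device or collector produced the data.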

Over the next few months, we'll be hosting panels where builders working in and around network observability share the tips, tricks, and platforms they've used, both for network observability itself and for connecting to broader operational and business observability platforms.

Conclusion

The good news is that with the right architecture and capabilities, you can build a data platform that supports network observability and integrates with the rest of the business's data platforms.

It may sound like a huge lift to go from nothing to a full observability platform. This is the business we're in, and we're happy to help. For those charting their own path, we're creating resources like this blog series, which we hope will be helpful.
