To fully capitalize on the promises of digital transformation, IT leaders have come to recognize that a mix of cloud and data center infrastructure provides several business advantages, including increased agility, cost efficiencies, global availability, and, ultimately, better customer experiences.
But for the network and infrastructure teams making these hybrid infrastructures a reality, it is increasingly difficult to fully understand and control what’s happening in these networks. The patchwork of cloud providers, architectures, services, carriers, and peers, along with the transient nature of many cloud-based connections and resources, creates an equally patchy monitoring terrain, further complicated by global scale. Each of these elements can affect a hybrid cloud network’s cost, performance, and reliability, so network operators must account for each one individually while also understanding the data holistically.
“Kentik helps us diagnose network-related problems much faster. The ability to dive in and put a trace on every connection is a huge benefit.” – April Carter, Senior Software Engineering Manager
Traditional network performance monitoring (NPM) strategies fail to synthesize this distributed telemetry in an actionable way, leaving operators and engineers with critical visibility gaps as traffic moves around and between networks. And it isn’t just a matter of collecting data. How is it being analyzed? How is it presented? Are you able to ask questions about your data?
In the rest of this blog post, I want to explore how we use network observability here at Kentik to bridge these gaps and help our customers build and maintain affordable, performant, and reliable networks.
Making your networks observable
Making a network observable requires a ground-up effort that, for some systems, might mean rethinking fundamental network assumptions. At the very least, it will involve becoming deeply familiar with your network’s telemetry and perhaps even identifying new data to establish and collect.
The tools and strategies of observability originated in DevOps circles, where engineers were tackling the problems of monitoring distributed systems at scale. One of the more interesting projects is OpenTelemetry, an effort to standardize the instrumentation, generation, and collection of metric, log, and trace data for observability efforts. As one of the fastest-growing CNCF (Cloud Native Computing Foundation) projects, OpenTelemetry’s popularity highlights the need for distributed systems engineers to be able to instrument their code to provide otherwise unavailable telemetry.
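To make the idea concrete, here is a minimal, stdlib-only Python sketch of the kind of metric shape OpenTelemetry standardizes: a named instrument whose measurements carry contextual attributes. The `Counter` class and all names below are illustrative, not the real OpenTelemetry SDK API (which provides `MeterProvider`, `Meter`, exporters, and more):

```python
import time
from dataclasses import dataclass, field

@dataclass
class Counter:
    """Illustrative stand-in for an OpenTelemetry-style counter instrument."""
    name: str                     # e.g. "network.bytes.sent"
    unit: str = "By"              # bytes
    points: list = field(default_factory=list)

    def add(self, value: int, attributes: dict) -> None:
        # Each measurement carries a timestamp plus contextual attributes,
        # so downstream analysis can slice by interface, host, region, etc.
        self.points.append({"value": value,
                            "attributes": attributes,
                            "time_unix_nano": time.time_ns()})

bytes_sent = Counter("network.bytes.sent")
bytes_sent.add(1500, {"interface": "eth0", "host": "edge-router-1"})
bytes_sent.add(9000, {"interface": "eth1", "host": "edge-router-1"})

total = sum(p["value"] for p in bytes_sent.points)
print(total)  # 10500
```

The attributes dictionary is what makes the telemetry useful for observability: without it, a counter is just a number; with it, every measurement can be filtered and grouped after the fact.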
In network observability, this granular instrumentation is one of the principal data mechanisms that allow operators to “ask any question” about their observable systems. As such, instrumentation raises considerations around collection, sampling, storage, persistence, and analysis. But what needs instrumentation? Flow data from public clouds and telemetry from network appliances are readily accessible. Synthetic agents, however, need instrumentation at the host level, while application and service context requires instrumentation at the orchestration and service mesh layers.
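Sampling is one of those considerations worth spelling out. On busy links, flow exporters commonly sample packets at a rate of 1-in-N, so observed byte counts must be scaled back up to estimate true traffic volume. The sketch below is a simplified illustration of that arithmetic (field names and the flat scaling are assumptions; real estimators account for sampling error):

```python
def estimate_volume(sampled_flows, sampling_rate):
    """Scale sampled flow byte counts up to an estimated on-the-wire volume.

    With 1-in-N packet sampling (common for sFlow/NetFlow on busy links),
    each observed byte roughly represents `sampling_rate` bytes of traffic.
    """
    return sum(f["bytes"] for f in sampled_flows) * sampling_rate

flows = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "bytes": 1200},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "bytes": 800},
]

# With 1-in-1000 sampling, 2,000 observed bytes imply ~2,000,000 on the wire.
print(estimate_volume(flows, 1000))  # 2000000
```

Choosing the sampling rate is itself a trade-off between collection cost and fidelity, which is why it belongs in the same conversation as storage and analysis.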
Instrumentation provides the “how,” and contextualization provides the “what” for network telemetry.
As I mentioned, it’s possible to get rich performance metrics from your key application and infrastructure servers, and even from components like proxies and load balancers. You can map those metrics against contextual details like customer or application ID, internet routes (BGP), and location (GeoIP) and correlate them with volumetric traffic flow details (NetFlow, sFlow, IPFIX) from your network infrastructure. Storing these details unsummarized for months enables operators to get answers in seconds on sophisticated queries across multi-billion row datasets (though this can become an engineering challenge at scale).
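A rough sketch of that enrichment step: joining a raw flow record with contextual lookups (customer ID, region) before storage, so later queries can filter on business terms rather than raw IP addresses. The lookup tables, the string-prefix matching, and all identifiers here are hypothetical simplifications; production systems use proper longest-prefix matching against routing and GeoIP databases:

```python
# Hypothetical context tables, keyed by a simplified address prefix.
customer_by_prefix = {"203.0.113.": "customer-42"}
region_by_prefix = {"203.0.113.": "US-East"}

def enrich(flow):
    """Attach customer and region context to a raw flow record."""
    enriched = dict(flow)
    for prefix, customer in customer_by_prefix.items():
        # Naive prefix match stands in for real longest-prefix lookup.
        if flow["src"].startswith(prefix):
            enriched["customer"] = customer
            enriched["region"] = region_by_prefix.get(prefix, "unknown")
    return enriched

raw = {"src": "203.0.113.7", "dst": "198.51.100.9",
       "bytes": 5400, "proto": "tcp"}
print(enrich(raw)["customer"])  # customer-42
```

Enriching at ingest, rather than at query time, is what makes "show me traffic for customer-42 in US-East" a fast query instead of a multi-way join.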
Implementing contextual instrumentation often represents the early “heavy lifting” when making networks observable. It provides much of the raw data that observability platforms like Kentik rely on to provide powerful querying support and detailed network visualizations.
Besides providing contextually rich telemetry for analysis, instrumentation allows operators to visualize networks in several helpful ways. Visualizations provide a layer of abstraction that enables network operators to unify disparate data sources such as data centers, private and public clouds, Software-as-a-Service (SaaS) applications, network edge devices, internet, backbone, WAN and SD-WAN, and more.
You might ask, so what? Visualizations can be paired with performance and security alerts for quick reads on “this is where and what your problem is,” provide a framework for digging down into data, and deliver an efficient, consistent (everyone sees the same thing) way to monitor traffic and overall network performance. This means a more precise picture and faster, more targeted responses when things go wrong.
Having an observable network means getting answers quickly in a system that lets you query, filter, drill in, zoom out, and map your network telemetry, no matter how large or complex the data set. This is the “ask anything” tenet of network observability and represents one of the more significant departures from traditional network monitoring, closing the gaps between data siloed in separate network performance tools.
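As a toy illustration of this "ask anything" style of query, here is a stdlib-only Python sketch that answers "which customers are sending the most traffic?" over enriched flow records. The data and field names are hypothetical; the point is that once records carry context, iterative slicing and grouping becomes trivial:

```python
from collections import defaultdict

# Hypothetical enriched flow records, already tagged with customer context.
flows = [
    {"customer": "customer-42", "bytes": 5400},
    {"customer": "customer-7",  "bytes": 1200},
    {"customer": "customer-42", "bytes": 2600},
]

def top_talkers(flows, key="customer"):
    """Total bytes per group, sorted descending -- a classic drill-down."""
    totals = defaultdict(int)
    for f in flows:
        totals[f[key]] += f["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(top_talkers(flows))  # [('customer-42', 8000), ('customer-7', 1200)]
```

Swapping the `key` argument (to region, application, interface, and so on) is the programmatic analogue of the filter-and-drill-in workflow described above.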
Getting immediate responses to your queries is key when business-critical network operations are at stake. Troubleshooting often requires sharpening your filters through repeated attempts, and slow responses make this iterative workflow tedious and, ultimately, inadequate for the task.
This simple yet powerful ability to engage in open-ended network exploration from a single pane of glass saves network operators time, reduces subscription costs, and helps contain overhead as networks scale.
Automation and optimization
Thankfully, one of the real gems of an observable network is that it creates an operations environment that is not beholden to constantly reacting to network performance issues and security threats. The uniquely contextual data of network observability allows network operators to craft predictable, highly specific automation strategies for security threats, performance deviations, and even network-specific demands like peering policies, regardless of scale and complexity.
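One simple way such automation can be triggered is by flagging readings that deviate from a recent baseline. The sketch below uses a basic z-score check over a hypothetical traffic series; real anomaly detection is considerably more sophisticated, so treat this only as an illustration of how telemetry enables automated triggers:

```python
import statistics

def deviates(history, current, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from baseline.

    A deliberately simple z-score check: production systems layer in
    seasonality, trend, and multi-signal context before acting.
    """
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return current != mean
    return abs(current - mean) / stdev > threshold

baseline = [100, 102, 98, 101, 99, 100, 103, 97]  # Mbps, hypothetical
print(deviates(baseline, 100))  # False -- within normal range
print(deviates(baseline, 450))  # True  -- worth an automated response
```

A trigger like this, scoped with the contextual attributes discussed earlier (per customer, per interface, per region), is what lets a response be targeted rather than a blunt network-wide action.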
With unparalleled visibility into traffic, network observability lets you see your blind spots and optimize against specific resources, applications, locations, and customers, ensuring your networks stay dynamic and deliver the highest quality experience at the best cost.