
AI Advisor: Automating Network Troubleshooting with AI Runbooks

Phil Gervasi, Director of Tech Evangelism

Summary

Kentik AI Runbooks are machine-readable instructions that codify tribal knowledge into specific diagnostic workflows. By guiding AI Advisor’s reasoning and tool selection, Runbooks turn alerts into actionable, automated investigations, dramatically accelerating MTTR.


Modern network operations teams face an unprecedented combination of complexity, scale, and operational pressure. Hybrid networks, multi-vendor architectures, multi-cloud sprawl, and massive east-west traffic volumes have made troubleshooting both slow and inconsistent. Traditionally, engineers rely on institutional knowledge to interpret alerts, correlate telemetry in their heads, and decide on next steps. That approach doesn’t scale in a world where latency is the new outage and performance issues can cost millions.

AI Advisor was built to solve these challenges by bringing AI reasoning, automation, and natural language interaction into network operations. One of its most impactful features is the Natural Language Runbook, which builds that coveted institutional knowledge into the system itself.

Runbooks are not generic documentation or static playbooks. In the Kentik platform, they’re living, machine-readable instructions that guide AI Advisor through highly specific diagnostic workflows. They align the agent’s reasoning with the engineer’s expertise, ensuring that triage and troubleshooting steps are not only fast but also correct, repeatable, and grounded in the domain knowledge of the specific network environment.

What is a Runbook?

A Runbook is a predefined, Markdown-formatted set of instructions that teaches AI Advisor how to investigate a particular alert or incident. Runbooks live in the Kentik AI settings, in a dedicated Runbooks pane, which means they can be easily accessed and adjusted to ensure AI Advisor carries out precisely what the engineer intended.

A Runbook contains:

  • Context about the alert or condition
  • Prescriptive diagnostic steps
  • Hints about what data sources to query
  • Logic for how to interpret results
  • Any custom workflows or caveats relevant to that organization
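To make the list above concrete, here is a minimal Python sketch that renders those five kinds of content as Markdown text. The `RunbookDraft` class, its field names, the section headings, and the sample wording are all illustrative assumptions, not a Kentik schema or a required Runbook layout.

```python
from dataclasses import dataclass, field


@dataclass
class RunbookDraft:
    """Hypothetical container for the five kinds of Runbook content."""
    title: str
    context: str
    steps: list[str]
    data_hints: list[str] = field(default_factory=list)
    interpretation: list[str] = field(default_factory=list)
    caveats: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the draft as Markdown text."""
        lines = [f"# {self.title}", "", "## Context", self.context,
                 "", "## Diagnostic steps"]
        # Steps are numbered because their order matters.
        lines += [f"{i}. {step}" for i, step in enumerate(self.steps, start=1)]
        # Optional sections are only emitted when they have content.
        for heading, items in (
            ("Data sources", self.data_hints),
            ("Interpreting results", self.interpretation),
            ("Caveats", self.caveats),
        ):
            if items:
                lines += ["", f"## {heading}"]
                lines += [f"- {item}" for item in items]
        return "\n".join(lines)


# Illustrative content only; a real Runbook reflects your own environment.
draft = RunbookDraft(
    title="High interface utilization",
    context="An interface is carrying unusually high traffic.",
    steps=[
        "Identify the device and interface named in the alert.",
        "Find the top traffic sources and destinations on that interface.",
        "Compare current utilization with the previous week.",
    ],
    data_hints=["Flow records", "Device interface metrics"],
    interpretation=["A single dominant source may indicate a bulk transfer."],
    caveats=["Example caveat: backup traffic is expected overnight."],
)
markdown = draft.to_markdown()
print(markdown)
```

Keeping the steps as an ordered list and the hints as separate sections mirrors the split between prescriptive steps and interpretive guidance described above.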

In the overall AI Advisor solution, Runbooks are part of the knowledge and toolbase, alongside Custom Network Context and the Kentik Knowledge Base. This means Runbooks become part of the system prompt the agent uses to reason through an investigation, and not simply a resource it reads after the fact.
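One way to picture a Runbook becoming part of the system prompt, rather than a document consulted afterward, is the sketch below. It is a simplified assumption about prompt assembly in general; the `build_system_prompt` function, the section headings, and the sample strings are hypothetical and do not describe how AI Advisor is actually implemented.

```python
def build_system_prompt(base_instructions: str,
                        network_context: str,
                        runbook_markdown: str) -> str:
    """Combine the agent's base instructions with organization-specific
    context and the active Runbook (hypothetical layout)."""
    sections = [
        base_instructions.strip(),
        "## Custom Network Context\n" + network_context.strip(),
        "## Active Runbook\n" + runbook_markdown.strip(),
    ]
    # Blank lines between sections keep the Markdown readable.
    return "\n\n".join(sections)


prompt = build_system_prompt(
    base_instructions="You are a network troubleshooting assistant.",
    network_context="Core routers follow the naming pattern core-<site>-<nn>.",
    runbook_markdown=(
        "# High interface utilization\n"
        "1. Check the top talkers on the alerting interface.\n"
    ),
)
print(prompt)
```

Because the Runbook text is present before the agent starts planning, it can shape every later step instead of being looked up after conclusions have already formed.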

The image below shows a Runbook used to investigate and explain incidents in which an interface experiences unusually high traffic utilization.

[Image: Natural Language Runbook Overview]

Eliminating guesswork and capturing institutional knowledge

Runbooks solve several common operational problems:

1. They enforce consistent troubleshooting.

Without Runbooks, two engineers investigating the same alert often follow different steps or interpret the same data differently. Runbooks ensure that every diagnostic workflow begins with a well-defined, validated sequence.

Therefore, Runbooks minimize human and AI errors by ensuring a pre-defined, systematic troubleshooting approach.

2. They capture institutional knowledge.

Every network team has its own institutional knowledge, its own network idiosyncrasies, and unwritten rules for troubleshooting specific, especially recurring, incidents. Institutional knowledge means understanding:

  • Which metrics matter most
  • Known quirks of specific hardware
  • How to validate (or refute) certain failure modes
  • Local naming conventions and operational procedures

Runbooks turn this institutional knowledge into codified, reusable intelligence.

3. They help L1 and newer engineers perform at an expert level.

AI Advisor already reasons across telemetry and tools, but Runbooks give it domain-specific expertise. When an alert triggers, even a junior engineer can click “Investigate with AI Advisor”, and the agent will follow the same diagnostic procedure a seasoned senior engineer would.

4. They accelerate mean time to resolution (MTTR).

By automatically taking the right steps in the right order, Runbooks help AI Advisor perform triage in seconds instead of minutes or hours. They also remove the context-switching between dashboards that seasoned engineers know as “stare-and-compare”.

How Runbooks work inside AI Advisor

Once Kentik AI is enabled, Runbooks can be created and assigned to alert policies directly within the Kentik portal. When an alert fires and a user launches AI Advisor from the alert page, the Runbook initializes and guides the troubleshooting path.

First, AI Advisor receives alert context.

AI Advisor automatically begins a predetermined, validated investigation with relevant data preloaded. In the image below, notice that you can select the alerts you’d like to use to initiate a Runbook. The alert shown here is tied to the Runbook created to investigate unusually high interface traffic utilization, though a more complex Runbook could be tied to multiple alerts.

[Image: Natural Language Runbook Alert]

Second, when the alert policy tied to a Runbook is triggered, AI Advisor loads the assigned Runbook.
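Conceptually, the assignment of Runbooks to alert policies behaves like a lookup table. The sketch below illustrates that idea under stated assumptions: the policy IDs, the alert fields, and the `runbook_for_alert` helper are invented for illustration and are not part of any Kentik API.

```python
from typing import Optional

# Hypothetical assignment table: alert policy ID -> Runbook name.
# One Runbook can be tied to more than one policy.
RUNBOOK_ASSIGNMENTS = {
    "policy-high-utilization": "High interface utilization",
    "policy-link-saturation": "High interface utilization",
}


def runbook_for_alert(alert: dict) -> Optional[str]:
    """Return the Runbook assigned to the alert's policy, or None
    when no Runbook has been assigned to that policy."""
    return RUNBOOK_ASSIGNMENTS.get(alert.get("policy_id"))


# Placeholder alert payload; field names are assumptions.
alert = {
    "policy_id": "policy-high-utilization",
    "device": "edge-router-1",
    "interface": "xe-0/0/1",
}
assigned = runbook_for_alert(alert)
unassigned = runbook_for_alert({"policy_id": "policy-unknown"})
print(assigned, unassigned)
```

Returning `None` for an unassigned policy leaves room for a fallback, such as an investigation that proceeds without Runbook guidance.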

This incorporates the instructions directly into AI Advisor’s reasoning process, meaning the Runbook influences:

  • Planning
  • Tool selection
  • Data retrieval
  • Interpretation logic
  • Recommended next steps

Third, AI Advisor executes multi-step troubleshooting.

The agent calls Kentik tools that can search flow records and device metrics, look up information in the Kentik Knowledge Base, use synthetic tests, search syslog, and more. Following the Runbook’s guidance, AI Advisor gathers the data needed to troubleshoot the incident.
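The multi-step pattern described here, where a plan drives a sequence of tool calls and each result is collected as evidence, can be sketched in a few lines of Python. Everything in this example is a stand-in: the tool names, the stub functions, and the values they return are placeholders chosen for illustration, and a real agent would choose its next tool dynamically rather than follow a fixed list.

```python
from typing import Callable


# Stub tools standing in for flow search, device metrics, and syslog search.
# The returned values are placeholders, not real telemetry.
def flow_search(alert: dict) -> dict:
    return {"top_source": "192.0.2.10", "interface": alert["interface"]}


def device_metrics(alert: dict) -> dict:
    return {"device": alert["device"], "utilization_pct": 95}


def syslog_search(alert: dict) -> dict:
    return {"device": alert["device"], "matches": []}


TOOLS: dict[str, Callable[[dict], dict]] = {
    "flow_search": flow_search,
    "device_metrics": device_metrics,
    "syslog_search": syslog_search,
}


def run_investigation(alert: dict, plan: list[str]) -> dict:
    """Call each tool named in the plan, in order, and collect the results."""
    evidence = {}
    for tool_name in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Record the gap instead of aborting the whole investigation.
            evidence[tool_name] = {"error": "unknown tool"}
            continue
        evidence[tool_name] = tool(alert)
    return evidence


alert = {"device": "edge-router-1", "interface": "xe-0/0/1"}
plan = ["device_metrics", "flow_search", "syslog_search"]
evidence = run_investigation(alert, plan)
print(evidence)
```

In this framing, the Runbook’s role is to shape the plan: which tools to call, in what order, and what to look for in each result.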

Lastly, AI Advisor explains findings and recommends actions.

Runbooks ensure the reasoning stays aligned with domain-specific expectations. The result is that the engineer receives a summary of findings, probable causes, remediation steps, and links to the data used in the investigation.

AI-augmented incident response

Runbooks turn AI Advisor into a true network intelligence partner, not just a chatbot. With AI Advisor, alerts become immediately actionable, troubleshooting becomes guided and consistent, MTTR drops dramatically, and institutional knowledge becomes systematized and scalable.

Using AI tools in network operations requires high levels of accuracy and precision, as well as the incorporation of engineering domain knowledge into the system itself. Runbooks, combined with Kentik’s rich telemetry and AI Advisor’s agentic reasoning, provide the blueprint for a future where network operations are faster, smarter, and far more resilient.

Learn more about Kentik AI.
