Agentic NetOps: How to beat the cloud monsters at their own game


Summary
The secret to hyperscaler success isn’t magic. Kentik Co-founder and CEO Avi Freedman explains how organizations can adopt the same operating principles and empower network teams to drive results that far exceed their headcount.
The promised land
Every networker dreams of the same place: a network that largely runs itself, detecting problems early, adapting automatically, and letting humans focus on design and intent instead of firefighting.
In a recent article, AWS described how its own network is already operating much closer to that vision. Google, Microsoft, and Oracle tell similar stories. Their networks don’t just scale to absurd size — they do so with a level of operational efficiency that most enterprises and service providers can only envy.
And then you look at your network.
It doesn’t look anything like that.
So how is this possible? And more importantly, does it mean the promised land is out of reach for everyone else?
The uncomfortable truth: Their network is simpler than yours
It’s tempting to believe hyperscalers have unlocked some magical, futuristic form of networking. In reality, the biggest advantage they have isn’t AI: it’s simplicity.
Cloud providers got where they are by making a series of deliberate, sometimes brutal choices:
- Their own networking devices, both physical and logical
- Highly regular, scalable designs repeated thousands of times
- Far fewer protocols, features, and device types
- Less diversity across almost every dimension
- Aggressive overprovisioning to avoid operating at the edge
- Deeply integrated, in-house monitoring and control planes
This is not how most enterprise or service-provider networks evolved. Yours likely grew organically, absorbed mergers, layered new technologies on old ones, and optimized capital spend far more than operational simplicity.
From that perspective, the cloud networks aren’t superhuman.
They’re just less complicated.
So, are you locked out of the promised land?
Not entirely, but expectations matter.
If “self-running network” means fully autonomous, hyperscaler-style operations, then no: most organizations aren’t getting there in 2026.
But if it means dramatically higher human operational efficiency, fewer outages, faster root cause analysis, and networks that actively help their operators — that is achievable today.
And this is where AI actually matters.

Step one: Modernization (where it counts)
You don’t need to forklift everything. But on newer infrastructure, insist on components that behave like modern systems:
- APIs for configuration and state
- Streaming telemetry instead of periodic polling
- Programmatic access to counters, events, and topology
AI and automation don’t fail because models are weak. They fail because the underlying systems can’t be observed or controlled cleanly.
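To make the polling-versus-streaming point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `utilization` simulates a hypothetical interface with a 10-second burst, `polled_averages` stands in for counter-based polling, and `on_change_stream` mimics an on-change telemetry subscription. It talks to no real device or vendor API.

```python
from typing import Iterator


def utilization(t: int) -> float:
    """Hypothetical per-second link utilization (percent): a 10 s burst at t=95."""
    return 98.0 if 95 <= t < 105 else 20.0


def polled_averages(interval_s: int, duration_s: int) -> list[tuple[int, float]]:
    """Counter-style polling: one average per polling interval."""
    results = []
    for start in range(0, duration_s, interval_s):
        end = min(start + interval_s, duration_s)
        window = [utilization(t) for t in range(start, end)]
        results.append((end, sum(window) / len(window)))
    return results


def on_change_stream(duration_s: int) -> Iterator[tuple[int, float]]:
    """Streaming-style push: emit an update only when the value changes."""
    previous = None
    for t in range(duration_s):
        value = utilization(t)
        if value != previous:
            yield t, value
            previous = value


if __name__ == "__main__":
    print("polled:  ", polled_averages(interval_s=60, duration_s=300))
    print("streamed:", list(on_change_stream(duration_s=300)))
```

In this simulation, 60-second polling dilutes the burst into a 33% average, while the on-change stream reports the jump to 98% at the second it happens. Real counters and subscriptions are messier, but the visibility gap is the same in kind.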
Step two: Ruthless simplification
Every extra protocol, device type, and architectural exception increases the cognitive load on humans and machines.
Cloud providers win here by default. Everyone else has to be intentional:
- Use fewer protocols
- Standardize hardware and software where possible
- Avoid “special cases” unless they truly pay for themselves
AI doesn’t eliminate complexity — it amplifies your ability to manage what remains. The less there is, the more effective it becomes.
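One way to start being intentional is simply to measure how much diversity you have. The sketch below is illustrative only: the `inventory` records, vendor names, and field names are made up, and in practice the data would come from whatever source of truth you already maintain.

```python
from collections import Counter

# Hypothetical inventory; names, vendors, and versions are invented for the example.
inventory = [
    {"name": "edge-1", "vendor": "VendorA", "model": "R100", "os": "7.2", "protocols": {"bgp", "ospf"}},
    {"name": "edge-2", "vendor": "VendorA", "model": "R100", "os": "7.2", "protocols": {"bgp", "ospf"}},
    {"name": "core-1", "vendor": "VendorA", "model": "R100", "os": "7.2", "protocols": {"bgp", "isis"}},
    {"name": "lab-1", "vendor": "VendorB", "model": "X9", "os": "3.1", "protocols": {"bgp", "rip"}},
]


def platform_counts(devices):
    """Count devices per (vendor, model, OS version) combination."""
    return Counter((d["vendor"], d["model"], d["os"]) for d in devices)


def special_cases(devices):
    """Platforms that appear exactly once: candidates for standardization."""
    return sorted(p for p, n in platform_counts(devices).items() if n == 1)


def protocols_in_use(devices):
    """Every protocol enabled anywhere in the inventory."""
    return sorted(set().union(*(d["protocols"] for d in devices)))


if __name__ == "__main__":
    print("platforms:    ", dict(platform_counts(inventory)))
    print("special cases:", special_cases(inventory))
    print("protocols:    ", protocols_in_use(inventory))
```

A report like this won't tell you which exceptions pay for themselves, but it does give you the list to argue about.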
Step three: Ubiquitous telemetry
This is non-negotiable.
If your physical routers, virtual routers, wireless systems, and edge services can’t export rich telemetry, neither your engineers nor their AI helpers can understand what’s happening.
Wireless controllers and SASE platforms can genuinely simplify operations, but only if their telemetry isn’t trapped in a vendor “walled garden.” If your AI has to scrape dashboards or reverse-engineer APIs during an outage, you’ve already lost time and clarity.
It’s possible to work around that, but it adds complexity when the whole goal is to reduce it.
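When telemetry is exportable, the remaining work is mostly normalization. The following sketch assumes two invented payload shapes, one from a router and one from a wireless controller, and maps both into a single `Sample` record. The field names (`rate_octets`, `kbps`, and so on) are hypothetical and not taken from any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One normalized throughput measurement, whatever the source."""
    source: str
    device: str
    interface: str
    bits_per_second: float


def from_router(payload: dict) -> Sample:
    # Hypothetical router export: octets per second under "rate_octets".
    return Sample("router", payload["hostname"], payload["ifname"],
                  float(payload["rate_octets"]) * 8)


def from_wireless(payload: dict) -> Sample:
    # Hypothetical controller export: kilobits per second under "kbps".
    return Sample("wireless", payload["ap"], payload["radio"],
                  float(payload["kbps"]) * 1000)


NORMALIZERS = {"router": from_router, "wireless": from_wireless}


def normalize(kind: str, payload: dict) -> Sample:
    """Dispatch to the right parser; fail loudly on an unknown source."""
    try:
        parser = NORMALIZERS[kind]
    except KeyError:
        raise ValueError(f"no normalizer for telemetry source {kind!r}") from None
    return parser(payload)


if __name__ == "__main__":
    print(normalize("router", {"hostname": "edge-1", "ifname": "eth0", "rate_octets": 125000}))
    print(normalize("wireless", {"ap": "ap-12", "radio": "5ghz", "kbps": 500}))
```

Each new source costs one small parser. A walled garden, by contrast, costs a scraper you have to maintain and cannot fully trust during an outage.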
Step four: Agentic NetOps
This is where the cloud advantage can finally be shared.
Instead of trying to replace humans, the winning pattern is to multiply them:
- Agents that are always watching
- Agents that are constantly testing assumptions
- Agents that correlate signals across domains
- Agents that bring recommendations, not just alerts
All under your team’s direction.
This is how hyperscalers achieve the illusion of superhuman operations: not fewer people, but far more machine assistance per person. With modern telemetry, simplified architectures, and AI-driven agents, smaller teams can operate networks with a level of situational awareness that once required entire NOCs.
We’ve been here before, of course.
AIOps promised something similar, but largely failed to deliver. Not because the goal was wrong, but because the approach was. Too often, the focus was on throwing away detail instead of accelerating troubleshooting. Data wasn’t enriched with enough context. “AI” meant static logic and pattern matching, with little real reasoning and frequent human intervention just to keep it useful.
What’s different now is that we’re no longer asking machines to summarize the world — we’re asking them to understand it, continuously, in support of human operators.
From agentic NetOps to autonomous assistance
This is where agentic NetOps becomes real — not as a research project, but as something operators can rely on day to day.
With the right data foundation, a system like Kentik’s AI Advisor acts as a persistent, autonomous teammate. It is always watching the network, continuously evaluating telemetry, traffic patterns, routing behavior, and performance signals across domains.
When something changes, AI Advisor doesn’t just fire an alert. It triages:
- Is this real or expected?
- Is it localized or systemic?
- Is it likely a capacity issue, a routing change, a device failure, or an external dependency?
Then it recommends action, grounded in what the network is actually doing:
- Where to look first
- What changed recently
- What other signals support the hypothesis
- What the blast radius appears to be
In other words, it does the work that usually burns the most human time: narrowing the problem space and turning raw telemetry into operational understanding.
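The shape of that triage can be sketched in a few lines of Python. To be clear, this is a toy, rule-based illustration of the questions above, not how AI Advisor or any real product is implemented. The `Anomaly` and `Triage` types, the `triage` function, and the 25% "systemic" threshold are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Anomaly:
    metric: str
    affected_devices: set[str]
    total_devices: int
    in_maintenance_window: bool = False
    recent_changes: list[str] = field(default_factory=list)


@dataclass
class Triage:
    expected: bool      # real or expected?
    scope: str          # "localized" or "systemic"
    look_first: str     # where to look first
    blast_radius: int   # how many devices appear affected


def triage(a: Anomaly, systemic_fraction: float = 0.25) -> Triage:
    """Answer the basic triage questions with simple, hypothetical rules."""
    share = len(a.affected_devices) / a.total_devices if a.total_devices else 0.0
    scope = "systemic" if share >= systemic_fraction else "localized"
    if a.recent_changes:
        # Most recent change is the first hypothesis to check.
        look_first = f"recent change: {a.recent_changes[-1]}"
    elif scope == "systemic":
        look_first = "shared dependencies (routing, upstream providers)"
    else:
        look_first = "devices: " + ", ".join(sorted(a.affected_devices))
    return Triage(a.in_maintenance_window, scope, look_first, len(a.affected_devices))


if __name__ == "__main__":
    event = Anomaly("packet_loss", {"edge-1", "edge-2"}, total_devices=40,
                    recent_changes=["BGP policy update on edge-1"])
    print(triage(event))
```

A real agent reasons over far richer context than three `if` branches, but the output contract is the useful part: an engineer receives a scoped hypothesis and a starting point instead of a bare alert.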
This doesn’t eliminate human judgment, but it radically increases its leverage. Engineers are pulled in with context, not noise. Decisions are made faster and with more confidence, and there are fewer late-night war rooms.
Cloud efficiency, without being a cloud
You may never run a network as homogeneous or overprovisioned as AWS’s. But you can operate one with similar human efficiency.
By modernizing where it matters, simplifying intentionally, exporting rich telemetry everywhere, and pairing your team with an always-on AI advisor, you can move much closer to the promised land than you might expect.
The hyperscalers didn’t win by removing humans from the loop. They won by surrounding them with systems that never stop watching.
That advantage is no longer reserved for the cloud monsters.
The good news
Organizations using modern telemetry and agentic NetOps approaches are already seeing early wins:
- Faster detection of real problems
- Shorter mean time to understanding
- More confidence in making changes
- Less time spent chasing ghosts
In practice, many teams discover they can go further — and faster — than they expected.
You may not have AWS’s budget, control, or homogeneity, but you don’t need them to capture much of the benefit.
The promised land may not look exactly like the hyperscalers’.
But with the right foundations, AI can help every network operate with cloud-like human efficiency.
And that’s a future worth being a little jealous of.


