
From Reactive to Preventative: How AI Is Reshaping Network Operations

Kat Sullivan: Hello and welcome to this SDxCentral interview. My name is Kat Sullivan. I'm head of channels for compute at SDxCentral and Data Center Dynamics. We're going to be talking about reactive, proactive and preventative operations, and actually much more around that. To tell us about what we're talking about today, I've got Alex Cruz Farmer, VP of product from Kentik, joining me. Hello, how are you today?
Alex Cruz Farmer: Hi, Kat. I'm very good, thank you. Excited to be here today.
Kat Sullivan: Excited to have you join us. I did tease out what we're going to be talking about just now, but before we get into that, can you tell us a bit more about yourself, your role as VP of product, and what you're currently focusing on at Kentik?
Alex Cruz Farmer: Yeah. So, I'm VP of product at Kentik. I help Kentik understand where the world is going and help guide where we're trying to take our product. I help our customers and the field truly understand how we're going to tackle this new world of AI ahead of us, and how networks are changing, breathing and evolving through this really interesting time.
Kat Sullivan: It is an incredibly interesting time. It's almost impossible to have a conversation these days without talking about AI. With that massive push towards AI and automation taking over the industry, where do you think companies are currently struggling when looking to adopt AI in their operations?
Alex Cruz Farmer: I think what's really challenging for many organizations is that AI is almost being forced upon people. If you're not using it, then why aren't you using it? That's the question, which I think is funny, because maybe a year ago, if you were using AI, it felt like you were cheating at your job. You were trying to do a better job, and now the question is: how can I use AI to do it better?
And I think now that we've seen more adoption in organizations, this is where we're seeing some of the cracks in the way it works and operates. Where we've traditionally seen automation adopted in organizations, there's been a lot of manual work. A lot of processes have been done manually, and then teams have looked at how they can automate some of those steps to make things better and faster, and essentially increase the quality. What we're seeing with AI is that we're sometimes skipping some of these steps. Instead of going from "how do we do this manually?" to "how do we automate it?", we're going straight to "how can we use AI to solve this problem?" We're not necessarily thinking about all the different capabilities we have within our organization to help fuel the AI to do a better job.
Kat Sullivan: Let's maybe start with one of the big challenges that you and I have spoken about before, which is siloed data. How do you bring the different data sources together into a single coherent view that teams can actually act upon?
Alex Cruz Farmer: Yeah, that's a great question. What we've found when talking to customers about AI and different capabilities, from a network operations standpoint, is that when an organization builds a network and is small and growing, most of that information is in people's heads. Right? It's tribal knowledge. It isn't easily pulled out. People leave the organization. Random switches and routers get forgotten about, and there's that gap in documentation. This is really where a lot of organizations struggle. So what we've found working with our customers is that understanding how the operation works from a runbook perspective is really important.
But when AI adoption comes along, organizations first have to ask: what can we do in a reactive way to help improve operational efficiency? How can we use it to reduce cost? How do we use it to increase CSAT, whatever it may be? Then they move through the stages, from reactive and runbook-based, to proactive, and then to preventative. This is where that information store becomes really, really important. One of the things we've done really well at Kentik is pull together flow data alongside data from your network through NMS, from the cloud, and even from synthetics, to give you all of these different data points across your vast network and operation. This is where we can start pulling in insights to help you get to root cause analysis based on the data we've found. Now, obviously, we can't do that alone. This is where things like the MCP server we now offer as part of Kentik, or our REST API, can plug into your own tooling, so this data can sit alongside other datasets you have. For example, a great flow could be this: a ticket has just come into ServiceNow or your ticketing system, and a customer is complaining about a problem. You pull in data from your CMDB or wherever it may be, meaning any additional metadata you may have in an external system. Then you run that query against Kentik and pull the results back into your platform. That can help answer the ticket in one go, or at least help inform the person receiving it. Maybe that's the frontline, maybe it's a network operations team, but either way it really speeds up root cause analysis, because that's what really costs money.
When there's a big outage, what we often see is that a war room gets set up, and whether it's five, ten or fifteen minutes, it takes a whole bunch of time to get a whole load of people together. So what we want to ask is: how can we shortcut that? How can we go from everybody crowding around in a boardroom to one engineer taking a laptop into the boardroom, just because they need somewhere quiet to concentrate, with AI helping them as a copilot? Not a replacement, but a copilot to help them solve those problems. That's really where the introduction of Kentik AI Advisor has changed the game for a lot of our customers, because we can start doing that root cause analysis for them. We can take away things that would have been done manually and start providing that insight in a more reactive and proactive way.
Kat Sullivan: How are you seeing customers respond to this? We have seen these features being introduced more and more frequently, in a variety of different ways, across this industry. When we look at operational efficiency, particularly within network and operations teams, you've touched upon it a little, but maybe talk us through what the customer experience looks like.
Alex Cruz Farmer: Yeah. Let's take NOCs, because that's a really big focus for us at Kentik: how can we help the NOC become more efficient in what it does day to day? You could call it the agentic NOC. What we've seen with our customers is that when a problem comes up, generally speaking, the NOC is the one who gets the blame. It's never the application team; it's always the network team's fault. So the network team actually has to spend more time proving it's not them than proving out what the actual problem is.
So one of the great things about Kentik is that we've got so many different vantage points and views into the application. Synthetics give you a view from outside your network as well as inside it, so you have visibility of that entire digital supply chain. Flow tells you where your traffic is going across your network and how it's transiting. And NMS helps you understand the health of all your network devices. So when someone reports a problem, we have a description of that problem. We can then use Kentik AI Advisor, fill it with all the context of that ticket, and add more context from runbooks that can be preconfigured within Kentik. The NOC engineer just has to describe a little of what they're trying to do, and we'll go ahead and do that root cause analysis. Now, that may take thirty seconds, a minute or a couple of minutes, but while it's happening, the network engineer can be doing other things, like writing up a ticket. So it's a really great use of time: we build out this view, essentially a report of what's going on, and give the answer of "yes, the problem's us" or "no, it's not us." That's truly where we've been revolutionizing these point problems that come in and go out. But when there are broader network disruptions, that's where mean time to identification and mean time to remediation come into play. That's where the power of Kentik's AI capabilities, alongside all the data we have, starts painting a really phenomenal picture.
Kat Sullivan: I teased out at the start the progression from reactive to proactive, and eventually to preventative, operations. Let's talk a little about this, because one of the key sentences we hear a lot is that AI is only as good as the data you feed it. So how can companies think about this journey through reactive, proactive and preventative operations? What does that look like?
Alex Cruz Farmer: Yeah, it's a great question. The way I look at the reactive side of things, it's reactive purely because of trust. Right? AI is new. You don't know how much you can trust it. Every conversation you have is almost ephemeral. Once you get to a context window of maybe two hundred thousand tokens or a million tokens, that's it: it compacts the conversation, you lose context, and you almost feel like you're starting again. If you've ever done any vibe coding, that's the most annoying thing. You think, "I'm pretty sure I've had this conversation already." So when you're using AI, it's really important to know and understand the constraints that come with the way AI and LLMs operate. From that more reactive standpoint, it's: "I've got this problem, and this is how I would usually solve it. How can AI help me get to the answer in minutes rather than tens of minutes or even hours?" That's the important part of the reactive space. Even with Kentik AI Advisor, you can ask it for a configuration recommendation and it will give you one. But you need a human in the loop to know whether that's the right thing to do. So, again, AI as a copilot in that reactive space is really important.
Once you've built that trust, you can start expanding into the proactive space. That's where you take the insights, the alerts and the things your network is doing. Maybe a third party has just posted an outage notification, and an AI has spotted it, or it's being fed into a system somewhere. Maybe it's a webhook that comes into our platform, or a syslog message that we pick up. Maybe it's some small thing that a human would usually miss. This is where proactive comes in, because we can look at that and ask: how does it correlate with other things happening on the network? We know that this specific interface with a couple of errors actually intersects with all of these other things, and there are dependencies linked here. So how can we be more proactive about reaching out to the team and saying, "Hey, we spotted this thing"? We're being clever about it, trying to spot a memory leak, a cable issue, or a device or an optic that's going to fail. That's the proactive piece. Where I think we will see this done really well is when you use the AI Advisor in a situation where a user phones up and says, "I'm having a problem with my credit card processing," for example. The AI can then come into play. With my love of vibe coding, I created a really cool call-center-style tool where you can have a conversation with Kentik AI Advisor as if you're an end user. It pulls all the data, finds a root cause, and then gives a recommendation from the back end, using our APIs, MCP and so on.
That really takes away the pain of having to be reactive when someone phones up, and it changes the way your organization operates. Then, when you come to the preventative stage, it's more about questions like these. What does capacity planning look like? Do I have enough bandwidth? Is this site operating well? Am I seeing increased usage of a particular application? Is there something I need to do on the network to make sure things keep getting better? Have we seen an increase in latency? Are there trends we're spotting? That's where you can start making changes and be preventative, avoiding outages, issues or disruptions before they happen. Disruption is almost the new down now; degradation is the new down. A choppy experience is pretty terrible, and most users consider it down. So if you can get ahead of some of these things, that's where prevention happens. But none of this can happen unless you've got great documentation of how your network is mapped out and you're collecting the right data in the right areas. We're also using AI now to address the fact that engineers only know what they know: how can we help them find the things they don't know? So with proactive and preventative, it might be: "We noticed you're running traffic out to ServiceNow or, let's say, Zoom, but you don't have any tests to Zoom, and you're not really alerting on anything when Zoom goes wrong." Again, we're trying to help them cover some of these gaps in their world.
Kat Sullivan: I do wish we had a lot more time to talk about each of those in more detail. But I want to go back to something: you mentioned agentic earlier in this conversation. Are you seeing the rise of agentic AI change any expectations from customers?
You've just talked us through a lot of the customer journey points, and it seems as though the whole industry is facing this. And you're right: latency, and the reactive portion of your point, are very much related to time. The preventative portion is related to time too. Something we're hearing more and more is that expectations are beginning to change, just because of agentic AI. Are you seeing that with your customers?
Alex Cruz Farmer: Yeah, absolutely. Even for me, day to day, I find myself clicking around in the Kentik dashboard less and less. I'm usually asking the AI for answers when I want to understand something. For example, when I joined Kentik, I was trying to configure and instrument my own network, and I hate reading documentation. So I thought, "I wonder if our AI Advisor can give me the answer." I typed it in and, boom, there it was. We're shifting to a world where, instead of clicking around the UI to make a widget or a dashboard, we're asking questions. Those questions can be reasoned about and understood, and the right, very specific answer for my specific need can be returned. Traditional static dashboards and views, architected for a specific use case, often won't give you that right answer. That's really where we're seeing the agentic piece come in and give engineers the edge. When I've seen our AI capabilities used really well, it's been really exciting to watch a NOC engineer go from tapping away on a keyboard to get results, to asking our AI Advisor questions and getting the same result in a fraction of the time. It's been absolutely phenomenal.
Because at the end of the day, time is the only thing we can't get back, and time is money. So using AI as a copilot to get those answers quickly only improves operational efficiency, keeps your customers happy, and keeps your boss happy. All of these things really play in. It can make AI sound like a silver bullet, and it can be, if you architect it well, plan well, and have the right data and the right documentation. So it's important to make sure these foundations are in place.
Kat Sullivan: Just before we go: looking ahead, how do you see AI continuing to transform network observability and operations over the next few years? What's the next step?
Alex Cruz Farmer: Yeah, that's a great question. We've been talking about things like AIOps for a really long time, right? And there's always been this gap in trust. I think a lot of that gap has come from a lack of truly understanding and reasoning about the entire network in front of us. That means doing risk assessments on the fly and understanding cause and effect: what happens when something occurs, or when we make a change. I don't think I would ever say the NOC is dead and never coming back. The NOC is always going to be there, but I think we're going to see a transformation where network engineers, and NOCs as a whole, focus on very different things and different problems than they do today. We really saw that when cloud came about, because network operations centers and engineers in general had been dealing with speeds and feeds and all of that kind of thing.
But with the move to SaaS, cloud, SASE and all of these things, IT teams as well as NOCs have had to focus a lot of their effort on things like vendor management, because they don't own these things anymore. So that's what AI can really help with: giving you that feeling of control back across a lot of these different areas, restoring visibility that was potentially lost, and giving you the answers you need quickly and efficiently by pulling in data from third parties, from the network, from wherever it may be.
Kat Sullivan: Amazing. Alex, thank you so much for taking the time to speak with me, and thank you to Kentik for partnering with us on this SDxCentral interview. To all our viewers, thank you so much for watching. If you missed anything, you can go back and watch this interview again, and you can also find more interviews on our website. But for now, that's all we have time for. So thank you so much, and goodbye.

In this SDxCentral interview, Alex Cruz Farmer, VP of Product at Kentik, joins host Kat Sullivan to explore how AI is reshaping network operations across three stages: reactive troubleshooting, proactive issue detection, and preventative operations. The conversation covers why companies struggle to adopt AI in NetOps, how siloed network data blocks effective AI investigation, how Kentik AI Advisor and the Kentik MCP server enable agentic workflows for the NOC, what the “agentic NOC” really means in practice, and why “degradation is the new down.”

Along the way, Alex explains how Kentik unifies flow data, NMS metrics, cloud telemetry, and synthetic testing into a single platform that AI can reason over — and how teams use Kentik AI Advisor as a copilot to accelerate root cause analysis, eliminate war rooms, and prove (or disprove) that “it’s not the network.”

What you’ll learn

The difference between reactive, proactive, and preventative network operations
Where most organizations struggle when adopting AI in NetOps
How siloed network data (flow, NMS, cloud, synthetics) gets unified for AI-powered investigation
How Kentik AI Advisor accelerates root cause analysis and reduces war-room time
What the “agentic NOC” is and how AI changes the NOC engineer’s day
How the Kentik MCP server and REST API integrate Kentik with ServiceNow, CMDBs, and other systems
How AI can proactively detect issues like memory leaks, failing optics, and capacity problems
Why “degradation is the new down” and how preventative operations get ahead of disruption
What the future of AI in network observability and operations looks like
