Telemetry Now  |  Season 2 - Episode 62  |  November 20, 2025

Beyond Chatbots: AI Reasoning at Scale with AI Advisor

Kentik’s Mav Turner joins host Phil Gervasi to go beyond chatbot hype and dig into real AI reasoning for network operations. They discuss how Kentik AI Advisor uses network intelligence, hybrid RAG, and tool-calling to troubleshoot issues, optimize cost, and democratize access to network expertise. Along the way, they cover architecture, data governance, model evaluation, and why AI has to be built into an observability platform itself, not bolted on.

Transcript

There has been so much activity, discussion, and even hype around AI over the last couple of years in the networking industry, our industry.

The problem, I think, is that it took some time to really understand how to implement a sophisticated AI agent system at scale and for real-world applications.

And in the case of network operations, it requires an entire world of data engineering and data governance and domain knowledge and all that.

But we're finally here, and it's so exciting. It's so exciting to announce AI Advisor, Kentik's answer to bringing network operations to the next level using the very latest in AI technology, designed specifically to augment a real-life human network engineer as they troubleshoot, fix, and understand their network like never before. So we'll get into what it is, some use cases, the architecture, how it works, and how it enables programmatic root cause analysis and sits right next to a network team as an expert assistant to help keep the packets flowing.

My name is Philip Gervasi, and this is Telemetry Now.

Mav, thank you so much for joining me again. We just had you on a few weeks ago or a month ago, something like that. But just the other day we had a pretty big launch here at Kentik, and I'm so excited to talk about this with you today. It's something that Kentik has been working on for quite a while, something that you and your engineering team have been working on internally, and with some select customers. So it is so neat to see what we've all been talking about internally finally be out there for the public to look at, absorb, and see what we've been working on with regard to using AI in network operations. So, Mav, thanks so much again for joining me today.

Yeah. Thanks for having me. And in the AI world, a lot happens in a month. So it feels like a long time ago in some regards since I was last here, but also just recently. So thanks for having me again.

Yeah. Absolutely. A month is a long time in the AI world for sure, and in networking now especially, because with the way things are going with trying to incorporate these technologies, it does feel like a lot of folks are almost experimenting, trying to find their footing. And so one thing that I noticed, and correct me if I'm wrong, your perspective is a little different than mine: there was a lot of hype around AI, and specifically LLMs and LLM wrappers, text-to-SQL. And all that stuff is very, very cool, by the way. Love it.

And that was about a year ago, two years ago. And a lot of companies started to talk about that, incorporate it into their marketing, perhaps build, like I said, that little LLM wrapper around some kind of function in their tool, a little chatbot. Neat. And I feel like a lot of that buzz started to die out.

I wonder if it's because folks were like, well, what's really the use case here? Or, this isn't that helpful. It's neat, but whatever. I feel like that's starting to change now, where we're finally starting to see some real practical use for us in network operations, some real use cases and some real value finally coming forth from all that effort.

Yeah. The value word was the one I was actually gonna say if you didn't. Because, you know, like most things with technology, when it first comes out, everybody gets excited about the potential. Everybody envisions this much broader opportunity, like, here's the whole world of possibility that exists. And then a lot of vendors jump on the bandwagon and talk about how they're doing something.

And then I think you get to this standard hype cycle curve, where you start to reach this plateau of productivity. And I think that's happened very quickly, for us anyway, with AI and the field of networking, and this evolution of network observability into network intelligence: actually having intelligence, and having data and systems that drive bigger outcomes than we maybe thought we could have. Which, to be honest, is one of the things that has surprised us, actually, how fast we've been able to get there with some of these capabilities. It's been exciting.

Yeah. Absolutely. Very exciting, especially when we're talking about those capabilities that we're gonna get into more. But, you know, I wanna level set and remind everyone about what we were talking about with regard to capabilities only a year or two ago, both at Kentik and in the industry as a whole. We were talking about a text-to-SQL function, where we used a large language model to generate SQL queries, or a simple RAG solution, where we would use an LLM to query a set of KB articles or vendor documentation, whatever it happens to be. Now, both of those use cases are still very useful, and they're very compelling for a lot of situations.
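
To make that earlier generation concrete: a text-to-SQL feature is essentially a single LLM call that turns a question into a query. Here is a minimal sketch using the OpenAI Python client against a hypothetical flow table; the schema, prompt, and function name are illustrative assumptions, not Kentik's actual implementation.

    import openai  # pip install openai

    SCHEMA = "flows(src_ip TEXT, dst_ip TEXT, bytes INTEGER, ts TIMESTAMP)"

    def question_to_sql(question: str) -> str:
        """Ask the model to translate a natural-language question into SQL."""
        client = openai.OpenAI()  # assumes OPENAI_API_KEY is set in the environment
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Translate the user's question into one SQL query "
                            f"against this schema only: {SCHEMA}. Return just the SQL."},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content

    # e.g. question_to_sql("Who were my top talkers yesterday?")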

But what we're talking about today is AI Advisor, which is an AI agent system that does make use of a hybrid RAG system, that does make use of internal and external tool calling, and that does have a memory component, so it's conversational over an entire troubleshooting session, an entire incident, and therefore really, really useful in actual real-life, end-to-end network operations today: augmenting a network engineer trying to troubleshoot tickets, trying to fix some sort of network problem, trying to understand what's going on in their network. But Mav, before we get into that much more deeply, you did mention two terms: network observability, which I understand because I have worked at Kentik for some years, and also network intelligence.

So before we start unpacking AI Advisor, can you explain the difference between network observability and network intelligence?

Yeah, absolutely. So when I think about the difference between those, observability is the ability to get telemetry on any component of the infrastructure. Right? And then put that into some place where I can go manually look at it.

I can set up static rules. I can create dashboards and do all of these things, which is very powerful, because with how complex and dynamic networks are in real life, it's really hard to know what's going on in these large enterprise networks. And so observability has really gotten us to this amazing place where I can start to see all this data. But then, of course, there's the standard data deluge: you need to be an expert, and the experts are spending a bunch of time working with tools and interfaces to try to get the understanding, versus what they really want to be doing, which is making decisions and acting and being proactive about things.

And so when I talk about network intelligence, it's about that next evolution of taking all of that data that you have and being able not just to understand what's going on, but to have, you know, we sometimes talk about a coworker that's really informed. Right? They know what's going on everywhere, and I can engage at this higher level.

The cool thing about network intelligence is it's not just experts talking to experts, just like you mentioned. It can be a non-technical person asking, is there an issue here? Is there a problem there? And getting that answer, as opposed to having to disrupt the ops team or somebody else to see, like, is there an outage? Right? So being able to democratize access to the information and the expertise of that observability data for everybody is really how I think about this.

And most importantly, it's the reasoning and the logic behind it. It's not just Q&A, question and answer. It's not just, I'm pinging this service to see if it's up or down. Okay.

There's reasoning behind that. Like, if you ask, is this site up? The answer might be: this site is up and accessible, but you may be having difficulties because the route that you're taking has super high latency, so you may be experiencing problems. That's very different than a binary up, down, yellow, green. That nuance, I think, is super important, and that's where it requires multistep reasoning to provide a response like that.

Okay. Yeah. I remember a while back, we had Journeys, which was in effect a RAG system without vector DBs, using a text-to-SQL function, which is awesome, and I thought that was fantastic, but that's not what we're talking about here.

Right. You mentioned reasoning several times. Can you explain what that means? What is reasoning in the context of what we're talking about here with Kentik, or rather with AI Advisor?

Yeah. This can be a fun topic with lots of folks. So I'll start with a little bit of what it's not, and then come back and say what I think it is. There's a lot of philosophical debate around whether AI is really reasoning like a human being, and I don't really wanna engage in that.

It's a fun topic to talk about, but that's not really the point. For me, the reasoning and the logic and rationale is: I need to be able to explain why I'm giving you this answer. If I just spit out, forty-two is the answer, and you're like, I don't understand that, that's not super helpful.

Right? Because you might say, well, that doesn't seem right. That seems wrong, but I can't see your logic and rationale for getting to that number. Can you explain?

And so when we talk about a reasoning system, we need something that can actually show you: I got this data point, this data point, this data point. I strung those together to get to this output, and that's why I believe this. And then you, as a network engineer who understands your environment and your network, can say, oh, I see.

Maybe you didn't convert your bit rate correctly. Maybe this site is actually tagged wrong, and so you pulled information about the wrong site into this conversation. And so you can actually understand the rationale, and then you can respond to it and say, this is actually wrong, go back and fix it, just like you would with a coworker.

And it's gonna go, oh, okay, now I'm gonna rerun my analysis, and then give you a different output based on this kind of back and forth. And so to me, the reasoning is not about whether this is a sentient system or not. It's more about, what's my multistep logical progression to get to this outcome?

And can I display that? Can I show that? And can you engage with that in a way where I'm not rerunning the entire thing? I can just point out one of the specific steps that maybe was a misstep in the logic process.

So that's how I kinda think about it.
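
That "fix one step and rerun" idea is easy to picture as a recorded trace of steps, each carrying the evidence behind it. A toy sketch, purely illustrative of the data structure rather than any real product internals:

    from dataclasses import dataclass

    @dataclass
    class Step:
        claim: str      # what the system concluded
        evidence: str   # the data point it used

    trace = [
        Step("Interface eth0 is pushing 9.8 Gbps", "SNMP counter delta"),
        Step("Link capacity is 1 Gbps", "inventory record"),  # engineer: tagged wrong!
        Step("Therefore the link is saturated", "steps 1 and 2"),
    ]

    def rerun_from(trace: list[Step], bad_index: int, corrected: Step) -> list[Step]:
        """Keep the steps before the flagged one, swap in the correction."""
        # downstream steps would be recomputed by the agent; here we just drop them
        return trace[:bad_index] + [corrected]

    trace = rerun_from(trace, 1, Step("Link capacity is 10 Gbps", "corrected inventory"))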

Yeah. Yeah. Understood. I mean, I'll answer your question and have that philosophical conversation for twenty seconds.

The answer is no. They don't reason like human beings because number one, that presupposes that we know exactly how human beings think and we have, you know, neuroscience nailed down, which we sort of don't yet. But also, I don't think that matters. I agree with you.

Whether it's a misnomer or not, the idea is this recursive process of a large language model determining intent. Right? And, you know, one of the definitions of an agent system, and AI Advisor is just that, is this idea that we can ingest information from the environment. In the broader industry, that could be environmental variables, it could be end-user input if you're talking about reinforcement learning, things like that.

There's a planning element, which I think is where we start talking about reasoning. And then there's some sort of action, whether it's producing some sort of conclusion or one day maybe pushing config and running your network autonomously. Right?
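
That ingest-plan-act cycle is the classic agent loop. Here is a minimal sketch of its shape; the llm callable, the tool registry, and the stop condition are hypothetical stand-ins, not AI Advisor's internals:

    def agent_loop(question: str, llm, tools: dict, max_steps: int = 5) -> str:
        """Perceive -> plan -> act, until the model decides it can answer."""
        memory = [{"role": "user", "content": question}]  # conversation memory
        for _ in range(max_steps):
            plan = llm(memory, tool_specs=list(tools))    # model picks a tool or answers
            if plan["action"] == "answer":
                return plan["content"]
            observation = tools[plan["action"]](**plan["args"])  # act on live network data
            memory.append({"role": "tool", "content": str(observation)})
        return "Ran out of steps; here is what I found so far."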

And so here, when we're talking about AI Advisor, and please correct me if I'm wrong anywhere or kind of misstepping, we're ingesting information from the environment. In this case, the network is the environment: all sorts of telemetry, but also some unstructured data like text. And you mentioned knowledge bases and configs and all that kind of stuff.

And then, of course, the information that's embedded in general-purpose models.

But all of that coming together, we use this intent engine, if you will, which is in the form of a large language model that can use technologies like semantic similarity and semantic search and all that to sort of understand, I use that in air quotes, what is the intent of what you're prompting here? So when I put in there, why is my internet so slow in my New York office right now or this past week? What does slow mean?

So now I have an assistant that says, oh, yeah: loss, latency, jitter. Let me check CRC errors on interfaces. Let's look at the flow data that's relevant, because there's this sort of understanding. I know it's not perfect, Mav.
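
Under the hood, interpreting "slow" often comes down to embedding similarity: compare the user's phrasing against descriptions of concrete checks and run the closest matches first. A small sketch with sentence-transformers; the check names and descriptions are invented for illustration:

    from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

    model = SentenceTransformer("all-MiniLM-L6-v2")

    checks = {
        "latency_check": "measure round-trip latency and jitter on the path",
        "error_check": "look for CRC and interface errors on routers",
        "capacity_check": "check interface utilization and congestion",
    }

    query = "why is my internet so slow in my New York office"
    q_vec = model.encode(query)
    scores = {name: float(util.cos_sim(q_vec, model.encode(desc)))
              for name, desc in checks.items()}
    # run the highest-scoring checks first
    print(sorted(scores, key=scores.get, reverse=True))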

You know, we're gonna talk about hallucinations at some point, I'm sure. And I'd like to talk about model evaluation and evaluating a system in general. But certainly, there's an understanding there, quote, unquote. And you know what?

An interesting prompt that I was experimenting with recently, and I'm doing a presentation on this soon, was on purpose. I knew there was no data in the system to respond to this prompt, and I wanted to see what it would do. And it said, all right.

Let me try this. And it ran through these things, checking this, running this synthetic test, or checking the synthetic test. And then it said, I'm not getting the results. Let me try this instead.

So there was a plan B. I mean, that so typifies this concept of reasoning, also using air quotes, because it's reasoning in the context of evaluations and checkpoints and understanding, like, okay, what is the intent behind the word slow, and stuff like that. Really interesting stuff. So would you say that's the crux of AI Advisor?

Obviously, there's a lot more to it, but that's really the main point here.

Yeah. Yeah. You hit on several interesting topics. So one, the crux of AI Advisor is, yes, how we've been able to take the observability platform that we've had, all this network data and this context-relevant understanding of the network and the raw data, and make our platform accessible for these large language models to use. Because, as you mentioned earlier, when a lot of the LLMs came to the market, it was a lot of chat and response.

They were really great at reading text descriptions and giving you the ability to answer questions about them. A lot of software companies where, frankly, that was their business, you know, law, a lot of these text-heavy fields, really benefited immediately, out of the box, especially once the RAG solutions came on the market. We saw that industry-specific stuff take off really quickly. But a lot of the other data and information about our environment, about our world, is not so easily accessible through LLMs.

And LLMs can be a natural interface from a natural language interface perspective.

And in lots of languages, which is a fun side thing that we weren't even explicitly doing, but seeing people interact with Advisor in Portuguese, Korean, all sorts of languages, is super fun. We didn't intentionally build that. We just got it for free. Right?

Super. But being able to interact with it, and then being able to get real data. And this is where a lot of times people say, like, oh, well, what's the relevance of your company with AI? AI will just do everything. And you have to remember, it's an interface.

It's smart in its ability to answer your questions. But if you want to answer questions about the real, live world, it needs to have access to that data. And this is where you've seen a lot of the models evolve, where they can start to get at external data resources.

Right? And people started to open up the ability to go fetch websites and stuff. But you're not gonna say, go fetch data from my infrastructure, and have it really be able to understand that without the context of a platform like Kentik.

And it's also not gonna have the historical data. Right? So we're storing that data over time, trending it, and having all of that available to it. No matter how great your AI agent is, it simply wouldn't be able to get access to that, because it's not gonna have the historical context. So anyway, sorry, a little bit of a tangent there, but when we talk about what Advisor is, it is really this system that you can interact with, you can ask it questions, it can do the reasoning, and it has a plan B, like you said.

We did have to explicitly put in schema information to tell it, this is what you know about, and this is what you don't know about. And this is what you know about if you're managing this type of device in this way. Because otherwise, yeah, that gets a little bit to the hallucination challenge that you mentioned earlier. But we put a lot of, I don't wanna say guardrails, but a lot of advice into Advisor so that it knows what it should be able to speak authoritatively about and what it should not, yeah, you know, make up.
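
A hedged sketch of what that kind of scoping can look like in practice, a system prompt that enumerates known data sources; the wording and the source list here are invented, not Kentik's actual prompt:

    KNOWN_SOURCES = ["flow records", "BGP sessions", "SNMP interface metrics",
                     "synthetic test results"]

    SYSTEM_PROMPT = f"""
    You are a network operations assistant.
    You may speak authoritatively ONLY about these data sources: {KNOWN_SOURCES}.
    If a question is outside that scope (for example, general trivia), say you
    only answer questions about the user's network and this platform.
    If a data source has no records for the requested time range, say so
    rather than guessing.
    """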

Yeah. Just as a kind of silly example, and I know this is not a very sophisticated one, but just the other day I entered: what is the tallest mountain in the world?

You know, any general-purpose model is gonna give you an answer. But I did notice that our system said, I give you answers about your network, about Kentik if you wanna talk about Kentik, your telemetry, that kind of stuff. And I'm like, cool, nice, because that is one of those things that engineers are going to literally hammer on until they find the breaking point. And those kinds of guardrails, again, a silly example, a little bit humorous, are actually very, very important, because what that shows me is that we've limited the scope to what's relevant. And from a hallucination perspective, from an accuracy perspective, that tells me it ensures a higher level of accuracy and relevance because we have limited that scope.

That's very important, I think. Well, it's important for anyone, but especially for engineers, who deal with a lot of hard metrics and specifics, not the ambiguity and nuance of language, necessarily. Now, what's interesting to me is that we are doing both. We are leveraging the ambiguity and nuance of language to then kick off these planning cycles to work with very structured data, with hard metrics.

Can you explain to me then why now? Because it's been a while that Kentik has been working on this. We started with Journeys. The industry is sort of, I think, finding a lot of use cases.

When I say the industry, I am talking about the networking industry and IT in general. Why now, do you think?

Yeah. I think initially we saw a lot of vendors race to talk about AI. We implemented it. A lot of people did very similar things, you know, chatbots or a natural language interface to answer questions from their KB. But I think it's actually very hard.

It requires explicit work to expose the data, in context, appropriately to AI, so that the LLM can actually give you a reasonable response. And a lot of what you were talking about earlier, like your tallest mountain question, some of that is our massive system prompt, where we tell it what it should and shouldn't do. And it's really important not just from an accuracy perspective, but also from a security perspective, which we can get to again in a moment.

But when we think about why now, I think it's really the natural evolution, but also a very challenging thing, to make a platform that has all of that data accessible in a way that a large language model, an AI system, can actually use effectively. Right? So I think a lot of vendors struggle with that. That's something we invested in early on.

And so I think we've seen the dividends of that. And I think it also requires more than just data collection. Right? If all we were doing was ingesting a bunch of standard flow and doing some SNMP gets and some screen scraping, I don't think that would be enough.

And so I think we're seeing it now because of the other enrichments that Kentik has, and the understanding of the infrastructure in addition to the raw data. So it's all of these things coming together. And then, working with customers, we could have released this six months ago and said, look, we've got this thing, but we weren't satisfied with it. And we didn't want to set customers up for that, because of everything you just described: making sure we understood edge cases, and making sure, as we're evaluating the effectiveness, how often is this giving me a relevant and valid answer?

And then, what are the things we need to put around it so that it is adding value, back to your very original point, versus just being a fun tool or feature? To me, when I think about Advisor, it is a core service of the product. And as we go into the future, it is not an add-on, it's not a bolt-on; you don't say, oh, I'm gonna go add the SKU for AI.

I think, fundamentally, if that's your approach and that's your strategy, then AI is not core to your product. It's kind of just something extra. And to me, that is a significant positioning shift and something that we've really embraced here.

Yeah. Yeah. And we have talked so many times in the past, specifically on the other podcast that we recorded, about how the foundation for this entire system, this AI system, is the data pipeline.

Yep. And so I think it just makes so much sense. Kentik lends itself to being able to do that, probably more effectively than most other organizations, which would have to either build that from scratch or figure out how it works in the first place, piecing together multiple tools. I mean, all the data is there.

We also enrich it, so there is that context. But please step me through, and this is kind of a three-part question, what AI Advisor looks like. I don't mean what it looks like on the screen, although you can certainly discuss that.

But what it looks like, how it operates, and, quickly, the specific goals that we as Kentik have for AI Advisor, and then, by extension, for network operations teams, maybe some use cases. But I also do wanna get into the architecture, because I have some questions about evaluation and security and what tools we're able to use.

So let's step through that.

Yeah. Sure. So so a couple of things. What does it look like? I will answer actually the what does it visually look like.

Today, the visual interaction is mostly through a user asking a prompt. Right? So it's reactive. It's there waiting when you're ready to ask it. Like, again, a coworker is there; you have a question, you ask it. Whether that's a general question, or whether you're in the context of a specific page or an alert or a device and you're like, I don't understand something here.

And it'll pass through the relevant context depending on where you're at. Another common interaction model is through the API. Right? And this will take us down a rat hole, so I don't want to get into it just yet, but a lot of other people have their own internal AI initiatives, and they want to have one agent; they don't want their users having to go to fifty different places. So they have their one primary agent, and they want all of the other agent systems to plug into that, whether that's MCP or A2A.

And so for now, one way that a user would experience value from this would be through that interaction, their kind of primary agent for that company. And we have a partnership with ServiceNow, obviously, where being able to bring that data into the context of that ticket workflow is super valuable and important. So there are a lot of ways that it can expose itself and be present. And one of the next big things, and I hate to say this because we just released this, but I'm already super excited about the things coming down the road,

is, instead of that reactive, I'm-gonna-ask-you-the-question model, the proactive: hey, I noticed this, we should go look at that. And that to me is another massive step in this journey. But that's how a person would experience Advisor.

What does it look like under the hood? One of the things that we built from day one was our AI gateway. This allows us to quickly switch between foundational models. And this is important because, as we said at the start of the conversation, things are changing so fast and so rapidly that we need to be able to take advantage of the different foundational models' advancements quickly.

And so we didn't want to hitch ourselves to one specific model. We wanted to be able to use the value of different ones. We wanted to be able to fail over to different ones if something happens. Right? And so, just from an architectural perspective, that AI gateway piece, and I'm not taking credit for it, the architecture team designed it that way, gives us a lot of options. It also allows us to continuously evaluate, which we do, the output of these different models: to see if things are changing, if there's drift, to see if we're getting better responses from one system or another.

From a customer's perspective, I kind of wanted to abstract that complexity. We can work with customers if there is a corporate policy that they can only use OpenAI or only use Anthropic; we can set that up for them very easily. But in general, I don't want you to have to be an AI expert to get the value of this system. Right? I want you, the network engineer, to get the understanding. So that AI gateway is a core piece of our architecture that gives us a lot of flexibility and allows us to ensure that we're providing the best response possible.
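
Conceptually, an AI gateway is a thin routing layer: one interface in front of interchangeable providers, with failover. A minimal sketch under those assumptions; the provider callables are stubs, and this is not Kentik's gateway code:

    class AIGateway:
        """Route one chat request across interchangeable model providers."""

        def __init__(self, providers: dict, order: list[str]):
            self.providers = providers  # name -> callable(messages) -> str
            self.order = order          # preference order, e.g. by policy or cost

        def chat(self, messages: list[dict]) -> str:
            last_error = None
            for name in self.order:
                try:
                    return self.providers[name](messages)  # first healthy provider wins
                except Exception as exc:                   # timeout, rate limit, outage
                    last_error = exc
            raise RuntimeError(f"All providers failed: {last_error}")

    # gateway = AIGateway({"anthropic": call_claude, "openai": call_gpt},
    #                     order=["anthropic", "openai"])  # hypothetical stubs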

And then beyond that, you mentioned earlier, there are tools. When we use the word tools, there are internal data sources that tools have access to, and then there are external tools. From an internal tools perspective, this is all the different data you mentioned earlier: synthetics performance data, or NMS data sets, or BGP information, or flow data. Like, how do we make sure that data is being presented to the AI in a way that it can use it?

And so that's where, like I said earlier, we spent a lot of time setting those up, and we're continuously expanding even some of the internal tooling. From an external tooling perspective, today there are a few external systems, mostly as proof of concept and design, versus the full response. But whois, right?

I just wanna get some basic information when I'm giving a response, as opposed to what might be inside the system. PeeringDB info. You mentioned NetBox earlier. There are a lot of integrations here to pull data out.
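
Tool calling like that whois lookup comes down to describing each tool to the model in a JSON schema, so it can choose the right one and fill in arguments. A sketch in the common OpenAI-style function format; the name and description text are illustrative:

    whois_tool = {
        "type": "function",
        "function": {
            "name": "whois_lookup",
            "description": "Look up registration info for an IP or domain. "
                           "Use when the user asks who owns an address.",
            "parameters": {
                "type": "object",
                "properties": {
                    "target": {"type": "string",
                               "description": "IP address or domain name"},
                },
                "required": ["target"],
            },
        },
    }
    # Passed in the tools=[...] list of a chat completion request; the model
    # then returns a structured tool call with arguments instead of free text.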

You know, we are kinda drawing a line to say we don't necessarily want to get to full orchestration in the near term. That's not our objective. We want our system to be able to provide the best data to then drive any of those systems. Right?

But there is value, in a real-time incident, in being able to SSH into a box, run a show command, get that output, and include it in the troubleshooting the AI is doing. Those are things that are right around the corner, on top of what we just released. But at a super high level, that's how we think about it. And again, the interaction model is primarily the UI, but I've also been talking to a lot of customers about either MCP or agent-to-agent, and how they want to engage with us, but also where we might go hit another data source or some other system via agent-to-agent communication to enrich our response as part of that analysis.

Right. Okay. So AI Advisor is an agent system using any number of models. And I did look in the settings, so I saw which specific foundation models we're using, of course.

And it is just point-and-click to change between models. I did not see any local models on there. So, other than very specific corner cases for those kinds of customers that require it, generally speaking, are we offering local models as well?

And if not, there is no fine-tuning going on. This is purely relying on RAG and tool calling. Correct?

Correct. Exactly.

Today, we don't plan to have any local models. The closest equivalent to that, which is not like deploying a model in your own environment, is the ability for a customer to bring their own. Meaning, they give us the key for their OpenAI instance and set it up that way so they're not using ours, because, again, most of that is corporate policy. Although, yeah.

To be honest, it helps us on the COGS if they're gonna pay for all of the AI usage, so that's great. But that option does exist. We aren't finding customers that actually need that just yet, but as more people use it, we expect some might.

So that's less of a locally deployed and operated model, and more customer-owned and customer responsibility. So that kind of answers that, but yeah, certainly not a fully trained local model environment at all.

Yeah. And that's a debate. There are a lot of folks I know that do, and there are mechanisms like LoRA and that sort of thing to reduce the complexity and the burden of fine-tuning a model. But certainly with a really extensive RAG system, like the hybrid RAG system we have, with adding the context and the appropriate tools to grab more data, you don't necessarily need to fine-tune a model to have a really effective, useful system.

But it does make me wonder: if we're using foundation models and some model API, how are we ensuring the security of information, and that whole data governance question?

Yeah. Yeah. It's a great question. There are a couple of things when you think about what data is going through the system.

So obviously our platform has all the data, and this is mostly acting on the data that we have collected so far, and it is not storing or persisting any of that outside of the context of that user's tenant. So we're not pushing this across other environments. We are using the data to learn from a product perspective, so we can look at logs and such. But when we think about the AI system itself, it is not learning across different environments.

That's a very manual thing that we're doing today, where we're reviewing and seeing where customers are tripping up, and then adding different responses and tuning the system prompt in order to give better responses. But another area to think about, that I always talk to customers about, is that you really wanna put in your custom network context, which is a very specific feature, and I hate to jump to that level of detail, but that includes things like your IP ranges, your naming schemes, your office hours, your contact information. There's a lot of information there, and the more it has, the better the responses it will be able to give you.
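
That custom network context is essentially structured hints the model sees alongside every request. A hedged sketch of the kind of thing that might go into it; every field name and value here is made up for illustration:

    custom_network_context = {
        "ip_ranges": ["10.20.0.0/16 = corporate LAN", "203.0.113.0/24 = public edge"],
        "naming_scheme": "rtr-<site>-<nn> for routers, sw-<site>-<nn> for switches",
        "office_hours": "Mon-Fri 08:00-18:00 US/Eastern",
        "escalation_contact": "NOC on-call via PagerDuty",  # never hard-code a phone number
    }
    # Serialized into the prompt, so "the New York office" or "rtr-nyc-01"
    # resolves to something concrete instead of a guess.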

Right. But that is your data. That data does not get uploaded so that another customer can ask, you know, what's the after-hours contact information for this customer? What's the cell phone number for this network engineer that works at this other company?

But there's a lot of valuable data that we're looking at there. Runbooks are another thing: what is your procedure when there's an event? What are the steps you wanna follow? I could see us very soon being able to share runbooks, and I would love to have users sharing these things together.

But we'll have to be very sensitive there with data that might have IP addresses, or, hopefully nobody's got credentials in these things. Yeah. So we're keeping an eye out for those types of things. But generally speaking, from an implementation perspective, the walls are very, very solid as far as ensuring data is not going between customers here.

And part of our quality work on ensuring the AI behaves is our red teaming of the system. Right? Trying to jailbreak it, trying to get it to answer questions it shouldn't.

A little more complex than the generic, asking it about the latest football score or something that's not a network-related question. But that's something we've spent a lot of time doing, not only internally but with external resources too, because obviously you don't just want the people who built it to be the ones that test it, especially when it comes to security, and making sure that, yeah, we're not leaking anything. But from a design perspective, it doesn't really allow for that. So we weren't surprised by our results on the testing there.

Are you also able to explain, to the extent that certain things are proprietary and things like that, how we evaluate responses? Because you did mention that there are several models available, which suggests that you may end up with a slightly different result depending on the model, or an update to a model, or just using the same model twice in a row. When I say model, I'm specifically referring to the LLM, of course, that we've called out in the settings. So how do we evaluate internally and then say, hey, AI Advisor is, for all intents and purposes, a trustworthy assistant in your NetOps?

Yeah, absolutely. So it's a continuous process, as I mentioned earlier, where we have a set of test cases that actually execute the same test against each of these different model providers. So we ask it the same question, and then we evaluate the response.

We're using AI in that evaluation, but also some very strict evaluation criteria, and then an occasional human review. Right? So there's a three-pronged approach. We ask the same question and make sure each of the different models has access to the same data. And then the automated tests run, and they analyze those results to see: is this accurate? Is there something different? What are the deltas in the different responses compared to an expected baseline response?
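
A sketch of what that loop can look like: same question, same data, every provider, scored against a baseline, with an LLM judge as a second opinion and low scores flagged for human review. The helper callables here are hypothetical:

    test_cases = [
        {"question": "Are all BGP neighbors in Austin up?",
         "baseline": "All 12 BGP sessions in Austin are established."},
    ]

    def evaluate(providers: dict, judge, test_cases: list[dict]) -> dict:
        """Run every test case through every provider and score the deltas."""
        results = {}
        for case in test_cases:
            for name, ask in providers.items():
                answer = ask(case["question"])           # same question, same data
                score = judge(answer, case["baseline"])  # LLM-as-judge, 0.0 to 1.0
                results[(case["question"], name)] = (answer, score)
        return results  # low-scoring entries get flagged for human review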

And then we can see from there, and, this is the value of having multiple models, we can also ask AI to look at the different responses and compare them. So that's just another interesting data point. And then, again, we take a look at it with our own eyes to make sure all those things are coming through. And, you know, for us, the things that make the most difference in the response are obviously the data that the platform has.

So how much data is being ingested matters more than the model, and then there's the custom network context that I mentioned earlier. The more you give it, the more specific the response is gonna be. Those things tend to matter a lot more than which model provider you use. With the ones we're using, Anthropic and GPT, ChatGPT, and all these things, for our use cases and what we need them to do,

their book-smart network knowledge is comparable.

Right? So we're not seeing big differences there. The difference is the platform, the data in the platform, the specific context, and all of that. Those are the things that really drive the difference.

Does prompting still matter, then?

I mean, it always matters.

We're talking about large language models, after all. Yeah.

No, yeah, you're right. I didn't mention that, and you're absolutely right. Asking the question, I wanna say the right way, but really being flexible and a little bit iterative and experimental with the way you ask questions. If you don't get the response you expect the first time, or something seems wrong, it's kind of what I was saying earlier about the value of that logical reasoning.

And tell it. You know, one of my examples, I can't remember if I used this last time: one of the product managers on my team was like, oh, it's good, it gave this response, but it didn't include this in the table, or it got this wrong. Right?

I was like, well, did you reply to it and tell it to add that data? And he was like, no. And then he did, and it responded. And so that exploratory model, that's super important.

But, yeah, prompting is very, very important. Okay. Absolutely. Yeah.

You know, what's interesting is in my own experimentation and then also in talking to folks that have been experimenting themselves both internally and then, you know, customers that are using it.

Some folks notice there's a control that says hide reasoning. When you put in your prompt, it starts to think about it, and it shows the reasoning, all the different things that it's looking at, and that's awesome. And as an engineer, as somebody who's also skeptical, I want to see all that. And I've talked to folks, both internally and externally, who were like, yeah, I absolutely keep that expanded. I have also spoken to folks, not internally, though maybe there are some, who are like, oh no, I minimize that. I don't want to see it. I just want the result.

But it is very interesting to me to see what happens when I put in something that could be ambiguous. The example I gave earlier is: why is my network slow in my New York office, right, yesterday?

There's so much embedded there. I'm looking at the reasoning, and I see, oh, it's looking for interface errors. Oh, it created a time frame for yesterday. It understood what I meant by all these things and created the appropriate parameters.
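
Turning "yesterday" into query parameters is the unglamorous half of that reasoning. A small sketch of the idea using only the standard library; a real system would also handle time zones and fuzzier phrases:

    from datetime import datetime, timedelta, time

    def resolve_timeframe(phrase: str, now: datetime | None = None) -> tuple[datetime, datetime]:
        """Map a relative phrase to a concrete (start, end) query window."""
        now = now or datetime.now()
        if phrase == "yesterday":
            start = datetime.combine(now.date() - timedelta(days=1), time.min)
            return start, start + timedelta(days=1)
        if phrase == "this past week":
            return now - timedelta(days=7), now
        raise ValueError(f"Don't know how to parse: {phrase}")

    # resolve_timeframe("yesterday") -> (2025-11-19 00:00, 2025-11-20 00:00), for example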

But I wanna know now what tools are available, what external databases. We mentioned RAG, so I know there's the Kentik knowledge base and all that kind of stuff. But what are the actual things we're looking at that I could use Kentik, or rather AI Advisor, for? And what specific tools are available now?

Yeah. So when we think about it from the RAG perspective specifically, we're looking at our documentation, our KB, those types of things.

Maybe almost a little bit to a fault sometimes, where people want to say, let's write KB articles to get the system to give us a better response. Like, that makes sense, but there are better ways to do that, right? The custom context usually can provide that. But when we think about the internal Kentik tools, we're talking about the telemetry data, the metadata from our inventory information, the anomaly detection engine, the root cause analysis engine, forecasting, alerting.

We have some of our pricing information that's all in there. NMS, synthetics; we're really pushing all of the data in Kentik, and we want to make it as available as possible. But the reality is, and I hate saying this because it feels a little bit salesy, one example we've seen a couple of times is a customer may have NMS and they may monitor, like, two devices. If they're not collecting all the data there, then it's not going to be able to give them the answer about that.

And so it does make a huge difference the more data you have, which is why our platform approach, I think, actually is the right one. Even though I don't think we intentionally did it because of AI, it absolutely accelerates the net value that the customer gets: the ability to really see everything in the environment. From an external tooling perspective, NetBox is one that's come up. We're looking at other data lake systems, different data stores, Prometheus, Elastic.

We're looking at, like I said earlier, the ability to just ask one of the customer's agents for additional information to help enrich a response. That's going to be exciting; that's more of what's coming down the road. We've talked with customers that want to have something like Itential integrated for more of the action side.

Right? We've informed it; now that system can go act on it. So that's an area of very active development at the moment. We're constantly, every day, doing something there, so that's expanding even further, but there's a lot of good stuff that we have today.

Yeah. Yeah. And I agree with you about the platform approach and the data and this AI system going hand in hand. You really can't have an effective AI system without that underlying data.

So I agree with you that that's probably the main reason, in my opinion. You know, I don't work on your team, so maybe there are other things I should be considering. But I feel like that's one of the main reasons we are where we are as a company in the industry right now, compared to some other folks who are still working on just the telemetry piece. Now, I will say that for me personally, in my learning experience, I do focus on keeping things very, very simple: a very simple, naive RAG system, or a very basic hybrid RAG.

When I create an MCP server in my basement, it's with one tool. You know? I define one, and I just try to make it work. That's really my goal. But the reality is, for a full-fledged network operations implementation, the complexity gets pretty intense pretty quickly.
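
For reference, that one-tool MCP server really is only a few lines with the official Python SDK. A sketch assuming the mcp package; the tool itself is a trivial stand-in:

    from mcp.server.fastmcp import FastMCP  # pip install mcp

    mcp = FastMCP("basement-lab")

    @mcp.tool()
    def ping_site(hostname: str) -> str:
        """Report whether a host responds, so the LLM can check basic reachability."""
        import subprocess
        ok = subprocess.run(["ping", "-c", "1", hostname],  # -c is the Unix ping flag
                            capture_output=True).returncode == 0
        return f"{hostname} is {'reachable' if ok else 'unreachable'}"

    if __name__ == "__main__":
        mcp.run()  # speaks MCP over stdio by default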

And that's not even including all of the things you've mentioned now, like the security considerations and data governance, the regulatory bodies that we, or our customers, have to abide by from time to time.

And, of course, all of the observability parts of it: looking at the lineage, and why is this tool being called instead of that tool, and are the descriptions in my JSON schema sufficient so the LLM knows which tool to call? That kind of stuff. The complexity becomes very great very quickly. But I think that's just normal in any sophisticated system. I have heard that as an objection: oh, it gets complicated. I'm like, have you not used the internet? I mean, everything is complicated out there.

Right, and the complexity, obviously, it's not a surprise that the Internet works.

It's a surprise that it ever worked at all. I forget exactly how the saying goes. Yeah.

But I just see it as: anything sophisticated has an element of complexity. That's not the point. With any kind of engineering solution, the value is not necessarily in how simple it is, but in whether that complexity is necessary to solve the problem. That's what it is. So sometimes it's out of necessity.

And that's what we're trying to do.

Right? We're trying to reduce that complexity for the user. Well, we abstract it.

Right? For sure.

Well, we try. That's our objective. Absolutely.

Yeah. Yeah. I mean, and it does. We mentioned this in our recent podcast a month ago, the build-versus-buy argument. When you do start to build this out at enterprise scale to use in production, that makes the build-versus-buy conversation that much more important. It's like, jeez Louise, this is a lot bigger than I thought, if I'm gonna start including all of these data types and external tools, and understanding where those points of breakage and latency are.

Now, for a network operations team using AI Advisor right now, you said it's included in the platform.

And can I use it for real-time operations? Is it just historical data? How real-time can we get?

Yeah. That's a great question. So, you know, it depends.

So, yeah, Advisor is gonna talk to you about the data that is in the platform. So if you're polling something, just to make an example, every ten minutes, and you ask for status information and it happened to poll ten seconds ago, great. But if the next poll isn't for ten minutes, that may not necessarily give you the answer you want. And that's where, as I mentioned earlier, one of the next big things I'm excited about is being able to do kind of live troubleshooting or live interrogating.

Starting a synthetics test to go check this connection right now, in the context of troubleshooting. But it's live in the sense that it's active; it's acting on the data in the system that we have, so it just depends on how granular and recent that data is. But a lot of the questions we'll get are like, are all the BGP neighbors up in Austin? Right?

Or, can machine A talk to machine B on port twenty-one, for example? Right? Or, show me all of the incidents for November first; were there any Azure instances? Right?

So instead of me having to comb through and try to figure these things out, I just ask it the question, and it gives me the response. And that's, I think, the big thing: moving fast, making a difference, and taking off the burden of, you know, create a dashboard with these devices and this utilization. Right? Instead, we just quickly ask it and see it.

You know, why was there an outage here? Why is it slow? Right. Can I save money by changing my peering?

Like, that's one of the most amazing ones I've seen, where a customer asks it to optimize their peering cost, and it'll say, hey, you have a free peering connection here, but all your traffic is going out through this other peer. If you change that, you can save, and based on the data we have around some of the costing, it'll give you a real estimate. And we've had customers see Advisor say, you can save three hundred thousand dollars a year by doing this, right? Because we have good data on some of those.

Again, I always recommend that customers go validate all of these things before they just push the go button. (Of course.) But being able to really quickly find that, without having to manually sift through all of it and try to understand where these optimizations exist, whether that's cost or performance, I think is what's going to make a big impact immediately for customers. And it frankly has, from what we've already seen.

Can you give me some examples of use cases, then, that you've seen? And when I say use cases, certainly some specific prompts that have come up in conversation, and all of that, of course. But maybe also from a little higher level: from a security perspective, from a network operations perspective, from a leadership perspective, because you're talking about cost, which is not typically the purview of a frontline engineer, you know.

Yeah. Yeah. So those prompts are probably more like, give me the uptime status of agent blank, with the customer's device name: give me agent status for this specific machine for the last two days.

What I mentioned earlier, that connectivity test, right? Does this system connect with that system?

Or, what's the IP of the machine that has device tag ABC123?

Show me all the devices with MTU lower than some value: nine thousand one hundred, whatever. Right? So you can really quickly start to find some of these devices. Show me all devices with a duplex mismatch on their link.

Or, what is the status of this environment? I mentioned earlier, like, are all of the BGP neighbors in Austin up? Right?

When was the last error on this device? How many errors a day do I see from this device? Right? What's my highest error device?

These are all things that, I mean, every day I'm seeing customers interact and ask these types of questions. You know, what is the IP of some machine name, right? So instead of going to the device, you're just looking up the IP of this device.

And so that they can go do something?

You know, why is this device showing down?

You know, I could go on and on. But, like, any question. And then there's, actually, troubleshoot this alert for me, right? You're on the screen, you see an alert, what's going on?

Now ask it to troubleshoot and give you all of the relevant data to resolve it. But the simple way that I respond to this is: what were the last three tickets that you had to work on? Take the questions that came in from those tickets, exactly as they came in, put them into Advisor, and see the response you get. Right?

Or, what was the last time somebody Slacked you a question about network status or information? Just take some of those very recent examples, put them in, and see what happens. And I think a lot of our customers have been very surprised at how accurate and how quick the responses were to just the same natural things they're doing day to day.

Yeah. Yeah. And I really do agree with you about the reasoning part of that entire thing, because you just gave several show-me-X-Y-Z examples. Right?

And those are great. But I have to admit, I prefer to experiment with those prompts that are beyond show me, beyond, you know, show me the top utilization interfaces. I love the ones that are ambiguous where it has to think about it and come up with a plan. You know what I mean?

What's wrong? I would just say, what's wrong? Okay. Exactly.

Like, what in the world does wrong mean?

You know, does wrong mean high latency, past, say, a hundred milliseconds? Does it mean down? Does it mean, yeah, taking an inefficient path?

And so an example I've been using recently is: are there any DDoS vulnerabilities on my network? That's very broad, and maybe not a great question, but I wanna see what it does. And it starts saying, all right, well, what is your network?

And which network should we be looking at? And which devices would be vulnerable to that? Oh, look at that. It's looking at WAN interfaces on my public facing routers.

It's not looking at wireless access points and stuff like that. It's looking at specific things, and it knows what those vulnerabilities would be. So there's a lot of reasoning going on.

I think, I mean, open-ended questions: is there malicious traffic in my environment? Is there lateral movement in my environment? Is there communication with embargoed or suspicious IP addresses? Those types of general questions, I think you're right.

I think typically we're starting with a very specific problem we're trying to solve, versus stepping back and saying, you know, how can I optimize my network? Are there security things I should worry about? Are there trending outages or impacts? Where's my next outage gonna be? Right?

How is it gonna respond to that?

Now it's a crystal ball. Yeah. Yeah.

I'm not saying it's gonna be a crystal ball, but it'll be a little better than the Magic 8 Ball, and it'll tell you, like, it looks like there are trending issues here. Historically, you've had these outages at this time of year. You know, there are definitely interesting responses, to your point about exploring versus the more narrow, show-me types of commands.

Yeah. Yeah. Absolutely. Okay. Great.

So is AI Advisor available in the platform if somebody signs up for a free trial on our website?

Yep. Absolutely. Depending on where you're at, you may have it on by default, or you may have to go to the AI settings and turn it on. It just depends on what jurisdiction you're in, but it's there, ready to go when you are.

That's great. So somebody who's already using Kentik already has it, and they can start experimenting with it, and perhaps reach out to somebody at Kentik to get some help with that.

But also, if you're new to Kentik, you can sign up for a free trial, which, to be fair, doesn't have a ton of data in it because it's a free trial. So we don't know anything about your network yet.

But you can still see AI Advisor and start experimenting with it and see how powerful it really is. So, as somebody who's been a network engineer for many years and who has been very focused on MLOps and now AIOps over the past five to seven years, I think this is such a compelling tool and system, all grounded, and I'm not a data engineer or mathematician, but all grounded in the fact that, at its heart, Kentik is kind of a data analysis company. So it just forms the foundation, and I think it's just fantastic. So is there anything else we didn't cover that is important in this discussion about AI Advisor today?

No. I mean, I think I would just say, especially if you're an existing Kentik customer: at the very top, you'll see the ask button. Just ask it, engage with it, play with it. It's not going to judge you for asking a weird or wrong question.

I would say just play with it, right? And I think you'll learn really quickly and get value out of it. I think people are hesitant because it's a very different interaction model. So I'm saying, don't let that bother you. Get in there, ask questions, see what you learn.

And if you have feedback, we'd love to hear it. So anybody out there that's using it, we want to learn more; we'll talk with any customer that's willing to talk with us and give us feedback, because we're trying to help. Right? And we think this is the next big thing in how we can help our customers run their networks better.

Absolutely. Absolutely. So to our audience, if you want to give AI advisor a try, experiment with some prompts, kick the tires a little bit, and take a look at that reasoning section and see what kind of results you can get. All you have to do is go to Kentik dot com and go to the upper right and start a free trial. You're going to see a little button that says free trial there.

And from there, you can enable Kentik AI, you can experiment with AI advisor, you can even send some of your own network data into your own dedicated portal and start asking it some questions about your network.

So for now, thanks so much for listening. Bye bye.

About Telemetry Now

Tired of network issues and finger-pointing? Do you know deep down that, yes, it probably is DNS? Well, you're in the right place. Telemetry Now is the podcast that cuts through the noise. Join host Phil Gervasi and his expert guests as they demystify network intelligence, observability, and AIOps. We dive into emerging technologies, analyze the latest trends in IT operations, and talk shop about the engineering careers that make it all happen. Get ready to level up your understanding and let the packets wash over you.