
Kentik NMS AI Diagnostics Demo

Hi, everyone, and welcome to this Kentik demo. My name is Eric Hian-Cheong, senior product marketing manager here at Kentik, and joining me today is Jason Carrier, principal product manager for Kentik NMS. Today we're gonna be introducing and demoing two new NMS capabilities powered by AI Advisor that are designed to help network engineers troubleshoot issues faster, particularly when they're caused by things like configuration issues, or when you really need to dig into a device to understand what's going on in its live state. We hear this from network engineers all the time: observability tools in general have gotten really, really good at pointing out that there's a problem. They can detect symptoms, and they can help narrow the problem scope, oftentimes down to "it's the network." In a sea of a thousand haystacks, they can point to the networking haystack. But when it comes time to find the needle in that one haystack, it's still really manual. You have to dig in, follow hunches, follow the evidence, iteratively ask questions of the data, come back, ask more questions, and go through that loop in order to find out what's going on. So our goal with these two new features is to drive time out of that process, and to do it using AI Advisor in the same contextual window where you have all of your other telemetry: alerts, flow, synthetics, and all your other network data. So with that, Jason, do you wanna walk us through what we're gonna be looking at today in the demo environment? Yeah. So we're gonna simulate a very realistic human error, which is accidentally dropping a zero out of a single line of a configuration.
But this single zero means that instead of restricting traffic to the intended ten megabits per second, the policy errantly restricts it to one megabit per second. So one tenth of the expected traffic goes through, and the other ninety percent is dropped. This one little typo has a very immediate performance impact, and it can be very difficult to troubleshoot and find. The process we're gonna show here uses the configs and diffs capability in NMS to instantly highlight where the exact configuration drift is. Then the command access part lets us verify that the fix we put in place actually corrects the problem. If we look at our demo environment, what we've set up is two workstations, Ubuntu A and Ubuntu B, that are trying to talk. In between, we have a Cisco Catalyst 8000V with the policy set up on the GigabitEthernet2 interface, and traffic exits on GigabitEthernet3, headed to its destination. So that's our demo environment. And before we get started, you mentioned something I thought was really interesting. You said QoS. I'm sure a lot of our audience is very familiar with it, but can you tell me why this is a problem? Why would this come up? I certainly don't want people throttling my Internet at home, but there are cases where you want to do that. Right? Yeah, absolutely. QoS is a network's way of acting as a sort of traffic cop: it helps prioritize the critical data streams over the less important ones, especially when bandwidth gets congested, because we've got to obey the laws of physics. There's only so much that can go over the wire. Right? So the end goal is to optimize the end user's experience and not let any one entity take up too much of the bandwidth.
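To make the "one tenth goes through, ninety percent is dropped" arithmetic concrete, here is a minimal Python sketch of a single-rate token-bucket policer whose exceed action is drop, fed a steady 10 Mbps stream. It is a simplified model, not how the Catalyst 8000V implements policing: the 1,500-byte (12,000-bit) packet size, the 96,000-bit burst, and the function name simulate_policer are assumptions chosen for illustration.

```python
def simulate_policer(cir_bps, offered_bps, duration_s=10,
                     packet_bits=12_000, burst_bits=96_000):
    """Return the fraction of packets dropped by a single-rate token-bucket policer.

    Fixed-size packets arrive evenly spaced at `offered_bps`. Tokens (bits)
    refill at `cir_bps` up to `burst_bits`; a packet that finds too few
    tokens is dropped, like a policer whose exceed action is drop.
    """
    interval_us = packet_bits * 1_000_000 // offered_bps  # gap between arrivals
    n_packets = offered_bps * duration_s // packet_bits
    tokens = burst_bits  # the bucket starts full
    dropped = 0
    for _ in range(n_packets):
        if tokens >= packet_bits:
            tokens -= packet_bits  # conforming packet: forward it
        else:
            dropped += 1  # exceeding packet: drop it
        # Refill for the time until the next arrival, capped at the burst size.
        tokens = min(burst_bits, tokens + interval_us * cir_bps // 1_000_000)
    return dropped / n_packets


typo = simulate_policer(cir_bps=1_000_000, offered_bps=10_000_000)
intended = simulate_policer(cir_bps=10_000_000, offered_bps=10_000_000)
print(f"CIR 1 Mbps drops {typo:.0%}; CIR 10 Mbps drops {intended:.0%}")
# CIR 1 Mbps drops 90%; CIR 10 Mbps drops 0%
```

Apart from the initial burst, the 1 Mbps bucket only refills enough for one packet in every ten, which is where the roughly ninety percent loss comes from.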
In enterprise networks, network engineers really rely on QoS for business-critical applications, making sure there's enough room for Zoom and maybe your Cisco VoIP calls to go over the network. In a service provider situation, it's more about service level agreements: you're committed to delivering, say, a ten megabit circuit, but in this case you're only actually delivering one, which is the whole problem we're gonna be troubleshooting. Awesome. Alright, let's jump into the demo environment and take a look at how configuration backups and diffs and the new command access capabilities in NMS and AI Advisor can help find this issue really, really fast without all the manual slog. So here we are in Kentik, in our demo environment for today, and our attention has been brought to this device. You could have gotten here any number of ways. Maybe this is a router supporting a service in an office somewhere and you've had a user complaint, or maybe you came here from a synthetic test or an alert. But we're here now on the devices page. We're gonna start our demo, and the key thing to notice is that we can already see there's a problem: there's a substantial imbalance between inbound and outbound traffic. Notably, on GigabitEthernet2 we've got about 8.75 to 8.8 megabits per second coming in, and we'd expect to see roughly the same amount of traffic going out on the outbound interface. However, when we hop over to outbound, you can see that GigabitEthernet3 is hovering at just under one megabit. So we've got roughly an eight- to ninefold imbalance in traffic across these two interfaces.
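The "this should be symmetrical" intuition can be expressed as a simple ratio check on a device that should only be forwarding traffic. This Python sketch is hypothetical: the function names, the 1.2x tolerance, and the 0.95 Mbps egress figure standing in for "just under one megabit" are illustrative assumptions, not how Kentik detects imbalances.

```python
def transit_imbalance(in_bps, out_bps):
    """Ratio of ingress to egress rate on a device that should only forward traffic."""
    if out_bps == 0:
        return 1.0 if in_bps == 0 else float("inf")
    return in_bps / out_bps


def is_imbalanced(in_bps, out_bps, tolerance=1.2):
    """Flag rates that differ by more than `tolerance`x in either direction."""
    ratio = transit_imbalance(in_bps, out_bps)
    return ratio > tolerance or ratio < 1 / tolerance


# Illustrative readings: ~8.75 Mbps in on GE2, "just under one megabit" out on GE3.
print(round(transit_imbalance(8_750_000, 950_000), 1))  # 9.2
print(is_imbalanced(8_750_000, 950_000))                 # True
```

A tolerance is needed because even a healthy transit path rarely shows perfectly identical counters at any given polling instant.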
So now we're gonna look at how AI Advisor can help solve this problem. But first, I want to take a quick stop to introduce the new Configurations tab that is now present on the devices page. Yeah. This feature solves a problem that used to require SSHing into the device. Maybe you had to go through a jump server, or maybe you had your configs backed up in a GitHub repository, something like that. This puts them one click away instead. You can look at the entire configuration and scroll down through it. One of the things you'll notice is that the secrets are redacted out of this configuration when we scrape it, so the secrets actually never leave the agent. Another key advantage of having the configs in the UI like this is that you can see the differences in the configuration over time. If we hit the Diff button and go to, let's say, revision ten, this highlights the change we made to inject this problem into our demo environment. Right? This is a demo, and we're putting our finger on the problem here instead of asking AI to find it. But we want you to see the change we're injecting before we have AI try to diagnose the problem, just so you can see how this works. Cool. So let's go back. Again, what we just saw is the actual change causing this imbalance. But let's say we didn't find that, or didn't know to go look for it, because we didn't invent the problem ourselves a couple of days ago. Let's click up here on Ask to pull up the AI Advisor window, and we're just gonna start asking questions to help troubleshoot this problem. Right? Cool.
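Conceptually, a config diff is a line-oriented comparison of two stored revisions. This Python sketch uses the standard library's difflib.unified_diff on two hypothetical, trimmed-down IOS XE-style revisions; the policy-map name POLICE-10M and the rev9/rev10 labels are made up for illustration and are not the demo device's actual configuration.

```python
import difflib

# Two hypothetical, trimmed-down revisions of an IOS XE-style running config.
REVISION_9 = """\
policy-map POLICE-10M
 class class-default
  police cir 10000000
   conform-action transmit
   exceed-action drop
"""
# Revision 10 is identical except for the dropped zero.
REVISION_10 = REVISION_9.replace("police cir 10000000", "police cir 1000000")


def config_diff(old, new, old_label="rev9", new_label="rev10"):
    """Return unified-diff lines between two stored config revisions."""
    return list(
        difflib.unified_diff(
            old.splitlines(), new.splitlines(),
            fromfile=old_label, tofile=new_label, lineterm="",
        )
    )


for line in config_diff(REVISION_9, REVISION_10):
    print(line)
```

The only lines marked with - and + are the two police cir lines, which is the kind of one-character drift a diff view makes obvious and a scroll through thousands of lines does not.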
So we asked AI Advisor to take a look at the configuration on this device, a very logical, obvious place to start looking in a problem like this. The device is working. It's up. It's reachable. There's traffic flowing over the interfaces, and there are no packets being dropped that we can see. So already we're thinking it could be a configuration problem. But instead of having to go to a separate configuration tool and dig through lines of configuration on our own, we asked AI Advisor to look at it for us. Right, Jason? So what are we looking at now? Yeah. We asked a very general question: why is there a traffic imbalance between the GigabitEthernet2 and GigabitEthernet3 interfaces, given that this is just supposed to be a transit path? If we scroll through the output, we can see that AI Advisor is looking at the device configuration and doing some analysis. Scrolling down a little more, we can see that it's looking at the two interfaces, GigabitEthernet2 and GigabitEthernet3, and it says the critical difference between them is that one has a service policy attached. It sees that there's QoS running, and it sees that there's a policy map. If we scroll down just a little bit, the policy map is associated with that committed information rate, the police CIR, so a policing action is taking place. And here we can see that this means any inbound traffic on GigabitEthernet2 above one megabit per second is being dropped. In this case, we're trying to have a committed information rate of ten megabits per second. So as the network engineer, I would see that and think: hey, the committed information rate is one megabit per second. That's not what I was trying to do. That should be a ten meg circuit. Right? So it catches that.
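The cross-referencing described here (which interface has a service policy, and what CIR that policy polices to) can be sketched as a tiny config parser. This is a toy in Python, not how AI Advisor works: it assumes a simplified IOS XE-style layout, handles only the "police cir <bps>" form, and the config text and policy name are hypothetical.

```python
import re

# Hypothetical, trimmed-down IOS XE-style config (not the demo device's real config).
CONFIG = """\
policy-map POLICE-10M
 class class-default
  police cir 1000000
   conform-action transmit
   exceed-action drop
!
interface GigabitEthernet2
 service-policy input POLICE-10M
!
interface GigabitEthernet3
 no shutdown
"""


def police_rates(config):
    """Map policy-map name -> police CIR in bits per second."""
    rates, policy = {}, None
    for line in config.splitlines():
        if m := re.match(r"policy-map (\S+)", line):
            policy = m.group(1)
        elif policy and (m := re.match(r"\s+police cir (\d+)", line)):
            rates[policy] = int(m.group(1))
        elif not line.startswith(" "):
            policy = None  # any top-level line ends the policy-map block
    return rates


def policed_interfaces(config):
    """Map interface name -> CIR (bps) of the input service policy applied to it."""
    rates, interfaces, interface = police_rates(config), {}, None
    for line in config.splitlines():
        if m := re.match(r"interface (\S+)", line):
            interface = m.group(1)
        elif interface and (m := re.match(r"\s+service-policy input (\S+)", line)):
            if m.group(1) in rates:
                interfaces[interface] = rates[m.group(1)]
        elif not line.startswith(" "):
            interface = None  # any top-level line ends the interface block
    return interfaces


INTENDED_CIR_BPS = 10_000_000  # what the circuit is supposed to carry
for name, cir in policed_interfaces(CONFIG).items():
    if cir != INTENDED_CIR_BPS:
        print(f"{name}: police cir {cir:,} bps, expected {INTENDED_CIR_BPS:,} bps")
```

Note that the parser can only say the CIR is 1,000,000 bps; knowing it should be 10,000,000 still takes the engineer's intent, which is the judgment the transcript describes.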
And if we scroll down a little, we can see that AI Advisor has pointed out that the primary cause is the QoS policing applied to GigabitEthernet2, and only to that interface. It means that anything exceeding one megabit per second is being silently dropped. So AI Advisor is pointing to exactly where in the configuration the problem is that's causing traffic to be dropped. Awesome. And I'll just point out that it's also pulling up some other things in here that are just artifacts of this being a demo environment, where we created this situation so AI Advisor could help us find the problem. But tell me, Jason, think back to when you were a network engineer. How long would finding something like this have taken manually? Here you didn't have to go remember what the device is named in Kentik versus what it was labeled as in your configuration file. You didn't need to go to a jump server. You didn't need to do your two-factor authentication. You didn't have to open a lot of other screens. This just gets you to the answer a whole lot faster. It's not that this does something a network engineer couldn't have done; it just does it a whole lot faster. The time savings is really the magic in this feature. Yep. And it's also worth noting that I can now ask something like, okay, how do I fix this? AI Advisor is gonna figure out what kind of device this is and basically give me back some configuration options to fix the problem. So let's take a look at what it does. Here we can see that it's brought up three different options for how to fix it. Now, as the network engineer applying the fix, we need to apply a little bit of judgment.
Obviously, anytime you're using AI, you don't wanna just let it control your actions. You wanna apply some critical thinking. The first option it brought up is removing the policy entirely. That would make sense if we weren't trying to police the traffic in the first place. But if, in an enterprise situation, we're trying to use QoS to protect some Zoom traffic, or, in a service provider situation, we're trying to guarantee ten megabits of throughput, then option A isn't a good option for us. Applying the same policy to GigabitEthernet3 isn't going to change the traffic going through; that's just gonna police it on both sides. So option B doesn't really make sense either. But option C here is raising the police rate because one megabit is too restrictive. As network engineers who know we want a ten megabit circuit here, that immediately tips us off: this is the problem, and option C is the way we need to fix it. We need to raise the police rate from one megabit per second to ten megabits per second, adding back that missing zero. The magic here is that AI Advisor helped surface the problem very quickly. Just two quick questions. No logging into jump servers or anything like that. Just very quickly, from the platform, we're able to get an answer to solve the problem. So I opened up PowerShell, SSHed into the router, and reset the CIR for GigabitEthernet2 to ten megabits per second, so hopefully our problem is fixed now. I'm gonna ask AI Advisor to validate that for me. Now, there are a bunch of ways I could do this. I could wait for the configs to be scraped again and then dig through however many thousands of lines of configuration to find that specific spot. Instead, I'm gonna ask AI Advisor to do it.
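The reasoning for rejecting options A and B can be checked with a crude fluid model in which each policer left on the path simply caps throughput at its CIR (real policers also allow bursts). The model, the function names, and the 2x overload used to test enforcement are assumptions for illustration only; the labels paraphrase the three options from the demo.

```python
def egress_bps(offered_bps, policers):
    """Fluid approximation: each policer on the path caps throughput at its CIR."""
    rate = offered_bps
    for cir_bps in policers:
        rate = min(rate, cir_bps)
    return rate


def meets_intent(policers, intended_bps):
    """True if the path carries the full intended rate and still caps anything above it."""
    delivers = egress_bps(intended_bps, policers) == intended_bps
    enforces = egress_bps(2 * intended_bps, policers) <= intended_bps
    return delivers and enforces


INTENDED_BPS = 10_000_000
OPTIONS = {  # CIRs of the policers left on the path after each candidate fix
    "A: remove the policy": [],
    "B: apply the same 1 Mbps policy to GE3 too": [1_000_000, 1_000_000],
    "C: raise the police rate to 10 Mbps": [10_000_000],
}
for label, policers in OPTIONS.items():
    print(label, "->", "meets intent" if meets_intent(policers, INTENDED_BPS) else "fails")
```

Option A delivers ten megabits but stops enforcing the cap, option B keeps the cap at one megabit on both interfaces, and only option C satisfies both conditions.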
What it's gonna do is SSH into the device and run a show command so it can find the specific policy we were looking at again and validate that the change took place, which is what you see it asking me to do here. It asks me to approve the command first. That's one of the big things to call out: whenever it's running show commands on a device, it's always going to ask you to approve the command, so you always have the option to say, no, I don't want to do that, for whatever reason. In this case, I'll hit Approve. And what it's done is run a show command on the device. If I expand where I approved it, you can see the command it ran as well as the successful return: what the device actually sent back to AI Advisor. Now AI Advisor is gonna analyze what it saw. The key thing we want to see here is that the policy map CIR is confirmed at ten megabits per second and that the drop rate is trending down toward zero. It's probably looking at a slightly longer time frame than the fix has been in place, because we just did this, like, two minutes ago. That's why it's not saying things are one hundred percent back to normal, but this is the result we'd hope to see when applying that fix. Yeah. You can see where it's pointed out, too, that the drop rate has fallen by ninety-six percent from the original. So it's a big improvement from getting our fix in there. Awesome. And if we close this, go back, and set this to fifteen minutes to confirm, you can see that our outbound and inbound rates are now balanced and symmetrical, as we expected to see at the beginning. So thank you so much for taking the time to look at this demo. Jason, walk us through what we saw today.
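The approve-before-run pattern described here can be sketched as a small gate in front of whatever actually executes the command. This is a hypothetical Python illustration, not Kentik's implementation: the names are made up, it accepts only commands that start with "show", it conservatively rejects any pipe (some CLIs can pipe output into files), and it requires an explicit approval callback before calling the executor.

```python
from typing import Callable


class CommandRejected(Exception):
    """Raised when a proposed command is not allowed or not approved."""


def is_read_only(command: str) -> bool:
    """Hypothetical policy: plain 'show' commands only, no pipes."""
    cmd = command.strip().lower()
    return cmd.startswith("show ") and "|" not in cmd


def run_with_approval(
    command: str,
    approve: Callable[[str], bool],
    execute: Callable[[str], str],
) -> str:
    """Run `command` only if it is read-only and a human approves it."""
    if not is_read_only(command):
        raise CommandRejected(f"not a read-only show command: {command!r}")
    if not approve(command):
        raise CommandRejected(f"declined by user: {command!r}")
    return execute(command)


# Stand-ins for a real approval prompt and SSH session (both hypothetical).
output = run_with_approval(
    "show policy-map interface GigabitEthernet2",
    approve=lambda cmd: True,  # e.g. the user clicked Approve
    execute=lambda cmd: f"<device output for {cmd}>",
)
print(output)
```

In a real tool, execute might wrap an SSH library call such as Netmiko's send_command; keeping the allowlist check and the approval step ahead of it means a declined or disallowed command never reaches the device.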
I mean, we saw NMS and AI Advisor work together to look at configs and also to select and run show commands to confirm live device state. Why is this really impactful and cool? Yeah. We really looked at quality of service today, but this can help you troubleshoot any configuration-related issue, or any issue where you need to run show commands to get at the root cause. It lets you do that without needing to swivel-chair over to an SSH prompt, go through your jump server, authenticate, and do all that. It uses the username and password configured for the device and just makes troubleshooting a whole lot simpler. It's also really useful for, say, junior engineers, or if you have a lot of permission control and an engineer trying to troubleshoot something may not have access to a jump server or be able to SSH into the device. Maybe they're Juniper folks who don't know all the commands in Cisco, or vice versa. We heard from security folks, for example, that they're looking at twenty or thirty different vendors. They don't have time to memorize the command syntax for all those vendors' devices, but they can use a feature like this and just speak in plain language to ask the questions they want answered. AI Advisor is intelligent enough to see the make and model of the device and understand what command it should run to answer the specific question it's being given. And it keeps it all in context. Right? So, as we saw, you can go back and say, okay, I did that fix; did the thing we talked about earlier in the problem work? Or, that didn't work for me; find something else. Yeah.
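To see why vendor-specific syntax is a real burden, compare how the same question maps to different commands per platform. The Python sketch below is a deliberately naive, hypothetical lookup table (AI Advisor is described as reasoning from make and model, not using a fixed table); the platform keys and intent names are made up, while the commands themselves are standard IOS XE, Junos, and EOS show commands.

```python
# Hypothetical lookup table: (platform, intent) -> CLI command template.
SHOW_COMMANDS = {
    ("cisco_iosxe", "running_config"): "show running-config",
    ("cisco_iosxe", "interface"): "show interfaces {interface}",
    ("juniper_junos", "running_config"): "show configuration",
    ("juniper_junos", "interface"): "show interfaces {interface}",
    ("arista_eos", "running_config"): "show running-config",
    ("arista_eos", "interface"): "show interfaces {interface}",
}


def command_for(platform, intent, **params):
    """Translate a vendor-neutral intent into a platform-specific show command."""
    try:
        template = SHOW_COMMANDS[(platform, intent)]
    except KeyError:
        raise ValueError(f"no command known for {intent!r} on {platform!r}") from None
    return template.format(**params)


print(command_for("juniper_junos", "running_config"))  # show configuration
print(command_for("cisco_iosxe", "interface", interface="GigabitEthernet2"))
```

A table like this breaks down quickly across twenty or thirty vendors and every question an engineer might ask, which is the gap a plain-language interface is meant to close.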
And that's the beauty of having all this data from all these different sources on the same platform: not just what NMS is pulling in from SNMP, gNMI, syslog, and traps, but also the NetFlow information Kentik is pulling in for traffic analysis, as well as the performance data coming from synthetic testing. Yeah. Awesome. So really cool stuff, and more to come. Please follow the blog, because we've got some really exciting new things coming up in the near future. With that, I think we can wrap up this demo. Thank you so much, Jason. Thanks for your time.

A live walkthrough of Kentik’s new AI diagnostic capabilities in Kentik NMS. Senior PMM Eric Hian-Cheong and Principal Product Manager Jason Carrier troubleshoot a realistic QoS misconfiguration — a single missing zero in a CIR setting that throttles a 10 Mbps circuit down to 1 Mbps — using two new capabilities in Kentik AI Advisor:

Config Context: Backups and Diffs — periodic SSH scrapes of running configurations, stored as backups inside Kentik NMS, with revision-by-revision diffs surfaced directly in the UI. Secrets are redacted at the agent so they never leave the device.
SSH Command Access: On-Demand Device Diagnostics — AI Advisor proposes, runs, and interprets approved, read-only show commands on supported devices, with user confirmation before each command runs.
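The Config Context description says secrets are stripped before the configuration leaves the agent, but does not say how. The following Python sketch only illustrates the general idea with a few hypothetical regular expressions for common IOS-style secret lines; it is not Kentik's redaction logic and is far from exhaustive.

```python
import re

# Hypothetical patterns for a few common IOS-style secrets; a real agent would need many more.
SECRET_PATTERNS = [
    re.compile(r"^([ \t]*enable (?:secret|password) (?:\d+ )?)\S+", re.MULTILINE),
    re.compile(r"^([ \t]*username \S+ .*?(?:secret|password) (?:\d+ )?)\S+", re.MULTILINE),
    re.compile(r"^([ \t]*snmp-server community )\S+", re.MULTILINE),
]


def redact(config: str) -> str:
    """Replace secret values with a placeholder, keeping the rest of each line."""
    for pattern in SECRET_PATTERNS:
        config = pattern.sub(r"\1<redacted>", config)
    return config


SAMPLE = (
    "enable secret 9 $9$abc\n"
    "username admin privilege 15 secret 9 $9$xyz\n"
    "snmp-server community s3cr3t RO\n"
    "interface GigabitEthernet2\n"
)
print(redact(SAMPLE))
```

Redacting on the collecting side, before anything is transmitted or stored, is what allows the claim that secrets never leave the device; redacting later in the UI would only hide them from view.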

The scenario is one every network engineer has lived through: a single dropped zero in a QoS policy on a Cisco Catalyst 8000V silently throttles a 10 Mbps circuit to 1 Mbps, dropping ~90% of traffic. Eric and Jason show how AI Advisor spots the traffic imbalance, pinpoints the policy map and committed information rate as the cause, proposes three candidate fixes (and explains why only one is right), then validates the fix by running an approved show command directly from the investigation thread.

No swivel-chair to a jump server. No two-factor dance. No copy-pasting CLI output back into a ticket. Everything stays in the same workflow alongside alerts, flow, synthetics, SNMP, gNMI, syslog, and traps — closing the gap between detection and diagnosis on the Kentik Network Intelligence Platform.