Anyway, so I live in Maine. I enjoy hiking. I have a dog, no kids, and a beautiful wife. I'm a solutions engineer, and I've been working with NetFlow data for over a decade, specifically flow data and SNMP and network metadata. This presentation is gonna be kind of a personal anecdote about why I really like Kentik's approach to what they're doing with flow data, and specifically synthetics, and we'll kinda see that through the presentation. And then, of course, my favorite programming language is the best programming language, which is Python. Alright. So, agenda: why we love flow data. I've been working with it for over a decade, so I'm gonna explain why I like it so much, and obviously Kentik does as well. Then we'll implement NetFlow on a sample network, investigate a user complaint, and then come to this problem that I kept running into working at a couple other vendors, one that just always came up. We didn't really have an elegant solution to it, and I really do believe Kentik does. So I'll cover that. It's basically gonna be a case for synthetics. And then I'll do a quick demo of how this looks in the product. Okay. So why do we love flow data? The biggest reason I like it is low friction. You've already invested in this network equipment, you own it, and you're really just turning a protocol on, right? So implementing it and getting up and running is quick, and as a vendor specifically, I always liked being able to get to a POC quickly. I've worked for more on-prem, legacy kinds of collectors where we'd have to install an appliance, versus just turning flow data on, sending it, and being up and running. So I really like that from just a protocol standpoint.
You bought the equipment, you might as well use the protocol, and it gives you visibility across the network. And it's really stood the test of time. Vendors are still implementing new stuff; the protocol keeps getting a whole bunch of different metrics added into it. And then, of course, Kentik, as you've heard from the other presenters, can enrich the data and make it better. So that's kind of the background and why I like it. This is gonna be our sample network. Not that big of a network; I went small because it's easier to see on a big screen. There aren't even any firewalls at the branches or anything, but this is what we're gonna use for our examples. And we'll do just a quick refresher. I think sometimes we glance over some stuff, and I know this is gonna be presented out to a large audience later. I'm sure everybody in the room knows how NetFlow works, but I'll just go over it a little bit. So we take that network, we go to our router, which is just the first hop in from the internet, and we're gonna put a flow monitor right on the ingress interface, which is gonna be that green arrow. I probably could have used slightly different colors than green and blue, but the inbound is the green, right? And then I have another inbound flow monitor on the data center side, observing traffic coming back the other way. So what we get with NetFlow data, as traffic goes into that router, is things like the source IP address of the host, the port, the interface it came inbound on, the protocol, the destination it's heading to, and the port it's heading to. And then really the point I'm gonna be expanding on here is you get the amount of bits and you get the packets. More specifically, it'd be the octets, the octet delta count and the packet delta count, but for simplicity we'll say number of bits, number of packets.
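Since I keep coming back to those octet and packet delta counts, here's a minimal Python sketch of what a single flow record carries. The field names loosely mirror IPFIX information elements like octetDeltaCount and packetDeltaCount, but this is illustrative, not an exact template:

```python
from dataclasses import dataclass

# A minimal sketch of the fields one flow record carries. Field names
# loosely mirror IPFIX elements (octetDeltaCount, packetDeltaCount);
# this is illustrative, not a real NetFlow/IPFIX template.
@dataclass
class FlowRecord:
    src_addr: str           # source IP of the host
    src_port: int
    dst_addr: str           # destination it's heading to
    dst_port: int
    protocol: int           # e.g. 6 = TCP, 17 = UDP
    input_interface: int    # ifIndex the traffic came inbound on
    octet_delta_count: int  # bytes observed in the flow
    packet_delta_count: int

    @property
    def bits(self) -> int:
        # "number of bits" as discussed above: octets * 8
        return self.octet_delta_count * 8

rec = FlowRecord("10.1.1.25", 51514, "203.0.113.10", 443, 6, 2, 12_000, 10)
print(rec.bits)  # 96000
```

Each row that lands in the database is essentially one of these records, plus whatever enrichment gets bolted on at ingest.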
And then Kentik's doing a whole bunch of awesome stuff at ingest where you can get business context, like Justin was saying. We look up IP addresses to do GeoIP and host reputation, or correlate them to autonomous system numbers. So there are all these other columns in the database that you kinda get for free (well, if you're paying for the license), but you get at ingest, and it builds out each row in the database. So that's just generic flow data and how it works. And then if we were implementing this in a network, I just wanna highlight the difference when you're thinking about this stuff. In this path here, it's just a switch where I'm going back and forth into an application, but maybe that application has a front end and a back end, right? So if I just had flow monitors on the router, and the front end was talking to the back end through a switch, and I didn't have flow monitors on that switch, I would never see that traffic. Flow data is really all about observation points. Wherever you put the monitors, that's what you're gonna get visibility into. A lot of times I've worked with network engineers that need help because, you know, network engineers know a lot of things about a lot of things, but when you're very specialized in a protocol, stuff like this is what I find helpful to go over when you're doing a 101 on flow. Just thinking about how traffic moves through your network will help you understand where you wanna apply flow monitors. All right, so in our network, what we're gonna do is turn NetFlow on everywhere: every switch, every router. We're gonna turn flow data on, and that's represented with those arrows. And now, in branch A, I could see the top host communicating to the bottom host through the switch.
Right? I could see a host communicating with the application. I could see the application communicating back and forth. And this gives me the old adage for NetFlow, which is who, what, where, when, and how much. Who is the IPs. What is the ports and protocols. Where is the interface or the device that traffic flows through. When is the timestamp of when it happened; you get timestamps in that flow data. And then how much is really what I'm gonna drill into here. How much traffic is really what flow data is meant for, but it can also put you in a position where you can't answer a question. So what does that mean? We've probably all been in this situation before: application's slow, it's the network, right? The title of the slide is "It's always the network," so we kinda start here. So I have NetFlow enabled everywhere, and you can't really see the subtext in this picture that well, but basically what this is showing is an interface that was maxed out, and I can see that in the flow data, because I get how much traffic. So I can say, yeah, it looks like the link was maxed out. And if you just say that to the boss, to the person complaining, what are they gonna ask you? Maxed out by what? And because it's flow data, we have IP addresses. Because you're using an awesome solution like Kentik, we're enriching that with autonomous system numbers. You may be getting NBAR from Cisco devices or some Palo Alto application recognition. So we can quickly pivot. And again, you can't see it, but it's Netflix. So we could say, oh, it's Netflix traffic that's coming in and maxing out the link. And then this guy can go yell at somebody else, because we could see who the IP addresses were. So that's all well and good. That's a great use case that flow data can solve for us. Right?
So in the rare case that it actually was the network, given our little refresher on flow data, what can it help us spot? Well, we can see if an interface is maxed out. We could spot something like a DDoS, because I'm gonna have a bunch of IP addresses coming inbound to our router, and I can see that in flow data volumetrically. I could potentially see routing issues if I'm looking at all of the hops in the path and I see things changing around. QoS changes: you'll typically get QoS or ToS in flow data, so I can see that. And then BGP enrichments, like Justin was talking about: that's not really in NetFlow, but because we're talking about Kentik, we'll include it here. You could see traffic getting blocked; Amazon will give you firewall events, like things getting blocked, and firewall vendors usually include that in their flow. And there's a bunch of other stuff. What I do wanna highlight is that flow data isn't, to me anyway, just for troubleshooting user complaints. There's a whole bunch of different things you can do with it: somebody asks who's connecting to an application because they wanna migrate it, or just capacity planning stuff. But when we're talking about troubleshooting an issue, these are some of the things we can spot. And what I wanted to say, and this was kind of a personal thing for me, is that at some of the vendors I worked for, we'd be doing demos or even working with customers and you'd get into this situation: same as before, application's slow, network's broken, because again, it's always the network, and then you look and everything's normal. All the interfaces are normal. Nothing looks maxed out. Nothing looks weird. Everything just looks fine.
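To make the volumetric DDoS point concrete, here's a rough Python sketch of the idea: many distinct sources converging on one destination stands out in flow data even before you look at bits. The record fields and the threshold are made up for illustration; this is the shape of the idea, not a real detection algorithm:

```python
from collections import defaultdict

# Hedged sketch: spotting a volumetric pattern (e.g. a DDoS) in flow data
# by counting distinct sources per destination. Threshold and field names
# are illustrative, not a production detection method.
def distinct_sources_per_dest(flows):
    seen = defaultdict(set)
    for f in flows:  # each flow: {"src": ..., "dst": ..., "bits": ...}
        seen[f["dst"]].add(f["src"])
    return {dst: len(srcs) for dst, srcs in seen.items()}

flows = (
    [{"src": f"198.51.100.{i}", "dst": "10.0.0.5", "bits": 4000} for i in range(50)]
    + [{"src": "10.1.1.25", "dst": "10.0.0.9", "bits": 4000}]
)
counts = distinct_sources_per_dest(flows)
suspect = [dst for dst, n in counts.items() if n >= 25]
print(suspect)  # ['10.0.0.5']
```

Fifty distinct sources hammering one host looks very different from one host streaming Netflix, and flow data gives you exactly the columns to see that.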
And then the next question that a customer would typically ask me when they're looking at this is, what would they wanna see? Latency, packet loss, jitter. They'd say, oh, can I see the packet loss, the jitter, the latency for this? And that's when you go, well, that's not really included in NetFlow data; you're just getting volume of traffic. So you're kind of stuck in that spot, and then you have to make some decisions. If you get to the point where you've done everything you can do with NetFlow data, then you've gotta determine what's next. If I wanna start troubleshooting performance issues, where do I go from here? And there's a bunch of different options you can choose from. Packet capture. There actually are ways to get performance data out of NetFlow; Cisco does Performance Monitor, which I'll cover a little bit. You can span traffic to a NetFlow generation appliance. You can use TWAMP, SNMP, or host agent based solutions that are actually looking at an application. And then what I'm gonna do is make a case for why synthetics is, I think, a really good first pass at this. So we'll cover a few of these. I can use packet capture, where basically I'm tapping a line and looking at all the packets going through that line, and it gives me incredibly high fidelity data because I'm actually looking at the packets. To get something like network delay, what it's really doing to calculate it is this: from branch A, that top laptop sends a SYN packet to connect to the server. As the SYN packet crosses the tap, we start a timer, and then the server's gonna SYN-ACK back. So between the time I observe the SYN flag and the time I see the SYN-ACK, I get my server side network delay. How long after the SYN do I see the SYN-ACK?
And then that goes back to the client, the client ACKs back to close the three way handshake, and then I get the client network delay: I saw the SYN-ACK, so how long after that does it take for the client to respond? And I get some performance metrics. And again, there's a whole bunch more that packet capture's gonna do to help diagnose these things. But at the high level of "is it the network?", when I'm using PCAP in this sense, I can infer that if I saw client performance issues, then in this case the problem is to the left of the packet capture. But I wouldn't know which particular hop it was happening on or where on the network. I would just know, yeah, we've got some client performance issues, and it's to the left of the packet capture box. Or if it was a server issue, I'd know it's to the right. Again, this is a small network. So the problem becomes that if you wanna scale this out, it becomes a higher friction implementation and can be costly, because you're tapping more links and you have more packet capture boxes. The bigger the network, the harder it becomes to scale, which gets me away from what I really like about flow data, which I said is low friction. You can just turn it on. It's easy. You already own the technology. You don't have to go tap these links and do all this stuff. So you could also use flows with Performance Monitor details. Did anybody actually set this up before? Performance Monitor? Yeah. It's not great. I've done the configuration, because, again, I'm super specialized in this protocol, and it's very proprietary to Cisco from what I've used.
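The SYN / SYN-ACK / ACK timing I just walked through can be sketched in a few lines of Python. The timestamps are in milliseconds and made up; a real PCAP tool matches packets to a TCP session before doing this math, and does a lot more besides:

```python
# Sketch of the handshake timing described above, with timestamps in
# milliseconds as observed at the tap. Server-side network delay is
# SYN -> SYN/ACK; client-side delay is SYN/ACK -> ACK.
def handshake_delays(t_syn_ms, t_synack_ms, t_ack_ms):
    server_delay = t_synack_ms - t_syn_ms   # how long after the SYN we see the SYN/ACK
    client_delay = t_ack_ms - t_synack_ms   # how long the client takes to ACK back
    return server_delay, client_delay

# e.g. SYN at t=0, SYN/ACK at t=42ms, client's ACK at t=57ms
srv, cli = handshake_delays(0, 42, 57)
print(srv, cli)  # 42 15
```

That split is exactly why the tap's position matters: the server delay is everything to the right of the capture point, the client delay everything to the left.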
I never really trusted the data that much, but it was a cool idea where basically the routers would do tracking of sessions, and then they'd put in statistics like latency and round trip time and things like that. But you run into a similar problem, where you have to have the right hardware: it's typically only gonna run on routers like ASR 1Ks and things like that. And then also, you have to have those routers everywhere to see where the problem is, because if you're only observing at this router, then you know the client's performance issue is to the left of the router, but you don't know what hop it's at. So hopefully the other routers can do Performance Monitor too. But again, I never really liked it as a solution. You're kind of locked into a box with the vendor, both from the reporting side and because you have to have a vendor that can actually report on this IPFIX data, because it uses unique elements. So it is a solution and it definitely can work for people, but I never really liked working with it professionally. And this is what I led with at a vendor, because we didn't really have a better option. This is how we went, and I never really liked it. And I'll just cover one more. You could span traffic instead of tapping it, so you send all the packets into a SPAN device, it starts generating NetFlow data out, and you can get more metrics with it, but again, you kind of run into the same issues. And I always found it hard, if you're spanning multiple interfaces into a single egress into the SPAN device, to determine what interface is actually seeing the issue, because it all looks like one interface. I think that's probably been solved with other SPAN devices since I've worked with it, but that was something I never liked. But, anyway, blah blah blah. Right?
So you can troubleshoot this in all kinds of ways, but if we really think about it, a user calls up and everybody's complaining that the network's slow. Where do we actually start? What's the first thing we do? Oh, you buy Kentik, of course. Think about it, even at your house. You wanna see if you can get to the Internet. What's the first thing you're gonna do? Ping. Right? We go to our caveman tools. We run a ping. We run a traceroute, and then we open up the website and see if we can get to it. Those are the three things we do, probably even before something like Kentik. Everybody's complaining, so can I ping the boxes? Is it up? We're gonna do a ping. We're gonna do a traceroute. And then whatever the application is, we're gonna see if we can log into it. Works for me. And then maybe you have a jump box that you're SSHing into so you can test from closer to the client that's complaining, so you can see it from there instead of from where you are. But ultimately, a lot of the time, you get the complaint, people say, yeah, the network was slow, you go run all this stuff, and it works for you. And then you're just like, man, I really wish I'd invented that TCP time machine thing so I could go back in time to when the users were actually complaining, run these traceroutes, and see if there was an issue or something to correlate it with. Right? So this is why I really like Kentik's approach with synthetics, and I'm gonna make the case for it. With synthetics, it kind of continues the low friction implementation. It's a little higher friction, for sure, than straight NetFlow data, because you do have to install agents.
What's really nice, and I've seen some customers do this, is if you're running Arista or Cisco boxes, they allow you to install Docker containers on them now. So I've had a lot of customers have success installing our agent in Docker directly on the Arista switch. You don't have to go have a server guy deploy the agent for you. Either way, it is higher friction, but it's not like you're tapping a line and putting huge appliances everywhere. It's a two CPU, two gig of RAM, lightweight little box that you're installing. And if we consider that we're collecting NetFlow from everywhere, and then we go to each of the branches and deploy these synthetic agents, we can create what we call mesh tests, which I'll show in our example. I really like these for a few reasons, one of which is that if you own Kentik, you get two point five million synthetic credits free. We give them to every customer, and it's usually enough to set up a mesh test. So it goes along with that same mentality: it's something you already own, so just use it, kinda like NetFlow data. You just turn it on and you're using it. So I'm gonna cover mesh tests and show you some other tests you can do, but I really like these as an elegant way to get more information about what's going on. And the other cool thing is, where before we were looking at "it's kind of to the left or to the right," with synthetics, since we're running traceroutes, that box that was just the internet before gets expanded out, and I can see where through the internet the traffic is theoretically going. Again, it's not the actual traffic, it's the synthetic traffic, right?
And like I said, you can go beyond layers three and four with things like page load tests, like Justin was saying, or transaction tests, like Phil was talking about, but I'm gonna stick to three and four for our example of why the network is slow. And this was a light bulb moment for me when I started working at Kentik. I was like, oh my god, synthetics makes so much sense, because everybody was always asking us: I wanna see jitter, latency, and packet loss. And when you blend that in with flow data, you can see all of the traffic, and then you get these synthetic statistics. You get a really, really good high level picture that will generally point you in the direction of the problem. Sometimes, like with the Netflix link getting maxed out, you get lucky, quote unquote, and you get root cause right away. But synthetics, in my opinion, is really good for at least pointing you closer to the problem when everything looks normal. Right? So let's do a demo. Here we are. We have our synthetics dashboard. What I'll do, just because we have a little more time than I thought we'd have, is go to agent management first and show you what this looks like. Like Justin was saying, and this is a little repetitive of what he said, any customer that owns Kentik and wants to go completely no friction, without even installing agents, has access to our global agent network. This is a combination of agents deployed in different service providers, in AWS or Azure, and we even have Kentik employees everywhere that have taken Raspberry Pis, because the agent supports ARM. So you just throw it on a Raspberry Pi, they sync it up, and it's used as a global agent, and Kentik gives them a reimbursement for Internet or whatever.
So we're pretty geographically dispersed, and it gives you some broadband type statistics. You have access to all of these. And then you have private agents, because global agents are on the internet, so they're not gonna be able to communicate with your private IP space. If you want that, like in our slide example, you'd be deploying a private agent, and for setup you can do it on Debian, RPM, or Docker. So those are our three deployment options. Docker is probably the most popular way I see people deploying it, but a lot of people do Linux as well. Either way, you have this combination of different agents to choose from. All right, so let's go to our example here. Actually, you know what I'll do real quick? What I'm gonna cover here is the mesh test. Again, this was our example of having an agent at different branches and then in the data center, and that map was small, right? It could also be within your Amazon infrastructure, where you have different agents deployed in different VPCs, or you're a multi cloud environment and you have agents deployed in all your different clouds. A network mesh test is a way you can come in and simply select the agents that you want to connect. I'll select four for our example, because we have four, and then you can give this test a name. I'm not gonna actually save it, but: Brian is getting fired. Just kidding. Actually, let's change that: promote him. Just kidding. Okay. So here, what you'll see is this test is gonna use one point three million credits for these four agents. You get two point five million for free, so you can see you could probably add a couple more agents. But the way the credit system works with Kentik is it's driven by the number of agents, which really means the number of tests. So here, agent one is doing a test to two, to three, and to this one, right?
So this agent is testing to these three. This one is testing to these three. So really it's four agents each doing three tests, which is twelve tests actually running. The number of tests impacts credits, and then the frequency also impacts credits. So if I run this every minute, I'm gonna use one point three million credits, but you'll see that if I change this to every two minutes instead of every minute, I use half the amount. So that's what I was saying with two point five million credits for free: you can set up a pretty big mesh and just use it as part of the product. And then if you really like it and you wanna start adding more tests, you can talk about actually paying for more synthetics, but it's a really nice way to get some more data. And then these settings, I could go into detail on them, but they're basically how we build the standard deviations for what's normal and then create alerts. And then you have different options: do you wanna ping with TCP or ICMP? You can pick a port. And for traceroute you have options where you can pick a port and things like that, so if you wanna mirror the port that some traffic would use, you can gear it more towards that. So you have some knobs you can turn under the advanced options. If you have hierarchy in your network, you can cut down on the mesh size and presumably be a little bit smarter about it. Absolutely. Yep. That's a great point. So anyway, that's the test setup, selecting a mesh test, but you can see you can do a bunch of other tests in here, like page loads or transactions, which cover more of the stack. But let's get into my example of why I like... Before you leave that screen, do you have a wizard where you can say these are my two data centers and then everything else tests to them? A hub and spoke test. Yep.
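As a back-of-the-envelope check on that mesh math (four agents, twelve tests, half the credits at half the frequency), here's a Python sketch. The per-run credit cost is a number I picked so the totals land near the one point three million from the demo, so treat all the constants as illustrative, not Kentik's actual pricing:

```python
# Hedged sketch of the mesh-test arithmetic from the demo. The per-run
# cost (2.5 credits) is reverse-engineered to match the demo's numbers
# and is purely illustrative.
def mesh_tests(agents: int) -> int:
    # full mesh: each agent tests every other agent
    return agents * (agents - 1)

def monthly_credits(agents: int, period_minutes: int, per_run_cost: float) -> float:
    runs_per_month = (60 * 24 * 30) / period_minutes
    return mesh_tests(agents) * runs_per_month * per_run_cost

print(mesh_tests(4))                      # 12
print(round(monthly_credits(4, 1, 2.5)))  # 1296000, roughly the 1.3M in the demo
print(round(monthly_credits(4, 2, 2.5)))  # 648000, half the frequency, half the credits
```

The point is simply that credits scale linearly with both test count and frequency, which is why doubling the test interval halves the bill.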
Yeah, that would be more like a network grid test, where you'd say these are the endpoints and everything tests into those endpoints, instead of everything testing everything. You can certainly do that. Or you could do straight agent to agent. You can basically configure the test to work in any different way. What we're developing now is more like, well, the transaction test was the latest thing we released, which is: log in to this website, put these credentials in, submit this form, and then follow each individual transaction. It's interesting to me because from network engineering, which is where I've really worked, that gets a little more toward DevOps; it's blurring the lines there a little bit. So you have to have a network engineer who really wants to prove it's not the network and show what the actual thing is. But I think it's a really powerful tool. Anyway, I don't really know where I was going with that. Just trying to kill seventeen minutes here. Okay. So anyway, we have this synthetics dashboard, and this is the actual demo we're gonna jump into. We get an incident log, which Phil went over: basically, if you have things exceeding your baseline, and you can tune those, you can be more proactive, and everybody wants to be more proactive, but we still find ourselves troubleshooting. So then here are all the meshes that are set up. You can break the meshes up; it doesn't have to be one giant mesh, kinda like what Peter was saying. You can be pretty strategic. You could have a bunch of smaller meshes if you wanted, a cloud mesh or an on prem mesh or however you wanna set it up. But what I really, really, really like about this is when you view a particular connection (this should be drilling into an alarm, but I'm just gonna drill into a mesh connection here). Right?
What you're gonna see, dramatic pause, is your average latency, packet loss, and jitter, and you've actually seen this in the demos previously. But up top here, what Kentik's gonna do is this: within Kentik, you group the devices you're exporting flows from into sites. So we know that this agent is at this site and this agent's at this site, and we're gonna overlay on the synthetic test the actual NetFlow traffic that we see, the actual flow data that's happening in real time, minute by minute, across the network. So here, again, it's not much traffic in this example, it's like five bits per second, a really small amount, but here's your actual traffic that we're seeing from NetFlow data, and right below that is the actual packet loss, jitter, and so on. So here we can see we're getting a bunch of packet loss, and it kind of shows the example of TCP being really good at keeping a connection going. You could be looking at NetFlow data while a bunch of packet loss is happening somewhere, and it just looks normal. It's like, yeah, it's all just flowing along normally. I don't really see anything out of the ordinary, maybe a little dip here that could be a little sus. That's not really giving you the answer. But then, when we're trying to answer the question, is it the network or is it not the network, I can start to see, yeah, I definitely am actually seeing packet loss. And because this is time series, where before, when I'd get that complaint, do a ping and a traceroute, pull the website up, and everything works, now I can go back in time and see this. These are tried and true technologies that I know we're all using every day, and we're just putting them alongside flow data, which is a super elegant way to do this.
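The baselining I mentioned on the test options, building standard deviations for what's normal and then alerting on deviations, can be sketched like this in Python. The history window and the three-sigma threshold are illustrative knobs, not Kentik's actual alerting policy:

```python
import statistics

# Hedged sketch of baseline alerting: flag a synthetic measurement when it
# sits more than k standard deviations above what's been normal. Window
# size and k are illustrative, not Kentik's actual policy.
def is_anomalous(history, value, k=3.0):
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    # require some variance in the history so a flat baseline doesn't divide weirdly
    return stdev > 0 and value > mean + k * stdev

loss_history = [0.0, 0.5, 0.0, 1.0, 0.5, 0.0]  # % packet loss per test run
print(is_anomalous(loss_history, 12.0))  # True
print(is_anomalous(loss_history, 0.5))   # False
```

The useful property is that the threshold adapts per path: a link that normally jitters a little won't alert on its usual wobble, but a quiet link will alert on a small change.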
And then, because we're running traceroutes, you get the path view changing over time. In this case, these devices are all on prem devices that we're actually getting NetFlow data from in this particular graph. And you can see, if you click into any of them or hover over them, because we're getting flow data and because we're getting SNMP data, when we zoom in we'll see at a glance: this is the CPU utilization of this device, this is the memory, this is the interface the traffic's coming in on. And actually, this one we're not getting flow data from, it looks like, because there's none here. So here's a better example, and actually that's a good point: you could have packet loss happening on a device that you're not getting flow data from, and you would never see it. But in this case, we're collecting flows from this device. You can see how it looks different where this one says none, and here I can see the traffic coming inbound, I can see how much traffic it is, and again I get my CPU data, my memory data, and all that good stuff. And going a little to the right, I can see that in that particular site to site connection where packet loss is happening, I can now confirm the users aren't just complaining. We actually do have a network issue here. So in this rare instance, it was the network, and I can see the device on the network where I'm seeing loss from my synthetic tests. So I go from the view I'd have if I was only using flow data to this additional "yeah, we're actually seeing this," and I can maybe SSH into this device, start looking at its logs, and start to unravel what's actually happening here.
And if this particular test wasn't just on network devices, I'd be able to see all the hops through the internet that the traffic was taking, maybe to get into the data center, so I could see off network stuff. Or, like Justin was saying, testing from the inside out as well. On that previous page, the flow data looked awfully symmetrical. I understand it's probably test data, but is the flow data you're looking at on that page limited to just the data that pertains to that connection? Yeah, so that's what you kinda think it is when you look at it, but it's not. It's not filtered on just the test. Good. I wanted to make sure, because that wouldn't make sense. Yeah, but that's what I thought when I first looked at it. I was like, well, that's kinda weird, and the only reason I thought that is because it's such a small amount of traffic. But now, if we click into a view in Data Explorer here, what you'll see it's actually doing is adding the filters saying, show me all the traffic from this site to this site, because one agent is at this site and one agent's at this site. And then we could say, show me the source and destination IP addresses that are communicating, maybe with the protocol, and you can do source and destination ports or something like that. This is where that unbounded ability to do queries, like Phil was talking about earlier, really comes into play when you're trying to do this troubleshooting, because very quickly, if you add a few more dimensions, you get a lot more information about what's actually making up that traffic pattern. But like you said, it's just test data, so it's just something to show, but it's not just the synthetic traffic. It would show you all of the traffic going through there. And not too bad, ten minutes and fifty seconds here. Anybody got any questions?
From the time-series data, how far back can you actually go? I'm not sure; I'm gonna leave that one. Close to forever. So there are two different data sources that we're actually storing here in Kentik. Whenever we ingest a flow record, we create two tables: one's called the fast data series, the other is the full data series. The full data series is, as the name implies, everything: every record, with all the source and destination ports, protocols, all that type of data, all the enrichment we've been talking about today. That's all stored in the database by default for forty-five days, so you can go back and look at that full dataset for forty-five days. Then we have the fast dataset, where we do a little bit of aggregation: we downsample it to thirty flows per second, we aggregate the ephemeral source ports, a few things like that, to compress it down to a smaller size on disk. That we store for, I believe, one hundred and twenty days by default now. Both of those numbers, forty-five and one hundred and twenty, are the defaults that come with the packages, but we have SKUs on the price list for customers who want to store it longer. We have some customers who want to store their full data for a lot longer period of time, and customers, especially when they're looking at capacity planning and the trends Phil was talking about earlier, who want to look at regressions over multiple quarters, where we'll store the data for multiple years. What about synthetic results? Synthetic results, I believe, the last time I asked that question, were unlimited, but that's because we've only had the synthetics product for a year and a half, so I don't know what we'll actually do. It's not a huge amount of data. Yeah, so, I think nothing's unlimited, right? But it's as long as you'd probably need it. Here, for example, I can go back thirty days and see it for synthetics. Did that answer the question? Yeah.
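One of the rollups described above, aggregating ephemeral source ports in the fast dataset, is easy to sketch. This is a conceptual illustration of the idea, not Kentik's actual implementation: the field names and the 32768 ephemeral-port cutoff are assumptions chosen for the example.

```python
# Conceptual sketch of a "fast dataset" rollup: zero out ephemeral source
# ports so many near-identical records collapse to one key, then sum counters.
# NOT Kentik's real implementation; fields and cutoff are assumptions.

EPHEMERAL_START = 32768  # common OS default for ephemeral ports; an assumption

def rollup(flows):
    """Aggregate flow records after zeroing ephemeral source ports,
    summing bytes and packets for records that share the resulting key."""
    agg = {}
    for f in flows:
        src_port = 0 if f["src_port"] >= EPHEMERAL_START else f["src_port"]
        key = (f["src_ip"], f["dst_ip"], f["proto"], src_port, f["dst_port"])
        b, p = agg.get(key, (0, 0))
        agg[key] = (b + f["bytes"], p + f["packets"])
    return agg

flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.1.0.9", "proto": 6,
     "src_port": 51000, "dst_port": 443, "bytes": 1200, "packets": 3},
    {"src_ip": "10.0.0.5", "dst_ip": "10.1.0.9", "proto": 6,
     "src_port": 52001, "dst_port": 443, "bytes": 800, "packets": 2},
]
print(rollup(flows))  # the two records collapse into one aggregated row
```

This is why the fast series is smaller on disk but the full series is the one you reach for when the investigation hinges on an exact source port.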
No, that was fine. I had a VP at a previous company who had a habit of telling us that she'd had a horrible meeting experience, like, three days ago, or a week ago, or something like that. So that's really all I was looking for, and I'm glad it's not, like, forty-eight hours or something; it's more substantial. It's always an interesting thing with keeping data, because if you ask people, especially infosec people, how long they want to keep data, it's forever, right? But in reality it's: can I come in on a Monday and see what happened on Friday? Then I'm typically in pretty good shape. Though for compliance and stuff like that, you can definitely purchase the ability within Kentik to store data longer. So, a couple of years ago, I was able to, purely by accident, sort of intercept a CRAC failure in a data center, because I manually came across the sound level increasing in the data center. The fans were failing, so they were getting louder; the temperature wasn't going up, because it was still cooling, but the unit was having to work harder. When I looked at the graph, I said, oh, the decibels keep going up; why is this happening? And then we caught it. Is there a way, just within the Kentik platform, because I saw there's SNMP monitoring in there, and we're talking about trending, and there was the comment earlier about, oh, it's one meg, now it's one and a half, is that a problem? As long as it's pollable by SNMP, could you collect that data and alarm on that change in sound? Oh, Corey's watching right now; we have a guy who's a super SNMP guru who would probably love this question. I don't know; I think we could talk about that. So, today, most of our SNMP metric collection is around interface metadata: names, descriptions, capacities, that kind of stuff.
As you saw in a few of the demos, we collect some CPU and memory as well, but we're not a full SNMP metrics platform today. That's actually one of the things we're investing in right now: building out a much broader SNMP metrics platform, because you're right, those types of failures can have an effect on the performance of the network, or can give you a leading indicator that you're going to have a failure in your data center. So it's not something we can do today in the platform, but it's definitely an area where we're investing in doing more with the SNMP data. And for sure, once we have that data in the system, we already have the underlying algorithms to look at it. Are we seeing an increase in fan speed over time, to use Phil's example of one meg on an interface growing to one and a half meg, something going from ten percent fan speed to twenty to fifty to seventy-five percent? We can give you an insight saying, hey, here's a fan that's starting to spin up faster over time. That's actually not that difficult to do by just looking at the trend in the data; it doesn't require a lot of machine learning. When you start talking about predicting failure, what about optical power levels? Yeah, it's funny you say that; one of our customers has actually brought that up a number of times. It's not something we've looked at historically. The optical layer, the physical layer, we're usually focused more on the Ethernet and IP layers of the network stack, but for sure, there are a lot of potential use cases for looking at light levels, and decreasing light levels, and trying to predict interface failures, even based on bit-error rates, for sure.
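The "no machine learning required" point above can be made concrete with a least-squares slope over recent samples. The fan-speed numbers and the flagging threshold here are invented for illustration; the technique is just a linear trend check.

```python
# Minimal trend check of the kind described above: a least-squares slope
# over evenly spaced samples, no ML. Sample values and the threshold are
# arbitrary illustrations, not real telemetry.

def slope(samples):
    """Least-squares slope of evenly spaced samples (units per interval)."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def spinning_up(fan_speed_pct, threshold=5.0):
    """Flag a fan whose speed is trending up by more than threshold
    percentage points per polling interval."""
    return slope(fan_speed_pct) > threshold

print(spinning_up([10, 20, 50, 75]))  # steadily climbing -> True
print(spinning_up([40, 41, 39, 40]))  # flat -> False
```

The same check works for any polled counter, which is why the talk treats fan speed, sound level, and light level as interchangeable inputs: once the metric is in the system, the trend logic doesn't care what it measures.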
Yeah, I mean, you already have that intelligence, or a way to do it. Obviously it would be different if, instead of light levels, you're looking at decibel readings, but if you already have the intelligence for how to do that on the flow-data side, I could definitely see it being applicable in other places. Yep, absolutely. So besides the routers and switches, is there anything else you're collecting SNMP data from, like UPSs, servers, anything else? No, not today. And again, that's part of why we have this initiative to work on a broader, I'll call it an SNMP metrics platform, so we can capture data from a lot of the other stuff, I'll call it the surrounding infrastructure, besides just the networking stack itself. And so then custom OIDs for the routers and switches are on that list too, I imagine. Yep, that's part of what Kentik's Labs team is working on right now, actually: additional MIBs and additional pieces of the infrastructure that surround the networking. Not that we want to be using SNMP forever, but some things have worked really well for a long time. I didn't realize how much you could do with it until I worked at Kentik, because we have a guy who came from an SNMP company, and he's such an advocate for it. I was like, well, isn't it just interface counters? And he's like, son, there's so much you can do with it. It's complicated as hell, though. And you can make changes with it too, which, you know, we had systems that used to do that.
Presented by Brian Davenport, Solutions Engineer
Brian Davenport explains that flow data is a very powerful and established tool to gain visibility across your network. Coupled with enrichment, Kentik Synthetics, and consistent innovation by network vendors, network engineers can gain macro visibility of network activity and also drill down into the very granular aspects of routing, QoS, security concerns, and so on.


