In this episode of Telemetry Now, Justin Ryburn, the VP of Global Solutions Engineering at Kentik, joins us to discuss why flow data like NetFlow, IPFIX, and sFlow is still one of the best visibility tools in a network engineer's toolbox.
Justin Ryburn is the VP of Global Solutions Engineering at network observability company Kentik. He has 25 years of experience in network operations, engineering, pre-sales, and pre-sales leadership with service providers and vendors. Justin contributed content to Cyber Forensics (Auerbach Publishing, 2007) and authored Day One: Deploying BGP FlowSpec (Juniper, 2015). He has also spoken at numerous industry conferences on the topics of network monitoring and security.
Phil Gervasi: This is Telemetry Now, and I'm your host, Phil Gervasi. And joining me today is Justin Ryburn, the vice president of global solutions engineering at Kentik and an avid traveler, and from what I can tell, a very opinionated person when it comes to how flow data is used by the networking industry right now. In fact, that's what today's episode is all about. So in spite of all the great new telemetry that we have now, cloud logs, streaming, and so on, flow data still matters, and according to Justin, is probably still one of the best and most valuable sources of data that we have. And the problem then isn't the data, it's how it's being used by most engineers. Provocative opinion for sure, so let's get started. So Justin, it's great to have you here today, and I really appreciate that you took some time out of your busy schedule to nerd out with us.
Justin Ryburn: Happy to be here. Thanks for having me, Phil.
Phil Gervasi: Awesome. So before we get started, can you explain just a little bit about your background and what you do now?
Justin Ryburn: Sure. Yeah. So I've been in technology for about 25 years. I started my career building and operating large service provider networks during the internet heyday, as I like to call it. And then I discovered a love for pre-sales, and I spent about 10 years on the vendor side at Juniper Networks. As you said, I currently run the solutions engineering team globally here at Kentik, and I've been doing that for about the last five to five and a half years.
Phil Gervasi: Yeah, that's pretty cool. I had kind of a similar journey. I went from being a traditional in the field engineer for many years, and then into solutions engineering. My title was solutions architect. But I really love that, being both still close to the tech, and then getting to work directly with customers, discussing design and how to solve problems. I really enjoy that. But I do want to ask you now, you mentioned that you love travel. So how angry were you when you lost all of your airline and hotel points during COVID?
Justin Ryburn: Well, I think I was fortunate enough that I had status with my preferred airline and hotel, so I didn't really lose a lot of my status or my miles for the most part. What I will say though is the hiatus was tough. One of the things I love about my job is that I get to travel all over the world. I get to meet with customers as part of my day job. And then my wife and my family, we enjoy traveling in our personal time as well. It was a really hard adjustment when that all stopped overnight. Being stuck in the house in quarantine, my wife and I both working from home, not going out on the road, not seeing new places, not being out in front of people and talking to customers, it was definitely a big change. I will say on the positive side though, the forced time being at home with the family and being able to slow down a little bit and spend some nice quality family time together was a nice change and I really enjoyed that aspect of it.
Phil Gervasi: Yeah. I remember that too. I was traveling a lot for work. I was in pre-sales at the time, and went from traveling almost every week or every other week to zero. And I really feel like we went from zero to 100 again recently. Right? There was very little...
Justin Ryburn: It sure feels like it, being back out on the road quite a bit these days.
Phil Gervasi: So we've both been network engineers in one way, shape, or form, pre-sales, you mentioned, post-sales. I've been out there in data centers, schlepping routers in my life, working for VARs, vendors. And I think we both have some experience in internal corporate IT as well, so I know that we share some experience using a lot of those network visibility tools that have been out for years. Right? And a lot of them were focused pretty much on flows and SNMP, one or the other, maybe both, some combination of both. But recently, it feels like that's almost old school. You know what I mean? Passé. Have you sensed something like that in the industry as well?
Justin Ryburn: Yeah, I have. And this is honestly one of the reasons that I joined Kentik. Obviously, we've had both SNMP and flow collectors for many years, like you said. And on the surface, these technologies don't really seem all that exciting. Seems like it's a solved problem, but the challenge that we have as an industry historically I think is that this telemetry data was collected on these appliances. And these appliances were limited by CPU, memory, and disk. I mean you could only cram so much of that into a single server appliance. So then the approach that the vendors who were manufacturing these appliances took was that they had to roll up the data, they had to aggregate it. They had to somehow make some engineering trade-off to be able to fit within that sheet metal. And what we're seeing from vendors today is they're taking a much more modern approach, if I could use a buzzword, I call it big data. And what I really mean by that is they're ingesting and storing large volumes of data and clustering these systems across multiple sheet metal servers. So it gives you a lot larger volume of data, but it also allows you to do a lot more interesting things with that data, and I think that's what we're seeing from the industry from a lot of the vendors, is that we're able to leverage a lot of this newer technology that exists in clustered big data systems to solve this telemetry problem.
Phil Gervasi: So it sounds like in recent years, one of the reasons that flow data kind of became passé is specifically because we were limited by the compute and the storage that we were able to apply to collecting that kind of telemetry and then doing something with it. And then I guess maybe that just became the culture. Right? It just became, it's not as useful, so let's go look at streaming now.
Justin Ryburn: Yeah. I mean, I think those technologies, streaming telemetry is useful, but I think it solves a very different problem. When I think of streaming telemetry, and I think a lot of the leaders in our industry have made some great forays into being able to get more of a push model with streaming telemetry than the pull model we have with SNMP. It's more scalable. It solves a lot of problems. But it's still for the most part looking at the same data set, and that's things like interface utilization, queue depths, drops, retransmits of data on a particular interface, CPU, memory. All that stuff is very valuable and it is very interesting, and is necessary for a network operator to feel like they know what's going on in the network. But what's really interesting about flow data is it really gives you more information about the makeup of what that traffic is. One story I like to tell is when I worked early in my career for a service provider, we started off just collecting interface metrics in a platform called MRTG, which I'm assuming a lot of listeners have probably played with, or something similar to that, and that was awesome. I mean, before that, we really didn't have a good way to see graphs of our traffic. We had to log into the devices and look at the interfaces themselves to see how much traffic was on them, so it was definitely a step forward from what we had before that. But I can remember spending long hours troubleshooting denial of service attacks when that first started happening to our service provider and trying to use MRTG graphs to trace back to the source where that traffic was entering our network and then applying an ACL on the inbound interface to accept that traffic, log it, dig through the logs manually, try and figure out what the makeup of the traffic was. And then ultimately, being able to actually block the attack, but it took hours.
And then we put a flow-based product into our network, started collecting flow, and did the same thing using flow data. And I just remember thinking, "Man, this is what I've been missing," the ability to literally see the attack come in, see what made up the attack, what IP addresses it came in from, what IP addresses it was going to, ports and protocols. That real deep level of knowledge of what made up the attack made it so much easier to do that troubleshooting and that mitigation of the attack. It was a night and day difference.
Phil Gervasi: Yeah. So you're able to see what the traffic is made up of, as opposed to just interface statistics, like you said, so you're getting a different dimension.
Justin Ryburn: Absolutely. Yeah.
Phil Gervasi: I've been calling that over the past couple years working in visibility and observability, diversity of visibility data, or diversity of data, whatever you want to call it. But really, kind of that classic picture of the dog, I don't know if you've seen it in presentations, where it's a picture of a dog with a regular camera, and you see the dog, this dalmatian, whatever. And then the next picture is an X-ray, and you can see the dog swallowed some keys. And then the next picture, it's an MRI, and so you see the musculoskeletal system. So each type of data provides you a different view as to what's going on, and flow is unique in that way.
Justin Ryburn: Yeah. I love that analogy, by the way.
Phil Gervasi: What's that?
Justin Ryburn: I love that analogy, by the way. I think because if I think about that, you're starting off with a high level view of the dog. You see it from the outside. And then you kind of peel back the layers of the onion, if you will, or you get deeper into it. And I really think that's what flow does for us. Right? We can get a high level of what's going through our interfaces by looking at SNMP. But by looking at flow data, you really figure out what makes up that traffic that's on that interface.
Phil Gervasi: I think I would still say that flow data is still kind of macro because it's not like it's packets. You're not doing deep packet inspection and looking at payloads and stuff, so it's still kind of macro in that sense. In fact, it does make me wonder. I mean, if we have that storage capacity now, and we have the access to compute, whether it's I'm going to say virtually in air quotes, virtually unlimited, because we can just access it in the cloud. Right? Why don't we just do everything with packets?
Justin Ryburn: Well, I think there's a couple things that I see with packets when talking to customers that we have here. It becomes expensive to instrument. Right? To do proper deep packet inspection, large enterprises and definitely when you get into service providers and the volume of data that they're dealing with on the network, to put a deep packet inspection appliance in that can actually capture that becomes very expensive. It becomes cost prohibitive, quite honestly, so that's one thing. And then the second thing is: How often do you really need every single packet? I mean, flow gives you enough data typically to know the makeup of your traffic without having to know every single packet in the whole payload, because mostly what you care about is the stuff that's in the headers anyway, which is really what flow is analyzing is the header information. You don't really need to know the payload in most cases. And actually, I guess if I think that through a little more, one of the comments we hear a lot is not having the payload is a benefit because we no longer have to worry about a lot of security things. Right?
Phil Gervasi: Sure.
Justin Ryburn: If I do DPI and I'm storing packets, and that packet has financial transactions in it, or it has medical record information in it, then I have HIPAA or PCI, I'm sorry, the other way around, I have PCI or HIPAA compliance things to worry about, whereas if I'm collecting flow and all I have is the header information, I don't have to worry about those security concerns because I don't have the payload. Nobody can reverse engineer that and figure out, put the packets back together.
Phil Gervasi: Although, I do remember back in the day, doing a lab when I did the CCNP Voice, I don't know what they call it now. I think they call it collab, not voice, whatever it's called.
Justin Ryburn: I'm not sure.
Phil Gervasi: But I remember doing that, and the old Wireshark packet capture of the voice conversation, and then replaying it back as a WAV file, that is pretty cool.
Justin Ryburn: Oh, yeah.
Phil Gervasi: I've got to say that's pretty much the extent of my experience actually needing or acquiring packet level visibility. Well, I take it back because that wasn't even visibility. That was just a fun thing to do, so I hear what you're saying.
Justin Ryburn: That was more of a troubleshooting thing. And that's where I really think packet and DPI really come into play at scale, is when you're doing that troubleshooting, you're doing that deep troubleshooting. You really do need to see the packets. You need to see the entire transaction. You need to be able to put it back together, and like you said, be able to play the call back. I think that's where they still are valuable.
Phil Gervasi: Why do you think that so many network visibility vendors over the past few years or more than a few years, why do you think that they focused so much on other types of telemetry, kind of in more recent days? I mean, for a little while, in all the presentations that I saw online and at the different events, everything was streaming telemetry. Right? I remember doing screen scraping to troubleshoot issues. I remember doing Wireshark captures and all this stuff. But everything was just about streaming for a while. Why do you think visibility vendors have focused on that and shied away from flow?
Justin Ryburn: Yeah, it's an interesting question. I think again, streaming telemetry is definitely an improvement over SNMP for those types of data. Right? Being able to get at a much larger volume, being able to get it much faster, so that my data points are more accurate because I'm capturing it faster. So I think that's where a lot of the industry has focused, on the similar types of data they were getting over SNMP, and just getting it in a different way, so that it was much more scalable. I think a lot of vendors that have added flow have not really focused on it. They haven't really focused their R&D on it, and so they just kind of bolted it onto the side of an existing solution that they have, and therefore, the customers of those vendors, the users of those products have been led to believe that flow doesn't really provide that much additional value, and what is unfortunate about that is that there is a lot of really interesting information in flow data. It's just that it's never been focused on by some of these vendors. And they've never really unlocked the power of it by building a scalable solution really focusing on: What could I do with that data?
Phil Gervasi: Okay. So then what do you think... Not what do you think, but what do you know? What kind of visibility can we get from flow that's unique to flow data that we really can't get from other types of telemetry?
Justin Ryburn: Yeah, like I said, it's really the makeup of the traffic. I mean, if you were to go and really get nerdy and look at the specs for IPFIX, or sFlow, or one of the flow protocols, you'll see there's a lot of really interesting information there. So you've got what I call the five-tuple, you got the source IP, dest IP, source port, dest port, protocol. There's a lot of other things about the incoming interface, the outgoing interface. A lot of times, it'll have source and destination AS numbers in there that it gets from the routing tables on a router, for example.
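To make the five-tuple concrete, here is a minimal Python sketch of a flow record carrying the fields Justin lists, plus a helper that sums bytes by any combination of those fields. The `FlowRecord` type, its field names, the `bytes_by` helper, and the sample addresses and AS numbers (drawn from documentation and private-use ranges) are all assumptions for illustration; they are not a NetFlow, IPFIX, or sFlow wire format, and not how any particular product stores flow.

```python
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical, simplified flow record (not a real export format)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int   # IANA protocol number: 6 = TCP, 17 = UDP
    in_if: int      # incoming interface index
    out_if: int     # outgoing interface index
    src_as: int     # source AS number
    dst_as: int     # destination AS number
    octets: int     # bytes observed for this flow


def bytes_by(records, *fields):
    """Sum bytes per unique combination of the named fields."""
    totals = Counter()
    for record in records:
        key = tuple(getattr(record, field) for field in fields)
        totals[key] += record.octets
    return totals


# Made-up sample data using documentation address ranges.
records = [
    FlowRecord("198.51.100.7", "203.0.113.10", 53124, 443, 6, 1, 2, 64500, 64501, 1200),
    FlowRecord("198.51.100.7", "203.0.113.10", 53125, 443, 6, 1, 2, 64500, 64501, 800),
    FlowRecord("192.0.2.44", "203.0.113.10", 40000, 53, 17, 3, 2, 64502, 64501, 300),
]

# Which source and destination port account for the most traffic?
top = bytes_by(records, "src_ip", "dst_port").most_common(1)
print(top)
```

Grouping on different fields (for example `"src_as"` or `"in_if", "protocol"`) answers a different question about the same traffic, which is the kind of breakdown an interface counter alone cannot provide.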
Phil Gervasi: Oh, yeah. That's pretty cool.
Justin Ryburn: Yeah. And a lot of the, over the last, at least in the five years that I've been at Kentik, there's definitely been a lot of vendors who have added their own fields into IPFIX, so IPFIX gives you the ability to sort of add onto it. Right? And so if you think of it as kind of directionally similar to an enterprise-specific MIB for SNMP, it's similar to that in IPFIX, where for example, a lot of our SD-WAN vendors that the enterprises are using are now exporting data in their flow records that talk about the tunnel and the application and the users. I mean, there's a lot of additional rich information that's coming directly in those flow records that tell you about that traffic and what traffic it is, who's consuming it. There's a lot of great information you just can't...
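The extensibility Justin describes shows up in the IPFIX template format defined in RFC 7011: each field specifier carries an enterprise bit, and when it is set, a 32-bit enterprise number follows to identify the vendor that defined the field. The sketch below decodes a single field specifier under that layout. The function name `parse_field_specifier`, the returned dictionary shape, and the enterprise number 99999 are assumptions for illustration, not part of any library or any vendor's actual export.

```python
import struct


def parse_field_specifier(buf, offset=0):
    """Decode one IPFIX template field specifier (layout per RFC 7011).

    Returns (field, next_offset). This is an illustrative helper, not a
    full template parser.
    """
    ie_id, length = struct.unpack_from("!HH", buf, offset)
    offset += 4
    enterprise = None
    if ie_id & 0x8000:                 # enterprise bit set: vendor-defined field
        ie_id &= 0x7FFF                # remaining 15 bits are the element ID
        (enterprise,) = struct.unpack_from("!I", buf, offset)
        offset += 4
    return {"id": ie_id, "length": length, "enterprise": enterprise}, offset


# A standard field (sourceIPv4Address is element 8, 4 bytes) followed by a
# made-up vendor field: element 1, 2 bytes, hypothetical enterprise number 99999.
template_fields = struct.pack("!HH", 8, 4) + struct.pack("!HHI", 0x8000 | 1, 2, 99999)

standard, pos = parse_field_specifier(template_fields)
vendor, pos = parse_field_specifier(template_fields, pos)
print(standard)
print(vendor)
```

A collector that does not recognize a given enterprise number can still skip the field correctly because the length is always present, which is part of what makes vendor additions practical.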
Phil Gervasi: So then in 2022, it's still a very useful method for both ongoing monitoring, but also for real time root cause analysis, troubleshooting, would you say?
Justin Ryburn: Sure. Yep, absolutely. Yeah.
Phil Gervasi: Yeah. So still a valuable tool in I guess a traditional network engineer's toolbox. Right?
Justin Ryburn: Oh, absolutely. For sure. I feel like we still to some degree just scratched the surface of what you can do with it. And a lot of it is not... I mean, there's a lot of great things in the flow records themselves. But where it really becomes powerful is when you figure out: What can I correlate it with? What can I enrich it with? Right? Just some examples that we do here at Kentik, a lot of other people could do this as well, but adding geo tagging to it. What geo does the source IP belong to? What geo does the destination IP belong to? Threat feeds, there are a lot of security companies out there that publish IP reputation data, so that allows you to then say, "Okay, I have traffic coming from this IP, or I have traffic going to that IP." Are those IP addresses known bad actors? Are they known compromised IP addresses? And so from a security angle, it gets you the ability to say, "Okay, I can now proactively get alerted, get notified when I have traffic that starts talking to some known compromised host." These are just a few examples I think of off the top of my head that I've seen customers do. But I think we've probably only scratched the surface of things you could do when you take the data that's in flow, and then like I said, enrich it or correlate it with some of these other pieces of data to solve other more interesting use cases.
Phil Gervasi: Yeah. And that was my original thought starting this episode off. What is it about flow data that we're underutilizing? Why is it that we're stopping when there's so much more? And you highlighted a couple things. Now that we have the ability to... First of all, it's very extensible, so it's more than just looking at 27% of my network is HTTP, and then stopping, it's a lot more than that, but also that we have the storage and compute available to us today to do so much more than we ever could, and therefore, it's still very, very useful. I think the first flow protocols were out in the mid-90s or so, so yeah, I get it.
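The enrichment idea can be sketched in a few lines of Python: take a flow's source and destination IPs, tag each with a geography, and flag the flow if either end appears on a reputation list. Everything here is an assumption for illustration: the `GEO_BY_PREFIX` table, the `BAD_IPS` set, the function names, and the sample addresses are made up, and a real deployment would load geo and threat data from external feeds rather than hard-coded tables. This is not a description of how Kentik or any other vendor implements it.

```python
import ipaddress

# Hypothetical lookup tables standing in for a geo database and a threat feed.
GEO_BY_PREFIX = {
    ipaddress.ip_network("198.51.100.0/24"): "US",
    ipaddress.ip_network("203.0.113.0/24"): "DE",
}
BAD_IPS = {ipaddress.ip_address("192.0.2.66")}


def lookup_geo(ip):
    """Return the geo tag of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in GEO_BY_PREFIX if addr in net]
    if not matches:
        return None
    return GEO_BY_PREFIX[max(matches, key=lambda net: net.prefixlen)]


def enrich(flow):
    """Return a copy of the flow with geo tags and a threat-match flag added."""
    enriched = dict(flow)
    enriched["src_geo"] = lookup_geo(flow["src_ip"])
    enriched["dst_geo"] = lookup_geo(flow["dst_ip"])
    enriched["threat_match"] = any(
        ipaddress.ip_address(flow[key]) in BAD_IPS for key in ("src_ip", "dst_ip")
    )
    return enriched


flow = {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.66", "dst_port": 443, "bytes": 1200}
result = enrich(flow)
print(result)
```

Once a flow carries a `threat_match` flag, alerting on traffic to a known compromised host becomes a simple filter over the enriched records.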
Justin Ryburn: It's been around for a long time.
Phil Gervasi: Yeah. But then so has TCP/IP. Do we throw that out too?
Justin Ryburn: Exactly.
Phil Gervasi: It's kind of silly. So anyway, Justin, this has been a really great episode talking about how important flow data still really is today. And I really do appreciate your perspective on the industry too. That's valuable to me, so thank you for joining me today. And before we go, where can folks find you online to ask questions, maybe get in touch with you in general?
Justin Ryburn: Sure. So you can find me on Twitter @Justinryburn. That is Justin, R-Y-B-U-R-N, all one word. I'm much more active these days on LinkedIn, if you want to search for my name there and follow me there, that's probably the easiest.
Phil Gervasi: Great. You can find me on Twitter @network_phil. You can search my name, Phillip Gervasi, on LinkedIn, my blog, networkphil.com. And if you're interested in hearing more Telemetry Now episodes, check out our website, kentik.com/telemetrynow. And please feel free to let us know if there are any topics that you'd like to hear about, or if you'd like to be a guest on the show. So until next time, thanks and bye-bye.
Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS?
Well, you're in the right place! Telemetry Now is the podcast for you!
Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.