Webinar

The State of Routing Security: Progress, Challenges, and Measurement

Hello, everyone. Welcome, and thank you for joining us for this webinar, brought to you by Kentik and Capacity, on the state of routing security: progress, challenges, and measurement. My name is Ben. I'm from Capacity Media, and I'll be hosting this webinar. I'd like to introduce the man of the hour, Doug Madory, director of Internet analysis at Kentik. Doug will give an in-depth look at the current state of BGP routing security, covering both the progress made and the work yet to be done. Before I pass over to him, I'd like to go through a couple of ground rules before we kick off. Doug will take it from here, but you'll have fifty minutes for in-depth Q&A at the end. Please keep your questions coming in throughout the webinar, and we'll answer them all at the end. In the unlikely event that you have any technical difficulties, please just refresh your window. That should bring you straight back and correct the problem. If not, drop us a line and we'll sort you out. With that, I'll pass you over to the man, the myth, the legend: Doug. Over to you, sir.
Thanks, Ben. Let me get the slides up. There we go. Thanks, Tim. Terrific. Alright. So my name is Doug Madory. I'm the director of Internet analysis at Kentik, and this is a discussion of the state of routing security, something I've spent a lot of time analyzing and writing about. One of the things I've done in the last couple of years is look at ROV adoption statistics. If you've seen my talks over the last couple of years, some of this may be a repeat, but I'll go through it quickly to get to the new material. We have to talk about RPKI ROV, route origin validation. For those who are new to the concept, this is a mechanism in the routing PKI that needs two steps in order to work.
First, address owners have to create ROAs, route origin authorizations, typically through the RIRs. These define the correct origins for their address space as well as the max prefix length, which we'll talk about. That's step one: we need that information, that ground truth, to compare against. Step two is that networks, ASes, out on the Internet have to reject invalid routes. Invalid means a route whose origin in the AS path does not match the origin in the ROA, or that violates the max prefix length. We need both steps in order to identify the bad routes that could cause disruption and reject them, suppressing their propagation and hopefully avoiding the disruption they can cause. That's the basic concept. A quick aside about how I've been looking at this problem: ROV, and all of this routing security work, is about protecting traffic, so why not look at it from the perspective of traffic? I'm a BGP analyst, and when I came to Kentik back in twenty twenty, one thing that was super cool was that we have a very large amount of NetFlow data to work with. That's something I'd never had before for understanding the impact on Internet traffic. For those of you who aren't familiar with Kentik, we are primarily known as a NetFlow analytics company. We have hundreds of customers around the world that send us live feeds of NetFlow from their routers.
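The two-step logic described above can be sketched as a route origin validation check. This is a minimal illustration of the three RFC 6811 outcomes (valid, invalid, not-found). It assumes the ROAs have already been fetched and parsed into simple in-memory records; real relying-party software also handles signatures, expiry, and periodic refresh, none of which is shown. The prefixes and ASNs are documentation values, not real data.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class ROA:
    """A simplified route origin authorization (hypothetical in-memory form)."""
    prefix: Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
    max_length: int
    origin_asn: int


def validate(route: str, origin_asn: Optional[int], roas: Iterable[ROA]) -> str:
    """Classify an announced route as 'valid', 'invalid' or 'not-found'."""
    prefix = ipaddress.ip_network(route)
    covered = False
    for roa in roas:
        # A ROA covers the route if the route sits inside the ROA's prefix.
        if roa.prefix.version != prefix.version or not prefix.subnet_of(roa.prefix):
            continue
        covered = True
        # A covering ROA matches only if the origin AS agrees and the route
        # is no more specific than the ROA's max length.
        if (origin_asn is not None and origin_asn == roa.origin_asn
                and prefix.prefixlen <= roa.max_length):
            return "valid"
    # Covered but never matched -> invalid; never covered -> not-found.
    return "invalid" if covered else "not-found"


roas = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64500)]
print(validate("192.0.2.0/24", 64500, roas))     # valid
print(validate("192.0.2.0/24", 64501, roas))     # invalid: wrong origin
print(validate("192.0.2.128/25", 64500, roas))   # invalid: longer than max length
print(validate("198.51.100.0/24", 64500, roas))  # not-found: no covering ROA
```

Step two, rejecting invalids, then amounts to dropping any route for which this check returns "invalid".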
We process and analyze this in a number of ways. We don't have all knowledge, but by virtue of that we have a slice of the traffic passing through the Internet at any given time in our systems. About half of those customers have opted in to allow that data to be used in aggregate analysis, like some of what we'll see here. We don't know which particular customer the data came from, but it helps guide our understanding of RPKI adoption. I will mention that this dataset is subject to biases, as nearly every dataset is. We are a US company and we have a lot of US customers, which could contribute to a skew. Companies that use our services are typically a little more sophisticated and a little better resourced, and that too can make a difference. We have a lot of good relationships with network service providers and content providers, the companies that make up the ecosystem of the Internet, as well as enterprises, which are just other large companies with a big footprint on the Internet. Having said all that, the last point I'll make here is that we started integrating RPKI into the NetFlow analytics at the request of Job Snijders, a frequent collaborator of mine. At the time, in the early phases of RPKI adoption, a concern that was getting aired was that people didn't want to reject invalid routes, thinking they might lose important traffic that their customers would be upset about. The best way to answer that question is to put it into NetFlow and ask: what would you lose? It turned out to be inconsequential. I feel like that's a settled matter in general; I don't hear that concern so much anymore.
Partly, that was due to tools like Kentik and other NetFlow tools that tried to answer this question: if you were to start rejecting invalids, exactly what would you lose? Often it's nothing you care about, so you should go ahead and reject invalids. I don't know that that use case gets used much anymore, but what's neat is that I'm able to use it to say, alright, give me the global view of traffic going to routes with ROAs, and we'll see some of that here. The way the system works is that as each NetFlow record is ingested, we annotate it with a lot of attributes. One of them is whether the routes for both the source and destination IPs would have been valid, invalid, or something else, from the perspective of the router generating the NetFlow record. That's where the data comes from. Alright, with all that out of the way, let me move on to the next slide. Now we're back to ROA creation progress, which is step one of the two steps from the beginning of the presentation. As you may or may not know, back in May of last year we crossed the milestone of more than fifty percent of v4 routes in the global routing table having ROAs. This uses the NIST RPKI Monitor as the measure, and the graph is from their website. It's increased a bit since then, although that may be slowing. V6 achieved it the previous fall. The slide says fall twenty twenty-two, but it should be fall twenty twenty-three. I think v6 is a little ahead of the game, partly because it doesn't have the legacy address space that exists in v4. In fact, there's essentially no legacy space in v6, it being the newer addressing scheme.
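Once flow records carry the RPKI state of their destination route, both questions above become simple aggregations: what share of traffic would rejecting invalids drop, and what share goes to routes with ROAs, compared with the share of routes by count. This toy sketch assumes each flow has already been mapped to the destination route it matched. The prefixes, statuses and bit counts are made up, chosen to show how a few high-traffic routes with ROAs push the traffic share above the route share.

```python
from collections import defaultdict

HAS_ROA = {"valid", "invalid"}  # any covering ROA means the route "has a ROA"


def route_share_with_roas(routes: dict[str, str]) -> float:
    """Fraction of routes (by count) covered by at least one ROA."""
    return sum(1 for status in routes.values() if status in HAS_ROA) / len(routes)


def traffic_share_by_status(routes: dict[str, str],
                            flows: list[tuple[str, int]]) -> dict[str, float]:
    """Fraction of bits going to routes in each RPKI state."""
    bits_by_status: dict[str, int] = defaultdict(int)
    for dst_route, bits in flows:
        bits_by_status[routes.get(dst_route, "not-found")] += bits
    total = sum(bits_by_status.values())
    return {status: bits / total for status, bits in bits_by_status.items()}


# Hypothetical routing table: prefix -> RPKI status.
routes = {
    "192.0.2.0/24": "valid",        # e.g. a large content provider with ROAs
    "198.51.100.0/24": "not-found",
    "203.0.113.0/24": "not-found",
    "10.0.0.0/24": "invalid",
}
# Hypothetical annotated flow records: (matched destination route, bits).
flows = [("192.0.2.0/24", 700), ("198.51.100.0/24", 200),
         ("203.0.113.0/24", 90), ("10.0.0.0/24", 10)]

shares = traffic_share_by_status(routes, flows)
print(route_share_with_roas(routes))                    # 0.5 of routes have ROAs
print(sum(shares.get(s, 0.0) for s in HAS_ROA))         # ~0.71 of bits go to routes with ROAs
print(shares.get("invalid", 0.0))                       # ~0.01: what rejecting invalids would drop
```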
That's purely from BGP, counting routes. It's an easy mechanism to understand, but I think it also gives short shrift to the progress we're making. So my take back in twenty twenty-two, when I started getting into this, was to ask whether we could look at this from a traffic perspective. It turned out that we were actually farther ahead than we thought. At that time, only about one third of v4 routes had ROAs, but the big insight was that over half of the traffic, in bits per second, was going to routes with ROAs. We had made more progress than we thought: more traffic was eligible for the protection that ROV would provide. That was due to ROA deployments at a bunch of content providers and eyeball networks, entities that push a lot more traffic than their route counts would otherwise suggest. That trend has continued. Here are some recent numbers from October twenty twenty-four, and they haven't changed much since then: now over half the routes have ROAs, and we're approaching three quarters of the traffic in bits per second going to routes with ROAs. I've run this recently, and I think we're still hovering around seventy-four percent. I've wondered aloud many times whether at some point we'll reach a plateau where all the easy gains have been made and we're into the tougher stuff. As I mentioned, we've got legacy address space. The US Department of Defense, for example, announces a lot of address space and creates no ROAs. I'm not sure what the status is of their relationship with ARIN, but they don't have to go through ARIN.
They own their address space; ARIN is the North American RIR. Being legacy address space is a complicating factor for them in creating ROAs, and there are other examples of that around the Internet. Those are tougher nuts to crack that we'll have to address if we're going to keep making progress. But seventy-four percent, about three quarters of the traffic, is a lot of progress. I think this system has now achieved escape velocity, and it's creating a benefit for the entire Internet. On the flip side, for step two, the rejecting of invalids, I decided to take a look from a route propagation standpoint. There have been a number of efforts to identify remotely which ASes are rejecting invalids and which are not. I think that's a very challenging thing to do reliably, since a lot of factors go into it. So I punted on that question and asked instead: what's the macro, net effect of a route being invalid versus valid? I started looking at this back in twenty twenty-two with this graph on the left, which maybe you've seen. We took the global routing table for v4 and v6 from a one-day RouteViews dump and asked how far these routes propagate. The x axis is the number of BGP sources in RouteViews, so moving across the x axis is basically a measure of propagation. You have this peak, appearing very similarly in both v4 and v6, for both the valid and not-found routes, which are the globally routed ones. At the bottom in green, we have the RPKI-invalid routes.
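One simple way to express "how far does a route propagate" is the fraction of RouteViews BGP sources that carry it in a given table dump, which is what the x axis above counts. This sketch assumes the dump has already been reduced to (peer, prefix) pairs and uses the peers seen in the dump as the denominator. The peer names and beacon prefixes are hypothetical. Computing this fraction for a valid and an invalid beacon across successive dumps would give a time series of the kind described next.

```python
from collections import defaultdict


def visibility(observations: list[tuple[str, str]]) -> dict[str, float]:
    """Fraction of all BGP sources in the dump that carry each prefix."""
    all_peers = {peer for peer, _ in observations}
    peers_per_prefix: dict[str, set[str]] = defaultdict(set)
    for peer, prefix in observations:
        peers_per_prefix[prefix].add(peer)
    return {prefix: len(peers) / len(all_peers)
            for prefix, peers in peers_per_prefix.items()}


# Hypothetical dump: four BGP sources, a valid beacon and an invalid beacon.
valid_beacon, invalid_beacon = "192.0.2.0/24", "198.51.100.0/24"
dump = [(peer, valid_beacon) for peer in ("p1", "p2", "p3", "p4")]
dump.append(("p3", invalid_beacon))  # only one source still carries the invalid route

vis = visibility(dump)
print(vis[valid_beacon], vis[invalid_beacon])  # 1.0 0.25
```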
As I mentioned, at any given time there are a lot of persistently invalid routes, and we can use those to measure how far invalid routes propagate. I had been encountering this in my day-to-day BGP analysis work: a route that's invalid just doesn't go very far around the Internet, and this is an attempt to capture that at Internet-wide scale. The distribution gets pushed over to the left, and I would argue it's been pushed even farther to the left since this analysis was first performed. With the graph on the right, I was trying to come up with ways to measure this through time: how far is that population of invalid routes propagating? You want them to propagate less, and you want that propagation to decrease over time as more networks reject invalids. There are a couple of different ways to do this, and they all show the same thing. Here, we're looking at a bunch of route beacons. RIPE announces a variety of beacons, not just for RPKI. They have beacons that turn on and off at different intervals for testing and research, and they also have routes that are intentionally valid and intentionally invalid, to measure a variety of things. In this case, I'm using them to measure propagation: how far do those routes propagate in the RouteViews dataset? To corroborate that, Job Snijders, my frequent collaborator, operates his own beacons for the same purpose, so we include those as well. They all exhibit similar behavior: the blue line along the top in all three graphs is trending up as the RouteViews dataset slowly grows over time.
The lines on the bottom are trending down as invalid routes propagate less over time, a result of more ASes rejecting invalids. The biggest event there was last year, when Zayo started rejecting invalids around April of twenty twenty-four. That leaves us with just one transit-free network: Telecom Italia is the only major tier one that is not rejecting invalids. I'm very hopeful they'll do that soon, because that would create a universal cap at the top of the Internet, so that when we have origination leaks or other types of BGP mishaps, they really can't go very far and can't disrupt a lot of traffic. We've made a lot of progress on that front thus far. It's nice to have a live demo here, or what we'll call a live demo: an incident that took place in North Korea just a couple of weeks ago. Last year, our live demo was the Orange España outage. If you recall, in that incident a hacker got into Orange España's RIPE account and, wanting to do some vandalism, created a bunch of ROAs that were intentionally wrong. They had the wrong origin AS and rendered a lot of Orange España's routes invalid, causing a national outage. The way I looked at it, glass half full, the outage could only happen because a lot of networks are rejecting invalids. Here we have another case that we can hopefully draw some lessons from, this time from North Korea. Officially, North Korea has just one AS, and that's been the case for the last fifteen or so years. It announces four slash twenty-fours, an incredibly small network to represent an entire country. KP is the two-letter country code for North Korea. They published one ROA with a maximum prefix length of slash twenty-two. Okay.
That invalidates those four slash twenty-fours, because twenty-four is longer than the maximum length. The routes become invalid, propagation drops off dramatically, and they suffer an outage. I'd be fascinated to learn if anybody in this audience was actually affected by that; I think it's more of an academic development. About sixteen hours later, they published a correction: new ROAs for the slash twenty-fours. Maybe you can learn from North Korea's error and avoid having this happen to you. At the end of it, that gave them one hundred percent ROA coverage, which is pretty good. Quite a few countries in Asia have very high ROA coverage. Their neighbor to the south, South Korea, KR, is only at point two percent, one of the lowest. Maybe this could be something that prods them into greater RPKI adoption. This is a little visualization I made of one of the routes coming out of North Korea. The text may be a little small, but this is the one. Up until about eight years ago, North Korea only went through China. They have one route that goes through Russia, because they have a small border with Russia, and this is that route, announced through TransTelecom. What we're looking at on the left is the period of this routing collapse as the route becomes invalid. In this graph, the colors show, from the perspective of all the BGP sources around the Internet, the penultimate network on the way to TransTelecom, also known as TTK. In other words, how do they get to TTK to get to North Korea? You see this collapsing. At that time, it was one seventy-four Cogent, thirty-four ninety-one PCCW, and sixty-four sixty-one Zayo, and they all drop out when the route becomes invalid.
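The max length mistake above is easy to reproduce on paper. This sketch uses a hypothetical /22 from private address space and a private-use ASN, not North Korea's real resources, split into four /24 announcements. It checks each /24 first against a single ROA with max length 22 and then against corrected per-/24 ROAs.

```python
import ipaddress


def matches(roa_prefix: str, max_length: int, roa_asn: int, route: str, origin: int) -> bool:
    """True if this single ROA authorizes the (route, origin) announcement."""
    net = ipaddress.ip_network(route)
    return (net.subnet_of(ipaddress.ip_network(roa_prefix))
            and net.prefixlen <= max_length
            and origin == roa_asn)


ASN = 64511             # hypothetical private-use ASN
BLOCK = "10.20.0.0/22"  # hypothetical stand-in for the country's address block
announced = [str(n) for n in ipaddress.ip_network(BLOCK).subnets(new_prefix=24)]

# Broken: one ROA for the /22 with max length 22, so every /24 is too specific.
broken = [matches(BLOCK, 22, ASN, r, ASN) for r in announced]
# Corrected: one ROA per announced /24 (max length 24).
fixed = [any(matches(roa, 24, ASN, r, ASN) for roa in announced) for r in announced]

print(announced)  # ['10.20.0.0/24', '10.20.1.0/24', '10.20.2.0/24', '10.20.3.0/24']
print(broken)     # [False, False, False, False]
print(fixed)      # [True, True, True, True]
```

Because each /24 is still covered by the /22 ROA, failing to match makes it invalid rather than not-found, which is why the routes were rejected instead of simply being unprotected.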
Sixteen hours later, once the corrected ROAs were published, we have a restoration, and the route comes back. It's interesting to see this staggering, and that TTK has a different set of upstreams on the way back. I'm not sure what that signifies, but it's important to recognize that when a route changes state from invalid to valid and back again, ROAs take a certain amount of time to propagate through the RPKI system when you publish something new. On top of that, different networks update their RPKI information at some frequency; it's not instantaneous. Some of that shows up here: providers accept the route at different times because that's when their systems refreshed their data and allowed the route to propagate. It's interesting to see this kind of fluid-dynamic pattern in routing, something I personally like about BGP. So, lessons learned: definitely review ROAs before you publish them, especially the max length. That seems to be a common area of misunderstanding. As I've mentioned, at any given time there are a whole bunch of persistently invalid routes in circulation. If you look at what caused them to be invalid, I think in the majority of cases it's a mismatch in the max length. My theory is that when the ROA was created, it matched the routing and was fine. Later the routing changed, and somebody didn't go back and fix the ROA. Somehow the network is not aware, perhaps because they have a covering prefix; otherwise they'd be so offline that they would have to remediate it. But there are networks that remain persistently invalid for long periods of time.
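Reviewing ROAs before publishing them, and re-checking them whenever routing changes, can be scripted. This sketch uses hypothetical prefixes and a private-use ASN. It derives one exact-match ROA per announced prefix (max length equal to the prefix length), then flags announcements that the ROA set would make invalid. The example models the drift theory above: a more-specific that starts being announced after the ROAs were created.

```python
import ipaddress


def exact_roas(announcements: list[tuple[str, int]]) -> set:
    """One ROA per announced (prefix, asn), with max length == prefix length."""
    roas = set()
    for prefix, asn in announcements:
        net = ipaddress.ip_network(prefix)
        roas.add((net, net.prefixlen, asn))
    return roas


def would_be_invalid(announcements: list[tuple[str, int]], roas: set) -> list[tuple[str, int]]:
    """Announcements that are covered by some ROA but matched by none."""
    flagged = []
    for prefix, asn in announcements:
        net = ipaddress.ip_network(prefix)
        covering = [r for r in roas if r[0].version == net.version and net.subnet_of(r[0])]
        if covering and not any(asn == r_asn and net.prefixlen <= max_len
                                for _, max_len, r_asn in covering):
            flagged.append((prefix, asn))
    return flagged


today = [("10.20.0.0/24", 64511), ("10.20.1.0/24", 64511)]
roas = exact_roas(today)
print(would_be_invalid(today, roas))     # []

# Later, routing changes: someone starts announcing a more-specific.
tomorrow = today + [("10.20.0.0/25", 64511)]
print(would_be_invalid(tomorrow, roas))  # [('10.20.0.0/25', 64511)]
```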
Max length is an optional field that's not well understood. In fact, a best current practice published a couple of years ago recommended leaving it out, precisely because people make mistakes with it. If you leave it out, the ROA authorizes only the exact prefix from that origin AS, with no more-specifics allowed. So that's something to consider. As I mentioned, once this issue was resolved, North Korea had one hundred percent ROA coverage, while South Korea is only at point two percent. If you were to dig into that point two percent, you would see that it isn't even South Korean providers. It's cloud providers like Amazon and Google that have operations in South Korea; that's where the ROAs are. There's almost no native ROA creation in South Korea, and I believe that has to do with the process in that country, where KRNIC controls it. I'm hopeful they're working on a solution, because it would be great for them to catch up in that regard. Alright, let's shift gears. RPKI ROV is a more modern technology. Now we're going to talk about something older that's still super important: IRR-based route filtering using AS-SETs. Just a point of clarification before we get started: we're talking about the IRR AS-SET, not the BGP AS_SET. BGP unfortunately has some overloaded terms, where the same name means two different things in subtly different contexts. This is the IRR AS-SET object, a record in the IRR database that defines a group of ASNs and is used to simplify routing policies. It is not the BGP AS_SET construct, which is presently slated for deprecation. I don't know which one is more widely known. Maybe the IRR one, but I think a lot of people also know the BGP AS_SET.
So I think this clarification is required in this conversation. If you encounter a BGP AS_SET in an announcement, you'll see little curly braces in the AS path. It was an attempt to consolidate many BGP messages: a whole bunch of prefixes from a bunch of different origins get announced in a single message, with all the origins tossed into a bucket inside those curly braces. It breaks any association between which origin is originating which prefix, so it breaks ROV and a variety of other routing security mechanisms. It's seen less and less, although it is still out there. Anyway, in this conversation we're talking about the IRR AS-SET. We'll start with a BGP leak. There was a more recent one involving Voxility, the DDoS mitigation provider, that I wrote up a couple of weeks ago, but I've got the slides here for this one, from March twenty twenty-four. It's essentially the same situation. In this case, the Russian mobile operator MTS mistakenly propagated about thirty thousand routes, picked up from HKIX in Hong Kong, through its transit providers Lumen and Arelion. This is a tweet from Qrator. I do a lot of collaboration with those guys; they do a lot of BGP analysis, they're smart folks, and they're friends of mine. As I mentioned, we have this NetFlow data, and as a BGP analyst it's super cool to be able to pivot and ask how much traffic got impacted by an incident. This is an example of that: I'll see a routing leak, pull all of our data, and ask what we see going down this particular path.
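Why an AS_SET breaks origin validation can be shown in a few lines. Under RFC 6811, when the final segment of an AS path is an AS_SET, the route's origin AS is treated as NONE, and no ROA can match it. This sketch parses the curly-brace notation that routers and looking-glass tools commonly print (the exact text format is an assumption) and returns None for such paths. The ASNs are documentation values.

```python
from __future__ import annotations

import re
from typing import Optional, Union

PathElem = Union[int, frozenset]  # an ASN from an AS_SEQUENCE, or an AS_SET


def parse_as_path(text: str) -> list[PathElem]:
    """Parse '64496 64497 {64500,64501}' into [64496, 64497, frozenset({64500, 64501})]."""
    elems: list[PathElem] = []
    for token in re.findall(r"\{[^}]*\}|\d+", text):
        if token.startswith("{"):
            # Members may be separated by commas and/or whitespace.
            members = re.split(r"[,\s]+", token.strip("{}").strip())
            elems.append(frozenset(int(m) for m in members if m))
        else:
            elems.append(int(token))
    return elems


def origin_as(path: list[PathElem]) -> Optional[int]:
    """Origin AS for ROV purposes; None ('NONE') if the path ends in an AS_SET."""
    if not path:
        return None  # locally originated; a real router would use its own AS here
    last = path[-1]
    return None if isinstance(last, frozenset) else last


print(origin_as(parse_as_path("64496 64497 64500")))          # 64500
print(origin_as(parse_as_path("64496 64497 {64500,64501}")))  # None
```

A route with no usable origin can never match a ROA, so any such route covered by a ROA validates as invalid.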
I can query all of our NetFlow records by an AS path subsequence that matches what we saw as an indicator of the route leak, and get a sense of how much traffic got misdirected or dropped. Usually more traffic gets dropped than misdirected, but the misdirected traffic is interesting. In this case, Hong Kong, Indonesia, and Australia were the top three countries impacted from a traffic perspective, among our customer base. Here's a visualization of one of the routes that got sucked into this leak, a Netflix route. Normally it's seen by only about twenty percent of our BGP sources. During the leak, there's this big blob in the middle where HKIX becomes the global upstream for this route. What's happening is that this is what we call a regional route; there may be other terms for it. Some content providers announce a more-specific just to a particular geographic region, usually alongside a covering prefix, to do some sort of traffic engineering. The problem is that eighty percent of the Internet doesn't have an equivalent route, so when a leak occurs, the leaked route fills the vacuum and there's nothing to contend with. Even if the AS path is super long or there are other issues with the route, it'll be adopted, and because it's more specific, it will usually disrupt some traffic. I don't know what the impact on Netflix was here, but this is just one of the thirty thousand routes that got misdirected. But hey, mistakes happen. And since this is a route leak, or a path leak, depending on what term you like to use, ROV isn't going to help. The origin is intact; it's the same origin.
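The AS path subsequence query can be sketched as a filter over flow records. This assumes each record already carries the AS path of the route it followed and a country attribute. The indicator subsequences use the ASNs commonly associated with Lumen (3356), Arelion (1299) and MTS (8359), but treat them as illustrative; the flow records themselves are made up.

```python
from collections import defaultdict
from typing import Sequence


def contains_subpath(path: Sequence[int], subpath: Sequence[int]) -> bool:
    """True if `subpath` appears as a contiguous run inside `path`."""
    m = len(subpath)
    return any(tuple(path[i:i + m]) == tuple(subpath)
               for i in range(len(path) - m + 1))


def leaked_bits_by_country(flows, indicators) -> dict[str, int]:
    """Sum bits of flows whose AS path contains any leak-indicator subsequence."""
    totals: dict[str, int] = defaultdict(int)
    for as_path, bits, country in flows:
        if any(contains_subpath(as_path, ind) for ind in indicators):
            totals[country] += bits
    return dict(totals)


# Illustrative leak indicators: a transit provider directly followed by the leaker.
indicators = [(3356, 8359), (1299, 8359)]
flows = [  # (AS path, bits, destination country): made-up records
    ((64496, 3356, 8359, 64510), 500, "HK"),
    ((64496, 1299, 8359, 64511), 300, "ID"),
    ((64496, 3356, 64512), 900, "US"),
]
print(leaked_bits_by_country(flows, indicators))  # {'HK': 500, 'ID': 300}
```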
So these routes are RPKI-valid from that perspective. But we've got AS-SETs, right, to enable transit providers to programmatically build appropriate allow lists that prevent the propagation of leaked routes. Right? Well, it turns out this doesn't always work, and there are a couple of reasons why. In this particular case, as with Voxility and a bunch of other leaks in recent weeks, you can look at the provider involved and look at their AS-SET; bgp.tools is a great resource for exploring AS-SETs. This one expands to forty-three thousand eight hundred and twenty-three ASNs. That's a lot, given that there are only about eighty-three thousand in the global routing table, plus or minus. If you were to apply this AS-SET to a routing interface, you would have a crazy list of networks that are supposedly allowed to be transited by this Russian mobile operator, including, for example, the US Department of Defense, a rural telephone company in Kansas, American Express the credit card company, AWS, and the national government of Kenya. What is included in this AS-SET, and supposedly allowed to be transited by MTS, is all over the map. It shouldn't be; that doesn't make a lot of sense. In fact, it expands to over a million v4 prefixes. One popular tool for building BGP filter lists from IRR data is bgpq4; the GitHub link is on the slide. If we ran AS-MTU through it and generated a Junos configuration, to pick one, we'd come up with a configuration that's one point three million lines long that you would be applying to your router's running configuration. We can use the aggregation option to reduce the number of lines to about a third of a million, but that's still a lot.
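Expanding an AS-SET is a recursive walk over its members, which is roughly what IRR-based filter generators have to do before they can emit any prefixes. This sketch works on a hypothetical in-memory IRR (a dict from set name to member list) rather than querying a real IRR server. It guards against loops in nested sets and treats any member whose name starts with "AS-" or contains ":" as a nested set, following RPSL naming.

```python
def is_set_name(member: str) -> bool:
    """RPSL as-set names start with 'AS-' or are hierarchical (e.g. 'AS64500:AS-CUSTOMERS')."""
    return member.upper().startswith("AS-") or ":" in member


def expand_as_set(name: str, irr: dict[str, list[str]]) -> set[int]:
    """Recursively expand an AS-SET into the set of ASNs it contains."""
    asns: set[int] = set()
    visited: set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current in visited:
            continue  # nested sets can loop back on themselves
        visited.add(current)
        for member in irr.get(current, []):
            if is_set_name(member):
                stack.append(member)
            else:
                asns.add(int(member.upper().removeprefix("AS")))
    return asns


irr = {  # hypothetical IRR contents
    "AS-EXAMPLE": ["AS64500", "AS-CUSTOMERS"],
    "AS-CUSTOMERS": ["AS64501", "AS-PARTNER"],
    "AS-PARTNER": ["AS64502", "AS64503", "AS-EXAMPLE"],  # loops back to the top
}
print(sorted(expand_as_set("AS-EXAMPLE", irr)))  # [64500, 64501, 64502, 64503]
```

Every ASN in the result becomes permitted in the generated filter, so a single over-broad nested member is enough to turn the allow list into something close to a permit-any.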
And most of it shouldn't be there; that's stuff MTS shouldn't be allowed to transit. If you were to go through the routes contained in this AS-SET and count the unique IP addresses, it's about one point eight billion unique v4 addresses, out of a total of about three billion currently in the routing table. This is excessive, and it's not going to prevent a leak. And AS-MTU is not alone, nor is it anywhere near the worst. I reached out to Ben Cartwright-Cox, the creator and founder of bgp.tools, and he ran the numbers. I could have done this too, but having looked at his tool, I bet he already had this data on hand, and sure enough, he did. He was kind enough to run this for me and pull some of the biggest AS-SETs out there, and there are a lot that are excessive. In this top ten, every one is about a hundred and two thousand ASNs. Again, there are only about eighty-three thousand in global routing, so roughly another twenty thousand unrouted ASNs are included in all of these AS-SETs. There are almost twenty-two hundred AS-SETs in circulation with over a thousand ASNs. So what good can these possibly do? I'm getting ahead of myself here. So what's the problem? Our only hope to reduce the harm from these BGP mishaps is automation. We can't do this manually, and this IRR data enables the automated generation of allow lists for session filters. We need something along these lines, but these excessively large AS-SETs defeat the whole purpose of the allow list; they're essentially just a permit-any. Not only that, it comes at a cost: it breaks a lot of automation.
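Counting unique addresses, as in the one point eight billion figure, requires removing overlaps first, because the routes behind an AS-SET typically include both covering prefixes and more-specifics. Python's standard `ipaddress.collapse_addresses` merges overlapping and adjacent networks; the prefixes below are hypothetical.

```python
import ipaddress


def unique_ipv4_addresses(prefixes: list[str]) -> tuple[int, list[ipaddress.IPv4Network]]:
    """Count distinct IPv4 addresses covered by a list of possibly overlapping prefixes."""
    v4 = [n for n in map(ipaddress.ip_network, prefixes) if n.version == 4]
    collapsed = list(ipaddress.collapse_addresses(v4))
    return sum(n.num_addresses for n in collapsed), collapsed


prefixes = ["10.0.0.0/16", "10.0.1.0/24", "10.1.0.0/16", "192.0.2.0/24"]  # hypothetical
naive = sum(ipaddress.ip_network(p).num_addresses for p in prefixes)
unique, collapsed = unique_ipv4_addresses(prefixes)
print(naive)      # 131584 (double-counts 10.0.1.0/24)
print(unique)     # 131328
print(collapsed)  # [IPv4Network('10.0.0.0/15'), IPv4Network('192.0.2.0/24')]
```

Collapsing also shows, loosely, why aggregation shortens a generated list, although a real filter must still permit the exact more-specifics, so it cannot simply be replaced by the collapsed networks.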
I've had conversations with a couple of cloud providers and tier one networks that have had to build custom workarounds into their IRR intake because of these AS-SETs: either they can't consume them, or they have to do something to clean them up. There's a storage and computational expense if you were to apply one of these AS-SETs to a router. These AS-SETs get refreshed, maybe multiple times a day, and every time you pull this data over, it's either all going to be thrown away or it's actually defeating the whole purpose of IRR route filtering. This is an issue I'm trying to bring some attention to, and I think some of it has to do with the fact that AS-SETs are allowed to be nested. The nesting gets out of control to the point where I think the people adopting these AS-SETs don't know what's even in them. The things to do here: check yourself. If your network has an AS-SET, what do you include in it? If you include other AS-SETs, what do those include? If you expand it out, do the routes look legitimate? Be careful what you're including, and try not to pollute the space; this is kind of a tragedy-of-the-commons scenario. Only include the things you want to transit. What concerns me is the nesting of AS-SETs. I think it's creating a loss of visibility into what's getting included. Back to the MTS leak: I don't know whether Lumen was using this excessive AS-SET or whether it was too excessive for them to use.
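The "check yourself" step can be partly automated: expand each direct member of your own AS-SET separately and see how many ASNs it pulls in, which quickly exposes a nested set that has ballooned. This sketch uses a hypothetical in-memory IRR and the RPSL naming rule for nested sets. The one-thousand threshold echoes the figure mentioned earlier and is otherwise arbitrary; the function and set names are illustrative.

```python
def is_set_name(member: str) -> bool:
    """RPSL as-set names start with 'AS-' or are hierarchical (contain ':')."""
    return member.upper().startswith("AS-") or ":" in member


def expand(member: str, irr: dict[str, list[str]]) -> set[int]:
    """Expand a single member (an ASN or a nested AS-SET) into ASNs, loop-safe."""
    asns: set[int] = set()
    visited: set[str] = set()
    stack = [member]
    while stack:
        current = stack.pop()
        if not is_set_name(current):
            asns.add(int(current.upper().removeprefix("AS")))
        elif current not in visited:
            visited.add(current)
            stack.extend(irr.get(current, []))
    return asns


def audit(as_set: str, irr: dict[str, list[str]], threshold: int = 1000) -> dict[str, int]:
    """ASN count contributed by each direct member; prints members over the threshold."""
    counts = {m: len(expand(m, irr)) for m in irr.get(as_set, [])}
    for member, n in counts.items():
        if n > threshold:
            print(f"{as_set}: member {member} alone expands to {n} ASNs")
    return counts


irr = {  # hypothetical IRR contents
    "AS-MYNET": ["AS64500", "AS-MYCUSTOMERS", "AS-SOMEONE-ELSE"],
    "AS-MYCUSTOMERS": ["AS64501", "AS64502"],
    "AS-SOMEONE-ELSE": [f"AS{n}" for n in range(4200000000, 4200001500)],
}
print(audit("AS-MYNET", irr))
# {'AS64500': 1, 'AS-MYCUSTOMERS': 2, 'AS-SOMEONE-ELSE': 1500}
```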
Either way, it didn't help prevent the disruption that occurred during that leak. Alright, I'll take a quick aside here to talk about one non-routing BGP security issue, and that has to do with proxy services. This has been around for a while, it's going to be with us for a while, and it's a pretty hairy issue. But whenever we bring up routing security, it needs to be mentioned. This is a recent case where an actor on the Internet obtained address space and registered it as if it were from AT&T's mobility service. What proxy services do is try to sell access to networks that can't or won't be blocked, namely mobile network operators and residential broadband. Those are connections that providers need to accept; otherwise, they'll lose their whole customer base. Proxy services operate in a few different ways. One is to embed into devices on those networks and sell access to them. Another is to not go to the trouble of embedding at all, but simply lie about what the address space is. The hope is that when somebody looks up an IP address from their security logs and sees that it's AT&T Mobility, they'll think it's something they can't block. In reality, this traffic was coming from, I forget where this one was, China or somewhere. This particular case was resolved, but these things pop up regularly. It's another issue in the realm of BGP security, and we need better visibility inside the RIRs. The word RIR got cut off in the box on the right, but that's what the last word there is saying.
And ROV is not going to do anything here. These routes actually have valid ROAs; they're RPKI-valid routes, except for the one in the middle. So that's not something ROV is going to be able to address. Alright, the last topic: let's talk about some of the success here. Measuring success is actually kind of challenging, and I think this is true for any security mechanism: how do you measure how many things did not take place as a result of the security tactic you've adopted? I would just say that, as someone who lives in this data and in this space (and very few people, I think, really do), we are aware that routing leaks happen with regularity, but the fact that most people don't know this is a sign of success. A lot of these incidents get contained to the region or area where the leak took place. It affects those people, unfortunately, but our system of IRR filtering, RPKI ROV, Peerlock, and all kinds of other mechanisms out there is preventing these leaks from affecting others, and that is a sign of success. Leaks take place, but they're not as disruptive. In fact, I would argue that the last really big disruptive BGP leak was back in June 2019. I keep saying this at presentations; maybe there'll be a new one and I won't be able to make the statement anymore. That was the Allegheny Tech/Verizon leak that took out Cloudflare. That was the last really big one, and I don't think it's an accident that it's the last one.
It's a product of all the work we've done over the years to add checks that prevent disruption from the inevitable mistakes and errors that take place. So leaks are still happening. Here's another case that didn't get a lot of attention, and, again, as a storyteller in this space, it's hard to tell the story of something that didn't take place. But it's important to understand that behind the scenes, transparently to our daily work, the system is protecting us. In this case, back in September, the government of Brazil ordered that Twitter, now known as X, be blocked nationally. As happens in a lot of countries, the regulator just sends the order out to the ISPs, and they're on their own to come up with a way to black-hole or block Twitter. A couple of Brazilian ISPs decided to use BGP, as we've seen in other places: they announce Twitter's address space to attract the traffic and just black-hole it. That's how they achieve compliance with the government directive. What these networks did was almost identical to what we saw in Myanmar in 2021 and Russia in 2022. In each of those cases, the government issued a block of social media, somebody did a BGP hijack to black-hole traffic for Twitter, and then the route leaked out, causing disruption outside of those countries. In Myanmar it was more disruptive, across Southeast Asia; in Russia, less so. What happened between those two incidents was that Twitter adopted RPKI and created ROAs for all its routes, so it was much easier for networks to automatically reject the hijack coming out of Russia than the one that came out of Myanmar.
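The reason ROAs made the Russian hijack easy to reject is route origin validation. Below is a minimal sketch of the RFC 6811 style check, assuming ROAs are given as simple (prefix, maxLength, origin ASN) records; the documentation prefix and private-use ASNs stand in for X's real address space and for the hijacking ISP, and `Roa`/`rov_state` are hypothetical names, not any router's API.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Roa:
    prefix: str       # e.g. "203.0.113.0/24"
    max_length: int   # longest prefix length this ROA authorizes
    origin_asn: int   # AS allowed to originate the prefix


def rov_state(prefix: str, as_path: List[int], roas: List[Roa]) -> str:
    """Origin validation in the style of RFC 6811: 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # only the rightmost (origin) AS is checked
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if roa_net.version != route.version or not route.subnet_of(roa_net):
            continue  # this ROA doesn't cover the announced prefix
        covered = True
        if roa.origin_asn == origin and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical: AS64500 holds 203.0.113.0/24; AS64511 announces it to black-hole it.
ROAS = [Roa("203.0.113.0/24", 24, 64500)]
print(rov_state("203.0.113.0/24", [64496, 64500], ROAS))  # valid
print(rov_state("203.0.113.0/24", [64496, 64511], ROAS))  # invalid: drop it
```

Because only the last AS in the path is checked, an announcement whose path ends in the legitimate origin AS still comes out "valid", which is why ROV alone does not stop a determined adversary.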
Well, we had another example of that in September. This little graphic is from bgp.he.net, so you might recognize the style. You could probably look it up now; the little peak I've highlighted is probably still on the graph on the front page for that AS. That peak is the network leaking out all of the Twitter routes. But in the end, it didn't go very far. It only circulated in Brazil and through some peering relationships, and it caused no real disruption outside of Brazil. Again, that's the system working. It did its job, nobody had to get involved, and very few people knew. I'm one of them, but we have to know that this stuff is happening. There are other examples. I'll wrap up here with a couple of thoughts. There's a little bit of concern that we're seeing some signs of slowing RPKI ROV adoption. As I mentioned earlier, the percentage of traffic (measured in bits per second) that is RPKI-valid has been stuck at about 74% since October 2024. I think it was to be expected that we'd start seeing some plateauing of adoption, but we want to keep moving forward. IPv6 also seems to be stuck. Maybe IPv4 is going to hit a similar plateau and IPv6 is just ahead of the game again. How much farther we can go in ROA creation is an open question. And I worry a little about being a victim of our own success: if we don't have incidents like the Allegheny Tech leak from June 2019, how do we keep everybody's attention and keep devoting resources to moving this forward? The highlighted boxes in this graphic show IPv6 kind of puttering out since October 2024.
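The plateau is stated as a share of traffic, not a share of routes. As a rough sketch of how such a traffic-weighted share could be computed, assuming hypothetical (status, bits-per-second) samples that have already been tagged with the RPKI status of their route (this is an illustration, not Kentik's pipeline, and the numbers in the test are made up):

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def share_by_status(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Percentage of total bits per second in each ROV status bucket.

    `samples` are hypothetical (status, bps) pairs, e.g. ("valid", 1.2e9),
    where traffic has already been tagged with the RPKI status of its route.
    """
    totals: Dict[str, float] = defaultdict(float)
    for status, bps in samples:
        totals[status] += bps
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no traffic: avoid dividing by zero
    return {status: 100.0 * bps / grand_total for status, bps in totals.items()}
```

Weighting by bits per second means a handful of high-volume routes can move the number far more than many low-traffic ones, so route-count and traffic-share statistics can plateau at different times.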
And finally, what's the call to action here? As we always say: create ROAs, reject invalids, and minimize your AS-SETs. If you've got one, go check it. Which ones are you adopting? Where are you applying them? How are they causing problems for you? We need help supporting the newer route-leak prevention technologies. Hopefully we can move away from our reliance on AS-SETs and IRR-based filtering. We'd like to see greater adoption of ASPA, where an AS asserts in RPKI who its transit providers are, enabling others to look at the AS path, recognize that a leak is taking place, and reject the route. Additionally, there's RFC 9234, the OTC (Only to Customer) attribute. Again, a router could receive an announcement, automatically recognize that it's not really meant for it, reject the route, and keep leaked routes from causing any more disruption than necessary. But RPKI ROV, and even these others, ASPA and AS-SETs, don't solve all the routing security issues. Our success in deploying ROV is, I think, the opening salvo toward the ultimate goal of securing against the determined adversary: an entity that's very knowledgeable and very good at getting around some of these mechanisms. ROV can be defeated by a determined adversary. They can just append the legitimate origin AS to the end of the AS path, the route now looks valid, and ROV isn't going to do anything. It's almost trivial to defeat, so it's not really there to address the determined adversary; we need different mechanisms for that. There have been a few attacks on cryptocurrency services that are good examples to help us understand what's unresolved, and we've still got a lot of work to do. But let's build off the progress we made with ROV to address some of these more difficult scenarios.
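The OTC mechanism mentioned above can be sketched as a pair of checks. This is a simplified reading of the RFC 9234 ingress and egress rules: it assumes only customer, peer, and provider relationships (route servers are left out), omits the egress step that stamps OTC onto routes sent to customers and peers, and uses hypothetical function names and string constants.

```python
from typing import Optional, Tuple

# Relationship of the *neighbor* from the local AS's point of view.
# Simplified: RFC 9234 route-server roles are not modeled here.
CUSTOMER, PEER, PROVIDER = "customer", "peer", "provider"


def otc_ingress(neighbor: str, neighbor_asn: int,
                otc: Optional[int]) -> Tuple[bool, Optional[int]]:
    """Return (accept, otc_after_ingress) for a route received from `neighbor`."""
    if otc is not None and neighbor == CUSTOMER:
        return False, otc           # customers never legitimately send OTC routes: leak
    if otc is not None and neighbor == PEER and otc != neighbor_asn:
        return False, otc           # marked by someone other than this peer: leak
    if otc is None and neighbor in (PROVIDER, PEER):
        return True, neighbor_asn   # mark it: only our customers should see it
    return True, otc


def otc_may_export(to_neighbor: str, otc: Optional[int]) -> bool:
    """A route already carrying OTC must not go to providers or peers."""
    return otc is None or to_neighbor == CUSTOMER
```

In a classic leak, a multihomed customer passes a route learned from one provider to another. The first provider attached OTC when it sent the route down, so the second provider sees OTC arriving from a customer and rejects it, with no IRR data involved.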
And with that, I'll take any questions. Thanks, Doug. It's always a pleasure to hear you speak so eloquently. We've got a couple of questions coming in, so please keep them coming. We'll start with a comment about AS-SET tools: tools like bgpq4 lack an exclude option, so you can't easily tell bgpq4 to expand an AS-SET while excluding a known-bad AS-SET. Let me try to spin that into a question. Doug, what are the implications of this limitation for network operators? Yeah, I'm trying to understand. The lack of an exclude option... we'd have to think through how you would do that. I think the argument is that you'd like to use something like bgpq4 with an AS-SET, but I just don't know how you would define what's bad in this case. Maybe there's a way; I can't think of how you'd do it. I think the main issue is that we have AS-SETs that are excessively, recursively nested, so they include so much stuff. And if you knew enough to know that an AS-SET has bad stuff in it, you might be taking other measures besides an exclude. It's an interesting question. I know some of the developers of bgpq4, so I might bring it up with them and see if there's some way to add something. Maybe that exists and I'm not aware of it. Maybe I'll have an answer another time. It's a good question. Okay, Doug, we've got a couple of others; keep them coming in, of course. Doug is the best person to ask this one: what would you recommend for someone who is looking to get more into routing security and Internet analysis? Would you recommend a certification from, say, Cisco, or just getting more network engineering experience? What's your take on getting in? Yeah.
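On the exclude question: one way to approximate it, without assuming any bgpq4 feature, is to post-process. Expand both the AS-SET you want and the one you don't with whatever tool you already use, then drop any wanted prefix that falls inside an unwanted one before building the filter. A minimal sketch, assuming both expansions are available as plain prefix strings (`exclude_prefixes` is a hypothetical helper and the prefixes are documentation ranges):

```python
import ipaddress
from typing import Iterable, List


def exclude_prefixes(expanded: Iterable[str], unwanted: Iterable[str]) -> List[str]:
    """Return `expanded` minus any prefix equal to or inside an `unwanted` prefix.

    Both inputs are plain prefix strings, e.g. the expansion of the AS-SET you
    want and of the AS-SET you don't, produced by whatever tool you already use.
    """
    unwanted_nets = [ipaddress.ip_network(p) for p in unwanted]
    kept: List[str] = []
    for prefix in expanded:
        net = ipaddress.ip_network(prefix)
        covered = any(
            net.version == bad.version and net.subnet_of(bad)
            for bad in unwanted_nets
        )
        if not covered:
            kept.append(prefix)
    return kept


wanted = ["192.0.2.0/24", "198.51.100.0/25", "203.0.113.0/24", "2001:db8::/32"]
print(exclude_prefixes(wanted, ["198.51.100.0/24"]))
# ['192.0.2.0/24', '203.0.113.0/24', '2001:db8::/32']
```

A wanted prefix that is a supernet of an unwanted block is kept here; whether that's acceptable is a policy decision, and, as Doug notes, the hard part is knowing what belongs on the unwanted list in the first place.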
For this stuff, you definitely need some experience and exposure to networking. I started off in this space as a baby network engineer getting a CCNA. That was my introduction, many, many years ago, and I thought it was a really good program. It's not the only way to do it, but having some network or sysadmin experience is a good base to build off of to get into this type of thing. There are also a bunch of academic outfits. Georgia Tech is doing a lot of cool stuff in what we call the field of Internet measurement, and CAIDA at UCSD has been the leader on this topic for decades; maybe they kind of founded the whole academic area. A bunch of universities do a lot of cool stuff and publish important papers that are worth looking into. But as far as how to get into this, I guess it's becoming a network engineer. It's a common topic in the networking and NANOG space that maybe network engineering is a bit of a dying breed, because everything's in the cloud. You still need some of the knowledge, but people are on the command line a lot less than they once were; it's just not a practical way to run a very large network anymore. But if you want to start somewhere, I'd start with the CCNA. Awesome. I think we've got a documentation snippet in response to what we were talking about earlier on the exclude options; we'll probably take that offline, if we may. We've got plenty more questions coming in. Wow.
This one says that, with current route filtering practices, nesting of AS-SETs is unfortunately needed when tier one ISPs provide IP transit to tier two ISPs, who in turn provide transit to tier three ISPs. It's impossible, they argue, to keep route filters up to date without some sort of recursion. Yeah, I think this comment is pushing back on the idea of avoiding recursion. I don't know how someone would otherwise maintain knowledge of what's in an AS-SET, so I think some recursion is necessary here. I guess I feel like, going through this data, you could find cases where I can't see a justification for the recursion taking place in a particular AS-SET. The comment is saying we can't do without it: a smaller provider needs to be embedded in a larger provider's AS-SET, and that one in another, larger provider's, so that the network at the top (this is from somebody at a tier one network) can inherit all of the routes within its customer cone. So some of that is necessary. But going back to the MTS example, who are the components? I'd have to dig into it. Voxility is maybe a bad example, because it's a very unique case: a DDoS mitigation provider is going to have all kinds of things in its AS-SET, and it's hard to argue that anything shouldn't be there. But a Russian mobile operator should probably just have Russian stuff, or maybe its subsidiaries in the Eurasia part of the world. It shouldn't have US Department of Defense prefixes in its AS-SET, and however that's making its way in there, it's coming through some recursion. But yeah, the point stands that we can't operate without it. Awesome, thank you. Got a question here.
Someone's asking: do you have any recommendations for BGP monitoring platforms? Well, funny you should ask. Kentik is in this business. We are one of a number of companies that sell BGP monitoring, depending on your budget and your time, and there are some open source tools that are pretty good. So there's a variety of tools out there. From Kentik's perspective, we think the power comes from marrying BGP up with NetFlow analytics. We're not just looking at BGP in isolation; we can look at it as it pertains to the operational impact on the network. If you're just looking at BGP, it's hard to know how many packets are actually affected by any one thing. So check us out and keep us on your list for evaluation, but there are maybe a dozen solutions out there. We talked about certs earlier; someone else is asking about tools for learning some of the things we've talked about today. Anything beyond certifications, maybe? Well, if you can stand to read RFCs, it's dry technical stuff, but if you can invest the time to go through them, it's a good way to learn the standards. I think that's a hard path for someone who's new to try to make heads or tails of, and it's a bit of a challenge; this stuff is a little esoteric. I had to reach out to some other experts in this space just to confirm some of the facts I was using for the AS-SET analysis I put out this week. But as for ways to learn this stuff other than certs, we've got these forums. Depending on where you are in the world, you've got NANOG in the US and RIPE in Europe.
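To illustrate the point about pairing BGP with flow data, here is a minimal sketch that attributes traffic to routes by longest-prefix match, so you can see how much traffic actually follows a given route, such as a leaked more-specific. It assumes hypothetical flow records reduced to (destination IP, bytes) tuples and a routing table given as a list of prefixes; it is an illustration of the idea, not how Kentik implements it.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical flow records: (destination IP, bytes). Real NetFlow/IPFIX
# records carry many more fields; only these two matter for this sketch.
Flow = Tuple[str, int]


def longest_match(dst: str, rib: List[str]) -> Optional[str]:
    """Return the most specific prefix in `rib` that contains `dst`, if any."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix in rib:
        net = ipaddress.ip_network(prefix)
        if net.version == addr.version and addr in net:
            if best is None or net.prefixlen > best.prefixlen:
                best = net
    return str(best) if best is not None else None


def bytes_by_route(flows: List[Flow], rib: List[str]) -> Dict[str, int]:
    """Attribute each flow's bytes to the route it would follow (longest match)."""
    totals: Dict[str, int] = {}
    for dst, nbytes in flows:
        route = longest_match(dst, rib)
        if route is not None:  # traffic with no matching route is ignored here
            totals[route] = totals.get(route, 0) + nbytes
    return totals


rib = ["203.0.113.0/24", "203.0.113.128/25", "198.51.100.0/24"]
flows = [("203.0.113.10", 100), ("203.0.113.200", 50), ("198.51.100.1", 25)]
print(bytes_by_route(flows, rib))
# {'203.0.113.0/24': 100, '203.0.113.128/25': 50, '198.51.100.0/24': 25}
```

If the /25 in this example were a leaked more-specific, the 50 bytes attributed to it are the traffic it would pull away from the legitimate /24; a BGP-only view would show the leak but not that number. A production system would use a radix tree rather than this linear scan.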
These are network operator forums, and they're good ways to meet the people who work in this space, listen to the presentations, and learn some of the stuff that was getting discussed...

Kentik’s Director of Internet Analysis, Doug Madory, explores the current landscape of BGP routing security. He discusses key progress made in Route Origin Validation (ROV), common pitfalls like AS-SET nesting, and ongoing challenges faced by the networking industry. With real-world examples and detailed traffic analysis from Kentik’s extensive NetFlow data, Doug shares his insights and recommendations for NetOps professionals who want to improve their routing security practices. Watch this webinar replay to learn the latest methods for securing the global internet.
