Telemetry Now  |  Season 1 - Episode 12  |  April 18, 2023

Securing Global Routing with RPKI and BGP Security



Job Snijders
Principal Engineer, Fastly

Job Snijders is a Principal Engineer at Fastly, where he analyzes and architects global networks for future growth. Job has been actively involved in the internet community in operational, engineering, and architectural capacities. Job co-chairs the IETF GROW working group and the RIPE Routing Working Group, and is vice president of PeeringDB, director of the Route Server Support Foundation, a member of the RIPE NCC Executive Board, and an OpenBSD developer.

Doug Madory
Director of Internet Analysis, Kentik

Doug conducts analysis of events and trends across the global Internet for Kentik (previously Oracle Internet Intel, Dyn Research and Renesys).


Transcript

Phil Gervasi: Historically, the internet has operated on a trust relationship: trust among those advertising their own networks out to the world and learning how to reach remote networks. But there really hasn't been much to prevent anyone from advertising incorrect information out to the rest of the world and thereby manipulating where others send their traffic, sometimes to the extent that traffic might even be black holed, causing major outages. This problem isn't new at all, whether the cause is incompetence, bad engineering, or nefarious activity, and efforts have been underway for quite some time to address it. So with us today is Job Snijders, a subject matter expert on internet security. And frankly, that's very much an understatement. Job is a principal engineer at Fastly, co-chair of the IETF GROW Working Group, co-chair of the RIPE Routing Working Group, vice president of PeeringDB, director of the Route Server Support Foundation, volunteer for the IRRd v4 project, and developer for the OpenBSD project. And I'm sure there are several other roles that I've missed in this extensive list as well. So suffice it to say that Job is a prolific contributor to the global routing community, and I'm very excited to have him with us today to talk about what's really wrong with routing security and what remediations are out there and are being developed to solve this problem. I'm Phil Gervasi and you're listening to Telemetry Now. Let's get started. Job, thank you for joining us today. It really is a pleasure to have you. Before we get going and into this very deep and extensive topic, I would like our audience to get a little bit of background information on your work experience or technical experience, and also how you've contributed to the community overall over the past few years. Because just in the introduction there, that list of what you're working on is pretty extensive.

Job Snijders: Thank you for having me. It's always a joy to be in a conversation with Doug and yourself. You're asking how I got where I am?

Phil Gervasi: It's a complicated question, I know. How do any of us get to where we are?

Job Snijders: I think if I go back to my 20s, I started as a system administrator and somehow noticed that the systems would sometimes be unavailable, and this was due to the network. So being the problem chaser that I am, I ended up learning more about how networks work and got a job as a network engineer. And then I noticed how tedious it all was to type things into routers and how many mistakes we make on the keyboard. I myself have mis-originated prefixes because the digits on the keyboard are so close to each other. So from there I rolled into network automation and started programming systems that controlled the network. And from there I, at some point, latched on to NTT. They were a provider of the company I was working for before I joined NTT. And I always very much liked working with the NTT people. So I was there for a few years, and at NTT I spent a lot of my time trying to improve routing security. So I took a look at all the BGP decision-making processes that NTT applied to their routing system. I introduced RPKI origin validation and managed a full rewrite of the Internet Routing Registry Daemon, IRRd, that they used to generate network configurations. And then I joined Fastly because there was a lot of cool stuff to do over there in the realm of routing security. So that's a bit of a resume overview. But what I recognize in some parts of my career is that I'll bump into some kind of issue and then try to find the root cause. And sometimes the root cause is stuff like an IETF RFC not being complete or containing some kind of annoyance or mistake. And then I go up there and even fix things at that level. So years ago I noticed that more and more people were getting assigned 4-byte ASNs, and the whole world was using BGP communities, which are a 32-bit value: 16 bits of that are your ASN, and the remaining 16 bits are an arbitrary value that you can set yourself.
So obviously you cannot fit 32 bits' worth of information into a 16-bit field, and I embarked on a project, together with many others, to introduce BGP Large Communities. So yeah, that's some of what I do. I look at the ecosystem, I try to identify gaps or shortcomings, be it in software or in the specifications themselves. And then I try to fix that in a way that is beneficial to everyone, because ultimately my employers benefit from a well-working internet. And if that means that we have to boil the ocean to make it work better, then I will go boil that ocean and make it work better.
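To illustrate the size mismatch Job describes, here is a minimal Python sketch. It assumes the conventional ASN:value split of a classic (RFC 1997) community and the three 32-bit fields of a large community (RFC 8092). The function names are hypothetical, and AS 4200000000 is simply a private-use 4-byte ASN chosen for the example.

```python
import struct

def pack_classic_community(asn: int, value: int) -> bytes:
    """Encode a classic community as 4 bytes: 16-bit ASN, 16-bit value."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError(f"AS{asn} does not fit in the 16-bit ASN half")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    return struct.pack("!HH", asn, value)

def pack_large_community(global_admin: int, local1: int, local2: int) -> bytes:
    """Encode a large community as 12 bytes: three 32-bit fields."""
    for field in (global_admin, local1, local2):
        if not 0 <= field <= 0xFFFFFFFF:
            raise ValueError("each field must fit in 32 bits")
    return struct.pack("!III", global_admin, local1, local2)

# A 2-byte ASN fits the classic layout.
classic = pack_classic_community(65000, 100)

# A 4-byte ASN only fits the large layout.
large = pack_large_community(4200000000, 1, 2)
```

Calling `pack_classic_community(4200000000, 100)` raises `ValueError`, which is the gap that large communities close.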

Phil Gervasi: Well, I certainly appreciate your effort in the community because I use the internet and from what I hear it's a trend now. A lot of people are into this internet thing. No, no. But seriously, the level of mission criticality of internet connectivity on the local and on the global level is such that this topic I think is very, very relevant. And then the nature of how the internet is built on these trust relationships, by and large, just makes this so much more poignant now, especially as we are looking at volatility among nation states and wars and things like that occurring. I do take personal offense to one of your statements where you said that you found that it was often the network's problem or the network's fault. It's usually DNS, is my experience. Or the developer's just writing their applications poorly. No, I'm joking.

Job Snijders: Or expired certificates.

Phil Gervasi: Now you did mention that Doug is with us today. So Doug, I would love to give you an opportunity to introduce yourself as well. Doug, I think you're no stranger to our audience, but if you wouldn't mind maybe giving us a little background of your relationship working with Job.

Doug Madory: Let's see. So we have some common interests. Clearly, Job is very interested in BGP secure routing. That's been my focus for the last 12-plus years, but in a slightly different capacity. So I do a lot of analysis, trying to understand things that took place. There's a little bit of storytelling there. There'll be an outage or a hijack or something, and we're trying to understand: what happened here and there? So I'm a bit more oriented towards just trying to explore and understand real events as they're happening. But that runs into a lot of the stuff that Job's working on. And so, I don't know, a number of years ago, I think, Job, you had written a blog post, and then I thought I saw something that was inconsistent with that. It was a routing leak, and you had made, I think, a claim about NTT. And I was like, "Well, I think this doesn't make sense with what you said." And you explained there was some nuance there and it did make sense. But it started a conversation that's never stopped. I'll find something interesting that I think you might want to be aware of, or maybe you've got some insight or maybe you don't. But it usually makes for a good conversation, because you're not spending your day looking at events around the world. I am. And so I can bring these to you, and then there's a synergy between those two perspectives. And I feel like we've had a few times where we've been able to get some mutual benefit. And then all this work you're talking about... We talk about the internet as this abstract thing, but this is many companies and millions of people, and there are a lot of positive downstream impacts that I think we abstract away when we use some of these terms. But there's been a lot of benefit to a lot of people from a lot of the work that you've done, Job. So I really value our relationship and our conversations. And anytime there's an opportunity to collaborate, I jump on it.

Job Snijders: As do I, Doug. I think it is phenomenal that you, throughout your career, have maintained access to vast data sets that either offer insight into what transpired or can help confirm suspicions about, "Hey, in theory this and this behavior might exist, but are you actually seeing it in the wild?" So yeah, it's a lot of fun bouncing ideas back and forth with you.

Phil Gervasi: So Job, I'm going to direct this question at you. It's broad and maybe it would take multiple podcasts to answer, but what's wrong with internet security?

Job Snijders: Fundamentally, I think the biggest issue is that we are used to internet routing as a plain text messaging system. So I will pass on to you a message that you can reach a certain network via me and vice versa. You will tell me, "Hey, you can reach the Kentik network via me." And those messages are plain text. That means there's no signature, there's no cryptographic way for me to verify that you were authorized to send me that routing message. And I think a lot of the issues that we see stem from there. We have difficulty understanding the authenticity of routing messages. So over the years, many remediations have been created to limit the risk of this unsigned, plain text messaging back and forth between ISPs. So for instance, providers would ask their customers, "Hey, you want to purchase a transit circuit from me? Can you tell us the list of prefixes you intend to announce?" And then the customer might fill in on the service order form a list of 10 prefixes that they intend to announce, or maybe they store that information in the IRR or maybe elsewhere. And then based on that information, the transit provider creates a filter to only permit those prefixes that were previously agreed upon, to mitigate the risk of the customer accidentally or maliciously announcing prefixes they should not be announcing. But as the internet grew, doing this manually through service order forms that you faxed to each other, or letters of agency (sometimes called LOAs), scaled poorly, especially in the wholesale market, so fully automated systems like the Internet Routing Registry became commonplace to generate filters. But the Internet Routing Registry really is a garbage in, garbage out system. And yes, it is automatable, but again, there are no cryptographic signatures on any of that information.
It's transported in plain text, and that means that you're receiving unsigned BGP messages from the customer, and you are comparing those unsigned BGP messages to unsigned information from the Internet Routing Registry. And then you try to arrive at some kind of conclusion about whether you should accept the route or not. Yeah, this is a system that's been alive for decades. It's growing organically; as we go, we learn, adopt, and develop new technologies. And deploying new technologies easily takes up to a decade. So it is no surprise that we are in the situation that we are in, but luckily, with the advent of RPKI, I believe we are finally seeing some traction in the internet's routing industry to really improve both safety and security of the routing system.
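The customer prefix filter described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the prefixes are documentation ranges, and the match is exact, whereas real filters often also allow a bounded range of more-specific prefixes.

```python
import ipaddress

def build_prefix_filter(declared_prefixes):
    """Turn the customer's declared prefixes into an allow-list."""
    return {ipaddress.ip_network(p) for p in declared_prefixes}

def accept_announcement(prefix, allow_list):
    """Accept a BGP announcement only if its prefix was declared up front."""
    return ipaddress.ip_network(prefix) in allow_list

# Prefixes the customer listed on the order form or registered in the IRR.
allow_list = build_prefix_filter(["192.0.2.0/24", "2001:db8::/32"])

declared = accept_announcement("192.0.2.0/24", allow_list)       # was declared
undeclared = accept_announcement("198.51.100.0/24", allow_list)  # never declared
```

Note that nothing here is signed: the filter is only as trustworthy as the list it was built from, which is the garbage in, garbage out problem Job points to.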

Phil Gervasi: Yeah. I do have two questions though based on what you said. The first is... Now I've configured BGP many times on a customer side, peering with my provider and you create whatever prefix lists and filtering policies and that's fine. But from what you explained, it sounds like it could be... Is it more of a problem between customers and their providers right at the edge or among transit providers or both?

Job Snijders: Absolutely both. Yeah. If I look at the last handful of years, it really took a lot of effort in the entire industry to move the mindset from, "We must accept as many prefixes as possible because that means we have a full routing table," to the opposite mindset where people say, "We must reject all suspicious prefixes even if that means that our routing table becomes smaller." So it's only been five, six years that filtering on internet exchange route servers became commonplace. Because previously internet exchange operators would say, "It's not our role to do filtering. We are a neutral entity and we just take those messages, unsigned messages, and we pass them around and that's our job. And the more messages we pass around, the better of a job we are doing." But then the customers of the internet exchanges had to teach the internet exchange operators, "It's very nice that you want to be neutral, but I cannot possibly fit good enough filters on my router because there are so many BGP sessions behind your route server and so many prefixes coming in that it is impossible for me to do correct filtering after the aggregation point that your route server presents." And then slowly internet exchange operators began to see the value: "Oh, the more trustworthy my route server is, the more trustworthy or valuable my service offering to the customers is, and the happier my customers are, the happier I am." So this mindset shift is pretty recent. And previously some operators would say, "Hey, if we start filtering, we might lose visibility on, say, 5% of the routes passing through the route server." And I would always be on the barricades arguing, "But those 5%... That's bad information. Don't propagate it."
And there is some analogy to spam filtering in the really old days, where originally you tried to deliver every possible email to everybody, and at some point people were like, "It's very nice of you to try to deliver all these email messages, but 50% of it is junk and I don't want it in my mailbox, and it's your job to do the filtering." And I think BGP routing went through a similar transformation, where people began to realize that the goal is not to pass around as much information as possible, including route leaks. The goal is to create a stable system where wrong information does not propagate through the system, because wrong information invariably means latency increases or unexpected traffic shifts, or even worse, traffic drops where traffic no longer arrives at the intended destination. So I think as the years went by, the industry really started to understand that these unsigned BGP messages had better be of good quality, otherwise we're in trouble.

Doug Madory: Job, I started in this space in 2009 with [inaudible], and I feel like it was a good time to have started because at that point, like you said, using the example of IXPs, the state of routing hygiene has improved, let's put it that way. But in those first years of digging through events, there were some wildly terrible things happening. I know you've got a message, or you've got an outlook, on routing security. And I have one... They probably have some similar themes, but the one I've had in the past year, when I wrote some presentations, was like... We still have a lot to do. And there have now been these cryptocurrency hijacks and stuff. It's still happening. Having said that, we do have to take a moment and appreciate how far we've come. Your example of the IXPs using route servers to filter, I think that's just one of many examples that all started to come around in the last, maybe it's five or six years, like the timeline you mentioned. But I think the number of... I call them "bonehead errors." We used to have this spectrum where one end is just some [inaudible] originating the whole internet or something. And then you've got the other end, the determined adversary, these folks that are going after cryptocurrency and doing really sophisticated stuff. And we'll get to that. We would like to raise the cost for those people on that end, but at least you would hope we could eliminate the bonehead end of that spectrum. And I feel like we've moved the needle. When's the last massive origination leak that disrupted the internet? It's been a while. And I think there are a lot of different people doing a lot of different things that make that a reality, but there's been some improvement, before we throw up our hands on all this.

Phil Gervasi: That does lead me to my second question though. Job, you were talking about... It sounds like pretty much origination, verifying the authenticity of prefixes that you are ingesting and therefore are you permitted, are you allowed to be advertising these prefixes? Checking that and verifying that? Is that the only kind of problem here? Because I know that RPKI addresses that, but there are some other solutions out there that address different problems that we're seeing?

Job Snijders: Yeah, with all of this you stack multiple practices or multiple technologies on top of each other to arrive at a, let's call it, stable, safe state of the routing system. So if we look at the last few years, internet exchanges are fully embracing route filtering, now using it as a unique selling point to attract new business. That's a fantastic development. Internet exchange route servers, by their very nature, don't have global visibility. An internet exchange route server only concerns the peers adjacent to that particular route server. So the visibility into incidents at that level is very different than, say, something leaking through a global transit provider like TVR or NTT or Level 3. So internet exchange route servers, check. But another development was the popularization of a concept called peer lock. For many, many years, a few internet providers of substantial scale had arrived at the conclusion that if you're, say, NTT, Cogent, or Level 3, you should never ever see Level 3 routes via Cogent in the NTT network, or any other permutation thereof. And so it's not just about authorizing what is expected in the global routing system, but also discussing amongst providers what is absolutely never expected to show up in the global routing system. So I think [inaudible] had a route leak detector that used a few permutations of regular expressions to figure out if three or four so-called transit-free networks would appear in the same AS path. And if that was the case, then that definitely was a route leak, because the transit-free networks are not supposed to provide transit for each other. And I took that idea, I was like, "All right, so we have monitoring, cool. And we get alerts, nice. We're aware of these outages, but how about we try and solve them?"
So during my NTT tenure, I talked to all those partners and I was like, "Hey, can you implement filters to prevent accepting routes that contain NTT's AS number anywhere in the AS path on all sessions except the ones with NTT themselves?" And I think this also goes back to a mindset shift that had to happen. Previously, sometimes people would say, "Well, if I properly announce routes to you, but then you in turn leak them, that's not my problem. It's your responsibility to not leak the routes that you receive from me." Nevertheless, I still suffer from that leak happening even though it's not my equipment or my configuration that allowed the leak to spill. So taking preventive measures that take effect outside the immediate administrative domain of the internet service provider themselves was a huge step forward. And I think nowadays, amongst the top 10 providers in the global transit market, you'll see a lot of peer lock configurations. And this has a tremendous effect on the number of route leaks that we see nowadays compared to, say, five or 10 years ago.
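The two ideas in this answer, the leak detector and the peer lock filter, can be sketched roughly as follows in Python. This is a simplified illustration, not how any particular network implements it: the function names, the threshold, and the three-member "transit-free" set are assumptions made for the example. The AS numbers are those of NTT (2914), Cogent (174), and Level 3 (3356); AS 64500 is a documentation ASN standing in for some origin network.

```python
NTT, COGENT, LEVEL3 = 2914, 174, 3356

# Hypothetical, deliberately short list of transit-free networks.
TRANSIT_FREE = frozenset({NTT, COGENT, LEVEL3})

def peerlock_accept(as_path, neighbor_asn, protected=frozenset({NTT})):
    """Peer lock: reject a route whose AS path contains a protected ASN
    unless the route was received directly from that ASN.

    as_path lists ASNs with the neighbor first and the origin last.
    """
    for asn in protected:
        if asn in as_path and neighbor_asn != asn:
            return False
    return True

def looks_like_leak(as_path, transit_free=TRANSIT_FREE, threshold=3):
    """Detector: flag a path in which several distinct transit-free
    networks appear, since they should not give each other transit."""
    distinct = {asn for asn in as_path if asn in transit_free}
    return len(distinct) >= threshold

# Route learned directly from NTT: fine.
direct = peerlock_accept([NTT, 64500], neighbor_asn=NTT)

# Route containing NTT's ASN but learned on a Cogent session: rejected.
via_cogent = peerlock_accept([COGENT, NTT, 64500], neighbor_asn=COGENT)

# Three transit-free networks in one path: flagged as a likely leak.
flagged = looks_like_leak([LEVEL3, COGENT, NTT, 64500])
```

The detector only reports after the fact, while the filter stops the route from being accepted at all, which is the shift from monitoring to prevention described above.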

Phil Gervasi: Tremendous effect. Meaning an increase?

Job Snijders: Yes. A positive effect.

Doug Madory: I would characterize it as... I would agree with the positive effect. What it ends up doing is just suppressing it down. So if there is some sort of leak, it just can't go through what I would term the top of the internet, where it gets really widely propagated, widely circulated. That's not possible. And so then these leaks end up just being localized. And in this space, that's the best you can hope for. We can't prevent all problems, but as long as any problem that arises stays localized, others don't get harmed by it.

Phil Gervasi: Well, there is still an underlying trust relationship both at the edge and with transit providers. So as much as we've mentioned PeerLock and RPKI, and there are some technical remediations that can be done, a lot of it is administrative, or some kind of a service layer on top that prevents, like you said, something that you otherwise can't prevent from propagating throughout the rest of the world. And I want to get deeper into defining RPKI for our audience, and PeerLock, and what those things are and what specifically they solve. But you've mentioned two different problems here: the authentication of the origin of a prefix but also, I'm seeing, path validation. So those are two separate things, correct?

Job Snijders: Actually there's three separate problems.

Phil Gervasi: Okay. Very good. Well, that's why you're here. To educate.

Job Snijders: What most people nowadays refer to as RPKI actually is RPKI route origin validation. The RPKI is a globally distributed database whose integrity is protected with signatures, and the RPKI is a foundation on top of which we can build multiple applications that each somehow leverage or benefit from the RPKI's cryptographic properties. So the RPKI, that word by itself, should be viewed as a database of delegated authorizations. Each RIR maintains... I don't know, like 20% of internet resources. And their job is to ensure that those resources, be it IP address prefixes or AS numbers (autonomous system numbers), are delegated to ISPs, who in turn may further delegate those to their customers. So with the RPKI, we have a system where we can figure out who is authorized to do things with internet number resources, where I'm purposefully leaving "things" a bit vague for now. Now fast forward a little bit: then there is RPKI route origin validation, and that is the first application built on top of this RPKI foundation. With route origin validation, the mechanism is as follows: I can publish in the RPKI that a given IP prefix may be originated by a given AS number. And then consumers of that information can use it to compare the BGP updates that they receive to that cryptographically verifiable information stored in the RPKI. So if your prefix is 10/8 and you announce 10/8 in BGP towards me, originating from AS 65123 (it's a private number), then I can check whether the information in the BGP update matches the information that I learned through the RPKI. And if there's a mismatch, then I know that your BGP announcement has an issue and therefore I should reject or ignore your BGP announcement. And in doing so, I ensure that the safety of my routing system is maintained. So with route origin validation, you take untrusted input, BGP updates, and you compare that to a [inaudible] distributed cryptographically verifiable database.
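The comparison Job describes can be sketched in Python along the lines of the origin validation procedure in RFC 6811. This is a minimal illustration under assumptions: it presumes the RPKI objects have already been fetched and cryptographically verified into a flat list of validated payloads, and the `VRP` type, the function name, and the sample data (Job's 10/8 and AS 65123 example) are hypothetical.

```python
import ipaddress
from typing import NamedTuple

class VRP(NamedTuple):
    """One validated ROA payload: prefix, maximum length, authorized origin."""
    prefix: str
    max_length: int
    origin_asn: int

def validate_origin(prefix, origin_asn, vrps):
    """Return 'valid', 'invalid', or 'not-found' for a BGP announcement."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # Only payloads whose prefix covers the announced route are relevant.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # In the end this is just comparing a few integers.
        if vrp.origin_asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    return "invalid" if covered else "not-found"

vrps = [VRP("10.0.0.0/8", 8, 65123)]

matching = validate_origin("10.0.0.0/8", 65123, vrps)      # origin matches
wrong_origin = validate_origin("10.0.0.0/8", 65124, vrps)  # mismatch: reject
uncovered = validate_origin("192.0.2.0/24", 65123, vrps)   # no data either way
```

An announcement that is more specific than the published maximum length, such as 10.1.0.0/16 from AS 65123 here, also comes out as invalid.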
But that doesn't solve all issues, because you may spoof the origin in your BGP message, or there might be something like a route leak in which you are redistributing parts of or the entire routing table to me. And you're not supposed to do that, but nevertheless you are doing that for some reason. Could be a misconfiguration, could be a software bug in your router. And then if I apply origin validation, a lot of those announcements will look squeaky clean, because when you're doing a route leak, you're not modifying the origin, you're just passing on these messages even though you didn't intend to pass on those messages. So for the problem of route leaks, there is what started out as PeerLock, this mechanism of "Hey, I should never see routes that contain Cogent's ASN behind Level 3 peering sessions or vice versa." PeerLock is not a democratic approach to this problem. PeerLock requires that you pick up the phone, that you have social relationships with the people managing those other large networks. So it's not accessible to everybody, and there are 85,000-ish autonomous systems in the global routing system. So we definitely need a solution that does not require those 85,000 organizations emailing each and every one of the other 85,000 organizations to establish what routes are supposed to go where. So to democratize the solution to the problem of route leaks, Alexander Azimov and some others came up with an idea that is called ASPA, Autonomous System Provider Authorizations. What's really cool about ASPA is that it leverages this RPKI database, this distributed database of authorization delegations, and the ASPA technology is such that you can publish in this database who your providers are. And then consumers of that data can compare given BGP updates through the origin validation trick to see if the origin matches up.
And also use that information about the listed providers and compare it to the ASNs that appear in the AS path, and from that deduce whether a route leak is happening or not. Because route leaks are business problems. From a protocol perspective, it is perfectly valid to leak routes. Basically, a full transit service is an authorized, intended leak of the entire routing table. And then the third application that builds on top of the RPKI has to do with the authenticity of the BGP messages. Because even with origin validation and even with [inaudible], the BGP message that you send me is unsigned. It doesn't contain a cryptographic signature. So for all I know, you may be spoofing an AS path, fabricating the information in the AS path such that it passes the origin validation check and the ASPA verification check, but it still is not supposed to be there because you're not who you say you are. And for that, a solution exists called BGPsec. In BGPsec, you stick signatures inside the BGP messages that can be verified using public keys distributed through the RPKI. And to figure out which public key belongs to which AS, again the RPKI system of delegated authorizations is used. So you can never ever publish a public key and associate it with an AS number that does not belong to you. So that is the holy trifecta that we need to really secure the whole global routing system. And I think we're halfway through this.
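The ASPA idea of comparing listed providers against the AS path can be sketched as below. This is a heavily simplified Python illustration of only the "upstream" case (a route received from a customer or peer), loosely following the IETF ASPA verification draft; the downstream case and other details are omitted. The function names, the dictionary layout, and the documentation ASNs are assumptions for the example.

```python
def hop_check(customer, provider, aspa):
    """Classify one hop using the published provider lists.

    aspa maps an ASN to the set of ASNs it has published as its providers.
    """
    if customer not in aspa:
        return "no-attestation"
    return "provider" if provider in aspa[customer] else "not-provider"

def verify_upstream(as_path, aspa):
    """Check a path received from a customer or peer.

    as_path lists ASNs with the neighbor first and the origin last.
    Every hop, walking from the origin toward us, should go from a
    customer up to one of its published providers.
    """
    # Collapse AS path prepending (repeated consecutive ASNs).
    path = [a for i, a in enumerate(as_path) if i == 0 or a != as_path[i - 1]]
    hops = list(reversed(path))  # origin first
    result = "valid"
    for customer, provider in zip(hops, hops[1:]):
        outcome = hop_check(customer, provider, aspa)
        if outcome == "not-provider":
            return "invalid"  # someone passed the route where it should not go
        if outcome == "no-attestation":
            result = "unknown"
    return result

aspa = {
    64520: {64510},         # origin network buys transit from AS 64510
    64510: {64530},         # AS 64510 buys transit from AS 64530
    64500: {64501, 64510},  # AS 64500 is a customer of AS 64501 and AS 64510
}

# AS 64530 receives the origin's route from its customer AS 64510: fine.
clean = verify_upstream([64510, 64520], aspa)

# AS 64500 re-announces a route learned from its provider AS 64510 to its
# other provider AS 64501: AS 64510 never listed AS 64500 as a provider.
leak = verify_upstream([64500, 64510, 64520], aspa)
```

Note that the leaked path would still pass origin validation, since the origin AS is untouched. That is why the provider lists are needed as a second check.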

Doug Madory: Couple things. It's a good summation of, I think, the state or the plan. So you just mentioned BGPsec, so we'll address that one first. I'll just go ahead and say there's been a belief, or pushback, that this is a technology that's too taxing on a router to be able to handle the cryptographic verification for messages as they come in at line speed, and that that is going to be the death of that technology. And so I'm setting you up. So what's the response? Because I know I hear it. I know you do. What's the response for that?

Job Snijders: It is a really good and fair question, because we've not, as of yet, in 2023, seen a lot of BGPsec uptake. So what gives? What's wrong with this technology? So I have a few theories and some positive outlook in this regard. For BGPsec to exist, the RPKI global distributed database of authorizations first had to exist, because you build one application on top of the other thing. And I think the original designers of RPKI technology were super clever, or maybe this just happened by accident, to first focus efforts on origin validation. Because out of all the technological aspects, origin validation, arguably, is the simplest one. You fetch the information from the RPKI, you do the signature verification. But when push comes to shove, what you do on the router is you compare a few integers with each other. You compare the integers that you received out of the RPKI system with some integers that you received out of the BGP update. And getting to that state already was a tremendous effort, because all five RIRs had to create and deploy software that enabled the local internet registries to publish ROAs. Those are used for origin validation. And then ISPs had to start using those ROAs. And that's, I think, a super novel development. It was only at the start of 2020 that the large tier one providers started using RPKI origin validation as part of their defensive posture. So I would say origin validation is super young. We've only been using it for three years now at truly global scale in hundreds of networks. And to get there, that took... I don't know, 10 years?

Doug Madory: I'll add, Job, because I know you're too humble to take any credit, that this was due in no small part to a lot of your advocacy and traveling to many, many NANOGs and APRICOTs, [inaudible], RIPE, giving a lot of talks and making a case, so you deserve some credit on that. But obviously there are a lot of people involved as well. Sure.

Job Snijders: Much appreciated. I take credit for evangelizing the technology, but I didn't invent it. It was there and I was looking at it, I was like, "Holy shit, this addresses a really big problem that we have. I need to tell everybody that there's a solution to this problem."

Doug Madory: So that's ROV and then I think you're bringing us up to the case for BGPsec.

Job Snijders: I think BGPsec in the IETF was a little bit ahead of its time. They had finished the specifications for route origin validation, published RFCs, and thrown it across the wall. Said, "We are done. Onwards to the next problem, path validation." But they had not waited until the world had deployed origin validation before starting the path validation development process. They did things quite quickly after each other. And I don't fault them. But this means that when the BGPsec RFCs got published... I think this was 2017, 2018, the world had not yet even deployed origin validation. That came in 2020. So the BGPsec standard was lying around, the origin validation standard was also lying around in 2019, and neither of them was really deployed at scale. So that, I think, was a big obstacle for BGPsec. Because a few years later, people look at BGPsec and they're like, "Nobody deployed this so it must be trash." But the same applied to origin validation, which lingered around after having been published as an RFC. And it took eight years for it to really see some uptake in the real-world internet.

Doug Madory: So, are you saying that the specification is just too young?

Job Snijders: I think the specification was published in a timeframe where people were not yet receptive to what that specification could mean in real-world operations. And there are a few factors that tie into this. Yes, people have expressed concerns about the computational cost of this BGPsec mechanism, and they're not wrong. But luckily CPUs inside our routers get upgraded over time. Every five, six years, most operators will replace virtually all gear in their deployment. And every time you do a refresh of the equipment that you deployed in the field, you get more RAM, a faster or better disk, and a better CPU. Maybe a CPU with hardware acceleration for cryptographic operations. So as time goes by, the hardware on which we're supposed to run this machinery is getting beefier and beefier, to the point that it's actually feasible to do inline signing and verification of BGP messages. But when the BGPsec specification was being developed and published, I think a lot of people were looking at their currently deployed hardware and were like, "No way this ever is going to fly." Not realizing that maybe five years from now the hardware might be perfectly suitable to do it. So there's a bit of tension between when technologies exist on paper and when we can actually use them in the field on commonly deployed machinery. So that's one aspect. CPUs are getting better. Another cool aspect of BGPsec is that you don't need to do it on each and every BGP session. BGPsec protects the integrity of your BGP session, and it's worth spending resources on the BGP sessions that matter most to you. So if a particular BGP session is revenue generating for you, that's worth protecting. But if the session is, for instance, a gateway of last resort, then maybe it's not worth protecting it with BGPsec, because you literally are sending packets down that path only because there is no other path available, and therefore it doesn't really matter.
To provide some real context with this, in the case of Fastly, our private peering connections are the valuable connections. Those are super high bandwidth, they move lots and lots of bits, but generally speaking, the BGP state on both sides of those high bandwidth connections is pretty small compared to the entire global routing table that we receive from our transit providers. So if there's a private peering between Fastly and some residential internet service provider, back and forth, we might be exchanging a few hundred or only a few thousand routes. And signing and verifying a few hundred or few thousand routes arguably takes way fewer CPU cycles than signing and verifying a million routes that I'm receiving on the transit connections. But the transit connections have a lower local preference because those are the gateway of last resort. That means passing the packets on to a network that acts as an intermediate. And whenever you can cut out the middleman, usually from either a latency or capacity or economic perspective, it's best to create short paths. So I think there's some interesting positive interactions between the economics of how the internet works at large and that people are invested to protect the BGP sessions that are worth most and that generally speaking, those valuable BGP sessions are responsible for the vast, vast majority of internet traffic that is being exchanged and coincidentally represent the least amount of BGP state on both sides of the connection. In other words, yes, BGPsec has computational cost and I think it is feasible that we'll see, in years to come, at global scale that people will opt to protect small BGP sessions that represent large amounts of traffic. And there's the matter of the ecosystem being ready in terms of software capabilities. And in 2018, 2019, I started with an RPKI validator project inside OpenBSD called rpki-client.
And the first thing we developed was the ability to validate ROAs in order to facilitate origin validation. And then later on I added the capability to verify BGPsec router keys. And a few months ago I added support to verify ASPA objects. And as time goes by, these new capabilities have to find their way into all components of these pipelines. Because not only does my validator need to support ASPA or origin validation or BGPsec, but the RIR web interface where you configure ROAs or ASPAs or BGPsec also must support all three. And my router also has to support all three. And that can take years. I think origin validation is pretty new in global deployment. I think we now found most of the bugs in the BGPsec's. ASPA verification is in full development. The OpenBGPD project has been working hard on that the last few months. And the next step is BGPsec. And I think once all three are available to operators, so that they can actually start testing if it works for them, if it gives them benefit maybe in partial deployments, then we have the final verdict on whether a technology was a misfire or super beneficial, but it unfortunately took 10, 15 years for it to get deployed. And finally we've seen the web transform from an HTTP plain text only system to something where close to 100% of HTTP traffic now runs over TLS protected sessions. And I think in the old days people would argue, "Oh, TLS on the web server is super expensive. Doing cryptography with each and every web server client is crazy." But as years went by, people are like, "Oh, it is worth my while to heavily invest in sufficient [inaudible] cycles in order to protect the HTTP connections to my customers because the cost of dealing with network abuse is higher than the cost of just throwing more CPU power at the problem."
And maybe we'll see a similar development in the BGP world where some people realize that to their business, the additional cost of the CPU cycles is worth protecting certain BGP sessions. And as I said, I don't think BGPsec needs to happen on each and every session. I think we'll, especially in the beginning, see that people deploy it in a limited context on private peering sessions or sessions facing their customers. Sessions that represent revenue and therefore it can be justified to throw the extra CPU power at it to make it secure.
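Editor's note: the route origin validation that Job mentions as the first capability of the validator can be illustrated with a small sketch. The following is a simplified reading of the RFC 6811 procedure, not code from rpki-client or OpenBGPD. The names `VRP` and `validate_origin` are hypothetical, and the prefixes and AS numbers come from the ranges reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class VRP:
    """Validated ROA Payload: one (prefix, maxLength, origin AS) tuple."""
    prefix: str
    max_length: int
    asn: int


def validate_origin(route_prefix: str, origin_asn: int, vrps) -> str:
    """Classify a BGP route as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # Only compare prefixes of the same address family.
        if route.version != vrp_net.version:
            continue
        # A VRP "covers" the route if the route lies inside the VRP prefix.
        if not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering VRP "matches" if the origin AS agrees and the route
        # is not more specific than maxLength allows. AS 0 never matches.
        if vrp.asn != 0 and vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by at least one VRP but matched by none: invalid.
    return "invalid" if covered else "not-found"


vrps = [VRP("192.0.2.0/24", 24, 64500)]
print(validate_origin("192.0.2.0/24", 64500, vrps))     # matching origin and length
print(validate_origin("192.0.2.0/25", 64500, vrps))     # more specific than maxLength
print(validate_origin("198.51.100.0/24", 64500, vrps))  # no covering VRP
```

The three calls show the three possible outcomes: a route that matches a ROA, a route that is covered but too specific, and a route for which no ROA exists at all.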

Phil Gervasi: To me, it does feel like RPKI is almost like a foundational stepping stone to much of the more advanced or subsequent mechanisms that we're using. Would you say that's correct?

Job Snijders: Yeah, absolutely. The RPKI really is a general purpose infrastructure to verify whether someone was authorized to do something with an AS number or an IP address. And so it's really important that going forward we do not consider RPKI synonymous to route origin validation. Route origin validation was just the first simplest application we could come up with that had a dependency on the RPKI. So in the realm of new innovation, last year an RFC was published that specifies a thing called RPKI Signed Checklists. And what that allows people to do is to produce a cryptographically verifiable signature over an arbitrary hash that you can verify with the deployed RPKI. So for instance, let's talk about bring your own IP space and cloud providers. If I want the likes of Amazon or Google to originate my prefix, a prefix that was assigned to me, Job Snijders, but I don't want to run my own infrastructure, I want to use their cloud infrastructure and have them originate the prefix so I don't need to invest in running a router. Cool. So I sign up with one of these cloud providers and they say, "Thank you for trying to onboard. Please create a ROA that authorizes our autonomous system to originate your prefix on your behalf." Now, if I create a ROA that authorizes either Google's ASN, 15169, or Amazon's, 16509, at that stage of the onboarding process, Amazon doesn't actually know whether I created the ROA or coincidentally someone else who is also trying to onboard created that ROA. And it could happen that two entities both create an account in the cloud provider's management portal and both claim to own the same prefix. So at that stage, because the ROA exists, Amazon or Google does know they are authorized to originate the prefix, but they don't know which of their customers the prefix actually belongs to. And over the years, a number of workarounds or hacks have become part of these onboarding procedures.
So I believe Google, they would present you with a random string and then ask you to put that random string in the WHOIS record for the prefix and then Google would start scraping that WHOIS entry. And then if they would see that random string pop up, then they knew which customer account the prefix actually belongs to. But unfortunately this is a pollution of the WHOIS because the business between me and Google is solely between me and Google and the WHOIS is a public resource. So it's a bit dirty to put that private business interaction in the public WHOIS. And WHOIS is transported in plain text. It's not a cryptographically secure channel. So there's a lot of friction in those onboarding procedures. And the mechanism that we came up with is that the cloud provider can tell the customer a random string, the customer can produce a tiny file, an RSC file that is a cryptographic signature over that string, and that file can be sent to the cloud provider and then the cloud provider can verify that file against the existing globally deployed RPKI. So if I ask them to originate an IP prefix, I will create a ROA that authorizes the cloud provider's ASN. The cloud provider will tell me a random string, I produce an object or a file using my RPKI keys to sign it, send that to the cloud provider via email or a web form upload. And then the cloud provider can verify against the RPKI database of delegated authorizations whether I indeed possess the private keys associated with a given IP prefix. So once those two steps happened, the cloud provider knows two things. A, they are authorized to originate the prefix. Anybody can see this because it's published in the global RPKI as a ROA. And B, which of their customers the prefix is to be associated with. And with those two components, onboarding can happen in a fully automated way, in a fully secure way. And it happens in a private interaction because that second step, the RSC, is only between me and the cloud provider.
It's not published in the public WHOIS. I don't have to put it on a website. It's a file that exists and only myself and the cloud provider know of the existence of that file and its contents. So this is super, super new. The RFC was published only a few months ago and here again we have to wait for the ecosystem to catch up. The validators need to be extended to support validating such offline signatures. The RIR portals may need to receive an update so that people can generate those signature files. It needs to be embedded in the workflow of these cloud providers to use this as an onboarding mechanism. So to give you another example of the applicability of this, anybody can sign up for free in PeeringDB, either as an IXP or an ISP, and PeeringDB needs to somehow figure out whether you, signing up for the service, are actually a representative of the entity you proclaim to be. So in the case of an ISP, you sign up with the primary unique identifier being your AS number. In the case of an IXP, you sign up with the peering [inaudible] prefix being your unique identifier. And using this RSC technology PeeringDB no longer needs to send an email to the email addresses listed in the WHOIS record, hoping to somehow confirm that somebody was actually signing up to PeeringDB, but instead PeeringDB's onboarding system could present the user who is trying to sign up with this challenge and say, "Hey, if you claim to have some authorization or if you claim to be representative of a given AS, prove it. Sign this thing with a signature that I can associate with the AS number." And then PeeringDB's signup procedures can be fully automated and are cryptographically secured. And this is great news because PeeringDB offers a single sign on service, [inaudible] based, which many organizations and applications have integrated.
So PeeringDB being a trustworthy source of where people can rely on the users being who they say they are or representing who they claim to represent is super beneficial.
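Editor's note: the two-step onboarding check Job describes (A, a ROA authorizes the provider's ASN; B, a signed checklist ties the prefix to one specific customer account) can be sketched as follows. This is only an illustration under stated assumptions. The cryptographic validation of the RSC file against the RPKI is out of scope here and is assumed to have been done already by an RPKI validator; `ValidatedChecklist`, `new_challenge`, `covers` and `can_onboard` are hypothetical names, the ROA check ignores maxLength for brevity, and the prefix and ASN are documentation values.

```python
import hashlib
import ipaddress
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidatedChecklist:
    """Hypothetical summary of an RSC file AFTER a validator has verified
    its signature and certificate chain against the RPKI."""
    resources: tuple    # prefixes the checklist was signed with
    digests: frozenset  # SHA-256 hex digests listed in the checklist


def new_challenge() -> bytes:
    """Random string the provider hands to one specific customer account."""
    return secrets.token_hex(16).encode()


def covers(resources, prefix: str) -> bool:
    """True if any listed resource contains the given prefix."""
    target = ipaddress.ip_network(prefix)
    for resource in resources:
        net = ipaddress.ip_network(resource)
        if net.version == target.version and target.subnet_of(net):
            return True
    return False


def can_onboard(prefix, provider_asn, roas, challenge, checklist) -> bool:
    # Step A: a public ROA authorizes the provider's ASN for the prefix.
    # roas is an iterable of (prefix, asn) pairs; maxLength is ignored here.
    roa_ok = any(asn == provider_asn and covers((roa_prefix,), prefix)
                 for roa_prefix, asn in roas)
    # Step B: the holder of the prefix signed THIS account's challenge.
    digest = hashlib.sha256(challenge).hexdigest()
    rsc_ok = covers(checklist.resources, prefix) and digest in checklist.digests
    return roa_ok and rsc_ok


# Two accounts claim the same prefix; each receives its own challenge.
alice, mallory = new_challenge(), new_challenge()
roas = [("192.0.2.0/24", 64500)]
signed = ValidatedChecklist(("192.0.2.0/24",),
                            frozenset({hashlib.sha256(alice).hexdigest()}))
print(can_onboard("192.0.2.0/24", 64500, roas, alice, signed))    # True
print(can_onboard("192.0.2.0/24", 64500, roas, mallory, signed))  # False
```

Because each account gets a different random challenge, the ROA alone (visible to everyone) is not enough: only the account whose challenge appears in a checklist signed with the prefix holder's keys passes both steps.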

Phil Gervasi: So Job, I want to interrupt. You were involved with an ASPA deployment very recently, is that correct?

Job Snijders: Absolutely. I'm super proud of this.

Phil Gervasi: Yeah. Why don't you tell us that?

Job Snijders: Just last week at Calgary Internet Exchange in Alberta in Canada, we deployed the world's first ASPA filtering route servers. So ASPA is a super new technology, it's still in Internet-Draft status at the IETF, but we're now somewhat close to the publication phase and that means that people got to pony up running code, people have to do their final review. Is the specification working as intended? And with the production of running code, you can only be so sure after you start using it yourself. It's like eat your own dog food. You got to walk the walk. So over the last few months in the OpenBSD project, with the support of the Route Server Support Foundation, we've been super busy developing ASPA support in both the validator and the BGP daemon, OpenBGPD. And this development effort is very important from my perspective for the IETF standardization process. Because if you end up with specifications that look great on paper that are super hard to implement in real programs, the specification is not going to have a good time and you may need to redo the specification, which is a very time costly exercise. So in the IETF, in some working groups, IDR, which governs BGP, SIDROPS, which governs all things RPKI, there's an expectation that before things are published as RFC, before they become these super formal documents, people can demonstrate that they actually wrote this in software and that the specification and the behavior of the software are aligned with each other. So OpenBSD being a project originating in Calgary, Canada, it is very fitting that we took the bites and are the first to try and use ASPA verification in a real world, real internet deployment. And a funny, amazing thing to me is as a small open source project, it is pretty easy to get this far ahead and embrace new technologies and be the bleeding edge.
But I think it might be two, maybe three years before you'll see ASPA support in commercial off the shelf vendors like Arista or Cisco or Juniper. And this is perfectly reasonable and easy to explain because being OpenBGPD, we just provide documentation in English. Being a global commercial off the shelf supplier, you got to translate your manuals into all these languages. You got to train the support staff all around the world about the new technology. And only after you've done all that work... Oh, and you've got to code the ASPA support itself, of course. And only after you've done all of that, you can release it to the world. So yeah, I'm happy we are the world's first and I think for two years we will not see that much traction in the commercial world. But I do believe that initiatives like this help pave the path for the commercial providers because now they have an open source implementation to compare their own implementation against. So yeah, it's world's first, but it's a bit lonely at the moment. And I hope that in the years to come, many vendors and many other internet exchanges and network providers will use ASPA to verify BGP announcements.
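Editor's note: to give a feel for what ASPA verification checks, here is a heavily simplified sketch of the "upstream" path check from the ASPA verification Internet-Draft, the kind of check that applies to routes received from customers, lateral peers or route server clients. It is not the OpenBGPD implementation, and the draft contains further rules (for example for routes received from providers, and for AS_SETs) that are omitted. The names `hop_check` and `verify_upstream` and the AS numbers are hypothetical.

```python
def hop_check(customer: int, provider: int, aspa: dict) -> str:
    """Does `customer` attest that `provider` is one of its providers?

    aspa maps a customer AS to the set of provider ASes it has authorized.
    """
    providers = aspa.get(customer)
    if providers is None:
        return "no-attestation"
    return "provider" if provider in providers else "not-provider"


def verify_upstream(as_path, aspa: dict) -> str:
    """as_path is ordered as in BGP: neighbor AS first, origin AS last."""
    # Collapse AS path prepending (consecutive repeats of the same AS).
    path = [asn for i, asn in enumerate(as_path)
            if i == 0 or asn != as_path[i - 1]]
    # Walk from the origin toward the receiver; in a legitimate upstream
    # path every hop goes from a customer to one of its providers.
    hops = list(reversed(path))
    result = "valid"
    for customer, provider in zip(hops, hops[1:]):
        outcome = hop_check(customer, provider, aspa)
        if outcome == "not-provider":
            return "invalid"    # attested customer, unauthorized provider
        if outcome == "no-attestation":
            result = "unknown"  # cannot prove or disprove this hop
    return result


aspa = {64500: {64501}, 64501: {64502}}
print(verify_upstream([64502, 64501, 64500], aspa))  # every hop is attested
print(verify_upstream([64510, 64501, 64500], aspa))  # 64510 is not a provider of 64501
print(verify_upstream([64502, 64505], aspa))         # 64505 published no ASPA
```

The design point worth noting is the three-way outcome: a route is only rejected when an attestation exists and is contradicted, so partial deployment does not break routes from networks that have not published ASPA objects yet.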

Phil Gervasi: And ultimately we've seen progress over the last few decades, though we almost talk about these... We've been focused on these three remediations and now ASPA and RSC as well. So yeah, I'm looking forward to seeing this continue to develop and keeping an eye on the contributions that you're making into the industry, into the community. So it's very much appreciated. We're at time now, so Job and Doug, this discussion has been excellent, which is exactly what I expected. Very eye-opening as well in learning about some of the technologies that are being developed literally like last month. And especially considering that we're talking about global internet routing, something that affects most of the planet. So thank you for joining today and especially thank you to you Job, our special guest, for sharing your knowledge, your vision, and what you're working on right now with us. So for comments, questions, to learn more, Job, how can folks in our audience find you online?

Job Snijders: You can either email me, job@fastly.com, or find me on Mastodon, bsd.network/@job, or on Twitter, twitter.com/jobsnijders or... I don't know. You'll find a way. And I'm happy to take questions or if people are curious about something I mentioned in this podcast, feel free to ask me questions. I'm happy to help folks along and try to move the needle forward a little bit.

Phil Gervasi: Great, thank you for that. And Doug, Kentik's resident director of internet analysis, how can folks find you online?

Doug Madory: Let's see. So I am on Twitter, I am newly on Mastodon. In each place, my handle's Doug Madory. I haven't come up with any kind of creative, cute name. And then I'm also on LinkedIn.

Phil Gervasi: Great. And you can find me on Twitter @network_phil. Still very active there. You can search my name on LinkedIn. And if you'd like to be a guest on Telemetry Now or if you have an idea for an episode that you'd like to share, please feel free to reach out to us at telemetrynow@kentik.com. And you can follow Telemetry Now on Twitter and LinkedIn as well. So for now, thanks for listening. Bye-bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS?

Well, you're in the right place! Telemetry Now is the podcast for you!

Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.
