Kentik - Network Observability
Telemetry Now  |  Season 1 - Episode 12  |  April 18, 2023

Securing Global Routing with RPKI and BGP Security


Historically, the internet has operated on a sort of trust relationship: trust among those advertising their own networks out to the world, and trust from those learning about those networks. But there hasn’t been much to prevent anyone from advertising incorrect information and thereby manipulating where others send their traffic, sometimes to the extent that traffic is blackholed, causing outages. In this episode of Telemetry Now, Job Snijders, Principal Engineer at Fastly and a prolific contributor to efforts securing global routing, joins us to talk about how we can secure global internet routing with technologies such as ASPA, RPKI validation, Peerlock, and, to an extent, BGPsec.

Transcript

Historically, the internet has operated on sort of a trust relationship. Trust among those advertising their own networks out to the world and learning how to reach remote networks.

But there really hasn't been that much to prevent anyone from advertising incorrect information out to the rest of the world, and thereby manipulating where others send their traffic, sometimes to the extent that traffic might even be blackholed, causing major outages.

This problem isn't really new at all, whether that's incompetence, bad engineering, or nefarious activity, and efforts have been underway for quite some time to address this.

So with us today is Job Snijders, a subject matter expert on internet security. And frankly, that's very much an understatement. Job is a principal engineer at Fastly, co-chair of the IETF GROW working group, co-chair of the RIPE Routing Working Group, Vice President of PeeringDB, Director of the Route Server Support Foundation, volunteer for the IRRd v4 project, developer for the OpenBSD project, and I'm sure he has several other roles that I've missed in this extensive list as well. So suffice it to say that Job is a prolific contributor to the global routing community, and I'm very excited to have him with us today to talk about what's really wrong with routing security and what remediations are out there and are being developed to solve this problem.

I'm Phil Gervasi, and you're listening to Telemetry Now. Let's get started.

Job, thank you for joining us today. It really is a pleasure to have you.

Before we get into this very deep and extensive topic, I would like our audience to get a little bit of background on your work and technical experience, and also on how you've contributed to the community over the past few years, because just in the introduction there, that list of what you're working on is pretty extensive.

Thank you for having me. It's always a joy to be in a conversation with Doug and yourself.

You're asking how I got where I am.

It's a complicated question. How do any of us get to where we are, right?

I think if I go back to my twenties, I started as a system administrator and somehow noticed that the systems would sometimes be unavailable, and this was due to the network.

So, being the problem chaser that I am, I ended up learning more about how networks work and got a job as a network engineer.

And then I noticed how tedious it all was to type things into routers, and how many mistakes we make on the keyboard. I myself have misoriginated prefixes because, you know, the digits on the keyboard are so close to each other.

So from there, I kind of rolled into network automation and started programming systems that control the network.

And from there, at some point, I latched on to NTT.

They were a provider of the company I was working for before I joined NTT, and I always very much liked working with the NTT people.

So I was there for a few years. And at NTT, I spent a lot of my time trying to improve routing security.

So I took a look at all the BGP decision-making processes that NTT applied to their routing system.

I introduced RPKI origin validation and managed a full rewrite of the Internet Routing Registry daemon, IRRd, that they use to generate network configurations.

And then I joined Fastly because there was a lot of cool stuff to do over there in the realm of routing security.

And so that's, I guess, a bit of a resume overview.

But what I recognize in some parts of my career is that I'll bump into some kind of issue and then try to find the root cause. And sometimes the root cause is stuff like the IETF specifications not being complete, or containing some kind of annoyance or mistake, and then I go up there and even fix things at that level.

So, like, years ago I noticed that more and more people were getting assigned four-byte ASNs.

And the whole world was using BGP communities, which are a thirty-two-bit value: sixteen bits of that are your ASN, and the remaining sixteen bits are an arbitrary value that you can set yourself. So, obviously, you cannot fit thirty-two bits' worth of information into a sixteen-bit field, and I embarked on a project, together with many others, to introduce BGP large communities.
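The size mismatch described here can be made concrete with a minimal Python sketch. It assumes the classic layout just described (16-bit ASN plus 16-bit value) and the large-community layout of three 32-bit fields; the function names are hypothetical and the ASNs are taken from the ranges reserved for documentation.

```python
import struct

def encode_classic_community(asn: int, value: int) -> bytes:
    """Pack a classic BGP community: 16-bit ASN followed by a 16-bit value."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError(f"AS{asn} does not fit in the 16-bit ASN field")
    if not 0 <= value <= 0xFFFF:
        raise ValueError("the operator-defined value must fit in 16 bits")
    return struct.pack("!HH", asn, value)

def encode_large_community(asn: int, data1: int, data2: int) -> bytes:
    """Pack a large community: three 32-bit fields (ASN, data1, data2)."""
    for field in (asn, data1, data2):
        if not 0 <= field <= 0xFFFFFFFF:
            raise ValueError("each field must fit in 32 bits")
    return struct.pack("!III", asn, data1, data2)

# AS64496 is a two-byte documentation ASN, AS65536 a four-byte one.
classic = encode_classic_community(64496, 100)    # 4 bytes on the wire
large = encode_large_community(65536, 1, 100)     # 12 bytes on the wire
```

Calling `encode_classic_community(65536, 100)` raises `ValueError`, which is the problem four-byte ASN holders ran into.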

So, yeah, that's some of what I do. I look at the ecosystem, I try to identify gaps or shortcomings, be it in software or in the specifications themselves, and then I try to fix that in a way that is beneficial to everyone.

Because ultimately, my employers benefit from a well-working internet. And, you know, if that means that we have to boil the ocean to make it work better, then I will go boil that ocean and make it work better.

Well, I certainly appreciate your effort in the community because, well, I use the internet.

And from what I hear, it's kind of a trend now. A lot of people are into this internet thing. No, but seriously, the level of mission criticality of internet connectivity, on the local and on the global level, is such that this topic, I think, is very, very relevant.

And then the nature of how the internet is built, by and large on these trust relationships, just makes this so much more poignant, especially as we are looking at volatility among nation states, and wars and things like that occurring.

I do take personal offense to one of your statements, where you said that you found that it was often the network's problem or the network's fault.

It's usually DNS is my experience. Or or the or the developers just, you know, writing their applications poorly. No.

Or expired certificates.

Now, you did mention that Doug is with us today. So, Doug, I would love to give you an opportunity to introduce yourself as well. I think you're no stranger to our audience, but if you wouldn't mind, maybe give us a little background on your relationship working with Job.

I see.

So we have some common interests. Clearly, Job is very interested in BGP and secure routing. That's been my focus for the last twelve-plus years, but in a totally different capacity. I do a lot of analysis, trying to understand things that took place, and there's a little bit of storytelling there. There'll be an outage, or a hijack or something, and we're trying to understand what happened here and there.

So I'm a bit more oriented towards just trying to explore and understand real events as they're happening.

But that runs into a lot of the stuff that Job's working on. And so, I don't know, a number of years ago, I think, Job, you had written a blog post.

And then I thought I saw something that was inconsistent with that. There was a routing leak.

You had made, I think, a claim about NTT. And I was like, well, I think this doesn't make sense with what you said, and you explained that there was some nuance there. And, you know, it did make sense.

And then it kind of started a conversation that's never stopped, where I'll find something interesting that I think you might want to be aware of, and maybe you've got some insight or maybe you don't, but it usually makes for a good conversation.

Because you're not spending your day looking at events around the world. I am, and so I can bring these to you. And then there's a synergy between those two perspectives. And I feel like we've had a few times where we've been able to get something mutually beneficial out of it. And then, you know, with all this work you're talking about: we talk about the internet as this abstract thing, but this is many companies and millions of people. There are a lot of positive downstream impacts that I think we abstract away when we talk in some of these terms.

But there's been a lot of benefit to a lot of people from a lot of the work that you've done, Job. So I really value our relationship and our conversations. And anytime there's an opportunity to collaborate,

I jump on it.

Doug, I think it is phenomenal that you, throughout your career, have maintained access to vast data sets that either offer insight into what transpired or can help confirm suspicions about, hey, in theory this and this behavior might exist, but are you actually seeing it in the wild? So, yeah, it's a lot of fun bouncing ideas back and forth with you.

So, Job, I'm gonna direct this question at you.

It's kind of broad, and maybe it would take multiple podcasts to answer. But what's wrong with internet security?

Fundamentally, I think the biggest issue is that we are used to internet routing as a plain text messaging system.

So I will pass on to you a message that you can reach a certain network via me and, vice versa, you will tell me, hey, you can reach the Kentik network via me.

And that those messages are plain text means there's no signature. There's no cryptographic way for me to verify that you were authorized to send me that routing message.

And I think a lot of the issues that we see stem from there.

We have difficulty understanding the authenticity of routing messages. So over the years, many mitigations and remediations have been created to limit the risk of this unsigned, plain-text messaging back and forth between ISPs.

So, for instance, providers would ask their customers: hey, you want to purchase a transit circuit from me, can you tell us the list of prefixes you intend to announce? And then the customer might fill in on the service order form a list of ten prefixes that they intend to announce, or maybe they store that information in the IRR.

Or maybe elsewhere.

And then, based on that information, the transit provider creates a filter to only permit those prefixes that were previously agreed upon, to mitigate the risk of the customer accidentally or maliciously announcing prefixes they should not be announcing.
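As an illustration of the kind of filter described above, here is a minimal Python sketch. It assumes an exact-match policy (real deployments often also allow some more-specific prefixes), the function names are hypothetical, and the prefixes come from the ranges reserved for documentation.

```python
import ipaddress

def build_prefix_filter(declared_prefixes):
    """Turn the customer's declared prefixes into a set of allowed networks."""
    return frozenset(ipaddress.ip_network(p) for p in declared_prefixes)

def accept_announcement(prefix, allowed):
    """Permit an announcement only if it exactly matches a declared prefix."""
    return ipaddress.ip_network(prefix) in allowed

# Prefixes the customer listed on the order form (or registered in the IRR).
allowed = build_prefix_filter(["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"])

accept_announcement("192.0.2.0/24", allowed)    # declared, so permitted
accept_announcement("203.0.113.0/24", allowed)  # never declared, so rejected
accept_announcement("192.0.2.0/25", allowed)    # more specific than declared, so rejected
```

Note that nothing in this check is signed: the filter is only as good as the list it was built from.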

But as the internet grew, doing this manually, through service order forms that you fax to each other, or letters of agency, as they are sometimes called,

LOAs,

this scales poorly, especially in the wholesale market.

So fully automated systems like the Internet Routing Registry became commonplace to generate filters.

But the Internet Routing Registry really is a garbage-in, garbage-out system. And, yes, it is automatable, but again, there are no cryptographic signatures on any of that information.

It's transported in plain text. And that means that you're receiving unsigned messages from the customer, that is, the BGP messages, and you're comparing those unsigned BGP messages to unsigned information from the Internet Routing Registry, and then you try to arrive at some kind of conclusion whether you should accept the routes or not.

And yeah, this is a system that's been live for decades. It has grown organically.

As we go, we learn and adapt and develop new technologies.

Deploying new technologies easily takes up to a decade.

So it is no surprise that we are in the situation that we are in, but luckily, with the advent of RPKI, I believe we are finally seeing some traction in the internet routing industry to really improve both the safety and security of the routing system.

Yeah. I do have two questions, though, based on what you said. The first is: I've configured BGP many times on the customer side, peering with my provider, and, you know, you create whatever prefix lists and filters and policies. And that's fine. But from what you explained, is it more of a problem between customers and their providers right at the edge, or among transit providers? Or both?

Absolutely both. Yeah. If I look at the last handful of years, it really took a lot of effort in the entire industry to move the mindset from "we must accept as many prefixes as possible, because that means we have a full routing table" to sort of the opposite mindset, where people say "we must reject all suspicious prefixes, even if that means that our routing table becomes smaller."

So it's only been five, six years that filtering on internet exchange route servers became commonplace, because previously internet exchange operators would say: it's not our role to do filtering. We are a neutral entity, and we just take those messages, unsigned messages, and we pass them around, and that's our job. And the more messages we pass around, the better of a job we are doing.

But then the customers of the internet exchanges had to teach the internet exchange operators: it's very nice that you want to be neutral,

but I cannot possibly fit good enough filters on my router, because there are so many BGP sessions behind your route server and so many prefixes coming in that it is impossible for me to do correct filtering after the aggregation point that your route server presents.

And then slowly internet exchange operators began to see value in, like, oh, the more trustworthy my route server is, the more trustworthy or valuable my service offering to the customer is. So this mindset shift is pretty recent. And, yeah, previously some operators would say: hey, if we start filtering, we might lose visibility on, say, five percent of the routes passing through the route server.

And I would always be on the barricades arguing: but those five percent, that's bad information. Don't propagate it.

And I guess there is some analogy to spam filtering in the really old days, where originally you tried to deliver every possible email to everybody. And at some point, people were like: it's very nice of you to try to deliver all these email messages, but, like, fifty percent of it is junk, and I don't want it in my mailbox, and it's your job to do the filtering.

And yeah, I think BGP routing went through a sort of similar transformation, where people began to realize that the goal is not to pass around as much information as possible, including route leaks. The goal is to create a stable system where wrong information does not propagate through the system.

Because wrong information invariably means latency increases or unexpected traffic shifts or, even worse, traffic drops, where traffic no longer arrives at the intended destination.

So I think, as the years went by, the industry really started to understand that these unsigned BGP messages better be of good quality. Otherwise, we're in trouble.

So, Job, you know, I started in this space in two thousand and nine with Renesys. And I feel like it was a good time to have started because, at that point, like you said, and you're using the example of IXPs, but the same goes for transit providers, the state of routing hygiene has improved, let's put it that way. But for those first years of digging through events, there were some wildly terrible things happening. And I know you've got an outlook on routing security, and I have one that I think probably has some similar themes. The one I've had in presentations in the past year is: we still have a lot to do. There have been these cryptocurrency hijacks and stuff. Stuff is still happening.

Having said that, we do have to take a moment and appreciate how far we've come, like your example of the IXPs using route servers to filter.

I think that's just one of many examples that all started to come around, you know, in the last maybe five or six years, like the timeline you mentioned. I call them, like, bonehead errors. We used to have this spectrum where, you know, one end is just some, you know, Telekom Malaysia originating the whole internet or something. And then you've got the other end, the determined adversary, these folks that are, you know, going after cryptocurrency and doing really sophisticated stuff. And we'll get to that. We would like to raise the cost for those people on that end. But at least you would hope we could eliminate the bonehead end of that spectrum. And I feel like we've moved the needle. You know, when's the last massive origination leak that disrupted the internet? It's been a while.

And I think there are a lot of different people doing a lot of different things that make that a reality, but there's been some improvement, before we throw up our hands on all this.

That does lead me to my second question, though. Job, you were talking about, it sounds like, pretty much origination: verifying the authenticity of prefixes that you're ingesting. And therefore, are you permitted, are you allowed to be advertising these prefixes?

Kind of checking that and verifying that. Is that the only kind of problem here? Because I know that RPKI addresses that, but there are some other solutions out there that address different problems that we're seeing.

Yeah. Yeah. With all of this, you stack multiple practices or multiple technologies on top of each other to arrive at a, let's call it, stable, safe state of the routing system.

So if we look at, like, the last few years: internet exchanges fully embracing route filtering, and now using it as a unique selling point to attract new business. That's a fantastic development. Internet exchange route servers, by their very nature, don't have global visibility. An internet exchange route server only concerns the peers adjacent to that particular route server.

So the visibility into incidents at that level is very different than, say, something leaking through a global transit provider like Telia or NTT or Level 3.

So, internet exchange route servers: check.

But another development was the popularization of a concept called Peerlock.

For many, many years, a few internet providers of substantial scale had arrived at the conclusion that if you're, say, NTT, Cogent, Level 3, you should never ever see Level 3 routes via Cogent in the NTT network, or any other permutation thereof.

And so it's not just about authorizing what is expected in the global routing system, but also discussing amongst providers what is absolutely never expected to show up in the global routing system.

So I think Jared Mauch had a route leak detector that used a few permutations of regular expressions to figure out if three or four so-called transit-free networks would appear in the same AS path. And if that was the case, then that definitely was a route leak, because the transit-free networks are not supposed to provide transit for each other. And I took that idea. I was like, alright.

So we have monitoring. Cool. And we get alerts. Nice. We're aware of these outages, but how about we try and solve them?
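The monitoring idea can be sketched in a few lines of Python. This is not the detector mentioned above: it swaps the regular expressions for a simple set-membership count, the function name is hypothetical, and the three ASNs (Cogent, NTT, Level 3) are only an illustrative subset of the transit-free networks.

```python
# Illustrative subset: AS174 (Cogent), AS2914 (NTT), AS3356 (Level 3).
TRANSIT_FREE = frozenset({174, 2914, 3356})

def looks_like_route_leak(as_path, transit_free=TRANSIT_FREE, threshold=3):
    """Flag a path in which `threshold` or more distinct transit-free ASNs appear.

    Counting distinct ASNs means AS-path prepending does not inflate the count.
    """
    seen = {asn for asn in as_path if asn in transit_free}
    return len(seen) >= threshold

# Three transit-free networks in one path: one of them is carrying
# another's routes to the third, which should never happen.
looks_like_route_leak([64496, 2914, 174, 3356, 64500])

# Two transit-free networks peering with each other is normal.
looks_like_route_leak([2914, 2914, 174, 64500])
```

A check like this only raises an alert after the fact; it does not stop the leaked route from being accepted.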

So during my NTT tenure, I talked to all those partners and I was like: hey, can you implement filters to prevent accepting routes that contain NTT's AS number anywhere in the AS path on all sessions except the ones with NTT themselves?
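That request translates almost directly into a filter. The following Python sketch is one possible reading of it, applied to a small set of protected ASNs; the function name and the set contents are assumptions for illustration, and a real deployment would express this in router policy rather than Python.

```python
# Hypothetical set of networks that asked for Peerlock protection.
PROTECTED_ASNS = frozenset({174, 2914, 3356})

def peerlock_accept(as_path, session_peer_asn, protected=PROTECTED_ASNS):
    """Reject a route if a protected ASN appears anywhere in the AS path,
    unless the BGP session is with that protected network itself."""
    for asn in as_path:
        if asn in protected and asn != session_peer_asn:
            return False
    return True

# AS64496 is a hypothetical peer that has no business carrying NTT routes.
peerlock_accept([64496, 2914, 64500], session_peer_asn=64496)  # rejected

# The same routes are fine when they arrive on the session with NTT.
peerlock_accept([2914, 64500], session_peer_asn=2914)          # accepted
```

Unlike the monitoring approach, this runs at import time, so the leaked route never enters the routing table.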

And I think this also goes back to sort of a mindset shift that had to happen.

Previously, sometimes people would say: well, if I properly announce routes to you, but you in turn leak them, that's not my problem.

It's your responsibility to not leak the routes that you receive from me. Nevertheless, I still suffer from that leak happening, even though it's not my equipment or my configuration that allowed the leak to spill.

So taking preventive measures that take effect outside the immediate administrative domain of the internet service provider themselves was a huge step forward.

And I think nowadays, amongst, like, the top ten providers in the global transit market, you'll see a lot of Peerlock configurations, and this has a tremendous effect on the number of route leaks that we see nowadays compared to, say, five or ten years ago.

Tremendous effect, meaning... Yes.

Yeah. Yeah. A positive effect.

Yeah. I would agree with the positive effect. What it ends up doing is just suppressing it down. So if there is some sort of leak, it just can't go through what I would term the top of the internet, where it gets really widely propagated, widely circulated.

That's not possible. And so then, you know, these leaks end up just being localized. And in this space, that's kind of the best you can hope for. We can't prevent all problems, but as long as they just stay localized, then others don't get harmed by them.

Well, there is still an underlying trust relationship, both at the edge and with transit providers. So as much as we've mentioned Peerlock and RPKI, there are some technical remediations that can be done, but a lot of it is administrative, or some kind of a service layer on top that prevents, like you said, something from propagating throughout the rest of the world. And I want to get deeper into defining RPKI and Peerlock for our audience: what those things are and what specifically they solve.

But you've mentioned two different problems here, the authentication of the origin of a prefix that I'm seeing, but also path validation. Those are two kind of separate things. Correct?

Actually, there are three separate problems. Okay.

Very good. Well, that's why you're here, to educate.

Sure.

So what most people nowadays refer to as RPKI actually is RPKI route origin validation.

The RPKI is a globally distributed database whose integrity is protected with signatures.

And the RPKI is sort of a foundation on top of which we can build multiple applications that each somehow leverage or benefit from the RPKI's cryptographic properties.

So the RPKI, just that word by itself, should be viewed as a database of delegated authorizations.

Each RIR maintains, I don't know, like twenty percent of internet resources, and their job is to ensure that those resources, be it IP address prefixes or AS numbers, autonomous system numbers, are delegated to ISPs, who in turn may further delegate those to their customers.

So with the RPKI, we have a system where we can figure out who is authorized to do things, where I'm purposefully leaving "things" vague for now,

with which internet number resources.

Now, fast forward a little bit. Then there is RPKI route origin validation. And that is the first application built on top of this RPKI foundation.

And the mechanism of route origin validation is as follows.

I can publish in the RPKI that a given IP prefix may be originated by a given AS number.

And then consumers of that information can use it to compare the BGP updates that they receive to that cryptographically verifiable information stored in the RPKI.

So if your prefix is ten slash eight, and you announce ten slash eight in BGP towards me, originating from AS

sixty-five thousand one hundred twenty-three, which is a private AS number,

then I can check whether the information in the BGP update matches the information that I learned through the RPKI.

And if there's a mismatch, then I know that your BGP announcement has an issue, and therefore I should reject or ignore your BGP announcement. And in doing so, I ensure that the safety of my routing system is maintained.

So with route origin validation, you take untrusted input, BGP updates, and you compare that to an out-of-band, distributed, cryptographically verifiable database.
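The comparison described here can be sketched in Python using the example from the conversation (ten slash eight, AS65123). The sketch assumes the signature checks on the RPKI data have already been done, so the router-side logic only sees a list of validated (prefix, maximum length, ASN) entries; the `VRP` type and function name are hypothetical, and the three outcomes follow the usual valid / invalid / not-found convention.

```python
import ipaddress
from typing import NamedTuple

class VRP(NamedTuple):
    """One validated entry derived from the RPKI (signatures already checked)."""
    prefix: str
    max_length: int
    asn: int

def validate_origin(route_prefix, origin_asn, vrps):
    """Compare a BGP announcement against the validated RPKI entries."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # Skip entries of the other address family or that do not cover the route.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        if vrp.asn == origin_asn and route.prefixlen <= vrp.max_length:
            return "valid"
    # Covered by an entry but nothing matched: the announcement is suspicious.
    return "invalid" if covered else "not-found"

vrps = [VRP("10.0.0.0/8", 8, 65123)]
validate_origin("10.0.0.0/8", 65123, vrps)    # matches the published entry
validate_origin("10.0.0.0/8", 65124, vrps)    # wrong origin AS
validate_origin("10.1.0.0/16", 65123, vrps)   # more specific than allowed
validate_origin("192.0.2.0/24", 65123, vrps)  # nothing published for this prefix
```

The router-side work really is just comparing a few integers: prefix bits, prefix length, and the origin AS number.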

But that doesn't solve all issues, because you may spoof the origin in your BGP message, or there might be something like a route leak, in which you are redistributing parts of, or the entire, routing table to me. You're not supposed to do that, but nevertheless you are doing that for some reason. It could be a misconfiguration, it could be a software bug in your router. And then if I apply origin validation, a lot of those announcements will look squeaky clean, because when you're doing a route leak, you're not modifying the origin, you're just passing on these messages, even though you didn't intend to pass on those messages.

So for that problem, the problem of route leaks, what started out was Peerlock: this mechanism of, hey, I should never see routes that contain Cogent's ASN behind Level 3 peering sessions, or vice versa.

Peerlock is not a democratic approach to this problem. Peerlock requires that you pick up the phone, that you have social relationships with the people managing those other large networks.

So it's not accessible to everybody.

And there are eighty five thousand ish autonomous systems in the global routing system.

So we definitely need a solution that does not require those eighty five thousand organizations emailing each and every one of the eighty five thousand organizations to establish what routes are supposed to to go where.

So to democratize the solution to the problem of route leaks, Alexander Azimov and some others came up with an idea that is called ASPA,

Autonomous System Provider Authorization.

And what's really cool about ASPA is that it leverages this RPKI database, this distributed database of authorization delegations. The ASPA technology is such that you can publish in this database who your providers are. And then consumers of that data can compare a given BGP update, through the origin validation trick, to see if the origin is matching up, and also use that list of providers and compare it to the ASNs that appear in the AS path, and from that deduce whether a route leak is happening or not.

Because route leaks are business problems. From a protocol perspective, it's perfectly valid to leak routes. I mean, basically, a full transit service is an authorized, intended leak of the entire routing table.
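To give a feel for how a published provider list can expose a leak, here is a deliberately simplified Python sketch. It is not the full ASPA verification procedure: it only handles a route learned from a customer, where every hop from the origin toward the receiver should go from a customer up to one of its published providers. The function name, the dictionary layout, and the ASNs are assumptions for illustration.

```python
def verify_customer_path(as_path, aspa):
    """Check a path received from a customer against published provider lists.

    as_path: list of ASNs, neighbor first, origin last (as seen in BGP).
    aspa:    dict mapping a customer ASN to the set of its authorized providers.
    """
    # Collapse prepending, then walk the path starting at the origin.
    hops = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    hops.reverse()
    result = "valid"
    for customer, next_as in zip(hops, hops[1:]):
        providers = aspa.get(customer)
        if providers is None:
            result = "unknown"      # this AS has published nothing yet
        elif next_as not in providers:
            return "invalid"        # next_as is not an authorized provider: leak
    return result

# Hypothetical data: AS64500 buys transit from AS64501, which buys from AS64502.
aspa = {64500: {64501}, 64501: {64502}}

verify_customer_path([64502, 64501, 64500], aspa)  # every hop is customer to provider
verify_customer_path([64503, 64500], aspa)         # AS64503 is not a provider of AS64500
verify_customer_path([64502, 64510], aspa)         # AS64510 published no provider list
```

The "unknown" outcome matters during incremental deployment: a path through networks that have not published anything cannot be proven to be a leak.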

And then the third application that builds on top of the RPKI has to do with the authenticity of the BGP messages.

Because even with origin validation, and even with ASPA, the BGP message that you sent me is unsigned. It doesn't contain a cryptographic signature. So for all I know, you may be spoofing an AS path: you're fabricating the information in the AS path such that it complies with the origin validation check, such that it complies with the ASPA verification check, but it still is not supposed to be there, because you're not who you say you are. And for that, a solution exists called BGPsec.

And in BGPsec, you stick signatures inside the BGP messages that can be verified using public keys distributed through the RPKI.

And to figure out which public key belongs to which AS, again, the RPKI system of delegated authorizations is used.

So you can never ever publish a public key and associate it with an AS number that does not belong to you. So that is the holy trifecta that we need to really secure the whole global routing system. And I think we're like halfway through this.
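The key-to-AS binding can be illustrated with a small Python sketch. It covers only the lookup step: finding the public key that the RPKI associates with a claimed AS number. The actual signature verification is left out because it needs an ECDSA library outside the standard library. The function names and the table layout are hypothetical, and computing the key identifier as a SHA-1 hash over the raw key bytes is a simplification.

```python
import hashlib

def subject_key_identifier(public_key: bytes) -> bytes:
    """Derive a short identifier for a public key (simplified: SHA-1 of the key bytes)."""
    return hashlib.sha1(public_key).digest()

def build_router_key_table(router_keys):
    """Index validated router keys by (ASN, key identifier).

    router_keys: iterable of (asn, public_key_bytes) pairs taken from RPKI data
    whose certificate chain has already been verified.
    """
    return {(asn, subject_key_identifier(key)): key for asn, key in router_keys}

def find_signing_key(table, claimed_asn, ski):
    """Return the public key authorized for this AS, or None if there is none."""
    return table.get((claimed_asn, ski))

table = build_router_key_table([(64500, b"example-public-key-A")])
ski = subject_key_identifier(b"example-public-key-A")

find_signing_key(table, 64500, ski)  # the key AS64500 is authorized to sign with
find_signing_key(table, 64501, ski)  # None: the key is not bound to AS64501
```

Because the table is built only from validated RPKI data, a key can never be looked up under an AS number whose holder did not authorize it.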

A couple of things. It's a good summation of, I think, the state, or, you know, the plan.

I guess, you know, well, you just mentioned BGPsec. So let's take that one first.

So I'll just go ahead and say there's been pushback on that: that this is a technology that's too taxing on a router to be able to handle the cryptographic verification of messages as they come in at line speed, and that that is going to be the death of that technology. And so, setting you up: what's the response?

Because I know I hear it. I know you do.

What's the response for that?

It is a really good and fair question.

Because we've not, as of yet, twenty twenty-three, seen a lot of BGPsec uptake. So what gives? What's wrong with this technology?

So I have a few theories and some positive outlook in this regard.

For BGPsec to exist, the RPKI global distributed database of authorizations first had to exist, because you build one application on top of the other thing.

And I think the original designers of RPKI technology were super clever, or maybe this, you know, just happened by accident, to first focus efforts on origin validation, because out of all the technological aspects, origin validation arguably is the simplest one.

You fetch the information from the RPKI, you do the signature verification, and, when push comes to shove, what you do on the router is you compare a few integers with each other. You compare the integers that you received out of the RPKI system with some integers that you received out of the BGP update.

And getting to that state already was a tremendous effort, because all five RIRs had to create and deploy software that facilitated the local internet registries to publish ROAs. Those are used for origin validation.

And then ISPs had to start using those ROAs, and that's, I think, a super novel development. It was only at the start of twenty twenty that the large tier-one providers started using RPKI origin validation as part of their defensive posture.

So I would say origin validation is super young. We've only been using it for, like, three years now at truly global scale, in hundreds of networks.

And to get there, that took, I don't know, ten years.

I will add, Job, I know you're too humble to take any credit, but this was in no small part due to a lot of your advocacy, and traveling to many, many NANOGs and AfriNICs and LACNICs, right, giving a lot of talks and making the case. So, you know, you deserve some credit on that, but obviously there are a lot of people involved as well. Sure.

Much appreciated. I take credit for evangelizing the technology, but I didn't invent it. It was there, and I was looking at it. I was like, holy shit, this addresses a really big problem that we have. I need to tell everybody that there is a solution to this problem.

So that's ROV. And then I think you're bringing us up to the case for BGPsec.

So I think BGPsec in the IETF was a little bit ahead of its time.

They had finished the specifications for route origin validation, published the RFCs, and sort of, you know, threw them across the wall.

So, you know: we are done. Onwards to the next problem, path validation.

But they had not waited, until the world had deployed origin validation and then starts the path validation development, process.

They, they did things quite quick after each other. And I, I don't fault them.

But, but this means that when the Beach P sec RCs got published, So I think this was, like, twenty seventeen, twenty eighteen, the world had not yet even deployed original validation that came in twenty twenty.

So the speech basic standards was laying around, and the original validation standard was also laying around in, like, twenty nineteen. And neither of them really deployed at scale, so so that I think was a big obstacle for, for Beach Psec. And then because then a few years later, people look at Beach Psec, and they're like, nobody deployed this. So it must be trash. But the same apply to rigid validation that that lingered around having been published as an RRC, and it took like eight years for it to really see some uptake in, in, the, the, the real world internet.

So are you saying that the specification is just too young? Is that...

I think the specification was published in a time frame where people were not yet receptive to what that specification could mean in real world operations.

And there's a few factors that tie into this.

Yes, people have expressed concerns about the computational costs of this BGPsec mechanism, and they're not wrong. But luckily, the CPUs inside our routers get upgraded over time. Like, every five, six years, most operators will replace virtually all gear in their deployment.

And every time you do a refresh of the equipment that you deployed in the field, you get more RAM, faster or better disk, and a better CPU. Maybe a CPU with hardware acceleration for cryptographic operations.

So as time goes by, the hardware on which we're supposed to run this machinery is getting beefier and beefier, to the point that it's actually feasible to do inline signing and verification of BGP messages.

But when the BGPsec specification was being developed and published, I think a lot of people were looking at their currently deployed hardware and were like, no way this is ever gonna fly, not realizing that maybe five years from now, the hardware might be perfectly suitable to do it.

So there's a bit of tension between when technologies exist on paper and when we can actually use them in the field, in commonly deployed machinery.

So that's one aspect. CPUs are getting better.

Another cool aspect of BGPsec is that you don't need to do it on each and every BGP session.

BGPsec protects the integrity of your BGP session, and it's worth spending resources on the BGP sessions that matter most to you. So if a particular BGP session is revenue generating for you, that's worth protecting. But if the session is, for instance, a gateway of last resort, then maybe it's not worth protecting it with BGPsec, because you literally are sending packets down that path only because there's no other path available, and therefore it doesn't really matter. So to provide some real context with this, in the case of Fastly, our private peering connections are the valuable connections. Those are super high bandwidth.

They move lots and lots of bits. But generally speaking, the BGP state on both sides of those high bandwidth connections is pretty small compared to the entire global routing table that we receive from our transit providers.

So if there's a private peering between Fastly and some residential internet service provider, back and forth we might be exchanging a few hundred or only a few thousand routes, and signing and verifying a few hundred or a few thousand routes arguably takes way fewer CPU cycles than signing and verifying a million routes that I'm receiving on the transit connections.

But the transit connections have a lower local preference, because those are the gateway of last resort. That means passing the packets on to a network that acts as an intermediary.

And whenever you can cut out the middleman, usually from either a latency or capacity or economic perspective, it's best to create short paths. So I think there are some interesting, positive interactions between the economics of how the internet works at large and the fact that people are invested in protecting the BGP sessions that are worth most. And generally speaking, those valuable BGP sessions are responsible for the vast, vast majority of internet traffic that is being exchanged, and coincidentally represent the least amount of BGP state on both sides of the connection.

In other words, yes, BGPsec has computational costs, and I think it is feasible that we'll see in years to come, at global scale, that people will opt to protect small BGP sessions that represent large amounts of traffic.
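The trade-off described here can be sketched as a small ranking exercise. Everything in the example is an assumption made for illustration: the `Session` class, the `sessions_to_protect` function, the idea that signing and verification cost scales with the number of routes on a session, and all of the route counts and traffic shares. None of these are measurements from a real network.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Session:
    name: str
    routes: int           # routes exchanged on the session (hypothetical figures)
    traffic_share: float  # fraction of total traffic on the session (hypothetical)


def sessions_to_protect(sessions: Sequence[Session], route_budget: int) -> List[str]:
    """Greedy pick: most traffic protected per route signed or verified.

    Assumes crypto cost is proportional to route count, which is a
    simplification made only for this sketch.
    """
    ranked = sorted(sessions, key=lambda s: s.traffic_share / s.routes, reverse=True)
    chosen, used = [], 0
    for session in ranked:
        if used + session.routes <= route_budget:
            chosen.append(session.name)
            used += session.routes
    return chosen


sessions = [
    Session("transit", 1_000_000, 0.20),
    Session("private-peering-a", 800, 0.45),
    Session("private-peering-b", 3_000, 0.35),
]
print(sessions_to_protect(sessions, route_budget=10_000))
```

With these made-up numbers, the two private peerings fit inside the budget and cover most of the traffic, while the transit session with its full table is left out.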

And there's the matter of the ecosystem being ready in terms of software capabilities.

In two thousand eighteen, two thousand nineteen, I started with an RPKI validator project inside OpenBSD called rpki-client.

And the first thing we developed was the ability to validate ROAs in order to facilitate origin validation, and then later on, I added the capability to verify BGPsec router keys, and a few months ago, I added support to verify ASPA objects.

And as time goes by, these new capabilities have to find their way into all components of these pipelines, because not only does my validator need to support ASPA or origin validation or BGPsec, but the RIR web interface where you configure ROAs or ASPAs or BGPsec keys also must support all three.

And my router also has to support all three.

And that can take years. I think origin validation is pretty new in global deployment. I think we now found most of the bugs in the BGP stacks.

ASPA verification is in full development. The OpenBGPD project has been working hard on that the last few months. And then next up is BGPsec. And I think once all three are available to operators, so that they can actually start testing if it works for them, if it gives them benefit, maybe in partial deployments, then we have the final verdict on whether a technology was a misfire or super beneficial. But it unfortunately took, you know, ten, fifteen years for it to get deployed.

And finally, we've seen the web transform from an HTTP plain text only system to something where we're now close to one hundred percent of HTTP traffic running over TLS protected sessions.

And I think in the old days, people would argue, oh, TLS on the web server is super expensive, doing cryptography with each and every web server client is crazy.

But as years went by, people were like, oh, it is worth my while to heavily invest in sufficient CPU cycles in order to protect the HTTP connections to my customers, because the cost of dealing with network abuse is higher than the cost of just throwing more CPU power at the problem. And maybe we'll see a similar development in the BGP world, where some people realize that, to their business, protecting certain BGP sessions is worth the additional cost of the CPU cycles.

And as I said, I don't think BGPsec needs to happen on each and every session.

I think we'll, especially in the beginning, see that people deploy it in a limited context, on private peering sessions, or sessions facing their customers, sessions that represent revenue, and therefore it can be justified to, you know, throw the extra CPU power at it to make it secure.

To me, it does feel like RPKI is almost like a foundational stepping stone to many of the more advanced or subsequent mechanisms that we're using. Would you say that's correct?

Yeah. Absolutely.

The RPKI really is a general purpose infrastructure to verify whether someone was authorized to do something with an AS number or an IP address.

And so it's really important that, going forward, we do not consider RPKI synonymous with route origin validation. Route origin validation was just the first, simplest application we could come up with that had a dependency on the RPKI.

So in the realm of new innovation, last year, an RFC was published that specifies a thing called RPKI Signed Checklists.

And what that allows people to do is produce a cryptographically verifiable signature over an arbitrary hash that you can verify with the deployed RPKI. So for instance, let's talk about bring your own IP space and cloud providers.

Say I want the likes of Amazon or Google to originate my prefix, a prefix that was assigned to me, Job Snijders, but I don't want to run my own infrastructure. I want to use their cloud infrastructure and have them originate the prefix.

So I don't need to invest in running a router.

Cool.

So I sign up with one of these cloud providers, and they say, thank you for trying to onboard, please create a ROA that authorizes our autonomous system to originate your prefix on your behalf.

Now, if I create a ROA that authorizes either Google's AS15169 or Amazon's AS16509, then at that stage of the onboarding process, Amazon doesn't actually know whether I created the ROA or coincidentally someone else who is also trying to onboard created that ROA. And it could happen that two entities both create an account in the cloud provider's management portal and both claim to own the same prefix.

So at that stage, because the ROA exists, Amazon or Google does know they are authorized to originate the prefix, but they don't know which of their customers the prefix actually belongs to.

And over the years, a number of workarounds or hacks have become part of these onboarding procedures. So I believe Google would present you with a random string and then ask you to put that random string in the WHOIS record for the prefix, and then Google would start scraping that WHOIS entry, and when they would see that random string pop up, they knew which customer account the prefix actually belongs to. But unfortunately, this is a pollution of the WHOIS, because the business between me and Google is solely between me and Google, and the WHOIS is a public resource. So it's a bit dirty to put that private business interaction in the public WHOIS, and WHOIS is transported in plain text. It's not a cryptographically secure channel.

So there's a lot of friction in those onboarding procedures. And the mechanism that we came up with is that the cloud provider can tell the customer a random string, the customer can produce a tiny file, an RSC file, that is a cryptographic signature over that string, and that file can be sent to the cloud provider, and then the cloud provider can verify that file against the existing globally deployed RPKI.

So if I ask them to originate an IP prefix, I will create a ROA that authorizes the cloud provider's ASN. The cloud provider will tell me a random string, I produce an object or a file using my RPKI keys to sign it, send it to the cloud provider via an email or a web form upload, and then the cloud provider can verify against the RPKI database of delegated authorizations whether I indeed possess the private keys associated with a given IP prefix.

So once those two steps have happened, the cloud provider knows two things: a, they are authorized to originate the prefix, and anybody can see this because it's published in the global RPKI as a ROA, and b, which of their customers the prefix is to be associated with.
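The two checks can be sketched from the cloud provider's side as follows. This is only an outline under stated assumptions: the signature and certificate chain of the RSC file are taken to have been verified already by an RPKI validator, and `ValidatedChecklist`, `roa_authorizes`, `checklist_binds_customer`, and `can_onboard` are hypothetical names, not the API of any real validator or cloud provider. The prefixes, AS numbers, and challenge string come from documentation ranges or are made up.

```python
import hashlib
from dataclasses import dataclass
from ipaddress import ip_network
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class ValidatedChecklist:
    """Hypothetical validator output for an RSC whose signature already checked out."""
    resources: Tuple[str, ...]  # prefixes the signer is authorized to hold
    digests: FrozenSet[str]     # SHA-256 hex digests listed in the checklist


def _covers(outer: str, inner: str) -> bool:
    """True if prefix `inner` falls inside prefix `outer`."""
    o, i = ip_network(outer), ip_network(inner)
    return o.version == i.version and i.subnet_of(o)


def roa_authorizes(roas: Iterable[Tuple[str, int, int]], prefix: str, provider_asn: int) -> bool:
    """Check a: some published ROA lets the provider's AS originate the prefix."""
    length = ip_network(prefix).prefixlen
    return any(
        asn == provider_asn and _covers(roa_prefix, prefix) and length <= max_len
        for roa_prefix, max_len, asn in roas
    )


def checklist_binds_customer(checklist: ValidatedChecklist, prefix: str, challenge: bytes) -> bool:
    """Check b: the signer holds the prefix and signed this customer's challenge."""
    holds_prefix = any(_covers(resource, prefix) for resource in checklist.resources)
    return holds_prefix and hashlib.sha256(challenge).hexdigest() in checklist.digests


def can_onboard(roas, checklist, prefix, provider_asn, challenge) -> bool:
    return roa_authorizes(roas, prefix, provider_asn) and checklist_binds_customer(
        checklist, prefix, challenge
    )


challenge = b"hypothetical-random-string-for-this-account"
roas = [("192.0.2.0/24", 24, 64496)]
checklist = ValidatedChecklist(
    resources=("192.0.2.0/24",),
    digests=frozenset({hashlib.sha256(challenge).hexdigest()}),
)
print(can_onboard(roas, checklist, "192.0.2.0/24", 64496, challenge))
```

Because the challenge is unique to one customer account, a second account claiming the same prefix cannot pass check b without access to the prefix holder's RPKI keys.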

And with those two components, onboarding can happen in a fully automated way, in a fully secure way, and that happens in a private interaction, because that second step, the RSC, is only between me and the cloud provider. It's not published in the public WHOIS. I don't have to put it on a website.

It's a file that exists, and only myself and the cloud provider know of the existence of that file and its contents.

So this is super, super new. The RFC was published only a few months ago, and here again, we have to wait for the ecosystem to catch up. The validators need to be extended to support validating such offline signatures.

The RIR portals may need to receive an update so that people can generate those signature files.

It needs to be embedded in the workflow of these cloud providers to use this as an onboarding mechanism.

So to give you another example of the applicability of this, anybody can sign up for free in PeeringDB as either an ISP or an IXP, and PeeringDB needs to somehow figure out whether you, signing up for the service, are actually a representative of the entity you proclaim to be.

So in the case of an ISP, you sign up with the primary unique identifier being your AS number. In the case of an IXP, you sign up with the peering LAN prefix being your unique identifier.

And using this RSC technology, PeeringDB no longer needs to send an email to the email addresses listed in the WHOIS record, hoping to somehow confirm that somebody was actually signing up to PeeringDB. Instead, PeeringDB's onboarding system could present the user who is trying to sign up with this challenge and say, hey, if you claim to have some authorization, or if you claim to be a representative of a given AS, prove it. Sign this thing with a signature that I can associate with the AS number.

And then PeeringDB's sign up procedures can be fully automated and are cryptographically secured. And this is great news, because PeeringDB offers a single sign-on service, OAuth based, which many organizations and applications have integrated.

So PeeringDB being a trustworthy source, where people can rely on the users being who they say they are or representing who they claim to represent, is super beneficial.

So, Job, if I can interrupt, you were involved with an ASPA deployment very recently. Is that correct?

Absolutely.

I'm super proud of this. Yeah. Well, just last week at the Calgary Internet Exchange in Alberta, Canada, we deployed the world's first ASPA filtering route servers.

So ASPA is a super new technology. It's still in Internet-Draft status at the IETF.

But we're now somewhat close to the publication phase. And that means that people gotta pony up running code. People have to, you know, do their final review. Is the specification working as intended?

And with the production of running code, you can only be so sure after you start using it yourself. It's like eat your own dog food. Right? You gotta walk the walk.

So over the last few months, in the OpenBSD project, with the support of the Route Server Support Foundation, we've been super busy developing ASPA support in both the validator and the BGP daemon, OpenBGPD.

And this development effort is very important, from my perspective, for the IETF standardization process.

Because if you end up with specifications that look great on paper but are super hard to implement in real programs, the specification is not gonna have a good time, and you may need to redo the specification, which is a very time costly exercise.

So in the IETF, in some working groups, IDR, which governs BGP, and SIDROPS, which governs all things RPKI, there's an expectation that before things are published as RFCs, before they become these super formal documents, people can demonstrate that they actually wrote this in software and that the specification and the behavior of the software are aligned with each other. So OpenBSD being a project originating in Calgary, Canada, it is very fitting that we took the bite and are the first to try and use ASPA verification in a real world, real internet deployment.

And a funny, amazing thing to me is that, as a small open source project, it is pretty easy to get this far ahead and embrace new technologies and be the bleeding edge.

But I think it might be, like, two, maybe three years before you'll see ASPA support from commercial off the shelf vendors like Arista or Cisco or Juniper.

And this is perfectly reasonable and easy to explain, because, you know, being OpenBGPD, we just provide documentation in English. Being a global commercial off the shelf supplier, you gotta translate your manuals into all these languages. You gotta train the support staff around the world about the new technology.

Oh, and you gotta code the ASPA support itself, of course. And only after you've done all of that can you release it to the world.

So, yeah, I'm happy we are the world's first. And I think for, like, two years, we will not see that much traction in the commercial world.

But I do believe that initiatives like this help pave the path for the commercial providers, because now they have an open source implementation to compare their own implementation against.

So, yeah, it's a world's first, but it's a bit lonely at the moment. And I hope that in the years to come, many vendors and many other internet exchanges and network providers will use ASPA to verify BGP announcements.

And ultimately, we've seen progress over the last few decades. We've been focused on these three remediations, and now ASPA and RSC as well. So, yeah, I'm looking forward to seeing this continue to develop and keeping an eye on the contributions you're making to the industry and to the community. So it's very much appreciated. But we're at time now. So Job and Doug, this discussion has been excellent.

Which is exactly what I expected. Very eye opening as well, learning about some of the technologies that are being developed literally, like, last month. And especially considering that we're talking about global internet routing, something that affects most of the planet. Right? So thank you for joining today, and especially thank you to you, Job, our special guest, for sharing your knowledge, your vision, and what you're working on right now with us. So for comments, questions, or to learn more, Job, how can folks in our audience find you online?

You can either email me, job at fastly dot com, or find me on Mastodon, bsd dot network slash at sign job, or on Twitter, twitter dot com slash JobSnijders, or, I don't know, you'll find a way. And I'm happy to take questions if people are curious about something I mentioned in this podcast. You know, feel free to ask me questions. I'm happy to help folks along and try to move the needle forwards a little bit.

Thank you for that. And, Doug, Kentik's resident director of internet analysis, how can folks find you online?

Let's see. So I am on Twitter, and I am newly on Mastodon. In each place, my handle is Doug Madory. I haven't come up with any kind of creative cute name.

And then I'm also on LinkedIn.

Great. And you can find me on Twitter at network underscore phil. Still very active there. You could search my name on LinkedIn.

And if you'd like to be a guest on Telemetry Now, or if you have an idea for an episode that you'd like to share, please feel free to reach out to us at telemetry now at kentik dot com. And you can follow Telemetry Now on Twitter and LinkedIn as well. So for now, thanks for listening. Bye bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.