Kentik - Network Observability
Telemetry Now  |  Season 1 - Episode 6  |  January 24, 2023

A Year in Review of Internet Analysis - with Doug Madory


Between volcanoes erupting, misconfigurations, and nations purposely shutting down the internet to stop a protest, 2022 was a busy year for network outages. In this episode, Doug Madory, Kentik's Director of Internet Analysis, joins us to talk about some of the highlights of 2022 and also discuss some of the more common reasons we see large-scale network outages in the first place.


Key Takeaways:

  • [00:00 - 03:05] Introduction to Doug Madory, Director of Internet Analysis at Kentik
  • [03:08 - 06:52] An eruption in Tonga that wiped out its undersea cable
  • [06:53 - 08:34] The Egyptian government and an Internet chokepoint
  • [08:36 - 12:52] The ins and outs of submarine Internet cables
  • [12:53 - 18:01] The move towards satellite connections on a global scale
  • [18:02 - 23:18] The Rogers outage
  • [23:20 - 26:16] The ripple effect, and a lesson in humility
  • [26:17 - 28:06] 2021, the year of learning and Internet giants falling to their knees
  • [28:07 - 31:29] Decentralization and outages as the flipside of success
  • [31:30 - 34:42] Providers and regulations
  • [34:41 - 40:21] National governments, and shutting down flows of information
  • [40:25 - 46:14] Hijacking and routing security in relation to ransomware
  • [46:14 - 50:21] Spaghetti Westerns and interacting with other countries to keep the peace
  • [50:27 - 52:10] A determined adversary

Transcript

I recently read Doug Madory's blog post, "A Year in Internet Analysis: 2022," which was a great overview of all the major events on the internet at a global scale last year.

And it naturally reminded me of major internet events in previous years too, just because of the content he wrote about.

And though I found that very interesting all on its own, I did start to notice a theme among all of these events. The ones in Doug's blog post, of course, but also as I think back over years past.

The biggest global-scale internet disruptions, at least the most memorable ones I can think of, seem to be caused by only a few things.

Namely natural disasters, human error, and lately intentional outages caused by national governments for whatever reason.

I really can't remember any major global-scale outages that were caused by an SFP going bad or a router CPU being pegged or something like that, or maybe core switch hardware just failing.

Now from experience as a network engineer, I know those things do happen.

But when I think about those huge global-scale disruptions, it seems like hardware going down, or otherwise good, solid code just breaking for no reason, really doesn't happen that often. At least not often enough, or at a scale that affects huge parts of the world, if not the whole world.

So today with me, I have Doug Madory, Director of Internet Analysis at Kentik, to talk about his recent blog post, "A Year in Internet Analysis: 2022." Let's get started.

So, Doug, it's great to have you today. It's been a while since you and I have done something together, so it's good to see you. Our audience can't really see you, since this is mostly an audio-only podcast, but it is good to see you. And I read your blog post recently, right?

"A Year in Internet Analysis: 2022." And I wanna ask you, as we start to get into this... You heard my intro. Do you agree with me that there is this kind of theme among internet outages, not just in twenty twenty two but in previous years, that they all seem to be a result of three main pillars? Natural disasters, like undersea cables being disrupted by a hurricane or a volcano, like in your blog post.

Human error, you know, configuration problems, people configuring BGP incorrectly, and you mentioned that in your blog post. And then, what I was really interested in, toward the end of your blog post, was the idea that there are countries now intentionally disrupting internet service for the folks in their countries, being a cause for major outages.

Do you see that as kind of a main theme, or am I reading too much into it? What are your thoughts?

That sounds like three good categories that cover a lot. I can't think of a counterexample that doesn't fit into one of those, but yeah, I'd agree with that breakdown.

Yeah. I mean, you started off by talking about the eruption near Tonga.

And in your blog post, you have that graphic image. It's so interesting to watch, but that was a volcano exploding, as volcanoes do from time to time, taking out an undersea cable, and therein lies major disruption for the island nation of Tonga, and I assume that region.

I don't know. I have to look into it more. But why is it that devices themselves, like hardware or bad code, don't seem to cause a lot of the disruptions? Do you think that's because the internet is that resilient, that we're that good at creating hardware?

Are all the major vendors that good, or is it that we're just not hearing about those because they're not as sexy?

Well, hang on. I wouldn't go so far as to say bad code doesn't lead to outages. In the Tonga example, though, bad code was not the issue.

On that one, I kind of went into it in my blog post. Years ago, during my time at Renesys and then Dyn Research, I had a real interest in trying to identify submarine cable activations, 'cause they were kind of interesting. And I had spotted the Tonga one back in, like, twenty thirteen, I think it was, a number of years ago. This was a case where, for Pacific Island nations, it's very hard to get them connected to the global internet. They're relying on satellite.

Satellite's very expensive per megabit, and it's especially bad in the South Pacific, because the way the business of satellite service works is that you get to divide the costs by the customer base, but the Pacific Ocean is a very large piece of real estate that has very few customers. So your denominator is pretty small. I remember attending a submarine cable conference, or speaking at one, in twenty thirteen, and someone was talking about the same thing. At the time, the wholesale bandwidth costs in North America were, I don't know if we even use these figures anymore, something like a dollar a meg a month.

You know, there's always some sort of figure for what the wholesale bandwidth cost is. It's probably like a penny or something these days. I don't know. But in the South Pacific, it was on the order of a thousand.

So it was like a thousand times the cost.

And you're living on limited capacity and high latency, all these other problems, and you're paying a thousand times more. So it was an act of humanity, a humanitarian gift to Tonga, that the Asian Development Bank and the UN put together the money for this cable, to try to modernize the society of this country. And I was following that story. Part of my interest was that there'd be a press release about a submarine cable activation, and I was just curious to see whether we would see it in our internet connectivity data.

When did this thing actually start carrying traffic? Because those are two different dates. The cable may be ready to go, and the press release may say it really did happen, but then there's a separate moment where it's actually carrying traffic. So we spotted that. So I had a little bit of history with Tonga. When I saw this, I was like, I remember Tonga. We talked about this years ago.
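The gap Doug describes, between the date a cable is announced and the date it actually carries traffic, is something measurement data can expose. A minimal sketch, assuming you have daily round-trip-time samples toward a country's networks: since geostationary satellite paths sit at roughly 480 ms or more, a sustained drop well below that suggests traffic has moved onto the cable. The function name, the 300 ms threshold, the three-day confirmation window, and the sample values are all hypothetical; this is not Kentik's method or data.

```python
from statistics import median
from typing import Optional


def first_cable_day(daily_rtts: dict[int, list[float]],
                    threshold_ms: float = 300.0,
                    confirm_days: int = 3) -> Optional[int]:
    """Return the first observed day whose median RTT stays below
    threshold_ms for confirm_days consecutive observed days, or None.

    The confirmation window avoids mistaking a one-day dip (a test,
    a measurement quirk) for the cable actually carrying traffic.
    """
    days = sorted(daily_rtts)
    medians = [median(daily_rtts[d]) for d in days]
    for i in range(len(days) - confirm_days + 1):
        if all(m < threshold_ms for m in medians[i:i + confirm_days]):
            return days[i]
    return None


# Hypothetical samples (ms): satellite-like RTTs, a one-day dip on day 2,
# then a sustained drop from day 5 onward.
samples = {
    0: [612, 598, 640], 1: [605, 620, 590], 2: [205, 198, 640],
    3: [600, 615, 630], 4: [590, 610, 605], 5: [160, 150, 170],
    6: [155, 149, 162], 7: [150, 158, 161], 8: [152, 149, 170],
}

print(first_cable_day(samples))                  # 5
print(first_cable_day(samples, confirm_days=1))  # 2 (fooled by the dip)
```

The second call shows why some confirmation window matters: a single low day is not the same as a cable in service.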

And then, at that point, they had turned down all their satellite. They were completely relying on the submarine cable.

So then they had to restore service, and I think whatever residual satellite antennas they had were knocked out in the aftermath of the blast from the undersea volcano.

Yeah. But it wasn't just that undersea cable that caused major issues last year. I mean, you talked about the issue in Egypt, though I don't think that was caused by a natural disaster like a volcano. I don't know if there are any volcanoes there.

No. I don't know that we got an explanation on that one. I wrote up a blog post on that one as well. Egypt as a chokepoint in the global internet is a perennial theme, certainly in the submarine cable space, of trying to come up with alternative paths.

You know, the Egyptian government makes a lot of money off of that, just as they do with the Suez Canal. With the internet as well, you pay to cross your cables through that space. But occasionally there are terrestrial outages. They try to build a lot of redundant overland links to connect the cables that go from the Mediterranean to the Red Sea, but every once in a while, there's an outage. One I mentioned in the blog post, from a number of years ago, looked a little like this one. There was an hours-long outage.

And at the time, I had a great contact at Telecom Egypt who managed the fiber optics, the overland circuits. And I was like, what happened here? Obviously I saw an outage. And he's like, oh my god, you're not gonna believe it. We had people set fire to one of the, like, [inaudible], trying to get copper out of the lines, not knowing these are all fiber optics and there wasn't much copper to be had, so they just burned this down. At least if it's on land, fixing this stuff is usually a matter of hours. If it's under the sea, it could be days or weeks, depending on where it is.

Well, I started off by mentioning natural disasters, and we talked about the volcano eruption near Tonga, but we're kind of focused more on undersea cables now. And you did make the comment that there are very few undersea cable connections, specifically in the Pacific.

Is that pervasive throughout the world? Or is that a very robust method for moving data between and among continents? Because I'm wondering, some of those cables also have to be very old, correct?

Yeah. So, let's see, there's a couple of things there. One is, you know, if you look at submarinecablemap.com.

I know you've seen this. Yeah. That's a pretty cool thing put up by TeleGeography. It's a good reference.

I've got one of these printed on my dining room wall, to show what kind of nerd I am. But these things follow the same maritime trade routes that ships have been following forever. So the highly trafficked paths, across the Atlantic, across the Mediterranean, South Asia, around Africa, have lots of cables. There's lots of redundancy.

You know, the only risk is that there are ships going those same paths too. They may set an anchor down, or drag an anchor and hit a cable. If you were to look at marinetraffic.com and compare the two, they're gonna look very similar, just as far as where the cables are and where the ships are.

They're going to the same places. And they're also gonna show that there are very few ships in the South Pacific, and very few cables in the South Pacific, because a cable is an expensive endeavor. It's at least a hundred million dollars, onwards to a billion, depending on the length and the complexity.

And the ROI is another thing that usually gets talked about at submarine cable conferences. There are a lot of investors and business people trying to understand the business case, how to mitigate the risks and maximize the ROI, because someone has to raise a lot of money, and how much money are you gonna make off this? It's not a ton, but you also have this risk if it breaks: you're getting no money and you have to pay to fix it. This is why the majors, like Google and Amazon and Facebook, have gotten into the submarine cable business, because their business is just different.

They don't have to make money off the cable.

If it serves their greater business, it's good. And so now they're kind of driving that industry.

But anyway, I just want to mention one other interesting thing that happens. It's not that often.

But you mentioned that some of these cables are old. So, as a cable gets old... someone came up with an idea a while back, and it's amazing this has actually happened: they'll pull a cable up off the seabed, and I can only imagine all the life forms, you know, barnacles and things, that have attached to this thing as they're pulling it up onto the ship, and then relay it somewhere else.

And in the total cost, these figures I mentioned of hundreds of millions of dollars, the vast majority is the fabrication of the cable. Then there's the installation, as it's termed in the industry, which is actually putting it in the water.

The fabrication of the cable is the most expensive part. And if you can pull one up out of the ocean, you can avoid that, at some cost of its own.

So there have been a couple of cables that have been relaid as, quote unquote, donor cables, which is the term. I forget which one it was, but this was in the area of Australia and the South Pacific.

It no longer served its purpose. It didn't have the capacity to handle a major route, but it'd be plenty to hook up a smaller island nation, as far as capacity goes. And so you could reuse the cable. This has happened a couple of times. Again, it's a pretty mind-blowing thing that this takes place.

Yeah, it is. That's pretty neat. Do you think the movement toward more ubiquitous satellite connectivity on a global scale is going to solve some of the inherent danger of undersea cables being dragged up by ship anchors, natural disasters, and all that kind of stuff?

Yeah, I guess it could, to some extent. I mean, satellites are never going to have the capacity of fiber-optic cable.

So that, we know, is never gonna be the case.

You know, the other issue I mentioned earlier, with Tonga or other countries that are relying on bulk satellite service, is that latency is an issue. If it's a geostationary satellite, then just due to the laws of physics, it takes a certain amount of time for light to travel to outer space and back, and the round trip can't be shorter than roughly four hundred eighty milliseconds. In practice, it's probably gonna be more.

Yep. Right.

So then O3b came out with the first MEO, medium Earth orbit, and these are closer satellites.

There are more of them, and then it gets more complicated on the ground, because you now have to track satellites as they're crossing through the sky. Because they hand off?

Correct. Yeah. That's medium Earth orbit. With geostationary, you can have one dish pointing in one place and then just leave it. And so it's real simple.

How far away is geostationary?

I don't know. Just look it up. I don't have those numbers.

I'm Googling it as you talk.

Yeah, I'm sure there's some great diagrams. I don't have those figures memorized.

Twenty two thousand two hundred and thirty six miles above Earth's equator.
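The altitude read out here is enough to check Doug's latency figure. A back-of-the-envelope sketch, assuming a bent-pipe path (the request goes up and down, the reply goes up and down) through a satellite directly overhead, with no processing or terrestrial delay: that is four legs of the orbital altitude at the speed of light. The MEO and LEO altitudes used for comparison (about 8,062 km for O3b and about 550 km for Starlink) are approximate values assumed here, not figures from the episode.

```python
# Speed of light in vacuum (km/s) and the statute-mile conversion factor.
C_KM_PER_S = 299_792.458
KM_PER_MILE = 1.609344


def bent_pipe_rtt_ms(altitude_km: float) -> float:
    """Lower bound on round-trip time through one satellite directly overhead.

    Request: ground -> satellite -> ground; reply: the same path back,
    so light covers the altitude four times. Real paths are longer
    (slant range, processing, terrestrial backhaul).
    """
    return 4 * altitude_km / C_KM_PER_S * 1000


geo_km = 22_236 * KM_PER_MILE  # altitude quoted in the episode, about 35,785 km

orbits = {
    "GEO": geo_km,
    "MEO (O3b, ~8,062 km, assumed)": 8_062,
    "LEO (Starlink, ~550 km, assumed)": 550,
}
for name, altitude in orbits.items():
    print(f"{name}: at least {bent_pipe_rtt_ms(altitude):.0f} ms round trip")
```

The GEO line works out to about 477 ms, which is where the "roughly four hundred eighty milliseconds" floor comes from; real-world paths only add to it.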

So, yeah, the laws of physics kind of govern how long it takes light to make that round trip from your location. So then you had O3b create a medium Earth orbit service, and that came much closer.

There's more complexity in the ground equipment, but the latency is a lot lower. And in some places where this was getting fielded, places that were never gonna get terrestrial or that were hard to reach, they were getting latencies similar to what you would have with terrestrial.

But, you know, the latest is these low Earth orbit constellations: Starlink.

That's SpaceX, and then OneWeb, and Amazon has a project, Kuiper, though they're pretty early on in that project.

And there's a bunch of Chinese ones. They call them mega-constellations, and these require thousands of satellites.

And, yeah, could that help in a Tonga situation, where an undersea volcano takes out Tonga? Yeah, I guess. In the case of Tonga, I think there were other Pacific satellite operators that got in there first, but Starlink was providing some of the capacity.

They did need to set up a ground station in Fiji, because it's only recently that they've had inter-satellite links. This is the catch with low Earth orbit: if your satellite's very low, then the footprint's very small, and you have to just ricochet up to the satellite and back down to the ground. So you need to have a ground station pretty close by.

And this inter-satellite link, being able to go up to a satellite and then from one satellite to another satellite, is super complex and really hard to do, and they're starting to do it now.

So that has to be really solved in order to do things like use low Earth orbit across the Atlantic. Right now you can't do that, because you can't come back down to the ground in the middle of the ocean. You have to go over the inter-satellite links.

And that's just starting to happen now.

So it could. You're just never gonna have the capacity.
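Doug's point about low orbits is geometric: the lower the satellite, the smaller the patch of ground that can see it, so without inter-satellite links the user and the ground station have to be fairly close together. A rough sketch using the standard spherical-Earth coverage formula; the 25-degree minimum elevation angle and the 550 km LEO altitude are illustrative assumptions, and the bent-pipe check is only a necessary condition (both ends inside one satellite's footprint), not a link budget. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def coverage_radius_km(altitude_km: float, min_elevation_deg: float) -> float:
    """Ground distance from the point directly under the satellite to the
    edge of the area where it is at least min_elevation_deg above the horizon."""
    eps = math.radians(min_elevation_deg)
    r = EARTH_RADIUS_KM
    # Earth central angle to the coverage edge: arccos(r*cos(eps)/(r+h)) - eps
    central_angle = math.acos(r * math.cos(eps) / (r + altitude_km)) - eps
    return r * central_angle


def bent_pipe_possible(user_to_gateway_km: float, altitude_km: float,
                       min_elevation_deg: float) -> bool:
    """Without inter-satellite links, user and ground station must both see
    the same satellite, so they can be at most two coverage radii apart."""
    return user_to_gateway_km <= 2 * coverage_radius_km(altitude_km, min_elevation_deg)


for name, altitude in [("LEO, 550 km", 550.0), ("GEO, 35,786 km", 35_786.0)]:
    print(f"{name}: footprint radius ~{coverage_radius_km(altitude, 25):,.0f} km")
```

Under these assumptions a LEO footprint has a radius of roughly 900 to 1,000 km, versus several thousand kilometers for GEO, which is why a LEO bent-pipe service needs a nearby ground station until inter-satellite links can carry the traffic instead.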

Right. Not until they invent subspace communication, like in Star Trek, I assume.

Yeah. I mean, there's a belief, and I guess there's some science behind it, that the inter-satellite links, because they're going through the vacuum of space, can actually carry a higher capacity than the link going from the ground to the satellite.

And so there are some gains to be had, you know, in the fidelity of those inter-satellite links.

Yeah. And I have to assume that as that technology progresses and improves... I mean, I know it's never gonna be the same as a hard fiber connection here on the ground. But as it improves in resiliency and in bandwidth, it would be more immune to natural disasters than undersea cables and other things that are affected by hurricanes and earthquakes. I can remember first learning about all of these chokepoints. You mentioned the chokepoint in the Middle East.

There's a chokepoint, I believe, in the tri-state area of New York. I don't know if it's on the New Jersey side or in Manhattan, and I just have to imagine, if there was one problem in that building that all of these connections go through, that's it for the Northeast US, which is, you know, quite a few people. So now I do wanna move on, though I could talk about undersea cables for three hours.

So we definitely have to do that again.

Me too.

If you wanna get into it, you mentioned you looked at the Rogers outage from last year.

That was a big deal. I remember reading a ton of blog posts on that, and all of your analysis as well. Really interesting stuff. Terrible that it occurred, but ultimately I don't know if we know with full certainty what the root cause was. Everything points to human error, though. Correct?

Yeah. So that was, I think, arguably Canada's largest internet outage in history. And it was long. I forget the exact duration, but it was many hours, maybe twenty-four hours, before it was completely restored. And there was some root cause analysis published afterward. To people who are pretty techie, like maybe we are, these things are always insufficient. It's never enough detail. I would love to know more, and you're just never gonna get there. But in this case, if you read between the lines, it seemed like what was happening was they had basically leaked the global routing table, which is, you know, over eight hundred thousand routes, into their IGP, whether it's internal BGP or something else, their internal routing, which is not gonna be built for anything on that scale.

And if you're using a protocol like OSPF, something that's very talkative and tries to maintain total knowledge of links...

These things do not go well together.

They're very different styles of routing. So they basically leaked the table in. It was just too many announcements, and these routers were just melting down. And as for why it took so long, it seemed like there were some commonalities with the historic Facebook outage the previous fall, where there were unforeseen dependencies. The engineers were also using Rogers mobile service, and the company was using its own communication services to coordinate its work. When that goes down, they don't have a way to coordinate, and that extends the outage, because it's very hard to work if your normal tools for talking are no longer available.
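The failure Doug infers, a full external table pushed into internal routing once a filter was gone, is the kind of thing layered safeguards are meant to catch. A minimal Python model, not router configuration and not Rogers' actual setup: a prefix filter that only admits routes inside the operator's own address blocks, backed by a route-count limit that refuses the whole redistribution if it is exceeded. The function, the exception, the limit, and the synthetic table (10,000 routes standing in for the 800,000-plus of a real table) are all hypothetical.

```python
import ipaddress
from typing import Optional


class RedistributionRefused(Exception):
    """Raised when a redistribution would push too many routes internally."""


def plan_redistribution(bgp_routes: list[str],
                        own_blocks: Optional[list[str]],
                        max_routes: int) -> list:
    """Select the BGP routes to inject into the internal routing protocol.

    own_blocks=None models the filter having been removed: every route passes.
    The max_routes check is a backstop that still applies in that case.
    """
    nets = [ipaddress.ip_network(p) for p in bgp_routes]
    if own_blocks is not None:
        blocks = [ipaddress.ip_network(b) for b in own_blocks]
        # Keep only routes that fall inside one of our own address blocks.
        nets = [n for n in nets
                if any(n.version == b.version and n.subnet_of(b) for b in blocks)]
    if len(nets) > max_routes:
        raise RedistributionRefused(
            f"{len(nets)} routes exceeds the limit of {max_routes}; nothing injected")
    return nets


# Synthetic table: 10,000 external /24s plus two of our own /16s.
external = [f"203.{i // 256}.{i % 256}.0/24" for i in range(10_000)]
table = external + ["10.1.0.0/16", "10.2.0.0/16"]

print(len(plan_redistribution(table, ["10.0.0.0/8"], max_routes=5_000)))  # 2
try:
    plan_redistribution(table, None, max_routes=5_000)  # filter "removed"
except RedistributionRefused as err:
    print("refused:", err)
```

The design choice is defense in depth: the filter is the primary control, and the count limit turns a missing filter into a refused change rather than thousands of routes flooding the internal protocol.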

But it does reveal some stuff about the Canadian internet. There are different pockets of essentially monopolies, Rogers and Bell, and folks in the industry and in Canada wrestle with this, and if anything, things are actually moving toward greater consolidation. Still, if your major provider that's got a near monopoly in a region goes down, there may not be a lot of great alternatives to use, so that may have contributed to the duration of the outage. Yeah, it was a routing thing. Sounds like a filter was removed. I guess we're not gonna know much more than that. But I would say that when this outage took place, there were a lot of folks all looking at BGP, which is an easy go-to for me and for people like me who do internet measurement.

And there was lots of routing instability going on at this time. A lot of routes got pulled.

But I have the benefit of having our aggregate NetFlow to look at: what does our customer base see, as far as communications with Rogers? We could see routes that stayed up while the traffic stopped going. And that means the route wasn't the problem. The routes were still up and available, and at times stable, but they weren't carrying any traffic.

So there are multiple layers to the onion of their network. You've got routers that are announcing their address space to the rest of the internet, and you've got internal routers handling how the traffic moves within the network. The internal routers were down, but Rogers was still advertising its space. So there were some initial claims that they leaked the global routing table somewhere else, but that was one thing I tried to pick apart in the blog post: alright, we can see traffic stopping to routes that are still up, so it's not really a BGP thing.

In that case, it's internal. I mean, it's not exterior BGP. Maybe it's internal BGP, or whatever protocol they're using internally. Anyway, so that was... Yeah.
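The inference Doug describes, routes still visible while traffic to them collapses, can be expressed as a simple per-prefix classification. A sketch under assumed inputs: for each prefix, whether its route is still present in BGP during the incident, plus traffic volume in a normal window and during the incident (for example, from aggregated flow data). The 10% drop threshold, the class and function names, and the documentation-range prefixes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PrefixObservation:
    prefix: str
    route_visible: bool   # route still present in BGP during the incident
    baseline_bytes: int   # traffic toward the prefix in a normal window
    incident_bytes: int   # traffic toward the prefix during the incident


def classify(observations: list[PrefixObservation],
             drop_ratio: float = 0.1) -> dict[str, str]:
    """Label each prefix:
    'withdrawn'        route gone, consistent with an external routing problem
    'routed-but-dark'  route still up but traffic collapsed, pointing inside the network
    'normal'           anything else
    """
    labels = {}
    for o in observations:
        if not o.route_visible:
            labels[o.prefix] = "withdrawn"
        elif o.baseline_bytes > 0 and o.incident_bytes < drop_ratio * o.baseline_bytes:
            labels[o.prefix] = "routed-but-dark"
        else:
            labels[o.prefix] = "normal"
    return labels


obs = [
    PrefixObservation("192.0.2.0/24", True, 1_000_000, 5_000),
    PrefixObservation("198.51.100.0/24", False, 800_000, 0),
    PrefixObservation("203.0.113.0/24", True, 500_000, 480_000),
]
print(classify(obs))
```

If most affected prefixes land in "routed-but-dark" rather than "withdrawn", that supports the reading that the problem was inside the network rather than in the routes it advertised to the internet.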

It's interesting that you have this really deep visibility into what's going on in the public internet, and then you can use that and parse it in such a way that you can actually infer what's going on in somebody's private network.

To some extent. Yeah. Absolutely. Yeah.

Yeah. And I remember, just a few years ago, prior to Rogers and maybe prior to the Facebook outage, didn't Facebook do something in Southeast Asia where they accidentally became a transit network for the public internet for a time, or something like that?

Let's see.

There was a major disruption as a result.

I don't know if it was Facebook or somebody else, but, no, there have been a couple of things like that.

There was a Google incident where Google had a BGP leak that took down a lot of connectivity in Japan for a while, and I wrote something up on that at the time.

There's a ripple effect, a cascade effect, when you get into routing, especially if you're talking about full feeds, redistribution of routes, and how you filter, like you mentioned. There's a cascade effect all the way down to endpoints sitting on your network. You really need to be very aware and careful as a human being actively engineering, touching and wiggling wires on your network.

I remember the AWS outage as well, the S3 outage, and it came out that it was somebody who misconfigured something. I don't remember what. I wrote a blog post about that, and my blog post was called "Amazon S3" or something like that. We've all been there, because I've been there. Now, I didn't work as a network engineer at global scale, except for some consulting work I did for GE, but other than that, it was large enterprise.

But yeah, if you configure something incorrectly, just one little error in an access list or in a route map or whatever it happens to be... The one I like to joke about is, maybe it's a major trunk port in your layer 2 backbone, and you forget the add keyword if you're using Cisco devices. And then, boom, everything's down. It's very high impact, and it's just you as a human being causing all of that disruption. And I wonder if there is a way...

I mean, now we talk about network automation and programmability and this desire to eliminate what's error-prone in manual configuration. Of course, it makes things more efficient and cleaner and all that, but it's also about eliminating that error-prone component of manual configuration. I wonder if that is realistic or not. Ultimately, the code that we write in Python, and I think some people still use Ansible playbooks and things like that, is still written by human beings who need an understanding of how BGP works, how to write an inventory, or how to write a route map.

It still requires a human being to know that, to figure that out, and to express it in code in such a way that it interacts correctly with everything else going on in the network. So I don't know. I don't know if we can ever get away from the human error component of these types of outages.
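The trunk-port joke comes down to the semantics of one command: on Cisco IOS, `switchport trunk allowed vlan <list>` replaces the trunk's allowed-VLAN list, while `switchport trunk allowed vlan add <list>` appends to it. A minimal sketch of the kind of pre-change check automation can add: model the command's effect on the current allowed set and report any VLANs it would stop carrying before the change is pushed. The helpers are hypothetical, not a vendor or Ansible API, and they only model a plain list, `add`, and `remove` (not `all`, `none`, or `except`).

```python
def parse_vlan_list(text: str) -> set[int]:
    """Parse an IOS-style VLAN list such as '10,20-22' into a set of IDs."""
    vlans = set()
    for part in text.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def apply_allowed_vlan(current: set[int], command: str) -> set[int]:
    """Return the allowed-VLAN set after a 'switchport trunk allowed vlan' command.

    Without a keyword the list is replaced; 'add' and 'remove' modify it.
    """
    tokens = command.split()
    if tokens[:4] != ["switchport", "trunk", "allowed", "vlan"]:
        raise ValueError(f"unsupported command: {command!r}")
    args = tokens[4:]
    if len(args) == 2 and args[0] in ("add", "remove"):
        vlans = parse_vlan_list(args[1])
        return current | vlans if args[0] == "add" else current - vlans
    if len(args) == 1 and args[0] not in ("add", "remove"):
        return parse_vlan_list(args[0])  # plain list: replaces everything
    raise ValueError(f"unsupported command: {command!r}")


def vlans_dropped(current: set[int], command: str) -> list[int]:
    """Pre-change check: VLANs the trunk would stop carrying."""
    return sorted(current - apply_allowed_vlan(current, command))


trunk = {10, 20, 30}
print(vlans_dropped(trunk, "switchport trunk allowed vlan 40"))      # [10, 20, 30]
print(vlans_dropped(trunk, "switchport trunk allowed vlan add 40"))  # []
```

In a pipeline, a non-empty result could block the change or require explicit sign-off. The check is only as good as the human who modeled the command's semantics, which is exactly the point about human error not disappearing with automation.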

Maybe we can... It's a lesson in humility.

A lesson in humility for sure.

You know, I think probably twenty twenty one... This will be with us forever, probably, but I think twenty twenty one was definitely the year of learning, of humbling experiences, of the greats of the internet falling to their knees. You had Facebook. You also had Amazon, which had a couple of outages. The second one wasn't as big, but in the first one you mentioned, where there was an internal DNS issue, it turned out that a lot of the internal services did not use multi-region, which is kind of funny.

I mean, it's a cloud operator. It's their stuff. They could be replicating this in every region. But everything was based in us-east-1, like the rest of the world, and they were too, singly homed, including their internal DNS.
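The single-region dependency Doug describes is easy to miss and fairly easy to audit once an inventory exists. A sketch assuming a hypothetical list of (service, region) deployment records; the service names are made up, and the region identifiers simply follow AWS's naming style.

```python
from collections import defaultdict


def single_region_services(deployments: list[tuple[str, str]]) -> dict[str, str]:
    """Return services deployed in exactly one region, mapped to that region."""
    regions_by_service = defaultdict(set)
    for service, region in deployments:
        regions_by_service[service].add(region)
    return {service: next(iter(regions))
            for service, regions in regions_by_service.items()
            if len(regions) == 1}


inventory = [
    ("internal-dns", "us-east-1"),
    ("auth", "us-east-1"),
    ("auth", "us-west-2"),
    ("monitoring", "us-east-1"),
]
print(single_region_services(inventory))
# {'internal-dns': 'us-east-1', 'monitoring': 'us-east-1'}
```

A real audit would also have to follow transitive dependencies: a service deployed in two regions that resolves names through single-region internal DNS is still effectively single-region.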

I don't know. It's easy for us on the outside to say you should have known. But I think these things, especially at the scale we're talking about, like Facebook and AWS, and earlier in the year Fastly and Akamai had outages... I think every major provider has had one of these, and it is hard to anticipate every dependency.

And then there's out-of-band. It's easy for us to say, oh, you should have had an out-of-band communication path that doesn't have any dependency on your network and allows you to remote into your stuff and configure it. That's actually a hard thing to, number one, build, and, two, secure. Can you imagine? You're creating backdoors that have no reliance on anything, and then trying to secure that. It is hard.
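Doug's out-of-band point, and the Rogers detail about engineers coordinating over the company's own mobile service, are both about hidden transitive dependencies. A sketch of one way to surface them: record what each tool depends on as a graph and walk it to see whether a supposedly independent path still reaches the production network. The graph and every node name in it are entirely hypothetical.

```python
def depends_on(graph: dict[str, list[str]], start: str, target: str) -> bool:
    """True if `start` reaches `target` by following dependency edges (DFS)."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


graph = {
    "oob-console": ["cellular-modem"],
    "cellular-modem": ["own-mobile-network"],  # SIM from the same provider
    "own-mobile-network": ["production-core"],
    "incident-chat": ["corporate-vpn"],
    "corporate-vpn": ["production-core"],
    "pstn-bridge": ["third-party-carrier"],
}

for tool in ["oob-console", "incident-chat", "pstn-bridge"]:
    print(tool, "depends on production-core:", depends_on(graph, tool, "production-core"))
```

Here the "out-of-band" console turns out not to be out of band at all, because its modem rides the provider's own mobile network; only the third-party bridge survives a core failure.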

It's easier said than done, building these things. But I've been there too.

I mean, I've made those mistakes and had those humbling experiences, but never at a global scale. I've taken down networks. I remember taking down subnets, and then entire networks, from time to time. But few enough times that I learned from my mistakes and really took the time to analyze, investigate, and research prior to making a change.

And even then, during that change window, you're literally sweating, because you're like, okay, here we go, and the whole team is on pins and needles. Alright, hit the enter key. But it was never at this kind of global scale.

And, you know, we've been talking about, what, eight companies? Ten companies? We've mentioned the names of literally fewer than a dozen companies thus far in our conversation. Granted, that's just in the scope of our conversation, but the point I'm trying to make is this idea of a very small number of companies holding, not the power, but all of the connectivity, the data transfer, and even the content for so much of the global internet today. Maybe that's part of the problem: one human being makes one mistake at Facebook and then, boom, you have a huge disruption.

Whereas if it were more decentralized, or there were more companies... I don't know. But it seems... Yeah.

No, I think this topic definitely came up with the AWS outage in December. I mean, when Facebook went down, basically everything owned by Facebook was down. But unless you were using Facebook to log in to something else, which some people were doing with those credentials, the rest of the world carried on.

With the AWS outage, we learned how much everybody is using us-east-1. Just the one region of the one cloud provider is powering so much. And so I got invited onto Fox Business with Neil Cavuto, on live TV, and it was like, why is this happening? And I said, well, let's just take a step back. One thing is that this is a service that is wildly popular.

Cloud services are solving problems.

And this outage is kind of the flip side of that success. It's hard to know, this is kind of an unknowable question, but let's say there were a thousand companies that got knocked out for that period of time.

You know, how long have they been on AWS? Maybe a year?

How many little outages would those companies have had, that they didn't have, because they had put this on AWS and AWS takes responsibility for running it? So there is a trade-off. You're not having the normal outages that you'd be responsible for. You kind of outsource that. AWS is keeping this online until, you know, they don't, but that's quite rare. In the meantime, everybody's up all the time. And in my opinion, that didn't put a dent in the cloud business at all.

It's still a good business.

So then, yeah, you still have this issue of consolidation, where a handful of companies can take down a lot of connectivity.

Well, what about on the provider side, then?

You know, we're talking about CDNs and other types of companies, Facebook and Google, however you wanna define them. But what about actual providers?

Oh, like network service providers?

Yeah.

Yeah. So there are two dynamics there. There's one at a national level that ends up being governed by how much the regulator of that country tries to introduce and foster competition.

And the lack of competition is a measure of the regulator's power to control the market. Lots of countries have dominant incumbents, because every country started the same way: with a state telecom. The government started this thing, whether that was a hundred years ago with telegraph wires or something else, and whatever version of it exists today is now the incumbent. Sometimes that's still the state-owned entity. Sometimes it's been privatized.

But then, separately, there's a regulator that's trying to foster competition, and I think, at least in the business of telecommunications, it's an accepted truth that more competition breeds lower prices and better service.

But depending on how strong that incumbent is, competition may cost them jobs, and they've got a lot of pull and sway, so they'll fight some of those efforts.

So at the national level, you have this. Canada is an example: it's not the worst, but it could have more competition. The United States could as well. There are lots of countries you could make this argument about. Then, at a much higher level, you've got the backbone providers, the big global networks.

And I guess we've seen some consolidation there. What's interesting to me at that level is that you have to move a lot of traffic to make any money. You have to be really big; it's a commodity business. Any time your product turns into a commodity, it just comes down to: can you move large amounts?

And it becomes cutthroat on price and scale. So you have these very large companies trying to move huge amounts of traffic with as few people, as few engineers, as they can, and just as much equipment as they need.

And there's been some consolidation. You've got, I guess they're Lumen now, who own Level 3, who bought Global Crossing and XO; all those companies are now CenturyLink, all one thing. So there's been some consolidation there.

And then there are others that are essentially national champions, too important to be acquired. NTT in Japan is not gonna be acquired, and even Tata, based in India, is not gonna be; maybe they won't let it.

Oh, you know, it's interesting that you bring up this distinction between privatized carriers and carriers that started as state-run businesses.

Because it got me thinking of the last section, or maybe the middle section, of your blog post, where you talked about outages in Cuba,

the protests in Iran, and the resulting behavior from the national government there to shut things down.

That's something we're starting to see in Ukraine as well, right?

There is this... you were gonna say something?

Go ahead.

Oh, I was just gonna say the latest outages in Ukraine are not so much government directed.

Oh, fair.

That's it. I mean, they're getting bombed. It's still happening to this day.

Nevertheless, a result of geopolitical events.

For sure.

For sure. For sure.

In this case, war.

Which is not the same as a government trying to shut down a protest or stop the flow of information. So that is true. But nevertheless, it's nation states making decisions to control information, and that's really what I was getting at with Cuba and Iran.

That is a very different type of outage and disruption on the internet, isn't it? We've talked about natural disasters, undersea cables and satellites, and human error, which is probably the most ubiquitous type of outage, right? But now we're seeing this happen more and more, and you're reporting on it. I see your Twitter feed and your LinkedIn posts all the time about nations that say, no, we're gonna shut you down, citizenry.

How is that playing into this? Is this becoming a more widespread thing among countries?

Yeah. I mean, I got into this topic long ago with the Arab Spring, back when I was with Renesys. We were already mapping out the global internet, with a live picture of the internet in each country. So when Egypt went offline, we could pull that up immediately and understand what was up, what was down, and the timing, and then we worked with the global media outlets to tell that story from a technical standpoint.

And I've never stopped covering this, trying to find ways to contribute some technical analysis to our understanding of these things, but the shutdowns don't seem to stop. It's very hard to get a sovereign nation to stop doing something bad, whether it's shutting off the internet or other things. But this fall, we saw a couple more instances of what's been termed an internet curfew, where internet service is deliberately turned off for a period of time and then restored later. It's often in the evening, and it often focuses on mobile services and leaves fixed line up. We first saw this in Gabon, and then Myanmar did it last year after the military coup.

In each case, the rationale is that there's definitely some cost to shutting all your services off. There's a cost to the businesses of your country, among lots of other things. It's disruptive to the government itself.

And so, to mitigate or hedge that, they try to be a little more surgical about shutting things off. Blocking or censoring particular types of web services is one thing that's done quite a bit. Another approach, and Iran is a good example, went on for, I don't know, more than a couple of weeks,

where the three major mobile providers basically turned off their service starting in the early evening and going into the early morning. What they're trying to do is disrupt the protesters in the street: their ability to communicate with each other, to organize, and to report live. Eventually service comes back and they can report at that point, but in the moment they're unable to tell each other what's going on. At the same time, anybody using fixed line, people in offices and so on, for the most part keeps their service online, though some sources are blocked.

And so that's their way to try to lower the cost of a government-directed shutdown, and we may see more of this. Both Cuba and Iran did it almost simultaneously this past fall, and Myanmar before them. There aren't that many examples of it, but I don't know. I think these governments look at other countries and see what they do, and others may learn from that and say, well, that's a good way to not upset the business community and the government and still keep those pesky protesters from communicating.

These pesky protesters.

I mean, you say there aren't that many examples, right? But they are very top of mind. This notion of authoritarian governments shutting down protests by shutting off the flow of information: though there are very few examples we can point to, I think they're very profound in their impact and in the philosophical discussion about the future of the internet, which humanity uses to propagate information and to be connected.

Now, on a separate note, thinking about the global internet: BGP operates so much on trust. I can take in a full feed. I can advertise pretty much whatever I want, right? Beyond a password for an adjacency and some other things like that.

Do you see that as a problem? This is not nineteen ninety-four, when the internet was this cool idea that was never gonna take off; now it's the lifeblood of our society, at least in the United States and in many other nations. Is the security of the internet, maybe specifically BGP, an issue? We're seeing hospital systems with eighty thousand, a hundred thousand employees, like in the New York tri-state area, get hit with ransomware and shut down until they pay a ransom. Expand that to a global scale: shutting down a country unless it does what I say. That sort of thing?

Let's see. I guess on the topic of routing security:

For those of us who work in that space, I think about it a lot. I think the number one message to communicate to people who aren't in that space is that this is a very hard problem.

And it is a constellation of problems, not one thing. There's a whole bunch of sub-issues here. We're not gonna deploy a single technology in a day that solves this. You can only imagine: you've got routers all across the world, an internet of routers, and we've got to get them all to do something different. That is really hard to do. So the problem is very hard, and it's not solved.

But I guess I'm a glass-half-full kind of guy, and here's why. RPKI route origin validation, or ROV, is the technology getting pushed these days to try to limit certain types of these problems. You can imagine there's a spectrum of these issues. On one end are the bonehead errors, just stupid stuff. When I started doing this a little over ten years ago, there was lots happening on that end of the spectrum, and probably lots happening on the other end too, but we had made no progress on even the easy, dumb errors.

And I feel like these days, with the adoption of RPKI ROV... So what this is, basically: folks who own resources, IP addresses specifically, can go into their registry's portal. If you're in North America, you go to ARIN.

You set up what the proper origin is: who should be originating this address space in BGP land. That gets communicated to everybody in the system, and if they see another origin, they'll drop that route, as long as they're also participating in the system.
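The origin check Doug describes can be sketched in a few lines of Python. This is a minimal illustration, assuming the valid / invalid / not-found outcomes defined for RPKI origin validation (RFC 6811); the `VRP` and `validate_origin` names, the prefixes (from documentation ranges), and the ASNs (from the documentation range) are all hypothetical. Real routers receive this data from an RPKI validator rather than from a Python list.

```python
from dataclasses import dataclass
from enum import Enum
from ipaddress import IPv4Network, IPv6Network, ip_network
from typing import Iterable, Union

Network = Union[IPv4Network, IPv6Network]


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class VRP:
    """Hypothetical validated ROA payload: who may originate a prefix, and how specific."""
    prefix: Network
    max_length: int
    origin_as: int


def validate_origin(route: Network, origin_as: int, vrps: Iterable[VRP]) -> Validity:
    covered = False
    for vrp in vrps:
        # Skip VRPs of the other IP version or that don't cover the announced prefix.
        if route.version != vrp.prefix.version or not route.subnet_of(vrp.prefix):
            continue
        covered = True
        # A covering VRP matches only if the origin AS and the prefix length both fit.
        if origin_as == vrp.origin_as and route.prefixlen <= vrp.max_length:
            return Validity.VALID
    # Covered but unmatched means "another origin" (or too specific): invalid.
    return Validity.INVALID if covered else Validity.NOT_FOUND


def accept_route(route: Network, origin_as: int, vrps: Iterable[VRP]) -> bool:
    """A participating network drops invalid routes and keeps valid and not-found ones."""
    return validate_origin(route, origin_as, vrps) is not Validity.INVALID


vrps = [VRP(ip_network("192.0.2.0/24"), 24, 64500)]
print(validate_origin(ip_network("192.0.2.0/24"), 64500, vrps).value)     # valid
print(validate_origin(ip_network("192.0.2.0/24"), 64501, vrps).value)     # invalid
print(validate_origin(ip_network("198.51.100.0/24"), 64501, vrps).value)  # not-found
```

Dropping only invalid routes, rather than everything that isn't valid, reflects the common deployment choice: much address space still has no ROA, so a not-found route is usually accepted.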

It's not foolproof, but it does reduce the impact of origination leaks, when somebody accidentally burps out a full table. I mean, we haven't had one of those in a long time. And it's not just RPKI. I don't wanna get into all of these, but there are things like Peerlock, where the major providers just filter. They should not be receiving other... like, the networks in what we call the DFZ, the default-free zone, the top of the internet hierarchy.

It's a fully connected mesh; that's the way the internet has to operate. If any link gets disconnected, there's a partition and one part can't reach another. But at the top, it's a full mesh. So let's take, say, Lumen and NTT.

Lumen should not be receiving NTT's routes from one of its customers.

That's essentially the gist of it. You can come up with a handful of rules about which AS paths should be automatically rejected. That's widely deployed, and so I would say a lot of these major bonehead things don't happen that much anymore. I would also add that I was one of the people writing up a lot of sketchy internet routing incidents involving China Telecom.
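The Peerlock rule in the Lumen and NTT example reduces to a check on the AS path of routes learned from customers. Here is a minimal Python sketch, assuming a hypothetical protected set that contains only AS3356 (Lumen) and AS2914 (NTT). Real deployments express this as router filter policy and protect a longer list of large peers; the function name is illustrative.

```python
from typing import AbstractSet, Sequence

# Hypothetical protected set: AS3356 (Lumen) and AS2914 (NTT), the two networks in the
# example above. A real Peerlock list would cover every large peer in the default-free zone.
PROTECTED_PEERS: AbstractSet[int] = frozenset({3356, 2914})


def accept_from_customer(as_path: Sequence[int],
                         protected: AbstractSet[int] = PROTECTED_PEERS) -> bool:
    """Return False if a customer-learned route carries a protected peer's ASN.

    A customer should only announce its own routes and its own customers' routes,
    so a backbone peer in the AS path means the customer is leaking routes it
    learned from a provider or peer.
    """
    return not any(asn in protected for asn in as_path)


# Customer AS64500 announcing its own prefix and its downstream AS64510's: accepted.
print(accept_from_customer([64500, 64510]))         # True
# Customer AS64500 re-announcing a route it learned via NTT: rejected as a leak.
print(accept_from_customer([64500, 2914, 64511]))   # False
```

Unlike origin validation, this check needs no external database. It relies only on how traffic is expected to flow between backbone networks and their customers, which is why it works well against bonehead leaks.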

And this got a lot of circulation in national security circles. It led to the FCC revoking China Telecom's license to operate telecommunications services in the United States.

I used to see them involved in these leaks. I can't tell you the last one that China Telecom was a part of. And they had joined MANRS, which is the Internet Society's effort to try to... And there's not really any enforcement mechanism.

You join and you pledge to do a bunch of things, and then they're kind of vouching that you're actually doing them, and there's some attempt to check up on this.

It's mostly a pledge, and then, you know, you'd be shamed if you did something wrong.

You know, actually, it's a trust relationship.

Yeah. But I don't know. They joined it, and I haven't seen them in one of these in a while. So maybe we are making progress.

Well, I have to imagine that it behooves any country with any kind of nefarious intention not to do that, since they're interacting with other nations as a means of doing business and keeping the peace. So as much as you might have this nefarious intention... I know this is gonna be a strange analogy, Doug, but I just watched The Good, the Bad and the Ugly with Clint Eastwood, one of my favorite movies, right? I love westerns. My wife was with me and she's like, did they really act like that? And I'm like, not really. I did some quick googling on my phone while we were watching, and I was reading that a lot of this stuff is fantasy and hyperbole for the movies.

That's a good story, though. Yeah.

It's a great story. I love it. I love the spaghetti westerns very, very much.

And the fact is that folks went out to the western US, cowboys and families and ranchers alike, to make a better life for themselves. You had a bad element,

but it wasn't the Wild West that we see in Clint Eastwood movies and other movies like that, John Wayne movies. It was much, much calmer, with those exceptions, of course. And I have to imagine that's true at the nation-state level, at the provider level, and on a global scale with regard to the internet.

We're all trying to do the best we can for ourselves, out of self-interest. I get that, and that's okay.

But it's in my self-interest as a country to make sure I'm not screwing up my relationships with other countries, because that's a great source of income and, again, of keeping the peace.

It's a good point.

It is a property of the internet, but I would also add, and I'll shift gears from being a Pollyanna here, that there are a lot of assumptions on the internet. For example, we've exhausted v6. All the v6 IP addresses are gonna be used; they've already been given out. There's no more.

You mean v4?

Sorry, v4. Yeah, right. We haven't run out of v6 yet. Sorry, v4.

We have a few more v6.

Yeah. Excuse me, I misspoke.

But, you know, what if we got to a point where, let's say, China says, you know what... Not that long ago, before the DoD started announcing all this address space (it was a big story last year), they were sitting on a ton of it that nobody was using, like four billion addresses or something, or maybe...

Yeah, there was some large amount, a few hundred million. And Chinese companies would use US DoD address space internally, and there's nothing to stop them as long as it's not out on the internet. Occasionally, if you ran a traceroute, it'd be kinda funny, because you'd see China, DoD, China, DoD, China...

Yeah. But you can get to a point where countries no longer agree on the norms of respecting these boundaries, because there's no way to enforce any of this. Right now, we're all still treating this as a common good.

Even though some countries may be at war with each other, they still respect these boundaries.

But I don't know. I think it's one of those lines that could get crossed at some point. For example, in Ukraine, the vice prime minister called for an IT army: if you wanted to support Ukraine, you'd attack things in Russia, and people participated. That still happens. If they really wanted to, they could just start announcing all the Russian address space, and Russia could start announcing all the Ukrainian address space, and it would be a complete mess.

And that's a line.

I think we had just assumed that would never get crossed, but it could.

And then we could...

It's a hypothetical, isn't it?

It really could happen. There is a possibility of this stuff breaking apart.

There's no technical reason that...

There's no technical reason.

Yeah. Those things could happen. So I'll just say one more thing, Phil. We talked about the reasons to be optimistic about routing security, but the flip side is that a lot of things aren't solved, and there have been a few cryptocurrency attacks that were very profitable for the folks who pulled them off.

And these are...

Interesting.

Yeah. These are the sophisticated attacks. I mentioned the spectrum: one end is bonehead, the other end is sophisticated. We call it a determined adversary.

That's the phraseology.

Well, doesn't that sound good?

A determined adversary: if you are really determined to defeat RPKI ROV, you can do it. There are ways to manipulate the system that ROV doesn't address and that we just haven't solved yet, and there are people doing this now. I think as we clean up the bonehead end, experts can keep moving the needle up toward the determined adversary, and then we can start to tighten things down and make those attacks costlier, if we can't prevent them completely; maybe some of these scenarios can be prevented. But there's a lot happening in that space as well: the determined-adversary BGP hijacks.

Yeah. And this entire idea of outages and major disruptions on a global scale is certainly more complex than the few things we've covered today, right? The themes I picked out were natural disasters, human error, and nation states being authoritarian in their control of information. It's actually much more than that.

And then we touched on BGP security as well. So in any case, Doug, we're at time. Great conversation. So much we can unpack.

I think we can turn this into about twenty different podcasts, maybe a series on international.

In fact, I love talking about this stuff. So, yeah.

Yeah. No. I appreciate that.

Happy to do it.

So as we wrap up here, I'd like to give you an opportunity: if folks don't know already, how can they reach out to you to ask a question or make a comment?

I am still on Twitter.

I haven't departed.

I still think it's probably going to be around for a while, and that's probably the easiest place; I post some things there. I'm on LinkedIn as well. Feel free to send me an invite. If you're in the business, I usually accept it, and then I try to start a conversation about what it is you're interested in, to see if there's any commonality in our interests.

Right.

That's a great way. It's just @DougMadory, d-o-u-g-m-a-d-o-r-y.

Great. And I believe you blog pretty frequently on the Kentik blog.

I try to. Yeah.

Right. Okay, so make sure to check that out. You can find me on Twitter at network_phil. I'm still very active there.

I'm also on LinkedIn; you can search for me there. My blog is networkphil.com. I haven't been posting as frequently recently, but still check it out. So until next time, thanks very much for listening. Bye-bye.

About Telemetry Now

Do you dread forgetting to use the “add” command on a trunk port? Do you grit your teeth when the coffee maker isn't working, and everyone says, “It’s the network’s fault?” Do you like to blame DNS for everything because you know deep down, in the bottom of your heart, it probably is DNS? Well, you're in the right place! Telemetry Now is the podcast for you! Tune in and let the packets wash over you as host Phil Gervasi and his expert guests talk networking, network engineering and related careers, emerging technologies, and more.