Kentik - Network Observability

Resilience and Redundancy in Networking

Stephen Condon, Product Marketing Principal

Summary

Predictability in network flows is the ability to consistently deliver traffic from one point to another, even in the face of disruptions. Yet, establishing predictability has its share of challenges. Learn all about resilience in networking and how it relates to redundancy.


Let’s start with some definitions to set the stage.

Core Concepts

What is resilience in networking?

Resilience in networking is the ability of a network to withstand and quickly recover from failures or changes in its environment. This includes the ability to:

  • Dynamically adjust to changes in network topology
  • Detect and respond to outages
  • Route around faults to maintain connectivity and service levels

Resilience is important to ensure that networks are reliable and provide a consistent level of performance.

What is redundancy in networking?

Redundancy in networking is the use of multiple interchangeable components to increase the reliability of a system. These components can include hardware, software, or services. When one component fails, another can automatically take over its role, keeping the network intact.

Redundancy ensures that networks can maintain service levels and remain reliable in the face of outages or disruptions. Redundancy can also be used to increase performance and handle temporary spikes.
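The example below shows how redundant components can share load and absorb spikes. It is a minimal round-robin sketch. The backend names and the manual mark_down/mark_up health flags are hypothetical. Production load balancers typically learn health from active probes and use more sophisticated scheduling.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Spread requests across redundant backends, skipping unhealthy ones.

    Hypothetical sketch: backends are plain strings and health is set by hand.
    """

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def next_backend(self):
        # Walk the ring at most once, skipping members marked unhealthy.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["srv-a", "srv-b", "srv-c"])
print([lb.next_backend() for _ in range(3)])  # ['srv-a', 'srv-b', 'srv-c']
lb.mark_down("srv-b")
print([lb.next_backend() for _ in range(4)])  # ['srv-a', 'srv-c', 'srv-a', 'srv-c']
```

When one backend is marked down, the remaining copies pick up its share of requests.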

Redundancy is a cornerstone of resilience. Eventually, every component will fail. Hardware components have a finite lifetime. Even software components will have bugs, faulty designs, or misconfigurations. Redundancy allows you to establish resilient and reliable networks from fallible and unreliable components.
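A short calculation shows why redundancy produces reliable systems from unreliable parts. Suppose each of n copies is available a fraction a of the time and failures are independent. The system is then down only when every copy is down at once, so its availability is 1 - (1 - a)^n. The independence assumption and the 99% figure below are illustrative, not measured values.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` redundant components is up.

    Assumes independent failures, which real components (shared power,
    shared software bugs, shared fiber routes) often violate.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The system is unavailable only if every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


for n in (1, 2, 3):
    print(n, round(parallel_availability(0.99, n), 6))
# 1 0.99
# 2 0.9999
# 3 0.999999
```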

Now that we’ve laid out these core concepts, let’s take a closer look at how to build redundancy in a network.

How to build redundancy in networking

In IP networks, redundancy is accomplished by providing multiple paths for traffic to travel through. If one path is disrupted or fails, traffic can be rerouted through another path. This allows the network to remain operational even if some parts are unresponsive or slow.

Common redundancy mechanisms include node and link redundancy, redundant power supplies, and load balancers. However, simply installing redundant components is not enough. The network must quickly detect when packets fail to traverse a specific path (or take too long to do so). It must then fail over to an alternative path to maintain connectivity and service levels.
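The detect-then-fail-over step can be sketched as follows. The idea loosely follows Bidirectional Forwarding Detection (BFD), which declares a path down after a configured number of consecutive missed probes. The path names, the multiplier of 3, and the FailoverSelector class are hypothetical and do not model any particular device.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    missed: int = 0  # consecutive probes without a reply
    up: bool = True


@dataclass
class FailoverSelector:
    """Prefer paths in list order; declare a path down after
    `detect_multiplier` consecutive missed probes."""
    paths: list
    detect_multiplier: int = 3

    def record_probe(self, name: str, replied: bool) -> None:
        path = next(p for p in self.paths if p.name == name)
        if replied:
            path.missed = 0
            path.up = True
        else:
            path.missed += 1
            if path.missed >= self.detect_multiplier:
                path.up = False

    def active_path(self) -> str:
        for path in self.paths:  # first usable path in preference order
            if path.up:
                return path.name
        raise RuntimeError("all paths are down")


sel = FailoverSelector([Path("primary"), Path("backup")])
sel.record_probe("primary", replied=False)
sel.record_probe("primary", replied=False)
print(sel.active_path())  # primary (2 misses, below the multiplier)
sel.record_probe("primary", replied=False)
print(sel.active_path())  # backup
```

This sketch reverts to the primary path as soon as one probe succeeds. Real implementations usually add hold-down timers so a flapping path does not cause repeated failovers.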

Although redundancy in networks is important, keep in mind that it also has some downsides. First, redundancy increases the complexity of the network. Second, redundancy delivers the expected benefits only if the network is configured to use it correctly. Finally, redundancy adds cost, and in some cases the return on investment might not justify it.

While redundancy is a significant contributor to network resilience, other mechanisms, protocols, and methods can also contribute to overall network resilience.

Additional network resilience mechanisms

Successfully routing a packet over the internet from its source to its destination is not trivial. Many network protocols have been designed to handle different aspects of this process. Let’s highlight some of the primary protocols.

Built-in data verification with checksums

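Most network packets carry a checksum. IPv4 headers include a header checksum, and TCP and UDP carry their own checksum covering the payload (the UDP checksum is optional over IPv4). If the protected data (or the checksum itself) is corrupted along the way, the checksum computed by the receiver will not match, and the packet is discarded. A reliable transport such as TCP then retransmits the lost data. This verification mechanism keeps corrupted data from being silently accepted.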
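IPv4, TCP, and UDP all use the 16-bit one's-complement "Internet checksum" described in RFC 1071. The minimal sketch below checksums an arbitrary byte string. It ignores protocol details such as the TCP/UDP pseudo-header, and the verify helper is a hypothetical name for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def verify(payload: bytes, received_checksum: int) -> bool:
    """Receiver side: recompute and compare; a mismatch means corruption."""
    return internet_checksum(payload) == received_checksum


sent = b"hello world!"
checksum = internet_checksum(sent)
print(hex(checksum))                      # 0x6e10
print(verify(sent, checksum))             # True
print(verify(b"iello world!", checksum))  # False: one bit flipped ('h' -> 'i')
```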

Message integrity and guaranteed delivery with TCP/IP

Because IP does not require acknowledgments from endpoints, it does not ensure delivery; therefore, it is considered an unreliable protocol.

On the other hand, Transmission Control Protocol (TCP) provides a connection-based, reliable byte stream. TCP runs on top of IP (hence the name TCP/IP). It ensures that bytes sent from a source to a destination are received intact and in the order they were sent. Under the hood, TCP handles acknowledgments and retransmissions, splits the byte stream into segments and reassembles it, and much more. TCP/IP is one of the fundamental networking protocol stacks and the basis for common protocols like HTTP/1.1 and HTTP/2.
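The receiver side of in-order delivery can be illustrated with a toy model. It is not a real TCP implementation: there are no windows, timers, or overlapping segments. Segments are keyed by the sequence number of their first byte, and out-of-order data is buffered until the gap before it is filled. The returned value plays the role of TCP's cumulative acknowledgment number. The class name is hypothetical.

```python
class InOrderReceiver:
    """Toy model of TCP-style in-order delivery with cumulative ACKs."""

    def __init__(self, initial_seq: int = 0):
        self.next_seq = initial_seq   # next byte we expect
        self.buffer = {}              # seq -> payload held until the gap closes
        self.delivered = bytearray()  # bytes handed to the application, in order

    def receive(self, seq: int, payload: bytes) -> int:
        if seq >= self.next_seq:
            self.buffer.setdefault(seq, payload)  # ignore duplicate segments
        # Deliver every contiguous segment that now starts at next_seq.
        while self.next_seq in self.buffer:
            data = self.buffer.pop(self.next_seq)
            self.delivered += data
            self.next_seq += len(data)
        return self.next_seq  # cumulative ACK: every byte before this arrived


r = InOrderReceiver()
print(r.receive(5, b"world"))  # 0: bytes 0-4 missing, segment buffered
print(r.receive(0, b"hello"))  # 10: gap filled, both segments delivered
print(bytes(r.delivered))      # b'helloworld'
```

Because the acknowledgment does not advance while a gap remains, a TCP sender eventually retransmits the missing segment.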

IP address abstraction and failover capabilities with DNS

Domain Name System (DNS) is a system for mapping domain names to IP addresses. By using DNS, users can access a website or other service using a human-readable name without needing to remember the IP address. DNS also supports redundancy and failover by allowing multiple IP addresses to be associated with a single domain name. If one of those addresses stops responding, clients can try another. DNS-based traffic management services can also stop handing out addresses that fail health checks. Either way, traffic is routed elsewhere to maintain service levels.
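The client side of DNS-based failover can be sketched as follows: resolve every address for a name, then try them in turn until one accepts a connection. Python's socket.create_connection already does this internally when given a hostname. The explicit version below makes the loop visible, and it accepts an injectable connect function so it can be exercised without a network. The helper names and the 3-second timeout are assumptions.

```python
import socket


def resolve_all(hostname: str, port: int) -> list:
    """Distinct IP addresses the OS resolver (and thus DNS) returns for a name."""
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, port, type=socket.SOCK_STREAM
    ):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def default_connect(address: str, port: int):
    return socket.create_connection((address, port), timeout=3)


def connect_with_failover(addresses, port, connect=default_connect):
    """Try each address in order; return the first successful connection."""
    errors = []
    for address in addresses:
        try:
            return connect(address, port)
        except OSError as exc:  # refused, unreachable, timed out, ...
            errors.append((address, exc))
    raise ConnectionError(f"all addresses failed: {errors}")
```

A real call would look like connect_with_failover(resolve_all("example.com", 443), 443). If the first address returned is down, the loop simply moves on to the next one.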

Maintaining network connectivity with BGP

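Border Gateway Protocol (BGP) is the routing protocol that exchanges reachability information between the independently operated networks (autonomous systems) that make up the internet. It maintains connectivity by letting each router learn multiple paths to a destination and select a best path according to its policies and the routes’ attributes. BGP also contributes to network resilience. When a route is withdrawn or a BGP session goes down, routers select an alternative path, if one exists, and propagate the change to their neighbors. Traffic can then be rerouted around outages or disruptions, helping to maintain service levels. This convergence is not instantaneous and can take anywhere from seconds to minutes.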
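The toy model below shows path selection and rerouting after a withdrawal. It implements only the first few steps of the BGP decision process: highest LOCAL_PREF, then shortest AS_PATH, then lowest MED. Real BGP has more tie-breakers, and by default it compares MED only between routes from the same neighboring AS. The default LOCAL_PREF of 100 mirrors a common vendor default and is an assumption here. The AS numbers come from the documentation range, and the peer names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    local_pref: int = 100
    as_path: tuple = ()
    med: int = 0


def best_path(routes):
    """Simplified selection: highest LOCAL_PREF, shortest AS_PATH, lowest MED."""
    if not routes:
        return None
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))


rib = {
    Route("203.0.113.0/24", "peer-a", as_path=(64500, 64510)),
    Route("203.0.113.0/24", "peer-b", as_path=(64501, 64502, 64510)),
}
print(best_path(rib).next_hop)  # peer-a (shorter AS_PATH)

# peer-a's session drops, so its routes are withdrawn and selection reruns.
rib = {r for r in rib if r.next_hop != "peer-a"}
print(best_path(rib).next_hop)  # peer-b
```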

Now that we have looked at several network resilience mechanisms, let’s examine how predictability in network flows is related to network resilience.

What is predictability in network flows?

Predictability in network flows is the ability to consistently deliver traffic from one point to another—even in the face of disruptions. This is another facet of network resilience. Let’s consider the most important concepts.

How network flows enter and leave a network

Network flows enter and leave a network through routers, the devices that connect different networks together, via their network interfaces. Routers use routing protocols such as Border Gateway Protocol (BGP) to determine the best path for traffic and then forward the traffic toward its destination along that path. Routers can also provide security features such as access control lists and, on some devices, firewall functions, which help protect the network from malicious traffic. Lower-level mechanisms are involved in each forwarding step as well. For example, the Address Resolution Protocol (ARP) maps the next hop’s IP address to its Media Access Control (MAC) address, which is then used to deliver the frame over the local link.
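After routing protocols populate a router's table, each packet is forwarded by longest-prefix match: among all routes that contain the destination address, the most specific one wins. The sketch below uses Python's standard ipaddress module. The prefixes and interface names are hypothetical, and real routers perform this lookup in specialized data structures or hardware rather than a linear scan.

```python
import ipaddress

# Hypothetical forwarding table: prefix -> outgoing interface.
FIB = {
    ipaddress.ip_network("0.0.0.0/0"): "uplink-isp1",  # default route
    ipaddress.ip_network("198.51.100.0/24"): "eth1",
    ipaddress.ip_network("198.51.100.128/25"): "eth2",
}


def lookup(destination: str) -> str:
    """Return the interface of the most specific prefix containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FIB if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    best = max(matches, key=lambda net: net.prefixlen)
    return FIB[best]


print(lookup("198.51.100.200"))  # eth2 (matches the /25)
print(lookup("198.51.100.5"))    # eth1 (matches the /24 only)
print(lookup("192.0.2.1"))       # uplink-isp1 (default route)
```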

Ingress and egress stability when a fault occurs on the path between them

“Ingress” and “egress” are the points at which traffic enters and exits a network, respectively. When a fault (such as a link or node failure) occurs on the path between the ingress and egress points, the ingress and egress points themselves are not directly affected. The protocols and mechanisms we have discussed should reroute traffic around the fault, so the flow keeps entering and leaving the network at the same points.
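The toy topology below makes the idea concrete. It recomputes a fewest-hops path with breadth-first search after a link between ingress and egress fails. The node names are hypothetical. Real interior routing protocols such as OSPF and IS-IS compute cost-based shortest paths rather than hop counts, but the effect is the same: the middle of the path changes while both endpoints stay put.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search for a fewest-hops path over undirected links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(graph.get(node, ())):  # sorted for determinism
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # no surviving path


links = {("ingress", "r1"), ("r1", "egress"),
         ("ingress", "r2"), ("r2", "r3"), ("r3", "egress")}
print(shortest_path(links, "ingress", "egress"))  # ['ingress', 'r1', 'egress']

links -= {("r1", "egress")}  # the r1-egress link fails
print(shortest_path(links, "ingress", "egress"))  # ['ingress', 'r2', 'r3', 'egress']
```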

The challenge of maintaining predictable network flows

Predictable network flows are a desirable property of networks, but they are challenging to accomplish. The following are some concrete scenarios that can challenge a network’s ability to maintain predictable network flows.

  • Traffic spikes can cause bottlenecks that lead to packet loss, retransmission, timeouts, and additional overall load on the network.
  • Changes in network topology, such as adding or removing devices, can cause unpredictable changes to a network’s performance and even inhibit the reachability of some endpoints.
  • Hardware failures can be as minimal and local as a router or switch going down or as severe as an entire undersea cable being cut.
  • Software bugs and device misconfigurations can cause disruption—from lost packets to poor performance—and might be very challenging to detect in large, dynamic networks.
  • Security threats, such as distributed denial-of-service (DDoS) attacks, are a constant danger, because networks are often the entry point for an attack.
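The first scenario, a traffic spike creating a bottleneck, can be illustrated with a toy tail-drop queue on a router's output link. Each tick, arriving packets fill the buffer up to its limit and anything beyond it is dropped. The link then transmits up to its capacity. The rates and buffer size are arbitrary illustrative numbers, and real routers often use more sophisticated queue management than plain tail drop.

```python
def simulate_tail_drop(arrivals, service_rate, queue_limit):
    """Per-tick toy model of an output queue with tail drop.

    arrivals: packets arriving in each tick
    service_rate: packets the link can transmit per tick
    queue_limit: buffer size in packets; arrivals beyond it are dropped
    Returns (sent, dropped, still_queued).
    """
    queued = dropped = sent = 0
    for incoming in arrivals:
        accepted = min(incoming, queue_limit - queued)
        dropped += incoming - accepted
        queued += accepted
        transmitted = min(queued, service_rate)
        sent += transmitted
        queued -= transmitted
    return sent, dropped, queued


print(simulate_tail_drop([8] * 10, service_rate=10, queue_limit=20))
# (80, 0, 0): steady load below capacity, no loss
print(simulate_tail_drop([8, 8, 40, 40, 8, 8], service_rate=10, queue_limit=20))
# (56, 50, 6): a two-tick spike overflows the buffer and 50 packets are lost
```

In a real network, those drops would trigger TCP retransmissions, which add further load, as the first bullet describes.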

Conclusion

In this post, we’ve covered the basics of resilience and redundancy in networking and surveyed mechanisms that support network resilience and the predictability of network flows.

Redundancy is essential for creating resilience in IP networks by providing multiple paths for traffic to travel through. Additional mechanisms for network resilience include data verification, TCP/IP, DNS, and BGP.

Predictability in network flows is the ability to consistently deliver traffic from one point to another, even in the face of disruptions. Yet, establishing predictability in network flows has its share of challenges.

To operate a reliable, performant, secure, and cost-effective network, it is crucial to understand network resilience and how to improve it using redundancy and other mechanisms.
