In this post, we discuss the crucial differences between resilience and redundancy in networking. Learn how to optimize your network for seamless performance.
Predictability in network flows is the ability to consistently deliver traffic from one point to another, even in the face of disruptions. Yet, establishing predictability has its share of challenges. Learn all about resilience in networking and how it relates to redundancy.
Let’s start with some definitions to set the stage.
Core Concepts for Resilience vs Redundancy
What is resilience in networking?
Resilience in networking is the ability of a network to withstand and quickly recover from failures or changes in its environment. This includes the ability to:
- Dynamically adjust to changes in network topology
- Detect and respond to outages
- Route around faults in order to maintain connectivity and service levels
Why is resilience important?
Resilience in networking is more than just a measure to ensure smooth operations; it’s an assurance of continuity, reliability, and trust in the network’s ability to perform its function, even in challenging scenarios. Let’s explore why resilience holds such significance:
Fault Tolerance: This is a primary goal of resilience. In the world of networking, faults are inevitable. They can arise from various factors, including hardware malfunctions, software glitches, or unexpected disruptions. A resilient network is designed to tolerate these faults, ensuring that the overall system remains functional even when certain parts of the network malfunction.
High Availability: Organizations rely on networks to perform crucial operations, from facilitating communications to carrying out transactions. A network’s downtime can mean significant losses for businesses. High availability ensures the network remains operational for the maximum possible time, minimizing outages and interruptions. Resilient networks achieve this by dynamically adjusting to changes and rerouting traffic as needed.
Disaster Recovery: Disasters can be natural, like earthquakes, floods, or fires, or they can stem from human or technical causes, such as cyberattacks or power outages. A resilient network incorporates disaster recovery protocols that enable quick restoration of network functionality after a catastrophic event. This could involve switching to backup systems, rerouting traffic to alternative paths, or activating standby resources.
Business Continuity: Today’s businesses are intrinsically linked with their networks. Whether it’s an e-commerce platform relying on its online storefront or a global corporation depending on its intranet for daily operations, any disruption can have cascading effects on the bottom line and brand reputation. Resilience ensures business continuity by prioritizing critical network functions, mitigating risks, and ensuring that the network can adapt to changing conditions.
Trust and Reputation: Customers, clients, and stakeholders trust businesses that offer consistent and reliable services. Network disruptions can lead to dissatisfied customers, missed opportunities, and tarnished reputations. By emphasizing resilience, companies demonstrate a commitment to delivering uninterrupted services, fostering trust and enhancing their reputation in the market.
What is redundancy in networking?
Redundancy in networking is the use of multiple interchangeable components to increase the reliability of a system. These components can include hardware, software, or services. When one of the components fails, it can automatically be replaced with another component, keeping the network intact.
Why is network redundancy important?
Network redundancy ensures that networks can maintain service levels and remain reliable during outages or disruptions. Redundancy can also be used to increase performance and handle temporary spikes.
Redundancy is a cornerstone of resilience. Eventually, every component will fail. Hardware components have a finite lifetime. Even software components will have bugs, faulty designs, or misconfigurations. Redundancy allows you to establish resilient and reliable networks from fallible and unreliable components.
Now that we’ve laid out these core concepts, let’s take a closer look at how to build redundancy in a network.
How to build redundancy in networking
In IP networks, redundancy is accomplished by providing multiple paths for traffic to travel through. If one path is disrupted or fails, traffic can be rerouted through another path. This allows the network to remain operational even if some parts are unresponsive or slow.
Common network redundancy mechanisms include node and link redundancy, redundant power supplies, and load balancers. However, simply installing redundant components is not enough. The network must be able to quickly detect when packets can no longer be delivered along a specific path (or take too long to arrive) and fail over to an alternative path to maintain connectivity and service levels.
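To make the idea of failing over to an alternative path concrete, here is a minimal sketch. It assumes a hypothetical five-node topology (`LINKS`) and uses a plain breadth-first search as a stand-in for a real routing protocol; the node names and the `shortest_path` helper are illustrative only.

```python
from collections import deque

def shortest_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a fewest-hop path, skipping failed links."""
    # Build an undirected adjacency map from the links that are still up.
    neighbors = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # no path left: the redundancy has been exhausted

# Hypothetical topology with two disjoint routes from A to D.
LINKS = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]

primary = shortest_path(LINKS, "A", "D")                       # ['A', 'B', 'D']
backup = shortest_path(LINKS, "A", "D", failed={("B", "D")})   # ['A', 'C', 'E', 'D']
```

The second route only helps because the search is rerun once the failure is known. Without failure detection, the redundant link would sit unused while traffic kept being sent down the broken path.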
What are the different types of network redundancies?
To create a network that can adapt, adjust, and maintain its functionality despite unforeseen circumstances, various forms of network redundancies have been devised. Here are some of the most prominent types of network redundancies that NetOps professionals should be familiar with:
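Multiple Spanning Trees (MST), standardized as the Multiple Spanning Tree Protocol (MSTP), is an extension of the basic Spanning Tree Protocol (STP). STP keeps a switched network loop-free by blocking redundant links and re-enabling them when an active link fails. MST is designed to alleviate some of the limitations of STP, especially in larger or more complex networks. It allows multiple VLANs to be mapped to a smaller number of spanning-tree instances, reducing the load on network devices and optimizing network resources.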
Multi-Protocol Label Switching (MPLS) provides a mechanism to route traffic based on simple label switching instead of traditional IP routing. While MPLS isn’t strictly a “redundancy” mechanism, it can be used with other techniques to create redundant network paths. In essence, MPLS optimizes traffic flow and can be part of a redundant network design.
Diverse Trunking is a straightforward redundancy approach that ensures continuous communication by establishing multiple communication pathways. It’s a classic form of redundancy and is very effective for ensuring network resilience.
Virtual Router Redundancy Protocol (VRRP) is a high-availability protocol designed to eliminate the single point of failure in a static default routed environment. By creating a virtual router (an abstraction over the physical routers), VRRP ensures that if the active router fails, the backup router takes over the IP address and continues the operation.
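The takeover behavior can be sketched as a small election function. This is a simplified model, not a VRRP implementation: it assumes the usual rule that the highest-priority live router becomes the active one, with ties going to the higher IP address, and it ignores advertisements, timers, and preemption settings. The `Router` class and `elect_master` function are hypothetical names.

```python
from dataclasses import dataclass, replace
from ipaddress import IPv4Address

@dataclass(frozen=True)
class Router:
    name: str
    priority: int          # higher values are preferred
    address: IPv4Address   # the router's own (non-virtual) address
    alive: bool = True

def elect_master(routers):
    """Return the router that should own the virtual IP, or None if all are down."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    # Highest priority wins; a tie goes to the higher IP address.
    return max(candidates, key=lambda r: (r.priority, r.address))

r1 = Router("r1", 200, IPv4Address("10.0.0.2"))
r2 = Router("r2", 100, IPv4Address("10.0.0.3"))

print(elect_master([r1, r2]).name)                        # r1
print(elect_master([replace(r1, alive=False), r2]).name)  # r2
```

Hosts keep using the same virtual gateway address in both cases, which is why the failover is transparent to them.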
BGP Multipath: BGP is a path-vector protocol that traditionally selects a single best path based on a list of attributes. However, there are scenarios where multiple paths are available and equally desirable because they share the same attributes (such as AS_PATH length and MED). Instead of selecting one and discarding the rest, BGP multipath allows multiple paths to be used to distribute outgoing traffic. This offers two primary benefits:
- Load balancing: By leveraging multiple equally preferable paths, networks can distribute traffic over them, using available bandwidth efficiently and preventing a single link from becoming a bottleneck.
- Redundancy: If one of the paths experiences an issue or becomes unavailable, traffic can continue to flow over the other paths, keeping network services available.
When implementing BGP multipath, it’s crucial to ensure proper configurations and monitor the distribution of traffic to avoid unintentional traffic engineering issues or uneven load distribution.
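One common way to spread traffic over equally preferred paths is to hash each flow onto a path, so that all packets of a flow follow the same route. The sketch below illustrates that idea under stated assumptions: `pick_path` is a hypothetical helper, SHA-256 stands in for whatever hash a real router uses, and the path names are made up.

```python
import hashlib

def pick_path(flow, paths):
    """Map a flow 5-tuple to one of the equally preferred paths."""
    if not paths:
        raise ValueError("no usable paths")
    # Hash the flow identifier so every packet of the flow picks the same path.
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(paths)
    return paths[index]

# src address, dst address, protocol number, src port, dst port
flow = ("10.0.0.5", "203.0.113.9", 6, 49152, 443)
paths = ["via-ISP-A", "via-ISP-B", "via-ISP-C"]

chosen = pick_path(flow, paths)

# If the chosen path fails, the flow is rehashed over the survivors.
survivors = [p for p in paths if p != chosen]
rerouted = pick_path(flow, survivors)
```

With simple modulo hashing like this, removing a path can also move flows that were on healthy paths, and a few large flows can still load one link more than the others. This is one reason traffic distribution needs monitoring.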
While these techniques and protocols enhance redundancy and resilience, they should be carefully designed and implemented. Misconfiguration or lack of understanding can lead to network anomalies or even failures. Furthermore, the choice of which redundancies to implement should be based on the specific requirements, constraints, and goals of the network in question.
The advantages of redundancy in networking
Incorporating redundancy within a network design is not just about having backup systems—it’s about ensuring robustness, adaptability, and an enhanced user experience. Here are some key advantages:
Increased Network Reliability: Redundancy is pivotal in ensuring network availability. By having backup systems or pathways, the risk of a total network failure is significantly reduced. This consistent functionality fosters trust among users and stakeholders.
Improved Network Uptime: Downtime can be financially and reputationally costly. Network redundancy ensures that even when certain components fail, the network as a whole remains operational, leading to enhanced uptime and ensuring business continuity.
Enhanced Performance and Load Balancing: Redundant systems often facilitate load balancing, distributing data traffic across multiple paths or servers. This not only optimizes the use of network resources but also ensures a smoother, lag-free user experience.
Resilience Against Cyber Attacks: Redundancy can limit the impact of some cyber threats, particularly attacks that target availability. If one part of the network is compromised or overwhelmed, traffic can be rerouted through an unaffected path, minimizing potential damage and keeping services running. Redundancy complements dedicated security controls rather than replacing them.
Scalability and Future-Proofing: As businesses grow and data traffic increases, the network must scale accordingly. Redundant designs tend to scale well, because new components can be added alongside existing ones without major overhauls. Planning redundancy with future demand and potential technological advancements in mind also helps prepare the network for upcoming challenges.
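The reliability and uptime gains described above can be estimated with simple arithmetic. The sketch below assumes that redundant components fail independently and that any single working component is enough to keep the service up. Both are simplifications, and the 99% figure is only an example.

```python
HOURS_PER_YEAR = 24 * 365

def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` independent components when any one suffices."""
    if copies < 1:
        raise ValueError("need at least one component")
    # The system is down only when every copy is down at the same time.
    return 1 - (1 - component_availability) ** copies

def downtime_hours_per_year(availability: float) -> float:
    return (1 - availability) * HOURS_PER_YEAR

single = parallel_availability(0.99, 1)   # 0.99, about 87.6 hours of downtime per year
double = parallel_availability(0.99, 2)   # about 0.9999, under 1 hour per year
```

In practice, components often share failure causes (the same power feed, the same software bug), so the real gain is usually smaller than this independent-failure estimate suggests.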
The disadvantages of redundancy in networking
Although redundancy in networks is important, it also has some downsides. First, redundancy increases the complexity of the network. Second, redundancy only provides the expected benefits when the network is configured to use it correctly. Finally, redundancy adds cost, and in some cases the return on investment might not justify it.
While redundancy significantly contributes to network resilience, other mechanisms, protocols, and methods can also contribute to overall network resilience.
Additional network resilience mechanisms
Successfully routing a packet over the internet from its source to its destination is not trivial. Many network protocols have been designed to handle different aspects of this process. Let’s highlight some of the primary protocols.
Built-in data verification with checksums
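Network packets are sent with checksums (for example, in the IPv4 header and in TCP segments). If the data (or the checksum itself) becomes corrupted along the way, the checksum computed by the receiver will not match, and the packet is discarded. Whether the data is then resent depends on the protocol in use: a reliable transport such as TCP retransmits it. This verification mechanism guards against acting on corrupted data.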
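As a rough illustration of the verification step, the following sketch computes a 16-bit one's-complement checksum in the style used by IP, TCP, and UDP (described in RFC 1071). The function names and the sample payload are illustrative. Real stacks compute this over specific header and payload fields, not over an arbitrary byte string.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back into the low 16 bits
    return ~total & 0xFFFF

def verify(data: bytes, checksum: int) -> bool:
    """Return True when the received data still matches the sender's checksum."""
    return internet_checksum(data) == checksum

payload = b"hello, network"
sent_checksum = internet_checksum(payload)

print(verify(payload, sent_checksum))            # True
print(verify(b"jello, network", sent_checksum))  # False: one byte was corrupted
```

A checksum like this detects many accidental errors but not all of them (swapping two 16-bit words leaves the sum unchanged, for instance), and it offers no protection against deliberate tampering.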
Message integrity and guaranteed delivery with TCP/IP
Because IP does not require acknowledgments from endpoints, it does not ensure delivery; therefore, it is considered an unreliable protocol.
On the other hand, Transmission Control Protocol (TCP) provides a connection-based, reliable byte stream. TCP/IP refers to TCP running on top of IP, and it ensures that bytes sent from a source to a destination will be received in the order sent. Under the hood, TCP handles acknowledgments and retransmissions, splits the byte stream into segments and reassembles it at the receiver, and much more. TCP/IP is fundamental to the internet and is the basis for common protocols like HTTP.
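The in-order delivery guarantee can be illustrated with a toy receiver. This is a simplified sketch, not TCP itself: it assumes segments never partially overlap, uses byte offsets starting at zero as sequence numbers, and leaves out windows, timers, and connection setup. The `Receiver` class is a hypothetical name.

```python
class Receiver:
    """Toy in-order reassembly with cumulative acknowledgments."""

    def __init__(self):
        self.next_seq = 0      # next byte offset we expect
        self.pending = {}      # out-of-order segments keyed by byte offset
        self.delivered = b""   # bytes handed to the application, in order

    def receive(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative ACK number."""
        if seq >= self.next_seq:
            self.pending.setdefault(seq, payload)   # ignore old duplicates
        # Deliver every buffered segment that is now contiguous.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq

rx = Receiver()
print(rx.receive(0, b"he"))   # 2
print(rx.receive(4, b"o!"))   # 2  (gap at offset 2, so the ACK does not advance)
print(rx.receive(2, b"ll"))   # 6  (gap filled, buffered data is released)
print(rx.delivered)           # b'hello!'
```

The repeated ACK for offset 2 is what tells a sender that something is missing. In real TCP, the sender retransmits after repeated duplicate ACKs or a timeout.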
IP address abstraction and network failover capabilities with DNS
Domain Name System (DNS) is a system for mapping domain names to IP addresses. By using DNS, users can access a website or other service using a human-readable name without needing to remember the IP address. DNS also supports redundancy and failover by allowing multiple IP addresses to be associated with a single domain name. If one IP address stops responding, clients can try another address, or operators can update the records to point to healthy servers, maintaining service levels. Because resolvers cache answers for each record’s time to live (TTL), record changes can take some time to reach every client.
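On the client side, failover across several addresses for one name can look like the sketch below. The `connect_with_failover` helper and the `fake_connect` stand-in are hypothetical. In a real client, the address list would typically come from a lookup such as `socket.getaddrinfo`, and the connect step would open an actual socket.

```python
def connect_with_failover(addresses, connect):
    """Try each address in turn; return (address, connection) for the first success."""
    errors = []
    for address in addresses:
        try:
            return address, connect(address)
        except OSError as exc:
            errors.append((address, exc))   # remember why this address failed
    raise ConnectionError(f"all {len(addresses)} addresses failed: {errors}")

def fake_connect(address):
    # Stand-in for a real connection attempt; the first address is "down".
    if address == "192.0.2.10":
        raise ConnectionRefusedError(address)
    return f"connected to {address}"

address, conn = connect_with_failover(["192.0.2.10", "192.0.2.11"], fake_connect)
print(address)  # 192.0.2.11
```

Python's own `socket.create_connection` behaves in a similar way when given a hostname: it tries the resolved addresses in turn until one connection succeeds.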
Maintaining network connectivity with BGP
Border Gateway Protocol (BGP) is the routing protocol that connects different networks (autonomous systems) over the internet. It is used to maintain network connectivity by helping routers find the best path for traffic to travel through. BGP also contributes to network resilience: when a session drops or a route is withdrawn, routers select an alternative path, so traffic can be rerouted around outages or disruptions, maintaining service levels. How quickly this happens depends on timer settings and on how long the change takes to propagate between networks.
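The way BGP falls back to another path can be sketched with a deliberately shortened version of its route selection. This sketch assumes only three tie-breakers (higher LOCAL_PREF, then shorter AS_PATH, then lower MED). The real decision process has more steps and compares MED only under certain conditions. The `Route` class, the addresses, and the AS numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    next_hop: str
    local_pref: int
    as_path: tuple
    med: int = 0

def best_path(routes):
    """Pick a best route using a shortened, illustrative decision process."""
    if not routes:
        return None
    # Prefer higher LOCAL_PREF, then shorter AS_PATH, then lower MED.
    return min(routes, key=lambda r: (-r.local_pref, len(r.as_path), r.med))

routes = [
    Route("192.0.2.1", local_pref=100, as_path=(64500, 64510)),
    Route("192.0.2.2", local_pref=100, as_path=(64501, 64502, 64510)),
]

best = best_path(routes)                       # via 192.0.2.1 (shorter AS_PATH)
remaining = [r for r in routes if r != best]   # the best route is withdrawn
fallback = best_path(remaining)                # via 192.0.2.2
```

The longer path was known all along. It only becomes active once the preferred route disappears, which is the rerouting behavior described above.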
Now that we have looked at several network resilience mechanisms, let’s examine how predictability in network flows is related to network resilience.
What is predictability in network flows?
Predictability in network flows is the ability to consistently deliver traffic from one point to another—even in the face of disruptions. This is another facet of network resilience. Let’s consider the most important concepts.
How network flows enter and leave a network
Network flows enter and leave a network through routers and their network interfaces, which connect different networks. Routers use routing protocols such as Border Gateway Protocol (BGP) to determine the best path for traffic and then forward the traffic toward its destination along that path. Routers often also provide security features such as firewalling and access control lists, which can help to protect the network from malicious traffic. Lower-level mechanisms, such as the Address Resolution Protocol (ARP) and Media Access Control (MAC) addressing, are also involved in delivering each packet to the next hop.
Ingress and egress stability when a fault occurs on the path between them
“Ingress” and “egress” are the points at which traffic enters and exits a network. When a fault (such as a link or a node failure) occurs on the path between the ingress and egress points, neither point is directly affected. The various protocols and mechanisms we have discussed should be able to reroute traffic around the fault, so that traffic continues to enter and leave the network at the same points.
The challenge of maintaining predictable network flows
Predictable network flows are a desirable property of networks, but they are challenging to accomplish. The following are concrete scenarios that can challenge a network’s ability to maintain predictable network flows.
- Traffic spikes can cause bottlenecks that lead to packet loss, retransmission, timeouts, and additional overall load on the network.
- Changes in network topology, such as adding or removing devices, can cause unpredictable changes to a network’s performance and even inhibit the reachability of some endpoints.
- Hardware failures can be as small and local as a single router or switch going down or as severe as a submarine cable being cut.
- Software bugs and device misconfigurations can cause disruption—from lost packets to poor performance—and might be very challenging to detect in large, dynamic networks.
- Security threats are a constant danger as networks are often the entry point for an attack.
Conclusion
In this post, we’ve covered the basics of resilience and redundancy in networking, providing a detailed treatment of mechanisms for network resilience and predictability of network flows.
Redundancy is essential for creating resilience in IP networks by providing multiple paths for traffic to travel through. Additional mechanisms for network resilience include data verification, TCP/IP, DNS, and BGP.
Predictability in network flows is the ability to consistently deliver traffic from one point to another, even in the face of disruptions. Yet, establishing predictability in network flows has its share of challenges.
In order to operate a reliable, performant, secure, and cost-effective network, it is crucial to understand network resilience and how to improve it using redundancy and other mechanisms.