When searching on “What is SD-WAN,” the results display plenty of articles and videos explaining the features of any given vendor’s SD-WAN solution, but very few go into detail on how SD-WANs work. How do they handle ARPs, DNS queries and connection requests to domains like Salesforce, Netflix or Facebook? How do they handle redundancy and convergence when connections go down? What about prioritization and troubleshooting insight? This article touches on all of these questions.
Software-defined wide area network, or SD-WAN, is a new way of implementing WAN connections in which the edge router makes fewer local decisions than a traditional edge router. When a connection request comes into the edge, the edge forwards the request to a controller. The controller decides whether or not to allow the connection and then sends instructions back down to the edge.
The traditional physical WAN connectivity is still there but has been renamed the underlay. In contrast, the logical understanding of the topology and its connectivity is made up of VPNs and is called the overlay. To rephrase that, the underlay is how everything is physically connected up to form a network. It’s the overlay that determines the logical topology: full mesh, partial mesh, hub and spoke or point to point.
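To make the overlay topologies concrete, here is a minimal sketch that lists the tunnels an overlay would need for a full mesh versus a hub and spoke. The site names and helper functions are hypothetical and not taken from any vendor; the underlay is assumed to already provide reachability between all sites.

```python
from itertools import combinations


def full_mesh(sites):
    """Every site gets a tunnel to every other site."""
    return list(combinations(sites, 2))


def hub_and_spoke(hub, spokes):
    """Every spoke gets a single tunnel to the hub."""
    return [(hub, spoke) for spoke in spokes]


# Hypothetical site names, for illustration only.
sites = ["nyc", "lon", "sgp", "syd"]
mesh_tunnels = full_mesh(sites)
hub_tunnels = hub_and_spoke("nyc", ["lon", "sgp", "syd"])

print(len(mesh_tunnels), "tunnels in a full mesh")
print(len(hub_tunnels), "tunnels in a hub and spoke")
```

The same physical underlay supports either result; only the set of logical tunnels changes.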
As stated above, connection requests are sent by the edge up to the controller to check policy. This is done over a secure connection called the control plane. Once the edge hears back from the controller, a connection is established. The actual end-user connections traverse the SD-WAN fabric over something called the data plane.
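The split between control plane and data plane can be sketched roughly as follows. This is a simplified model, not any vendor’s implementation: the `Controller`, `Edge` and `Decision` classes, the overlay name and the default-deny behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    allow: bool
    overlay: str  # hypothetical name of the overlay (VPN) that carries the flow


class Controller:
    """Holds the policy; names and structure here are illustrative only."""

    def __init__(self, policy, default):
        self.policy = policy
        self.default = default

    def check(self, destination):
        return self.policy.get(destination, self.default)


class Edge:
    def __init__(self, controller):
        self.controller = controller
        self.flow_table = {}  # (user, destination) -> overlay used on the data plane

    def connect(self, user, destination):
        # Control plane: ask the controller what to do with this request.
        decision = self.controller.check(destination)
        # Data plane: only allowed flows are installed for forwarding.
        if decision.allow:
            self.flow_table[(user, destination)] = decision.overlay
        return decision


controller = Controller(
    policy={"salesforce.com": Decision(True, "business-critical")},
    default=Decision(False, ""),  # assumed default: deny unknown destinations
)
edge = Edge(controller)
print(edge.connect("user-a", "salesforce.com"))
print(edge.connect("user-a", "unknown.example"))
```

In this sketch the edge keeps no policy of its own; it only records what the controller told it to forward.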
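To summarize what has been covered to this point:
Underlay: the physical WAN connectivity.
Overlay: the logical topology, built from VPNs on top of the underlay.
Control plane: the secure connection the edge uses to check policy with the controller.
Data plane: the path that end-user connections take across the SD-WAN fabric.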
The above terms are just the beginning when entering the world of SD-WAN. We try to stay generic here, but each vendor makes significant contributions to this list of acronyms.
Let’s say User A would like to make a connection to salesforce.com. First the user may need to ARP for the DNS server. The edge receives the ARP and resolves it as specified by the policies set in the controller.
This means the edge may resolve it locally, send it to a specified DNS server or send it up to the controller for resolution. This process is vendor-dependent.
Once User A has the MAC address of the DNS server, a request is sent to that server to resolve salesforce.com (for example) to an IP address. Again, the process is vendor-dependent, but in most cases the edge will send this request up to the controller. The controller evaluates the request for salesforce.com and compares it to the configured policy list.
Many policies are based on the destination domain name (for example, salesforce.com). Here are a few examples:
These instructions are then sent back down to the edge for enforcement, and the edge replies to User A’s DNS query for salesforce.com with the proper IP address. Keep in mind that if the end user’s web browser is configured to use DoH (DNS over HTTPS), this could cause problems for some SD-WAN solutions because the browser receives the IP address from a different DNS resolver instead of from the controller.
User A then sends a SYN to the salesforce.com IP address to initiate the TCP handshake that is required to make an HTTPS connection. The edge provisions for the connection as instructed by the controller.
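The domain-based policy check described above might look something like the sketch below. The policy table and its action names are hypothetical, chosen only to illustrate the lookup; real controllers express policy in vendor-specific ways.

```python
# Hypothetical policy table: domain -> action. Not taken from any vendor.
POLICIES = {
    "salesforce.com": "prioritize",
    "netflix.com": "rate-limit",
    "facebook.com": "block",
}


def match_policy(hostname, policies, default="default-route"):
    """Return the action for the most specific domain that matches hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    # Try the full name first, then strip one leading label at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in policies:
            return policies[candidate]
    return default


print(match_policy("login.salesforce.com", POLICIES))
print(match_policy("example.org", POLICIES))
```

Matching from the most specific name outward lets a single entry such as `salesforce.com` also cover its subdomains.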
Every now and again something happens like congestion, packet loss, latency or maybe a severed WAN connection. Then what happens? What will the SD-WAN environment do about it? The answer to this is generally a significant vendor differentiator.
Some vendors build technology into the edge devices that will routinely ping high-priority destinations (like salesforce.com) to measure latency, packet loss and jitter. Think of it as a type of synthetic monitor. When the edge detects a problem on a connection to a high-priority domain, it may balance the flows or packets to the target over multiple links. This is possible because most SD-WAN implementations support active/active operation, in which there are no idle secondary links, only additional load-carrying connections.
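As a rough illustration of that synthetic monitoring, the sketch below turns probe results into latency, loss and jitter, then spreads flows across whichever links look healthy. The link names, sample values and thresholds are assumptions, and jitter is simplified here to the mean absolute difference between consecutive round-trip times.

```python
import zlib
from statistics import mean


def link_metrics(rtts_ms):
    """rtts_ms: list of RTT samples in ms; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("need at least one probe result")
    received = [r for r in rtts_ms if r is not None]
    loss = 1 - len(received) / len(rtts_ms)
    latency = mean(received) if received else float("inf")
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = mean(deltas) if deltas else 0.0
    return {"latency_ms": latency, "loss": loss, "jitter_ms": jitter}


def healthy_links(probes, max_latency_ms=150, max_loss=0.02, max_jitter_ms=30):
    """Return the links whose metrics are within the (assumed) thresholds."""
    healthy = []
    for name, samples in probes.items():
        m = link_metrics(samples)
        if (m["latency_ms"] <= max_latency_ms and m["loss"] <= max_loss
                and m["jitter_ms"] <= max_jitter_ms):
            healthy.append(name)
    return healthy


def pick_link(flow_id, links):
    """Active/active: hash each flow onto one of the load-carrying links."""
    return links[zlib.crc32(flow_id.encode()) % len(links)]


# Hypothetical probe results per link (ms); None is a lost probe.
probes = {
    "mpls": [20, 21, 20, 22],
    "broadband": [35, 90, None, 40],
    "lte": [60, 62, 61, 63],
}
usable = healthy_links(probes)
print(usable)
print(pick_link("user-a->salesforce.com:443", usable))
```

Hashing per flow keeps the packets of one connection on one link; vendors that balance per packet would need a different scheme.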
Beyond saving money by eliminating expensive leased lines such as MPLS and utilizing VPNs over the internet, here are some additional SD-WAN benefits:
Many of the above benefits are vendor-dependent and should be tested under load to ensure real-world operation.
Ultimately, the only SD-WAN features that matter are the ones that support the business-critical applications in the optimal way. Spending extra money on things like sub-second convergence, mesh topologies, multitenancy, and multicast is not worthwhile unless they improve the user’s experience.
At the end of the day, every SD-WAN vendor will tout how necessary their features are and how great their performance is. One of the areas that is sometimes overlooked during the sale is performance monitoring. After the SD-WAN deployment, most NetOps teams want insight into how the SD-WAN is performing.
Companies like Cisco, Silver Peak/HPE and VMware export IPFIX, allowing network observability companies to provide performance insights into the SD-WAN fabric. Flow data is correlated with telemetry that is exported from the SD-WAN management platform. For example, the Kentik architecture provides the capability to ingest vendor-specific fields, which are very important in the SD-WAN space (e.g., Viptela: VPN Identifier, and Silver Peak: Application, Business Intent Overlay).
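For readers unfamiliar with how vendor-specific fields travel in IPFIX, here is a minimal sketch of parsing the 16-byte IPFIX message header and one template field specifier. When the top bit of the information element ID is set, a 4-byte enterprise number follows, which is how vendor-defined fields are identified. The element ID and enterprise number below are made-up placeholders, not real vendor values, and this is not how any particular collector is implemented.

```python
import struct

IPFIX_VERSION = 10
# Version, length, export time, sequence number, observation domain ID.
HEADER = struct.Struct("!HHIII")


def parse_header(data):
    version, length, export_time, sequence, domain_id = HEADER.unpack_from(data)
    if version != IPFIX_VERSION:
        raise ValueError(f"not an IPFIX message (version {version})")
    return {
        "length": length,
        "export_time": export_time,
        "sequence": sequence,
        "observation_domain_id": domain_id,
    }


def parse_field_specifier(data, offset=0):
    """Return (field, next_offset) for one template field specifier."""
    raw_id, length = struct.unpack_from("!HH", data, offset)
    offset += 4
    enterprise = None
    if raw_id & 0x8000:  # enterprise bit set: a 4-byte enterprise number follows
        (enterprise,) = struct.unpack_from("!I", data, offset)
        offset += 4
    field = {"element_id": raw_id & 0x7FFF, "length": length, "enterprise": enterprise}
    return field, offset


# Hand-built bytes with placeholder values, for illustration only.
header_bytes = HEADER.pack(10, 16, 1700000000, 42, 7)
vendor_field = struct.pack("!HHI", 0x8000 | 100, 4, 99999)

print(parse_header(header_bytes))
print(parse_field_specifier(vendor_field))
```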
For example, the Silver Peak-Kentik integration adds the application name dimension and the business intent overlay (BIO) to the interface traffic metadata source.
The above screen capture from Kentik shows which subnets are speaking with each other. The user can also see the relationship between overlay networks and applications. These combined data sets allow NetOps to discover what applications are running between sites, to the internet, and to the data center. They can be used to better understand service providers, link utilization, and traffic patterns. These details help NetOps fine-tune policies at the SD-WAN controller.
Kentik didn’t leave any SD-WAN vendors out. We provide overlay, underlay and application traffic visibility for Silver Peak, Cisco, VMware and all other major open SD-WAN solutions.
SD-WAN is a powerful new network architecture and technology that helps on multiple fronts. SD-WAN can improve performance, increase security and lower costs all at the same time. However, like any networking technology, SD-WAN delivers more benefit when it is properly managed. Auditing application traffic policies and understanding both the traffic paths taken and link utilization are critical maintenance functions. Having the ability to troubleshoot, plan capacity, and optimize costs is also important. Network observability solutions like Kentik can help you perform the operational oversight that you need to make SD-WAN successful.
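The kind of roll-up described above can be approximated with a few lines of code once flow records carry overlay and application fields. The records and field names below are invented for illustration and do not reflect Kentik’s data model or API.

```python
from collections import defaultdict

# Hypothetical enriched flow records; field names are assumptions.
flows = [
    {"src": "10.1.0.0/24", "dst": "10.2.0.0/24", "overlay": "RealTime", "app": "voip", "bytes": 1200},
    {"src": "10.1.0.0/24", "dst": "10.2.0.0/24", "overlay": "RealTime", "app": "voip", "bytes": 800},
    {"src": "10.1.0.0/24", "dst": "10.3.0.0/24", "overlay": "Default", "app": "backup", "bytes": 50000},
    {"src": "10.2.0.0/24", "dst": "10.3.0.0/24", "overlay": "Default", "app": "web", "bytes": 3000},
]


def bytes_by(flows, *keys):
    """Sum bytes per unique combination of the given record fields."""
    totals = defaultdict(int)
    for flow in flows:
        totals[tuple(flow[k] for k in keys)] += flow["bytes"]
    return dict(totals)


print(bytes_by(flows, "overlay", "app"))  # which applications ride which overlay
print(bytes_by(flows, "src", "dst"))      # which subnets talk to each other
```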
If further clarification is needed on any of the SD-WAN terms used in this post, please reach out to the Kentik team.