NetOps teams have quickly learned the benefits of hosting applications in the cloud. But even before they migrated workloads or adopted their first SaaS applications, they knew in the back of their minds that monitoring performance would be difficult. A tiny voice was asking, “How will we monitor packet loss and connection latency, hop by hop, when using cloud applications?”
It isn’t so much that packet loss itself is a huge problem; TCP and QUIC were engineered in anticipation that lost packets would be inevitable. If there is a problem, it rests primarily in two areas: real-time traffic such as voice and video, where a retransmitted packet arrives too late to be useful, and the extra latency that retransmissions add to interactive applications.
Most of us have experienced the symptoms of packet loss, like choppy video in a conference call. Or standing in line at the bank or a department store when the clerk says, “The system is really slow today.” Thinking back on these situations, would you guess that the problem was packet loss or latency? How would you confirm your answer?
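If you want a quick first answer, a burst of pings can separate the two symptoms, since ping reports both the percentage of lost probes and the round-trip times. Here is a minimal sketch that shells out to the standard ping utility and parses its summary (Python 3 on Linux or macOS is assumed, and example.com is just a placeholder target):

```python
import re
import subprocess

def ping_stats(host: str, count: int = 20) -> dict:
    """Send `count` ICMP echo requests and parse loss and RTT from ping's summary."""
    out = subprocess.run(
        ["ping", "-c", str(count), host], capture_output=True, text=True
    ).stdout
    loss = re.search(r"([\d.]+)% packet loss", out)
    rtt = re.search(r"= ([\d.]+)/([\d.]+)/([\d.]+)", out)  # min/avg/max in ms
    return {
        "loss_pct": float(loss.group(1)) if loss else None,
        "rtt_min_ms": float(rtt.group(1)) if rtt else None,
        "rtt_avg_ms": float(rtt.group(2)) if rtt else None,
        "rtt_max_ms": float(rtt.group(3)) if rtt else None,
    }

# High loss_pct points at packet loss; loss near zero with a high rtt_avg_ms
# points at latency instead.
print(ping_stats("example.com"))
```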
Packet loss can be caused by a wide variety of factors. Most often the problem is a poor connection at the physical layer, such as a bad cable or connector. Congestion, in the form of high link utilization or an overworked router in the path, is another common source of trouble.
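On gear you manage, interface counters can help tell these two causes apart: climbing error counters usually mean a physical-layer problem, while discards alongside clean error counters usually mean congestion. A minimal sketch, assuming the pysnmp library (4.x synchronous API), SNMP v2c with the placeholder community string “public”, interface index 1, and a documentation IP as the router address:

```python
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

def poll_if_counters(host: str, if_index: int = 1) -> dict:
    """Fetch error and discard counters for one interface via SNMP v2c."""
    result = {}
    for name in ("ifInErrors", "ifInDiscards", "ifOutDiscards"):
        err_ind, err_status, _, var_binds = next(getCmd(
            SnmpEngine(),
            CommunityData("public"),            # placeholder community string
            UdpTransportTarget((host, 161)),
            ContextData(),
            ObjectType(ObjectIdentity("IF-MIB", name, if_index)),
        ))
        if err_ind or err_status:
            raise RuntimeError(f"SNMP error polling {name}: {err_ind or err_status}")
        result[name] = int(var_binds[0][1])
    return result

# Poll twice and compare: a growing ifInErrors delta points at the physical
# layer, while growing discards with flat errors point at congestion.
print(poll_if_counters("192.0.2.1"))
```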
Pinpointing where the packet loss is happening is not always trivial. For example, what if the destination of the connection is somewhere in the cloud? How do you identify where the packets are getting dropped?
Here’s a list of utilities and techniques that many of us are still using today to detect and isolate network packet loss:

- Ping, which reports the percentage of lost probes and round-trip times to a destination
- Traceroute, which sends probes with increasing TTLs to measure loss and latency hop by hop
- SNMP polling of interface counters, which surfaces errors and discards on the devices you manage
- TCP traceroute, which probes with TCP packets so they are forwarded like real application traffic instead of being filtered or deprioritized as ICMP often is (see the sketch after this list)
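As a hedged illustration of that last technique, here is a TCP traceroute sketch using the Scapy library, which sends TCP SYN probes by default; root privileges are required to craft raw packets, and example.com with port 443 are placeholders:

```python
# Requires root privileges; Scapy crafts raw packets.
from scapy.all import traceroute

# Probe TCP port 443 so routers and firewalls handle our probes like real
# HTTPS traffic instead of ICMP, which is often filtered or deprioritized.
res, unanswered = traceroute("example.com", dport=443, maxttl=30, verbose=0)
res.show()  # one row per TTL: the router that answered, or blank where probes died
```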
After reviewing these four ways to monitor packet loss, you might think that TCP traceroute will solve the observability riddle, but there’s a problem. The cloud is made up of thousands of routers, which means we would need a massive global network of traceroute probes to test connections to the business applications we depend on.
Remember, with applications in the cloud, we need to test from all the different locations where we have employees and customers. Imagine the amount of deployment work to get them all set up — not to mention the ongoing maintenance!
Kentik solved the deployment and maintenance conundrum by setting up a global network of agents used for synthetic testing (see the Kentik Global Synthetic Network). These lightweight agents are deployed all over the world, in every major public cloud and service provider network. They can be configured in the Kentik Network Observability Cloud to monitor any business application, such as Salesforce, Office 365, and more.
Performing a TCP traceroute to IP addresses and hostnames is just the beginning. Kentik Synthetics can also verify the availability of specific content in web pages, test DNS servers to make sure they respond in a timely manner, monitor the availability and responsiveness of API endpoints, and much more. These tests run automatically and periodically, with testing intervals down to the sub-minute range.
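To build intuition for what such a synthetic test does under the hood, here is a toy sketch, not Kentik’s implementation, that periodically verifies a page contains expected content and times a DNS lookup against a specific server. The requests and dnspython libraries, the 30-second interval, and all targets are placeholder assumptions:

```python
import time
import requests          # pip install requests
import dns.resolver      # pip install dnspython

def http_check(url: str, must_contain: str) -> tuple[bool, float]:
    """Fetch a page and verify the expected content is present."""
    t0 = time.monotonic()
    resp = requests.get(url, timeout=5)
    elapsed = time.monotonic() - t0
    return (resp.ok and must_contain in resp.text), elapsed

def dns_check(name: str, server: str) -> float:
    """Resolve a name against a specific DNS server and return the lookup time."""
    resolver = dns.resolver.Resolver(configure=False)
    resolver.nameservers = [server]
    t0 = time.monotonic()
    resolver.resolve(name, "A")
    return time.monotonic() - t0

while True:  # a real scheduler would be smarter; 30s interval is a placeholder
    ok, secs = http_check("https://example.com", "Example Domain")
    dns_secs = dns_check("example.com", "8.8.8.8")
    print(f"http ok={ok} in {secs:.3f}s, dns in {dns_secs:.3f}s")
    time.sleep(30)
```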
Once the tests are configured, you can trend the collected data and set thresholds at the levels where you know the business will be impacted.
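One common way to choose those thresholds is to derive them from the baseline itself, for example alerting when a sample sits a few standard deviations above the historical mean. A minimal sketch with made-up latency numbers:

```python
from statistics import mean, stdev

def alert_threshold(samples: list[float], sigmas: float = 3.0) -> float:
    """Derive an alert threshold from history: mean + N standard deviations."""
    return mean(samples) + sigmas * stdev(samples)

history_ms = [42.0, 45.1, 43.7, 44.2, 41.9, 46.3, 44.8]  # placeholder latency history
threshold = alert_threshold(history_ms)
latest = 71.4  # placeholder new measurement
if latest > threshold:
    print(f"ALERT: {latest:.1f} ms exceeds baseline threshold {threshold:.1f} ms")
```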
With the Kentik Network Observability Cloud, you can baseline the performance of applications, websites, and networks. You can hold vendors accountable when you know darn well that they are introducing 80% of the delay, annoying your customers and possibly forcing them to abandon their online checkout. Doing nothing is costing your company money.
If the service provider doesn’t fix a loss or latency problem, you can find new transit and then verify that your new path is avoiding that troublesome network. To learn more, read “How to Monitor Traffic Through Transit Gateways.”
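You can also spot-check the new path yourself with an ordinary traceroute by confirming that no hop falls inside the address space of the network you are trying to avoid. A small sketch, assuming a Linux traceroute on the PATH and a made-up “troublesome” prefix:

```python
import ipaddress
import re
import subprocess

# Placeholder: the prefix of the transit network the new path should avoid.
BAD_PREFIX = ipaddress.ip_network("198.51.100.0/24")

def path_avoids(host: str, bad: ipaddress.IPv4Network) -> bool:
    """Run a numeric traceroute and verify no hop falls inside the bad prefix."""
    out = subprocess.run(
        ["traceroute", "-n", host], capture_output=True, text=True
    ).stdout
    hops = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", out)
    return all(ipaddress.ip_address(h) not in bad for h in hops)

print("path clean:", path_avoids("example.com", BAD_PREFIX))
```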
If you want to be proactive, you can deploy or make use of Kentik Synthetics agents in remote geographies and then run tests to see whether a location like Inuvik, Canada, can support an application at the necessary service levels.
If you’d like to learn more, reach out to the team at Kentik.