Kentik - Network Observability
Kentik Blog

How to Monitor Packet Loss and Latency in the Cloud

Cloud Monitoring

NetOps teams have quickly learned the benefits of hosting applications in the cloud. But before they migrated or adopted a few SaaS applications, they knew in the back of their minds that monitoring performance would be difficult. A tiny voice was asking, “How will we monitor packet loss and connection latency, hop-by-hop, when using cloud applications?”

Packet Loss Causes Problems with Cloud Applications

It isn’t so much that packet loss is a huge problem — TCP and QUIC were engineered in anticipation that lost packets would be inevitable. If there is a problem, it rests primarily in two areas:

  • Lost data: Older technologies (syslog, SNMP traps, NetFlow, RTP, etc.) run over UDP, with no acknowledgement that the messages sent arrived at the destination. When using these technologies, we can only hope the data makes it.
  • Latency: In connection-oriented protocols such as TCP and QUIC, lost packets are detected and resent. These retransmits introduce latency.
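
To put a rough number on the latency point, here is a back-of-the-envelope sketch. It assumes each transmission is lost independently with the same probability and that every retransmission waits one fixed retransmission timeout (RTO). Both are simplifying assumptions made for illustration; real TCP and QUIC stacks use adaptive timers, backoff, and fast retransmit, so treat the output as indicative only. The function name and the 200 ms RTO are hypothetical.

```python
def expected_retransmit_delay_ms(loss_rate: float, rto_ms: float) -> float:
    """Average extra delay per packet under a simplified loss model.

    Assumptions (illustrative, not taken from any real stack): losses are
    independent, and each retransmission costs exactly one fixed RTO.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    # With independent losses, the expected number of retransmissions
    # before a packet gets through is p / (1 - p).
    expected_retransmits = loss_rate / (1.0 - loss_rate)
    return expected_retransmits * rto_ms


if __name__ == "__main__":
    for loss in (0.001, 0.01, 0.05):
        delay = expected_retransmit_delay_ms(loss, rto_ms=200.0)
        print(f"loss={loss:.1%}: about {delay:.2f} ms added per packet on average")
```

The average hides the real pain: most packets see no added delay, while the unlucky ones wait a full timeout, which is what users notice as stalls.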

Most of us have experienced packet loss, such as choppy video in a conference call, or standing in line at a bank or department store when the clerk says, “The system is really slow today.” Thinking back on these situations, would you guess that the problem was packet loss or latency? How would you confirm your answer?

What Causes Packet Loss?

Packet loss can be caused by a wide variety of factors, including:

  • Bad cables (electromagnetic interference)
  • Poor connectors (poor assembly or not fully plugged in)
  • Bad hardware (the switch or router port)
  • Mismatched duplex settings (full vs. half)
  • Firewall configured to drop packets (all ICMP is dropped)
  • Congestion on the interface (priority queues are congested)
  • Overwhelmed router
  • UDP vs. TCP (UDP does not retransmit, so a dropped packet stays lost)
  • TTL expired

Typically, poor connections at the physical layer, such as bad cables or connectors, are the problem. Congestion, in the form of high link utilization or an overworked router in the path, is another common source of trouble.

Detecting Packet Loss

Pinpointing where packet loss is occurring is not always trivial. For example, what if the destination of the connection is somewhere in the cloud? How do you identify where the packets are getting dropped?

Here’s a list of utilities and techniques that many of us are still using today to detect and isolate network packet loss:

  1. Ping: Running ping from the command prompt is generally done to verify connectivity to a host, but “request timed out” responses can also indicate packet loss. Remember that ping relies on ICMP, which is one of the first protocols a busy router drops or deprioritizes in a congested network. For this reason, ping is not a reliable measure of loss, and more importantly, it doesn’t tell you where in the path the packets were dropped.
  2. SNMP: By polling all the SNMP devices on the network, packet loss details can be collected and a threshold can be set that notifies the NetOps team. SNMP is great for LANs and WANs, but since this article focuses on the cloud, note that we can’t use it to see inside devices within the cloud.
  3. Packet capture: By strategically locating one or more probes off of mirrored ports on the network, sessions can be monitored, but if the loss occurred in the cloud, we will have no idea where it occurred. In production networks, packet probes are great for deep troubleshooting, but they are expensive and simply can’t be located everywhere we need them.
  4. TCP traceroute: A TCP traceroute sends TCP probes with increasing TTL values so that each router in the path to a target destination responds in turn. The log generated by the trace reports on each router in the path as well as any corresponding packet loss.

Figure: Detecting packet loss with TCP. Notice that TCP traceroute reports on the latency of each round trip connection, making it a good diagnostic tool when it is deployed at scale.
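
As a rough illustration of how a traceroute log can be read, the sketch below summarizes per-hop loss from repeated probes and looks for the first hop where loss persists all the way to the destination. The data layout, names, and 1% cutoff are hypothetical, chosen only for this example. The "persistent loss" rule reflects a common rule of thumb: loss that shows up at one intermediate hop but disappears at later hops often means that router is rate-limiting its own replies, not dropping forwarded traffic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class HopStats:
    hop: int
    sent: int
    received: int
    avg_rtt_ms: float | None

    @property
    def loss_pct(self) -> float:
        return 100.0 * (self.sent - self.received) / self.sent if self.sent else 0.0


def summarize_hops(samples: dict[int, list[float | None]]) -> list[HopStats]:
    """samples maps hop number -> RTTs in ms, with None for an unanswered probe."""
    stats = []
    for hop in sorted(samples):
        rtts = samples[hop]
        answered = [r for r in rtts if r is not None]
        avg = sum(answered) / len(answered) if answered else None
        stats.append(HopStats(hop, len(rtts), len(answered), avg))
    return stats


def first_persistent_loss(stats: list[HopStats], min_loss_pct: float = 1.0) -> int | None:
    """Return the first hop from which loss continues through the final hop."""
    candidate = None
    for s in stats:
        if s.loss_pct >= min_loss_pct:
            if candidate is None:
                candidate = s.hop
        else:
            # Loss did not carry forward, so the earlier hop was likely
            # just slow to answer probes; reset the candidate.
            candidate = None
    return candidate


if __name__ == "__main__":
    # Made-up sample data: four probes per hop.
    samples = {
        1: [1.0, 1.2, 1.1, 0.9],
        2: [5.0, None, 5.2, None],   # loss here, but hop 3 is clean
        3: [9.0, 9.1, 9.3, 9.2],
        4: [20.0, None, 21.0, 22.0],
        5: [25.0, None, 24.0, None],
    }
    stats = summarize_hops(samples)
    for s in stats:
        print(f"hop {s.hop}: loss {s.loss_pct:.0f}%  avg rtt {s.avg_rtt_ms}")
    print("loss appears to start at hop", first_persistent_loss(stats))
```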

After reviewing the above four ways to monitor for packet loss, you might think that TCP traceroute will solve the observability riddle, but there’s a problem. The cloud is made up of thousands of routers. This means we would need a massive global network of traceroute probes to help us test connections to the business applications we depend on.

Remember, with applications in the cloud, we need to test from all the different locations where we have employees and customers. Imagine the amount of deployment work to get them all set up — not to mention the ongoing maintenance!

How to Monitor Packet Loss and Latency in the Cloud

Kentik solved the deployment and maintenance conundrum by setting up a global network of agents used for synthetic testing (see the Kentik Global Synthetic Network). These lightweight agents are located all over the world, in every major public cloud and service provider. They can be configured in the Kentik Network Observability Cloud to monitor any business application, such as Salesforce, Office 365, and more.

"[Kentik] is actually showing you a visual representation of how your physical equipment connects via those virtual private connections in the cloud. It also gives us information like latency, throughput, and jitter, across on-prem, through the cloud, and back on-prem."
Jeremy Schulman with Major League Baseball

Performing a TCP traceroute to IP addresses and host names is just the beginning. Kentik Synthetics can be used to verify availability of specific content in web pages, test DNS servers to make sure they are responding in a timely manner, monitor the availability and responsiveness of API endpoints, and much more. These tests run automatically and periodically, with testing intervals as short as the sub-minute range.
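
To give a feel for what a periodic synthetic test does, here is a minimal sketch of a TCP connect probe written with Python's standard library. It is not how Kentik's agents are implemented; the function names, the target host, and the intervals are all assumptions made for illustration. It also only approximates loss: the operating system retries lost SYN packets on its own, so a slow or failed handshake is a hint of loss rather than a direct count of dropped packets.

```python
from __future__ import annotations

import socket
import time


def tcp_connect_probe(host: str, port: int, timeout_s: float = 2.0) -> float | None:
    """Return the TCP connect time in ms, or None if the attempt failed.

    When host is a name, the measured time also includes DNS resolution.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return (time.perf_counter() - start) * 1000.0
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return None


def run_probes(host: str, port: int, count: int = 5,
               interval_s: float = 1.0, timeout_s: float = 2.0) -> list[float | None]:
    """Run count probes, waiting interval_s between them."""
    results = []
    for i in range(count):
        results.append(tcp_connect_probe(host, port, timeout_s))
        if i < count - 1:
            time.sleep(interval_s)
    return results


def summarize(results: list[float | None]) -> dict:
    """Reduce raw probe results to a failure percentage and average connect time."""
    ok = [r for r in results if r is not None]
    failed_pct = 100.0 * (len(results) - len(ok)) / len(results) if results else 0.0
    avg_ms = sum(ok) / len(ok) if ok else None
    return {"sent": len(results), "failed_pct": failed_pct, "avg_connect_ms": avg_ms}


if __name__ == "__main__":
    # example.com:443 is a placeholder target, not a recommendation.
    print(summarize(run_probes("example.com", 443, count=5, interval_s=1.0)))
```

Running a script like this from one laptop shows the limitation discussed earlier: it tests a single vantage point, which is why a distributed network of agents matters.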

Establishing Baselines and Setting Thresholds for Packet Loss and Latency

When the tests are all configured, you can trend the data collected and set thresholds at levels where you know the business will be impacted.

Figure: Measuring packet loss and latency in the cloud with Kentik Synthetics
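
The idea of baselining and thresholding can be sketched in a few lines. The example below derives a latency limit from historical samples using the mean plus three standard deviations, which is one common heuristic and not necessarily what Kentik uses, and then checks a new measurement against that limit and a fixed loss limit. All names and default values are hypothetical; in practice you would pick limits that match the point where your business is actually impacted.

```python
import statistics


def baseline_threshold(history_ms, k=3.0):
    """Mean plus k sample standard deviations of past latency samples (ms)."""
    if len(history_ms) < 2:
        raise ValueError("need at least two samples to estimate spread")
    return statistics.fmean(history_ms) + k * statistics.stdev(history_ms)


def check_sample(latency_ms, loss_pct, latency_limit_ms, loss_limit_pct=1.0):
    """Return a list of human-readable alerts for one new measurement."""
    alerts = []
    if latency_ms > latency_limit_ms:
        alerts.append(f"latency {latency_ms:.1f} ms exceeds {latency_limit_ms:.1f} ms")
    if loss_pct > loss_limit_pct:
        alerts.append(f"loss {loss_pct:.1f}% exceeds {loss_limit_pct:.1f}%")
    return alerts


if __name__ == "__main__":
    history = [10.0, 12.0, 11.0, 13.0, 9.0]  # made-up latency samples in ms
    limit = baseline_threshold(history)
    for alert in check_sample(latency_ms=20.0, loss_pct=2.5, latency_limit_ms=limit):
        print("ALERT:", alert)
```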

With the Kentik Network Observability Cloud, you gain the benefits of being able to baseline the performance of applications, websites and networks. You can hold vendors accountable when you know darn well that they are introducing 80% of the delay, which is annoying your customers and could be forcing them to abandon their online checkout. Doing nothing is costing your company money.
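
A claim like "this vendor introduces most of the delay" comes down to simple arithmetic on per-hop latency. The sketch below assumes you already have an ordered list of hops, each labeled with the network that owns it and its cumulative round-trip time. The labels and numbers are invented for illustration, and the clamping of negative increments is an assumption to cope with hops that answer more slowly than the hops behind them.

```python
def delay_share_by_network(hops):
    """hops: ordered list of (network_name, cumulative_rtt_ms) pairs.

    Returns each network's percentage share of the end-to-end delay.
    """
    added = {}
    previous = 0.0
    for network, rtt in hops:
        # A hop can report a lower RTT than the one before it (routers often
        # answer probes at low priority), so never count a negative increment.
        increment = max(0.0, rtt - previous)
        added[network] = added.get(network, 0.0) + increment
        previous = max(previous, rtt)
    total = sum(added.values())
    if total == 0:
        return {}
    return {network: 100.0 * ms / total for network, ms in added.items()}


if __name__ == "__main__":
    # Hypothetical path from an office to a SaaS application.
    path = [
        ("office LAN", 1.0),
        ("ISP A", 5.0),
        ("ISP A", 9.0),
        ("Transit B", 45.0),
        ("SaaS provider", 50.0),
    ]
    for network, share in delay_share_by_network(path).items():
        print(f"{network}: {share:.0f}% of the delay")
```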

If the service provider doesn’t fix a loss or latency problem, you can find new transit and then verify that your new path is avoiding that troublesome network. To learn more, read “How to Monitor Traffic Through Transit Gateways.”

If you want to be proactive, you can deploy or make use of Kentik Synthetics agents in a remote geographic location and then run tests to see if a remote location like Inuvik, Canada can support an application at the necessary service levels.

If you’d like to learn more, reach out to the team at Kentik.
