Network performance monitoring metrics

Network Performance Monitoring (NPM) is the process of measuring, diagnosing, and optimizing the service quality of a network as experienced by users. NPM is complementary to Application Performance Management (APM).

Network Performance Monitoring addresses the role that the network and the Internet play in the end-user experience. This includes metrics such as:

  • Latency: the time it takes to get a response to a packet, measured in both directions. In one direction, a local host such as an application server or load balancer (e.g. HAProxy or NGINX) sends a packet to a remote host and the time until a response comes back is measured. In the other direction, a packet arrives from a remote host and the time the local application (server) takes to send its response is measured.
  • Out-of-order packets (count and percentage). This matters because TCP can't pass data up to the application until the bytes are in the right order. A small number of out-of-order packets typically has little effect, but as the rate rises, application performance suffers.
  • TCP retransmits. When a portion of a network path is overloaded or having performance problems, it may drop packets. TCP ensures delivery by using ACKs to signal that data has been received; if a sender doesn't get a timely ACK from the receiver, it resends the unacknowledged TCP segment. Once the retransmit rate climbs above low single-digit percentages, application performance starts to degrade (see the sketch after this list).
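
The retransmit rate in particular can be derived from counters the operating system already keeps. The sketch below is a hypothetical illustration rather than any vendor's implementation: it reads the Linux kernel's cumulative TCP counters from /proc/net/snmp and reports the percentage of retransmitted segments over a sampling interval. The interval and the 2% alert threshold are illustrative assumptions, not standards.

```python
#!/usr/bin/env python3
"""Minimal sketch: estimate TCP retransmit percentage from Linux kernel counters.

Reads the 'Tcp:' rows of /proc/net/snmp, which expose cumulative OutSegs and
RetransSegs counters, and reports the retransmit rate over a sampling interval.
"""
import time


def read_tcp_counters():
    """Return a dict of the kernel's cumulative TCP counters (OutSegs, RetransSegs, etc.)."""
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    # /proc/net/snmp lists each protocol as a header row followed by a value row.
    header, values = lines[0][1:], lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def retransmit_percent(interval_s=10):
    """Sample twice, interval_s seconds apart, and compute the retransmit rate."""
    before = read_tcp_counters()
    time.sleep(interval_s)
    after = read_tcp_counters()

    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retrans / sent if sent else 0.0


if __name__ == "__main__":
    pct = retransmit_percent()
    print(f"TCP retransmits over the last interval: {pct:.2f}%")
    # Low single-digit retransmit percentages are already enough to hurt
    # application performance; 2% here is an illustrative threshold.
    if pct > 2.0:
        print("WARNING: retransmit rate above illustrative 2% threshold")
```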

NPM solutions have traditionally utilized an appliance deployment model.  An appliance-based PCAP probe with one or more interfaces connects to router or switch span ports or to an intervening packet broker device (such as those offered by Gigamon or Ixia).  The appliance records all packets passing across the span port into memory and then into longer-term storage.  In virtualized datacenters, virtual probes may be used, but they are also dependent on network links in one form or another. 

Physical and virtual appliances are costly in terms of hardware and (for commercial solutions) software licensing. As a result, in most cases it is only fiscally feasible to deploy PCAP probes at a few selected points in the network. In addition, the appliance deployment model was built on pre-cloud assumptions of centralized datacenters hosting relatively monolithic application instances. As cloud and distributed application models have proliferated, the appliance model for packet capture has become less feasible, because in many cloud hosting environments there is no way to deploy even a virtual appliance.

A cloud-friendly and highly scalable SaaS model for network performance monitoring splits the monitoring function from the storage and analysis functions. Monitoring is handled by lightweight software agents that export PCAP-based statistics gathered on servers and on open source proxy servers such as HAProxy and NGINX. The exported statistics are sent to a SaaS repository that scales horizontally to store unsummarized data and provides Big Data-based analytics for alerting, diagnostics, and other use cases. While host-based metric export doesn't provide the full granularity of raw PCAP, it is a highly scalable and cost-effective way to gather, retain, and analyze key performance data everywhere, and thus complements PCAP.
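
As a rough illustration of this agent-to-SaaS split, the sketch below periodically gathers the same host-level TCP counters and posts them as JSON to a collection endpoint. The endpoint URL, API key header, and reporting interval are hypothetical placeholders, not Kentik's or nProbe's actual export protocol.

```python
#!/usr/bin/env python3
"""Minimal sketch of a host-based export agent: gather local TCP counters and
ship them to a SaaS collection endpoint as JSON. The endpoint, API key, and
interval are hypothetical placeholders, not an actual vendor protocol."""
import json
import socket
import time
import urllib.request

COLLECTOR_URL = "https://collector.example.com/v1/host-metrics"  # hypothetical endpoint
API_KEY = "REPLACE_ME"                                           # hypothetical credential
INTERVAL_S = 30                                                  # illustrative interval


def read_tcp_counters():
    """Cumulative TCP counters from /proc/net/snmp (OutSegs, RetransSegs, etc.)."""
    with open("/proc/net/snmp") as f:
        lines = [line.split() for line in f if line.startswith("Tcp:")]
    header, values = lines[0][1:], lines[1][1:]
    return dict(zip(header, (int(v) for v in values)))


def export_once():
    """Read counters and POST them, tagged with hostname and timestamp."""
    payload = {
        "host": socket.gethostname(),
        "timestamp": int(time.time()),
        "tcp": read_tcp_counters(),
    }
    req = urllib.request.Request(
        COLLECTOR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "X-Api-Key": API_KEY},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    while True:
        try:
            export_once()
        except OSError as exc:
            # Keep running on transient network errors; a real agent would also buffer.
            print(f"export failed: {exc}")
        time.sleep(INTERVAL_S)
```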

Kentik offers the industry’s only Big Data-based SaaS NPM solution, integrating nProbe host agent performance metrics with billions of NetFlow, sFlow, IPFIX, and BGP records matched with geolocation data. Visit the Kentik NPM solution page for an overview, read the blog post to see how it works, or start a free trial to try it yourself.