Approaches to Network Performance Monitoring
Network Performance Monitoring tools and software have traditionally used PCAP appliance probes with one or more interfaces that connect to router or switch SPAN ports or to an intervening packet broker device, such as those offered by Gigamon. The probe records all packets passing across the interface into memory, and ultimately into longer-term storage. In virtualized data centers, virtual probes may be used, but they too depend on network links in one form or another.
NetFlow data enables extensive, near real-time network monitoring. Flow-based analysis techniques visualize traffic patterns for individual routers and switches as well as across the whole network (as aggregate traffic or application-based views), supporting proactive problem detection, efficient troubleshooting, and rapid problem resolution. For Network Performance Monitoring (NPM) use cases, Kentik augments Kentik Detect NetFlow visibility with measurements from the Kentik kprobe software agent. Kentik kprobe inspects packets, creates augmented flow records, and sends them to Kentik Detect for continuous analytics processing in parallel with other flow data sources in the environment.
This Kentik NPM solution enables network operators to easily pivot between traffic volumetrics and performance analysis to better diagnose issues and fix the root causes. This is facilitated by measurement of key NPM metrics:
- Application and network latency – Measuring how much time it takes to get a response to a packet.
- Number and percentage of out of order packets – This is an important measure because TCP can’t pass data up to applications until bytes are in the right order.
- TCP retransmits – When a portion of a network path is overloaded or having performance problems, it may drop packets. TCP ensures delivery of data by using ACKs to signal that data has been received; when an expected ACK does not arrive, the sender retransmits the data, so a rising retransmit count points to loss on the path.
- Receive and Zero-Window – Measures the TCP receive window size on the receiving host. A shrinking or zero window can indicate that the host is unable to keep up with the amount of traffic it is receiving.
- Connection ID – The TCP or UDP Connection ID for the session that the reported flow belongs to.
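To make the retransmit and out-of-order metrics above more concrete, here is a minimal sketch of how such counts could be derived from the segments observed in one direction of a single TCP connection. It is a simplified heuristic for illustration only, not a description of how kprobe computes these values; the `Segment` type and `classify_segments` function are hypothetical names, and sequence-number wraparound is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    seq: int     # TCP sequence number of the first payload byte
    length: int  # payload length in bytes


def classify_segments(segments):
    """Count retransmitted and out-of-order segments (simplified heuristic).

    Assumes all segments belong to one direction of one connection and
    that sequence numbers do not wrap around.
    """
    seen = set()          # (seq, length) ranges already observed
    highest_end = None    # highest sequence number seen so far (exclusive)
    retransmits = 0
    out_of_order = 0

    for seg in segments:
        key = (seg.seq, seg.length)
        end = seg.seq + seg.length
        if key in seen:
            # The same byte range was observed before: treat as a retransmit.
            retransmits += 1
        elif highest_end is not None and seg.seq < highest_end:
            # New bytes that start below data already seen: arrived late.
            out_of_order += 1
        seen.add(key)
        if highest_end is None or end > highest_end:
            highest_end = end

    total = len(segments)
    return {
        "total": total,
        "retransmits": retransmits,
        "out_of_order": out_of_order,
        "out_of_order_pct": 100.0 * out_of_order / total if total else 0.0,
    }


segments = [
    Segment(1000, 100),
    Segment(1100, 100),
    Segment(1300, 100),  # arrives ahead of the segment starting at 1200
    Segment(1200, 100),  # fills the gap late: out of order
    Segment(1100, 100),  # same bytes seen again: retransmit
]
stats = classify_segments(segments)
```

In this example, one of the five segments is counted as out of order and one as a retransmit. A production implementation would also need to handle partial overlaps, sequence wraparound, and both directions of the connection.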
Kentik NPM lets you see when users’ experience is suffering due to network or application performance issues, and helps you rapidly diagnose the root causes. Top use cases include:
- Application performance optimization – Troubleshoot distributed, micro-service application performance.
- Datacenter traffic management – Troubleshoot intra/inter-datacenter performance issues.
- Internet traffic management – Make efficient routing decisions by monitoring aggregate performance (loss/latency) across first, second, third hop and destination ASNs, and geographies.
- Cloud networking – Monitor relative quality of IaaS and other cloud providers as well as hybrid multi-cloud interactions.
- New deployment validation – Validate new applications and newly turned-up servers, network elements, circuits, or peering points.
- Proactive alerting & notification – Baseline performance characteristics across any dimension and alert on deviations from normal.
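The baselining idea in the last use case can be illustrated with a short sketch. It assumes a simple mean and standard deviation baseline over recent latency samples, which is only one of many possible approaches and is not presented as Kentik's alerting method; the function name `deviates_from_baseline` and the `threshold` parameter are hypothetical.

```python
from statistics import mean, stdev


def deviates_from_baseline(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        # Not enough samples to establish a baseline.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical recent latency samples, in milliseconds.
latency_ms = [20.1, 19.8, 20.5, 21.0, 19.9, 20.3]
normal = deviates_from_baseline(latency_ms, 20.7)  # close to the baseline
spike = deviates_from_baseline(latency_ms, 45.0)   # far above the baseline
```

Here the 20.7 ms sample falls within the baseline while the 45.0 ms sample is flagged. In practice the baseline would be kept per dimension (for example per destination ASN or per application) and would account for time-of-day patterns.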
Cloud & SaaS
A cloud-friendly and highly scalable model for Network Performance Monitoring combines lightweight, host-based Kentik kprobe monitoring agents with a SaaS back end. The agents export PCAP-based statistics gathered on servers and on open source proxy servers such as HAProxy and NGINX. Exported statistics are sent to Kentik’s SaaS big data repository, which scales horizontally to store unsummarized data. While host-based performance metric export doesn’t provide the full granularity of raw PCAP, it is a highly scalable and cost-effective method for ubiquitously gathering, retaining, and analyzing key network performance data.
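The trade-off between raw PCAP and exported statistics can be sketched as follows. The example condenses individual packets into per-flow records keyed by the usual 5-tuple, keeping only packet and byte counts. It is an illustrative assumption about what a host-side summary might look like, not the kprobe record format; `Packet` and `summarize_flows` are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int  # bytes on the wire


def summarize_flows(packets):
    """Aggregate packets into per-flow packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        # The 5-tuple identifies the flow a packet belongs to.
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


packets = [
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 1500),
    Packet("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", 400),
    Packet("10.0.0.3", "10.0.0.2", 40001, 80, "tcp", 600),
]
flow_records = summarize_flows(packets)
```

Three packets collapse into two flow records here. The payloads are gone, which is the loss of granularity noted above, but the records are far smaller to export and retain than the packets themselves.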