Network Performance Monitoring (NPM) refers to the process of measuring, diagnosing and optimizing the service quality of a network as experienced by users. NPM requires multiple types of measurement or monitoring data on which engineers can perform diagnoses and analyses, such as:
Bandwidth: Measures the raw versus available maximum rate at which information can be transferred through various points of the network, or along a network path.
Throughput: Measures how much information is being or has been transferred.
Latency: Measures network delays from the perspective of clients, servers and applications.
Errors: Measures raw numbers and percentages of errors such as bit errors, TCP retransmissions, and out-of-order packets.
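To make the passive metrics above more concrete, here is a minimal sketch that derives throughput, latency, and an error rate from a window of observed packets. The `PacketSample` record and the `summarize` function are hypothetical names for this illustration, and the sketch assumes each sample already carries an RTT measurement (when one exists) and a retransmission flag. Bandwidth is left out because measuring available capacity usually requires active probing rather than passive observation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class PacketSample:
    """One observed TCP segment (hypothetical record format for this sketch)."""
    timestamp_s: float            # capture time, in seconds
    size_bytes: int               # bytes observed on the wire
    rtt_ms: Optional[float]       # round-trip-time sample, if this segment yielded one
    is_retransmission: bool       # True if the segment was a TCP retransmission


def summarize(samples: list[PacketSample]) -> dict[str, float]:
    """Derive throughput, latency, and error metrics from a window of samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to define a time window")
    times = [s.timestamp_s for s in samples]
    window_s = max(times) - min(times)
    if window_s <= 0:
        raise ValueError("samples must span a non-zero time window")

    # Throughput: bits transferred per second over the observed window
    # (an approximation that counts the bytes of every sample in the window).
    total_bits = 8 * sum(s.size_bytes for s in samples)
    throughput_bps = total_bits / window_s

    # Latency: mean of whatever RTT samples are available.
    rtts = [s.rtt_ms for s in samples if s.rtt_ms is not None]
    mean_rtt_ms = mean(rtts) if rtts else float("nan")

    # Errors: retransmissions as a percentage of all observed segments.
    retransmits = sum(1 for s in samples if s.is_retransmission)
    retransmit_pct = 100.0 * retransmits / len(samples)

    return {
        "throughput_bps": throughput_bps,
        "mean_rtt_ms": mean_rtt_ms,
        "retransmit_pct": retransmit_pct,
    }
```

In practice these values would be computed continuously over sliding or fixed intervals; a single window keeps the arithmetic easy to follow.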
Common NPM use cases include:
Application performance optimization – Monitor and troubleshoot performance issues for networked and distributed applications.
Datacenter traffic management – Monitor intra- and inter-datacenter performance issues. Isolate and troubleshoot infrastructure root causes.
Internet traffic management – Make efficient routing decisions by monitoring performance across hops (first, second, and third) and destination ASNs and geographies. Quickly and cost-effectively bypass network roadblocks by serving traffic from alternate PoPs or via alternate first-hop ASNs.
Cloud networking – Monitor the relative quality of IaaS and other cloud providers to guide network connectivity architecture, vendor selection, and contract negotiation.
Network change and new deployment validation – Provide instant visibility for network changes and new deployments when building or changing applications, servers, network elements, circuits, or peering/transit.
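The Internet traffic management use case comes down to comparing measured performance across alternative paths and choosing the best one. The sketch below assumes per-path statistics (95th-percentile latency and loss toward a destination, keyed by PoP and first-hop ASN) are already collected. `PathStats`, `pick_path`, and the "filter on loss, then minimize latency" policy are hypothetical illustrations, not a standard algorithm. The ASNs come from the range reserved for documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathStats:
    """Measured performance toward one destination via one PoP and first-hop ASN."""
    pop: str                 # hypothetical PoP identifier, e.g. "pop-a"
    first_hop_asn: int       # ASN of the first-hop provider used on this path
    p95_latency_ms: float    # 95th-percentile latency to the destination
    loss_pct: float          # observed packet loss, in percent


def pick_path(paths: list[PathStats], max_loss_pct: float = 1.0) -> PathStats:
    """Return the lowest-latency path whose loss is within the threshold.

    The loss threshold and the latency-first ordering are illustrative
    policy choices, not a standard.
    """
    healthy = [p for p in paths if p.loss_pct <= max_loss_pct]
    if not healthy:
        raise ValueError("no path meets the loss threshold")
    # Prefer lower latency; break ties on lower loss.
    return min(healthy, key=lambda p: (p.p95_latency_ms, p.loss_pct))


paths = [
    PathStats("pop-a", 64500, 80.0, 3.0),   # fastest, but lossy
    PathStats("pop-b", 64501, 95.0, 0.2),
    PathStats("pop-a", 64502, 110.0, 0.1),
]
best = pick_path(paths)  # skips the lossy path and selects pop-b via AS64501
```

A production system would also weigh transit cost and capacity, which is where the "cost-effectively" part of that use case comes in.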
NPM solutions have traditionally used an appliance deployment model. An appliance-based PCAP probe with one or more interfaces connects to router or switch SPAN (port mirroring) ports, or to an intervening packet broker device (such as those offered by Gigamon or Ixia). The appliance records every packet crossing the interface, first into memory and then into longer-term storage. In virtualized datacenters, virtual probes may be used, but they still depend on network links in one form or another.
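To show what "recording all packets" means on disk, the following sketch writes and reads the classic libpcap file format: a 24-byte global header followed by a 16-byte record header (timestamp, captured length, original length) plus the captured bytes for every packet. It assumes the little-endian, microsecond-resolution variant and ignores the newer pcapng format. `write_pcap` and `read_pcap` are hypothetical helpers, and the dummy byte strings stand in for real Ethernet frames.

```python
import struct
from typing import Iterator, NamedTuple

# Classic libpcap file format (not pcapng), written little-endian with
# microsecond timestamps.
PCAP_MAGIC = 0xA1B2C3D4
GLOBAL_HDR = struct.Struct("<IHHiIII")  # magic, version major/minor, thiszone, sigfigs, snaplen, linktype
RECORD_HDR = struct.Struct("<IIII")     # ts_sec, ts_usec, incl_len (captured), orig_len (on the wire)
LINKTYPE_ETHERNET = 1


class PcapRecord(NamedTuple):
    ts: float            # capture timestamp, in seconds
    captured_len: int    # bytes actually stored (at most snaplen)
    original_len: int    # length of the packet on the wire
    data: bytes


def write_pcap(packets: list[tuple[int, bytes]], snaplen: int = 65535) -> bytes:
    """Serialize (timestamp_us, frame) pairs the way a capture probe stores them."""
    out = bytearray(GLOBAL_HDR.pack(PCAP_MAGIC, 2, 4, 0, 0, snaplen, LINKTYPE_ETHERNET))
    for ts_us, frame in packets:
        captured = frame[:snaplen]  # frames longer than snaplen are truncated
        sec, usec = divmod(ts_us, 1_000_000)
        out += RECORD_HDR.pack(sec, usec, len(captured), len(frame))
        out += captured
    return bytes(out)


def read_pcap(blob: bytes) -> Iterator[PcapRecord]:
    """Iterate over the per-packet records that follow the global header."""
    magic = GLOBAL_HDR.unpack_from(blob, 0)[0]
    if magic != PCAP_MAGIC:
        raise ValueError("not a little-endian, microsecond-resolution pcap file")
    offset = GLOBAL_HDR.size
    while offset < len(blob):
        sec, usec, incl_len, orig_len = RECORD_HDR.unpack_from(blob, offset)
        offset += RECORD_HDR.size
        yield PcapRecord(sec + usec / 1e6, incl_len, orig_len, blob[offset:offset + incl_len])
        offset += incl_len
```

Because every packet contributes its header plus up to `snaplen` bytes, storage grows with traffic volume. This is part of why full packet capture is typically confined to a few points in the network.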
Physical and virtual appliances are costly from a hardware and (in the case of commercial solutions) software licensing point of view. As a result, in most cases it is only fiscally feasible to deploy PCAP probes at a few selected points in the network. In addition, the appliance deployment model was developed based on pre-cloud assumptions of centralized datacenters holding relatively monolithic application instances. As cloud and distributed application models have proliferated, the appliance model for packet capture has become less feasible, because many cloud hosting environments offer no way to deploy even a virtual appliance.
A cloud-friendly and highly scalable SaaS model for network performance monitoring splits the monitoring function from the storage and analysis functions. Monitoring is performed by lightweight software agents that export PCAP-based statistics gathered on servers and on open source proxy servers such as HAProxy and NGINX. Exported statistics are sent to a SaaS repository that scales horizontally to store unsummarized data and provides Big Data-based analytics for alerting, diagnostics and other use cases. While host-based performance metric export doesn’t provide the full granularity of raw PCAP, it provides a highly scalable and cost-effective method for ubiquitously gathering, retaining and analyzing key performance data, and thus complements PCAP.
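The agent side of this split can be sketched as a per-flow aggregator. Instead of storing raw packets, it accumulates counters per 5-tuple and exports one compact record per flow per interval. `FlowAggregator`, its method names, and the JSON record schema are hypothetical. They do not reflect nProbe's or any vendor's actual export format, and sending the records to the SaaS repository is left out.

```python
import json
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical 5-tuple flow key: (src_ip, dst_ip, src_port, dst_port, protocol).
FlowKey = tuple[str, str, int, int, str]


@dataclass
class FlowStats:
    packets: int = 0
    bytes: int = 0
    retransmits: int = 0
    rtt_ms: list[float] = field(default_factory=list)


class FlowAggregator:
    """Summarizes observed packets per flow and exports compact records."""

    def __init__(self) -> None:
        self._flows: defaultdict[FlowKey, FlowStats] = defaultdict(FlowStats)

    def observe(self, key: FlowKey, size_bytes: int,
                rtt_ms: Optional[float] = None, is_retransmission: bool = False) -> None:
        stats = self._flows[key]
        stats.packets += 1
        stats.bytes += size_bytes
        if is_retransmission:
            stats.retransmits += 1
        if rtt_ms is not None:
            stats.rtt_ms.append(rtt_ms)

    def export(self, interval_start_s: int) -> list[str]:
        """Return one JSON line per flow for this interval, then reset state."""
        lines = []
        for (src, dst, sport, dport, proto), s in sorted(self._flows.items()):
            record = {
                "interval_start": interval_start_s,
                "src": src, "dst": dst, "sport": sport, "dport": dport, "proto": proto,
                "packets": s.packets,
                "bytes": s.bytes,
                "retransmits": s.retransmits,
                "rtt_ms_avg": sum(s.rtt_ms) / len(s.rtt_ms) if s.rtt_ms else None,
            }
            lines.append(json.dumps(record, sort_keys=True))
        self._flows.clear()
        return lines
```

However many packets a flow carries, it produces a single small record per interval. This is the trade-off the paragraph describes: less granularity than raw PCAP, but cheap enough to gather everywhere and retain unsummarized in a central store.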
Kentik offers the industry’s only Big Data-based, SaaS NPM solution that integrates nProbe host agent performance metrics and billions of NetFlow, sFlow, IPFIX, and BGP records matched with geolocation data.