Kentik - Network Observability

Network Performance Monitoring (NPM)

What is Network Performance Monitoring (NPM)?

Network performance monitoring (NPM) is the process of measuring, diagnosing and optimizing the service quality of a network as experienced by users. Network performance monitoring tools combine various types of network data (for example, packet data, network flow data, metrics from various types of network infrastructure devices, and synthetic tests) so that a network’s performance, availability and other important metrics can be analyzed.

NPM solutions may enable real-time, historic or even predictive analysis of a network’s performance over time. NPM solutions can also play a role in understanding the quality of end-user experience, using network performance data—especially data gathered from active, synthetic testing (in contrast to passive forms of network performance monitoring such as packet or flow data collection).

NPM requires multiple types of measurement or monitoring data on which engineers can perform diagnoses and analyses. Example categories of NPM monitoring data are:

  • Bandwidth: Measures the raw maximum rate at which information can be transferred through various points of the network, or along a network path, versus the rate actually available. (Learn more about bandwidth utilization monitoring.)

  • Throughput: Measures how much information is being or has been transferred.

  • Latency: Measures network delays from the perspective of clients, servers and applications.

  • Jitter: Measures the inconsistency of data packet arrival intervals or the variation in latency over time.

  • Errors: Measures raw numbers and percentages of errors such as bit errors, TCP retransmissions, and out-of-order packets.
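To make the latency and jitter metrics above concrete, here is a minimal sketch (in Python, with illustrative names) of how an NPM tool might derive them from a series of round-trip-time samples; jitter is computed here as the mean absolute difference between consecutive samples, one common definition:

```python
# Sketch: deriving latency and jitter KPIs from round-trip-time samples
# (milliseconds). Jitter here is the mean absolute difference between
# consecutive samples; real tools may use other definitions.

def latency_stats(rtts_ms):
    """Return (average latency, jitter) for a series of RTT samples."""
    if len(rtts_ms) < 2:
        raise ValueError("need at least two samples")
    avg = sum(rtts_ms) / len(rtts_ms)
    diffs = [abs(b - a) for a, b in zip(rtts_ms, rtts_ms[1:])]
    jitter = sum(diffs) / len(diffs)
    return avg, jitter

avg, jitter = latency_stats([20.0, 24.0, 21.0, 23.0])
print(avg, jitter)  # 22.0 3.0
```

A steady 22 ms average with low jitter suggests consistent delivery; the same average with high jitter would point to inconsistent packet arrival, which degrades real-time applications such as voice and video.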

NPM solutions are sometimes referred to as “Network Performance Monitoring and Diagnostics” (NPMD) solutions. Most notably, industry analyst firm Gartner calls this the NPMD market which it defines (in the 2020 Market Guide for Network Performance Monitoring and Diagnostics) as “tools that leverage a combination of data sources. These include network-device-generated health metrics and events; network-device-generated traffic data (e.g., flow-based data sources); and raw network packets that provide historical, real-time and predictive views into the availability and performance of the network and the application traffic running on it.”

Network Performance Monitoring Data Collection

Network performance monitoring has traditionally drawn on data from sources including SNMP polling, traffic flow record export, and packet capture (PCAP) appliances. A host monitoring agent combined with a SaaS/big data back-end model provides an additional, more cloud-friendly approach. Modern NPM solutions also provide the ability to ingest and analyze cloud flow logs created by cloud-based systems (such as AWS, Azure, Google Cloud, etc.).

SNMP Polling

SNMP is an IETF standard protocol and the most common method for gathering total bandwidth, utilization, available bandwidth, and error measurements on a per-interface basis. SNMP uses a polling-based approach via management information bases (MIBs) such as the standards-based SNMP MIB-II for TCP/IP-based networks. Typically, large networks poll only at five-minute intervals to avoid overloading the network with management data. A downside of SNMP polling is its lack of granularity: multi-minute polling intervals can mask the bursty nature of network traffic, and interface counters provide only an interface-centric view.
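The per-interface utilization figure described above is simple arithmetic over two successive counter samples. The following sketch (illustrative names, assuming a 32-bit octet counter such as ifInOctets) shows the calculation:

```python
# Sketch: estimating interface utilization from two successive SNMP
# octet-counter samples taken `interval_s` seconds apart. Assumes a
# 32-bit counter that wraps at 2**32; names are illustrative.

COUNTER32_MAX = 2 ** 32

def utilization_pct(octets_t0, octets_t1, interval_s, if_speed_bps):
    """Percent utilization of an interface over one polling interval."""
    delta = (octets_t1 - octets_t0) % COUNTER32_MAX  # tolerates one wrap
    bits = delta * 8
    return 100.0 * bits / (interval_s * if_speed_bps)

# 75 MB transferred during a 5-minute poll on a 1 Gbps link:
print(utilization_pct(0, 75_000_000, 300, 1_000_000_000))  # 0.2
```

Note how the five-minute interval averages away bursts: a link that was saturated for three seconds and idle for the rest of the poll would still report very low utilization, which is exactly the granularity problem described above.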

Traffic Flow Record Export

Traffic flow records are generated by routers, switches and dedicated software programs, which monitor key statistics for unidirectional “flows” of packets sharing specific source and destination IP addresses, protocols (TCP, UDP, ICMP), port numbers and ToS values (plus other optional criteria). When a flow ends or hits a pre-configured timer limit, statistics gathering for that flow stops and the statistics are written to a flow record, which is sent, or “exported,” to a flow collector server.
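Conceptually, the exporter is aggregating packets into buckets keyed by the classic 5-tuple. A minimal sketch of that aggregation step (with illustrative field names, not any vendor's actual record layout):

```python
# Sketch: aggregating packets into unidirectional flow records keyed by
# the 5-tuple (src IP, dst IP, protocol, src port, dst port). Field
# names and the packet-tuple shape are illustrative.

from collections import defaultdict

def build_flows(packets):
    """packets: iterable of (src_ip, dst_ip, proto, src_port, dst_port, nbytes)."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, proto, sport, dport, nbytes in packets:
        key = (src, dst, proto, sport, dport)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += nbytes
    return dict(flows)

flows = build_flows([
    ("10.0.0.1", "10.0.0.2", "TCP", 44321, 443, 1500),
    ("10.0.0.1", "10.0.0.2", "TCP", 44321, 443, 900),
    ("10.0.0.3", "10.0.0.2", "UDP", 5000, 53, 80),
])
print(len(flows))  # 2
```

The three packets collapse into two flow records, which is why flow data is far more compact than raw packet capture while still preserving who talked to whom, over what protocol, and how much.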

There are several flow export standards, including NetFlow, sFlow and IPFIX. NetFlow is the proprietary version created by Cisco and has become a de facto industry standard. sFlow and IPFIX are multi-vendor standards, the former governed by InMon and the latter specified by the Internet Engineering Task Force (IETF).

Flow records are far more voluminous than SNMP records, but provide valuable details on actual flows of traffic. The statistics from flow records can be utilized to create a picture of actual throughput. Flow information can also be used to calculate interface utilization by reference to total interface bandwidth. Furthermore, since flow data must include source and destination IP addresses, it is possible to map recorded flows to routing data such as BGP routing internet paths. This data integration is highly valuable for network performance monitoring because the network or internet path may correlate to performance problems occurring in particular networks (known as Autonomous Systems in BGP parlance) that comprise an internet path.

NetFlow records statistics based only on the packet headers—and not on any packet payload contents—so the information is metadata rather than payload data. Also, while it is possible to measure every flow, most practical network implementations use some degree of “sampling,” in which the NetFlow exporter monitors only a fraction of the traffic (for example, one packet in a thousand or more). Sampling limits the fidelity of NetFlow data, but in a large network even 1:8000 sampling is considered statistically accurate for network performance management purposes.
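Working with sampled flow data means scaling observed counts back up by the sampling rate. A quick sketch of the arithmetic (illustrative names), including the throughput estimate mentioned earlier:

```python
# Sketch: with 1:N sampling, byte counts observed in flow records are
# multiplied by the sampling rate to estimate the true traffic volume,
# and throughput is derived over the export interval.

def estimate_total_bytes(sampled_bytes, sampling_rate):
    """Scale bytes seen in sampled records up to an estimated true total."""
    return sampled_bytes * sampling_rate

def estimate_throughput_bps(sampled_bytes, sampling_rate, interval_s):
    """Estimated bits per second over the interval the records cover."""
    return estimate_total_bytes(sampled_bytes, sampling_rate) * 8 / interval_s

# 1.2 MB of sampled bytes at 1:8000 over a 60-second interval:
print(estimate_throughput_bps(1_200_000, 8000, 60))  # 1280000000.0 (~1.28 Gbps)
```

On a high-volume backbone link this scaled estimate converges on the true rate; on a low-traffic link the same 1:8000 rate yields few samples and a noisier estimate, which is the fidelity trade-off described above.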

VPC/Cloud Flow Logs

Similar to flow records generated by network infrastructure components, cloud-based applications, systems, and virtual private clouds can also export network flow data. For example, in AWS (Amazon Web Services) virtual private clouds can be configured to capture and export “VPC Flow Logs” which provide information about the IP traffic going to and from network interfaces in a given VPC.

As in NetFlow-type sampling, VPC Flow Logs record a sample of network flows sent from and received by various cloud infrastructure components (such as virtual machine instances, Kubernetes nodes, etc.) and these can be ingested by an NPM solution to provide monitoring and analytics for cloud-based networks.
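To illustrate, here is a sketch of parsing a single AWS VPC Flow Log record in the default space-separated (version 2) format; the field order follows AWS's documented default, and would need adjusting for custom log formats:

```python
# Sketch: parsing one AWS VPC Flow Log record in the default version-2
# space-separated format. Field order matches AWS's documented default;
# custom formats can add, drop, or reorder fields.

FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line):
    """Split one log line into a dict, converting numeric fields to int."""
    rec = dict(zip(FIELDS, line.split()))
    for f in ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end"):
        rec[f] = int(rec[f])
    return rec

rec = parse_flow_log(
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)
print(rec["bytes"], rec["action"])  # 4249 ACCEPT
```

Note the similarity to NetFlow: source/destination addresses and ports, protocol, packet and byte counts, and timestamps, which is why NPM platforms can fold cloud flow logs into the same analytics pipeline as traditional flow records.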

Packet Capture (PCAP) Software and Appliances

Packet capture involves the recording of every packet that passes across a particular network interface. With PCAP data, the information collected is granular, since it includes both packet headers and full payload. Since an interface will see packets going in and out, PCAP can be used to precisely measure latency between an outbound packet and its inbound response, for example. PCAP provides the richest source of network performance data.
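The latency measurement described above amounts to matching each outbound packet with its response and subtracting capture timestamps. A simplified sketch (the packet tuples are illustrative, not a real libpcap API; real matching would also use TCP sequence numbers):

```python
# Sketch: matching outbound packets to their responses on the reversed
# address/port tuple and subtracting capture timestamps. Illustrative
# only - real PCAP analysis matches on sequence numbers as well.

def response_latencies(packets):
    """packets: list of (timestamp_s, src_ip, src_port, dst_ip, dst_port)."""
    pending = {}    # outbound tuple -> send timestamp
    latencies = []
    for ts, src, sport, dst, dport in packets:
        reverse = (dst, dport, src, sport)
        if reverse in pending:                      # response to an earlier packet
            latencies.append(ts - pending.pop(reverse))
        else:
            pending[(src, sport, dst, dport)] = ts  # outbound request
    return latencies

lat = response_latencies([
    (0.000, "10.0.0.1", 40000, "10.0.0.2", 443),
    (0.012, "10.0.0.2", 443, "10.0.0.1", 40000),
])
print(lat)  # [0.012]
```

Because both directions of the conversation cross the same capture point, this kind of request/response pairing is possible without any cooperation from the endpoints, which is part of what makes PCAP so rich a data source.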

PCAP can be performed on an individual server using software utilities such as tcpdump and Wireshark. For a skilled technician, this can be a very effective way to understand network performance issues. However, since it is a manual process that requires fairly in-depth knowledge of the utilities, it is not a very scalable approach.

To improve on this manual approach, an appliance-based PCAP probe may be used. The probe has multiple interfaces connected to router or switch SPAN ports, or to an intervening packet broker device (such as those offered by Gigamon or Ixia). In some cases virtual probes can be used, but they still depend on access to network links in one form or another.

A major downside to PCAP appliances is the expense of deployment. Physical and virtual appliances are costly from a hardware and (in the case of commercial solutions) software licensing point of view. As a result, in most cases, it is only fiscally feasible to deploy PCAP probes to relatively few, selected points in the network. In addition, the appliance deployment model was developed based on pre-cloud assumptions of centralized data centers of limited scale, holding relatively monolithic application instances. 

As cloud and distributed application models have proliferated, the appliance model for packet capture is less feasible, because of the wide distribution of application components in VMs or containers, and because of the fact that in many cloud hosting environments, there is no way to deploy even a virtual appliance.

Host Agent Network Performance Monitoring

A cloud-friendly and highly scalable model for network performance management deploys lightweight host-based monitoring agents that export PCAP-based statistics gathered on servers and on open-source proxy servers such as HAProxy and NGINX. Exported statistics are sent to a SaaS repository that scales horizontally to store unsummarized data and provides big data-based analytics for alerting, diagnostics and other use cases.

While host-based performance metric export doesn’t provide the full granularity of raw PCAP, it provides a highly scalable and cost-effective method for ubiquitously gathering, retaining and analyzing key performance data, and thus complements PCAP. An example of a host-based NPM agent is Kentik’s kprobe.

Synthetic Monitoring/Synthetic Testing

Increasingly, modern Network Performance Monitoring solutions are incorporating synthetic monitoring features, which are traditionally associated with a process/market called “Digital Experience Monitoring”. In contrast to flow or packet capture (which we might characterize as passive forms of monitoring), synthetic monitoring is a means of proactively tracking the performance and health of networks, applications and services.

In the networking context, synthetic monitoring means imitating different network conditions and/or simulating differing user conditions and behaviors. Synthetic monitoring achieves this by generating different types of traffic (e.g., network, DNS, HTTP, web, etc.), sending it to a specific target (e.g., IP address, server, host, web page, etc.), measuring metrics associated with that “test” and then building KPIs using those metrics.
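As a minimal illustration of the generate-test-measure-KPI loop described above, the sketch below times TCP connections to a target and rolls repeated probe results into simple availability and latency KPIs (the function names and thresholds are illustrative, not any product's API):

```python
# Sketch: a minimal synthetic test. tcp_connect_ms runs one probe; a
# scheduler would call it repeatedly and feed results to build_kpis.
# All names are illustrative.

import socket
import time

def tcp_connect_ms(host, port, timeout=2.0):
    """One synthetic probe: TCP connect time in ms, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None

def build_kpis(samples_ms):
    """Turn raw probe results (None = failed probe) into KPIs."""
    ok = [s for s in samples_ms if s is not None]
    availability = 100.0 * len(ok) / len(samples_ms)
    avg_latency = sum(ok) / len(ok) if ok else None
    return {"availability_pct": availability, "avg_latency_ms": avg_latency}

# e.g. samples = [tcp_connect_ms("example.com", 443) for _ in range(5)]
kpis = build_kpis([12.0, 15.0, None, 13.0])
print(kpis["availability_pct"])  # 75.0
```

Real synthetic monitoring products run such tests from many distributed agents and across many traffic types (DNS, HTTP, page load, etc.), but the underlying pattern of active probes feeding KPIs is the same.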

(See also: “What is Synthetic Transaction Monitoring”.)

Other Integrations Important to NPM and Network Observability

NPM data sources are not limited to the types discussed above and may encompass many types of events, device metrics, streaming telemetry and contextual information. In this short video, Kentik CEO Avi Freedman discusses the many types of data and integrations that are important to improving network observability.

This video is a brief excerpt from “5 Problems Your Current Network Monitoring Can’t Solve (That Network Observability Can)”—you can watch the entire presentation here.

What are the Benefits of Network Performance Monitoring?

Network performance monitoring (NPM) provides numerous benefits for NetOps professionals and organizations. These benefits contribute to improved network visibility, reliability, and efficiency, ultimately leading to better user experiences and business outcomes. Key benefits of NPM include:

  • Proactive issue detection: NPM tools enable early identification of network performance problems, allowing teams to address and resolve issues before they escalate and impact end users or critical business operations.

  • Faster troubleshooting: With comprehensive visibility into network traffic and performance data, NetOps professionals can more efficiently pinpoint the root cause of network issues, reducing the time it takes to resolve problems and minimizing downtime.

  • Optimized network performance: NPM solutions provide insights into network bottlenecks, latency, and other performance-related issues, enabling organizations to optimize their networks for better performance and user experiences.

  • Capacity planning and resource allocation: By monitoring network utilization and performance trends, NPM tools help organizations make informed decisions about capacity planning, resource allocation, and infrastructure investments to meet current and future demands.

  • Enhanced security: NPM solutions can help identify unusual traffic patterns, potential DDoS attacks, and other security threats, allowing organizations to take swift action to protect their networks and data.

  • Improved end-user experience: Proactively monitoring and optimizing network performance ultimately leads to better end-user experiences, which can have a direct impact on customer satisfaction, employee productivity, and overall business success.

Network Performance Monitoring Challenges

Despite the many benefits of network performance monitoring, there are also challenges that NetOps professionals and organizations must contend with. Some of the key challenges include:

  • Complexity and scale: Modern networks are increasingly complex, spanning on-premises, cloud, hybrid and container (e.g., Kubernetes) networking environments. Managing and monitoring performance across these diverse and often large-scale networks can be challenging, particularly as organizations adopt new technologies and services.

  • Data overload: NPM solutions generate vast amounts of data from multiple sources, including flow records, packet capture, and host agents. Managing and analyzing this data to extract meaningful insights can be overwhelming, requiring the right tools and expertise.

  • Integration with other tools and systems: NPM solutions often need to integrate with various other network management tools, security systems, and IT infrastructure components. Ensuring seamless integration and data sharing between these disparate systems can be challenging.

  • Cost and resource constraints: Deploying and maintaining NPM solutions can be expensive, particularly for large-scale networks or when using appliance-based packet capture probes. Organizations must balance the need for comprehensive network visibility and performance monitoring with cost and resource constraints.

  • Keeping pace with technological advancements: As network technologies continue to evolve, NPM tools and methodologies must keep pace. NetOps professionals must stay informed about the latest developments and best practices in network performance monitoring to ensure their organizations remain agile and competitive.

Network Performance Monitoring Best Practices

To maximize the value of network performance monitoring and overcome its challenges, NetOps professionals should consider adopting the following best practices:

  • Leverage multiple data sources: Utilize a variety of data sources, including SNMP polling, traffic flow records, packet capture, host agents, and synthetic monitoring to gain a comprehensive view of network performance.

  • Establish performance baselines: Determine baseline performance metrics for your network to quickly identify deviations and potential issues. Regularly review and update these baselines as your network evolves.

  • Implement proactive monitoring and alerting: Set up proactive monitoring and alerting to detect performance issues before they impact end users or business operations. Fine-tune alert thresholds to minimize false positives and ensure timely notifications.

  • Invest in scalable, cloud-friendly NPM solutions: Choose NPM tools that can scale to meet the needs of your growing network and are compatible with cloud-based and hybrid environments.

  • Continuously review and optimize: Network performance monitoring should be an ongoing, iterative process. As the network environment changes, network administrators should continuously review and adjust monitoring strategies, baselines, and optimization efforts to ensure that the network remains in optimal condition.
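The baselining and alerting practices above can be as simple as flagging measurements that deviate too far from historical norms. A sketch of one common approach (the three-standard-deviation threshold is an illustrative assumption; production alerting is considerably more sophisticated):

```python
# Sketch: a simple baseline check - flag a new measurement that falls
# more than k standard deviations from the mean of recent history.
# The k=3.0 threshold is an illustrative assumption.

import statistics

def is_anomalous(history, value, k=3.0):
    """True if `value` deviates more than k std-devs from the baseline."""
    baseline = statistics.mean(history)
    spread = statistics.stdev(history)
    return abs(value - baseline) > k * spread

history_ms = [20, 21, 19, 22, 20, 21, 19, 20]  # recent latency samples
print(is_anomalous(history_ms, 45))  # True
print(is_anomalous(history_ms, 21))  # False
```

Recomputing the baseline over a sliding window lets the threshold evolve with the network, which is the "regularly review and update these baselines" practice in code form.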

About Kentik’s NPM Solution

Kentik offers the industry’s only big data-based, SaaS network observability solution that integrates network agent performance metrics with billions of NetFlow, sFlow, IPFIX, cloud flow log, and BGP records matched with geolocation and other forms of enrichment data. Kentik’s solution also incorporates synthetic monitoring features that allow for proactive monitoring of all types of networks.

Start a free trial to try it yourself or request a personalized demo.
