In a sentence, NPMD, or Network Performance Monitoring and Diagnostics, is a proactive effort to collect network telemetry that will be useful in future troubleshooting of both the end-to-end performance and the security of a network.
This short definition, however, is entirely too vague because it doesn’t include details on the types of telemetry collected, how it is evolving, the types of devices sending the data, or the processes that have matured over the last decade. Not even a brief paragraph can fully explain how NPMD is improving the collective effort toward root cause identification. Read on to learn more…
It is well understood that an optimized network is critically important to digital business operations. The push to maintain a finely tuned infrastructure is fed by a constant urge to stay competitive and grow the business. It has also accelerated the migration to the cloud and the adoption of container architectures, which in turn have introduced blind spots into the traditional hop-by-hop visibility that network professionals have come to rely on.
The insights generated by NPMD have led to a recognition that there is a need to better align the goals of network operations with those of security operations. These mutual interests include:
Whether it is a router, a switch, or another network device, it will likely be monitored from the first time it transmits a packet onto the network. It doesn’t matter whether the environment is a LAN/WAN or is built on software-defined networking (SDN) or network function virtualization (NFV) components.
One of the goals is to monitor, measure, diagnose and generate alerts for any IP address in the aforementioned environment. This includes Internet of Things (IoT), cloud-hosted services (e.g., containers), wireless endpoints, and servers/VMs. To gain insight into these devices, they must send messages on their health to a collection point.
Devices connected to the network send data in multiple formats. Most of these transmissions are standards-based or in a syntax common enough to be considered a standard such as NetFlow, IPFIX, sFlow, SNMP, Syslog, event logs, and packet capture. More recently, the JSON format has been used when sending network performance and security telemetry.
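To make the idea of multiple formats concrete, here is a minimal Python sketch of how a collection point might normalize two of them, a syslog line and a JSON telemetry message, into one common record. The `TelemetryRecord` type and the JSON field names (`device`, `severity`, `message`) are assumptions made up for this illustration, not part of any standard or product. The only standards-based piece is the syslog `<PRI>` prefix, which encodes facility × 8 + severity.

```python
import json
import re
from dataclasses import dataclass

# Standard syslog severity names, indexed by numeric severity (0-7).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


@dataclass
class TelemetryRecord:
    """Hypothetical common record used only for this sketch."""
    source: str    # which format the record came from
    device: str    # device identifier supplied by the collector or payload
    severity: str  # severity name from SEVERITIES
    message: str   # free-text message body


_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line: str, device: str) -> TelemetryRecord:
    """Decode the <PRI> prefix of a syslog line (PRI = facility * 8 + severity)."""
    match = _PRI.match(line)
    if match is None:
        raise ValueError("missing <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("PRI out of range")
    return TelemetryRecord("syslog", device, SEVERITIES[pri % 8], match.group(2).strip())


def parse_json(payload: str) -> TelemetryRecord:
    """Parse a JSON message; the field names here are assumed, not standardized."""
    data = json.loads(payload)
    return TelemetryRecord(
        "json", data["device"], data.get("severity", "info"), data["message"]
    )


if __name__ == "__main__":
    print(parse_syslog("<189>Interface Gi0/1 changed state to down", "rtr1"))
    print(parse_json('{"device": "sw2", "severity": "warning", "message": "high CPU"}'))
```

Once every source is mapped to the same shape, alerting and correlation logic can be written once rather than per format. Binary formats such as NetFlow or IPFIX would need their own decoders, which are left out here.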
Encoding data inside DNS queries, the technique behind DNS data exfiltration, is another transmission method that can carry information for both malicious and legitimate purposes. There are many others, and more methods are sure to become available. Data sources ingested by NPMD solutions may include many different types of events, device metrics, streaming telemetry, and contextual information. In this short video, Kentik CEO Avi Freedman discusses the many types of data and integrations that are important to improving network observability.
This video is a brief excerpt from “5 Problems Your Current Network Monitoring Can’t Solve (That Network Observability Can)” — you can watch the entire presentation here.
Finding the root cause of network problems has been the goal of just about every network troubleshooting tool or platform released over the last 30 years. Although the information from these systems helps form a greater context around a specific problem, in most cases, only the technician can unearth the exact source of the problem.
Seldom will network monitoring by itself tell us exactly the who, what, when, where, and why of an issue, and even more rarely the order of events that led up to the problem. NPMD, however, aims to make strides here as well. NPMD platforms aid the troubleshooter with guided diagnostic workflows. The interfaces supporting these workflows serve up the forensic data needed to lead the NetOps team methodically to an understanding of exactly how, for example, a performance degradation was introduced.
To further aid in this effort, artificial intelligence for IT operations (“AIOps”) functionality can be used to provide insight into the quality of the end-user experience or to help surface problems that might otherwise go unnoticed by the ops team. In essence, it adds more context. By studying the same network-derived performance telemetry outlined above, some vendors are delivering AI-driven insights. The ultimate root cause of most issues, however, will likely continue to be identified by a human.
As the name implies, NPMD tools can monitor, diagnose, and generate alerts for dynamic end-to-end network service delivery. An adjacent technology, DEM or Digital Experience Monitoring, focuses more on the end user. Although NPMD and DEM share a similar goal, improving performance, one focuses more on how the network is handling connections and the other on how the end user is experiencing them. There is certainly a division, albeit a blurry one at times. Consider a few differentiators:
Path: NPMD is aware of the network path from one AS (Autonomous System) to another, or from router to router, toward any destination. DEM might issue traceroutes from an end system that return the hop-by-hop route taken to a very specific destination.
Availability: NPMD might ping all devices on a network to ensure they are up and running and supporting all possible paths whereas DEM focuses largely on the availability of selected applications.
Latency and Jitter: In the past, NPMD predecessors delivered latency information (e.g., Cisco IP SLA and Medianet), but for the most part the market hasn’t found tremendous value here. DEM, on the other hand, provides more accurate latency metrics closer to the source (i.e., the application itself) and tends to be more representative of the end-user experience. Synthetic testing can be used to provide network latency, jitter, and packet loss information that complements DEM functionality as it pertains to network performance.
Holistic: NPMD is engineered to ingest data from nearly any device and gives accurate, general information about all devices on the network. DEM was developed to share very detailed performance information, but only from the end systems taking measurements and only for selected applications.
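The latency, jitter, and packet loss figures mentioned under synthetic testing can be derived from a simple series of probe results. The following Python sketch assumes a hypothetical input format: one round-trip time in milliseconds per probe, with `None` marking a lost probe. Jitter is computed here as the mean absolute difference between consecutive received round-trip times, which is one common simplification; real tools may use a smoothed estimator instead.

```python
from statistics import mean
from typing import Dict, Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize synthetic probe results.

    rtts_ms holds one round-trip time (ms) per probe sent; None means the
    probe was lost. This input shape is an assumption for the sketch.
    """
    sent = len(rtts_ms)
    received = [rtt for rtt in rtts_ms if rtt is not None]

    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Latency: average round-trip time of the probes that did come back.
    latency_ms = mean(received) if received else None

    # Jitter: mean absolute change between consecutive received RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else None

    return {"latency_ms": latency_ms, "jitter_ms": jitter_ms, "loss_pct": loss_pct}


if __name__ == "__main__":
    print(summarize_probes([10.0, 12.0, None, 11.0, 14.0]))
```

For the sample input, four of five probes return, so the loss is 20%, the average latency is 11.75 ms, and the jitter is 2.0 ms.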
Where there are challenges, there are opportunities. The NPMD space is no different. As just one example, the evolution to cloud computing and cloud (and hybrid-cloud) networking has brought new challenges in observing, monitoring, and diagnosing new types of network infrastructure—where some or all of an organization’s network capabilities and resources are hosted in a public or private cloud platform.
In response to the increasing complexity of today’s networks and the sheer volume of data collected, some vendors have begun incorporating artificial intelligence/machine learning (AI/ML) analytics. Although the logic involved in anomaly detection, event correlation, and root cause analysis (RCA) has had to change, some vendors are seeing an improvement in event detection through the use of AI/ML technologies.
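As a point of reference for what anomaly detection means at its simplest, the sketch below flags samples that deviate sharply from a rolling baseline using a z-score. This is a plain statistical illustration, not the AI/ML approach of any particular vendor; the function name, the window size, and the threshold are all assumptions chosen for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_anomalies(
    samples: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Return indexes of samples that deviate strongly from the prior window.

    Each sample is compared with the mean and standard deviation of the
    `window` samples before it. Defaults are illustrative, not tuned.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = pstdev(baseline)
        if sigma == 0:
            # A flat baseline has no spread, so any change is flagged.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Hypothetical latency samples in ms with one obvious spike.
    latency_ms = [100, 102, 98, 101, 99, 100, 250, 101]
    print(find_anomalies(latency_ms))
```

With the sample data, only the spike at index 6 is flagged. A production system would also need to handle seasonality, correlate events across devices, and suppress duplicate alerts, which is where the more advanced analytics come in.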
As of late, the emphasis appears to be on optimizing the customer experience. Some NPMD vendors are listening and providing views that are more service- and application-focused. Other vendors are incorporating DEM capabilities in the form of synthetic transaction monitoring (STM) and path awareness into their NPMD platforms.
The Kentik Network Observability Cloud offers a modern, SaaS-based approach to network performance monitoring and diagnostics, combining flow-based monitoring, cloud network observability and synthetic monitoring features that allow for proactive monitoring of all types of networks.