Understanding Cloud Network Performance Monitoring: A Tutorial
What is Cloud Network Performance Monitoring?
The rapid growth of the Internet and the proliferation of cloud services that complement traditional enterprise data centers have increased the complexity of network traffic flow patterns. Enterprises are not only aggressively adopting cloud infrastructure (IaaS), development platforms (PaaS), and software (SaaS), but are also creating more digital business initiatives that target end users across the Internet with offerings such as social media, ecommerce, and mobile applications.
Network Performance Monitoring (NPM) refers to the process of measuring, diagnosing and optimizing the service quality of a network as experienced by users. NPM requires multiple types of measurement or monitoring data on which engineers can perform diagnoses and analyses. Example categories of NPM monitoring data are:
- Bandwidth: Measures the raw versus available maximum rate at which information can be transferred through various points of the network, or along a network path.
- Throughput: Measures how much information is being or has been transferred.
- Latency: Measures network delays from the perspective of clients, servers and applications.
- Errors: Measures raw numbers and percentages of errors such as bit errors, TCP retransmissions, and out-of-order packets.
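To make these categories concrete, the following minimal sketch derives throughput, utilization against link bandwidth, average latency, and a retransmission percentage from one measurement interval. The `IntervalSample` record layout and the `summarize` function are hypothetical names chosen for illustration, not part of any particular NPM product.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class IntervalSample:
    """One measurement interval on a link (hypothetical record layout)."""
    seconds: float          # length of the interval
    bytes_transferred: int  # bytes observed during the interval
    rtt_ms: list            # round-trip times sampled in the interval
    packets: int            # TCP segments sent
    retransmits: int        # TCP segments retransmitted


def summarize(sample: IntervalSample, link_capacity_bps: float) -> dict:
    """Turn raw counters into the metric categories listed above."""
    # Throughput: how much information was actually transferred per second.
    throughput_bps = sample.bytes_transferred * 8 / sample.seconds
    return {
        "throughput_bps": throughput_bps,
        # Throughput relative to the link's maximum rate (bandwidth).
        "utilization": throughput_bps / link_capacity_bps,
        # Latency: mean of sampled round-trip times, None if no samples.
        "avg_latency_ms": mean(sample.rtt_ms) if sample.rtt_ms else None,
        # Errors: retransmissions as a percentage of segments sent.
        "retransmit_pct": (
            100.0 * sample.retransmits / sample.packets if sample.packets else 0.0
        ),
    }
```

Reporting both the raw counters and the derived percentages matches the list above, which calls for raw numbers as well as percentages of errors.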
Network managers face a major challenge: traditional NPM approaches and solutions are mismatched with their hybrid cloud and increasingly distributed application realities.
Pre-Cloud NPM Approaches
Current network performance monitoring approaches were architected based on pre-cloud assumptions, such as centralized data centers running monolithic applications. As a result, NPM solutions have traditionally used an appliance deployment model. An appliance-based PCAP probe with one or more interfaces connects to router or switch SPAN ports or to an intervening packet broker device (such as those offered by Gigamon or Ixia). The appliance records all packets passing across the interface into memory and then into longer-term storage. In virtualized data centers, virtual probes may be used, but they also depend on network links in one form or another.
With application components increasingly distributed across cloud environments, and users located across the Internet as well as within an enterprise WAN, traditional NPM methods of data ingestion become unusable in many cases. For example, packet capture with physical or virtual appliances has no point at which to instrument traffic in many public cloud environments.
In addition, the sheer scale of today’s cloud-based application communications and the proliferation of network traffic flows mean that appliance-based models for storage and analysis are becoming outmoded, because they are bound by pre-cloud storage and computing constraints.
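A rough back-of-the-envelope calculation illustrates the storage constraint. The sketch below estimates how many bytes a probe would need to retain per day to record every packet on a single link; the function name and the example link speed and utilization are assumptions for illustration only.

```python
SECONDS_PER_DAY = 86_400


def capture_bytes_per_day(link_bps: float, avg_utilization: float) -> float:
    """Bytes needed per day to store every packet seen on one link.

    link_bps:        link speed in bits per second (assumed input)
    avg_utilization: average fraction of the link in use, 0.0 to 1.0
    """
    bytes_per_second = link_bps * avg_utilization / 8
    return bytes_per_second * SECONDS_PER_DAY


# Example: one 10 Gbps link at an assumed 50% average utilization.
daily = capture_bytes_per_day(10e9, 0.5)
print(f"{daily / 1e12:.0f} TB per day")
```

Under those assumed figures, a single link works out to about 54 TB per day, before any retention period or additional links are taken into account.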
A cloud-friendly and highly scalable SaaS model for network performance monitoring splits the monitoring function from the storage and analysis functions. Monitoring is accomplished by deploying lightweight software agents that export PCAP-based statistics gathered on servers and on open source proxy servers such as HAProxy and NGINX. Exported statistics are sent to a SaaS repository that scales horizontally to store unsummarized data and provides Big Data-based analytics for alerting, diagnostics, and other use cases. While host-based performance metric export doesn’t provide the full granularity of raw PCAP, it provides a highly scalable and cost-effective method for ubiquitously gathering, retaining, and analyzing key performance data, and thus complements PCAP.
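The split between monitoring and storage can be sketched as follows: a host agent keeps only per-flow counters in memory and periodically serializes them for export, rather than retaining raw packets. This is a minimal illustration under stated assumptions; the `FlowAggregator` class, its field names, and the JSON record layout are hypothetical and do not describe any specific agent's export format. Sending the batch to a remote repository is deliberately left out.

```python
import json
from collections import defaultdict


class FlowAggregator:
    """Accumulates per-flow counters on a host instead of storing raw packets."""

    def __init__(self):
        # Key: (source IP, destination IP, destination port).
        self._flows = defaultdict(
            lambda: {"packets": 0, "bytes": 0, "retransmits": 0}
        )

    def observe(self, src, dst, dst_port, size, retransmit=False):
        """Update counters for one observed packet."""
        stats = self._flows[(src, dst, dst_port)]
        stats["packets"] += 1
        stats["bytes"] += size
        if retransmit:
            stats["retransmits"] += 1

    def export_batch(self, host):
        """Serialize the current counters to JSON and reset them."""
        records = [
            {"host": host, "src": src, "dst": dst, "dst_port": port, **stats}
            for (src, dst, port), stats in sorted(self._flows.items())
        ]
        self._flows.clear()
        return json.dumps(records)


agent = FlowAggregator()
agent.observe("10.0.0.1", "10.0.0.2", 443, 1500)
agent.observe("10.0.0.1", "10.0.0.2", 443, 1500, retransmit=True)
payload = agent.export_batch(host="web-01")  # JSON string ready to send
```

Because each exported record is a handful of counters per flow rather than every packet, the volume sent to the repository grows with the number of flows instead of the number of packets, which is the trade-off against raw PCAP granularity described above.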
Kentik offers the industry’s only Big Data-based, SaaS NPM solution that integrates nProbe host agent performance metrics with billions of NetFlow, sFlow, IPFIX, and BGP records matched with geolocation data. Visit the Kentik NPM solution page to get an overview, read the blog post to see how it works, or start a free trial to try it for yourself.