There is a lot of confusion regarding the two primary data sets in network management: SNMP and flow. This post will help define both and provide more context on where and why to use each.
SNMP is used to collect metadata and metrics about a network device. This critical technology is a basic building block of modeling, measuring, and understanding the network. While newer technologies such as streaming telemetry (ST) generate a lot of enthusiasm, ST is unfortunately not a replacement for many aspects of SNMP (but that is a topic for another blog post!).
Flow technologies, such as NetFlow, sFlow, jFlow, IPFIX, and others, are used to describe traffic on the network. These technologies export data which describes conversations occurring across a specific network device. This includes source and destination IP addresses, port numbers, protocol, and other markings such as quality of service (QoS). Flow also incorporates a sampling configuration, which exports records for only a configurable fraction of the traffic passing through the device.
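To make the idea of a conversation record concrete, here is a minimal sketch of the fields a flow record typically carries. The class and field names are illustrative, not any vendor's exact NetFlow/IPFIX schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowRecord:
    # The classic 5-tuple identifying a conversation
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 = TCP, 17 = UDP
    # Counters and markings exported alongside the key
    byte_count: int
    packet_count: int
    tos: int = 0       # type-of-service / DSCP bits carry the QoS marking

rec = FlowRecord("10.0.0.5", "192.0.2.10", 51514, 443, 6, 12000, 14)
print(rec.src_ip, "->", rec.dst_ip, "port", rec.dst_port, rec.byte_count, "bytes")
```

Real export formats carry many more fields (interface indices, AS numbers, TCP flags), but the 5-tuple plus counters is the core of every flow technology listed above.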
Most organizations have a cloud strategy, whether it augments data centers or replaces them entirely. Regardless of the organizational goals, the network becomes increasingly important as cloud is adopted. Cloud doesn’t just put more reliance on the network; it creates new data sources which network professionals must incorporate into their network visibility strategy. To see into these new cloud environments, the major players in the cloud have introduced flow log technology (e.g., AWS VPC Flow Logs, Google VPC Flow Logs, and Azure NSG Flow Logs), which should be part of every flow strategy.
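As a sketch of what these cloud sources look like in practice, here is a parser for the default (version 2) AWS VPC Flow Logs record format. The field order follows AWS's documented default format; Google and Azure use different schemas, and the sample line below is illustrative:

```python
# Field order of the default-format (version 2) AWS VPC Flow Log record.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_vpc_flow_log(line: str) -> dict:
    """Turn one space-separated VPC Flow Log line into a dict."""
    rec = dict(zip(FIELDS, line.split()))
    # Numeric fields arrive as strings; convert the ones we analyze on
    for f in ("srcport", "dstport", "protocol", "packets", "bytes"):
        rec[f] = int(rec[f])
    return rec

line = ("2 123456789010 eni-1235b8ca 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
rec = parse_vpc_flow_log(line)
print(rec["dstaddr"], "port", rec["dstport"], rec["action"])
```

Once parsed, these records can flow into the same analytics pipeline as NetFlow or sFlow from on-premises devices, which is why they belong in a unified flow strategy.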
Flow and SNMP are used by many teams within the organization. SNMP is used primarily by operationally focused users trying to understand the key indicators that reveal issues in the health and operation of the network. This includes the devices themselves, along with the links between devices. SNMP data is also used by network engineers to troubleshoot reported problems, and by network architects for tasks such as capacity planning.
Flow technologies are used by network engineers, network architects, and security professionals to understand traffic: why congestion or heavy usage is occurring, what causes traffic changes, and how those changes can be mitigated. This could mean finding an application or network owner, or blocking malicious traffic such as a DDoS attack. Flow analytics also inform decisions about how traffic is sent to and received from other internet-connected peers via traffic engineering and optimization.
Essentially, while SNMP provides the answer to “what” is happening on the network, flow technologies answer the “where” and “who.” You need both sets of data to make intelligent decisions and optimize the entire network.
SNMP is the oldest of the network management protocols in use today. Without going into the details of implementations and the structure of SNMP itself, which have been written about extensively, we will focus on the use cases.
SNMP is used to actively query (poll) a network device to collect information about it. During discovery, this determines what kind of device it is; on an ongoing basis, it collects data such as the vendor who made the device, when it was last configured, what kind of hardware is in the device, and many runtime data points about its configuration and usage.
Aside from general metadata, you can also collect numerical data. This would be used to understand the status and usage of any particular subsystem on the device. Each time the device is polled, this data is returned and allows us to build a time series of the data. For example, you could see the CPU usage over the last 24 hours.
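The time series mentioned above is usually built from counter deltas: interface byte counters such as ifHCInOctets only ever increase, so each polled value is differenced against the previous poll. A minimal sketch, with illustrative numbers:

```python
def link_utilization(prev_octets: int, curr_octets: int,
                     interval_s: float, link_speed_bps: float) -> float:
    """Percent utilization between two successive polls of a byte
    counter such as ifHCInOctets (a monotonically increasing counter).
    Assumes no counter wrap between the two polls."""
    delta_bits = (curr_octets - prev_octets) * 8
    return 100.0 * delta_bits / (interval_s * link_speed_bps)

# Two polls 300 s (5 minutes) apart on a 1 Gbps link:
util = link_utilization(1_000_000_000, 8_500_000_000, 300, 1e9)
print(f"{util:.1f}% utilized")   # 20.0% utilized
```

Storing one such value per poll per interface is exactly how the 24-hour CPU or bandwidth graphs in monitoring tools are produced.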
The challenge with SNMP lies in choosing the polling interval, which can be as long or short as required. Most solutions poll at 1-5 minute intervals, but as you can imagine, a lot of spikes and other changes can be missed during a 5-minute interval. Remember that we aren’t collecting just one data point about the device; we could be collecting thousands of data points in each polling interval. As we increase the frequency, we add load on the device, which in turn could cause performance issues for its primary function. This is why ST was created: to stream data from the device as a more efficient way to send high-frequency data to other systems.
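The "missed spikes" problem is easy to see numerically. With made-up sample values, a short burst inside one 5-minute window barely moves the interval average that a 5-minute poll would report:

```python
# 30 ten-second throughput samples (Mbps) across one 5-minute polling
# interval: mostly quiet, with a ~30-second microburst near the end.
samples = [100] * 27 + [950, 980, 960]

avg = sum(samples) / len(samples)   # what a 5-minute poll effectively sees
peak = max(samples)                 # what actually happened on the wire

print(f"5-min average: {avg:.0f} Mbps, true peak: {peak} Mbps")
```

A counter-based 5-minute poll reports roughly 186 Mbps here, while the link briefly ran near saturation; this is the gap that shorter intervals or streaming telemetry aim to close.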
Flow technologies are critical data sources for measuring what, where, and how actual traffic passes through a device. While this can be viewed on a single device, flow becomes much more interesting when you integrate and combine this traffic data with additional sources to measure traffic moving between devices. Extending this enables an understanding of the end-to-end traffic flow and conversation. That includes adding in high-value data such as threat feeds and threat modeling, routing, topology, and other important networking information to model answers to difficult questions.
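At its simplest, that kind of enrichment is a join between flow records and an external data set. A hypothetical sketch, tagging flows whose destination appears in a threat-intelligence feed (the IPs and feed are invented for illustration):

```python
# Hypothetical threat feed: a set of known-bad destination IPs.
threat_feed = {"203.0.113.7", "198.51.100.99"}

flows = [
    {"src": "10.0.0.5", "dst": "203.0.113.7", "bytes": 4800},
    {"src": "10.0.0.6", "dst": "192.0.2.20",  "bytes": 1200},
]

# Enrich each record with a flag from the external data set.
for f in flows:
    f["threat"] = f["dst"] in threat_feed

flagged = [f for f in flows if f["threat"]]
print(len(flagged), "flagged flow(s), e.g. dst", flagged[0]["dst"])
```

Production systems do the same join at scale against routing tables, topology, and geolocation data, not just threat feeds, but the pattern is identical: annotate each flow record with context it didn't carry on export.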
Flow can also be used to understand consumption of bandwidth in a more granular manner. For example, breaking traffic down per IP, per network, and oftentimes into virtual constructs such as MPLS, VPNs, and other virtual routers allows for a much deeper understanding of the traffic.
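The per-IP breakdown is a straightforward aggregation over flow records. A minimal "top talkers" sketch with illustrative records:

```python
from collections import defaultdict

# (source IP, bytes) pairs taken from illustrative flow records.
flows = [
    ("10.0.0.5", 40_000), ("10.0.0.9", 15_000),
    ("10.0.0.5", 25_000), ("10.0.0.7", 30_000),
]

# Sum bytes per source IP to rank the heaviest senders.
bytes_by_ip = defaultdict(int)
for src_ip, byte_count in flows:
    bytes_by_ip[src_ip] += byte_count

top_talkers = sorted(bytes_by_ip.items(), key=lambda kv: kv[1], reverse=True)
print(top_talkers[0])   # ('10.0.0.5', 65000)
```

Grouping by network prefix or VPN/VRF identifier instead of source IP yields the coarser or virtual-construct views described above; only the aggregation key changes.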
Similar to the trends we are seeing on the data center side with cloud adoption, we are seeing new application architectures which are creating new network constructs. As organizations adopt microservices architectures, they are most often deploying on top of orchestration platforms built on top of Kubernetes. These include complex networking overlays such as Flannel, Calico, and about a dozen others. Kubernetes architectures also often incorporate proxies like Envoy. Kentik supports enriching flow with these other data sets. That allows for visibility into the application traffic flows all the way down to the clusters, pods, and services. This allows the network and application teams to work better together.
The final advantage of flow is that, with sampling, you can collect a very small set of data to describe high volumes of traffic. This allows flow technologies to scale much better than packet-capture solutions while still providing valuable insights into the traffic and traffic patterns on the network. While flow offers less application depth and performance detail than inspecting every packet, it can easily monitor terabits of traffic, sampled or unsampled, with far less hardware and fewer potential security issues.
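The scaling math behind sampling is simple: with 1-in-N packet sampling, each sampled packet stands in for roughly N real packets, so observed counters are multiplied back up to estimate totals. A sketch with illustrative numbers:

```python
def estimate_total_bytes(sampled_bytes: int, sampling_rate: int) -> int:
    """Scale bytes observed in sampled flow records up to an estimate
    of the total traffic, given a 1-in-N packet sampling rate."""
    return sampled_bytes * sampling_rate

# At 1-in-1000 sampling, 2.5 MB of sampled bytes implies ~2.5 GB total.
total = estimate_total_bytes(2_500_000, 1000)
print(total)   # 2500000000
```

This is why a modest collector can characterize terabits of traffic: the exporter only has to inspect and ship one packet in a thousand, and the statistics scale back up with acceptable error for trend and top-talker analysis.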