NetFlow was originally developed to help network administrators gain a better end-to-end understanding of their network traffic. Once NetFlow is enabled on a router or other network device, it tracks unidirectional packet flow statistics for TCP/IP, UDP/IP or ICMP sessions, without storing any of the payload data carried in those sessions. By tracking only the metadata about the flows, NetFlow preserves highly useful traffic analysis and troubleshooting details without requiring full packet capture, which is very I/O- and storage-intensive.
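To make the idea of metadata-only, unidirectional flow tracking concrete, here is a minimal sketch in Python. It is not how any particular exporter is implemented: the `FlowKey`, `FlowStats` and `aggregate` names are hypothetical, the key is a simplified five-tuple (real exporters typically key on additional fields), and the packet headers are made-up sample data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Simplified five-tuple identifying one direction of a conversation."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class FlowStats:
    """Per-flow counters. Note that there is no payload field."""
    packets: int = 0
    bytes: int = 0


def aggregate(packets):
    """Fold packet headers into per-flow packet and byte counters."""
    flows = defaultdict(FlowStats)
    for src_ip, dst_ip, src_port, dst_port, protocol, length in packets:
        stats = flows[FlowKey(src_ip, dst_ip, src_port, dst_port, protocol)]
        stats.packets += 1
        stats.bytes += length
    return dict(flows)


# Hypothetical packet headers: (src, dst, sport, dport, proto, length in bytes)
sample_packets = [
    ("10.0.0.1", "10.0.0.2", 51000, 443, "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", 51000, 443, "tcp", 400),
    ("10.0.0.2", "10.0.0.1", 443, 51000, "tcp", 60),  # reply is a separate flow
]
flows = aggregate(sample_packets)
```

Because flows are unidirectional, the reply packet lands in its own flow entry, so the three packets above produce two flows.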
When combined and correlated with SNMP device and interface data, BGP, and performance metrics, NetFlow can be used to monitor and diagnose a variety of network issues.
NetFlow can be used for many different network troubleshooting tasks, including:
1. Troubleshoot network congestion problems
2. Troubleshoot application performance issues
3. Defend against DDoS attacks
4. Analyze network security anomalies
NetFlow and other flow-based analysis solutions generate flow records in proportion to the number of traffic flows. Exporting a UDP-based flow record for every flow can create a lot of telemetry data, typically about 1% of operational traffic, which is significant overhead. Not all troubleshooting use cases require 1:1 flow export.
Fortunately, NetFlow, J-Flow and IPFIX provide for flow sampling, whereby exporting devices can be configured to sample 1:N flows to reduce telemetry traffic volume. For most network operations and DDoS detection/defense purposes, sampling anywhere from 1:1000 to 1:8000+ flows (depending on overall traffic volume) provides sufficient insight. For network security purposes, if NetFlow must support detailed forensics or a full audit of traffic, then 1:1 flow export is required.
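When analyzing sampled data, a common approach is to multiply the sampled counters by N to estimate the true totals. The sketch below assumes simple 1:N sampling and uses a hypothetical `estimate_totals` helper with made-up counter values. The result is an estimate only, and it is less reliable for small or short-lived flows that the sampler may miss entirely.

```python
def estimate_totals(sampled_packets, sampled_bytes, sampling_rate):
    """Scale counters observed under 1:N sampling back to estimated totals."""
    if sampling_rate < 1:
        raise ValueError("sampling_rate must be >= 1")
    return sampled_packets * sampling_rate, sampled_bytes * sampling_rate


# Hypothetical counters seen by a collector from a 1:1000 sampled exporter.
est_packets, est_bytes = estimate_totals(
    sampled_packets=250, sampled_bytes=300_000, sampling_rate=1000
)
```

With a sampling rate of 1 (that is, 1:1 export), the function returns the observed counters unchanged.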
Since different portions of the network handle different volumes and types of traffic, it is possible to sample at different rates on different exporters. For example, internet-facing edge routers or datacenter core routers that handle huge volumes of traffic can be configured for aggressive sampling (a large N in 1:N). Routers and switches at the aggregation layer, where security anomalies become apparent, don’t handle nearly as much traffic, and can be configured for 1:1 sampling to provide full granularity of insight.
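One way to picture a per-exporter sampling policy is a simple lookup from device role to sampling rate. The role names, the specific rates and the `sampling_rate_for` helper below are all illustrative assumptions, not recommendations or any vendor's configuration syntax; appropriate values depend on actual traffic volume.

```python
# Hypothetical policy: N in 1:N sampling, keyed by device role.
SAMPLING_BY_ROLE = {
    "internet_edge": 4000,    # very high volume, sample aggressively
    "datacenter_core": 2000,  # high volume
    "aggregation": 1,         # lower volume, keep every flow (1:1)
}
DEFAULT_SAMPLING = 1000  # assumed fallback for roles not listed above


def sampling_rate_for(role):
    """Return N for 1:N sampling on an exporter with the given role."""
    return SAMPLING_BY_ROLE.get(role, DEFAULT_SAMPLING)


# Hypothetical inventory of exporters and their roles.
exporters = {"edge-rtr-1": "internet_edge", "agg-sw-3": "aggregation"}
plan = {name: sampling_rate_for(role) for name, role in exporters.items()}
```

Keeping the policy in one table makes it easy to see at a glance which parts of the network retain full detail and which are sampled.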
NetFlow troubleshooting is most effective when sufficient detail is available and can be compared with other data points such as performance metrics, routing and location. Unfortunately, until recently the state of the art in NetFlow analysis tools has presented a significant challenge to troubleshooting effectiveness, due to data reduction. Even with sampling, flow records can add up to a lot of data. Since most NetFlow collectors and analysis tools are based on scale-up software architectures hosted on single servers or appliances, they have extremely limited storage, compute and memory capacity. As a result, the common practice is to roll up the details into a series of summary reports and to discard the raw flow records. The problem with this, of course, is that most of the detail needed for operationally useful troubleshooting is lost.
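The loss of detail from roll-ups can be illustrated with a small sketch. The flow records, field names and helper functions below are hypothetical. A per-source summary answers "who sent the most traffic?", but once the raw records are discarded, a follow-up question such as "on which port?" can no longer be answered.

```python
from collections import Counter

# Hypothetical raw flow records.
raw_flows = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "dst_port": 443, "bytes": 9000},
    {"src": "10.0.0.5", "dst": "192.0.2.99", "dst_port": 53, "bytes": 700000},
    {"src": "10.0.0.7", "dst": "192.0.2.10", "dst_port": 443, "bytes": 12000},
]


def rollup_by_source(flows):
    """Predefined summary: total bytes per source address only."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return dict(totals)


def bytes_by_port(flows, src):
    """Ad hoc drill-down that requires the raw records."""
    totals = Counter()
    for flow in flows:
        if flow["src"] == src:
            totals[flow["dst_port"]] += flow["bytes"]
    return dict(totals)


summary = rollup_by_source(raw_flows)        # all a roll-up retains
detail = bytes_by_port(raw_flows, "10.0.0.5")  # needs the raw flows
```

Here the summary shows that 10.0.0.5 is the top talker, but only the raw records reveal that nearly all of its traffic went to port 53.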
Cloud-scale computing and big data techniques have opened up a great opportunity to improve both the cost and functionality of NetFlow analysis and troubleshooting. The market has long embraced SaaS as a delivery model for advanced products and capabilities, and it's now possible to apply this cost-effective approach to network traffic visibility and analytics solutions.
Big data storage makes it possible to retain huge volumes of augmented raw flow records instead of rolling up the data into predefined aggregates that severely restrict analytical options. SaaS options save network managers from incurring CAPEX and OPEX costs related to dedicated, on-premises appliances. Scale-out NetFlow analysis can deliver faster response times to operational analysis queries on larger data sets than traditional appliances.
To learn more about troubleshooting with NetFlow, read about Kentik’s network troubleshooting features, and our approach to big data NetFlow analysis. Also, check out this blog post on maximizing the value of network metadata, or view the video presentations and demos of Kentik on our Networking Field Day info hub.