NetFlow is a protocol that was originally developed by Cisco to help network operators gain a better understanding of their network traffic conditions. Once NetFlow is enabled on a router or other network device, it tracks unidirectional statistics for each unique IP traffic flow, without storing any of the payload data carried in the flow. By tracking only flow metadata, NetFlow preserves highly useful traffic analysis and troubleshooting detail without the need for full packet capture, which is expensive and yields few incremental benefits.
NetFlow monitoring quickly evolved into commercial product offerings built around three main components:
- NetFlow exporter: A NetFlow-enabled router, switch, probe or host software agent that tracks key statistics and other information about IP packet flows and generates flow records that are encapsulated in UDP and sent to a flow collector.
- NetFlow collector: An application responsible for receiving flow record packets, ingesting the data from the flow records, pre-processing and storing flow records from one or more flow exporters.
- NetFlow analyzer: An analysis application that provides tabular, graphical and other tools and visualizations to enable network operators and engineers to analyze flow data for various use cases, including network performance monitoring, troubleshooting, identifying security threats and capacity planning.
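For concreteness, here is a minimal sketch of what a collector does with an exporter's UDP payload: decode a NetFlow v5 datagram, which consists of a 24-byte header followed by fixed 48-byte flow records, per Cisco's published v5 format. The function name and the subset of fields kept are illustrative choices, not part of any particular product.

```python
import struct
import socket

# NetFlow v5 wire formats, network byte order (per Cisco's v5 spec).
HEADER_FMT = "!HHIIIIBBH"              # 24-byte export header
RECORD_FMT = "!IIIHHIIIIHHBBBBHHBBH"   # 48-byte flow record
HEADER_LEN = struct.calcsize(HEADER_FMT)
RECORD_LEN = struct.calcsize(RECORD_FMT)

def parse_netflow_v5(datagram: bytes):
    """Decode a NetFlow v5 export datagram into a list of flow dicts."""
    (version, count, sys_uptime, unix_secs, unix_nsecs,
     flow_sequence, engine_type, engine_id, sampling) = struct.unpack(
        HEADER_FMT, datagram[:HEADER_LEN])
    if version != 5:
        raise ValueError(f"not a NetFlow v5 datagram (version={version})")
    records = []
    for i in range(count):
        off = HEADER_LEN + i * RECORD_LEN
        (src, dst, nexthop, in_if, out_if, pkts, octets, first, last,
         sport, dport, _pad1, tcp_flags, proto, tos,
         src_as, dst_as, src_mask, dst_mask, _pad2) = struct.unpack(
            RECORD_FMT, datagram[off:off + RECORD_LEN])
        records.append({
            "src": socket.inet_ntoa(struct.pack("!I", src)),
            "dst": socket.inet_ntoa(struct.pack("!I", dst)),
            "sport": sport, "dport": dport,
            "proto": proto, "pkts": pkts, "octets": octets,
        })
    return records
```

A real collector would wrap this in a UDP socket loop and also handle v9/IPFIX templates, which use variable-length, template-described records rather than this fixed layout.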
Cisco started by providing NetFlow exporter functions in their various network products running Cisco’s IOS software. Cisco has since developed a vertical ecosystem of NetFlow partners who have mainly focused on developing NetFlow collector and analysis applications to fill various network monitoring functions.
In addition to Cisco, other networking equipment vendors have developed NetFlow-like or compatible protocols, such as J-Flow from Juniper Networks or sFlow from InMon, to create exporter interoperability with third-party collector and analysis application vendors that also support NetFlow, creating a horizontal ecosystem across networking vendors.
The IETF also created a standard flow protocol format called IPFIX. Based on Cisco's NetFlow v9, IPFIX now serves as an open, industry-driven standard, so flow protocols can be enhanced consistently for the entire networking industry rather than through Cisco evolving NetFlow unilaterally.
NetFlow collector and analysis applications represent two key capabilities of NetFlow network monitoring products that are typically implemented on the same server. This is appropriate when the volume of flow data being generated by exporters is relatively low and localized. In cases where flow data generation is high or where sources are geographically dispersed, the collector function can be run on separate and geographically distributed servers (such as rackmount server appliances). In these cases, collectors then synchronize their data to a centralized analyzer server.
Products that support NetFlow components can be classified as follows (with example vendor products listed in each category):
- NetFlow exporter support in a device:
- Cisco 10000 and 7200 routers
- Cisco Catalyst switches
- Juniper MX and PTX series routers (via IPFIX)
- Alcatel-Lucent routers
- Huawei routers
- Enterasys switches
- Flowmon probes
- Linux devices
- VMware servers
- Stand-alone NetFlow collector:
- SevOne NetFlow Collector
- NetFlow Optimizer (NetFlow Logic)
- Stand-alone NetFlow analyzer:
- SolarWinds NetFlow Traffic Analyzer (NTA)
- PRTG Network Monitor
- ManageEngine NetFlow Analyzer
- Bundled NetFlow collector and analyzer:
- Arbor Networks PeakFlow
- Plixer Scrutinizer
- Open source NetFlow network monitoring:
- nfdump and NfSen
- pmacct
- Network monitoring products that focus on machine and probe data:
- Sumo Logic
- Cisco Tetration
Open source network monitoring tools can seem like an attractive option, but they are very difficult to scale horizontally, and most treat IP addresses as nothing more than text rather than as structured network data, so prefix aggregation and correlation with routing data are not possible.
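To illustrate what treating IP addresses as structured data (rather than text) makes possible, here is a small sketch using Python's standard ipaddress module to aggregate flow bytes by /24 prefix. The flow tuples and function name are hypothetical.

```python
from collections import defaultdict
from ipaddress import ip_network

def aggregate_by_prefix(flows, prefix_len=24):
    """Sum flow bytes per IPv4 prefix instead of per literal address string."""
    totals = defaultdict(int)
    for src, nbytes in flows:
        # strict=False lets a host address be mapped onto its covering network.
        net = ip_network(f"{src}/{prefix_len}", strict=False)
        totals[str(net)] += nbytes
    return dict(totals)

flows = [("10.1.2.3", 500), ("10.1.2.77", 1500), ("10.9.0.1", 200)]
# aggregate_by_prefix(flows) -> {"10.1.2.0/24": 2000, "10.9.0.0/24": 200}
```

A string-matching tool cannot tell that 10.1.2.3 and 10.1.2.77 belong to the same /24, which is exactly the operation that prefix aggregation and routing-table correlation depend on.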
Network incident monitoring vendors like Splunk collect large volumes of machine and probe data, and many vendors in this category are seeing the value of integrating NetFlow. However, these platforms are designed primarily to deal with unstructured data like logs. Highly structured data like NetFlow contains fields whose numeric codes require translation, or correlation with other data sources, before they provide value to the end user.
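As a sketch of the kind of translation involved, the example below maps a few of NetFlow's numeric fields (IANA protocol number, destination port, SNMP ifIndex) to human-readable labels. The lookup tables, field names and interface inventory are hypothetical stand-ins for data a real collector would learn from IANA registries and SNMP polling.

```python
# Hypothetical flow record as exported: numeric codes that mean little on their own.
flow = {"proto": 6, "dport": 443, "input_if": 12}

# Lookup tables a collector would maintain: IANA protocol numbers,
# well-known ports, and interface names learned via SNMP ifDescr polling.
PROTO_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}
PORT_NAMES = {53: "dns", 80: "http", 443: "https"}
IFINDEX_NAMES = {12: "ge-0/0/1"}   # hypothetical device inventory

def enrich(rec):
    """Translate numeric NetFlow fields into analyst-friendly labels."""
    return {
        **rec,
        "proto_name": PROTO_NAMES.get(rec["proto"], str(rec["proto"])),
        "service": PORT_NAMES.get(rec["dport"], str(rec["dport"])),
        "input_name": IFINDEX_NAMES.get(rec["input_if"], f"ifIndex {rec['input_if']}"),
    }
```

Without this enrichment step, a log-oriented platform would index "6" and "12" as opaque tokens, which is why bolting NetFlow onto an unstructured-data pipeline yields limited value.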
Pushing NetFlow Limits
With DDoS attacks on the rise, NetFlow has been increasingly used to identify these threats. NetFlow is most effective for DDoS troubleshooting when sufficient flow record detail is available and can be compared with other data points such as performance metrics, routing and location. Unfortunately, until recently even state-of-the-art NetFlow analysis tools have struggled to deliver that effectiveness because of data reduction: the volume of NetFlow data can be overwhelming, reaching millions of flows per second per collector on large networks.
Since most NetFlow collectors and analysis tools are based on scale-up software architectures hosted on single servers or appliances, their storage, compute and memory capacity is severely limited. As a result, it is common practice to roll the details up into a series of summary reports and discard the raw flow records. The problem with this approach is that most of the detail needed for operationally useful troubleshooting is lost. This is particularly true for dynamic baselining, which requires scanning massive amounts of NetFlow data to establish what is normal, then looking back days, weeks or months to assess whether current conditions reflect a DDoS attack or an anomaly.
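At its core, a dynamic baseline is a statistical check: compare the current value for some key (an interface, a destination IP, a prefix) against the mean and spread of its history. Real systems use far more sophisticated models; the simple mean-plus-k-standard-deviations rule and the variance floor below are illustrative assumptions only.

```python
import statistics

def is_anomalous(history, current, k=3.0):
    """Flag a value that exceeds mean + k standard deviations of its history.

    history: past per-interval measurements for one key (e.g. pkts/sec).
    The max(..., 1.0) floor avoids zero-variance keys alerting on any change.
    """
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return current > mean + k * max(stdev, 1.0)
```

The point of the surrounding argument is the data requirement, not the math: `history` must cover days to months of raw, per-key detail, which is exactly what roll-up-and-discard architectures throw away.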
How Cloud and Big Data Improve NetFlow Analysis
Cloud-scale computing and big data techniques have opened up a great opportunity to improve both the cost and functionality of NetFlow analysis and troubleshooting use cases. These techniques include:
- Big data storage makes it possible to retain huge volumes of augmented raw flow records, instead of rolling the data up into predefined aggregates that severely restrict analytical options.
- Cloud-based SaaS options save network managers from incurring the CapEx and OpEx costs of dedicated, on-premises appliances.
- Scale-out NetFlow analysis can deliver faster response times to operational analysis queries on larger data sets than traditional appliances.
The key to solving the DDoS protection accuracy issue is big data. By using a scale-out system with far more compute and memory resources, a big data approach to DDoS protection can continuously scan network-wide data on a multi-dimensional basis without constraints.
Cloud-scale big data systems make it possible to implement a far more intelligent approach to the problem, since they are able to:
- Track and baseline millions of IP addresses across network-wide traffic, rather than being restricted to device-level traffic baselining.
- Monitor for anomalous traffic using multiple data dimensions such as the source geography of the traffic, destination IPs, and common attack ports. This allows for greater flexibility and precision in setting detection policies.
- Apply learning algorithms to automate the upkeep of detection policies to include all relevant destination IPs.
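The multi-dimensional approach above can be sketched as keying traffic counters on several fields at once, then comparing each key against its learned baseline. The record fields, baseline table and ratio threshold below are all illustrative assumptions, not any vendor's actual detection logic.

```python
from collections import defaultdict

def scan(flows, baselines, threshold=5.0):
    """Aggregate traffic along several dimensions at once, then flag any
    (dst_ip, src_country, dst_port) combination running far above baseline."""
    counters = defaultdict(int)
    for f in flows:
        counters[(f["dst"], f["src_country"], f["dport"])] += f["pkts"]
    alerts = []
    for key, pkts in counters.items():
        base = baselines.get(key, 1)  # unseen keys get a minimal baseline
        if pkts / base > threshold:
            alerts.append({"key": key, "pkts": pkts, "ratio": pkts / base})
    return alerts
```

Because the key includes source geography and destination port alongside the destination IP, the same data supports a policy like "alert on DNS traffic to this host from this region" without pre-aggregating away the dimensions needed to express it.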
Kentik Detect has the functional breadth to capture all the necessary network telemetry in a big data repository and isolate even the most obscure DDoS attacks and network events as they happen, or predict them before they occur. Network visibility using NetFlow is key to managing your network and ensuring the best possible security measures. To understand more about NetFlow, see this Kentipedia article and blog post. To see how Kentik Detect can help your organization monitor and adjust to network capacity patterns and stop DDoS threats, read this blog, request a demo or sign up for a free trial today.