How flow protocols adapt as network needs evolve

SpyglassNetwork flow records today are an integral part of the network operations landscape. But while NetFlow and its variants like IPFIX and sFlow are similar overall, beneath the surface there are significant differences in the way the protocols are structured, how they operate, and the types of information they can provide. These variations reflect the different histories and intent of the protocols, particularly how they approached the issues of extensibility and interoperability. NetFlow’s answer is vendor-extensible flow templating, while sFlow took a somewhat different path. In this series we’ll look at the advantages and disadvantages of each, and see what clues we can uncover about where the future of flow protocols might lead.

The origins of NetFlow

NetFlow began as the byproduct of a Cisco effort to enable faster execution of Access Control Lists (ACLs). New switching technology was developed to accelerate processing by treating packets that were going to the same place or traveling on the same path as a single group. To make it work, routers were equipped to read packet headers and use field values to group the packets.

CavepersonsThe original idea was simply to define all of the packets that would be permitted or denied by an ACL as a “flow,” and then to keep track of whether each such flow is accepted or denied by the ACLs for each interface. But then the network ops and architecture folks, who at the time were mostly limited to looking at SNMP totals, asked for a mechanism by which these flow records could be exported so that they could be used for network analytics. In response to those requests, NetFlow was born.

Even at the dawn of flow analytics, people were interested in many different kinds of “behind the SNMP” views. Starting with the basic ACLs of the mid 90s, the first flow fields added were IP addresses, ports, and protocol. By NetFlow V5 (the first widely adopted version), a variety of additional fields were included.

More of a good thing

As the utility of NetFlow became more widely realized, it didn’t take long for people to start asking for more. What about capturing MAC address, or VLAN tag, or IPv6? Later, users began wanting to add MPLS and other data that routers and switches could observe. And then they started wanting to track things that routers and switches couldn’t observe, like URL, DNS query, and application and network performance.

ScaffoldingThe problem of keeping up with these expanding requirements is that NetFlow 5 was built on a fixed data structure that didn’t have a place for the extra data fields. And the process for updating the NetFlow protocols was clumsy, so extending the structure every few months for things people wanted to add was going to be a huge pain. Plus even if you could keep up you’d eventually end up with packet headers that were uber-large because you’d have fields in there for data that many devices couldn’t grab.

While NetFlow v5 remains useable, its lack of additional data types (e.g. IPv6, MAC addresses, VLAN, and MPLS) makes it more limited than other alternatives. Cisco’s response to these limits, while trying to avoid packet bloat, was to use templating to abstract the metadata from the flow data itself. Introduced by Cisco in NetFlow v9 and carried forward in IPFIX — the IETF standard that is sometimes referred to as NetFlow v10 — templates provide a less rigid basis for a collector to interpret flow records. Sent in-band from the exporter to the collector, templates make NetFlow v9 and IPFIX very flexible.

Templating pros and cons

As with any other protocol design decision, templating involves both pros and cons. In NetFlow v9, templates are each identified with a template ID. While that’s helpful it doesn’t prevent a situation in which a given template ID might be used by multiple network equipment vendors, in which case each vendor’s equipment would likely be collecting and sending a different set of NetFlow data. IPFIX addressed this issue by creating vendor IDs that can each contain their own unique set of template IDs, making it easier to avoid namespace conflicts.

Figuring out which types of flow data the collector is getting can be a slow process.

Another issue with templates is that the template packets come very infrequently, and until you get one you don’t know what your collected flow data means. If you sample incoming packets then you may miss the template packets for a while and wind up with even more data of unknown meaning. And even though the template information is really the same from one exporter to another, the protocol specifies that you’re not supposed to cache and remember templates across multiple exporting devices. The result is that figuring out which types of flow data the collector is collecting can be a slow process.
Templating can also be a challenge to implement correctly because it involves a complex multistage process. That explains why many implementations were horribly inaccurate for the first decade or more of NetFlow v9’s existence.

On the positive side, NetFlow v9 is binary and relatively compact. And it fits in UDP packets, which makes it easy to transport. So while it was originally Cisco-only, people started creating new IDs and it became the standard, widely used among many vendors. As the standards-based follow-on version, IPFIX has also gotten a lot of traction.

Next time we’ll take a look at some key differences between NetFlow and sFlow, as well as speculate a bit on where today’s trends in network data are leading flow protocols. In the meantime, a number of our other Kentik blog posts, including our recent post on The Evolution of BGP NetFlow Analysis, can help you learn more about some of the many practical applications of flow data. And please don’t hesitate to contact us if you’re interested in arranging a Kentik Detect demo or starting a free trial.