Looking beyond the NetFlow-sFlow divide
In part 1 of this series, we looked at the origins of NetFlow, why it was extended in v9 through the use of templating, and what some of the pros and cons are of the templating approach. While NetFlow v9 and its follow-on protocol IPFIX offer tremendous flexibility, there are some tradeoffs, including implementation complexity and the fact that a template must be received before the underlying flow data records can be correctly interpreted. These factors led to the development of a NetFlow/IPFIX alternative called sFlow®. In this post we’ll look at how sFlow works compared to NetFlow, and then consider where flow data protocols are headed next.
The sFlow difference
sFlow, which has been available in switches and routers since 2001, is the brainchild of InMon Corporation, whose continued control over the protocol is both benevolent and absolute. Instead of the templating approach taken in NetFlow v9 and IPFIX, sFlow employs the related concept of protocol extensions. These extensions are predefined and optional; the ones you support are compiled into the code of your sFlow library or your device. Unlike NetFlow/IPFIX, any given sFlow implementation supports only one fixed set of data types, so you don’t need to wait for a template before you can begin processing the flow data packets.
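The practical effect of a fixed record layout can be sketched in a few lines of Python. This is a simplified, hypothetical layout, not the real sFlow v5 wire format; the point is that a decoder whose field layout is compiled in can parse the very first datagram it receives, with no template exchange.

```python
import struct

# Hypothetical fixed record layout: every decoder compiles in the same
# field order, so no template handshake is needed before decoding.
RECORD = struct.Struct("!IIHHB3x")  # src IP, dst IP, src port, dst port, proto, pad

def decode(buf: bytes) -> dict:
    src, dst, sport, dport, proto = RECORD.unpack(buf)
    return {"src": src, "dst": dst, "sport": sport, "dport": dport, "proto": proto}

# The first record off the wire is immediately usable.
raw = RECORD.pack(0x0A000001, 0x0A000002, 443, 51234, 6)
print(decode(raw))
```

A templated protocol, by contrast, would have to buffer or drop `raw` until the matching template arrived.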
sFlow also differs from NetFlow/IPFIX in the way that flow records are generated. Routers and switches running NetFlow/IPFIX designate a collection of packets as a flow by tracking packets, typically grouping those that share the same protocol, source and destination IP addresses, and port numbers. This tracking requires CPU and memory — in some circumstances, a huge amount of it. For example, with a forged source-address DDoS attack, every packet can be a flow, and routers have to try to maintain massive tables on the fly to track those flows! Also, to cut down on CPU and network bandwidth, flows are usually only “exported” on average every 10 seconds to a few minutes. This can result in very bursty traffic on sub-minute time scales.
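A minimal sketch of that flow-cache behavior makes the DDoS problem concrete. The cache below is hypothetical and greatly simplified (real exporters also track timestamps, TCP flags, and expiry timers), but it shows why forged-source traffic blows up the table: every spoofed packet creates a new 5-tuple key.

```python
import random
from collections import defaultdict

# NetFlow-style flow cache: packets sharing a 5-tuple aggregate into one entry.
flows = defaultdict(lambda: {"packets": 0, "bytes": 0})

def observe(src, dst, proto, sport, dport, size):
    key = (src, dst, proto, sport, dport)
    flows[key]["packets"] += 1
    flows[key]["bytes"] += size

random.seed(0)

# Normal traffic: a thousand packets collapse into a single flow entry.
for _ in range(1000):
    observe("10.0.0.1", "10.0.0.2", 6, 51234, 443, 1500)

# Forged-source flood: nearly every packet is a distinct flow, so the
# cache grows linearly with the attack rate.
for i in range(1000):
    observe(f"198.51.100.{i % 256}", "10.0.0.2", 17,
            random.randint(1024, 65535), 53, 64)

print(len(flows))  # one entry for the normal traffic, ~1000 for the flood
```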
sFlow, on the other hand, is based on interface counters and flow samples created by the network management software of each router or switch. The counters and packet samples are combined into “sFlow datagrams” that are sent across the network to an sFlow collector. The preparation of sFlow datagrams doesn’t require aggregation and the datagrams are streamed as soon as they are prepared. So while NetFlow can be described as observing traffic patterns (“How many buses went from here to there?”), with sFlow you’re just taking snapshots of whatever cars or buses happen to be going by at that particular moment. That takes less work, meaning that the memory and CPU requirements for sFlow are less than for NetFlow/IPFIX.
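The sampling approach can be sketched just as simply. This assumes uniform random 1-in-N sampling as a stand-in for an sFlow agent's counter-based sampling; note that no per-flow state is kept, and totals are estimated by scaling the samples back up.

```python
import random

SAMPLE_RATE = 1024  # keep roughly 1 in N packets, as a sampling agent would

def sample_stream(packet_sizes, rate=SAMPLE_RATE):
    """Sample packets with no per-flow state, then scale up to estimate totals."""
    samples = [s for s in packet_sizes if random.randrange(rate) == 0]
    # Each sample stands in for `rate` packets.
    est_packets = len(samples) * rate
    est_bytes = sum(samples) * rate
    return est_packets, est_bytes

random.seed(1)
traffic = [1500] * 1_000_000  # one million 1500-byte packets
pkts, byts = sample_stream(traffic)
print(pkts, byts)  # estimates close to 1,000,000 packets / 1.5 GB
```

The tradeoff is accuracy for small flows versus constant, predictable resource usage, which is exactly the snapshot-versus-bookkeeping distinction described above.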
Like NetFlow/IPFIX, sFlow is extensible and binary, but unlike NetFlow you can’t add or change data types independently by changing your own template, because InMon ultimately controls what can and can’t be done. On the upside, sFlow gives you faster feedback and better accuracy than many NetFlow implementations, and it is relatively sophisticated. From a coding point of view, it’s not trivial to implement, but it’s easier than implementing templated flow.
Flow for the future
Flow records incorporate the standard attributes of network traffic, but it’s often overlooked that today’s flow records can also incorporate many other types of data such as application semantics and network and application performance data. To create flow records that are augmented this way you’ve got to have an extensible method of passing data. Of course you can use IPFIX or NetFlow v9, but you run into the limitations discussed earlier: relatively high CPU/memory requirements in some situations, and not being able to process records until you receive the template, which can cause delays, particularly if you are using a high sample rate (template packets may be lost in transit).
To deal with the latter problem, one can create a form of NetFlow where templates are not delivered in-band, but are instead kept somewhere globally available, allowing endpoints to retrieve the definitions. This approach would require slightly less communication and far less work to match templates to data packets. You could take well-known data types like IPv4 and IPv6 and build a fast path for them. With templates out-of-band, the protocol would also be ‘re-sample-able’ in transport.
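An out-of-band template scheme could look something like the sketch below. The registry here is a plain dict standing in for a globally available store (a shared service or config repository in practice), and the template IDs and field layouts are invented for illustration.

```python
import struct

# Hypothetical out-of-band template registry: decoders fetch definitions
# by ID instead of waiting for them to arrive in the data stream.
TEMPLATE_REGISTRY = {
    # template_id -> (struct format, field names)
    256: ("!IIH", ("src_ip", "dst_ip", "dst_port")),       # fast path: IPv4
    257: ("!16s16sH", ("src_ip6", "dst_ip6", "dst_port")), # fast path: IPv6
}

def decode_record(template_id: int, payload: bytes) -> dict:
    # No in-band template needed; the definition is looked up by ID,
    # so even the first record of a stream decodes immediately.
    fmt, names = TEMPLATE_REGISTRY[template_id]
    return dict(zip(names, struct.unpack(fmt, payload)))

rec = decode_record(256, struct.pack("!IIH", 0x0A000001, 0x0A000002, 443))
print(rec)
```

Because every record carries its template ID, any intermediary holding the registry can also re-sample or re-aggregate the stream in transit.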
That still leaves the issue of what’s the best form to use for the extended data. You might choose a simple NetFlow v5-like C structure that is extended to add the fields you like, a binary serialization format such as protobufs or Cap’n Proto, or an ASCII format such as XML or JSON. While JSON has gotten very popular, there are two issues with using it for flow data. The first is that, even when compressed, JSON is bigger than binary formats. The second and more important issue is that parsing JSON means converting ASCII text back into binary values, which makes it less efficient to process than a native binary format. Most of the data systems that natively store JSON would melt if you tried to do flow analytics with them. There is a way around this, however: put the JSON data into Apache Kafka, which can act as a data bus, carrying JSON into different systems where you can translate it into binary.
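The size gap is easy to demonstrate. For one hypothetical flow record, the sketch below compares a JSON encoding against an equivalent packed binary layout, with and without compression; the exact byte counts will vary by field names and values, but the ordering does not.

```python
import json
import struct
import zlib

# One flow record, encoded two ways.
record = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 51234,
          "dport": 443, "proto": 6, "bytes": 1500, "packets": 1}

as_json = json.dumps(record).encode()
# Equivalent packed binary: two IPv4 addresses, two ports, proto, bytes, packets.
as_binary = struct.pack("!IIHHBII", 0x0A000001, 0x0A000002, 51234, 443, 6, 1500, 1)

print(len(as_json), len(as_binary))
print(len(zlib.compress(as_json)), len(as_binary))  # compression narrows, but doesn't close, the gap
```

And the binary record decodes with a single fixed-offset unpack, while the JSON version requires a full text parse, which is the efficiency point above.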
At Kentik™, we designed the Kentik Data Engine™ (KDE) to ingest flow records in heterogeneous protocols (NetFlow v5/v9, IPFIX, and sFlow) into a single unified database. So we’re protocol agnostic, but we’ve had lots of opportunity to think about the commonalities and distinctions of the protocols. The flexibility of NetFlow/IPFIX compared to sFlow can be a huge benefit to some of our customers, but processing IPFIX is relatively expensive and we want our data layer to be as efficient as possible. Also, IPFIX can run over SCTP, a reliable transport that can be secured with DTLS, but that capability is not well supported by many exporting devices.
Given the above, we’ve given a lot of thought to how to structure and implement a protocol that transports flow metadata and allows it to be enriched with data that is only available close to the packets. As a result, we’ve developed our own flow record protocol, called KFlow™, that can be used by any Kentik customer that sends us flow records via the Kentik agent (rather than directly from routers or switches).
KFlow is based on a hybrid concept in which there are certain well-known attributes but also out-of-band templates. That way we know what every packet means as soon as we get it, but the data types are extensible. For our transport layer we use Cap’n Proto, a serialization library that provides an extensible binary format, over HTTPS. That gives us an encrypted and efficient way to feed augmented flow data to our cloud.
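To illustrate the hybrid idea (this is not the actual KFlow wire format, just an invented sketch of the pattern), the record below pairs a fixed "well-known" header that any decoder understands immediately with TLV extensions whose type IDs would be defined out-of-band.

```python
import struct

# Illustrative hybrid layout: a fixed header every decoder knows, followed
# by type-length-value extensions whose type IDs are defined out-of-band.
HEADER = struct.Struct("!IIHHB3x")  # src IP, dst IP, src port, dst port, proto

def encode(src, dst, sport, dport, proto, extensions):
    buf = HEADER.pack(src, dst, sport, dport, proto)
    for type_id, value in extensions:
        buf += struct.pack("!HH", type_id, len(value)) + value
    return buf

def decode(buf):
    fields = HEADER.unpack_from(buf, 0)      # well-known part: always decodable
    exts, off = [], HEADER.size
    while off < len(buf):                    # extensible part: skip unknown IDs
        type_id, length = struct.unpack_from("!HH", buf, off)
        exts.append((type_id, buf[off + 4: off + 4 + length]))
        off += 4 + length
    return fields, exts

msg = encode(0x0A000001, 0x0A000002, 51234, 443, 6,
             [(0x0101, b"app=https")])  # 0x0101 is a made-up extension ID
fields, exts = decode(msg)
print(fields, exts)
```

A decoder that doesn't recognize an extension ID can still use the well-known fields, which is what makes the hybrid approach degrade gracefully.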
KFlow has worked beautifully for billions of flow records per day since we introduced it in February of this year. Does the hybrid approach represent the future of flow protocols? We don’t know; the market will ultimately decide. But what does seem clear is that both flow data and flow protocols are areas that are ripe for continued innovation.
Want to find out more about KFlow, or learn how your business can benefit from network visibility that unifies NetFlow, IPFIX, and sFlow with BGP, GeoIP, and SNMP? Contact us to ask questions, to request a demo, or to start a free trial today. We’d love to hear from you.