In part 1 of this series, we looked at the incorporation of BGP into the NetFlow protocol and the subsequent use of passive BGP sessions to capture live BGP attribute data. These two innovations enabled some valuable analysis of Internet traffic, including DDoS detection and the assessment of links for peering and transit. But because NetFlow-based visibility systems were architected around scale-up computing and storage, the full potential of NetFlow and BGP data was left unrealized. Big Data and cloud architecture changed all that.
One from many
As the public cloud has grown, scale-up computing has given way to scale-out models, which has opened up a great opportunity to improve both the cost and functionality of NetFlow and BGP analysis. In terms of cost, the scale-out, micro-services approach made popular by public cloud and SaaS vendors offers a big leap in price-performance ratio. Combining commodity bare-metal hardware with open source containers, you can create a purpose-built public cloud platform that delivers exceptional processing and storage capabilities at a cost per ingested flow record that is dramatically lower than that of commercial appliances. Running software on top of a generic public cloud (AWS or equivalent) is also an option, but the performance overhead of VMs and the commercial overhead make the costs significantly higher.
From a functionality point of view, massive scale-out computing allows augmented NetFlow records to be unified instead of fragmented. In single-server architectures, NetFlow is put in one table, BGP in another, and GeoIP in yet another, and you then have to key-match those tables and rows to correlate the data. In a distributed system with sufficient compute and memory, you can instead ingest all of the NetFlow, BGP, and GeoIP into a single time-series database. As each NetFlow record comes in, you look at its time stamp, grab the latest relevant BGP UPDATE from memory, and augment the NetFlow records with a variety of BGP attribute fields. And you do the same for GeoIP. With a scale-out cluster, this can happen in real-time for millions of inbound flow records.
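To make the ingest-time augmentation concrete, here is a minimal Python sketch, not a description of any particular product's implementation. The record fields (`timestamp`, `dst_ip`, `bytes`), the `BgpTable` class, and the GeoIP dictionary are all hypothetical names chosen for illustration. The sketch also assumes that only announcements matter (withdrawals are ignored) and uses a linear scan where a production system would likely use a prefix trie.

```python
import bisect
import ipaddress
from dataclasses import dataclass


@dataclass
class BgpUpdate:
    timestamp: float   # when the UPDATE was received
    prefix: str        # announced prefix, e.g. "203.0.113.0/24"
    as_path: tuple     # AS path, origin AS last
    next_hop: str


class BgpTable:
    """In-memory history of announcements per prefix (withdrawals ignored)."""

    def __init__(self):
        self._history = {}  # ip_network -> (sorted timestamps, updates)

    def add(self, update):
        net = ipaddress.ip_network(update.prefix)
        times, updates = self._history.setdefault(net, ([], []))
        idx = bisect.bisect_right(times, update.timestamp)
        times.insert(idx, update.timestamp)
        updates.insert(idx, update)

    def lookup(self, ip, timestamp):
        """Latest update at or before `timestamp` for the longest matching prefix."""
        addr = ipaddress.ip_address(ip)
        best_net, best_update = None, None
        # Linear scan for clarity; a real system would use a prefix trie.
        for net, (times, updates) in self._history.items():
            if net.version != addr.version or addr not in net:
                continue
            idx = bisect.bisect_right(times, timestamp)
            if idx == 0:
                continue  # prefix not yet announced at this time
            if best_net is None or net.prefixlen > best_net.prefixlen:
                best_net, best_update = net, updates[idx - 1]
        return best_update


def geo_lookup(geo_table, ip):
    """Return the country of the most specific matching network, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, country) for net, country in geo_table.items()
               if net.version == addr.version and addr in net]
    return max(matches)[1] if matches else None


def enrich_flow(flow, bgp_table, geo_table):
    """Return a copy of the flow record with BGP and GeoIP fields attached."""
    enriched = dict(flow)
    update = bgp_table.lookup(flow["dst_ip"], flow["timestamp"])
    if update is not None:
        enriched["dst_as_path"] = update.as_path
        enriched["dst_origin_as"] = update.as_path[-1]
        enriched["dst_next_hop"] = update.next_hop
    enriched["dst_country"] = geo_lookup(geo_table, flow["dst_ip"])
    return enriched


# Example data (documentation address ranges and private-use AS numbers).
bgp = BgpTable()
bgp.add(BgpUpdate(100.0, "203.0.113.0/24", (64500, 64501), "192.0.2.1"))
bgp.add(BgpUpdate(150.0, "203.0.113.0/25", (64500, 64502), "192.0.2.2"))
geo = {ipaddress.ip_network("203.0.113.0/24"): "DE"}

flow = {"timestamp": 160.0, "src_ip": "198.51.100.7",
        "dst_ip": "203.0.113.5", "bytes": 1500}
enriched = enrich_flow(flow, bgp, geo)
```

Because each flow record carries its BGP and GeoIP fields from the moment it is stored, later queries can filter and group on those fields directly, with no cross-table key matching.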
Unifying all data into augmented flow records has a substantial effect on performance. One metric that underscores the advantage is the number of rows that a system can query while still returning results in a usable amount of time. In the older, single-server model, that number ranged from hundreds to thousands of rows of data across different tables. With a scale-out approach, the number jumps to millions or even billions of rows. This leap is accomplished by splitting the data into time-based slices, distributing the processing of those slices across different nodes in the cluster, and then aggregating the results. Instead of populating a limited set of report tables and then discarding the raw source data, a scalable, distributed architecture allows you to keep the raw data and build reports on the fly.
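The slice, distribute, and aggregate pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool stands in for the nodes of a cluster, the 60-second slice width is arbitrary, and the record fields (`timestamp`, `dst_country`, `bytes`) are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

SLICE_SECONDS = 60  # assumed slice width; real systems tune this


def slice_id(timestamp):
    """Map a timestamp to the time slice that holds it."""
    return int(timestamp // SLICE_SECONDS)


def build_slices(records):
    """Partition raw records into time-based slices."""
    slices = defaultdict(list)
    for rec in records:
        slices[slice_id(rec["timestamp"])].append(rec)
    return slices


def scan_slice(records, group_key, start, end):
    """Work done by one 'node': sum bytes per group within a single slice."""
    totals = Counter()
    for rec in records:
        if start <= rec["timestamp"] < end:
            totals[rec[group_key]] += rec["bytes"]
    return totals


def query_bytes_by(slices, group_key, start, end, workers=4):
    """Fan the query out over the relevant slices, then merge partial results."""
    wanted = [slices[sid]
              for sid in range(slice_id(start), slice_id(end) + 1)
              if sid in slices]
    merged = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda recs: scan_slice(recs, group_key, start, end), wanted)
        for partial in partials:
            merged.update(partial)  # Counter.update adds counts together
    return merged


records = [
    {"timestamp": 5, "dst_country": "DE", "bytes": 100},
    {"timestamp": 30, "dst_country": "US", "bytes": 200},
    {"timestamp": 65, "dst_country": "DE", "bytes": 300},
    {"timestamp": 125, "dst_country": "DE", "bytes": 400},
]
slices = build_slices(records)
result = query_bytes_by(slices, "dst_country", start=0, end=120)
```

Only the slices that overlap the query window are scanned, and each scan is independent of the others, which is what lets the work spread across as many nodes as the cluster has.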
Real-world Big Data benefits
The leap in performance, flexibility, and detail that’s possible with a distributed Big Data architecture directly impacts NetFlow and BGP analysis use cases. Let’s take the example of DDoS detection. There is now a wide variety of available mitigation options, from remote-triggered black holes (RTBH) and FlowSpec to on-premises and cloud-based scrubbing. The most cost-effective approach for network operators is to be able to access, at scale, a vendor-neutral range of such tools. But that’s not an option in single-server scenarios, where software detection and mitigation must be tightly coupled and delivered by the same vendor.
BGP traffic analysis is another area where the distributed approach shines. Scale-up software can process a large set of BGP and NetFlow data and produce a picture of destination BGP paths according to traffic volume. However, that picture may take hours to generate. And if someone then wants to drill down on a portion of the picture to understand the “why” behind it, they’re basically out of luck, because the raw data behind the picture is no longer available to query.
Compare that to a Big Data scenario, where you have the speed and capacity to ingest and store raw flow records and to query them ad hoc. Real-time analysis of the full dataset means that the number of operationally relevant use cases explodes, because the questions you can ask are never limited to predefined reporting tables that you’ve had to populate in advance. In this approach, the number of field combinations on which you can run a query in real time is nearly unlimited. And because you can ask what you want when you want, it’s possible to enable a completely interactive, and therefore far more intuitive, presentation of BGP traffic paths.
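As a rough illustration of what "ad hoc" means here, the Python sketch below runs a top-N query over raw augmented records, grouping by any combination of fields chosen at query time. The function `top_n` and the field names (`dst_as_path`, `dst_country`, `protocol`, `bytes`) are hypothetical, and an in-memory list stands in for the distributed datastore.

```python
from collections import defaultdict


def top_n(records, group_by, where=None, metric="bytes", n=10):
    """Sum `metric` per combination of `group_by` fields, largest first.

    `where` is an optional predicate applied to each raw record, so the
    filter and the grouping are both chosen at query time.
    """
    totals = defaultdict(int)
    for rec in records:
        if where is not None and not where(rec):
            continue
        key = tuple(rec[field] for field in group_by)
        totals[key] += rec[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


records = [
    {"dst_as_path": (64500, 64501), "dst_country": "DE", "protocol": "tcp", "bytes": 1000},
    {"dst_as_path": (64500, 64502), "dst_country": "DE", "protocol": "udp", "bytes": 3000},
    {"dst_as_path": (64500, 64501), "dst_country": "US", "protocol": "tcp", "bytes": 2500},
    {"dst_as_path": (64500, 64501), "dst_country": "DE", "protocol": "tcp", "bytes": 500},
]

# First question: which destination AS paths carry the most traffic?
by_path = top_n(records, ["dst_as_path"])

# Drill-down, decided after seeing the first answer: TCP to Germany only,
# broken out by path and protocol.
german_tcp = top_n(
    records,
    ["dst_as_path", "protocol"],
    where=lambda r: r["dst_country"] == "DE" and r["protocol"] == "tcp",
)
```

Neither query needed a report table defined in advance. The second one is exactly the kind of follow-up drill-down that precomputed reports cannot answer once the raw records have been discarded.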
The difference in utility is comparable to the difference between a one-time satellite snapshot of terrain and a live camera drone that can fly in close and see any desired details. The snapshot is way better than nothing, but the drone makes it clear how limited the snapshot really is.
Evolving business with technology
By now it’s hopefully clear that the implementation of distributed, Big Data architecture has enabled a huge step forward in the evolution of NetFlow-based network visibility. There are multiple ways to take advantage of this advance, and a number of factors to consider when choosing how best to do so. For most organizations the most practical, cost-effective solution is a SaaS platform like the Kentik Network Observability Cloud.
The key point is that as the technical capabilities of NetFlow engines evolve, so too does their business utility. If you rely on Internet traffic for revenue or key productivity tools (like IaaS, PaaS, or SaaS), then your network isn’t just carrying packets, it’s carrying your actual business. Snapshots may be good enough for “nice to know” management dashboards, but as your operations grow you need deeper insight to improve the quality of your customer experience (whether internal or external) and your infrastructure ROI. That’s why it makes business sense to invest in NetFlow and BGP analysis capabilities that are based on state-of-the-art Big Data architecture.
We’d love the opportunity to show you the SaaS option. The easiest way to start? Take Kentik Detect for a test drive with a free trial.