Understanding Cloud IPFIX Analysis: A Tutorial
What is Cloud IPFIX Analysis?
Cloud-based IPFIX analysis deploys the key elements of an IPFIX analysis solution in public or private cloud environments. There are two major variants of cloud-based IPFIX analysis deployments: virtualized appliances and multi-tenant SaaS.
The first form of cloud IPFIX analysis deploys virtualized versions of classic IPFIX analysis elements, such as IPFIX collector and analysis appliances. In this model, when offered as a commercial service, separate virtual appliances are deployed in a cloud datacenter to handle each customer network. This model has clear advantages over hardware appliances: it leverages commodity private or public cloud infrastructure, and the number of collectors or other virtual appliances can be changed to meet demand without the need to rack and cable new hardware.
The limitation of this type of cloud-based IPFIX analysis is that it doesn’t actually harness the scale-out potential of true cloud computing architecture. Because scale-up appliance architectures offer relatively poor cost-performance, virtualized appliances in practice hit much the same performance, storage, and analytical speed constraints as older hardware appliances.
Cloud-Scale IPFIX Analysis
A second variant of cloud-based IPFIX analysis is one where the design is based on scale-out cloud computing principles. In this model, a cluster of computing and storage resources can scale out along multiple dimensions. Servers are allocated to ingest IPFIX records, augment them with other related network data such as BGP and GeoIP, store them for processing, and support queries from analysis application clients. This approach ensures that capacity can be scaled flexibly to meet stringent performance requirements even as ingest and query rates grow significantly.
Cloud-scale computing has opened up a great opportunity to improve both the cost and functionality of IPFIX analysis. From a functionality point of view, massive scale-out computing allows for the storage of huge volumes of augmented raw flow records instead of needing to roll up the data to predefined aggregates that severely restrict analytical options.
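To illustrate why predefined aggregates restrict analysis, here is a minimal Python sketch. The record fields (src_ip, dst_ip, dst_port, bytes), the sample values, and the rollup helper are all hypothetical and chosen for illustration only; they are not taken from any particular product.

```python
from collections import Counter

# Hypothetical raw flow records (addresses from documentation ranges).
flows = [
    {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.9", "dst_port": 443, "bytes": 1200},
    {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.9", "dst_port": 53, "bytes": 300},
    {"src_ip": "198.51.100.8", "dst_ip": "203.0.113.9", "dst_port": 443, "bytes": 500},
]


def rollup(records, keys):
    """Sum bytes per unique combination of the given key fields."""
    totals = Counter()
    for record in records:
        totals[tuple(record[k] for k in keys)] += record["bytes"]
    return dict(totals)


# A predefined aggregate: bytes per source IP. Once only this is stored,
# the destination port is gone and per-port questions cannot be answered.
by_src = rollup(flows, ["src_ip"])

# With the raw records retained, a different breakdown is still possible.
by_port = rollup(flows, ["dst_port"])
```

The per-source rollup cannot be turned into the per-port view after the fact, because the port dimension was discarded when the aggregate was built. Keeping the raw records leaves both questions answerable.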
In a distributed scale-out model with sufficient compute and memory, IPFIX and other network telemetry such as BGP and GeoIP can all be ingested into a single time-series database. As each IPFIX record comes in, the system can look at its timestamp, grab the latest relevant BGP update from memory, and augment the record with a variety of BGP attribute fields. With a cloud-scale cluster, this can happen in real time for tens of millions of inbound flow records.
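The enrichment step described above can be sketched roughly as follows. This is a simplified, single-process illustration, not how any specific system implements it. The class names (RouteHistory, BgpTable), the field names (ts, dst_ip, origin_as, as_path), and the sample prefixes and AS numbers are assumptions made for the example. Longest-prefix matching is done with a linear scan for brevity; a production system would more likely use a trie or similar structure.

```python
import bisect
import ipaddress


class RouteHistory:
    """Time-ordered BGP updates for one prefix (hypothetical structure)."""

    def __init__(self):
        self.times = []  # update timestamps, kept in ascending order
        self.attrs = []  # attribute dict for each update, same order

    def add(self, ts, attributes):
        i = bisect.bisect_right(self.times, ts)
        self.times.insert(i, ts)
        self.attrs.insert(i, attributes)

    def latest_at(self, ts):
        """Return the most recent update at or before ts, or None."""
        i = bisect.bisect_right(self.times, ts)
        return self.attrs[i - 1] if i else None


class BgpTable:
    """In-memory map from prefix to its update history (hypothetical)."""

    def __init__(self):
        self.routes = {}

    def update(self, prefix, ts, **attributes):
        net = ipaddress.ip_network(prefix)
        self.routes.setdefault(net, RouteHistory()).add(ts, attributes)

    def lookup(self, ip, ts):
        """Longest-prefix match among routes known at time ts (linear scan)."""
        addr = ipaddress.ip_address(ip)
        best = None
        for net, history in self.routes.items():
            if addr.version != net.version or addr not in net:
                continue
            attrs = history.latest_at(ts)
            if attrs is not None and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, attrs)
        return best


def enrich(flow, table):
    """Return a copy of the flow record with BGP fields for its destination."""
    out = dict(flow)
    match = table.lookup(flow["dst_ip"], flow["ts"])
    if match:
        net, attrs = match
        out["dst_prefix"] = str(net)
        out["dst_origin_as"] = attrs["origin_as"]
        out["dst_as_path"] = attrs["as_path"]
    return out


table = BgpTable()
table.update("203.0.113.0/24", ts=100, origin_as=64500, as_path=[64496, 64500])
table.update("203.0.113.0/24", ts=200, origin_as=64501, as_path=[64496, 64501])

flow = {"ts": 150, "src_ip": "198.51.100.7", "dst_ip": "203.0.113.9", "bytes": 1200}
enriched = enrich(flow, table)
```

Here the flow at timestamp 150 is tagged with the update announced at 100, because the update at 200 had not yet arrived. Keying the lookup on the flow's own timestamp is what keeps the stored record consistent with the routing state at the time the traffic was seen.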
Near-instant query response is key to the operational utility of IPFIX analysis. Engineers need to be able to look at aggregate traffic volume across a multi-terabit network, drill down to individual IPs and conversations, pivot views, and filter and segment network traffic in any combination on the fly, so they can isolate the information they need to make effective decisions.
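As a rough illustration of that workflow, the sketch below runs three successive queries over a handful of in-memory records: a network-wide total, a top-destination breakdown, and a filtered per-conversation view. The query function, its parameters (where, group_by, top), and the record fields are hypothetical. A real system would run equivalent queries against a distributed datastore rather than a Python list.

```python
from collections import defaultdict


def query(records, where=lambda r: True, group_by=(), top=None):
    """Sum bytes over records matching `where`, grouped by `group_by` fields.

    Returns (group_key, total_bytes) pairs sorted by bytes, largest first.
    """
    totals = defaultdict(int)
    for record in records:
        if where(record):
            totals[tuple(record[k] for k in group_by)] += record["bytes"]
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top] if top is not None else ranked


# Hypothetical enriched flow records (addresses from documentation ranges).
flows = [
    {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.9", "dst_port": 443, "bytes": 1200},
    {"src_ip": "198.51.100.7", "dst_ip": "203.0.113.10", "dst_port": 443, "bytes": 800},
    {"src_ip": "198.51.100.8", "dst_ip": "203.0.113.9", "dst_port": 53, "bytes": 300},
    {"src_ip": "198.51.100.8", "dst_ip": "203.0.113.9", "dst_port": 443, "bytes": 500},
]

# Step 1: aggregate volume across everything.
total = query(flows)

# Step 2: pivot to destinations and keep the busiest one.
top_dst = query(flows, group_by=("dst_ip",), top=1)

# Step 3: drill down to conversations with that destination on port 443.
conversations = query(
    flows,
    where=lambda r: r["dst_ip"] == "203.0.113.9" and r["dst_port"] == 443,
    group_by=("src_ip", "dst_ip"),
)
```

Each step picks its filter and grouping at query time, which is only possible because the underlying records still carry every dimension.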
This type of performance requires a post-Hadoop big data approach more akin to that offered by Google Dremel/BigQuery. Many big data approaches to IPFIX analysis use Hadoop technology to store and query IPFIX records using MapReduce. The key issues with Hadoop/MapReduce are the slow speed and fragility of creating data cubes to support responsive queries.
A ground-up cloud-scale IPFIX analysis design meets the scale, flexibility and response time needs of network operators. For example, a cloud-scale architecture can make data available for querying within a few seconds of ingest, as well as deliver answers to multi-dimensional, custom-filtered traffic analysis queries in a few seconds.
To get other perspectives and details on cloud-based NetFlow, sFlow and IPFIX analysis see…