How Scalable Architecture Boosts DDoS Detection Accuracy
Last week’s massive attack on DNS provider Dyn — with its attendant disruption to many web companies and their users — was yet another reminder of the severity of the DDoS threat. Though spectacular exploits against Internet infrastructure providers like Dyn are just a fraction of overall attacks, complacency is not a viable option. Success in digital business requires maximum availability and performance, which depends in turn on effective, comprehensive DDoS defense.
Today’s most common approach to DDoS defense involves out-of-band detection appliances coupled with hybrid cloud mitigation. While the network environment — traffic volume, infrastructure distribution, and the size and sophistication of attacks — has evolved dramatically in recent years, the appliances themselves remain largely unchanged. Is appliance-based detection keeping up, or is it inherently limited in ways that have real consequences for network protection?
In the prevailing appliance-based model, an out-of-band appliance detects attacks based on NetFlow, sFlow, IPFIX, and BGP data. This appliance then signals the network, via network control plane or element management protocols, to either drop traffic at the network edge or redirect traffic to a private or public cloud mitigation device.
Over time, only a fraction of total traffic will need to be mitigated. So the cost-efficiency of the system is maximized by dedicating one or more appliances to detection and selectively pushing traffic to mitigation devices. The out-of-band architecture also provides the option to utilize hybrid mitigation techniques that are tailored to specific needs and objectives. These may include Remote Triggered Black Hole (RTBH), Access Control Lists (ACLs), local mitigation appliances, and cloud-based mitigation services.
The approach makes sense on its face, but the dirty secret of such appliance-based systems is that they are plagued by vexing problems with detection accuracy. These issues are rooted in the inherent compute and storage limitations of scale-up detection architectures. Legacy detection software typically runs on a single, multi-core CPU server using some Linux OS variant. When confronted with a massive volume of flow records, these servers must apply nearly all of their compute and memory resources to unpacking payload data from UDP datagrams, converting it from binary flow-protocol formats to ASCII-style data, and then storing it in a generic MySQL-style relational database (with the attendant high-latency read/write table-schema structure).
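To make the per-record cost concrete, here is a minimal sketch of the unpacking step in Python. It assumes the exporter sends NetFlow v5, whose fixed layout is a 24-byte header followed by 48-byte records. The function name and the choice of output fields are illustrative and not taken from any particular appliance.

```python
import socket
import struct

# NetFlow v5 wire layout (assumed export format for this sketch).
HEADER = struct.Struct("!HHIIIIBBH")                 # 24-byte packet header
RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")  # 48-byte flow record


def parse_netflow_v5(datagram: bytes) -> list[dict]:
    """Decode one UDP payload into a list of flow dictionaries."""
    version, count, *_ = HEADER.unpack_from(datagram, 0)
    if version != 5:
        raise ValueError(f"unsupported NetFlow version: {version}")

    flows = []
    for i in range(count):
        offset = HEADER.size + i * RECORD.size
        (src, dst, _nexthop, _in_if, _out_if, packets, octets, _first, _last,
         sport, dport, _tcp_flags, proto, _tos, _src_as, _dst_as,
         _src_mask, _dst_mask) = RECORD.unpack_from(datagram, offset)
        # Binary addresses become dotted-quad text, as a legacy pipeline
        # would do before writing rows to a relational table.
        flows.append({
            "src": socket.inet_ntoa(src),
            "dst": socket.inet_ntoa(dst),
            "sport": sport,
            "dport": dport,
            "proto": proto,
            "packets": packets,
            "bytes": octets,
        })
    return flows
```

Every record pays this decode-and-convert cost before any detection logic runs, so on a single server the work grows linearly with flow volume.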
The constraints outlined above leave traditional detection appliances with precious little memory and computing capacity to operate detection algorithms. As a result, the state of the art in appliance-based detection leans on a few possible approaches:
- Simplistic static thresholds applied broadly across all potential attack targets.
- A small pool of statically configured objects that perform baselining of IPs. Since there are so few of these, and because it is so difficult to manually change them, most network security teams end up configuring large pools of IP addresses into these objects. Since the traffic behavior towards individual IPs is lost in the averages of the IP address pool, the result is a constant stream of false negatives.
- Segmented rather than network-wide views of traffic data. Since most traditional tools rely on separate tables to track monitoring data for different purposes, there are hard limits on how wide a dataset a given table and its corresponding monitoring process can handle. That encourages the segmentation of data into more predictable buckets. Most commonly, detection tools (and NetFlow analytics tools in general) are hard-coded to segment data by flow exporter IP. As a result, if there is more than one flow exporter, any baselining is performed on a fraction of the overall network traffic, leading to inaccurate evaluation of anomalous conditions.
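A small numeric sketch shows why the last two approaches miss attacks. All addresses, traffic rates, exporter names, and thresholds below are made up for illustration, and the "more than twice the baseline" rule is just one simple stand-in for an anomaly test.

```python
def exceeds(current_mbps, baseline_mbps, factor=2.0):
    """Simple anomaly test: current traffic is more than `factor` x baseline."""
    return current_mbps > factor * baseline_mbps


# 100 hypothetical hosts, each normally receiving 10 Mbps.
baseline = {f"203.0.113.{i}": 10.0 for i in range(1, 101)}
current = dict(baseline)
current["203.0.113.42"] = 200.0  # one host jumps to 20x its normal rate

# Pooled object: the spike is averaged away (1190 Mbps vs. a 1000 Mbps baseline).
pool_alert = exceeds(sum(current.values()), sum(baseline.values()))

# Per-IP baselining: the same spike is flagged for the host under attack.
per_ip_alerts = [ip for ip in current if exceeds(current[ip], baseline[ip])]

# Segmentation by exporter: one target's traffic arrives via four routers and
# is compared against a 100 Mbps threshold.
THRESHOLD_MBPS = 100.0
per_exporter_mbps = {"edge1": 30.0, "edge2": 30.0, "edge3": 30.0, "edge4": 30.0}
segmented_alert = any(v > THRESHOLD_MBPS for v in per_exporter_mbps.values())
network_wide_alert = sum(per_exporter_mbps.values()) > THRESHOLD_MBPS

print(pool_alert, per_ip_alerts)            # pooled view vs. per-IP view
print(segmented_alert, network_wide_alert)  # per-exporter view vs. summed view
```

In both cases the attack only becomes visible when traffic is measured per destination IP and summed across the whole network.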
How do we get beyond the inaccuracies in legacy DDoS detection systems? By recognizing that DDoS is a big data problem and removing the constraints of scale-up architecture. The fact is that there are billions of traffic flow records to ingest and millions of IPs that need to be tracked individually and measured for anomalies. Is it possible to know which of them are significant? In a scale-up architecture, it isn’t.
Luckily, cloud-scale big data systems make it possible to implement a far more intelligent approach to the problem:
- Monitor the network-wide traffic level of millions of individual IP addresses.
- Monitor against multiple data dimensions. While in many cases it is sufficient to look for violations of simple traffic thresholds, for a growing number of attacks it’s becoming necessary to go beyond a single dimension and recognize the relationships between multiple indicators.
- Automatically identify and track “interesting” IP addresses by auto-learning and continuously updating a list of top-N traffic receivers. Then perform baselining and measurement to detect anomalies on any current member of that list.
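The three steps above can be sketched in a few lines of Python. This is a single-process toy under stated assumptions, not Kentik's implementation: the class name, the exponentially weighted moving average baseline, the two dimensions (bits and packets per second), and every parameter value are choices made for illustration.

```python
from collections import Counter


class TopNDetector:
    """Baselines only the current top-N receivers, across two dimensions."""

    def __init__(self, top_n=2, alpha=0.2, factor=3.0, min_samples=3):
        self.top_n = top_n              # how many receivers to track
        self.alpha = alpha              # EWMA smoothing weight
        self.factor = factor            # alert when current > factor * baseline
        self.min_samples = min_samples  # intervals to learn before alerting
        self.baseline = {}              # ip -> [bps_baseline, pps_baseline]
        self.samples = Counter()

    def observe(self, interval):
        """interval maps ip -> (bps, pps), summed network-wide for one period."""
        top = sorted(interval, key=lambda ip: interval[ip][0],
                     reverse=True)[:self.top_n]
        alerts = []
        for ip in top:
            bps, pps = interval[ip]
            base = self.baseline.get(ip)
            if base is None:            # new list member: start learning
                self.baseline[ip] = [bps, pps]
                self.samples[ip] = 1
                continue
            if self.samples[ip] >= self.min_samples:
                hit = [name for name, cur, ref in
                       (("bps", bps, base[0]), ("pps", pps, base[1]))
                       if cur > self.factor * ref]
                if hit:
                    alerts.append((ip, hit))
                    continue            # keep attack traffic out of the baseline
            base[0] += self.alpha * (bps - base[0])
            base[1] += self.alpha * (pps - base[1])
            self.samples[ip] += 1
        # Forget receivers that have dropped out of the top-N list.
        for ip in list(self.baseline):
            if ip not in top:
                del self.baseline[ip]
                self.samples.pop(ip, None)
        return alerts
```

Checking packets per second alongside bits per second lets the sketch flag a small-packet flood whose bit rate alone would look unremarkable. A production system would spread this state across many nodes rather than hold it in one process.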
This scalable, adaptive approach to monitoring and anomaly detection has been field-proven to be far more accurate than legacy approaches. One Kentik customer, PenTeleData, reports a greater than 30 percent improvement in catching and stopping DDoS attacks (i.e., fewer false negatives) since implementing the built-in detection and alerting capabilities of Kentik Detect. For more detail, read our PenTeleData case study.
The big data approach that Kentik uses to deliver more accurate DDoS detection also makes possible long-term retention of raw flow records and related data. Kentik Detect is built on Kentik Data Engine (KDE), a distributed time-series database that correlates NetFlow, sFlow, and IPFIX with BGP routing and GeoIP data, then stores the records unsummarized for months. As a post-Hadoop big data solution, KDE can run ad-hoc analytical queries, using up to eight dimensions and filtered by multiple field values, on billions of records and return answers in just a few seconds.
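As a rough illustration of what a multi-dimension, filtered query over unsummarized records means, here is a plain-Python group-by. It is not KDE's query interface: the function name, field names, and sample records are hypothetical, and a real system would run the equivalent across a distributed store.

```python
from collections import defaultdict


def top_by(flows, dimensions, filters=None, metric="bytes", limit=5):
    """Group raw flow records by several dimensions, after filtering on field values."""
    filters = filters or {}
    totals = defaultdict(int)
    for flow in flows:
        # Keep a record only if every filtered field holds an allowed value.
        if all(flow.get(field) in allowed for field, allowed in filters.items()):
            key = tuple(flow[d] for d in dimensions)
            totals[key] += flow[metric]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Hypothetical raw records, already enriched with GeoIP and BGP attributes.
flows = [
    {"dst": "192.0.2.7", "src_country": "AA", "src_as": 64500, "dport": 53, "bytes": 900},
    {"dst": "192.0.2.7", "src_country": "AA", "src_as": 64500, "dport": 53, "bytes": 600},
    {"dst": "192.0.2.7", "src_country": "BB", "src_as": 64501, "dport": 53, "bytes": 400},
    {"dst": "192.0.2.9", "src_country": "AA", "src_as": 64500, "dport": 443, "bytes": 700},
]

result = top_by(flows, dimensions=("src_country", "src_as"),
                filters={"dst": {"192.0.2.7"}, "dport": {53}})
```

Because the records are kept unsummarized, the dimensions and filters can be chosen at query time instead of being fixed when the data is ingested.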
By enabling big data in real time, without the delays inherent in MapReduce, Kentik Detect provides deep forensic insight that can be invaluable in understanding the nature of network traffic and attack vectors as they change. Where volumetric attacks may be suspected of covering up more intrusive attacks, exploratory, ad-hoc analytics can be used to find them. For a great example, take a look at this blog series on performing source geography DDoS analysis and digging deeper into DDoS attacks.
DDoS detection has been such a difficult problem for legacy approaches that it’s easy to forget that it’s also been an information silo. Silos are counterproductive because they impede the clarity of insight needed to achieve truly important business and organizational goals. That’s relevant because DDoS protection is about more than just defense. Ultimately, the goal for network operations teams is to deliver a great network experience that supports a superior user/customer experience. To accomplish that higher-level goal, you need to be able to move easily between network performance monitoring (NPM), network traffic analysis, Internet routing analysis, DDoS detection/protection, and network security forensics.
Big data is the ideal approach for unifying the data details at scale and providing the compute power to get operational value from analytics fast. Kentik Detect offers all these forms of visibility in one platform for precisely this reason. If you’re interested in learning more about how Kentik’s big data approach enhances DDoS and anomaly detection, check out the Kentik Detect for DDoS Protection solution brief. If you already know that you want to get better DDoS detection and automated mitigation, start a free trial.