One of Kentik’s big ideas about network visibility is that it should be possible to identify and alert on any network anomaly even without advance knowledge of the specific threats you’re guarding against. That’s important because threat vectors are constantly evolving, so today’s network operations tools must excel at revealing anomalies as they occur and also at drilling down on unsummarized network data — flow records, BGP, SNMP — to uncover root causes.
Kentik Detect is ideal in this regard because it’s able to ingest full resolution flow records (NetFlow v5/9, IPFIX, sFlow, etc.) at massive scale, to combine flow data with BGP and SNMP into a single, coherent datastore, and to make this unified data available within moments for responsive, multi-parameter querying. With that kind of power, it’s possible to find anomalies as they happen, not only via our sophisticated alerting system but also through real-time spelunking. Anomalies found in this way can then be used to define alerts that generate notifications upon future recurrences.
One example of how to use Kentik Detect this way comes from a situation faced by one of our customers that is a large infrastructure provider. The issue was a new type of “spam reflection” that was causing headaches by allowing spammers to avoid detection while utilizing servers inside the customer’s network as spam sources. Here’s how these types of schemes work:
From the outside, the spam appears to come from the infrastructure provider’s network, but the provider itself never observes any outbound SMTP traffic.
Traditional NetFlow analysis systems may allow us to see either top SMTP receivers in the network or top GRE sources, but discovering which hosts are doing both would typically require correlation of the two lists. Manual correlation doesn’t make for an efficient network operations workflow, so correlation would typically be handled via custom scripts, which are fragile and tend to break over time.
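To make the manual correlation step concrete, here is a minimal Python sketch of what such a custom script might look like. It assumes the two "top N" lists have already been exported as (host, bytes) pairs; the function name `hosts_doing_both` and the sample addresses are hypothetical and not taken from any Kentik tool.

```python
def hosts_doing_both(smtp_receivers, gre_sources):
    """Return (host, gre_bytes, smtp_bytes) for hosts present in both lists.

    Both arguments are iterables of (host, total_bytes) pairs, for example
    the exported "top SMTP receivers" and "top GRE sources" reports.
    """
    smtp = dict(smtp_receivers)
    gre = dict(gre_sources)
    both = smtp.keys() & gre.keys()  # set intersection on host addresses
    return sorted(
        ((host, gre[host], smtp[host]) for host in both),
        key=lambda row: (-row[1], -row[2], row[0]),  # largest GRE senders first
    )


# Made-up sample data using documentation-range addresses.
smtp_receivers = [("192.0.2.10", 5_000_000), ("192.0.2.20", 1_200_000)]
gre_sources = [("192.0.2.20", 900_000), ("192.0.2.30", 4_000_000)]

suspects = hosts_doing_both(smtp_receivers, gre_sources)
print(suspects)
```

A script like this only works as long as the export format of both reports stays the same, which is one reason such glue code tends to break over time.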
Fortunately, Kentik Detect makes it relatively easy to construct queries that can quickly perform the correlation required to identify network servers that are possibly being exploited in this way. Kentik Detect uses SQL as the query language, but the underlying data is stored in the Kentik Data Engine (KDE), a custom, distributed, column-store database. Multi-tenant and horizontally scaled, KDE typically returns query results in a few seconds or less even when querying billions or trillions of individual flow records. (For further insight into how we do this, see our blog post on PostgreSQL Foreign Data Wrappers by Ian Pye.)
Now that we’ve got some background, let’s see how to use Kentik Detect to uncover the signs of spam reflection. We’ll do so with a set of queries that can be used manually and also built into an alert that sends notifications whenever this type of behavior occurs. First, we’ll write a query to get a list of all the hosts in the network that are receiving traffic from remote source port TCP/25 (the traffic described in Step 2 above). Here’s a breakdown of the elements of the query:
| Query element | Description |
|---|---|
| SELECT ipv4_dst_addr, | Return the destination IP. |
| SUM(both_bytes) AS f_sum_both_bytes | Return the count of bytes. Aside from renaming the bytes column, we’re also passing a Kentik-specific aggregation function to the backend. See Subquery Function Syntax in the Kentik Knowledge Base for more information. |
| FROM all_devices | Look across all flow sources (e.g. routers). |
| WHERE ctimestamp > 3600 | Look only at traffic from the last hour. |
| AND protocol = 6 | Look only at TCP traffic. |
| AND l4_src_port = 25 | Look only at traffic from source port 25 (SMTP). |
| AND dst_flow_tags ILIKE '%MYNETWORK%' | Look only at traffic where the destination IP is inside our own network, as determined by it being tagged with a MYNETWORK tag. This condition makes use of Kentik’s user-defined tags functionality. See Kentik Detect Tags for more information. |
| GROUP BY ipv4_dst_addr | Return one row per unique dest IP. |
| ORDER BY f_sum_both_bytes DESC | Sort the list in order of number of bytes, highest first (descending). |
| LIMIT 10000; | Limit the results to the first 10000 rows. |
Now we can run this query in the Kentik Detect portal’s Query Editor:
This image shows the query in the Query Editor along with the first few rows of results:
Next, we’ll build a query to find the top hosts inside the network that are sending the GRE (protocol 47) traffic discussed in Step 3 from above:
    SELECT ipv4_src_addr,
        SUM(both_bytes) AS f_sum_both_bytes
    FROM all_devices
    WHERE ctimestamp > 3600
        AND protocol = 47
        AND src_flow_tags ILIKE '%MYNETWORK%'
    GROUP BY ipv4_src_addr
    ORDER BY f_sum_both_bytes DESC
    LIMIT 10000;
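This second query looks very similar to the first, with just a few changes: it selects and groups by ipv4_src_addr rather than ipv4_dst_addr, it filters on protocol = 47 (GRE) with no port condition, and it matches the MYNETWORK tag against src_flow_tags rather than dst_flow_tags.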
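Because the two queries differ only in direction, protocol, and port filter, they can be generated from one template. The following Python sketch illustrates that idea; the helper name `build_top_hosts_query` and its parameters are hypothetical, and it only assembles query text (it does not talk to Kentik Detect).

```python
def build_top_hosts_query(direction, protocol, src_port=None,
                          tag="MYNETWORK", lookback_seconds=3600):
    """Build the text of a "top hosts by bytes" query.

    direction: "dst" for on-net receivers, "src" for on-net senders.
    protocol:  IP protocol number (6 = TCP, 47 = GRE).
    src_port:  optional layer-4 source port filter (25 = SMTP).
    """
    if direction not in ("src", "dst"):
        raise ValueError("direction must be 'src' or 'dst'")
    if not tag.replace("_", "").isalnum():
        # The tag is pasted into the SQL text, so keep it to safe characters.
        raise ValueError("tag must be alphanumeric")

    addr = f"ipv4_{direction}_addr"
    conditions = [
        f"ctimestamp > {int(lookback_seconds)}",
        f"protocol = {int(protocol)}",
    ]
    if src_port is not None:
        conditions.append(f"l4_src_port = {int(src_port)}")
    conditions.append(f"{direction}_flow_tags ILIKE '%{tag}%'")

    return (
        f"SELECT {addr}, SUM(both_bytes) AS f_sum_both_bytes "
        f"FROM all_devices "
        f"WHERE {' AND '.join(conditions)} "
        f"GROUP BY {addr} "
        f"ORDER BY f_sum_both_bytes DESC "
        f"LIMIT 10000;"
    )


smtp_query = build_top_hosts_query("dst", 6, src_port=25)  # first query
gre_query = build_top_hosts_query("src", 47)               # second query
print(smtp_query)
print(gre_query)
```

The resulting strings can be pasted into the Query Editor like the hand-written versions.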
Now let’s combine those two queries using a SQL UNION operator, and wrap the combined query in an outer SELECT that will put SMTP and GRE into two separate columns. We’ll return the traffic volume as Mbps (instead of total bytes) and we’ll also limit the results to hosts where Mbps > 0 for both traffic types. Note that the lookback here is 60 seconds rather than an hour, which is why the outer SELECT divides the total bits by 60 to get a per-second rate.
    SELECT ipv4_src_addr,
        ROUND(MAX(f_sum_both_bytes$gre_src_mbps) * 8 / 1000000 / 60, 3) AS gre_src_mbps,
        ROUND(MAX(f_sum_both_bytes$smtp_dst_mbps) * 8 / 1000000 / 60, 3) AS smtp_dst_mbps
    FROM ((
        SELECT ipv4_src_addr,
            SUM(both_bytes) AS f_sum_both_bytes$gre_src_mbps,
            0 AS f_sum_both_bytes$smtp_dst_mbps
        FROM all_devices
        WHERE ctimestamp > 60
            AND protocol=47
            AND (src_flow_tags ILIKE '%MYNETWORK%')
        GROUP BY ipv4_src_addr
        ORDER BY f_sum_both_bytes$gre_src_mbps DESC
        LIMIT 10000)
    UNION (
        SELECT ipv4_dst_addr,
            0 AS f_sum_both_bytes$gre_src_mbps,
            SUM(both_bytes) AS f_sum_both_bytes$smtp_dst_mbps
        FROM all_devices
        WHERE ctimestamp > 60
            AND protocol=6
            AND l4_src_port=25
            AND (dst_flow_tags ILIKE '%MYNETWORK%')
        GROUP BY ipv4_dst_addr
        ORDER BY f_sum_both_bytes$smtp_dst_mbps DESC
        LIMIT 10000)) a
    GROUP BY ipv4_src_addr
    HAVING MAX(f_sum_both_bytes$gre_src_mbps) > 0
        AND MAX(f_sum_both_bytes$smtp_dst_mbps) > 0
    ORDER BY MAX(f_sum_both_bytes$gre_src_mbps) DESC,
        MAX(f_sum_both_bytes$smtp_dst_mbps) DESC
    LIMIT 100
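For readers who want to see the logic of the combined query outside of SQL, the following Python sketch mimics its three stages on in-memory data: the UNION with zero-filled columns, the GROUP BY with MAX, and the HAVING filter, followed by the bytes-to-Mbps conversion. It is an illustration of the query's arithmetic only, not of how KDE executes it; the function names and sample byte counts are made up.

```python
def bytes_to_mbps(total_bytes, window_seconds=60):
    """Same arithmetic as the outer SELECT: bytes * 8 / 1000000 / 60, rounded to 3 places."""
    return round(total_bytes * 8 / 1_000_000 / window_seconds, 3)


def combine(gre_rows, smtp_rows, window_seconds=60, limit=100):
    """Return (host, gre_src_mbps, smtp_dst_mbps) for hosts with both traffic types."""
    # UNION: every row carries both columns, with 0 in the column it lacks.
    union = [(host, b, 0) for host, b in gre_rows]
    union += [(host, 0, b) for host, b in smtp_rows]

    # GROUP BY host, taking MAX of each column.
    maxima = {}
    for host, gre_bytes, smtp_bytes in union:
        g, s = maxima.get(host, (0, 0))
        maxima[host] = (max(g, gre_bytes), max(s, smtp_bytes))

    # HAVING: keep only hosts where both maxima are above zero.
    kept = [(h, g, s) for h, (g, s) in maxima.items() if g > 0 and s > 0]

    # ORDER BY GRE bytes, then SMTP bytes, both descending; then LIMIT.
    kept.sort(key=lambda row: (-row[1], -row[2]))
    return [
        (h, bytes_to_mbps(g, window_seconds), bytes_to_mbps(s, window_seconds))
        for h, g, s in kept[:limit]
    ]


# Made-up byte totals for a 60-second window.
gre_rows = [("192.0.2.20", 7_500_000), ("192.0.2.30", 15_000_000)]
smtp_rows = [("192.0.2.20", 750_000), ("192.0.2.10", 3_000_000)]

result = combine(gre_rows, smtp_rows)
print(result)
```

Only 192.0.2.20 appears in both inputs, so it is the only host that survives the HAVING step.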
The following image shows how the output looks in the Query Editor. The query found three on-net hosts that were receiving traffic from off-net TCP/25 sources and also sending outbound GRE traffic.
So far we’ve seen how straightforward it is to use SQL querying in the Query Editor to find anomalies that may indicate a threat to the network. With Kentik Detect’s alerting system we can turn this kind of one-time spelunking into an ongoing automated query that defends the network by notifying us when suspicious conditions arise. To see how this works in practice, let’s start with the query defined above and build it into an alert that runs the query every 60 seconds and notifies us if there are any results. We’ll begin by going to the Add Alert page in the Kentik Portal:
To use our SQL query as the basis of an alert, we’ll need to make a few modifications:
Once we set the alert settings as shown in the image above, the alert engine will run this query every 60 seconds (the Lookback window) and generate a notification if we see any results 5 or more times (the Trigger threshold) over the course of a 2-hour rolling window (the Event window). In addition to individual alert settings, we should also check our global alert settings (link at upper right of Alert List), which apply to all alerts for a given organization. As shown in the following image, these settings determine the details of alert notifications, including how the notifications are sent (via email, syslog, and/or JSON pushed to a user-specified URL).
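The interaction between the trigger threshold and the event window can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Kentik's alert engine: it assumes each 60-second evaluation that returns at least one row counts as one "hit", and that a notification is due whenever 5 or more hits fall within the trailing 2 hours. The class name `RollingTrigger` is hypothetical.

```python
from collections import deque


class RollingTrigger:
    """Count query evaluations that returned rows within a rolling window."""

    def __init__(self, threshold=5, window_seconds=2 * 60 * 60):
        self.threshold = threshold            # the Trigger threshold
        self.window_seconds = window_seconds  # the Event window
        self._hits = deque()                  # timestamps of evaluations with results

    def record(self, now, row_count):
        """Record one evaluation at time `now` (in seconds).

        Returns True if the number of hits inside the window has reached
        the threshold, meaning a notification would be generated.
        """
        if row_count > 0:
            self._hits.append(now)
        # Drop hits that have aged out of the rolling window.
        while self._hits and self._hits[0] <= now - self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold


# Simulate five consecutive 60-second evaluations that each return a row.
trigger = RollingTrigger()
fired = [trigger.record(minute * 60, row_count=1) for minute in range(5)]
print(fired)
```

In this model the fifth consecutive hit is the first one to trigger, and a host that shows up only once or twice in two hours never generates a notification.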
If our new alert is triggered, then events related to that alert will be listed in the Alerts Dashboard (choose Alerts from the portal’s navbar). Here’s a look at how these events will appear:
As described above, the querying and alerting capabilities of Kentik Detect are extraordinarily precise and flexible, enabling customers to quickly find activity of interest on their networks. If you’re not already experiencing Kentik Detect’s powerful NetFlow analysis visibility on your own network, you can sign up for a free trial by clicking the “Get Started” button at the upper right.