Major Alerting v3 updates
Custom dimensions are now supported in Alerting
Anomaly detection users can now apply the full profiling power of Kentik’s alerting to their own Custom Dimensions. In practice, this means that baselining and thresholding are now available on user-defined custom dimensions such as location, service name, customer ID, or any other classification that slices your traffic meaningfully.
A simple use case could be a jump in bits/s for traffic you have classified as “Transit” via custom dimensions. Or a drop in bits/s for traffic you have classified as “Settlement-Free Peering.” Or even major new traffic destinations on a per-application basis.
Alerting JSON webhook triggers
Many of our anomaly detection users have asked for a way to trigger homegrown REST endpoints when alerts fire, primarily to integrate with in-house tools and workflow systems.
If you are one of them, you have been heard 🙂
Whether you want to integrate Kentik’s Anomaly Detection capabilities into your existing monitoring systems or trigger your own form of remediation, this is now possible!
You can now set up a Notification Channel that corresponds to a webhook URL. When an alert fires, the relevant alert context is posted to that URL as JSON for you to code against.
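As a rough illustration of what a homegrown endpoint could look like, here is a minimal Python sketch of an HTTP receiver for such a webhook. The port, the handler name, and the way the alert is handed off are assumptions made for the example; the actual fields of the JSON payload are not described here, so the code only checks that the body is a JSON object and lists its keys.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_alert(body: bytes) -> dict:
    """Decode a webhook body. No particular payload schema is assumed."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class AlertHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: accepts POSTed JSON and acknowledges it."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            alert = parse_alert(self.rfile.read(length))
        except ValueError:
            # Malformed or non-object JSON: reject the request.
            self.send_response(400)
            self.end_headers()
            return
        # Hand the alert to your own tooling here (ticketing, chat, remediation).
        print("received alert with keys:", sorted(alert))
        self.send_response(204)
        self.end_headers()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Run the receiver until interrupted (port 8080 is an arbitrary choice)."""
    HTTPServer((host, port), AlertHandler).serve_forever()
```

Calling `serve()` starts the listener; the URL of that host and port is what you would enter as the webhook URL of the Notification Channel.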
Route Traffic Analytics
Route Traffic Analytics is the fruit of a hackathon we held earlier this year at Kentik.
You may have heard about studies finding it isn’t uncommon for a given network to have over 95% of its traffic delivered by a minuscule number of routes.
The reason behind these studies is that the FIB capacity of low-end black box L3 switching gear is limited to around 30K prefixes. If you can find a way to live with only 30K routes in FIB and a default route to cover the rest, you don’t need to purchase very expensive routing gear that has a FIB capacity in the millions of routes. The operational question is which 30K routes?
The Route Traffic Analytics feature, under Analysis → Route Traffic, answers precisely this question.
Accessed from the Analytics menu, the Route Traffic Analytics feature provides insight into the number and percentage of traffic flows correlated to the number and percentage of routes, plus Mbps per analyzed tranche of routes. The summary view provides both histogram and tabular data views.
Conveniently, the histogram above the table marks the 95th, 90th, and 80th percentiles for Traffic and Routes on its X and Y axes.
The feature also provides:
- A listing of the top 1000 routes by traffic density, with more details per route
- Export to CSV of the top routes, which could be used to configure routers
- A quick calculation of average and max Mbps per route
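The underlying question (“which routes carry most of the traffic?”) can be sketched in a few lines of Python. This is only an illustration of the idea, not Kentik’s implementation: the function name, the input shape (a mapping of prefix to Mbps), and the sample values are all made up for the example.

```python
def routes_for_share(route_mbps, target_share):
    """Return the busiest prefixes that together reach target_share of traffic.

    route_mbps: dict mapping a prefix string to its traffic in Mbps
    target_share: fraction of total traffic to cover, e.g. 0.95
    """
    total = sum(route_mbps.values())
    if total <= 0:
        return []
    selected, running = [], 0.0
    # Walk routes from busiest to quietest, stopping once the target is met.
    for prefix, mbps in sorted(route_mbps.items(), key=lambda kv: kv[1], reverse=True):
        if running / total >= target_share:
            break
        selected.append(prefix)
        running += mbps
    return selected


# Illustrative data only (documentation prefixes, invented traffic figures).
sample = {
    "10.0.0.0/8": 90,
    "192.0.2.0/24": 6,
    "198.51.100.0/24": 3,
    "203.0.113.0/24": 1,
}
print(routes_for_share(sample, 0.95))  # ['10.0.0.0/8', '192.0.2.0/24']
```

If the resulting list fits within the FIB capacity of the target device, those prefixes plus a default route would cover the chosen share of traffic.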
New Packet Size, Interface Capacity Dimensions
Packet Size Dimension
In our constant effort to give our users more dimensions to slice and dice by, we have just added Packet Size and Packet Size_100 as grouping dimensions and filters in the Data Explorer and Dashboards.
The Packet Size_100 dimension groups packet size statistics into buckets that are multiples of 100 bytes, which makes it well suited to Comparison Bar Charts.
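To make the bucketing idea concrete, here is a small Python sketch. It assumes each bucket is identified by its lower bound (0, 100, 200, ...); how the product actually labels its buckets is not stated here, and the function names are invented for the example.

```python
from collections import Counter


def bucket_100(packet_size_bytes: int) -> int:
    """Map a packet size to the lower bound of its 100-byte bucket."""
    return (packet_size_bytes // 100) * 100


def size_histogram(sizes):
    """Count packets per 100-byte bucket."""
    return Counter(bucket_100(size) for size in sizes)


print(size_histogram([64, 99, 150, 1500]))
```

Grouping by bucket rather than by exact size keeps the number of distinct values small, which is what makes a bar-chart comparison readable.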
Interface Capacity Dimension
Interface Capacity has also been added to flow grouping dimensions and filtering in the Data Explorer and Dashboards.
This allows users to display one graph of all 10Gbps links, another of all 20Gbps links, and so on, so they can eyeball hot links or capacity issues per link type.
This dimension will come in handy when going through a capacity management exercise in your network. It pairs well with a table view, in which you could, for instance, list your top-X 10Gbps interfaces by order of traffic, as displayed in the screenshot below:
With reports using the Interface Capacity dimension, you can now answer questions such as:
“How is traffic versus capacity for the 1Gbps, 10Gbps, 20Gbps, 30Gbps, 40Gbps, 100Gbps interfaces on our sites? Are any of them maxed out?”
To illustrate the above, we have created a “Capacity Management” Preset Dashboard that is readily usable for this purpose. Load it directly from the Dashboards Library section:
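The same “traffic versus capacity” question can also be sketched in Python. This is purely illustrative: the tuple layout, the interface names, the traffic figures, and the 80% threshold are assumptions for the example, not something the product prescribes.

```python
from collections import defaultdict


def hot_links_by_capacity(interfaces, threshold=0.8):
    """Group interfaces at or above a utilization threshold by capacity.

    interfaces: iterable of (name, capacity_mbps, traffic_mbps) tuples
    threshold: utilization fraction considered "hot" (0.8 is arbitrary)
    """
    hot = defaultdict(list)
    for name, capacity_mbps, traffic_mbps in interfaces:
        utilization = traffic_mbps / capacity_mbps
        if utilization >= threshold:
            hot[capacity_mbps].append((name, round(utilization, 2)))
    return dict(hot)


# Invented sample data: (interface, capacity in Mbps, traffic in Mbps).
links = [
    ("edge1:xe-0/0/0", 10000, 9200),
    ("edge1:xe-0/0/1", 10000, 3100),
    ("core1:et-0/0/0", 100000, 85000),
    ("agg1:ge-0/0/0", 1000, 200),
]
print(hot_links_by_capacity(links))
```

Grouping by capacity mirrors what the Interface Capacity dimension gives you in a report: one view per link type, with the busiest links standing out.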
SNMP / Interface Overrides
This capability lets users manually set interface level information that is usually polled via SNMP.
→ Our Knowledge Base entry for Interfaces has been updated with this feature.
The main use cases for this new feature are:
- Providing query-able interface info on a Router/Switch device when SNMP is not enabled.
- Providing query-able interface info on nProbe hosts as SNMP isn’t available for these by default.
The implementation of this feature can be seen in the Device → Interface screen.
Hovering on an interface line will present options to override an interface, as shown below:
Navigating to the Edit button will bring up an in-place edit panel for this interface:
Upon saving, override fields of the interface will be displayed with an orange triangle in the bottom left corner, as in the example here:
An additional handy toggle in the interface table’s header allows you to filter it to only view interfaces with an override:
New User Profile Settings
User Profile settings have been updated to let you enable or disable history and DNS lookups and to set a default time zone. The settings are in the “User Information” table, found by clicking on the username at the upper right of the top navigation bar.
Disabling history in the User Information panel sets the Historical Overlay switch (shown below) to off by default in the Data Explorer. This shortens query response time, as data points for the selected number of days of history no longer have to be fetched:
Disabling DNS lookups will also reduce query time, as hostnames for the IPs displayed in the Data Explorer query result table won’t have to be fetched before the result is returned. Depending on how many IP addresses are being resolved, disabling lookups can greatly speed up any graphs or queries that return IP addresses.
Default landing page
A newly added option in User Information is the ability to configure a landing page, which is the page shown by default upon login.
The landing page can be a Dashboard, a Saved View, or the Alert Summary page if you are a user of our anomaly detection feature set.
- We now display distinct flow types for NetFlow v9 and IPFIX on the device listing page.
- Alerting learning mode default is now +6 days.