The Kentik product and engineering teams continue to grow, accelerating our velocity and putting more great capabilities and value in the hands of our customers. Here are some of the specifics.
We now offer more test frequency options. We have added 2-minute, 10-minute, 15-minute and 30-minute options to the existing 1-second, 15-second, 1-minute and 5-minute options when configuring tests.
Network teams told us that they often don’t care about catching bad certificate issues and would rather “bypass” these failures to continue testing network performance. To achieve this, we have added support for ignoring TLS errors and exposed the option to configure this through the advanced settings for HTTP/API test types and Page Load test types.
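As a rough illustration of what “ignoring TLS errors” means for an HTTP check, here is a minimal Python sketch using only the standard library. It is not Kentik's agent code; the function names `build_ssl_context` and `fetch_status` and the `ignore_tls_errors` flag are hypothetical and exist only for this example.

```python
import ssl
import urllib.request


def build_ssl_context(ignore_tls_errors: bool) -> ssl.SSLContext:
    """Return a TLS context that verifies certificates unless told not to."""
    ctx = ssl.create_default_context()
    if ignore_tls_errors:
        # check_hostname must be disabled before verify_mode can be CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_status(url: str, ignore_tls_errors: bool = False, timeout: float = 5.0) -> int:
    """Fetch a URL and return the HTTP status code (hypothetical helper)."""
    ctx = build_ssl_context(ignore_tls_errors)
    with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
        return resp.status
```

With `ignore_tls_errors=True`, an expired or self-signed certificate no longer aborts the request, so the test can keep measuring network performance. The trade-off is that certificate problems go unreported by that test.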
Previously, custom test thresholds were based on static values only. While this worked well on the surface, static thresholds are limited in practical use. Specifically, static thresholds cannot be used in mesh-type tests because each pair of agents would likely need a different value. Similarly, with multi-agent tests to a single target, if the agents are in different geographic locations, the expected latency for each is different, and it isn’t possible to specify a single value that satisfies them all. For all those reasons, we have added the option to specify thresholds for latency (including HTTP latency and DNS resolution time) and jitter that are based on a multiple of the baseline measurement.
By default, the new algorithm computes a baseline for latency and jitter measurements. It uses that to determine whether to mark a particular measurement as a warning (if it is more than 1.5x the baseline) or critical (if it is more than 3x the baseline). Similarly, for packet loss we default to warning when there is any packet loss (that is >0%) and to critical if it is higher than 50%.
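The default rules above can be sketched in a few lines of Python. The multipliers (1.5x and 3x) and the packet loss cut-offs (>0% and >50%) come from the text. How the baseline is computed is not described here, so the median of recent samples used below is purely an assumption, and all function names are hypothetical.

```python
from statistics import median

WARNING_MULTIPLE = 1.5   # warning above 1.5x the baseline
CRITICAL_MULTIPLE = 3.0  # critical above 3x the baseline


def compute_baseline(recent_samples):
    """Assumed baseline: the median of recent measurements."""
    return median(recent_samples)


def classify_against_baseline(value, baseline):
    """Classify a latency or jitter measurement relative to its baseline."""
    if value > CRITICAL_MULTIPLE * baseline:
        return "critical"
    if value > WARNING_MULTIPLE * baseline:
        return "warning"
    return "healthy"


def classify_packet_loss(loss_pct):
    """Classify packet loss: any loss is a warning, above 50% is critical."""
    if loss_pct > 50:
        return "critical"
    if loss_pct > 0:
        return "warning"
    return "healthy"
```

Because each agent pair is compared against its own baseline, the same multipliers work for a 20 ms path and a 200 ms path, which is exactly what a single static value cannot do.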
To build on our commitment to include tests to common infrastructure services that network teams rely on, we now include a new DNS Performance tab as a preset on the Synthetics Performance Dashboard.
This tab is built using the “DNS density grid” type of synthetic dashboard and features availability and uptime checks to 7 of the top/common DNS service providers from 15 geographically distributed global agents. Where available, two IPv4 and two IPv6 DNS servers are tested for each service. This allows customers to quickly determine whether the DNS failures they see are caused by general issues with a specific provider.
The SaaS Performance tab on the performance dashboard was the very first preset tab when we launched Synthetics back in October 2020. At that time, this was a basic “hostname” test which is essentially just a ping and trace. Since then we have introduced much better tests that are capable of testing both HTTP layer and network layer performance, so in the spirit of “upgrading” to the latest, we have migrated all our existing SaaS performance tests to the HTTP or API test type.
Our browser Page Load tests (only supported on app agents) are enhanced with “Asset Validation.”
One of the common use cases for Synthetics is to benchmark one’s own performance against competitors as well as to compare and contrast one’s services’ performance from different parts of the world. Customers will often configure multiple tests (one set for their services and one for competitors, or tests in different parts of the world, labeled by regions) and will want to see an aggregated view. To help, we are leveraging the Kentik dashboards (Library).
There are two possible widget types:
When creating a custom dashboard widget with Synthetic data, users now have the option to choose a “Panel Type,” which may be either “Single Test” (the existing type) or the new “Multiple Test” type, which aggregates data across multiple tests. When the “Multiple Test” option is selected, the user has the option to specify the “Test Type,” “Display Type” and “Metric” that they want to aggregate data on. Optionally they can also specify agent labels that will be used to group results by in the table.
Results are displayed either as a table, with each row showing aggregated values for the selected metric grouped by user-configurable agent labels, or as gauges showing the average metric.
These widgets can be combined with the existing widget types to produce some useful dashboards.
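To make the “Multiple Test” aggregation concrete, the following Python sketch groups results from several tests by agent label and averages one metric per label, roughly what a table row or gauge would show. The record layout (`test`, `agent_labels`, and a metric key such as `avg_latency_ms`) is invented for this example and is not Kentik's data model.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_label(results, metric):
    """Average `metric` across tests, grouped by agent label.

    `results` is a list of dicts with an 'agent_labels' list and a numeric
    value stored under the metric name (hypothetical layout).
    """
    grouped = defaultdict(list)
    for record in results:
        if metric not in record:
            continue  # this test does not report the selected metric
        for label in record["agent_labels"]:
            grouped[label].append(record[metric])
    return {label: mean(values) for label, values in grouped.items()}


results = [
    {"test": "our-site-us", "agent_labels": ["us-east"], "avg_latency_ms": 40.0},
    {"test": "competitor-us", "agent_labels": ["us-east"], "avg_latency_ms": 60.0},
    {"test": "our-site-eu", "agent_labels": ["eu-west"], "avg_latency_ms": 90.0},
]
summary = aggregate_by_label(results, "avg_latency_ms")
```

A result tagged with several labels contributes to each of them, which seems a reasonable choice for a per-label table but is an assumption of this sketch.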
We now have notifications available for BGP tests via email and other notification channels.
Further burnishing our credentials as the cloud network engineers’ tool of choice for troubleshooting connectivity issues in AWS, we’ve just added a new sidebar feature to the Kentik Map, Security Groups & Network ACLs.
This sidebar enhancement enables network engineers to find traffic that is currently being dropped by AWS security groups or network ACLs applied to the selected VPC or subnet. The component analyzes the selected VPC or subnet for denied traffic into or out of the network environment and then crawls through the company’s AWS metadata to allow users to determine exactly what traffic has been dropped. The component also helps users understand which security group or network ACL policies caused the traffic to be dropped.
The system works by running a query of the flow logs to or from the selected VPC or subnet to find any traffic that had been marked by AWS as REJECTED. It then analyzes the direction of the traffic to provide an at-a-glance view of these traffic flows, as well as a convenient method for searching through the traffic to find a particular source or destination.
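The general idea can be sketched in Python against raw VPC flow log records in the default (version 2) format, where AWS writes the `action` field as `ACCEPT` or `REJECT`. This is only an illustration and not how Kentik implements the feature; the function names and the inbound/outbound rule based on a single CIDR are assumptions.

```python
import ipaddress
from collections import Counter

# Field order of the default (version 2) VPC flow log format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line):
    """Split one space-separated flow log line into a dict."""
    return dict(zip(FIELDS, line.split()))


def rejected_by_direction(lines, cidr):
    """Sum packets of REJECT flows entering or leaving the given CIDR."""
    network = ipaddress.ip_network(cidr)
    summary = {"inbound": Counter(), "outbound": Counter()}
    for line in lines:
        rec = parse_record(line)
        if rec.get("action") != "REJECT":
            continue  # skips ACCEPT records and NODATA/SKIPDATA lines ("-")
        src = ipaddress.ip_address(rec["srcaddr"])
        dst = ipaddress.ip_address(rec["dstaddr"])
        key = (rec["srcaddr"], rec["dstaddr"], rec["dstport"])
        if dst in network and src not in network:
            summary["inbound"][key] += int(rec["packets"])
        elif src in network and dst not in network:
            summary["outbound"][key] += int(rec["packets"])
        # Flows that stay entirely inside the CIDR are ignored in this sketch.
    return summary


sample = [
    "2 123456789010 eni-1235b8ca123456789 203.0.113.12 172.31.16.139 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 198.51.100.7 443 51000 6 10 840 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.21 198.51.100.9 40000 25 6 3 180 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - 1431280876 1431280934 - NODATA",
]
dropped = rejected_by_direction(sample, "172.31.0.0/16")
```

Keying the counters on source, destination and destination port gives a compact view that can then be searched for a particular address.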
If a user wants to find more information about why particular traffic was dropped, they only need to click on the row to open an analysis window:
The system highlights rows that contributed to the specific traffic being dropped, making it easy to determine what policy needs to be updated and even which rule could be modified in order to rectify a misconfiguration.
Users can also view these access control policies directly from within the map — a very cumbersome task using only the AWS console and/or CLI. Kentik Cloud users now need only click on View Security Groups or View Network ACLs buttons in the sidebar and the system will open up a dialog showing exactly which policies are applied to the selected object and allow the user to browse the rules associated with each policy.
Several months ago, AWS introduced support for the following dimensions in AWS flow logs:
Flows are generated from network interfaces that attach infrastructure to the network. In AWS parlance, these interfaces are called ENIs (elastic network interfaces). Mapping flows based on ENIs provides an opportunity to add new dimensions to group and filter by ENI type, as well as group or filter traffic by source and destination ENI. These new dimensions allow our users to construct super-precise flow queries that don’t double count traffic to or from instances, through gateways and load balancers, or through special infrastructure like Lambdas. This is an important advantage for Kentik Cloud users.
We also created a more welcoming experience in the Kentik Map for cloud-native/cloud-only customers. Our previous version of the map assumed that users always had an on-prem network (or would soon be adding one). The result was that the cloud infrastructure was tucked away in the Cloud Block, while the large on-prem block remained a bare focal point on the map.
No longer! Now, when single-cloud users without an on-prem network register their clouds in Kentik, the map will open directly in their cloud’s most appropriate view, and multi-cloud users without an on-prem network will be presented with a new multi-cloud view in the center of the map. If and when users decide to add on-prem network devices to Kentik, their experience will revert to the on-prem-centric view of the Kentik Map that exists today.
Did you know that sites don’t need to be directly connected to each other in order to show traffic lines in the Kentik Map? Several quarters ago, we introduced a feature called “Draw Links Using…” which enabled users to select an option to draw links based on BGP Ultimate Exit as well as Site IP addresses configured in the site architecture dialog. This enables “island” networks (networks without a backbone) or SD-WAN networks to configure their sites and easily run traffic queries between sites.
These lines are drawn by queries using new dimensions called Source/Dest Site by IP and Site Type by IP. Because we’ve heard that some new business depends on this capability, we’ve added these dimensions to the sidebar for convenient analysis in the map.
Another quick but important usability improvement was to create a new sidebar section titled “Details.” This prevents map objects (subnets, VPCs, gateways) with lots of metadata from making the sidebar unusable.
A major improvement we’ve added for Azure is the ability for companies that centralize the collection of NSG flow logs into a single storage account to create “metadata-only” exports for resource groups within the same region. To make this work, simply disable the slider called “Enable Flow Logs for this Export” on any resource groups that don’t have their own storage account associated.
We’ve also implemented some improvements to our Azure services based on customer feedback as well as added infrastructure resiliency and backend code improvements. Stay tuned for more improvements this and next quarter as we continue to round out our cloud offerings.
We’ve added a sixth tour to Kentik’s Demo Mode, which walks users through a troubleshooting scenario involving connectivity problems between AWS resources and an on-premise database. The new tour highlights the difficulty of conducting this kind of troubleshooting in complex cloud environments with existing tools, and makes very clear Kentik’s strength in helping solve these issues.
This month we are excited to announce beta availability for our new Weather Map — a new core feature of Kentik Maps.
Our new Weather Map shows network engineers how their network looks so that network architectures and the current traffic patterns can be understood at a glance. This feature was one of the most requested enhancements to Kentik Maps since we went live, and we’ve only begun to scratch the surface in terms of what we plan to do here.
Today, the Weather Map is simple. It renders a company’s sites over a geopolitical map, using the customer’s configured site addresses to derive latitude and longitude coordinates. We also cluster groups of sites within the same region to declutter the map; as users zoom toward a cluster, it breaks open, revealing the sites’ positions on the map below. Between sites (and clusters of sites), we draw links using the connected interfaces so customers can view their backbone network utilization and click on links for easy traffic analysis.
We’ve got an amazing roadmap of features coming out for Kentik Maps this quarter, so stay tuned for future updates to the Weather Map, the AWS map and site maps in Q4.
Another great new feature enhancement is our ability to rewind the clock and show users how their AWS network (and associated traffic) looked in the past, using historical metadata.
When we launched the Kentik Map for AWS, we began with a metadata service that only stored metadata describing the current state of the user’s network. However, if a user adjusted the time window to find specific flows, we assumed that the AWS architecture was the same during the specified query window as it was when the query was actually run. We knew this would eventually require historical support, which took time to design and implement.
That day is here! Users can change the to/from dates in the Kentik Map and we will update the map to show what the environment looked like during that time. If we took multiple “snapshots” of metadata during the specified time window, we will show the most recent snapshot we have for that window.
This means that if traffic used to flow through a gateway that was subsequently deleted, we’ll show that gateway on the map. If traffic entered a subnet that only existed for a day or an hour — we’ll draw that subnet on the map.
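The snapshot-selection rule (show the most recent metadata available for the chosen window) can be sketched as a simple lookup in Python. The snapshot structure and the function name are hypothetical. Falling back to the last snapshot taken before the window when none was taken inside it is an assumption of this sketch, and it does not attempt to merge short-lived resources from several snapshots.

```python
from bisect import bisect_right
from datetime import datetime


def snapshot_for_window(snapshots, window_end):
    """Return the metadata of the latest snapshot taken at or before window_end.

    `snapshots` is a list of (taken_at, metadata) tuples sorted by taken_at.
    Returns None if no snapshot exists yet at that point in time.
    """
    timestamps = [taken_at for taken_at, _ in snapshots]
    idx = bisect_right(timestamps, window_end)
    return snapshots[idx - 1][1] if idx else None


snapshots = [
    (datetime(2021, 9, 1), {"gateways": ["igw-a"]}),
    (datetime(2021, 9, 10), {"gateways": ["igw-a", "igw-b"]}),
    (datetime(2021, 9, 20), {"gateways": ["igw-b"]}),
]
# A query window ending on 15 September still sees igw-a, deleted later.
view = snapshot_for_window(snapshots, datetime(2021, 9, 15))
```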
We’ve added the ability to click on a line within AWS and get instantaneous traffic details for the line! In prior versions of the Map for AWS, users could only click on Map elements such as Subnets, Gateways, etc. Understanding and analyzing traffic between elements was left as an exercise for the user to construct queries using the Data Explorer. Now users can click on lines between subnets (“Show Connections”), lines between gateways, and lines to and from internet ASNs.
We also improved the way that the Kentik Map renders traffic to and from gateway objects. Previous versions of the Kentik Map couldn’t determine the amount of traffic entering a subnet from a gateway. Now that we’ve switched our flow enrichment over to using network interfaces rather than only IP addresses, we can show traffic from this infrastructure entering our customers’ environments.
The custom webhook notification method in v2 notifications (/v4/settings/notifications) now supports the ability to customize the HTTP headers and values sent with the request, in addition to the request body.
Among other uses, this allows users to provide authorization credentials for API endpoints that require it.
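For readers who want to picture the resulting request, here is a small Python sketch that builds (but does not send) a POST with custom headers and a JSON body, similar in shape to what a receiving API would see. The URL, token and body fields are placeholders made up for this example and do not reflect Kentik's actual webhook payload.

```python
import json
import urllib.request


def build_webhook_request(url, body, headers):
    """Build a JSON POST request carrying user-defined HTTP headers."""
    data = json.dumps(body).encode("utf-8")
    merged = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=data, headers=merged, method="POST")


request = build_webhook_request(
    "https://api.example.com/alerts",               # placeholder endpoint
    {"alert": "example", "severity": "critical"},   # placeholder body
    {"Authorization": "Bearer example-token"},      # placeholder credential
)
# urllib.request.urlopen(request) would send it; omitted here on purpose.
```

An endpoint that rejects unauthenticated calls can then validate the `Authorization` header before accepting the notification.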
Notifications v2 now supports all of the notification methods that were supported in notifications v1, along with a few new ones like Microsoft Teams, VictorOps and xmatters.
In addition, with customizable HTTP headers and request body templates in the Custom Webhook method, it should now be possible to do one-off integrations with virtually any third party API.
NOTE: Some notification methods are not yet available to select as destinations for Synthetics notifications. Template updates are required for these methods to properly present the different data fields associated with Synthetics notifications.
v4 DDoS policies now support the selection of both v1 and v2 notification methods as destinations for alert notifications. In the thresholds section of the policy configuration, users will now see both v1 and v2 methods shown in the drop-down list.
Each available notification channel is labeled with the notification method type, though we do not distinguish between v1 and v2 types since these are not user-facing designations. We’ve also temporarily removed the link to the v1 notifications configuration page until we have migrated all v1 methods to v2.
Mitigation platforms and methods are now configured via a native v4 UI form. The new form combines platform and method configuration onto a single page with a better UX that shows which methods are associated with each platform.
The new form also removes the limitation on configuring both RTBH and flowspec mitigation methods on the same router.
We’ve added an additional threshold type for DDoS policies, which allows the user to compare two different metrics that are measured by the policy. Along with this, we’ve added some additional metrics that measure separate inbound and outbound packets/sec and bits/sec rates. The metrics that are compared in a ratio-based threshold must be metrics that are configured as primary or secondary metrics for the policy.
Some use cases where ratio-based thresholds can be useful:
Ratio-based policy thresholds allow the ratio to be compared in both directions (i.e. A:B and B:A) or one direction only. In the both directions case, an optional margin parameter effectively lets the user define a “band” of acceptable ratios, with values above or below the band triggering the threshold condition.
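A minimal Python sketch of the one-direction versus both-directions comparison follows. The function and parameter names are hypothetical, the handling of a zero denominator is an assumption, and the optional margin parameter is deliberately left out because its exact semantics are not spelled out here.

```python
import math


def _ratio(numerator, denominator):
    """Ratio with an assumed convention for a zero denominator."""
    if denominator == 0:
        return math.inf if numerator > 0 else 0.0
    return numerator / denominator


def ratio_triggered(metric_a, metric_b, threshold, both_directions=False):
    """Return True if A:B (and optionally B:A) exceeds the threshold."""
    if _ratio(metric_a, metric_b) > threshold:
        return True
    if both_directions and _ratio(metric_b, metric_a) > threshold:
        return True
    return False
```

In the both-directions case with a threshold of 5, the condition stays quiet only while A:B lies between 1:5 and 5:1, which already behaves like a band of acceptable ratios around parity.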