Kentik - Network Flow Analytics
Product Updates

June/July 2021

Kentik’s product and engineering teams continue to deliver the new capabilities and features that matter the most to our customers, driven by the valuable feedback that we collect. Here is a summary of new features, as well as more detailed descriptions and relevant screen captures.

Synthetics

New App Agents

We released a new set of agents that will enable tests at the application layer. These are what we’re calling “App Agents” and they are capable of running a full headless Chrome browser instance. These agents will enable us to offer our customers tests like Page Load tests and Transaction tests. When used in conjunction with our rich network layer functionality, these new agents and test types will allow network engineering and network operations teams to quickly determine if the issue is at the network layer or at the application layer.

Full Browser Page Load Test

We activated the Page Load Test type that performs a full browser page load using the new App agents.

  • The new test type can be set up by clicking “Page Load” under “Web Tests”
  • Agent selection uses the new App agents but this is seamless to the end user.
    • Once in the test setup, clicking the “edit agents” button will display only the App agents in the list. (This is currently a subset of all locations, but it is growing to include all currently supported locations.)
  • The test setup is similar to the HTTP or API test, except:
    • It performs a full browser page load (while the HTTP test stops after the page contents are retrieved using a GET).
    • It includes only the GET option (since it is a page load). It does not include ping and trace alongside the page load (we plan to support that in the future).
  • The results are presented in a similar fashion to the HTTP or API test, with the following difference:
    • Table columns include the new “DOM Processing Time” and “Navigation & App Cache” metrics, which are specifically relevant to the time taken for the browser to load and display the contents of the page.

BGP Route Viewer

BGP Route Viewer is the first of a series of capabilities planned to help proactively monitor BGP-related conditions that can impact performance. In response to customer requests and feedback, we have developed a comprehensive roadmap for BGP monitoring, and we believe our solution will have significant performance advantages over alternative solutions.

The first part of Kentik’s solution is BGP Route Viewer. BGP Route Viewer appears as a tab along with the existing SaaS and Cloud Performance tabs. For customers who have entered prefixes in their Network Classification settings, we will automatically load BGP update data for those prefixes in this tab. For customers who have not entered any prefixes in their Network Classification settings, we will show an interface that allows them to do so and gives them the option to save the entered prefixes to the Network Classification page.

HTTP Stage Timing and Charts

With the new Page Load tests, results can be plotted in a time series along with HTTP stages and the timing for each stage. This new view helps network teams isolate network layer issues from HTTP layer issues.

Major Path UX Overhaul

The traceroute-driven path experience has been one of the most valued features of Synthetics. It works well, but after iteratively adding many small and large features since it first launched (back in November), we felt it was time to revisit the design holistically. The updates we made fall into two main buckets:

  1. Improve the overall usability of the product/feature by:
    • Reducing the number of clicks to do things (like setting the thresholds and other config knobs)
    • Reducing the quantity of information presented on default load (intelligently collapsing things to reduce information overload)
    • Reducing the amount of whitespace used so it is more compact and requires less or no scrolling
    • Preventing the path from exceeding the bounds of the page
    • Avoiding the side pane (which required knowing that one should click and would cover a third of the path when open)
    • Removing the ping-driven health timeline, as this data does not necessarily correlate with latency seen on the path and can lead to confusion
  2. Support collapse/group by sites. This has been requested by a few customers, particularly ones that run tests within their own network and find the ASN option of less use (everything collapsed into their ASN). Having the ability to group by sites lets these customers know if a path change caused traffic to go through a different site instead of the expected/desired one.
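
To illustrate the group-by-sites idea, here is a minimal sketch of collapsing consecutive traceroute hops into sites and checking whether a path still follows the expected sequence of sites. It is not Kentik's implementation; the hop records, site names, and function names are all hypothetical.

```python
from itertools import groupby

# Hypothetical hops for one trace: (hop IP, site label). Not real Kentik data.
HOPS = [
    ("10.0.0.1", "site-nyc"),
    ("10.0.0.2", "site-nyc"),
    ("10.1.0.1", "site-chi"),
    ("10.2.0.1", "site-sfo"),
    ("10.2.0.2", "site-sfo"),
]


def group_by_site(hops):
    """Collapse runs of consecutive hops that share a site into one group."""
    return [
        (site, [ip for ip, _ in members])
        for site, members in groupby(hops, key=lambda hop: hop[1])
    ]


def site_sequence(hops):
    """Return only the ordered list of sites the path traverses."""
    return [site for site, _ in group_by_site(hops)]


def path_changed(hops, expected_sites):
    """True when the path no longer follows the expected sequence of sites."""
    return site_sequence(hops) != expected_sites


print(group_by_site(HOPS))
print(path_changed(HOPS, ["site-nyc", "site-chi", "site-sfo"]))
```

Grouping only consecutive hops (rather than all hops with the same label) keeps the order of the path intact, which is what makes a detour through an unexpected site visible.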

Here is a list of the main changes:

  • Tabs have been moved up to reduce whitespace and reduce tab overload. Metrics tab is all about ping metrics (tabular + map, mesh or time series) vs. other tabs that are trace- or insights-data driven.
  • Health timeline is removed from path tab. It was ping-results driven and could confuse users when it showed issues, but no issues were present in the path. (The path is trace-probe driven, which may not show the same issues for short-lived spikes.)
  • The “Show Options” box is removed. It caused extra whitespace and required avoidable clicks to open/close, and would cover the path when opened. The options have been moved to the top and located in a straight line, maximizing space use and improving usability.
  • All group/collapse functions (ASN, left/right) have been rolled into one main “Group by” selector and the option to group by “Sites” has been added.
  • Copy for geodistance-based latency comparison has been improved and helper text/icon added.
  • Option to “Reset” everything back to default quickly has been added.
  • Overall look and feel of the geo distance vs. hop count chart has been improved so it also serves as a scrollable timeline.
  • The ASN legend has been moved below the path UI and is displayed in a line, moving the path higher up in the page and reducing the amount of whitespace.
  • The main path trace visualization has received the most significant overhaul and results in a much less overwhelming and much more fluid experience than before. Highlights include:
    • On load, some nodes are collapsed to reduce visual overload while optimizing for the most common sources of problems (source and destination networking issues) by keeping the end nodes on both sides open!
    • Extra effort has been put into avoiding overlapping traces that cross other traces and make the overall UI very busy and confusing.
    • You can hover over any node (without needing to click) and it will show you all the information available.
    • Similar to the node hover, hovering over an agent (source) or target will show you all the information cleanly organized in sections, and will give you a link to view the raw traceroute output. There is also an option to quickly collapse nodes for this or other agents with just one click, right there.
    • The traceroute output itself has been improved in a couple of ways. First, each consecutive measurement (from the three probes being sent) for each hop is cleanly organized on separate lines indented from the hop number. Second, for multiple targets, the full output for each target appears side by side to improve readability.
    • Previously we would only show latency for (red) links that exceeded the threshold and packet loss for (red) nodes that exceeded that threshold. Now this information is shown for all nodes and links. In cases where the metrics exceed the threshold, a red font is used to highlight. Further, previously high packet loss nodes were identified with a full red circle, which was confusing if there was an ASN with a similar color. Now this is made clearer with a red border.
    • Previously, traces that timed out before they reached their destination were indicated differently (like abandoned/frayed traces that seemed broken) from ones that had a few timeouts in the middle, but reached their destination. Now, all traces join the target and it is clear which ones had intermediate timeouts vs. which ones ended in timeouts.

Density Grid Groups Dashboard

In response to customer feedback, we have added a new type of visualization option under Synthetics in the dashboards (library) portion of the product. One of the key use cases is customers who set up DNS servers by zones and want to see a global view of the performance of their whole DNS infrastructure.

  • Adding a “Synthetics Test View” dashboard element and then picking the new “Density Grid Group” option allows you to multi-select any tests configured in the system that are of type “DNS Grid” or “Network Grid.”
  • Select a few tests and save the widget to display agents in the first column and then test results aggregated by target in the columns to the right of that.
  • For each cell in the results, each square represents that specific agent hitting one DNS server to resolve the specific target.
  • A holistic view lets the user quickly pinpoint any issues from a large number of DNS servers distributed across the world.

9 New LATAM Global Agents

We deployed nine new global agents throughout LATAM, improving our coverage in the region and increasing the count from four agents to thirteen.

Cloud

AWS Entity Explorer

A quiet but mighty addition to our product, the AWS Entity Explorer puts important network metadata at our users’ fingertips. You might not know it, but the details that dictate how cloud networks behave are buried behind APIs or inside cloud interfaces that were built for automated consumption, and certainly not for network engineers solving problems. With this new feature, engineers can answer questions like “What VPC is this internet gateway associated with?”

Features include:

  • Instantly find any network element using our quick search utility. Search on owner IDs, entity IDs, tags, and names.
  • Jump from gateways to attached VPCs to quickly navigate around complex metadata.
  • Our new “Open in Map” feature allows users to quickly locate and understand how infrastructure is placed within their environment.
  • Open cloud network elements in Quick Views and Data Explorer.

Support for Peered Transit Gateway Traffic Queries

The Transit Gateway in AWS continues to stymie network engineers trying to get a handle on how their traffic is routed within their AWS cloud network. Our original implementation of TGW support only looked at traffic that had originated on a directly-attached VPC. However, Transit Gateways can be peered with each other, meaning that a single Transit Gateway can actually be forwarding traffic to or from an adjacent Transit Gateway. Being that we are awesome, and because we are the only network observability company with a solution to monitor traffic through Transit Gateways, we solved this problem by writing an algorithm that discovers peered Transit Gateways, so you can always see the correct amount of traffic flowing to or through your TGWs.
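
As a rough illustration of why peering matters for traffic accounting, the sketch below finds the directly peered neighbors of a Transit Gateway and counts traffic that crosses a peering as well as traffic that stays local. This is an assumption-laden toy, not the algorithm described above: the peering list, the flow tuples, and the function names are all hypothetical.

```python
# Hypothetical peering attachments: each pair is two Transit Gateways peered together.
PEERINGS = [
    ("tgw-a", "tgw-b"),
    ("tgw-b", "tgw-c"),
]

# Hypothetical flows: (TGW where the flow enters, TGW where it leaves, bytes).
FLOWS = [
    ("tgw-a", "tgw-a", 500),  # stays behind tgw-a
    ("tgw-b", "tgw-a", 300),  # enters at tgw-b, forwarded across the peering to tgw-a
    ("tgw-c", "tgw-c", 200),  # stays behind tgw-c
]


def direct_peers(tgw_id, peerings):
    """Return the Transit Gateways directly peered with tgw_id."""
    peers = set()
    for left, right in peerings:
        if left == tgw_id:
            peers.add(right)
        elif right == tgw_id:
            peers.add(left)
    return sorted(peers)


def bytes_through(tgw_id, flows, peerings):
    """Sum bytes for flows local to tgw_id plus flows exchanged with a direct peer."""
    peers = set(direct_peers(tgw_id, peerings))
    total = 0
    for ingress, egress, size in flows:
        if ingress == egress == tgw_id:
            total += size
        elif ingress == tgw_id and egress in peers:
            total += size
        elif egress == tgw_id and ingress in peers:
            total += size
    return total


print(direct_peers("tgw-b", PEERINGS))
print(bytes_through("tgw-a", FLOWS, PEERINGS))
```

Counting only the first flow would report 500 bytes for tgw-a; including the flow that arrives over the peering from tgw-b gives the fuller picture.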

AWS “Show Path” Feature

A truly kick-ass, differentiating feature for Kentik Cloud. Understanding how traffic flows from one VPC to another over a cloud network is truly a painful experience — one that has network engineers switching back and forth between their command lines and the AWS console for minutes before arriving at a simple answer. The AWS Show Path feature eliminates this pain and replaces it with an intuitive, complete and beautiful way to see paths between sources and destinations in the cloud.

Show Path works across peering connections and transit gateways, over direct-connects and site-to-site VPNs, and also locally within a VPC. The feature elegantly handles default and covering routes by suggesting specific routes from adjacent devices, ensuring that the path drawn is as complete as possible.
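
For readers less familiar with default and covering routes, the sketch below shows generic longest-prefix matching, the usual way a route table picks among overlapping routes. It is not Show Path's logic, which the text above does not detail; the route entries and next-hop names are hypothetical placeholders.

```python
import ipaddress

# Hypothetical route table entries: (destination CIDR, next hop).
ROUTES = [
    ("0.0.0.0/0", "igw-example"),     # default route
    ("10.0.0.0/8", "tgw-example"),    # covering route
    ("10.1.2.0/24", "pcx-example"),   # specific route
]


def select_route(destination, routes):
    """Return (network, next_hop) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for cidr, next_hop in routes:
        network = ipaddress.ip_network(cidr)
        if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best


for dest in ("10.1.2.7", "10.9.9.9", "8.8.8.8"):
    network, next_hop = select_route(dest, ROUTES)
    print(dest, "->", network, "via", next_hop)
```

When only a default or covering route matches, the route table alone does not say where the destination really lives, which is why more specific routes from neighboring devices help complete the picture.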

AWS Configuration Status

One thing that has become clear over the last few months is that we need to continue to strengthen our ability to quickly and easily onboard AWS flow logs and metadata. However, with the multitude of architectures we support and data + flow logs coming in from tens or sometimes hundreds of different sources per customer, we never had a way to concisely convey the health of a customer’s Kentik implementation… until today.

The AWS Configuration Status page aims to make this easier by helping users get an at-a-glance overview of how complete (or incomplete) a customer’s AWS/Kentik configuration is. For each region that a customer has configured an export for, we extract the account ID, and display a high-level overview of the API and Flow status. Clicking on a row allows customers to get more details such as a listing of exactly which APIs our system requires and a success state for each. Warning messages are detailed and complete on the mouseover. Below the APIs, we enumerate the flow logs configured for each entity within a given account/region and flag any accounts that don’t appear to have flow logs configured such that Kentik could ingest them.

Search Feature for Kentik Map + Performance Monitor

Building a map for large customers with hundreds or thousands of accounts is definitely possible, but doesn’t always result in the most useful of visualizations. That’s why we added a search and filtering feature to both the Kentik Map and the Performance Monitor. This feature allows users to quickly find ‘needle in the haystack’ entities like VPCs, subnets, and gateways. Our search intelligently recognizes the format of each search string entered and builds a complex search query that can be saved for quick reuse.
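
A minimal sketch of format-aware search is shown below, assuming the search terms look like AWS resource IDs (`vpc-`, `subnet-`, `tgw-` prefixes), 12-digit account IDs, IP addresses, or CIDR blocks. The classification rules, field names, and query structure are hypothetical and are not taken from Kentik's search implementation.

```python
import ipaddress
import re

# Patterns are assumptions based on common AWS ID formats.
PATTERNS = [
    ("vpc", re.compile(r"^vpc-[0-9a-f]+$")),
    ("subnet", re.compile(r"^subnet-[0-9a-f]+$")),
    ("transit_gateway", re.compile(r"^tgw-[0-9a-f]+$")),
    ("account_id", re.compile(r"^\d{12}$")),
]


def classify(term):
    """Guess what kind of entity a search term refers to from its format."""
    term = term.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(term.lower()):
            return kind
    try:
        ipaddress.ip_network(term, strict=False)
    except ValueError:
        return "name"  # fall back to a free-text name or tag search
    return "cidr" if "/" in term else "ip"


def build_query(terms):
    """Turn raw terms into a list of typed filters that could be saved and reused."""
    return [{"field": classify(term), "value": term.strip()} for term in terms]


print(build_query(["vpc-0a1b2c3d", "10.0.0.0/16", "123456789012", "prod-web"]))
```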

Support for External ID

At the request of one of our customers, we’ve added support for External IDs in the API and S3 calls that we initiate to AWS. External ID helps protect our customers from “Confused Deputy” attacks that could allow our service to be abused by malicious 3rd parties to attack our customers. (We don’t believe that the access we request could ever be used in such a way, but better safe than sorry!) As this feature has become more front-and-center in AWS’ role configuration dialogs, we are glad to support this enhancement. The feature now injects a unique string per customer with each request that we send to AWS. This string is set to be the Kentik customer CID.
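
On the customer side, an External ID is typically enforced through an `sts:ExternalId` condition in the IAM role's trust policy. The sketch below builds such a policy document; the account ID and External ID values are placeholders, not Kentik's actual account or any real customer CID.

```python
import json


def build_trust_policy(trusted_account_id, external_id):
    """Build an IAM trust policy that only allows AssumeRole with a matching External ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_trust_policy("111122223333", "12345")
print(json.dumps(policy, indent=2))
```

With this condition in place, a request to assume the role that does not carry the expected External ID is rejected, which is what blunts a "Confused Deputy" attempt.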

OTT

We added a service tracking workflow extension: OTT Service Capacity.

Overall UI/UX refresh

As we added this new functionality to the OTT workflow, information density increased significantly, making the existing workflow harder to read. We took this opportunity to streamline the workflow and reorganize its content to make it clearer for users.

In the newly streamlined UX of the OTT workflow, we now show OTT traffic ranking in each category, where we initially showed only links to the different OTTs.

The engine classification rate, which used to be a pie chart, has also been reworked into a horizontal gauge to give more room to the actual usable data.

We have also reworked the OTT Service Details pages, not only to include OTT capacity, but also to better segment the information they contain:

  • An overview screen has been added that shows comparisons/rankings against similar OTT services and the service’s ranking within the provider it is part of.
  • The connectivity tab holds the traffic slice-and-dice views that were previously available for the selected OTT service.
  • A new “subscribers” tab has been added. It holds augmented subscribership data for this OTT, along with a performance section that allows users to slice and dice Mbps/Subscriber based on site, provider, connectivity type or any combination of these. The latter was not previously available in the OTT workflow.

Capacity as a Net New Functionality for the OTT Workflow

This extension of the OTT Service Tracking workflow was designed to meet the following requirements:

  • Being able to scorecard OTT services based on the capacity available to deliver them to the subscribers
  • Being able to dive into the details and see each interface on the edge of the network participating in the delivery of these OTT services to the subscribers
  • In case of high interface utilization, being able to determine the performance impact on subscribers whenever possible, as well as the number of impacted users

We are now by default scanning the capacity for the top 100 OTT services for each customer and presenting the results in the Capacity tab of the landing page for the OTT workflow. Each treemap displayed on this page is a representation of traffic delivered by each interface (all devices included) and its utilization status.

The OTT service details page now also includes a tab for Capacity, providing an in-depth look at all devices and interfaces. The treemap shows all interfaces, and the list underneath is the list of devices involved. Each device entry listed below the treemap displays the contribution of its interfaces to the current OTT service and can be expanded to get details.

When expanding any device from the list, the user will see the details for each interface on the device involved in the delivery of this OTT Service.

Any interface within a device can also be expanded to display metric details around performance and the number of impacted users, taking into account the thresholds set for capacity in the workflow configuration.
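
The relationship between a capacity threshold and the number of impacted users can be sketched as follows. This is illustrative arithmetic only: the interface records, field names, and the 80% threshold are hypothetical, and how the product actually derives impacted users is not described here.

```python
# Hypothetical interface records for one device; not taken from Kentik's data model.
INTERFACES = [
    {"name": "et-0/0/1", "capacity_mbps": 10000, "traffic_mbps": 9200, "subscribers": 1800},
    {"name": "et-0/0/2", "capacity_mbps": 10000, "traffic_mbps": 4100, "subscribers": 950},
    {"name": "et-0/0/3", "capacity_mbps": 40000, "traffic_mbps": 34000, "subscribers": 6200},
]


def utilization(interface):
    """Fraction of the interface capacity currently in use."""
    return interface["traffic_mbps"] / interface["capacity_mbps"]


def impacted(interfaces, threshold):
    """Return names of interfaces at or above the threshold and their total subscribers."""
    hot = [iface for iface in interfaces if utilization(iface) >= threshold]
    return [iface["name"] for iface in hot], sum(iface["subscribers"] for iface in hot)


names, users = impacted(INTERFACES, threshold=0.8)
print(names, users)
```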
