Resolution Overview

Kentik creates two fully independent dataseries at ingest, one at full resolution and another optimized for faster execution of queries that cover long timespans. The use of parallel dataseries makes it feasible to run long-timespan queries that return in seconds without compromising the detail of shorter timespan queries.

The following topics explain how these two dataseries are used by the system:

  • About Dataseries Resolution
  • Resolution Intervals
  • Overriding the Default Dataseries

About Dataseries Resolution

Both the Full and the Fast dataseries are built at the time of ingest rather than created retroactively. Kentik ingests data at two resolutions simultaneously and maintains that data in two independent dataseries:

  • Full dataseries: Includes every flow record sent by a given customer to Kentik (within applicable limits of the governing service agreement).
  • Fast dataseries: Includes only a subset of the flow records, enabling faster response to queries spanning 24 hours or more. The following steps are taken at ingest to reduce the amount of data stored in the Fast dataseries:
    - All ports above 32767 (i.e. ephemeral ports) are grouped together and represented in the dataseries as port 65535.
    - Flows that share the same 7-tuple are aggregated.
    - Duplicate sequential flow records over a short timeframe (minutes) are combined into a single entry.
    - Data is further downsampled to 30 flows per second (fps), or 1800 aggregated flows per minute (fpm), for each device.
    Note: To discuss the possibility of different downsampling values, contact Customer Support.
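
The first two reduction steps above can be illustrated with a minimal sketch. This is not Kentik's ingest code: the flow field names (`src_ip`, `dst_port`, and so on) and the grouping key are hypothetical stand-ins, since the exact makeup of the 7-tuple is not given here. Only the port threshold (32767) and the representative port (65535) come from the text.

```python
from collections import defaultdict

EPHEMERAL_THRESHOLD = 32767  # ports above this are grouped together (from the text)
EPHEMERAL_BUCKET = 65535     # port value that represents the whole group (from the text)


def bucket_port(port: int) -> int:
    """Collapse every port above the threshold into one representative port."""
    return EPHEMERAL_BUCKET if port > EPHEMERAL_THRESHOLD else port


def aggregate(flows):
    """Sum bytes and packets for flows that share a key after port bucketing.

    The key below is an illustrative assumption, not Kentik's actual 7-tuple.
    """
    totals = defaultdict(lambda: {"bytes": 0, "packets": 0})
    for f in flows:
        key = (
            f["src_ip"],
            f["dst_ip"],
            bucket_port(f["src_port"]),
            bucket_port(f["dst_port"]),
            f["protocol"],
        )
        totals[key]["bytes"] += f["bytes"]
        totals[key]["packets"] += f["packets"]
    return dict(totals)


# Two flows that differ only in their ephemeral source port.
flows = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 40000,
     "dst_port": 443, "protocol": 6, "bytes": 1200, "packets": 3},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2", "src_port": 50000,
     "dst_port": 443, "protocol": 6, "bytes": 800, "packets": 2},
]
result = aggregate(flows)
```

Because both source ports fall above 32767, the two records share one key after bucketing and are stored as a single aggregated entry.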

By default, the dataseries on which to run a given query is determined by the timespan covered by that query:

  • For timespans of less than 24 hours the default is the Full dataseries.
  • For timespans of 24 hours or more the default is the Fast dataseries.

Note: The 24-hour determinant refers only to the duration of the query's timespan, not to how far back in time the query looks. A query whose timespan is shorter than 24 hours always runs on the Full dataseries (unless manually overridden; see Overriding the Default Dataseries), even when the timespan begins more than 24 hours before "now" (e.g. weeks in the past).
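
As a minimal sketch of the default rule, the function below looks only at the duration of the timespan and ignores how far in the past it lies. The function name and the "full"/"fast" return strings are illustrative assumptions, not part of any Kentik interface.

```python
from datetime import datetime, timedelta

FAST_THRESHOLD = timedelta(hours=24)  # 24 hours or more defaults to Fast (from the text)


def default_dataseries(start: datetime, end: datetime) -> str:
    """Pick the default dataseries from the timespan duration alone."""
    return "fast" if (end - start) >= FAST_THRESHOLD else "full"


# A six-hour window is "full" whether it is recent or far in the past.
recent = default_dataseries(datetime(2024, 5, 15, 6, 0), datetime(2024, 5, 15, 12, 0))
old = default_dataseries(datetime(2024, 1, 2, 6, 0), datetime(2024, 1, 2, 12, 0))
three_days = default_dataseries(datetime(2024, 5, 12, 0, 0), datetime(2024, 5, 15, 0, 0))
```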

Resolution Intervals

The dataseries resolution determines the interval (granularity) used for reporting packets and bytes:

  • Queries run on the Fast dataseries "snap" to the closest hour.
    Example: If a query's timespan is from 1:05 PM on 5/12 to 10:31 PM on 5/15, returned results will cover 1:00 PM on 5/12 to 11:00 PM on 5/15.
  • For queries run on the Full dataseries, the interval varies depending on the duration of the query's timespan.

Note: For further information refer to Time Rounding.
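
The "snap" behavior for the Fast dataseries can be sketched as rounding each end of the timespan to the nearest hour. This is an illustration only: the function name is hypothetical, the year in the sample dates is arbitrary, and rounding an exact half hour upward is an assumption the text does not settle.

```python
from datetime import datetime, timedelta


def snap_to_nearest_hour(ts: datetime) -> datetime:
    """Round a timestamp to the closest hour (half hours round up by assumption)."""
    shifted = ts + timedelta(minutes=30)
    return shifted.replace(minute=0, second=0, microsecond=0)


# The example from the text: 1:05 PM on 5/12 to 10:31 PM on 5/15.
start = snap_to_nearest_hour(datetime(2024, 5, 12, 13, 5))
end = snap_to_nearest_hour(datetime(2024, 5, 15, 22, 31))
```

With these inputs, `start` is 1:00 PM on 5/12 and `end` is 11:00 PM on 5/15, matching the example above.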

Overriding the Default Dataseries

In some circumstances (see Query Resolution Selection) the default choice of dataseries may be manually overridden:

  • Fast dataseries may be manually selected for timespans as short as three hours, enabling ultra-fast response to queries not requiring full-resolution detail.
  • Full dataseries may be manually selected for timespans of up to 72 hours, enabling full detail in results for timespans of three full days.
 

Query Resolution Selection

The default dataseries selection may be overridden in both the portal and the API. Manual selection of the dataseries for a given query is covered in the following topics:

  • Portal Resolution Selection
  • API Resolution Selection

Portal Resolution Selection

In the Kentik portal, the dataseries to query depends on the duration set in the timespan fields at the upper left of Data Explorer and Dashboards:

  • In Dashboards the dataseries is always chosen automatically according to the defaults described in About Dataseries Resolution:
    - Full for less than 24 hours;
    - Fast for 24 hours or more.
  • In Data Explorer, the dataseries is chosen as follows:
    - If the specified timespan is three hours or less, the query is always run on the Full dataseries.
    - If the specified timespan is greater than three hours but less than 24 hours, the query is run on the Full dataseries by default, but the default may be manually overridden by selecting Fast from the Dataseries drop-down.
    - If the specified timespan is between 24 hours and 72 hours, inclusive, the query is run on the Fast dataseries by default, but the default may be manually overridden by selecting Full from the Dataseries drop-down.
    - If the specified timespan is more than 72 hours, the query is always run on the Fast dataseries.
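
The Data Explorer rules above amount to a small decision function. The sketch below is a restatement for illustration, not portal code; the function name, the `selected` argument (standing in for the Dataseries drop-down), and the "full"/"fast" strings are assumptions.

```python
from typing import Optional


def data_explorer_dataseries(hours: float, selected: Optional[str] = None) -> str:
    """Return the dataseries Data Explorer would use for a timespan in hours.

    `selected` models the Dataseries drop-down; None means no manual choice.
    """
    if hours <= 3:
        return "full"  # three hours or less: always Full
    if hours > 72:
        return "fast"  # more than 72 hours: always Fast
    default = "full" if hours < 24 else "fast"
    # Between the two limits, a manual selection overrides the default.
    return selected if selected in ("full", "fast") else default


short_override = data_explorer_dataseries(2, selected="fast")   # override ignored
midrange = data_explorer_dataseries(12, selected="fast")        # override honored
three_days = data_explorer_dataseries(72, selected="full")      # override honored
long_override = data_explorer_dataseries(100, selected="full")  # override ignored
```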
 

API Resolution Selection

The same dataseries defaults that apply in the Kentik portal also apply when querying with the Kentik API. The default may be overridden using the query parameter i_fast_dataset:

  • If i_fast_dataset isn't included in the query then the dataseries is chosen automatically by the backend according to the defaults described in About Dataseries Resolution:
    - Full for less than 24 hours;
    - Fast for 24 hours or more.
  • If i_fast_dataset is specified as false, the Full dataseries is used for any timespan up to 72 hours (three full days).
  • If i_fast_dataset is specified as true, the Fast dataseries is used regardless of the timespan.

Note: The following dataseries/timespan combinations are not recommended:
- Fast dataseries for timespans of less than three hours.
- Full dataseries for timespans of greater than 72 hours.

© 2014- Kentik