It’s no secret that the migration of applications from traditional data centers to cloud infrastructure is well underway. And it’s tempting to think that “the network” is just one of the many infrastructure management headaches that disappear after migrating to the cloud. However, most organizations find that understanding the network behavior of cloud-deployed applications is still critical to ensuring their availability and performance. In many cases it’s even more important than before, given the increasing scale and distributed nature of modern applications.
Because the underlying network devices are abstracted away from the end user in cloud environments, cloud providers haven’t historically been able to provide detailed telemetry about network activity. As a result, ops teams couldn’t rely on the NetFlow, sFlow, and IPFIX data they had used in traditional data center environments to get a full understanding of how applications talk to each other on the network. This loss of visibility and traditional tooling, or “going dark,” was an unfortunate but necessary tradeoff when moving to the cloud.
Fortunately, that situation is now changing for the better. Google recently announced the availability of VPC Flow Logs for Google Cloud Platform, which provide detailed, real-time telemetry of network activity within and between VPCs inside GCP projects. They are very much like NetFlow for VPCs, but better. VPC Flow Logs provide 5-second granularity, whereas NetFlow typically has 1-minute granularity. VPC Flow Logs also contain network latency measurements, along with tags that identify attributes of the traffic’s source and destination (VM, VPC, and region / zone names). These tags provide extremely useful context about each flow and the underlying network activity it represents.
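To make the record contents concrete, here is a minimal Python sketch that flattens one exported VPC Flow Logs entry into the fields described above: the connection 5-tuple, byte count, latency, and VM / zone tags. The field names (`jsonPayload`, `connection`, `bytes_sent`, `rtt_msec`, `src_instance`, `dest_instance`) follow Google’s published record format as we understand it, but treat them as assumptions and check them against your own exported logs. The `Flow` class and the sample record are made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    """Hypothetical flattened view of one VPC Flow Logs record."""
    src_ip: str
    dest_ip: str
    src_port: int
    dest_port: int
    protocol: int
    bytes_sent: int
    rtt_msec: Optional[int]
    src_vm: Optional[str]
    dest_vm: Optional[str]
    src_zone: Optional[str]
    dest_zone: Optional[str]


def parse_flow(entry: dict) -> Flow:
    """Flatten one exported log entry (assumed schema) into a Flow."""
    payload = entry["jsonPayload"]
    conn = payload["connection"]
    src = payload.get("src_instance", {})
    dst = payload.get("dest_instance", {})
    rtt = payload.get("rtt_msec")  # assumed to be absent for non-TCP flows
    return Flow(
        src_ip=conn["src_ip"],
        dest_ip=conn["dest_ip"],
        src_port=int(conn.get("src_port", 0)),
        dest_port=int(conn.get("dest_port", 0)),
        protocol=int(conn.get("protocol", 0)),
        # 64-bit counters are often serialized as JSON strings, so coerce them.
        bytes_sent=int(payload.get("bytes_sent", 0)),
        rtt_msec=int(rtt) if rtt is not None else None,
        src_vm=src.get("vm_name"),
        dest_vm=dst.get("vm_name"),
        src_zone=src.get("zone"),
        dest_zone=dst.get("zone"),
    )


# Made-up sample record for illustration only.
SAMPLE = """{
  "jsonPayload": {
    "connection": {"src_ip": "10.128.0.2", "src_port": 44321,
                   "dest_ip": "10.132.0.5", "dest_port": 443, "protocol": 6},
    "reporter": "SRC",
    "bytes_sent": "1234",
    "packets_sent": "10",
    "rtt_msec": "7",
    "src_instance": {"vm_name": "web-1", "zone": "us-central1-a"},
    "dest_instance": {"vm_name": "db-1", "zone": "europe-west1-b"}
  }
}"""

flow = parse_flow(json.loads(SAMPLE))
print(flow)
```

Keeping the parsed record flat like this makes it easy to group or filter flows by any tag, which is essentially what exposing the tags as dimensions allows.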
Kentik now also supports VPC Flow Logs as a data source and fully exposes the GCP-specific tags as dimensions and filter terms through the Kentik UI. This means Kentik customers can get full visibility into network activity within GCP projects, and also between GCP and traditional data centers in hybrid cloud architectures. Hybrid traffic is where customers have told us they are especially blind.
Kentik + VPC Flow Logs provide teams across the operations spectrum an extraordinarily useful tool to ensure the availability and performance of services, and maintain a great user / customer experience.
“What happened?” or “What is happening?” is always the question of the moment. When services go down or some other unexpected condition is impacting user experience, the clock is ticking. Logs are essential, but often don’t tell the whole story. The network “sees all,” and Kentik’s ability to retain a detailed, real-time picture of network activity provides instant answers to key questions like:
Fast filtering, pivots, and drill-downs let teams quickly get to root cause and gather the details they need to restore services to a healthy state. Going a step further, Kentik also baselines normal traffic distribution to / from services or hosts and proactively detects potential problems when conditions change, for an even faster response. As an API-first platform, Kentik is also easy to integrate with cloud deployment and incident response toolchains.
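The article doesn’t describe how Kentik builds its baselines, so the following is only a generic stand-in to illustrate the idea: a simple mean / standard-deviation test that flags a traffic sample lying far outside a service’s recent history. The function name, the 3-sigma threshold, and the sample numbers are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def is_anomalous(history: Sequence[float], current: float, threshold: float = 3.0) -> bool:
    """Flag `current` if it is more than `threshold` standard deviations from the historical mean.

    A deliberately simple baseline for illustration, not Kentik's algorithm.
    """
    if len(history) < 2:
        return False  # not enough history to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical bytes-per-interval history for one service.
history = [100.0, 110.0, 95.0, 105.0, 98.0, 102.0]

print(is_anomalous(history, 104.0))  # within normal variation
print(is_anomalous(history, 300.0))  # a sudden spike
```

A production baseline would likely also account for daily and weekly seasonality; a single global mean like this one would flag every normal peak hour on a strongly diurnal service.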
At scale, GCP projects quickly become complex. Application tiers may be deployed across multiple zones and regions, and may communicate with remote services in a hybrid or multi-cloud architecture. Without a way to visualize traffic flows and service dependencies, it becomes nearly impossible to understand the big picture and take a data-driven approach to cloud infrastructure planning and growth.
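One simple way to picture those dependencies is to roll flows up into a weighted edge list keyed by the zone tags. The sketch below assumes flows have already been flattened into dicts with hypothetical `src_zone`, `dest_zone`, and `bytes` keys; the same pattern works for VM, VPC, or region tags.

```python
from collections import Counter
from typing import Iterable, Mapping


def zone_dependencies(flows: Iterable[Mapping]) -> Counter:
    """Sum bytes per (source zone, destination zone) pair."""
    edges: Counter = Counter()
    for f in flows:
        edges[(f["src_zone"], f["dest_zone"])] += f["bytes"]
    return edges


def cross_zone_edges(edges: Counter) -> dict:
    """Keep only edges whose endpoints sit in different zones."""
    return {pair: total for pair, total in edges.items() if pair[0] != pair[1]}


# Hypothetical flows, already flattened from VPC Flow Logs records.
flows = [
    {"src_zone": "us-central1-a", "dest_zone": "europe-west1-b", "bytes": 1000},
    {"src_zone": "us-central1-a", "dest_zone": "europe-west1-b", "bytes": 500},
    {"src_zone": "us-central1-a", "dest_zone": "us-central1-a", "bytes": 200},
]

edges = zone_dependencies(flows)
for (src, dst), total in edges.most_common():
    print(f"{src} -> {dst}: {total} bytes")
```

The resulting edge list can be fed into any graph or Sankey-style visualization. Filtering to cross-zone edges is a quick way to surface dependencies that add latency or inter-region transfer cost.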
Kentik’s flexible visualizations and dashboards can provide NetOps and NetEng teams with easy answers for:
Controls, policy, hardening, and patching are all still basic tenets of security engineering and operations. But incident response is also a key capability for modern security teams. Competent incident response requires data: lots of it, and fast. Kentik lets users quickly navigate a comprehensive log of network activity, giving security teams insight that is both broad and detailed. Since the network is both the point of entry and the internal transport for threats, VPC Flow Logs provide pervasive instrumentation of potential threat activity to, from, and within GCP projects.
Kentik’s fast, detailed archive of all VPC network activity can provide SecOps teams with the details they need for:
Kentik’s streaming alerting engine also baselines past network activity and provides notifications of potentially malicious activity, like traffic from unexpected geographies, or traffic between host pairs or service pairs that haven’t been seen before.
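As a rough illustration of the “never seen before” check, here is a minimal sketch that compares each observed host pair against a baseline set and reports first-time pairs. It is a hypothetical stand-in, not Kentik’s alerting engine. Pairs are treated as directional, and the IP addresses are made up.

```python
from typing import Iterable, List, Set, Tuple

# (source IP, destination IP); directional by design.
Pair = Tuple[str, str]


class NewPairDetector:
    """Report host pairs that are absent from a learned baseline (illustrative only)."""

    def __init__(self, baseline: Iterable[Pair]) -> None:
        self.seen: Set[Pair] = set(baseline)

    def observe(self, pairs: Iterable[Pair]) -> List[Pair]:
        """Return pairs seen for the first time, then add them to the baseline."""
        new: List[Pair] = []
        for pair in pairs:
            if pair not in self.seen:
                new.append(pair)
                self.seen.add(pair)  # alert once per pair, not on every repeat
        return new


detector = NewPairDetector([("10.0.0.2", "10.0.0.5")])
batch = [
    ("10.0.0.2", "10.0.0.5"),  # known pair
    ("10.0.0.2", "10.0.0.9"),  # first sighting
    ("10.0.0.2", "10.0.0.9"),  # repeat within the same batch
]
print(detector.observe(batch))
```

The same structure works for service pairs if you key on VM or instance-group tags instead of IPs, which tends to be more stable when addresses are reassigned.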
Cloud data transfer costs 10 times wholesale Internet bandwidth rates, and even inter-VPC traffic costs more than many organizations pay for Internet connectivity. A key use case we see among Kentik customers is proactively monitoring for new applications consuming Internet, cloud interconnect, and inter-VPC bandwidth, and nipping misrouted traffic in the bud. Examples include traffic going over the Internet that should be going across private connections, or former customers pounding APIs and running up traffic charges.
This extends a classic cost optimization use case for customers running their own in-house infrastructure, but it has become even more pressing given higher cloud bandwidth costs.
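A rough starting point for spotting new Internet consumers is to split flow bytes by whether the destination address is private. The sketch below uses Python’s standard `ipaddress` module to total Internet-bound bytes per source VM. The tuple shape and VM names are hypothetical. On its own, it cannot tell whether a destination should have been reached over a private connection; that requires knowing your own interconnect prefixes.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

# (source VM name, destination IP, bytes): a hypothetical, pre-flattened flow shape.
FlowTuple = Tuple[str, str, int]


def internet_bytes_by_source(flows: Iterable[FlowTuple]) -> Counter:
    """Sum bytes per source VM for flows whose destination is not a private address."""
    totals: Counter = Counter()
    for src_vm, dest_ip, nbytes in flows:
        # is_private covers RFC 1918 space plus some other reserved ranges.
        if not ipaddress.ip_address(dest_ip).is_private:
            totals[src_vm] += nbytes
    return totals


flows = [
    ("web-1", "8.8.8.8", 100),
    ("web-1", "10.0.0.5", 50),
    ("batch-1", "1.1.1.1", 900),
]

for vm, total in internet_bytes_by_source(flows).most_common():
    print(f"{vm}: {total} bytes to Internet destinations")
```

Comparing these per-VM totals against the previous period is one way to catch a newly deployed application that has started pulling or pushing data over the Internet.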
It’s easy to add VPC Flow Logs from a GCP project into your Kentik account.
To summarize the steps:
For detailed instructions, see the Kentik for Google VPC article in the Kentik Knowledge Base.
If you need a Kentik account, you can sign up for a free trial here.