Understanding network performance in your cloud environment is essential for maintaining cloud application performance and reliability. However, most organizations find that they lose network visibility when they move to the cloud. This blog highlights five critical cloud network monitoring imperatives for cloud engineers to put in place to ensure the health of public and hybrid cloud environments.
Moving to the cloud makes many things easier. The cloud’s flexibility and elasticity allow you to add compute, storage and other resources rapidly, and to scale up and down as your needs change.
Yet to leverage the cloud’s elasticity to the greatest effect, you must be able to manage your network configurations to ensure that performance remains optimal even as your cloud environment scales up and down.
That’s why network observability is critical in the cloud. The cloud makes it easy to change configurations, but not so easy to verify that changes are optimal — especially when it comes to cloud networking, which can involve a multitude of VPCs, gateways and related services.
How do cloud engineers meet these challenges? They start by focusing on five key best practices for achieving network observability across their entire cloud environment — whether it includes just one public cloud, multiple public clouds or a hybrid combination of public and private resources.
Achieving consistent visibility into your cloud networks requires more than just one data source or set of data points. You need to be able to collect data from multiple facets and layers of your cloud environment, including:
Simply collecting data is only the first step toward network observability for cloud. Just as important is the correlation of data to gain context into what is happening — to be able to understand the data you are seeing.
This means not just correlating traffic with the applications and workloads that generate it, but also going further by bringing in names, cost information and locations.
For example, you need to know quickly whether a network slowdown impacts production environments or is limited to dev/test. The difference will tell you how to prioritize the incident. Likewise, mapping IP addresses to specific business units or users helps you understand network observability trends not just from the perspective of machine data, but from that of the business.
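The enrichment step described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the subnet inventory, the `environment` and `business_unit` labels, and the flow-record field names (`src_ip`, `latency_ms`) are all hypothetical, and a real deployment would more likely pull this context from cloud tags or an inventory system.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetContext:
    cidr: str
    environment: str
    business_unit: str


# Hypothetical inventory mapping subnets to business context (an assumption
# for illustration; not taken from any real environment).
SUBNETS = [
    SubnetContext("10.0.0.0/16", "production", "payments"),
    SubnetContext("10.1.0.0/16", "dev/test", "payments"),
    SubnetContext("10.0.5.0/24", "production", "analytics"),
]


def lookup_context(ip, subnets=SUBNETS):
    """Return the most specific subnet context containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best, best_len = None, -1
    for ctx in subnets:
        net = ipaddress.ip_network(ctx.cidr)
        # Prefer the longest matching prefix when subnets overlap.
        if addr in net and net.prefixlen > best_len:
            best, best_len = ctx, net.prefixlen
    return best


def enrich(flow):
    """Return a copy of a flow record with business context attached."""
    ctx = lookup_context(flow["src_ip"])
    enriched = dict(flow)
    enriched["environment"] = ctx.environment if ctx else "unknown"
    enriched["business_unit"] = ctx.business_unit if ctx else "unknown"
    return enriched
```

With context attached, a slow flow from `10.1.2.3` is immediately recognizable as dev/test traffic, while one from `10.0.5.7` belongs to a production workload and likely deserves higher priority.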
In a world where the majority of businesses that use the cloud have moved or are moving to multiple clouds, and many use a hybrid model, gaining visibility into cloud networks requires a multifaceted approach that tracks traffic flows within and between clouds. Not only that, but you must also be able to monitor traffic flows between different cloud regions and to on-premises environments.
And again, you need to be able to contextualize all of this traffic flow in terms of business operations. Maybe you use one of your regions in one of your clouds only to host data backups, for instance, whereas another region hosts production data. A slowdown in traffic to the backup region may be less harmful to the business than a networking performance issue with the production region. Context like this is critical for understanding the impact of traffic flow patterns on business operations.
In addition to conventional east-west traffic within a single cloud (or cloud region), you must also measure and analyze north-south traffic as it flows from one cloud data center or your private data center into others.
For new applications developed by engineers who are less familiar with networking practices, it is imperative to spot sub-optimal traffic flows that may introduce unnecessary traffic delays, costs or security risks.
As you measure all of these traffic flows, use minimum and maximum metric thresholds to establish a baseline of normal activity, and also look for anomalies that could reveal a network performance or security issue. You’ll want to be able to answer questions such as:
Remember, too, that because egress plays a big role in your cloud computing bill, you’ll want to identify top talkers and top spenders (meaning resources that generate the highest amounts of traffic and associated egress costs) in order to help manage your network spend.
Ask questions and visualize answers
As you analyze cloud network data, take a question-and-answer approach to interpret the data and understand its impact on the business.
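One question worth asking of the data is which resources generate the most egress traffic and cost. The sketch below ranks resources by total egress bytes and attaches an estimated cost. The record fields (`resource`, `egress_bytes`) and the flat per-GB price are assumptions made for illustration; actual cloud egress pricing varies by provider, region, destination and volume tier.

```python
from collections import defaultdict

# Assumed flat price per GB, for illustration only. Real egress pricing is
# tiered and differs by provider, region and destination.
ASSUMED_PRICE_PER_GB = 0.10


def top_talkers(flows, n=3, price_per_gb=ASSUMED_PRICE_PER_GB):
    """Return the n resources with the most egress as (name, bytes, est_cost)."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow["resource"]] += flow["egress_bytes"]
    # Sort by total bytes, largest first, and keep the top n.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
    return [
        (name, total, round(total / 1e9 * price_per_gb, 2))
        for name, total in ranked
    ]


# Hypothetical flow records.
flows = [
    {"resource": "api-gateway", "egress_bytes": 2_000_000_000},
    {"resource": "backup-job", "egress_bytes": 500_000_000},
    {"resource": "api-gateway", "egress_bytes": 1_000_000_000},
]
for name, total, cost in top_talkers(flows):
    print(f"{name}: {total} bytes, estimated ${cost}")
```

Because the ranking uses bytes rather than cost, top talkers and top spenders coincide here. With tiered or per-destination pricing they can diverge, in which case it may be worth ranking by cost separately.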
For a longer discussion of questions to ask yourself as you observe your network, and tips on where to find the answers, check out our blog on network telemetry types.
Using a SaaS-based solution for cloud network observability has some distinct advantages. SaaS gives you flexibility in accessing data from the services you need and in deploying, quickly and easily, any network agents you need to gather telemetry data. SaaS also gives you the ability to gather all of your network telemetry data into a single platform optimized for network data. And, because the service is cloud-based, scalability and performance can grow easily as your needs grow.
Delivering a capable network observability service for cloud requires some unique attributes. First, the data platform needs to be optimized for high-cardinality data. Network telemetry comes in dozens of types and hundreds of attributes, and each attribute can have millions of unique values. Second, the SaaS-based platform needs to support massive multi-tenancy. Operational telemetry platforms need to be able to support dozens of queries, not just from interactive users via the UI or API, but also from other operational systems within the network stack and across the organization. Finally, users need to be able to ask complex questions and get answers in seconds, not hours, to support modern ‘trail of thought’ diagnostic workflows, which is why immediate query results are essential.
Maintaining network observability in the cloud requires a fundamentally different approach than engineers take to monitoring networks within a single data center. Not only do cloud teams need to collect more types of data, but they must also be able to correlate and contextualize them in order to associate performance issues with different parts of their networks. Visualization is critical, too, for making sense of the complex flows and layers that shape cloud network performance. And finally, you need a data platform that is capable of supporting the unique requirements of network observability data.