Kentik - Network Observability

Five Issues Your NetOps Team Will Face in the Cloud

Kevin Woods

Cloud Networks


For organizations of all types and sizes, hybrid cloud environments pose a substantial challenge. In our new article, we learn about the five biggest issues that NetOps teams face in the cloud.

For organizations of all types and sizes, networks are more than just a line item on a departmental budget. Modern digital enterprises understand that long-term competitiveness depends on maximizing their IT assets to create exceptional user experiences. As part of this shift, business managers and infrastructure leaders are expanding their definition of “network performance” from reactive monitoring of flat networks to a global, real-time awareness of the entire network, including all hardware, software, and third-party components.

To deliver on these expanded expectations of network performance, network and infrastructure engineering teams need greater visibility into these complex, hybrid cloud infrastructures. Gartner observes that “new dynamic network architectures are affecting the efficacy of traditional network monitoring stacks,” and predicts that “by 2024, 50% of network operations teams will be required to re-architect their network monitoring stack due to the impact of hybrid networking — a significant increase from 20% in 2019.”

Network and infrastructure engineering teams are caught in a vice: they must deliver networks with ever-higher functionality, reliability, performance, and security while coping with increasing complexity and uncertainty.

To thrive in the era of hybrid clouds, network and infrastructure teams need to adapt to these five issues:

1. Clouds and data centers can have very different architectures

As more and more applications are moved to the public or private cloud or are natively built in the cloud, the increased use of containers and microservices is fundamentally changing the visibility and flow of network traffic. Accenture notes, “As a result of digital decoupling and the adoption of microservices, applications are evolving to more complex patterns and topologies, increasingly requiring more dynamic underlying compute, storage, and networking infrastructure. Cloud-native patterns and technologies are typically more ephemeral than traditional environments.”

As such, monitoring tools must turn to alternative telemetry methods, such as APIs from orchestration tools and service meshes, to gain visibility into container-to-container traffic. In addition, they need to ingest new types of data (such as VPC flow logs and Azure NSG or VNet flow logs) to monitor flows between services in different virtual cloud subnets.
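To make the idea of ingesting flow logs concrete, here is a minimal Python sketch that parses records in the default (version 2) AWS VPC Flow Logs format and totals accepted bytes per source/destination pair. The `Flow` type and the function names `parse_record` and `bytes_by_pair` are hypothetical, chosen only for illustration; a real pipeline would also have to handle custom log formats and the flow log formats of other providers.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

# Field order of the default (version 2) AWS VPC Flow Logs record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


class Flow(NamedTuple):
    """Hypothetical container for the fields this sketch cares about."""
    srcaddr: str
    dstaddr: str
    dstport: int
    protocol: int
    byte_count: int
    action: str


def parse_record(line: str) -> Optional[Flow]:
    """Parse one space-separated record; return None if it has no flow data."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    rec = dict(zip(FIELDS, parts))
    # NODATA and SKIPDATA records carry "-" placeholders instead of flow fields.
    if rec["log_status"] != "OK":
        return None
    return Flow(
        srcaddr=rec["srcaddr"],
        dstaddr=rec["dstaddr"],
        dstport=int(rec["dstport"]),
        protocol=int(rec["protocol"]),
        byte_count=int(rec["bytes"]),
        action=rec["action"],
    )


def bytes_by_pair(lines: Iterable[str]) -> Counter:
    """Sum bytes of accepted flows, keyed by (source, destination) address."""
    totals: Counter = Counter()
    for line in lines:
        flow = parse_record(line)
        if flow is not None and flow.action == "ACCEPT":
            totals[(flow.srcaddr, flow.dstaddr)] += flow.byte_count
    return totals
```

Feeding `bytes_by_pair` the lines of a flow log file gives a first, rough picture of which services in different subnets are talking to each other and how much.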

Distributed architectures to accommodate microservices are also changing the intra-data center traffic topology, creating a highly dynamic network between services and tiers. According to Cisco’s Global Cloud Index, east-west traffic will represent 85% of total data center traffic by 2021, and north-south traffic will account for the remaining 15% of the traffic associated with data centers.
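The east-west versus north-south split can be estimated from flow data. The sketch below assumes, purely for illustration, that a flow counts as east-west when both endpoints fall inside a list of internal address ranges (here the RFC 1918 private ranges) and as north-south otherwise. The names `INTERNAL_NETS`, `classify`, and `traffic_share` are hypothetical, and a real environment would supply its own address plan.

```python
import ipaddress

# Assumption: these ranges stand in for "inside the data center or cloud network".
INTERNAL_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_internal(addr: str) -> bool:
    """Return True if the address belongs to one of the internal ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def classify(src: str, dst: str) -> str:
    """Label a flow east-west if both ends are internal, else north-south."""
    if is_internal(src) and is_internal(dst):
        return "east-west"
    return "north-south"


def traffic_share(flows):
    """Compute each class's share of total bytes from (src, dst, bytes) tuples."""
    totals = {"east-west": 0, "north-south": 0}
    for src, dst, nbytes in flows:
        totals[classify(src, dst)] += nbytes
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {label: 0.0 for label in totals}
    return {label: count / grand_total for label, count in totals.items()}
```

Tracking this ratio over time is one simple way to see whether a move to microservices is shifting traffic toward service-to-service paths that perimeter-focused tools never observe.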

Given this, it’s essential to recognize that data center architectures are highly dynamic. Network architecture upgrades, device upgrades, acquisitions, and changing business priorities mean that all data centers are unique. As a result, each needs to be accurately described by network performance tools so network and infrastructure teams can have a fighting chance to do their job effectively and support the organization’s mission.

Public cloud architectures differ from private clouds and data centers, and they also differ from one another. Dealing with more than one cloud provider is not at all unusual. A Gartner survey of public cloud users found that 81% of respondents were working with two or more providers. The Flexera State of the Cloud Report found that enterprises employed an average of 2.2 public and 2.2 private clouds. This change in the IT landscape, where applications, data, and users fluidly interact, means network and infrastructure teams cannot rely on siloed legacy monitoring tools.

Why not? Data center architectures are radically changing to accommodate new paradigms like containerized microservices while still supporting legacy apps, incorporating mergers and acquisitions, and keeping pace with advancing technology. This requires an investment in new tooling to bridge visibility gaps. For instance, for microservices deployed in the cloud, network and infrastructure teams cannot simply assume that the cloud infrastructure runs perfectly all the time and does not require monitoring. Quite the opposite is true. When hybrid environments are involved, network and infrastructure teams are responsible for the entirety of the application delivery and user experience across all infrastructures and networks. That’s why fragmented tooling is inadequate. This is yet another example of how strategic IT decisions (like moving to the cloud and adopting containerization) are sometimes made without fully considering and understanding the impact on network requirements and monitoring capabilities.

2. Highly diverse toolsets and increased vendor complexity

While monitoring connectivity within and between on-prem, internet, and WAN infrastructures is a traditional capability of network performance monitoring and diagnostic (NPMD) solutions, adding cloud-specific network management solutions from public cloud providers creates an unmanageable number of tools and added vendor complexity.

According to a survey of IT professionals reported in “The State of Cloud Monitoring,” 35% of respondents use up to five monitoring tools to keep tabs on hybrid cloud and multi-cloud environments. Each tool has a specific purpose—device health, synthetic performance monitoring, traffic flow metrics, packet data capture, etc.—and thus tells only an isolated portion of the story.

The same research found that nearly 70% felt “public cloud monitoring is more difficult than monitoring data centers and private clouds,” and a stunning 95% said they had experienced one or more performance issues caused by a lack of visibility into the public cloud. Only 19% said they had all the data they needed to monitor a hybrid network (compared to 82% who had enough information to monitor their on-prem network).

3. There is no single view of network activity across a hybrid infrastructure

There are backbone network maps, capacity maps between sites and devices, cloud visualization maps for a specific cloud, and edge maps for WAN/SD-WAN edge networks. What is missing is a consolidated, fully integrated map focused on the performance, health, and traffic analytics required to operate a hybrid cloud network.

With applications and data shifting between on-prem and cloud environments—and the absence of a single point of observability—there are gaps in visibility, leading to gaps in network intelligence.

The bottom line is that hybrid clouds are complex and challenging to work with. Here’s why:

Using traditional network management tools, network and infrastructure professionals tracking down bandwidth, performance, and availability problems need help to piece the network together in time to fix critical issues. Discovering which devices and interfaces make up a data path takes time and effort. Correlating these elements’ traffic, health, and performance consumes valuable time that could be spent optimizing or adding features.

Automation, orchestration, and software-defined networking further complicate matters by constantly shifting where applications are located and redefining how they connect. Without tooling built to comprehend this new reality, the network — and those who run it — are at a disadvantage.

4. Limited visibility leads to decreased agility

Visibility and intelligence gaps created by separate network monitoring and diagnostic tools slow down troubleshooting, increase burnout among infrastructure engineers, and prevent proactive operations by keeping the team consumed with reactive tasks. Fragmented tooling leads to increased risk of outages, reachability issues, errant traffic flows, increased cyber threats, impact from unknown dependencies, and other threats to network stability. Network and infrastructure teams can’t troubleshoot as fast as necessary when apps and data have moved to the cloud.

Even as networks are ever-more critical to business success, network operations are becoming increasingly complex, with disruptions intrinsically harder to foresee and recover from.

These visibility gaps create an agility gap that:

  • Lengthens mean time to resolution (MTTR) because data is disparate and uncorrelated
  • Diminishes analytics intelligence used for automation and prevention
  • Increases the chances of operator error from constantly working in reactive mode
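
For readers who want to track the first of these effects, MTTR is simply the average time between an issue being detected and being resolved. Here is a minimal Python sketch; the function name `mean_time_to_resolution` and the shape of the incident records (pairs of detection and resolution timestamps) are assumptions made for illustration, not the format of any particular tool.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),  # start value so that timedeltas can be summed
    )
    return total / len(incidents)


# Hypothetical incident records: one resolved in 30 minutes, one in 90.
incidents = [
    (datetime(2023, 1, 10, 9, 0), datetime(2023, 1, 10, 9, 30)),
    (datetime(2023, 1, 12, 14, 0), datetime(2023, 1, 12, 15, 30)),
]
mttr = mean_time_to_resolution(incidents)
```

Comparing this figure for incidents that crossed tool boundaries against those that did not is one way to put a number on the agility gap.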

5. Falling behind their organizations’ increased expectations

Network and infrastructure managers should be included in decisions about balancing on-prem and cloud deployments. Decisions about moving to the cloud are usually made for reasons of cost containment, flexibility, business agility, and efficiency. They are seldom, if ever, made with the impact on network and infrastructure teams, and on their ability to monitor network performance, in mind.

This presents network engineers and operators with a daunting challenge. New engineers come into an environment without context, and the maps available don’t provide any. Yet, speed to insight for a new data center network engineer is critical. Incumbent engineers, meanwhile, can’t keep up with all the changes, and their tools don’t reflect or accommodate the new and constantly evolving environment.

Compounding these challenges are business pressures to optimize operations across multiple infrastructures, especially when the business is migrating from data centers to clouds, acquiring additional infrastructures through mergers and acquisitions, and integrating with legacy systems.

Looking ahead

If infrastructure engineering teams are to meet the increasingly high expectations being placed on them, they need to adopt a proactive approach to meeting the challenges posed by hybrid cloud environments.

And to accomplish this, they need the right kind of tools. Most helpful would be a comprehensive, integrated platform that provides visibility across public/private clouds, on-prem networks, SaaS apps, and other critical workloads. It would be a single place to go for network maps that help operators visualize — in real-time — every aspect of their network and keep track of changes to their infrastructure.

It would be pretty cool if someone built something like that.
