True observability requires visibility into both the application and network layers. For companies reliant on multi-zonal cloud networks, the days of NetOps existing as a team siloed away from application developers are over.
One of the great successes of software development in the last ten years has been the relatively decentralized approach to application development made available by containerization, allowing for rapid iteration, service-specific stacks, and (sometimes) elegant deployment and orchestration implementations that piece it all together.
At scale, and primarily when carried out in cloud and hybrid-cloud environments, these distributed, service-oriented architectures and deployment strategies create a complexity that can buckle the most experienced network professionals when things go wrong, costs need to be explained, or optimizations need to be made.
In my experience, many of these complexities and challenges can be addressed proactively if organizations include their network specialists in the planning and monitoring of their distributed applications. CTOs and other umbrella decision-makers recognize that software and network engineers must work together to deliver secure and performant applications.
DevOps is blind to the network
While DevOps teams may be skilled at building and deploying applications in the cloud, they often lack comparable expertise in optimizing cloud networking, storage, and security. Left unaddressed, this gap can lead to unreliable (and unsafe) application environments.
A common assumption among application developers is that cloud environments are highly available and resilient by default. While cloud providers offer highly available and resilient infrastructure, it is still up to application developers to properly configure and manage their cloud resources to ensure optimal performance and availability. As these applications scale, and engineering for reliability comes to the forefront, DevOps engineers begin to rely on networking concepts like load balancing, auto-scaling, traffic management, and network security.
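To make that reliance concrete, here is a minimal, illustrative sketch of the round-robin-plus-health-checks logic a cloud load balancer applies. The class name and backend addresses are hypothetical; real load balancers add connection draining, weighting, and active probing on top of this core idea.

```python
import itertools


class RoundRobinBalancer:
    """Toy round-robin load balancer with health checks (illustrative only)."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def mark_healthy(self, backend):
        self.healthy.add(backend)

    def next_backend(self):
        # Walk the rotation, skipping unhealthy backends; checking
        # len(backends) candidates covers the whole pool once.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_unhealthy("10.0.0.2")
print(lb.next_backend())  # -> 10.0.0.1
print(lb.next_backend())  # -> 10.0.0.3 (the unhealthy backend is skipped)
```

The same skip-the-unhealthy pattern is what auto-scaling groups and managed load balancers do behind the scenes, which is why health-check configuration is a networking decision with direct application-reliability consequences.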
Something else I’ve run into is that DevOps teams often fail to fully appreciate the importance of cost optimization in cloud environments. Cloud resources can be highly flexible and scalable, but they can also be cripplingly expensive if not properly managed. DevOps teams need to be aware of the cost implications of their cloud infrastructure and take steps to optimize their resource usage, such as purchasing reserved instances, automating resource management, peering networks to reduce transit costs, and implementing cost monitoring with business context.
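As a back-of-the-envelope illustration of why utilization matters for reserved capacity, here is a hedged sketch comparing on-demand and reserved pricing. The `reserved_savings` helper and the hourly rates are hypothetical, not any provider's actual pricing.

```python
def reserved_savings(on_demand_hourly, reserved_hourly, hours_used, commitment_hours):
    """Compare on-demand vs. reserved pricing for a given usage pattern.

    Reserved capacity bills for the full commitment whether used or not,
    so savings only materialize above a utilization break-even point.
    Returns the dollar amount saved by reserving (negative means the
    reservation cost more than paying on demand).
    """
    on_demand_cost = on_demand_hourly * hours_used
    reserved_cost = reserved_hourly * commitment_hours
    return on_demand_cost - reserved_cost


# Hypothetical rates: $0.10/hr on demand vs. $0.06/hr reserved,
# over a 730-hour month of commitment.
print(reserved_savings(0.10, 0.06, hours_used=730, commitment_hours=730))  # reservation saves money
print(reserved_savings(0.10, 0.06, hours_used=300, commitment_hours=730))  # reservation loses money
```

The second call is the trap: a workload running 300 of 730 hours is cheaper on demand, which is exactly the kind of utilization analysis cost monitoring should surface before a reservation is purchased.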
Observability and its SRE (site reliability engineer) champions have risen in demand as applications have evolved into these deeply distributed architectures. Observability strategies like collecting, sampling, and analyzing MELT (metrics, events, logs, and traces) telemetry have dramatically improved structural responses to challenges like incident response and system-wide optimizations.
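As one example of the sampling strategies mentioned above, here is a sketch of deterministic head-based trace sampling: hashing the trace ID means every service makes the same keep/drop decision for a given trace, so sampled traces stay complete end to end. The function name and rate handling are illustrative, not any vendor's implementation.

```python
import hashlib


def sample_trace(trace_id: str, rate: float) -> bool:
    """Deterministic head-based sampling decision.

    Hashing the trace ID (rather than rolling a random number) ensures
    all services in a distributed system agree on whether a trace is kept.
    """
    digest = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return (digest % 10_000) < rate * 10_000


kept = sum(sample_trace(f"trace-{i}", 0.10) for i in range(10_000))
print(f"kept {kept} of 10000 traces at a 10% sample rate")
```

Because the decision is a pure function of the trace ID, no coordination between services is required, which is what makes this approach practical in deeply distributed architectures.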
However, realizing this at the organizational level can involve significant stutter steps as applications grow in capability and sophistication. Many observability implementations still contain significant blind spots around network-specific telemetry, such as data from transit and ingress gateways, CDNs, IoT devices, SD-WANs, routers, and switches.
DevOps and NetOps need to work together
Collaboration is a two-way street. While DevOps may indeed be “blind to the network,” achieving visibility will require substantial work and contribution from NetOps.
Following are a few key ways NetOps and DevOps can collaborate to make more reliable systems.
First and foremost, having NetOps at the table means allowing network specialists to provide input at the very earliest stages of cloud development. Designing modern applications requires practices like loosely coupling containerized application stacks, building the orchestration layers that tie them together, and layering in data management abstractions like caching, replication, and transformation; all of these efforts rely on network principles and infrastructure for performance, reliability, and security.
Having an expert perspective on network protocols helps ensure data will be moved securely and with network performance in mind. As this infrastructure is designed and configured across multi-zonal hybrid networks, the NetOps perspective can detail key performance metrics and analysis methods to instrument at the application layer, like when dealing with containers and load balancers. These insights become key as applications mature and find they need to scale in sometimes dramatic and unpredictable ways.
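For instance, latency percentiles are among the key performance metrics worth instrumenting at the application layer, since averages hide the slow tail that load balancers and scaling policies react to. A minimal, dependency-free nearest-rank sketch; the sample latencies are hypothetical.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: no interpolation, no external libraries."""
    ordered = sorted(samples)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]


# Hypothetical request latencies in milliseconds, with a slow tail.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]
print("p50:", percentile(latencies_ms, 50))  # -> p50: 14
print("p95:", percentile(latencies_ms, 95))  # -> p95: 900
```

The p50 looks healthy while the p95 reveals the outliers, which is exactly why percentile-based metrics, rather than averages, tend to drive scaling and alerting decisions.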
One way to ensure architectural decisions include the perspective of both application and network specialists is to create cross-functional teams. Instead of an IT or infrastructure team trying to manage the hydra of networks and configurations in isolation, cross-functional teams help ensure each service or development vertical has a NetOps representative from planning through deployment. This personnel shift provides a proactive solution to the networking challenges of highly scaled cloud applications.
Shared tools and processes
As NetOps teams gain a seat at the table, one of the biggest shifts for many network professionals will be adopting the methodologies and development pacing of the application developers they now work with more closely. This can be a real challenge for NetOps, which has historically operated in stable, insular silos.
But, the complexity of modern networks calls for a change, and adapting network operations to include continuous integration/delivery, automated testing and security scanning, and more human-centered tools for monitoring, alerting, and visualizing information gives application developers and network operators a shared understanding of the systems they both support.
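One concrete form this takes is automated testing of network configuration in CI, the same way application code is tested before merge. A hedged sketch, assuming a simplified rule format rather than any real firewall or security-group schema:

```python
import ipaddress

# Illustrative policy, not an official standard: SSH and RDP should
# never be reachable from the open internet.
SENSITIVE_PORTS = {22, 3389}


def violations(rules):
    """Flag ingress rules that expose sensitive ports too broadly.

    Each rule is a dict like {"port": 22, "source": "0.0.0.0/0"},
    a simplified stand-in for a real security-group definition.
    """
    bad = []
    for rule in rules:
        net = ipaddress.ip_network(rule["source"])
        if rule["port"] in SENSITIVE_PORTS and net.num_addresses > 1_000_000:
            bad.append(rule)
    return bad


rules = [
    {"port": 22, "source": "0.0.0.0/0"},    # fails: SSH open to the world
    {"port": 443, "source": "0.0.0.0/0"},   # fine: public HTTPS
    {"port": 22, "source": "10.0.0.0/24"},  # fine: internal SSH
]
print(violations(rules))  # -> [{'port': 22, 'source': '0.0.0.0/0'}]
```

Running a check like this on every pull request turns a NetOps policy into an automated gate both teams can see and reason about, rather than a review step that happens after deployment.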
Unified telemetry + data management
DevOps observability has provided a great roadmap for network engineers on how to better collect and analyze the massive amounts of data generated by today’s applications. Sampling strategies, technologies like distributed tracing, and contextual instrumentation have made application-layer data a gold mine for root cause analysis, customer- or region-specific optimizations, and an overall improved capability to “ask anything” about an application in real time.
This kind of instrumentation (providing high-cardinality context to network-layer data) is critical to unifying the data available to both DevOps and NetOps, and it sets up automation efforts that better account for the full spectrum of components within a system.
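A minimal sketch of what that enrichment might look like, assuming a hypothetical subnet-to-metadata mapping; in practice the mapping would come from IPAM, a CMDB, or the orchestrator's API rather than a hard-coded dict.

```python
import ipaddress

# Hypothetical mapping from subnet to deployment metadata.
SUBNET_CONTEXT = {
    "10.1.0.0/16": {"service": "checkout", "region": "us-east-1"},
    "10.2.0.0/16": {"service": "search", "region": "eu-west-1"},
}


def enrich_flow(flow):
    """Attach service/region labels to a raw flow record so network data
    can be queried along the same dimensions as application telemetry."""
    enriched = dict(flow)
    for cidr, context in SUBNET_CONTEXT.items():
        if ipaddress.ip_address(flow["src_ip"]) in ipaddress.ip_network(cidr):
            enriched.update(context)
            break
    return enriched


print(enrich_flow({"src_ip": "10.1.4.7", "dst_port": 443, "bytes": 5120}))
```

Once flow records carry `service` and `region` labels, a question like "which service is driving cross-region egress?" becomes a single query instead of a manual join between two teams' dashboards.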
Automation is critical to both DevOps and NetOps. IaC (infrastructure-as-code) can automate vital tasks such as provisioning infrastructure, configuring servers, and deploying applications, granting both teams velocity, reducing the risk of human error, and ensuring that these workflows are taking into account both application and networking concerns.
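The core of most IaC tools is computing a change set between current and desired state; a simplified, illustrative sketch of that idea follows (the resource names are hypothetical, and this is not any real tool's plan format).

```python
def plan(current, desired):
    """Compute a minimal change set between current and desired resource
    maps: the core idea behind IaC tools, heavily simplified."""
    to_create = {k: v for k, v in desired.items() if k not in current}
    to_delete = {k: v for k, v in current.items() if k not in desired}
    to_update = {k: desired[k] for k in current.keys() & desired.keys()
                 if current[k] != desired[k]}
    return {"create": to_create, "update": to_update, "delete": to_delete}


current = {"subnet-a": {"cidr": "10.0.1.0/24"}}
desired = {
    "subnet-a": {"cidr": "10.0.1.0/24"},
    "subnet-b": {"cidr": "10.0.2.0/24"},
}
print(plan(current, desired))
```

Because the desired state is declared in code, a change like adding `subnet-b` goes through the same review, testing, and rollback machinery as an application change, which is precisely where DevOps and NetOps workflows converge.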
True observability requires visibility into both the application and network layers. For companies reliant on multi-zonal cloud networks, the days of NetOps existing as a team siloed away from application developers must be over. The complexity of modern application environments creates a host of issues around traffic management, network-specific monitoring, and security. By bringing network specialists into the very earliest stages of application development, software engineers can ensure applications are being built in the most cost-effective, reliable, and threat-averse ways possible.