As the complexity of microservice applications continues to grow, it’s becoming extremely difficult to track and manage interactions between services. Understanding the network footprint of applications and services is now essential for delivering fast and reliable services in cloud-native environments. Networking is not evaporating into the cloud but instead has become a critical component that underpins every part of modern application architecture.
If you are at the beginning of the journey to modernize your application and infrastructure architecture with Kubernetes, it’s important to understand how service-to-service communication works in this new world. This blog post is a good starting point to understand the networking model behind Kubernetes and how to make things simpler and more efficient.
(For container cluster networking, there is the Docker model and the Kubernetes model. In this blog, we will focus on discussing Kubernetes.)
Let’s first review a few basic Kubernetes concepts:
Pod: “A pod is the basic building block of Kubernetes,” which encapsulates containerized apps, storage, a unique network IP, and instructions on how to run those containers. Pods can either run a single containerized app or run multiple containerized apps that need to work together. (“Containerized app” and “container” will be used interchangeably in this post.)
Node: A node is a worker machine in Kubernetes. It may be a VM or physical machine. Each node contains the services necessary to run pods and is managed by Kubernetes master components.
Service/Micro-service: A Kubernetes Service is an abstraction which defines a logical set of pods and a policy by which to access them. Services enable loose coupling between dependent pods.
There are several layers of networking that operate in a Kubernetes cluster. That means there are distinct networking problems to solve for different types of communications:
Sometimes, 1 & 2 are grouped as “pod networking” and 3 & 4 as “service networking.”
In the Kubernetes networking model, in order to reduce complexity and make app porting seamless, a few rules are enforced as fundamental requirements:
This set of requirements significantly reduces the complexity of network communication for developers and lowers the friction porting apps from monolithic (VMs) to containers. However, this type of networking model with a flat IP network across the entire Kubernetes cluster creates new challenges for network engineering and operates teams. This is one of the reasons we see so many network solutions and implementations that are available and documented in kubernetes.io today.
While meeting the three requirements above, each of the following implementations tries to solve different kinds of networking problems:
Among all these implementations, Flannel, Calico, and Weave Net are probably the most popular ones that are used as network plugins for the Container Network Interface (CNI). CNI, as its name implies, can be seen as the simplest possible interface between container runtimes and network implementations, with the goal of creating a generic plugin-based networking solution for containers. As a CNCF project, it’s now become a common interface industry standard.
Some quick and additional information about Flannel, Calico, and Weave Net:
Application developers usually assume that the network below layer 4 “just works.” They want to handle service communication in Layers 4 through 7 where they often implement functions like load-balancing, service discovery, encryption, metrics, application-level security and more. If the operations team does not deliver these services within the infrastructure, developers will often roll up their sleeves and fill this gap by changing their code in the application. The downsides are obvious:
…And that’s where service meshes come in.
One of the main benefits of a service mesh is that it allows operations teams to introduce monitoring, security, failover, etc. without developers needing to change their code.
So far, the most quoted definition for service mesh is probably from William Morgan, Buoyant CEO:
“A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud-native application. In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware.”
With service meshes, developers can bring their focus back to their primary task of creating business value with their applications, and have the operations team take responsibility for service management on a dedicated infrastructure layer (sometimes called “Layer 5”). This way, it’s possible to maintain consistent and secure communication among services.
Familiar to most network engineers, service meshes have two planes: a control plane and a data plane (a.k.a. forwarding plane). If traditional networking planes focus on routing and connectivity, the service mesh planes focus more on application communication such as HTTP, gRPC, etc.
The data plane basically touches every data packet in the system to make sure things like service discovery, health checking, routing, load balancing, and authentication/authorization work.
The control plane acts as a traffic controller of the system that provides a centralized API that decides traffic policies, where services are running, etc. It basically takes a set of isolated stateless sidecar proxies and turns them into a distributed system.
There are many service mesh solutions out there, the most-used are probably Istio, Linkerd, Linkerd2 (formerly Conduit), and Consul Connect. All of those are 100% Kubernetes-specific. Below is a comparison table from the Kubedex community if you want to get nerdier on this topic.
As it’s probably the most popular service mesh architecture today, let’s take a closer look into Istio.
Istio is an open platform that lets you connect, secure, control, and observe services in large hybrid and multi-cloud deployments. Istio has strong integration with Kubernetes, plus it has backing from Google, Lyft, and IBM. It also has good traffic management and security features. All of these have helped Istio to gain strong momentum in the cloud-native community.
Now let’s take a look at simple examples of a Kubernetes cluster with and without Istio, so we can look at the benefits of installing Istio without any changes to the application code running in the cluster.
Take this Sockshop for example: it’s a demonstration e-commerce site that sells fancy socks online. There are a few microservices within this application: “catalog,” “cart,” “order,” “payment,” “shipping,” “user,” and so on. Because each microservice needs to talk to others via remote procedure calls (RPCs), the application forms a “mesh of service communication” over the network given the distributed nature of microservices.
Without a dedicated layer to handle service-to-service communication, developers need to add a lot of extra code in each service to make sure the communication is reliable and secure.
With Istio installed, all of the microservices will be deployed as containers with a sidecar proxy container (e.g. Envoy) alongside. A sidecar ”intercepts all network communication between microservices, then configures and manages Istio using its control plane functionality.” With no changes to the application itself, even if all the components of the application are written in different languages. Istio is well-positioned to generate detailed metrics, logs, and traces about service-to-service communication, policy enforcement, mutual authentication, as well as providing thorough visibility throughout the mesh.
Plus, it’s straightforward to get Istio setup in just a few steps without extra configuration (See the Quick Start Guide). This ease of deployment has also been one of Istio’s differentiators compared to other service mesh architectures.
Networking that underlies service communication has never been so critical. Because modern applications are so agile and dynamic, networks are a key part of ensuring application availability.
Service meshes and the Kubernetes Networking Model are a sweet combination that can make life easier for developers, and can make applications more portable across different types of infrastructure. Meanwhile, ops teams get a great way to make service-to-service communication easier and more reliable.
At Kentik, we embrace cutting-edge technology such as Kubernetes and service meshes even as they continue to evolve. Today, Istio metrics are one of many data sources ingested by the Kentik platform to enrich network flow data with service mesh context to provide complete observability for cloud-native environments.