Kubernetes and the Service Mesh Era: Exploring the Role of Kubernetes Service Mesh in Networking

January 5, 2023 | Cloud Networks Kubernetes

Summary

In this post, we discuss how Kubernetes approaches networking, where its built-in networking falls short, and how Kubernetes service meshes address those gaps.

The adoption of Kubernetes in enterprise organizations is revolutionizing the way businesses manage their IT infrastructures. Automating deployment, scaling, and management of containerized applications allows organizations to embrace a cloud-native paradigm at scale and more easily employ best practices, such as microservices and DevSecOps.

But as with all tech, Kubernetes has its limits. Kelsey Hightower famously tweeted that “Kubernetes is a platform for building platforms. It’s a better place to start; not the endgame.”

And networking is arguably the area where this quote is most applicable. Kubernetes provides a generic networking baseline—a flat address model, services for discovery, simple ingress/egress, and network policies—but anything beyond these basics must come from an extension or integration.

Service meshes were built to close this gap by providing advanced services around traffic management, security, and observability.

Let’s start with a background on the Kubernetes networking model.

Understanding the Kubernetes networking model

Kubernetes runs workloads (services, applications, and jobs) in pods. Each pod contains one or more containers. Each pod also has a unique (within the cluster) private IP address.

All of the containers in a pod run on the same node and, because they share the pod's network namespace and IP address, can communicate with one another over localhost. Within a cluster, pods are directly reachable by these private IP addresses. Because pod IP addresses change as pods are rescheduled, Kubernetes provides the ClusterIP service type, which gives a set of pods a stable internal virtual IP address and load-balances traffic across them inside the cluster.

However, communicating with a pod from outside the cluster is a little more complicated. There are several options:

Provide a public IP address to pods (not recommended)
Use a NodePort service, which exposes the service on a static port of every node's IP address
Use a LoadBalancer service, which provisions an external load balancer with a public IP

In short, a service in Kubernetes is a way to group a set of pods and present them as a single entity via an IP address and/or DNS name. When the service receives a request, it delegates it to one of the backing pods.

Pods can discover services in two ways: environment variables and DNS. The environment of each pod contains the host and port of every service that already existed in the pod's namespace when the pod was created, such as REDIS_SERVICE_HOST=10.0.0.11 and REDIS_SERVICE_PORT=6379. Every service also has a DNS name based on its name and namespace, such as <service>.<namespace>.svc.cluster.local.
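Both discovery mechanisms can be sketched in a few lines of Python. This is a minimal illustration, not a client library: it assumes the documented <NAME>_SERVICE_HOST / <NAME>_SERVICE_PORT variable pair and the default cluster.local cluster domain. The function names, the service name redis, the namespace cache, and the example environment are all hypothetical.

```python
import os


def service_from_env(service_name, env=None):
    """Return (host, port) for a service from Kubernetes-style env variables.

    Kubernetes upper-cases the service name and replaces dashes with
    underscores, then sets <NAME>_SERVICE_HOST and <NAME>_SERVICE_PORT.
    Returns None when the variables are absent.
    """
    env = os.environ if env is None else env
    prefix = service_name.upper().replace("-", "_")
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


def service_dns_name(service_name, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a service."""
    return f"{service_name}.{namespace}.svc.{cluster_domain}"


# Hypothetical environment, as a pod might see it for a service named "redis".
example_env = {"REDIS_SERVICE_HOST": "10.0.0.11", "REDIS_SERVICE_PORT": "6379"}

print(service_from_env("redis", example_env))
print(service_dns_name("redis", "cache"))
```

DNS is usually the more robust choice, because the environment variables are only a snapshot taken when the pod starts.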

Let’s look at several other important networking constructs that Kubernetes works with.

DNS

DNS is a staple of networking well beyond Kubernetes. Kubernetes comes with its own internal DNS server, CoreDNS, which creates DNS records for pods and services. CoreDNS is a separate CNCF project.

Kubernetes NetworkPolicies

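Kubernetes NetworkPolicies allow you to manage traffic between pods at the IP address and port level (OSI layers 3 and 4). You can set policies for ingress or egress within the cluster. Policies are enforced by the cluster's network plugin, so the plugin must support them.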
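As a rough illustration, the following Python sketch builds a NetworkPolicy manifest that admits ingress to a set of pods from one group of client pods on a single TCP port. The helper name, the labels, the namespace, and the port are assumptions made for the example. The manifest fields follow the networking.k8s.io/v1 API, and the JSON output can be applied in the same way as a YAML manifest.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, client_labels, port):
    """Build a NetworkPolicy that lets only matching client pods reach the targets.

    The podSelector under "from" matches pods in the policy's own namespace.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Pods that are allowed to connect.
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical example: only pods labeled app=api may reach redis on 6379.
policy = allow_ingress_policy(
    "allow-api-to-redis", "cache", {"app": "redis"}, {"app": "api"}, 6379
)
print(json.dumps(policy, indent=2))
```

Note that the policy speaks only in terms of pod labels, IP addresses, and ports. It has no notion of HTTP paths, service identities, or retries, which is part of the gap a service mesh fills.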

Ingress

Ingress resources control HTTP and HTTPS traffic from the outside world to services inside the cluster. You can define ingress rules, but Kubernetes actually doesn’t know what to do with these rules. This is one of the extensibility points where you need to deploy a third-party ingress controller to watch for the ingress resources you define and enforce the rules.

Gateway API

The Gateway API is the evolution of Ingress. The Gateway API is more expressive and flexible. It is broken into multiple resources (such as GatewayClass, Gateway, and HTTPRoute) that allow application developers and cluster administrators to cooperate without stepping on each other's toes.
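To show how that split of responsibilities looks in practice, here is a hedged sketch of an HTTPRoute, the resource an application team would typically own, attached to a Gateway that a cluster administrator manages. The helper name, the gateway name, the hostname, and the backend services are hypothetical, and the v1 API version assumes Gateway API 1.0 or later (earlier releases used v1beta1).

```python
import json


def weighted_http_route(name, gateway_name, hostname, backends):
    """Build an HTTPRoute that splits traffic across backends by weight.

    backends is a list of (service_name, port, weight) tuples.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name},
        "spec": {
            # The Gateway (owned by the cluster administrator) this route attaches to.
            "parentRefs": [{"name": gateway_name}],
            "hostnames": [hostname],
            "rules": [
                {
                    "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                    "backendRefs": [
                        {"name": service, "port": port, "weight": weight}
                        for service, port, weight in backends
                    ],
                }
            ],
        },
    }


# Hypothetical canary rollout: 90% of requests to v1, 10% to v2.
route = weighted_http_route(
    "shop-route",
    "shared-gateway",
    "shop.example.com",
    [("shop-v1", 8080, 90), ("shop-v2", 8080, 10)],
)
print(json.dumps(route, indent=2))
```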

The network requirements of enterprise systems

As you can see, Kubernetes provides a robust and clean networking model. Many of the fundamental building blocks of networking are supported. However, as an enterprise organization, you probably need much more.

For example:

Sophisticated routing
Strong security
Observability
Inter-service authentication and authorization
Load balancing
Health checks
Timeouts and retries
Fault injection
Bulkhead
Rate limiting

Before the cloud-native era, the landscape for enterprise organizations was proprietary. Organizations ran their systems in private data centers. Infrastructure was mostly static, with separate IT teams responsible for capacity planning. Software was typically a large monolith with long release cycles.

To handle the enterprise networking requirements mentioned above and ensure adherence to policies and interconnectivity between subsystems, the common practice was to have all software teams use standard client libraries. This, of course, led to a lack of flexibility, projects that ran over budget and past deadline, and slow decay, as complex software systems had no practical way to keep up with modern innovation.
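To make the client-library approach concrete, here is a minimal sketch of the kind of retry logic such a library would have to carry, and that every team would have to link against in every language. The TransientError type, the function name, and the default values are all hypothetical; the sleep function is injectable only so the example runs instantly.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_retries(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the failure to the caller.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical upstream that fails twice and then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("upstream unavailable")
    return "ok"


delays = []  # Record the backoff delays instead of actually sleeping.
result = call_with_retries(flaky, sleep=delays.append)
print(result, delays)
```

Changing a default such as max_attempts across the organization means releasing a new library version and waiting for every team to adopt it, which is exactly the coordination cost described above.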

Let’s fast forward to the cloud-native age!

Networking in the cloud-native age

In the modern age, software systems are deployed in the cloud, across multiple clouds, in private data centers, and even at edge locations. The infrastructure is dynamic. The software comprises hundreds or thousands of microservices that may be implemented in multiple programming languages. The infrastructure and application development follow DevOps practices for continuous delivery. Security is integrated into the process following DevSecOps practices. Different components of the system are released constantly.

This was a boon for productivity and flexibility—but brought on new problems of management, control, and policy enforcement. All these microservices implemented in multiple languages somehow need to interact. Developers and administrators need to understand the flow of information, be able to detect and mitigate problems, and secure the data and the infrastructure.

Enter the service mesh.

What is a Kubernetes service mesh?

A service mesh is a network software layer that employs a control plane to set up policies and a data plane with proxies or node agents to intercept network traffic and implement those policies.

Like Kubernetes itself, a service mesh is split into a data plane and a control plane.

The data plane

The data plane is composed of proxies that run alongside each service. These proxies capture every request between services on the network, apply relevant policies, determine how to process the request, and—assuming the request is approved and authorized—decide the routing path.

The data plane can adopt one of two proxy models: the sidecar model or the host model. In the sidecar model, a mesh proxy is injected as a sidecar container into every pod. In the host model, each node runs an agent that functions as the proxy for all workloads on that node.
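The per-request work of a data plane proxy (check the policy, then pick a destination) can be sketched as a toy model. This is an illustration under stated assumptions, not how any particular mesh proxy is implemented: the SidecarProxy class, the policy and route tables, the service names, and the addresses are all hypothetical, and real proxies operate on network connections rather than function calls.

```python
import random


class SidecarProxy:
    """Toy data plane proxy: authorize the caller, then choose a backend."""

    def __init__(self, allowed_callers, routes, rng=None):
        self.allowed_callers = allowed_callers  # service -> set of permitted callers
        self.routes = routes                    # service -> list of (endpoint, weight)
        self.rng = rng if rng is not None else random.Random()

    def handle(self, caller, service):
        # 1. Apply the access policy before anything else.
        if caller not in self.allowed_callers.get(service, set()):
            return ("denied", None)
        # 2. Look up where the service currently lives.
        backends = self.routes.get(service)
        if not backends:
            return ("no_route", None)
        # 3. Decide the routing path with a weighted random choice.
        endpoints = [endpoint for endpoint, _ in backends]
        weights = [weight for _, weight in backends]
        chosen = self.rng.choices(endpoints, weights=weights, k=1)[0]
        return ("forwarded", chosen)


proxy = SidecarProxy(
    allowed_callers={"orders": {"web"}},
    routes={"orders": [("10.0.1.5:8080", 90), ("10.0.1.6:8080", 10)]},
)
print(proxy.handle("web", "orders"))
print(proxy.handle("batch", "orders"))
```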

The control plane

The control plane offers an API that mesh operators use to establish policies dictating the mesh’s behavior. Additionally, the control plane facilitates communication by identifying all data plane proxies and updating them with the service locations throughout the mesh.
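The relationship between the two planes can be illustrated with a small sketch in which a control plane tracks registered proxies and pushes the current service locations to each of them. The class and method names are hypothetical and the push is a plain method call; production meshes distribute this state over the network and handle versioning and failures.

```python
class MeshProxy:
    """Toy data plane proxy that holds the service locations it was given."""

    def __init__(self, name):
        self.name = name
        self.endpoints = {}

    def update(self, endpoints):
        # Keep a private copy so later control plane changes arrive only via update().
        self.endpoints = {svc: list(addrs) for svc, addrs in endpoints.items()}


class ControlPlane:
    """Toy control plane: knows every proxy and keeps them all in sync."""

    def __init__(self):
        self.proxies = []
        self.endpoints = {}

    def register(self, proxy):
        # A newly started proxy immediately receives the current state.
        self.proxies.append(proxy)
        proxy.update(self.endpoints)

    def set_endpoints(self, service, addresses):
        # A service moved or scaled: record it and push to every proxy.
        self.endpoints[service] = list(addresses)
        for proxy in self.proxies:
            proxy.update(self.endpoints)


control_plane = ControlPlane()
web_proxy = MeshProxy("web")
control_plane.register(web_proxy)
control_plane.set_endpoints("orders", ["10.0.1.5:8080", "10.0.1.6:8080"])

orders_proxy = MeshProxy("orders")
control_plane.register(orders_proxy)
print(web_proxy.endpoints)
print(orders_proxy.endpoints)
```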

The benefits of service mesh

A service mesh has many benefits in a modern, large, and dynamic networking environment, such as Kubernetes-based systems, where new workloads are deployed constantly, pods come and go, and instances scale up or down.

The service mesh externalizes all the networking concerns from the applications. Now they can be managed and updated centrally. By offloading all networking concerns to the service mesh, service developers can focus their efforts solely on their application and business logic.
You can upgrade the service mesh in one place, and every service transparently benefits from the latest version. Traditionally, to introduce a change or upgrade to a client library, you would need to negotiate with each team individually and support multiple library versions across multiple programming languages.
You benefit from the efforts of experts that keep evolving, improving, and optimizing the service mesh. The service mesh is also used and battle-tested by many organizations. This means that problems that might impact you may have been discovered and reported by other users.
As a central component that touches all of your services, the service mesh can handle cross-cutting concerns—such as observability, health checks, and access policy enforcement—across all services in your Kubernetes-based system.
The service mesh can add a layer of security to an enterprise’s inter-service communication by employing a zero-trust approach to access and using mTLS to encrypt traffic for secure communication. Additionally, limiting access from application to application helps to ensure that a malicious attacker who exploits one service cannot move laterally through your network to exploit other services.
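For the mTLS point in particular, the following sketch shows the server half of a mutual TLS setup using Python's standard ssl module: the server presents its own certificate and refuses clients that do not present one signed by a trusted authority. In a mesh, the proxy performs this step on the application's behalf. The function name and the file path parameters are hypothetical, and certificate issuance and rotation, which a mesh automates, are out of scope here.

```python
import ssl


def mtls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that requires a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # The "mutual" part: reject any client that presents no valid certificate.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # This workload's own identity (hypothetical paths supplied by the caller).
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # The authority whose signatures on client certificates are trusted.
        context.load_verify_locations(cafile=ca_file)
    return context


# Without file paths the context is still built, so the settings can be inspected.
context = mtls_server_context()
print(context.verify_mode)
```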

Service meshes on Kubernetes

Service mesh fits Kubernetes like a glove. Kubernetes makes it easy for service meshes to integrate with the platform due to its extensibility. The synergy between Kubernetes and service meshes is powerful, as the service mesh builds on top of the basic Kubernetes networking model.

For large systems—in particular, systems composed of multiple Kubernetes clusters—the service mesh becomes a standard add-on. Once enterprises begin working with multiple clusters, which might spread across different clouds, the service mesh becomes an essential component for properly facilitating and securing inter-service communication.

A quick review of popular service meshes on Kubernetes

If you’re ready to implement a service mesh on top of Kubernetes, there are many choices. Let’s look at a few of them and their strengths and attributes.

Istio is arguably the most popular service mesh for Kubernetes. Google, IBM, and Lyft originally developed it. It uses the Envoy project from Lyft as its data plane.
Linkerd was the first project to call itself a service mesh. Its claim to fame is that it is more performant and less complicated than Istio. It implements its own data plane proxy in Rust.
Kuma is a service mesh originally developed by Kong, which also offers an enterprise service mesh called Kong Mesh built on top of Kuma. Like Istio, Kuma uses Envoy as its data plane. Its claim to fame is that it allows connecting Kubernetes clusters with non-Kubernetes workloads running on VMs (but Istio now has this capability, too).

Here are several other service meshes you may want to explore:

Traefik Mesh (node agents as data plane)
Open Service Mesh (heavily pushed by Microsoft, can be enabled on AKS as an add-on)
AWS App Mesh (AWS proprietary service mesh, strong integration with EKS, ECS, and EC2)
Cilium Service Mesh (an up-and-coming service mesh that uses eBPF in the data plane)

Drawbacks of not utilizing a Kubernetes service mesh

As powerful as Kubernetes is, running it without a service mesh can leave gaps in the networking and management of your containerized applications. In particular, you may face the following challenges:

Lack of observability

Without a service mesh, it becomes difficult to gain insights into the traffic flow and performance of your microservices. This lack of visibility can make it challenging to identify issues, optimize performance, and troubleshoot problems within your infrastructure.

Limited traffic management

In the absence of a service mesh, you may find it harder to implement advanced traffic management features such as fine-grained load balancing, timeouts, retries, and fault injection. This can lead to suboptimal routing decisions and an overall less resilient system.

Security concerns

Not using a service mesh may expose your infrastructure to security vulnerabilities. With a service mesh, you can implement a zero-trust approach and use mTLS to encrypt inter-service communication, ensuring a more secure environment.

Increased complexity

Managing networking configurations and policies for a large number of microservices can become complex without a service mesh. A service mesh helps centralize and standardize the management of networking concerns, simplifying the overall process.

Performance issues

Without a service mesh, you may face performance bottlenecks and scalability issues as your infrastructure grows. Service meshes are designed to handle the dynamic nature of microservices, ensuring that your system remains performant and efficient even as it scales.

Kubernetes, service meshes, and Kentik Cloud

As we’ve seen, Kubernetes with a service mesh is a powerful combination that lets you connect workloads across clouds, data centers, and the edge and enforce policies and best practices.

As an additional benefit, as the service mesh works, it collects a lot of valuable data from flow logs and metrics related to your network traffic. This data can help you create a more robust and reliable system. But being able to understand and use this data in a meaningful way, and make it actionable, can be difficult. A strong and robust observability solution (such as Kentik Cloud) can help you make sense of the data from your service mesh, ensuring your system is cost-effective, healthy, and performs well. It can also help to mitigate incidents and/or attacks.

Conclusion

Kubernetes is a powerful tool for modern cloud infrastructure. Out of the box, it offers some networking capabilities, but by adding a service mesh on top, you gain a long list of benefits. Hopefully, you now understand how Kubernetes and service meshes can work together to create modern and robust enterprise systems.

To learn more about how multi-cluster service meshes solve hybrid and multi-cloud networking complexities, read our previous article, Kubernetes and Cross-cloud Service Meshes.