Kubernetes Architecture: Container Orchestration at Scale
Kubernetes (K8s) has become the de facto standard for container orchestration. It automates the deployment, scaling, and management of containerized applications, turning a fleet of machines into a single, programmable platform. Understanding its architecture is essential for anyone building or operating modern distributed systems.
This guide covers the control plane, worker nodes, core abstractions (Pods, Services, Deployments), networking, storage, and how Kubernetes integrates with tools like Helm and service meshes. For deployment patterns on Kubernetes, see Deployment Strategies. For CI/CD integration, see CI/CD Pipeline Design.
The Control Plane
The control plane is the brain of the Kubernetes cluster. It makes global decisions about the cluster (such as scheduling) and detects and responds to cluster events. The control plane components typically run on dedicated control plane nodes (historically called master nodes).
API Server (kube-apiserver)
The API server is the front door to Kubernetes. Every interaction — from kubectl commands to internal component communication — goes through the API server. It validates and processes RESTful requests, then persists the resulting state to etcd.
```shell
# All kubectl commands go through the API server
kubectl get pods
# Equivalent to: GET /api/v1/namespaces/default/pods
kubectl apply -f deployment.yaml
# Roughly: GET the object, then PATCH /apis/apps/v1/namespaces/default/deployments/<name>
# (or POST to the collection if the object does not exist yet)
```
etcd
etcd is a distributed key-value store that holds the entire cluster state — every resource definition, every configuration, every secret. It is the single source of truth for the cluster. If etcd is lost, the cluster is lost.
- Uses the Raft consensus algorithm for distributed consistency
- Should run as a 3 or 5 node cluster for high availability
- Back up etcd regularly — it is the most critical component to protect
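The recommended odd cluster sizes follow directly from Raft's majority-quorum rule — a short sketch of the arithmetic:

```python
# Raft commits a write once a majority (quorum) of members acknowledge it.
def quorum(members: int) -> int:
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    # Members that can fail while the cluster can still reach quorum
    return members - quorum(members)

for n in (1, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

A 4-member cluster tolerates no more failures than a 3-member one (one each), which is why etcd clusters are sized at 3 or 5.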
Scheduler (kube-scheduler)
The scheduler watches for newly created Pods that have no assigned node, then selects a node for them to run on. Scheduling decisions consider:
- Resource requirements: CPU and memory requests/limits
- Affinity/anti-affinity rules: Prefer or avoid certain nodes
- Taints and tolerations: Node restrictions
- Topology spread constraints: Distribute pods across failure domains
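These constraints all live in the Pod spec. A sketch combining the four (the `disktype` label, the `dedicated=batch` taint, and the zone names are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
  labels:
    app: my-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
  tolerations:
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: my-app
  containers:
    - name: app
      image: my-app:v1.2.3
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
```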
Controller Manager (kube-controller-manager)
The controller manager runs a collection of control loops that watch the state of the cluster through the API server and make changes to move the current state toward the desired state. Key controllers include:
- Deployment controller: Manages ReplicaSets and rolling updates
- ReplicaSet controller: Ensures the desired number of pod replicas
- Node controller: Monitors node health and evicts pods from unhealthy nodes
- Job controller: Manages batch jobs to completion
```
# The reconciliation loop in pseudocode
while true:
    desired_state = read_from_api_server()
    current_state = observe_cluster()
    if current_state != desired_state:
        take_action_to_reconcile(current_state, desired_state)
    sleep(reconciliation_interval)
```
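The same idea as a runnable toy: a single reconciliation pass that compares desired and observed replica counts, with plain dicts standing in for API objects:

```python
def reconcile(current: dict, desired: dict) -> list:
    """One control-loop pass: list the actions that would move
    the observed state toward the desired state."""
    actions = []
    for name, want in desired.items():
        have = current.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    return actions

# Two replicas exist but three are desired, and "worker" is missing entirely
print(reconcile({"my-app": 2}, {"my-app": 3, "worker": 1}))
# [('create', 'my-app', 1), ('create', 'worker', 1)]
```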
Worker Nodes
Worker nodes run the actual application workloads. Each node runs three essential components:
Kubelet
The kubelet is an agent that runs on every worker node. It watches for PodSpecs assigned to its node and ensures the described containers are running and healthy. If a container crashes, the kubelet restarts it according to the pod restart policy.
Kube-Proxy
Kube-proxy maintains network rules on each node that allow communication to Pods from inside or outside the cluster. It implements the Kubernetes Service abstraction by programming iptables rules or IPVS entries to load-balance traffic across pod endpoints.
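In IPVS mode the default scheduler is round-robin: conceptually, each new connection to a Service's virtual IP is handed to the next ready endpoint. A minimal sketch (the pod addresses are invented):

```python
from itertools import cycle

# Hypothetical pod endpoints behind a Service's ClusterIP
endpoints = ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"]

picker = cycle(endpoints)
picks = [next(picker) for _ in range(6)]  # six incoming connections
print(picks)  # each backend receives every third connection
```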
Container Runtime
The container runtime is responsible for pulling images and running containers. Kubernetes supports any runtime that implements the Container Runtime Interface (CRI):
- containerd: The most common runtime; Docker is built on top of it, and it is widely used standalone
- CRI-O: Lightweight runtime designed specifically for Kubernetes
Core Kubernetes Abstractions
Pods
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share networking and storage. In practice, most pods contain a single application container, but sidecar patterns (like a logging agent or service mesh proxy) add additional containers.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  containers:
    - name: app
      image: my-app:v1.2.3
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
```
Services
A Service provides a stable network identity for a set of Pods. Pods are ephemeral — they come and go. Services give them a persistent DNS name and load-balance traffic across healthy pods.
| Service Type | Scope | Use Case |
|---|---|---|
| ClusterIP | Internal only | Inter-service communication |
| NodePort | External via node port | Development, simple exposure |
| LoadBalancer | External via cloud LB | Production external access |
| ExternalName | DNS alias | Mapping to external services |
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```
Deployments
A Deployment manages ReplicaSets and provides declarative updates to Pods. It is the most common way to run stateless applications on Kubernetes. Deployments support rolling updates and rollbacks out of the box.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:v1.2.3
          ports:
            - containerPort: 8080
```
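Rolling updates are bounded by `maxSurge` and `maxUnavailable`, which default to 25% each; surge rounds up and unavailable rounds down. The arithmetic for the three-replica Deployment above:

```python
import math

def rollout_bounds(replicas: int, max_surge: float = 0.25,
                   max_unavailable: float = 0.25) -> tuple:
    surge = math.ceil(replicas * max_surge)               # extra pods allowed
    unavailable = math.floor(replicas * max_unavailable)  # pods allowed down
    return surge, unavailable

print(rollout_bounds(3))   # (1, 0): add one new pod before removing any old one
print(rollout_bounds(10))  # (3, 2)
```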
StatefulSets
StatefulSets are for stateful applications that need stable network identities and persistent storage — databases, message brokers, and distributed systems like Kafka or Elasticsearch. Unlike Deployments, StatefulSets guarantee:
- Ordered deployment and scaling: Pods are created sequentially (pod-0, pod-1, pod-2)
- Stable network identity: Each pod gets a predictable DNS name (pod-0.service-name)
- Persistent storage: Each pod gets its own PersistentVolumeClaim that survives pod restarts
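The three guarantees map to specific fields in the manifest. A sketch of a three-replica StatefulSet (the Postgres image and storage size are illustrative, and the `postgres` headless Service is assumed to exist):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres       # headless Service that provides the stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # one PersistentVolumeClaim per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```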
ConfigMaps and Secrets
ConfigMaps store non-confidential configuration data. Secrets store sensitive data like passwords, tokens, and TLS certificates. Both can be mounted as files or injected as environment variables:
```yaml
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "postgres.default.svc.cluster.local"
  LOG_LEVEL: "info"
---
# Secret
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  DATABASE_PASSWORD: cGFzc3dvcmQxMjM=  # base64 encoded
```
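A Pod can consume both in one spec — `envFrom` injects every key as an environment variable, and a volume mounts the ConfigMap as files (note that base64 is encoding, not encryption; Secrets should additionally be protected with RBAC and encryption at rest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:v1.2.3
      envFrom:
        - configMapRef:
            name: app-config    # injects DATABASE_HOST, LOG_LEVEL
        - secretRef:
            name: app-secrets   # injects DATABASE_PASSWORD (decoded)
      volumeMounts:
        - name: config
          mountPath: /etc/app   # each ConfigMap key becomes a file here
  volumes:
    - name: config
      configMap:
        name: app-config
```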
Kubernetes Networking Model
Kubernetes networking follows three fundamental rules:
- Every Pod gets its own IP address
- Pods on any node can communicate with pods on any other node without NAT
- Agents on a node can communicate with all pods on that node
This flat networking model is implemented by CNI (Container Network Interface) plugins such as Calico, Cilium, Flannel, and Weave. For service-to-service communication patterns, see API Gateway.
Helm: The Kubernetes Package Manager
Helm packages Kubernetes manifests into reusable charts. A chart is a collection of templates that generate Kubernetes YAML based on configurable values:
```shell
# Add a chart repository, then install a chart
# (the old "stable" repository is deprecated; charts now live in project repos)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx \
  --set controller.replicaCount=3 \
  --set controller.service.type=LoadBalancer

# Upgrade a release
helm upgrade my-release ingress-nginx/ingress-nginx --set controller.replicaCount=5

# Roll back a release to revision 1
helm rollback my-release 1
```
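Under the hood, values passed with `--set` (or a values file) are substituted into Go templates. An illustrative chart fragment — the names mirror the flags above but are hypothetical, not the real chart's layout:

```yaml
# templates/deployment.yaml (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-controller
spec:
  replicas: {{ .Values.controller.replicaCount }}
```

The matching `values.yaml` declares the default that `--set` overrides:

```yaml
controller:
  replicaCount: 3
```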
Service Mesh Integration
A service mesh like Istio or Linkerd adds a sidecar proxy to every pod, providing transparent traffic management, observability, and security without changing application code. Key capabilities include:
- Mutual TLS: Automatic encryption between all services
- Traffic splitting: Canary deployments and A/B testing at the mesh level
- Circuit breaking: Prevent cascading failures
- Distributed tracing: Trace requests across service boundaries
For more on resilience patterns like circuit breaking, see Microservices Architecture.
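As a concrete example of mesh-level traffic splitting, an Istio VirtualService can weight traffic between two versions of a service — a sketch, assuming the `v1`/`v2` subsets are defined in a separate DestinationRule (not shown):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app-service
  http:
    - route:
        - destination:
            host: my-app-service
            subset: v1
          weight: 90      # 90% of traffic stays on the stable version
        - destination:
            host: my-app-service
            subset: v2
          weight: 10      # 10% canary
```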
Production Readiness Checklist
| Category | Item | Why It Matters |
|---|---|---|
| Resources | Set CPU/memory requests and limits | Prevents resource starvation |
| Health | Configure liveness and readiness probes | Enables self-healing |
| Scaling | Set up HorizontalPodAutoscaler | Handles traffic spikes |
| Storage | Use PersistentVolumeClaims for state | Survives pod restarts |
| Security | Use NetworkPolicies and RBAC | Limits blast radius |
| Observability | Export metrics, logs, and traces | Enables debugging |
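Several checklist items are a single manifest. For example, a HorizontalPodAutoscaler (`autoscaling/v2`) that scales a Deployment on CPU utilization — the replica range and threshold here are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```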
Kubernetes is a complex system, but its architecture is elegant — a declarative API backed by control loops that continuously reconcile desired state with actual state. Master the core abstractions (Pods, Services, Deployments), understand the control plane, and build from there. Pair Kubernetes with solid CI/CD pipelines and smart deployment strategies to create a platform that can run anything at any scale.