Kubernetes Architecture: Container Orchestration at Scale
Kubernetes (K8s) has become the de facto standard for container orchestration. It automates the deployment, scaling, and management of containerized applications, turning a fleet of machines into a single, programmable platform. Understanding its architecture is essential for anyone building or operating modern distributed systems.
This guide covers the control plane, worker nodes, core abstractions (Pods, Services, Deployments), networking, storage, and how Kubernetes integrates with tools like Helm and service meshes. For deployment patterns on Kubernetes, see Deployment Strategies. For CI/CD integration, see CI/CD Pipeline Design.
The Control Plane
The control plane is the brain of the Kubernetes cluster. It makes global decisions about the cluster (such as scheduling) and detects and responds to cluster events. The control plane components typically run on dedicated control plane nodes (historically called master nodes).
API Server (kube-apiserver)
The API server is the front door to Kubernetes. Every interaction — from kubectl commands to internal component communication — goes through the API server. It validates and processes RESTful requests, then persists the resulting state to etcd.
```shell
# All kubectl commands go through the API server
kubectl get pods
# Equivalent to: GET /api/v1/namespaces/default/pods
kubectl apply -f deployment.yaml
# Roughly: GET the object, then PATCH /apis/apps/v1/namespaces/default/deployments/<name>
# (or POST to the collection if the object does not exist yet)
```
etcd
etcd is a distributed key-value store that holds the entire cluster state — every resource definition, every configuration, every secret. It is the single source of truth for the cluster. If etcd is lost, the cluster is lost.
- Uses the Raft consensus algorithm for distributed consistency
- Should run as a 3 or 5 node cluster for high availability
- Back up etcd regularly — it is the most critical component to protect
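The recommended odd cluster sizes follow directly from Raft's majority-quorum rule — a short sketch of the arithmetic:

```python
# Raft commits a write once a majority (quorum) of members acknowledge it.
def quorum(members: int) -> int:
    return members // 2 + 1

def fault_tolerance(members: int) -> int:
    # Members that can fail while the cluster can still reach quorum
    return members - quorum(members)

for n in (1, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

A 4-member cluster tolerates no more failures than a 3-member one (one each), which is why etcd clusters are sized at 3 or 5.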
Scheduler (kube-scheduler)
The scheduler watches for newly created Pods that have no assigned node, then selects a node for them to run on. Scheduling decisions consider:
- Resource requirements: CPU and memory requests/limits
- Affinity/anti-affinity rules: Prefer or avoid certain nodes
- Taints and tolerations: Node restrictions
- Topology spread constraints: Distribute pods across failure domains
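These constraints all live in the Pod spec. A sketch combining the four (the `disktype` label, the `dedicated=batch` taint, and the zone names are illustrative placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
  labels:
    app: my-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values: ["ssd"]
  tolerations:
    - key: dedicated
      operator: Equal
      value: batch
      effect: NoSchedule
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: my-app
  containers:
    - name: app
      image: my-app:v1.2.3
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
```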
Controller Manager (kube-controller-manager)
The controller manager runs a collection of control loops that watch the state of the cluster through the API server and make changes to move the current state toward the desired state. Key controllers include:
- Deployment controller: Manages ReplicaSets and rolling updates
- ReplicaSet controller: Ensures the desired number of pod replicas
- Node controller: Monitors node health and evicts pods from unhealthy nodes
- Job controller: Manages batch jobs to completion
```
# The reconciliation loop in pseudocode
while true:
    desired_state = read_from_api_server()
    current_state = observe_cluster()
    if current_state != desired_state:
        take_action_to_reconcile(current_state, desired_state)
    sleep(reconciliation_interval)
```
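The same idea as a runnable toy: a single reconciliation pass that compares desired and observed replica counts, with plain dicts standing in for API objects:

```python
def reconcile(current: dict, desired: dict) -> list:
    """One control-loop pass: list the actions that would move
    the observed state toward the desired state."""
    actions = []
    for name, want in desired.items():
        have = current.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    return actions

# Two replicas exist but three are desired, and "worker" is missing entirely
print(reconcile({"my-app": 2}, {"my-app": 3, "worker": 1}))
# [('create', 'my-app', 1), ('create', 'worker', 1)]
```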
Worker Nodes
Worker nodes run the actual application workloads. Each node runs three essential components:
Kubelet
The kubelet is an agent that runs on every worker node. It watches for PodSpecs assigned to its node and ensures the described containers are running and healthy. If a container crashes, the kubelet restarts it according to the pod restart policy.
Kube-Proxy
Kube-proxy maintains network rules on each node that allow communication to Pods from inside or outside the cluster. It implements the Kubernetes Service abstraction by programming iptables rules or IPVS entries to load-balance traffic across pod endpoints.
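In IPVS mode the default scheduler is round-robin: conceptually, each new connection to a Service's virtual IP is handed to the next ready endpoint. A minimal sketch (the pod addresses are invented):

```python
from itertools import cycle

# Hypothetical pod endpoints behind a Service's ClusterIP
endpoints = ["10.0.1.5:8080", "10.0.2.7:8080", "10.0.3.9:8080"]

picker = cycle(endpoints)
picks = [next(picker) for _ in range(6)]  # six incoming connections
print(picks)  # each backend receives every third connection
```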
Container Runtime
The container runtime is responsible for pulling images and running containers. Kubernetes supports any runtime that implements the Container Runtime Interface (CRI):
- containerd: The most common runtime; Docker is built on top of it, and it is widely used standalone
- CRI-O: Lightweight runtime designed specifically for Kubernetes
Core Kubernetes Abstractions
Pods
A Pod is the smallest deployable unit in Kubernetes. It wraps one or more containers that share networking and storage. In practice, most pods contain a single application container, but sidecar patterns (like a logging agent or service mesh proxy) add additional containers.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  containers:
    - name: app
      image: my-app:v1.2.3
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 5
```
Services
A Service provides a stable network identity for a set of Pods. Pods are ephemeral — they come and go. Services give them a persistent DNS name and load-balance traffic across healthy pods.
| Service Type | Scope | Use Case |
|---|---|---|
| ClusterIP | Internal only | Inter-service communication |
| NodePort | External via node port | Development, simple exposure |
| LoadBalancer | External via cloud LB | Production external access |
| ExternalName | DNS alias | Mapping to external services |
```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app-service
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```
Deployments
A Deployment manages ReplicaSets and provides declarative updates to Pods. It is the most common way to run stateless applications on Kubernetes. Deployments support rolling updates and rollbacks out of the box.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:v1.2.3
          ports:
            - containerPort: 8080
```
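Rolling updates are bounded by `maxSurge` and `maxUnavailable`, which default to 25% each; surge rounds up and unavailable rounds down. The arithmetic for the three-replica Deployment above:

```python
import math

def rollout_bounds(replicas: int, max_surge: float = 0.25,
                   max_unavailable: float = 0.25) -> tuple:
    surge = math.ceil(replicas * max_surge)               # extra pods allowed
    unavailable = math.floor(replicas * max_unavailable)  # pods allowed down
    return surge, unavailable

print(rollout_bounds(3))   # (1, 0): add one new pod before removing any old one
print(rollout_bounds(10))  # (3, 2)
```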
StatefulSets
StatefulSets are for stateful applications that need stable network identities and persistent storage — databases, message brokers, and distributed systems like Kafka or Elasticsearch. Unlike Deployments, StatefulSets guarantee:
- Ordered deployment and scaling: Pods are created sequentially (pod-0, pod-1, pod-2)
- Stable network identity: Each pod gets a predictable DNS name (pod-0.service-name)
- Persistent storage: Each pod gets its own PersistentVolumeClaim that survives pod restarts
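The three guarantees map to specific fields in the manifest. A sketch of a three-replica StatefulSet (the Postgres image and storage size are illustrative, and the `postgres` headless Service is assumed to exist):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres       # headless Service that provides the stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:       # one PersistentVolumeClaim per pod, kept across restarts
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```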
ConfigMaps and Secrets
ConfigMaps store non-confidential configuration data. Secrets store sensitive data like passwords, tokens, and TLS certificates. Both can be mounted as files or injected as environment variables:
```yaml
# ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  DATABASE_HOST: "postgres.default.svc.cluster.local"
  LOG_LEVEL: "info"
---
# Secret
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  DATABASE_PASSWORD: cGFzc3dvcmQxMjM=  # base64 encoded
```
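A Pod can consume both in one spec — `envFrom` injects every key as an environment variable, and a volume mounts the ConfigMap as files (note that base64 is encoding, not encryption; Secrets should additionally be protected with RBAC and encryption at rest):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:v1.2.3
      envFrom:
        - configMapRef:
            name: app-config    # injects DATABASE_HOST, LOG_LEVEL
        - secretRef:
            name: app-secrets   # injects DATABASE_PASSWORD (decoded)
      volumeMounts:
        - name: config
          mountPath: /etc/app   # each ConfigMap key becomes a file here
  volumes:
    - name: config
      configMap:
        name: app-config
```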
Kubernetes Networking Model
Kubernetes networking follows three fundamental rules:
- Every Pod gets its own IP address
- Pods on any node can communicate with pods on any other node without NAT
- Agents on a node can communicate with all pods on that node
This flat networking model is implemented by CNI (Container Network Interface) plugins such as Calico, Cilium, Flannel, and Weave. For service-to-service communication patterns, see API Gateway.
Helm: The Kubernetes Package Manager
Helm packages Kubernetes manifests into reusable charts. A chart is a collection of templates that generate Kubernetes YAML based on configurable values:
```shell
# Add a chart repository, then install a chart
# (the old "stable" repository is deprecated; charts now live in project repos)
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install my-release ingress-nginx/ingress-nginx \
  --set controller.replicaCount=3 \
  --set controller.service.type=LoadBalancer

# Upgrade a release
helm upgrade my-release ingress-nginx/ingress-nginx --set controller.replicaCount=5

# Roll back a release to revision 1
helm rollback my-release 1
```
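Under the hood, values passed with `--set` (or a values file) are substituted into Go templates. An illustrative chart fragment — the names mirror the flags above but are hypothetical, not the real chart's layout:

```yaml
# templates/deployment.yaml (fragment)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-controller
spec:
  replicas: {{ .Values.controller.replicaCount }}
```

The matching `values.yaml` declares the default that `--set` overrides:

```yaml
controller:
  replicaCount: 3
```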
Service Mesh Integration
A service mesh like Istio or Linkerd adds a sidecar proxy to every pod, providing transparent traffic management, observability, and security without changing application code. Key capabilities include:
- Mutual TLS: Automatic encryption between all services
- Traffic splitting: Canary deployments and A/B testing at the mesh level
- Circuit breaking: Prevent cascading failures
- Distributed tracing: Trace requests across service boundaries
For more on resilience patterns like circuit breaking, see Microservices Architecture.
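As a concrete example of mesh-level traffic splitting, an Istio VirtualService can weight traffic between two versions of a service — a sketch, assuming the `v1`/`v2` subsets are defined in a separate DestinationRule (not shown):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app-service
  http:
    - route:
        - destination:
            host: my-app-service
            subset: v1
          weight: 90      # 90% of traffic stays on the stable version
        - destination:
            host: my-app-service
            subset: v2
          weight: 10      # 10% canary
```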
Production Readiness Checklist
| Category | Item | Why It Matters |
|---|---|---|
| Resources | Set CPU/memory requests and limits | Prevents resource starvation |
| Health | Configure liveness and readiness probes | Enables self-healing |
| Scaling | Set up HorizontalPodAutoscaler | Handles traffic spikes |
| Storage | Use PersistentVolumeClaims for state | Survives pod restarts |
| Security | Use NetworkPolicies and RBAC | Limits blast radius |
| Observability | Export metrics, logs, and traces | Enables debugging |
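Several checklist items are a single manifest. For example, a HorizontalPodAutoscaler (`autoscaling/v2`) that scales a Deployment on CPU utilization — the replica range and threshold here are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```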
Kubernetes is a complex system, but its architecture is elegant — a declarative API backed by control loops that continuously reconcile desired state with actual state. Master the core abstractions (Pods, Services, Deployments), understand the control plane, and build from there. Pair Kubernetes with solid CI/CD pipelines and smart deployment strategies to create a platform that can run anything at any scale.