
Horizontal Scaling: Building Systems That Grow Outward

Horizontal scaling (scaling out) adds more machines to handle increased load, as opposed to vertical scaling (scaling up) which adds more power to a single machine. Horizontal scaling is the foundation of modern distributed systems because it offers near-unlimited growth potential, better fault tolerance, and cost efficiency. This guide covers the key strategies for designing horizontally scalable systems.

Horizontal vs Vertical Scaling

| Aspect | Horizontal (Scale Out) | Vertical (Scale Up) |
|---|---|---|
| Approach | Add more machines | Upgrade the existing machine |
| Upper limit | Practically unlimited | Hardware ceiling |
| Fault tolerance | High: one node failure is survivable | Single point of failure |
| Complexity | Higher (distributed-system challenges) | Lower (single machine) |
| Cost | Roughly linear with many cheap machines | Superlinear for high-end hardware |
| Downtime | None: add nodes while running | May require a restart |

Designing Stateless Services

The first requirement for horizontal scaling is statelessness. A stateless service does not store any client-specific data between requests. Any instance can handle any request, making it trivial to add or remove instances.

# Stateful (BAD for horizontal scaling)
class StatefulServer:
    def __init__(self):
        self.sessions = {}  # stored in memory

    def handle_request(self, request):
        session = self.sessions.get(request.session_id)
        # If this request hits a different instance, the session is lost!
        return session

# Stateless (GOOD for horizontal scaling)
class StatelessServer:
    def __init__(self, redis_client):
        self.redis = redis_client  # external session store

    def handle_request(self, request):
        session = self.redis.get(f"session:{request.session_id}")
        # Any instance can serve this request
        return session

Session Management Strategies

| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Sticky Sessions | Load balancer routes a user to the same server | Simple, no shared state | Uneven load, data loss on failure |
| Distributed Sessions | Sessions stored in Redis/Memcached | Even load distribution | External dependency, latency |
| JWT Tokens | Session data encoded in the token | No server-side state | Token size, cannot invalidate easily |

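The distributed-session strategy above amounts to a thin wrapper around a shared key-value store with `get`/`setex` semantics. A minimal sketch, using an in-memory stand-in (`FakeKV`, purely illustrative) in place of a real `redis.Redis` client so it runs anywhere:

```python
import json

class SessionStore:
    """Sessions live in a shared store, so any app instance can read them."""

    def __init__(self, kv, ttl_seconds=3600):
        self.kv = kv              # e.g. a redis.Redis client
        self.ttl = ttl_seconds

    def save(self, session_id, data):
        self.kv.setex(f"session:{session_id}", self.ttl, json.dumps(data))

    def load(self, session_id):
        raw = self.kv.get(f"session:{session_id}")
        return json.loads(raw) if raw else None

# In-memory stand-in with redis-like get/setex, for demonstration only
class FakeKV:
    def __init__(self):
        self.data = {}
    def setex(self, key, ttl, value):
        self.data[key] = value    # TTL ignored in this stub
    def get(self, key):
        return self.data.get(key)

store = SessionStore(FakeKV())
store.save("abc123", {"user_id": 42})
print(store.load("abc123"))  # {'user_id': 42}
```

Swapping `FakeKV` for a real Redis client is the only change needed in production; the application code stays identical on every instance.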
import jwt
from datetime import datetime, timedelta, timezone

SECRET_KEY = "replace-with-a-shared-secret"  # same value on every instance

class AuthenticationError(Exception):
    pass

# JWT-based stateless authentication
def create_jwt_token(user_id, roles):
    now = datetime.now(timezone.utc)
    payload = {
        "sub": user_id,
        "roles": roles,
        "exp": now + timedelta(hours=1),
        "iat": now
    }
    return jwt.encode(payload, SECRET_KEY, algorithm="HS256")

def verify_request(request):
    token = request.headers.get("Authorization", "").replace("Bearer ", "")
    try:
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
        return payload  # Any server holding the secret can verify this
    except jwt.InvalidTokenError:
        raise AuthenticationError("Invalid token")

Database Scaling

Read Replicas

class DatabaseRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.replica_index = 0

    def get_connection(self, operation="read"):
        if operation == "write":
            return self.primary
        # Round-robin across read replicas
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        return replica

# Usage
router = DatabaseRouter(
    primary="postgres://primary:5432/app",
    replicas=[
        "postgres://replica1:5432/app",
        "postgres://replica2:5432/app",
        "postgres://replica3:5432/app"
    ]
)
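Read replicas lag behind the primary, so a common refinement (not shown in the router above) is read-your-writes routing: pin a user's reads to the primary for a short window after their own writes. A minimal sketch, with the pin window as an illustrative parameter:

```python
import time

class ReadYourWritesRouter:
    """Route a user's reads to the primary briefly after their writes,
    so they never see stale data from a lagging replica."""

    def __init__(self, primary, replicas, pin_seconds=5.0):
        self.primary = primary
        self.replicas = replicas
        self.pin_seconds = pin_seconds
        self.last_write = {}        # user_id -> monotonic timestamp
        self.replica_index = 0

    def get_connection(self, user_id, operation="read"):
        if operation == "write":
            self.last_write[user_id] = time.monotonic()
            return self.primary
        wrote_at = self.last_write.get(user_id)
        if wrote_at is not None and time.monotonic() - wrote_at < self.pin_seconds:
            return self.primary     # replica may not have this user's write yet
        replica = self.replicas[self.replica_index % len(self.replicas)]
        self.replica_index += 1
        return replica

router = ReadYourWritesRouter("primary", ["replica1", "replica2"])
router.get_connection("u1", "write")        # returns "primary"
print(router.get_connection("u1", "read"))  # primary (inside the pin window)
print(router.get_connection("u2", "read"))  # replica1
```

The pin window should be sized to comfortably exceed typical replication lag for your setup.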

Database Sharding

import zlib

class ShardRouter:
    def __init__(self, shard_configs):
        self.shards = shard_configs
        self.num_shards = len(shard_configs)

    def get_shard(self, user_id):
        # zlib.crc32 is stable across processes; Python's built-in hash()
        # is randomized per process for strings and would route the same
        # user to different shards on different instances.
        shard_index = zlib.crc32(str(user_id).encode()) % self.num_shards
        return self.shards[shard_index]

    def query(self, user_id, sql, params):
        shard = self.get_shard(user_id)
        return shard.execute(sql, params)

    def query_all_shards(self, sql, params):
        """For queries that span all shards (expensive)."""
        results = []
        for shard in self.shards:
            results.extend(shard.execute(sql, params))
        return results

Load Balancing Strategies

| Algorithm | Description | Best For |
|---|---|---|
| Round Robin | Distribute requests evenly in order | Homogeneous servers, equal request cost |
| Weighted Round Robin | More traffic to more powerful servers | Heterogeneous server sizes |
| Least Connections | Route to the server with the fewest active connections | Variable request processing times |
| Consistent Hashing | Hash-based routing to specific servers | Caching, stateful services |
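Consistent hashing earns its place in the table because it keeps keys from mass-relocating when servers join or leave: only the keys owned by the departed node move. A minimal hash ring sketch (the virtual-node count and md5-based key function are illustrative choices):

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node appears vnodes times on the ring for smoother balance
        self.ring = []                      # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        # The first vnode clockwise from the key's hash owns the key
        idx = bisect.bisect(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0                         # wrap around the ring
        return self.ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
node = ring.get_node("user:42")  # stable for a fixed node set
```

Removing `cache-c` leaves every key owned by `cache-a` or `cache-b` exactly where it was, which is the property that makes this the default choice for cache fleets.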

Auto-Scaling Groups

# AWS Auto Scaling Group configuration (Terraform)
resource "aws_autoscaling_group" "web_servers" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 20
  desired_capacity    = 4
  vpc_zone_identifier = var.private_subnet_ids

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }

  target_group_arns = [aws_lb_target_group.web.arn]

  tag {
    key                 = "Name"
    value               = "web-server"
    propagate_at_launch = true
  }
}

resource "aws_autoscaling_policy" "cpu_target_tracking" {
  # Target tracking scales both out and in around the target value,
  # so one policy replaces separate scale-up/scale-down rules
  name                   = "cpu-target-tracking"
  autoscaling_group_name = aws_autoscaling_group.web_servers.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 70.0
  }
}
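As a rough mental model of target tracking (the real ASG controller adds cooldowns, alarm evaluation periods, and instance warmup), desired capacity is current capacity scaled by the ratio of the observed metric to the target:

```python
import math

def desired_capacity(current_capacity, current_metric, target_metric,
                     min_size=2, max_size=20):
    """Proportional target-tracking estimate: size the fleet so the
    per-instance metric lands near the target, clamped to ASG bounds."""
    raw = current_capacity * (current_metric / target_metric)
    return max(min_size, min(max_size, math.ceil(raw)))

print(desired_capacity(4, 90.0, 70.0))   # 6: CPU above target, scale out
print(desired_capacity(6, 35.0, 70.0))   # 3: CPU below target, scale in
```

The same proportionality is why a target of 70% leaves headroom: bursts can be absorbed while new instances launch.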

Horizontal scaling connects to auto-scaling for automatic capacity management, consistent hashing for data distribution, and load testing for validating scaling behavior. Use the System Design Calculator to estimate your scaling requirements.

Frequently Asked Questions

Q: When should I scale vertically instead of horizontally?

Scale vertically when your application is not easily parallelizable (single-threaded workloads, in-memory computation that needs large RAM), when the operational complexity of distributed systems is not justified, or as a quick fix while planning a horizontal scaling architecture.

Q: How do I handle database scaling with horizontal application scaling?

Start with read replicas for read-heavy workloads. Add connection pooling (PgBouncer, ProxySQL) to handle more application instances. For write-heavy workloads, consider sharding by a partition key. For maximum scalability, use a distributed database like CockroachDB, Vitess, or DynamoDB.

Q: What is the maximum number of instances I should run?

There is no universal limit, but consider: load balancer connection limits, database connection limits, deployment time (rolling updates take longer with more instances), and cost. Most applications scale well to 50-100 instances. Beyond that, consider microservice decomposition or architectural changes.

Q: How do I handle file storage in a horizontally scaled system?

Never store files on local disk of application servers. Use object storage (S3, Azure Blob, GCS) for file persistence. For temporary processing, use shared file systems (EFS, Azure Files) or process files in memory. This ensures any instance can serve any request regardless of which instance received the upload.
