📈Scalability

Geo-Distribution: Multi-Region Deployment and Data Replication


Geo-distribution deploys your system across multiple geographic regions to reduce latency for global users, improve disaster recovery, and meet data residency requirements. While a single-region deployment might serve users within 50ms, users on the other side of the world experience 200-400ms latency due to the speed of light. Geo-distribution puts your services and data close to your users, wherever they are.
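A quick back-of-envelope check makes that physical floor concrete; the distances below are rough great-circle figures, used only for illustration:

```python
# Back-of-envelope: the physical floor on cross-region latency.
# Light in optical fiber travels at roughly 200 km/ms (about 2/3 c).

SPEED_IN_FIBER_KM_PER_MS = 200

def min_rtt_ms(distance_km):
    """Lower bound on round-trip time at fiber speed."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

print(min_rtt_ms(5500))   # New York <-> London (~5,500 km): 55.0 ms floor
print(min_rtt_ms(15300))  # New York <-> Singapore (~15,300 km): 153.0 ms floor
# Real paths add routing detours, queuing, and TLS handshakes on top.
```

Real-world latencies run two to three times this floor, which is why the 200-400ms figures above are typical for antipodal traffic.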

Why Geo-Distribute?

  • Latency: Users in Tokyo should not wait for a round trip to US-East. Local regions serve requests in 10-30ms instead of 200ms+.
  • Resilience: If one region goes down (natural disaster, cloud outage), other regions continue serving traffic.
  • Compliance: GDPR, data sovereignty laws, and industry regulations may require data to stay within specific regions.
  • Capacity: Distributing load across regions prevents any single region from becoming a bottleneck.

Deployment Patterns

| Pattern | Writes | Reads | Complexity | Consistency |
|---|---|---|---|---|
| Active-Passive | One region | All regions | Low | Strong (for writes) |
| Active-Active | All regions | All regions | High | Eventually consistent |
| Read-Local, Write-Global | Routed to primary | Local replicas | Medium | Read lag possible |
| Partitioned by Region | Region owns its data | Region owns its data | Medium | Strong (per region) |
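The Read-Local, Write-Global pattern can be sketched as a small router; the in-memory store and region names here are hypothetical stand-ins for real per-region database clients:

```python
class InMemoryStore:
    """Hypothetical stand-in for a per-region database client."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def put(self, key, value):
        self.data[key] = value

class RegionRouter:
    def __init__(self, primary, replicas):
        self.primary = primary    # single write region
        self.replicas = replicas  # region name -> local read replica

    def read(self, key, caller_region):
        # Serve reads from the caller's local replica (may lag the primary)
        return self.replicas.get(caller_region, self.primary).get(key)

    def write(self, key, value):
        # Route all writes to the primary for a single write ordering
        self.primary.put(key, value)

primary, eu = InMemoryStore(), InMemoryStore()
router = RegionRouter(primary, {"eu-west-1": eu})
router.write("user:1", "Ada")
print(router.read("user:1", "eu-west-1"))  # None: replica not yet synced
print(router.read("user:1", "us-east-1"))  # 'Ada': falls back to primary
```

The `None` read is the "read lag possible" cell from the table: until replication catches up, local replicas can serve stale (or missing) data.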

Data Replication Strategies

Synchronous Replication

# Synchronous replication: write waits for all replicas
# Guarantees: Strong consistency
# Trade-off: write latency bounded by the slowest (farthest) region

from concurrent.futures import ThreadPoolExecutor

def sync_write(key, value, regions):
    # Fan writes out in parallel so total latency is max(), not sum()
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        results = list(pool.map(lambda r: r.write(key, value), regions))
    if all(r.success for r in results):
        return {"status": "committed"}
    # Roll back everywhere on any failure
    for region in regions:
        region.rollback(key)
    return {"status": "failed"}

# Latency: max(region_latencies), since writes are issued in parallel
# For US-East to EU-West: ~80ms, to AP-Southeast: ~200ms
# Total write latency: ~200ms (dominated by farthest region)

Asynchronous Replication

# Asynchronous replication: write returns immediately after local commit
# Guarantees: Eventual consistency
# Trade-off: replication lag; writes not yet replicated are lost if
# the primary region fails

def async_write(key, value, primary_region, replica_regions):
    # Commit to the primary region first
    result = primary_region.write(key, value)

    # Enqueue replication tasks; background workers drain the queue
    # and apply each write to its target region
    for region in replica_regions:
        replication_queue.enqueue({
            "key": key,
            "value": value,
            "source_region": primary_region.name,
            "target_region": region.name
        })

    return result  # Returns in ~5ms (local write only)
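On the receiving side, a background worker drains the queue and applies each write to its target region. A minimal sketch, using a process-local `queue.Queue` and a fake region client as illustrative stand-ins:

```python
import queue

# Illustrative stand-ins: a local queue and an in-memory region client.
replication_queue = queue.Queue()

class FakeRegion:
    def __init__(self):
        self.data = {}
    def write(self, key, value):
        self.data[key] = value

regions = {"eu-west-1": FakeRegion(), "ap-southeast-1": FakeRegion()}

def drain_replication_queue():
    # One pass: apply every queued write to its target region.
    # A production worker loops forever and retries failures, so targets
    # must apply writes idempotently (at-least-once delivery).
    while not replication_queue.empty():
        task = replication_queue.get()
        regions[task["target_region"]].write(task["key"], task["value"])
        replication_queue.task_done()

replication_queue.put({"key": "user:1", "value": "v1",
                       "source_region": "us-east-1",
                       "target_region": "eu-west-1"})
drain_replication_queue()
print(regions["eu-west-1"].data)  # {'user:1': 'v1'}
```

In production this queue would be a durable, replicated log (e.g. Kafka or a cloud queue) so queued writes survive a worker crash.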

Latency-Based DNS Routing

# AWS Route 53 latency-based routing (Terraform)
resource "aws_route53_record" "api_us" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "us-east-1"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }

  latency_routing_policy {
    region = "us-east-1"
  }
}

resource "aws_route53_record" "api_eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "eu-west-1"

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }

  latency_routing_policy {
    region = "eu-west-1"
  }
}

Conflict Resolution in Geo-Distributed Systems

In active-active deployments, two users in different regions can modify the same data simultaneously. Conflict resolution strategies include:

  • Last-Writer-Wins (LWW): Use timestamps to pick the latest write. Simple but can lose data.
  • CRDTs: Conflict-free replicated data types that merge concurrent updates automatically (see data sync).
  • Region ownership: Each data item is owned by one region; writes are routed to the owner.
  • Application-level merge: Return conflicting versions to the application for custom resolution.
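Last-writer-wins from the list above fits in a few lines; the timestamps and region names here are illustrative:

```python
# Last-writer-wins merge: each version carries a timestamp; on conflict,
# the later timestamp survives and the other write is silently dropped.
# Ties are broken by region name so every region converges on the same winner.

def lww_merge(a, b):
    """Each version is (value, timestamp_ms, region). Returns the winner."""
    if a[1] != b[1]:
        return a if a[1] > b[1] else b
    return a if a[2] > b[2] else b  # deterministic tie-break

us = ("alice@new.example", 1_700_000_500, "us-east-1")
eu = ("alice@old.example", 1_700_000_100, "eu-west-1")
print(lww_merge(us, eu))  # newer US write wins; the EU write is lost
```

The dropped EU write is exactly the "can lose data" caveat: LWW trades correctness under concurrency for simplicity.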

Cross-Region Data Patterns

# DynamoDB Global Tables (active-active multi-region)
resource "aws_dynamodb_table" "users" {
  name         = "users"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "user_id"

  attribute {
    name = "user_id"
    type = "S"
  }

  replica {
    region_name = "us-east-1"
  }

  replica {
    region_name = "eu-west-1"
  }

  replica {
    region_name = "ap-southeast-1"
  }
}
# DynamoDB automatically replicates writes across all regions
# Uses last-writer-wins (timestamp-based) to resolve concurrent writes

Geo-distribution connects to multi-region architecture, consistent hashing for data distribution, and latency reduction. For conflict handling, see vector clocks and data sync patterns.

Frequently Asked Questions

Q: How many regions should I deploy to?

Start with 2 regions (primary + DR) for resilience. Add a third for truly global coverage. Most applications serve 90% of users from 2-3 regions. Each additional region adds operational complexity and cost. Only add regions where you have significant user traffic or compliance requirements.

Q: How do I handle database migrations across regions?

Use managed services with built-in replication (DynamoDB Global Tables, CockroachDB, Azure Cosmos DB). For relational databases, use logical replication (PostgreSQL) or change data capture. Schema migrations should be backward-compatible, since regions will briefly run different schema versions while a rollout is in progress.

Q: What is the cost of geo-distribution?

Major costs include: compute in each region, cross-region data transfer (often $0.02-0.09/GB), replicated storage, and operational overhead. Cross-region data transfer is typically the largest surprise cost. Minimize it by replicating only necessary data and compressing replication streams.
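A quick worked example of the transfer cost using the per-GB rates above; the traffic volume is hypothetical:

```python
# Monthly cross-region transfer cost for a hypothetical workload:
# 2 TB of writes per month, replicated from the primary to 2 other regions.

replicated_tb = 2      # data written per month, in TB
extra_regions = 2      # each replica region receives a full copy
monthly_gb = replicated_tb * 1000 * extra_regions  # 4000 GB leaves the primary

print(monthly_gb * 0.02)  # 80.0  -> ~$80/month at the low end of the range
print(monthly_gb * 0.09)  # 360.0 -> ~$360/month at the high end
```

Note the multiplier: every additional replica region re-sends the full write stream, so transfer cost grows linearly with region count.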

Q: Active-active or active-passive — which should I choose?

Active-passive is simpler and sufficient for disaster recovery. Active-active provides lower latency for all users but introduces write conflict complexity. Start with active-passive. Move to active-active only when latency requirements demand it and you have the engineering capacity to handle conflict resolution.
