Geo-Distribution: Multi-Region Deployment and Data Replication
Geo-distribution deploys your system across multiple geographic regions to reduce latency for global users, improve disaster recovery, and meet data residency requirements. While a single-region deployment might serve users within 50ms, users on the other side of the world experience 200-400ms latency due to the speed of light. Geo-distribution puts your services and data close to your users, wherever they are.
Why Geo-Distribute?
- Latency: Users in Tokyo should not wait for a round trip to US-East. Local regions serve requests in 10-30ms instead of 200ms+.
- Resilience: If one region goes down (natural disaster, cloud outage), other regions continue serving traffic.
- Compliance: GDPR, data sovereignty laws, and industry regulations may require data to stay within specific regions.
- Capacity: Distributing load across regions prevents any single region from becoming a bottleneck.
Deployment Patterns
| Pattern | Writes | Reads | Complexity | Consistency |
|---|---|---|---|---|
| Active-Passive | One region | All regions | Low | Strong (for writes) |
| Active-Active | All regions | All regions | High | Eventually consistent |
| Read-Local, Write-Global | Routed to primary | Local replicas | Medium | Read lag possible |
| Partitioned by Region | Region owns its data | Region owns its data | Medium | Strong (per region) |
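The Read-Local, Write-Global row above can be sketched as a thin routing layer. This is a minimal illustration rather than a production client: `Region` here is a hypothetical in-memory stand-in for a regional data store.

```python
# Read-local, write-global: reads hit the nearest replica,
# writes are routed to the single primary region.
# `Region` is an illustrative in-memory stand-in for a regional store.

class Region:
    def __init__(self, name):
        self.name = name
        self.data = {}

    def read(self, key):
        return self.data.get(key)

    def write(self, key, value):
        self.data[key] = value
        return value


class GeoRouter:
    def __init__(self, primary, local_replica):
        self.primary = primary      # the single write region
        self.local = local_replica  # replica nearest to this user

    def read(self, key):
        # Low-latency local read; may lag the primary until replication catches up.
        return self.local.read(key)

    def write(self, key, value):
        # All writes funnel to the primary, which replicates outward.
        return self.primary.write(key, value)
```

Until replication delivers the write to the local replica, a read through the router can return stale (or missing) data, which is exactly the "read lag possible" entry in the table.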
Data Replication Strategies
Synchronous Replication
```python
# Synchronous replication: a write commits only after every replica
# acknowledges it.
# Guarantees: strong consistency
# Trade-off: write latency pinned to the slowest (farthest) region
from concurrent.futures import ThreadPoolExecutor

def sync_write(key, value, regions):
    # Issue the writes in parallel so total latency is
    # max(region_latencies) rather than their sum.
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        results = list(pool.map(lambda r: r.write(key, value), regions))
    if all(r.success for r in results):
        return {"status": "committed"}
    # Roll back everywhere on any failure to keep replicas consistent.
    for region in regions:
        region.rollback(key)
    return {"status": "failed"}

# Latency: max(region_latencies)
# From US-East: ~80ms to EU-West, ~200ms to AP-Southeast
# Total write latency: ~200ms (dominated by the farthest region)
```
Asynchronous Replication
```python
# Asynchronous replication: the write returns as soon as the local
# (primary) commit succeeds.
# Guarantees: eventual consistency
# Trade-offs: replication lag, potential data loss if the primary
# region fails before replication catches up
def async_write(key, value, primary_region, replica_regions, replication_queue):
    # Commit to the primary region first.
    result = primary_region.write(key, value)
    # Enqueue replication events; a background worker ships them
    # to the other regions.
    for region in replica_regions:
        replication_queue.enqueue({
            "key": key,
            "value": value,
            "source_region": primary_region.name,
            "target_region": region.name,
        })
    return result  # Returns in ~5ms (local write only)
```
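The other half of asynchronous replication is a consumer in each replica region that drains the queue and applies writes locally. A self-contained sketch, using a plain `deque` as a stand-in for a real transport (Kafka, SQS, etc.); the event shape and store layout are illustrative assumptions.

```python
from collections import deque

# Replica-side applier for asynchronous replication: drains queued
# replication events and applies them to each region's local store.
# Replication lag is the time an event spends waiting in the queue.

def apply_replication(events, local_stores):
    # events: deque of {"key", "value", "target_region", ...} dicts
    # local_stores: region name -> dict acting as that region's store
    applied = 0
    while events:
        event = events.popleft()
        store = local_stores[event["target_region"]]
        store[event["key"]] = event["value"]
        applied += 1
    return applied
```

If the primary region is lost, any events still sitting in the queue are the writes that never reached the replicas, which is the data-loss window mentioned above.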
Latency-Based DNS Routing
```hcl
# AWS Route 53 latency-based routing (Terraform)
resource "aws_route53_record" "api_us" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "us-east-1"

  alias {
    name                   = aws_lb.us_east.dns_name
    zone_id                = aws_lb.us_east.zone_id
    evaluate_target_health = true
  }

  latency_routing_policy {
    region = "us-east-1"
  }
}

resource "aws_route53_record" "api_eu" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "eu-west-1"

  alias {
    name                   = aws_lb.eu_west.dns_name
    zone_id                = aws_lb.eu_west.zone_id
    evaluate_target_health = true
  }

  latency_routing_policy {
    region = "eu-west-1"
  }
}
```
Conflict Resolution in Geo-Distributed Systems
In active-active deployments, two users in different regions can modify the same data simultaneously. Conflict resolution strategies include:
- Last-Writer-Wins (LWW): Use timestamps to pick the latest write. Simple but can lose data.
- CRDTs: Conflict-free data types that automatically merge (see data sync).
- Region ownership: Each data item is owned by one region; writes are routed to the owner.
- Application-level merge: Return conflicting versions to the application for custom resolution.
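Last-writer-wins from the list above fits in a few lines. The timestamps are illustrative; real systems typically pair the timestamp with a region or node ID so that ties resolve deterministically everywhere.

```python
# Last-writer-wins (LWW) conflict resolution.
# Each version carries (timestamp, region); comparing the pair as a
# tuple makes the merge deterministic even when timestamps collide.

def lww_merge(version_a, version_b):
    # version = {"value": ..., "ts": float, "region": str}
    key_a = (version_a["ts"], version_a["region"])
    key_b = (version_b["ts"], version_b["region"])
    return version_a if key_a >= key_b else version_b
```

The losing version is silently discarded, which is precisely the "can lose data" caveat: a concurrent write with a slightly older timestamp disappears without any error.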
Cross-Region Data Patterns
```hcl
# DynamoDB Global Tables (active-active multi-region)
# The table itself is created in the provider's home region
# (e.g. us-east-1); each replica block adds another region.
# Global Tables replicate through DynamoDB Streams, which must
# be enabled with NEW_AND_OLD_IMAGES.
resource "aws_dynamodb_table" "users" {
  name             = "users"
  billing_mode     = "PAY_PER_REQUEST"
  hash_key         = "user_id"
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  attribute {
    name = "user_id"
    type = "S"
  }

  replica {
    region_name = "eu-west-1"
  }

  replica {
    region_name = "ap-southeast-1"
  }
}
```

DynamoDB automatically replicates writes across all regions and resolves concurrent writes with last-writer-wins, so the losing concurrent write is discarded.
Geo-distribution builds on multi-region architecture, consistent hashing for data distribution, and latency-reduction techniques. For conflict handling, see vector clocks and data sync patterns.
Frequently Asked Questions
Q: How many regions should I deploy to?
Start with 2 regions (primary + DR) for resilience. Add a third for truly global coverage. Most applications serve 90% of users from 2-3 regions. Each additional region adds operational complexity and cost. Only add regions where you have significant user traffic or compliance requirements.
Q: How do I handle database migrations across regions?
Use managed services with built-in replication (DynamoDB Global Tables, CockroachDB, Azure Cosmos DB). For relational databases, use logical replication (PostgreSQL) or change data capture. Schema migrations should be backward-compatible so that regions can safely run a mix of old and new schema versions while the change rolls out.
Q: What is the cost of geo-distribution?
Major costs include: compute in each region, cross-region data transfer (often $0.02-0.09/GB), replicated storage, and operational overhead. Cross-region data transfer is typically the largest surprise cost. Minimize it by replicating only necessary data and compressing replication streams.
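As a rough order-of-magnitude check on the transfer cost (the rates and volumes below are illustrative, not quotes):

```python
# Back-of-envelope cross-region transfer cost.
# Assumes each write is shipped once to every replica region;
# the $0.02/GB default is an illustrative rate, not a price list.

def monthly_transfer_cost(gb_replicated_per_month, replica_count, rate_per_gb=0.02):
    return gb_replicated_per_month * replica_count * rate_per_gb

# 5 TB/month of writes replicated to 2 other regions:
# at $0.02/GB that is about $200/month; at $0.09/GB, about $900/month.
```

The multiplier on `replica_count` is why adding a region is never just a compute cost: every replicated byte is paid for once per additional region.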
Q: Active-active or active-passive — which should I choose?
Active-passive is simpler and sufficient for disaster recovery. Active-active provides lower latency for all users but introduces write conflict complexity. Start with active-passive. Move to active-active only when latency requirements demand it and you have the engineering capacity to handle conflict resolution.