Latency Reduction: Techniques for Faster Distributed Systems
Latency is the time between a request being sent and the response being received. In distributed systems, latency compounds across multiple hops — a 50ms database query plus a 30ms API call plus 20ms of processing quickly adds up. Reducing latency improves user experience, conversion rates, and system throughput. This guide covers practical techniques for minimizing latency at every layer of your stack.
Understanding Latency Sources
| Source | Typical Latency | Optimization |
|---|---|---|
| DNS Resolution | 20-120ms | DNS caching, prefetch |
| TCP Handshake | 1 RTT (10-200ms) | Connection reuse, keep-alive |
| TLS Handshake | 1-2 RTTs (20-400ms) | TLS session resumption, HTTP/2 |
| Network Transit | 1-200ms (depends on distance) | CDN, edge computing, geo-routing |
| Server Processing | 5-500ms | Code optimization, caching |
| Database Query | 1-1000ms | Indexes, query optimization, caching |
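Because these sources are sequential on a cold request, they sum. A quick back-of-envelope budget (illustrative numbers picked from the middle of the ranges above, not measurements) shows why connection reuse pays off:

```python
# Rough end-to-end latency budget for a single cold request (illustrative values, ms)
stages_ms = {
    "dns": 40,
    "tcp_handshake": 30,
    "tls_handshake": 60,
    "network_transit": 50,
    "server_processing": 80,
    "database_query": 45,
}

total = sum(stages_ms.values())
print(f"Cold-request budget: {total}ms")  # 305ms

# A reused keep-alive connection skips DNS + TCP + TLS entirely
warm = total - stages_ms["dns"] - stages_ms["tcp_handshake"] - stages_ms["tls_handshake"]
print(f"Warm-request budget: {warm}ms")  # 175ms
```

Nearly half the cold-request budget here is connection setup, which is exactly what the techniques below target.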
CDN Strategies
Content Delivery Networks cache content at edge locations close to users, eliminating the round trip to your origin server.
```hcl
# CloudFront distribution with caching rules
# (enabled, restrictions, and viewer_certificate blocks omitted for brevity)
resource "aws_cloudfront_distribution" "web" {
  origin {
    domain_name = aws_lb.web.dns_name
    origin_id   = "web-origin"

    custom_origin_config {
      http_port              = 80
      https_port             = 443
      origin_protocol_policy = "https-only"
      origin_ssl_protocols   = ["TLSv1.2"]
    }
  }

  default_cache_behavior {
    target_origin_id       = "web-origin"
    viewer_protocol_policy = "redirect-to-https"
    allowed_methods        = ["GET", "HEAD"]
    cached_methods         = ["GET", "HEAD"]

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    min_ttl     = 0
    default_ttl = 86400    # 24 hours
    max_ttl     = 31536000 # 1 year
    compress    = true     # Enable gzip/brotli
  }

  # API routes bypass cache
  ordered_cache_behavior {
    path_pattern           = "/api/*"
    target_origin_id       = "web-origin"
    viewer_protocol_policy = "https-only"
    allowed_methods        = ["GET", "HEAD", "OPTIONS", "PUT",
                              "POST", "PATCH", "DELETE"]
    min_ttl     = 0
    default_ttl = 0
    max_ttl     = 0
  }
}
```
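How much a CDN helps depends almost entirely on the cache hit ratio. A simple weighted model makes the relationship concrete (the 20ms edge and 180ms origin figures are hypothetical):

```python
def effective_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Expected latency given a CDN cache hit ratio (simple weighted average)."""
    return hit_ratio * edge_ms + (1 - hit_ratio) * origin_ms

# Hypothetical numbers: 20ms from the edge, 180ms round trip to origin
print(round(effective_latency_ms(0.95, 20, 180), 1))  # roughly 28ms
print(round(effective_latency_ms(0.50, 20, 180), 1))  # roughly 100ms
```

Note the leverage: raising the hit ratio from 50% to 95% cuts expected latency by more than two-thirds, which is why cacheable content should carry long TTLs.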
Connection Reuse
Every new HTTPS connection pays for a TCP handshake and a TLS handshake; reusing connections amortizes that cost across many requests.
```python
import requests
from urllib3.util.retry import Retry
from requests.adapters import HTTPAdapter

# Reuse connections with a session
session = requests.Session()
adapter = HTTPAdapter(
    pool_connections=10,  # Number of host pools to cache
    pool_maxsize=20,      # Connections kept per pool
    max_retries=Retry(total=3, backoff_factor=0.1),
)
session.mount("https://", adapter)

# All requests reuse TCP connections (keep-alive)
response1 = session.get("https://api.example.com/users")   # New connection
response2 = session.get("https://api.example.com/orders")  # Reuses connection
```
Compression
```nginx
# Nginx compression configuration
gzip on;
gzip_vary on;
gzip_proxied any;
gzip_comp_level 6;
gzip_min_length 256;
gzip_types text/plain text/css application/json
           application/javascript text/xml application/xml
           application/xml+rss text/javascript;

# Brotli (better compression than gzip; requires the ngx_brotli module)
brotli on;
brotli_comp_level 6;
brotli_types text/plain text/css application/json
             application/javascript text/xml;
```
| Algorithm | Compression Ratio | Speed | Support |
|---|---|---|---|
| gzip | Good (60-80%) | Fast | Universal |
| Brotli | Better (70-85%) | Moderate | Modern browsers |
| zstd | Best ratio/speed | Very fast | Growing (server-to-server) |
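The compression ratios above are easy to sanity-check with the standard library. This sketch compresses a repetitive (hypothetical) JSON payload with gzip at the same level as the nginx config:

```python
import gzip
import json

# Hypothetical API payload: repetitive JSON compresses extremely well
payload = json.dumps(
    [{"id": i, "status": "shipped", "total_amount": 19.99} for i in range(500)]
).encode()

compressed = gzip.compress(payload, compresslevel=6)
ratio = 1 - len(compressed) / len(payload)
print(f"{len(payload)} -> {len(compressed)} bytes ({ratio:.0%} smaller)")
```

Structured API responses often shrink by 80-90% or more; the `gzip_min_length 256` directive exists because tiny payloads can actually grow after compression.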
Database Optimization
```sql
-- Identify slow queries (requires the pg_stat_statements extension)
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Add indexes for common queries
CREATE INDEX idx_orders_customer_date
    ON orders (customer_id, created_at DESC);

-- Use covering indexes to avoid table lookups (INCLUDE requires PostgreSQL 11+)
CREATE INDEX idx_orders_covering
    ON orders (customer_id, created_at DESC)
    INCLUDE (total_amount, status);

-- Denormalize for read-heavy paths:
-- populate customer_name on write, avoid a JOIN on read
ALTER TABLE orders ADD COLUMN customer_name VARCHAR(100);
```
Prefetching and Preloading
```python
import asyncio

# Server-side prefetching: fan out independent calls concurrently
class ProductPageHandler:
    async def handle(self, request):
        product_id = request.params["id"]

        # Fetch all needed data in parallel
        product, reviews, related, inventory = await asyncio.gather(
            self.product_service.get(product_id),
            self.review_service.get_for_product(product_id),
            self.recommendation_service.get_related(product_id),
            self.inventory_service.check(product_id),
        )

        # Total latency = max(individual latencies) instead of sum
        return render_template(
            "product.html",
            product=product, reviews=reviews,
            related=related, inventory=inventory,
        )
```
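The max-instead-of-sum effect is easy to demonstrate in isolation. This self-contained sketch uses `asyncio.sleep` as a stand-in for downstream calls (the service names above are out of scope here):

```python
import asyncio
import time

async def fake_service(delay_s: float) -> str:
    """Stand-in for a downstream call; sleeps instead of doing real I/O."""
    await asyncio.sleep(delay_s)
    return f"done after {delay_s}s"

async def main() -> float:
    start = time.perf_counter()
    # Three 100ms "calls" in parallel: total ≈ max(100ms), not sum(300ms)
    await asyncio.gather(fake_service(0.1), fake_service(0.1), fake_service(0.1))
    return time.perf_counter() - start

elapsed = asyncio.run(main())
print(f"elapsed: {elapsed * 1000:.0f}ms")  # close to 100ms, not 300ms
```

This only works when the calls are independent; if one call needs another's result, that dependency chain stays sequential.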
Measuring Latency
Always measure percentiles, not averages. The p50 (median) shows typical experience; p95 and p99 show worst-case experience for the tail users who are often your most engaged customers.
```python
from collections import defaultdict

import numpy as np

class LatencyTracker:
    def __init__(self):
        self.measurements = defaultdict(list)

    def record(self, endpoint, duration_ms):
        self.measurements[endpoint].append(duration_ms)

    def report(self, endpoint):
        data = self.measurements[endpoint]
        return {
            "p50": np.percentile(data, 50),
            "p95": np.percentile(data, 95),
            "p99": np.percentile(data, 99),
            "mean": np.mean(data),
            "count": len(data),
        }
```
Latency reduction connects to performance optimization, edge computing, and geo-distribution strategies.
Frequently Asked Questions
Q: What is the biggest single latency reduction I can make?
Caching. A cache hit takes 1-5ms versus 50-500ms for a database query or external API call. If your system is read-heavy, adding a Redis cache in front of your database can reduce p95 latency by 80% or more.
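The read-through pattern behind that Redis cache can be sketched in a few lines. This in-process version is illustrative only (the class and its names are made up for this example; in production, Redis or Memcached plays the cache role so the cache is shared across instances):

```python
import time
from typing import Any, Callable, Dict, Tuple

class TTLCache:
    """Minimal read-through cache with per-entry expiry."""

    def __init__(self, ttl_s: float = 60.0):
        self.ttl_s = ttl_s
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl_s:
            return entry[1]          # cache hit: skip the slow load entirely
        value = loader()             # cache miss: hit the database / upstream API
        self._store[key] = (time.monotonic(), value)
        return value

cache = TTLCache(ttl_s=30)
calls = 0

def slow_query():
    global calls
    calls += 1                       # stands in for a 50-500ms database query
    return {"user": "alice"}

cache.get_or_load("user:1", slow_query)  # miss: loader runs
cache.get_or_load("user:1", slow_query)  # hit: loader skipped
print(calls)  # 1
```

The TTL is the key tuning knob: longer TTLs mean higher hit ratios but staler data.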
Q: How does HTTP/2 reduce latency?
HTTP/2 multiplexes multiple requests over a single TCP connection, eliminating HTTP-level head-of-line blocking (HTTP/1.1 could only pipeline poorly). It also compresses headers with HPACK; its server push feature exists but has seen little adoption. HTTP/3 (QUIC) goes further by running over UDP, eliminating TCP head-of-line blocking as well, so one lost packet no longer stalls every stream.
Q: What latency targets should I set?
For user-facing APIs: p50 < 100ms, p95 < 300ms, p99 < 1000ms. For internal service-to-service calls: p50 < 10ms, p99 < 100ms. The business stakes are well documented: Google's widely cited "speed matters" experiments found that artificially adding about 400ms to search results measurably reduced daily searches per user, and Amazon famously estimated that every 100ms of added latency cost roughly 1% in sales.
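Targets like these are most useful when checked automatically. A minimal sketch of an SLO check, using the illustrative user-facing thresholds above:

```python
# Illustrative SLO thresholds for a user-facing API (ms)
SLO = {"p50": 100, "p95": 300, "p99": 1000}

def meets_slo(percentiles: dict, slo: dict = SLO) -> bool:
    """True if every measured percentile is under its target."""
    return all(percentiles[k] < slo[k] for k in slo)

print(meets_slo({"p50": 42, "p95": 180, "p99": 650}))   # True
print(meets_slo({"p50": 42, "p95": 180, "p99": 1200}))  # False: p99 blown
```

Feeding this from a percentile tracker like the one in "Measuring Latency" turns the targets into an alertable signal rather than a wiki page.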