Cache Stampede: Understanding and Solving the Thundering Herd Problem
A cache stampede (also called the thundering herd problem or dog-pile effect) occurs when a heavily accessed cache entry expires and many concurrent requests discover the cache miss at the same time, all rushing to the database to regenerate the same data. Instead of one database query, you get hundreds or thousands — potentially overwhelming your database and causing a cascading failure.
This is one of the most dangerous failure modes in high-traffic systems and a critical topic for caching architecture. This guide covers how stampedes happen, why they are dangerous, and the proven solutions to prevent them.
How a Cache Stampede Happens
Consider a product page on an e-commerce site that receives 10,000 requests per second. The product data is cached in Redis with a 5-minute TTL.
# Normal operation (cache hit):
# Request 1: GET product:1001 → Cache HIT → 0.5ms
# Request 2: GET product:1001 → Cache HIT → 0.5ms
# ...10,000 requests/second, all served from cache
# After TTL expires (cache stampede):
# Request 1: GET product:1001 → Cache MISS → Query DB (50ms)
# Request 2: GET product:1001 → Cache MISS → Query DB (50ms)
# Request 3: GET product:1001 → Cache MISS → Query DB (50ms)
# ...hundreds of requests in 50ms window, ALL hit the database
#
# Database receives 500+ simultaneous identical queries
# Database slows down → more requests pile up → cascade failure
The core issue is the gap between when the cache entry expires and when the first request repopulates it. During this window — even if it is just 50 milliseconds — every incoming request experiences a cache miss.
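This window is easy to reproduce in-process. The sketch below simulates the vulnerable read-through pattern with a plain dict standing in for Redis and a `time.sleep` standing in for the 50 ms database query (all names are illustrative, not production code):

```python
import threading
import time

cache = {}           # key -> (value, expires_at); stands in for Redis
db_calls = 0
db_lock = threading.Lock()

def fetch_from_db(key):
    # Stand-in for a slow database query (hypothetical).
    global db_calls
    with db_lock:
        db_calls += 1
    time.sleep(0.05)  # 50 ms query time
    return f"value-for-{key}"

def get_naive(key, ttl):
    entry = cache.get(key)
    if entry and entry[1] > time.time():
        return entry[0]
    # Stampede window: every thread that reaches this line queries the DB.
    value = fetch_from_db(key)
    cache[key] = (value, time.time() + ttl)
    return value

# 20 concurrent requests arrive just after expiry: all 20 hit the "database".
threads = [threading.Thread(target=get_naive, args=("product:1001", 1))
           for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(db_calls)  # 20 — one DB query per concurrent miss
```

Because every thread checks the cache before any of them finishes the 50 ms fetch, each one sees a miss and issues its own query — the in-miniature version of the stampede described above.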
Why It Is Dangerous
- Database overload: Hundreds of identical queries hit the database simultaneously, far exceeding its capacity.
- Cascading failure: The overloaded database slows down, causing connection pool exhaustion, timeouts, and failures across all services that depend on it.
- Self-reinforcing: Slow database responses mean the repopulation takes longer, extending the stampede window.
- Amplified by hot keys: The most popular cache entries cause the worst stampedes because they have the highest request rates.
Solution 1: Locking (Mutex-Based)
The most straightforward solution: when a cache miss occurs, only one request is allowed to fetch from the database. All other requests wait for that single request to repopulate the cache.
import redis
import time
import json

r = redis.Redis()

def get_with_lock(key, ttl, fetch_fn, lock_timeout=5):
    # Try cache first
    value = r.get(key)
    if value is not None:
        return json.loads(value)

    lock_key = f"lock:{key}"
    # Try to acquire lock
    if r.set(lock_key, "1", nx=True, ex=lock_timeout):
        try:
            # Winner: fetch from database and populate cache
            data = fetch_fn()
            r.setex(key, ttl, json.dumps(data))
            return data
        finally:
            r.delete(lock_key)
    else:
        # Losers: wait and retry from cache
        for _ in range(50):  # Retry for up to 5 seconds
            time.sleep(0.1)
            value = r.get(key)
            if value is not None:
                return json.loads(value)
        # Fallback: lock holder apparently failed — fetch from DB
        # and repopulate the cache ourselves
        data = fetch_fn()
        r.setex(key, ttl, json.dumps(data))
        return data

# Usage
product = get_with_lock(
    "product:1001",
    ttl=300,
    fetch_fn=lambda: db.get_product(1001)
)
Pros: Only one database query per cache miss. Simple to understand.
Cons: Adds latency for "loser" requests that must wait. Lock management adds complexity. If the lock holder crashes, others wait until lock timeout.
Solution 2: Probabilistic Early Expiration (PER)
Instead of all requests discovering the expiry at the same time, probabilistic early expiration randomly refreshes the cache before the actual TTL expires. As the expiry approaches, each request has an increasing probability of triggering a background refresh.
import random
import math
import time
import json

def _store_with_metadata(key, data, ttl, compute_time):
    cache_data = {
        **data,
        "_expiry": time.time() + ttl,
        "_delta": compute_time  # Time it took to compute the value
    }
    r.setex(key, ttl, json.dumps(cache_data))

def get_with_early_expiry(key, ttl, fetch_fn, beta=1.0):
    cached = r.get(key)
    if cached is not None:
        data = json.loads(cached)
        expiry_time = data.get("_expiry", 0)
        delta = data.get("_delta", 0)
        # XFetch algorithm: probabilistically refresh before expiry
        now = time.time()
        random_factor = -delta * beta * math.log(random.random())
        if now + random_factor >= expiry_time:
            # Probabilistically chosen to refresh early
            start = time.time()
            fresh_data = fetch_fn()
            _store_with_metadata(key, fresh_data, ttl, time.time() - start)
            return fresh_data
        return {k: v for k, v in data.items() if not k.startswith("_")}

    # True cache miss — compute and store
    start = time.time()
    fresh_data = fetch_fn()
    _store_with_metadata(key, fresh_data, ttl, time.time() - start)
    return fresh_data
Pros: No locking, no waiting. Stampede is prevented proactively. Works well for high-traffic keys.
Cons: Some requests trigger unnecessary early refreshes. Requires storing metadata alongside cached values. More complex implementation.
Solution 3: Stale-While-Revalidate
Serve slightly stale data while refreshing the cache in the background. The cached entry has two TTLs: a "fresh" TTL and a "stale" TTL. After the fresh TTL expires, the entry is still served but a background refresh is triggered.
import threading

def get_with_stale_revalidate(key, fresh_ttl, stale_ttl, fetch_fn):
    cached_raw = r.get(key)
    if cached_raw is not None:
        cached = json.loads(cached_raw)
        stored_at = cached.get("_stored_at", 0)
        age = time.time() - stored_at
        if age < fresh_ttl:
            # Still fresh — serve immediately
            return cached["data"]
        if age < fresh_ttl + stale_ttl:
            # Stale but within grace period — serve stale, refresh async
            revalidate_key = f"revalidating:{key}"
            if r.set(revalidate_key, "1", nx=True, ex=10):
                threading.Thread(
                    target=_background_refresh,
                    args=(key, fresh_ttl + stale_ttl, fetch_fn)
                ).start()
            return cached["data"]

    # True miss or beyond stale window — synchronous fetch
    data = fetch_fn()
    _store_with_timestamp(key, data, fresh_ttl + stale_ttl)
    return data

def _background_refresh(key, total_ttl, fetch_fn):
    data = fetch_fn()
    _store_with_timestamp(key, data, total_ttl)

def _store_with_timestamp(key, data, ttl):
    payload = {"data": data, "_stored_at": time.time()}
    r.setex(key, ttl, json.dumps(payload))
Pros: Zero latency impact — users always get instant responses. Smoothly handles high traffic. Natural fit for CDN caching with the stale-while-revalidate header.
Cons: Users may see slightly stale data during the revalidation window. Requires background processing infrastructure.
Solution 4: External Cache Refresh (Proactive)
Instead of relying on user requests to trigger cache refreshes, a separate background worker proactively refreshes cache entries before they expire. This completely eliminates the stampede window.
import schedule

def refresh_popular_products():
    popular_ids = db.query(
        "SELECT id FROM products ORDER BY view_count DESC LIMIT 100"
    )
    for product_id in popular_ids:
        product = db.get_product(product_id)
        r.setex(
            f"product:{product_id}",
            600,  # 10-minute TTL
            json.dumps(product)
        )
    print(f"Refreshed {len(popular_ids)} product caches")

# Refresh every 8 minutes (before the 10-minute TTL expires)
schedule.every(8).minutes.do(refresh_popular_products)

# The schedule library needs a driver loop in the worker process:
while True:
    schedule.run_pending()
    time.sleep(1)
Pros: Completely eliminates stampedes for known hot keys. Simple and predictable.
Cons: Only works for predictable key sets. Does not handle long-tail keys. Requires maintaining a list of keys to refresh.
Solution Comparison
| Solution | Latency Impact | Complexity | Data Freshness | Best For |
|---|---|---|---|---|
| Locking | Moderate (waiters delayed) | Low | Always fresh | Moderate traffic, simple systems |
| Probabilistic Early Expiry | Minimal | Medium | Mostly fresh | High-traffic, latency-sensitive |
| Stale-While-Revalidate | None | Medium | Briefly stale | User-facing, high-traffic pages |
| Proactive Refresh | None | Low | Always fresh | Known hot keys, predictable data |
Real-World Scenarios
Flash Sale: An e-commerce site announces a flash sale at noon. At 12:00:00, millions of users hit the product page simultaneously. If the product cache expired at 11:59:55, a stampede of millions of database queries crashes the system. Solution: proactively refresh the cache before the sale and use stale-while-revalidate as a safety net.
Breaking News: A news aggregator caches article metadata with a 2-minute TTL. When a major story breaks, traffic spikes 100x. Every 2-minute expiry becomes a mini-stampede. Solution: use probabilistic early expiry with an aggressive beta parameter for high-traffic articles.
Microservice Dependencies: Service A caches responses from Service B. If Service B is slow (500ms per request) and the cache expires, 200 concurrent requests from Service A simultaneously call Service B, overwhelming it. Solution: use locking so only one request calls Service B; others wait for the cache to be repopulated.
Combining Solutions
The most robust production systems layer multiple solutions:
- Proactive refresh for known hot keys (prevents stampedes entirely for the top 1% of keys).
- Stale-while-revalidate as the default for all cached data (prevents stampedes for most requests).
- Locking as a fallback for true cache misses where no stale data exists (prevents stampedes on cold start).
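The way these layers compose can be sketched in-process. In the hypothetical example below, a plain dict and `threading.Lock` stand in for Redis and its distributed lock; a fresh hit returns immediately, a stale hit serves the old value while one background thread revalidates, and a true miss takes the lock so only one caller fetches:

```python
import threading
import time

cache = {}          # key -> (value, stored_at); stands in for Redis
locks = {}          # key -> Lock; stands in for a Redis distributed lock
locks_guard = threading.Lock()

def _lock_for(key):
    with locks_guard:
        return locks.setdefault(key, threading.Lock())

def get_layered(key, fresh_ttl, stale_ttl, fetch_fn):
    entry = cache.get(key)
    if entry:
        value, stored_at = entry
        age = time.time() - stored_at
        if age < fresh_ttl:
            return value                        # Layer 1: fresh hit
        if age < fresh_ttl + stale_ttl:
            # Layer 2: stale-while-revalidate — serve stale, refresh once
            lock = _lock_for(key)
            if lock.acquire(blocking=False):
                def refresh():
                    try:
                        cache[key] = (fetch_fn(), time.time())
                    finally:
                        lock.release()
                threading.Thread(target=refresh).start()
            return value
    # Layer 3: true miss — lock so only one caller fetches
    with _lock_for(key):
        entry = cache.get(key)                  # another caller may have filled it
        if entry and time.time() - entry[1] < fresh_ttl + stale_ttl:
            return entry[0]
        value = fetch_fn()
        cache[key] = (value, time.time())
        return value

calls = []
def fake_db():
    calls.append(1)
    time.sleep(0.02)
    return "data"

# Cold start: 10 concurrent callers, but only one DB fetch
threads = [threading.Thread(target=get_layered, args=("k", 1, 5, fake_db))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```

A production version would replace the dict and locks with Redis operations (as in the solutions above) and add the proactive-refresh worker for known hot keys.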
Frequently Asked Questions
How do I detect cache stampedes in production?
Monitor these signals: sudden spikes in database query rate that correlate with cache expiry times, increased cache miss rates, and Redis memory drops (indicating mass expiry). Set up alerts on database connection pool utilization — a stampede often manifests as connection pool exhaustion. Use Redis INFO stats to track keyspace_misses over time.
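As an illustration, a miss ratio can be derived from those INFO counters. The sample numbers and the 10% alert threshold below are hypothetical; in production the dict would come from `r.info("stats")`:

```python
def cache_miss_ratio(info):
    """Miss ratio from Redis INFO stats counters."""
    hits = info.get("keyspace_hits", 0)
    misses = info.get("keyspace_misses", 0)
    total = hits + misses
    return misses / total if total else 0.0

# Sample stats; in production: info = r.info("stats")
sample = {"keyspace_hits": 95_000, "keyspace_misses": 5_000}
ratio = cache_miss_ratio(sample)
print(f"{ratio:.1%}")  # 5.0%

# Alert when the ratio spikes above your baseline (threshold is illustrative)
if ratio > 0.10:
    print("ALERT: possible stampede or mass expiry")
```

Tracking the ratio over time, rather than the raw miss count, keeps the signal meaningful as traffic fluctuates.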
Does cache stampede only happen with TTL-based expiration?
No. Stampedes can also occur after manual cache invalidation (e.g., purging a key when data updates), after cache server restarts (cold cache), or when eviction policies remove popular entries under memory pressure. Any event that causes a popular key to disappear can trigger a stampede.
Is the locking approach safe in a distributed environment?
Use Redis-based distributed locks (SETNX with expiry) rather than local mutexes. Local locks only prevent stampedes from a single application instance — in a distributed system with 20 instances, you still get 20 simultaneous database queries. A Redis-based lock ensures only one request across all instances fetches from the database. Set a reasonable lock timeout (5-10 seconds) to handle lock holder crashes.
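A common refinement is to store a unique token as the lock value and release only if the token still matches, so a slow lock holder whose lock expired cannot delete a lock someone else now owns. The sketch below demonstrates the semantics against a plain dict standing in for Redis; in real Redis the compare-and-delete in `release` must be a Lua script to stay atomic:

```python
import uuid

store = {}   # stands in for the Redis keyspace

def acquire(lock_key):
    """Like SET lock_key token NX: returns the token, or None if held."""
    token = str(uuid.uuid4())
    if lock_key not in store:
        store[lock_key] = token
        return token
    return None

def release(lock_key, token):
    """Delete only if we still own the lock (Lua compare-and-delete in Redis)."""
    if store.get(lock_key) == token:
        del store[lock_key]
        return True
    return False   # our lock expired; someone else holds it now

t1 = acquire("lock:product:1001")
assert acquire("lock:product:1001") is None       # second caller is blocked
# Simulate expiry + takeover: another worker now owns the lock
store["lock:product:1001"] = "other-token"
assert release("lock:product:1001", t1) is False  # safe: their lock survives
```

Without the token check, the first worker's delayed `release` would silently remove the second worker's lock and reopen the stampede window.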
What is the "beta" parameter in probabilistic early expiry?
Beta controls how aggressively the cache is refreshed before expiry. A higher beta means earlier refresh (more protection against stampede but more unnecessary refreshes). A lower beta means refresh closer to actual expiry (less wasted work but higher stampede risk). The default value of 1.0 works well for most workloads. For extremely high-traffic keys, use beta=2.0 or higher.
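The effect of beta can be simulated directly from the XFetch formula, where the early-refresh lead time is `-delta * beta * ln(U)` for uniform U. This hypothetical simulation (50 ms recompute time, illustrative betas) shows the average lead time scaling with beta:

```python
import math
import random

def early_refresh_lead(delta, beta):
    """Seconds before expiry at which XFetch would trigger a refresh."""
    return -delta * beta * math.log(random.random())

random.seed(42)
delta = 0.05   # 50 ms to recompute the value
for beta in (0.5, 1.0, 2.0):
    leads = [early_refresh_lead(delta, beta) for _ in range(100_000)]
    avg = sum(leads) / len(leads)
    print(f"beta={beta}: avg refresh {avg * 1000:.0f} ms before expiry")
```

The lead time is exponentially distributed with mean `delta * beta`, so doubling beta doubles how far ahead of expiry refreshes happen on average — more stampede protection, but more refreshes that were not strictly needed.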