Thundering Herd Problem: Fix Cache Stampedes Under Load (2026)

When a popular resource is requested by many clients simultaneously after a cache miss, all those clients can overwhelm the origin server, causing a "cache stampede" or "thundering herd" problem.

This isn’t just about a few extra requests. Every client that sees the miss decides to hit the origin at the same moment, so an origin that should receive a single refresh request is suddenly hammered by hundreds or thousands. The overload can crash the origin, make it unresponsive, or slow it down significantly, which lengthens the window during which the cache stays empty and makes the user experience even worse. The stampede happens because the cache entry has expired and the locking mechanism that should serialize the refresh is either absent or ineffective.
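To make the failure mode concrete, here is a minimal single-process sketch. Everything in it is an illustrative assumption: the dictionary stands in for the shared cache, `fetch_from_origin` is a hypothetical origin call, and the barrier simply forces every client to observe the miss before any of them repopulates the cache.

```python
import threading

def simulate_stampede(num_clients):
    """Return how many origin fetches happen when all clients miss at once."""
    cache = {}                        # stands in for the shared cache
    origin_calls = 0
    counter_lock = threading.Lock()   # protects the counter only, not the cache
    barrier = threading.Barrier(num_clients)

    def fetch_from_origin():
        nonlocal origin_calls
        with counter_lock:
            origin_calls += 1
        return "payload"

    def client():
        cached = cache.get("popular-key")
        barrier.wait()                # every client has now seen the same miss
        if cached is None:
            cache["popular-key"] = fetch_from_origin()

    threads = [threading.Thread(target=client) for _ in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return origin_calls

print(simulate_stampede(50))  # 50: one origin fetch per client
```

With no coordination between clients, the number of origin fetches grows linearly with the number of clients that arrive during the miss window.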

Here are the common causes and how to fix them:

1. No Cache Expiration Lock: The most basic cause is a cache that simply expires without any mechanism to serialize requests. When the TTL (Time To Live) hits zero, every client sees it as a cache miss and tries to fetch.

Diagnosis: Observe cache logs or network traffic. You’ll see multiple identical requests for the same resource hitting the origin server within milliseconds of each other after a predictable interval.
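One way to spot these bursts is to group origin requests by resource and look for many requests inside a very small time window. The sketch below assumes you can extract `(timestamp_seconds, resource_key)` pairs from your origin access logs; the function name, the 50 ms window, and the threshold are illustrative choices rather than recommended values.

```python
from collections import defaultdict

def find_bursts(entries, window=0.05, threshold=5):
    """Return {key: largest request count within `window` seconds} for keys
    whose largest burst reaches `threshold`."""
    by_key = defaultdict(list)
    for timestamp, key in entries:
        by_key[key].append(timestamp)

    bursts = {}
    for key, stamps in by_key.items():
        stamps.sort()
        start = 0
        largest = 0
        # Sliding window over the sorted timestamps for this key
        for end, ts in enumerate(stamps):
            while ts - stamps[start] > window:
                start += 1
            largest = max(largest, end - start + 1)
        if largest >= threshold:
            bursts[key] = largest
    return bursts

# Hypothetical log extract: 8 requests for one resource within 14 ms
entries = [(100.0 + i * 0.002, "/api/products") for i in range(8)]
entries += [(100.0, "/api/user"), (130.0, "/api/user")]
print(find_bursts(entries))  # {'/api/products': 8}
```

If the same keys show up repeatedly at intervals matching their TTL, an unprotected expiry is the likely cause.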

Fix: Implement a locking mechanism. When the first request detects a cache miss for an expired item, it should acquire a lock (e.g., a flag in a distributed cache like Redis, or a database record). This lock indicates that a refresh is in progress. Subsequent requests for the same item see the lock and wait or return a stale response. Once the refresh is complete, the lock is released.

Example (Conceptual Redis Lock):

import time
import uuid

import redis

r = redis.Redis(host='localhost', port=6379, db=0)
cache_key = "my_resource_data"
lock_key = f"{cache_key}:lock"
lock_timeout = 10  # seconds; must exceed the worst-case origin fetch time
cache_ttl = 300    # seconds; illustrative value

# Release the lock only if we still own it (atomic compare-and-delete in Redis).
release_lock = r.register_script(
    "if redis.call('get', KEYS[1]) == ARGV[1] then "
    "return redis.call('del', KEYS[1]) else return 0 end"
)

def fetch_from_origin_server():
    # Placeholder: replace with the real origin call.
    raise NotImplementedError

def get_or_refresh_resource(max_wait=5.0):
    deadline = time.monotonic() + max_wait
    while True:
        data = r.get(cache_key)
        if data is not None:
            return data

        # Attempt to acquire the lock; the random token identifies this holder
        token = str(uuid.uuid4())
        if r.set(lock_key, token, nx=True, ex=lock_timeout):
            try:
                # Cache is still empty: fetch from origin and repopulate it
                origin_data = fetch_from_origin_server()
                r.set(cache_key, origin_data, ex=cache_ttl)
                return origin_data
            finally:
                release_lock(keys=[lock_key], args=[token])

        # Another process is refreshing. Wait briefly, then re-check the cache.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {cache_key}")
        time.sleep(0.1)

# In your application:
# resource_data = get_or_refresh_resource()

Why it works: The lock ensures only one process fetches from the origin at a time. Others either wait for the lock to be released or, in some implementations, receive the stale data while the refresh happens in the background.
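The same idea can be tried without Redis. The following in-process sketch uses a `threading.Lock` in place of the distributed lock and re-checks the cache after acquiring it, so waiting threads pick up the freshly stored value instead of fetching again. The class name and the `origin` function are hypothetical, and a single lock for all keys is a simplification; a real implementation would usually lock per key.

```python
import threading

class SingleRefreshCache:
    """Cache that lets only one thread at a time fetch from the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._values = {}
        self._refresh_lock = threading.Lock()

    def get(self, key):
        value = self._values.get(key)
        if value is not None:
            return value
        with self._refresh_lock:
            # Re-check: another thread may have filled the cache while we waited
            value = self._values.get(key)
            if value is None:
                value = self._fetch(key)
                self._values[key] = value
            return value

origin_calls = []

def origin(key):
    origin_calls.append(key)  # record every origin hit
    return f"data for {key}"

cache = SingleRefreshCache(origin)
threads = [threading.Thread(target=cache.get, args=("popular-key",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(origin_calls))  # 1
```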

2. Ineffective Lock Timeout: If the lock acquired by the refreshing process has a timeout that is too short, and the origin request takes longer than that timeout, the lock will expire. This allows other clients to then attempt to acquire the lock and fetch from the origin simultaneously.

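Diagnosis: Compare the lock TTL with the time a refresh actually takes. Redis does not report key expirations as commands in MONITOR output, so either enable keyspace notifications (notify-keyspace-events Ex) and subscribe to the expired events, or log lock acquisitions in the application. The telltale sign is the lock key expiring, or a second client’s SET ... NX succeeding, before the first client has written the cache entry.

Fix: Increase the lock timeout to a value comfortably longer than the expected maximum time to fetch from the origin and update the cache. For example, if your origin typically responds in 2 seconds, set the lock timeout to 15-30 seconds. Keep in mind that the timeout is also how long other clients stay locked out if the lock holder crashes, so avoid making it arbitrarily large.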
- Example (Redis SET with EX):
```
# In the lock acquisition part of the code:
lock_timeout_seconds = 30
lock_acquired = r.set(lock_key, "locked", nx=True, ex=lock_timeout_seconds)
```
Why it works: A longer lock timeout prevents premature lock release, ensuring the refresh process has sufficient time to complete without other clients mistakenly believing the refresh is no longer in progress.
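The effect of a too-short timeout can be illustrated with an in-memory stand-in for the lock. This is only a sketch: `ExpiringLockStore` is a hypothetical class that mimics the expire-then-reacquire behavior of SET ... NX EX using a manual clock, and the numbers are made up for illustration.

```python
class ExpiringLockStore:
    """In-memory stand-in for a lock taken with SET key value NX EX ttl."""

    def __init__(self):
        self.now = 0.0           # manual clock, in seconds
        self._expires_at = {}

    def advance(self, seconds):
        self.now += seconds

    def acquire(self, key, ttl):
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > self.now:
            return False         # lock still held
        self._expires_at[key] = self.now + ttl
        return True

def second_client_acquires(lock_ttl, second_arrival):
    """True if a client arriving `second_arrival` seconds into a refresh gets the lock."""
    store = ExpiringLockStore()
    store.acquire("resource:lock", lock_ttl)  # first client starts its refresh
    store.advance(second_arrival)             # the refresh is still running
    return store.acquire("resource:lock", lock_ttl)

# Assume the origin fetch takes 5 s and a second client arrives 3 s in.
print(second_client_acquires(lock_ttl=2, second_arrival=3))   # True: lock expired mid-refresh
print(second_client_acquires(lock_ttl=30, second_arrival=3))  # False: lock still held
```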

3. Stale-While-Revalidate Strategy Not Implemented: Many systems miss the opportunity to serve stale data while a refresh is happening. Instead, they make clients wait for the new data.

Diagnosis: Clients requesting an expired cache item consistently experience a delay equal to the origin fetch time, rather than getting an immediate (albeit stale) response.

Fix: Implement a "stale-while-revalidate" pattern. When a cache miss occurs and a lock is acquired for refresh:

- If there’s a stale version of the data in the cache, return that immediately.
- Then, in the background, perform the origin fetch and update the cache.
- If there’s no stale data, then the client must wait for the origin fetch.

Example (Conceptual):

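import threading
import time

import redis

redis_client = redis.Redis(host='localhost', port=6379, db=0)
FRESH_TTL = 60    # seconds a value counts as fresh (illustrative)
STALE_TTL = 3600  # seconds the stale copy stays servable (illustrative)
LOCK_TTL = 30     # seconds; must exceed the worst-case origin fetch time

def get_or_refresh_stale_while_revalidate(cache_key, origin_fetch_func):
    data = redis_client.get(cache_key)
    if data is not None:
        return data  # Fresh data available

    stale_data = redis_client.get(f"{cache_key}:stale")  # Longer-lived copy
    if stale_data is not None:
        # Return stale data immediately and refresh in the background
        thread = threading.Thread(target=refresh_cache_background, args=(cache_key, origin_fetch_func))
        thread.start()
        return stale_data

    # No fresh or stale data, must wait for refresh
    return refresh_cache_blocking(cache_key, origin_fetch_func)

def store(cache_key, origin_data):
    # Write the fresh copy and a longer-lived stale copy together
    redis_client.set(cache_key, origin_data, ex=FRESH_TTL)
    redis_client.set(f"{cache_key}:stale", origin_data, ex=STALE_TTL)

def refresh_cache_background(cache_key, origin_fetch_func):
    lock_key = f"{cache_key}:lock"
    # If the lock is held, another worker is already refreshing: do nothing
    if redis_client.set(lock_key, "locked", nx=True, ex=LOCK_TTL):
        try:
            store(cache_key, origin_fetch_func())
        finally:
            # Simplified release; production code should verify lock ownership first
            redis_client.delete(lock_key)

def refresh_cache_blocking(cache_key, origin_fetch_func, max_wait=5.0):
    lock_key = f"{cache_key}:lock"
    deadline = time.monotonic() + max_wait
    while True:
        if redis_client.set(lock_key, "locked", nx=True, ex=LOCK_TTL):
            try:
                origin_data = origin_fetch_func()
                store(cache_key, origin_data)
                return origin_data
            finally:
                redis_client.delete(lock_key)
        # Lock is held elsewhere: wait, then check whether the holder filled the cache
        time.sleep(0.1)
        data = redis_client.get(cache_key)
        if data is not None:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out waiting for {cache_key}")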

Why it works: Serving stale data immediately drastically reduces perceived latency for most users. The background refresh happens without blocking the initial request, significantly improving the user experience during cache churn.

4. Distributed Cache Inconsistency: In a distributed cache environment, multiple nodes might independently decide a cache entry has expired and attempt to refresh it simultaneously if the locking mechanism isn’t correctly synchronized across all nodes.

Diagnosis: You’ll see stampedes even when a locking mechanism appears to be in place. This usually means the lock is not shared by every client that can refresh the item. For example, each application server may be talking to its own standalone Redis instance, so a lock taken on one instance is invisible to servers using another.

Fix: Use a centralized, highly available system for managing the locks (Redis Cluster, Memcached with consistent hashing, or a dedicated coordination service such as ZooKeeper or etcd). Ensure that every client resolves a given lock key to the same place, so that only one client can acquire the lock regardless of which application server it runs on.
- Example: Ensure all your application servers connect to the same Redis Cluster or Sentinel setup for lock management. In Redis, SET ... NX EX ... is atomic for a single key, and in a cluster a given key always maps to the same hash slot, so every client contends for the lock on the same node. Note that Redis replication is asynchronous, so a lock can be lost during a failover.
Why it works: A single, authoritative source for lock acquisition prevents race conditions between different application instances trying to refresh the same cache item.
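The difference between per-server and shared lock storage can be shown with a small simulation. `LockStore` below is a hypothetical in-memory stand-in for one Redis instance handling SET ... NX, and the server names are invented for the example.

```python
class LockStore:
    """In-memory stand-in for one Redis instance handling SET ... NX."""

    def __init__(self):
        self._held = set()

    def set_nx(self, key):
        if key in self._held:
            return False
        self._held.add(key)
        return True

def refreshers(stores_by_server, lock_key):
    """Return the servers that win the lock and would therefore hit the origin."""
    return [server for server, store in stores_by_server.items() if store.set_nx(lock_key)]

# Misconfigured: each application server locks against its own instance.
isolated = {"app-1": LockStore(), "app-2": LockStore(), "app-3": LockStore()}
print(refreshers(isolated, "my_resource_data:lock"))  # ['app-1', 'app-2', 'app-3']

# Correct: all application servers lock against the same instance.
shared_store = LockStore()
shared = {name: shared_store for name in ("app-1", "app-2", "app-3")}
print(refreshers(shared, "my_resource_data:lock"))    # ['app-1']
```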

5. Network Latency or Temporary Origin Unavailability: A slow or intermittently unavailable origin server can cause refresh operations to take longer than expected, exceeding lock timeouts or simply causing requests to queue up.

Diagnosis: Monitoring of origin server response times shows significant spikes or long tail latencies, correlating with cache stampede events. Network monitoring might show packet loss or high latency to the origin.
Fix:
- Origin Improvement: Optimize the origin server’s performance. This could involve database indexing, query optimization, more efficient code, or scaling up the origin infrastructure.
- Retry Logic: Implement robust retry mechanisms on the origin server side for its own dependencies, and on the cache refresh side with exponential backoff, so that a transient issue doesn’t lead to a failed refresh that immediately triggers another attempt.
- Circuit Breaker: Implement a circuit breaker pattern. If the origin consistently fails or times out, the cache layer can stop attempting refreshes for a period, returning stale data or an error gracefully, preventing the stampede from continually hitting a failing service.
Why it works: Addressing the root cause of origin slowness or unavailability reduces the duration of cache misses and makes the refresh process more reliable, thus reducing the likelihood of exceeding lock timeouts or causing cascading failures.
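As a rough illustration of the circuit breaker and backoff ideas, the sketch below wraps a refresh in a small breaker that falls back to stale data while the origin is failing. The class, function names, thresholds, and delays are all assumptions chosen for the example; a production system would more likely rely on an established resilience library.

```python
import time

class CircuitBreaker:
    """Stops calling the origin after repeated failures, for a cool-down period."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Once the cool-down has passed, requests are allowed again;
        # the next failure re-opens the circuit immediately.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

def refresh_with_breaker(breaker, fetch, stale_value):
    """Return fresh data if the origin is healthy, otherwise the stale value."""
    if not breaker.allow():
        return stale_value  # circuit open: do not touch the origin
    try:
        value = fetch()
    except Exception:
        breaker.record_failure()
        return stale_value
    breaker.record_success()
    return value

def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=5):
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]

print(backoff_delays())  # [0.5, 1.0, 2.0, 4.0, 8.0]
```

The `clock` parameter is injectable so the cool-down can be tested without real waiting. Adding random jitter to the backoff delays is a common refinement when many clients retry at once.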

6. Cache TTL Too Short or Unpredictable: Setting a very short TTL, or a TTL that is highly variable due to clock drift across servers, can increase the frequency of cache misses and thus the opportunity for stampedes.

Diagnosis: Cache hit ratio is low, and cache expiration events are happening very frequently. Server clocks might show minor but consistent differences.
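Fix: Increase the TTL to a reasonable value that balances freshness with cache efficiency. Ensure all servers involved in caching and locking have their clocks synchronized using NTP (Network Time Protocol). Clock synchronization matters most when application servers compute expiry timestamps themselves; a TTL enforced by the cache server (for example, Redis SET with EX) does not depend on client clocks.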
- Example (NTP Configuration on Linux):
```
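# Enable time synchronization through the system's NTP client
# (systemd-timesyncd or chrony, depending on the distribution)
sudo timedatectl set-ntp true
# Verify: look for "System clock synchronized: yes"
timedatectl status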
```
- Cache Config: Adjust max-age or Expires headers in your application or web server configuration. For example, setting Cache-Control: public, max-age=3600 for a 1-hour cache.
Why it works: A longer TTL means fewer cache misses over time. NTP synchronization ensures that cache expiration and lock acquisition/release times are consistent across all nodes, preventing race conditions due to timing discrepancies.

After fixing these issues, the next problem you’ll likely encounter is managing cache invalidation when the underlying data changes before the TTL expires.