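Caching can make your application feel instantaneous, but its actual impact is often wildly overestimated: a hit saves you one slow lookup, while every miss still pays for that slow lookup in full (plus the cache round trips), so even a small miss rate dominates your average latency.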

Let’s see what happens when we hit a cache and when we don’t, using a simple Redis-backed cache for user profile data.

Imagine we have a service that fetches user profiles. Without caching, a request might look like this:

import time  # needed at module level: the benchmark below calls time.time()

def get_user_profile_uncached(user_id):
    # Simulate a slow database lookup
    print(f"Fetching profile for user {user_id} from DB...")
    time.sleep(0.5) # Simulate 500ms DB latency
    profile_data = {"user_id": user_id, "name": f"User {user_id}", "email": f"user{user_id}@example.com"}
    print("DB fetch complete.")
    return profile_data

# --- Benchmark ---
start_time = time.time()
profile = get_user_profile_uncached(123)
end_time = time.time()
print(f"Uncached request took: {end_time - start_time:.4f} seconds")
print(f"Profile: {profile}")

Running this, you’d see:

Fetching profile for user 123 from DB...
DB fetch complete.
Uncached request took: 0.5012 seconds
Profile: {'user_id': 123, 'name': 'User 123', 'email': 'user123@example.com'}

Now, let’s add Redis caching. We’ll cache the profile for 60 seconds.

import redis
import json
import time

r = redis.Redis(host='localhost', port=6379, db=0)

def get_user_profile_cached(user_id):
    cache_key = f"user_profile:{user_id}"
    cached_profile = r.get(cache_key)

    if cached_profile is not None:  # redis-py returns None when the key is absent
        print(f"Cache hit for user {user_id}!")
        return json.loads(cached_profile)
    else:
        print(f"Cache miss for user {user_id}. Fetching from DB...")
        # Simulate a slow database lookup
        time.sleep(0.5) # Simulate 500ms DB latency
        profile_data = {"user_id": user_id, "name": f"User {user_id}", "email": f"user{user_id}@example.com"}
        print("DB fetch complete.")

        # Cache the data for 60 seconds
        r.setex(cache_key, 60, json.dumps(profile_data))
        print(f"Profile for user {user_id} cached.")
        return profile_data

# --- Benchmark ---

# First request (cache miss)
start_time = time.time()
profile1 = get_user_profile_cached(123)
end_time = time.time()
print(f"First request took: {end_time - start_time:.4f} seconds")
print(f"Profile: {profile1}")

print("-" * 20)

# Second request (cache hit)
start_time = time.time()
profile2 = get_user_profile_cached(123)
end_time = time.time()
print(f"Second request took: {end_time - start_time:.4f} seconds")
print(f"Profile: {profile2}")

Running this against an empty cache, so the first call misses:

Cache miss for user 123. Fetching from DB...
DB fetch complete.
Profile for user 123 cached.
First request took: 0.5021 seconds
Profile: {'user_id': 123, 'name': 'User 123', 'email': 'user123@example.com'}
--------------------
Cache hit for user 123!
Second request took: 0.0015 seconds
Profile: {'user_id': 123, 'name': 'User 123', 'email': 'user123@example.com'}

The first request, with the cache miss, still takes about 500ms because it has to go to the simulated database. The second request, however, is nearly instantaneous, taking only about 1.5 milliseconds. This is the power of a cache hit.

Here’s how it works internally:

  1. Cache Check: When get_user_profile_cached is called, it first constructs a unique cache_key for the requested user_id.
  2. Redis GET: It then sends a GET command to Redis. This costs a single network round trip, which is typically well under a millisecond when Redis runs on the same host or a nearby network.
  3. Cache Hit: If Redis finds the cache_key, it returns the stored value immediately. The application then deserializes this value (using json.loads) and returns it. This entire process (Redis GET + JSON loads) is what we see as the ~1.5ms.
  4. Cache Miss: If Redis doesn’t find the cache_key, it returns None. The application then proceeds to the "slow path" – simulating a database lookup.
  5. DB Fetch & Cache SET: After fetching the data from the simulated DB, the application serializes it (using json.dumps) and sends a SETEX command to Redis. SETEX stands for "SET with EXpiry": it stores the value and gives the key a Time-To-Live (TTL) of 60 seconds. When the TTL runs out, Redis deletes the key automatically, and the next request for that user is a miss again.
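The five steps above are the cache-aside (lazy loading) pattern. Below is a minimal sketch that pulls them into one reusable function, assuming any client that exposes redis-py-style get(key) and setex(key, seconds, value) methods. InMemoryTTLCache and cache_aside are hypothetical names for illustration; the in-memory class exists only so the sketch runs without a Redis server.

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryTTLCache:
    """Hypothetical stand-in for a Redis client: get() and setex() take the
    same arguments as redis-py's, but data lives in a local dict."""

    def __init__(self) -> None:
        # key -> (expiry time on the monotonic clock, stored bytes)
        self._store: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._store.get(key)
        if entry is None:
            return None  # never set: behaves like a Redis miss
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: evict lazily and report a miss
            return None
        return value

    def setex(self, key: str, seconds: int, value: str) -> None:
        data = value.encode() if isinstance(value, str) else value
        self._store[key] = (time.monotonic() + seconds, data)


def cache_aside(client: Any, key: str, ttl: int, loader: Callable[[], Any]) -> Any:
    """Return the cached value for key, or load it, cache it for ttl seconds,
    and return it."""
    cached = client.get(key)                   # step 2: GET
    if cached is not None:                     # step 3: hit, deserialize and return
        return json.loads(cached)
    value = loader()                           # step 4: miss, take the slow path
    client.setex(key, ttl, json.dumps(value))  # step 5: store with a TTL
    return value
```

With a real server you could pass r = redis.Redis(host='localhost', port=6379, db=0) as the client instead, since redis-py's get and setex accept the same arguments used here. Taking the loader as a callable keeps the caching logic separate from how the data is actually fetched.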

The key levers you control are:

  • Cache Key Strategy: How you construct your keys determines what is considered a unique item to cache. A good strategy ensures that identical requests map to the same key.
  • Cache Expiry (TTL): This is the 60 in r.setex(cache_key, 60, ...). It dictates how long data stays fresh in the cache. Too short, and you get too many cache misses. Too long, and users might see stale data.
  • Serialization/Deserialization: How you convert your application objects to bytes for storage in Redis and back again. JSON is common but can be slower than binary formats like MessagePack or Protocol Buffers for very large objects.
  • Cache Client Configuration: Parameters like connection pooling, timeouts for Redis operations, and the choice of Redis commands (GET, MGET, HGETALL, etc.) all impact performance.
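For the key strategy lever, here is a sketch of one way to make identical requests map to the same key even when a lookup depends on several parameters, assuming those parameters are JSON-serializable. make_cache_key is a hypothetical helper: it serializes the parameters with sorted keys before hashing, so the order in which arguments are passed does not change the key.

```python
import hashlib
import json


def make_cache_key(prefix: str, **params) -> str:
    """Build a deterministic cache key from a prefix and keyword parameters."""
    # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} serialize identically
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    # A truncated SHA-256 digest keeps keys short regardless of parameter count
    digest = hashlib.sha256(canonical.encode()).hexdigest()[:16]
    return f"{prefix}:{digest}"


key_a = make_cache_key("user_profile", user_id=123, lang="en")
key_b = make_cache_key("user_profile", lang="en", user_id=123)
print(key_a == key_b)
```

For a single ID, the readable user_profile:{user_id} form used earlier is simpler and easier to inspect in redis-cli; hashing mainly pays off when the parameter set is long or contains values you would rather keep out of key names.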

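The actual performance gain isn’t just the difference between 500ms and 1.5ms. It’s about how many times you avoid that 500ms operation. If 99% of your requests are cache hits, your average latency ends up much closer to the cache-hit latency than to the database latency, but the misses still dominate it: 0.99 × 1.5ms + 0.01 × 500ms ≈ 6.5ms, more than four times the hit latency. The 1% of misses contribute 5ms of that average, while the 99% of hits contribute only about 1.5ms.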
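The arithmetic above is a weighted mean of the two paths, so it is easy to tabulate for other hit ratios. A minimal sketch, assuming the 1.5ms hit and 500ms miss figures from the benchmark; expected_latency_ms is a hypothetical helper, and a real miss also pays for the GET and SETEX round trips, so treat these numbers as a slight underestimate.

```python
def expected_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Average request latency given the fraction of requests that hit the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Each step toward a perfect hit ratio removes a large share of the average
for ratio in (0.90, 0.99, 0.999):
    avg = expected_latency_ms(ratio, hit_ms=1.5, miss_ms=500.0)
    print(f"hit ratio {ratio:.3f}: average {avg:.2f} ms")
```

Going from a 99% to a 99.9% hit ratio cuts the average by roughly a factor of three, which is why reducing misses usually matters more than shaving time off hits.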

When you implement caching, you’re essentially trading increased complexity and a small amount of memory for a significant reduction in latency and load on your primary data stores. The crucial insight is that the cost of a cache miss includes not just the time to fetch from the origin, but also the opportunity cost of those requests that could have been hits.

If your cache invalidation strategy relies on periodically clearing the entire cache, you might be surprised to find that after a cache flush, your system experiences a "thundering herd" problem where all requests simultaneously miss the cache and hit the database, causing a massive spike in load.
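One common mitigation is request coalescing (sometimes called single-flight): when many callers miss on the same key at once, only the first one runs the slow load and the others wait for its result. The sketch below is a minimal, in-process version; SingleFlight and _Call are hypothetical names, and it only collapses duplicate misses for the same key within one Python process.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """One in-progress load: followers wait on `done`, then read the outcome."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._calls: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if not leader:
            # Another thread is already loading this key: wait and share its outcome
            call.done.wait()
            if call.error is not None:
                raise call.error
            return call.result
        try:
            call.result = loader()  # only the leader hits the slow path
        except BaseException as exc:
            call.error = exc        # followers re-raise the same error
            raise
        finally:
            with self._lock:
                del self._calls[key]  # later misses start a fresh load
            call.done.set()
        return call.result
```

In get_user_profile_cached, the miss branch could wrap its database fetch in flight.do(cache_key, ...) so that a burst of misses for user 123 after a flush produces one database query per process instead of one per request. Coordinating across several processes or servers would need a shared lock, for example a Redis SET with the NX and EX options, which this sketch does not attempt.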
