Design a Distributed Cache That Handles Millions of Requests (2026)

A distributed cache doesn’t speed up your application; it allows your application to scale by offloading read-heavy workloads that would otherwise overwhelm your primary data store.

Let’s build a mental model for Redis, a popular choice. Imagine you have a popular e-commerce site. Every time a user views a product page, your backend application needs to fetch product details (name, price, description, image URL). If millions of users are doing this concurrently, your database will buckle under the strain.

Here’s where Redis comes in. We can store frequently accessed product data in Redis.

# redis.conf
port 6379
maxmemory 2gb
maxmemory-policy allkeys-lru

In this configuration:

port 6379 is the standard listening port for Redis.
maxmemory 2gb limits Redis to using a maximum of 2 gigabytes of RAM. This is crucial to prevent it from consuming all system memory.
maxmemory-policy allkeys-lru dictates what happens when maxmemory is reached. allkeys-lru means Redis will evict the least recently used keys across the entire dataset to make space for new data.

When a user requests a product page:

The application first checks Redis: GET product:12345.
Cache Hit: If the data is found in Redis, it’s returned instantly. This is the fast path.
Cache Miss: If the data is not in Redis, the application fetches it from the database.
Populate Cache: The application then writes this data into Redis: SET product:12345 '{"name": "Awesome Gadget", "price": 99.99, ...}' with an expiration time (TTL). This is the slower path, but it ensures subsequent requests for this product will be fast.

The "millions of requests" part is handled by distributing this cache across multiple Redis instances. We achieve this using a technique called sharding.

Consider a Redis cluster with 3 master nodes. A key like product:12345 will always be routed to the same master node. This routing is determined by a hash function applied to the key. The formula is typically hash(key) % N, where N is the number of master nodes. For example, if hash('product:12345') is 789 and we have 3 nodes, 789 % 3 = 0, so it goes to master node 0.

Here’s a simplified view of how your application client might implement sharding (this logic is often built into client libraries):

import redis
import hashlib

class ShardedRedisClient:
    def __init__(self, hosts):
        self.nodes = [redis.Redis(host=h, port=6379) for h in hosts]
        self.num_nodes = len(hosts)

    def get_node(self, key):
        hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.nodes[hash_val % self.num_nodes]

    def get(self, key):
        node = self.get_node(key)
        return node.get(key)

    def set(self, key, value, ex=None):
        node = self.get_node(key)
        node.set(key, value, ex=ex)

# Usage:
client = ShardedRedisClient(['redis-node-1', 'redis-node-2', 'redis-node-3'])
product_data = client.get('product:12345')
if not product_data:
    # Fetch from DB, then:
    client.set('product:12345', json.dumps(db_data), ex=3600) # 1 hour TTL

This allows you to scale horizontally: add more Redis nodes to handle more data and more requests. Each node only needs to store a fraction of the total data.

The critical challenge with distributed systems like this is consistency. When you write data, how do you ensure that all clients see the latest version? If you set a TTL, the data will eventually disappear. What if a node goes down? This is where concepts like replication and consensus protocols (like Raft or Paxos, used in Redis Cluster) become vital for high availability and fault tolerance. Redis Cluster automatically handles sharding, failover, and rebalancing of keys when nodes are added or removed.

Most people understand that Redis is a key-value store used for caching. What they often overlook is the complexity of ensuring that writes are durable and that the system remains available even if network partitions occur or nodes fail. Redis Cluster addresses this by having master nodes replicate their data to slave nodes, so if a master fails, a slave can be promoted to take its place, ensuring minimal downtime. The client libraries then need to be aware of the cluster topology to route requests correctly, even during failovers.