Redis Cluster is surprisingly resilient, but its high availability doesn’t come from a single standby watching a single primary. Instead, it shards data across several independent masters, each with its own replicas, and the nodes themselves agree on when to promote a replica.

Let’s see it in action. Imagine you have a simple Redis setup and you want to make it highly available. You’d start with a few Redis instances:

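# Run each node in its own terminal and working directory so their data files don't collide
# Start node 1 on port 7000
redis-server --port 7000 --cluster-enabled yes --cluster-config-file nodes-7000.conf --cluster-node-timeout 5000

# Start node 2 on port 7001
redis-server --port 7001 --cluster-enabled yes --cluster-config-file nodes-7001.conf --cluster-node-timeout 5000

# Start node 3 on port 7002
redis-server --port 7002 --cluster-enabled yes --cluster-config-file nodes-7002.conf --cluster-node-timeout 5000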

Now tell these nodes to form a single cluster. You’ll want at least six nodes for a production-ready setup (three masters, each with one replica), but for this demonstration we’ll start with three masters and add replicas later. First, create the cluster, which assigns hash slots to the masters:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 --cluster-replicas 0

redis-cli prints the proposed slot layout and asks you to type yes to accept it. A Redis Cluster always has exactly 16384 hash slots, and create divides them among the masters. With three masters, node 7000 typically gets slots 0-5460, 7001 gets 5461-10922, and 7002 gets 10923-16383.
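To see where those ranges come from, here is a small Python sketch that splits the 16384 slots into near-equal contiguous ranges. It is an illustration, not redis-cli’s own allocation code: split_slots is a hypothetical name, and the round-half-up rule is an assumption chosen so that three masters reproduce the ranges above.

```python
from __future__ import annotations

TOTAL_SLOTS = 16384  # fixed number of hash slots in Redis Cluster


def split_slots(num_masters: int) -> list[tuple[int, int]]:
    """Split slots 0..16383 into num_masters contiguous, near-equal ranges.

    Hypothetical helper for illustration; redis-cli has its own logic.
    """
    if not 1 <= num_masters <= TOTAL_SLOTS:
        raise ValueError("num_masters must be between 1 and 16384")
    per_master = TOTAL_SLOTS / num_masters  # may be fractional, e.g. 5461.33
    ranges: list[tuple[int, int]] = []
    first, cursor = 0, 0.0
    for i in range(num_masters):
        if i == num_masters - 1:
            last = TOTAL_SLOTS - 1  # the last master absorbs any rounding remainder
        else:
            last = int(cursor + per_master - 1 + 0.5)  # round half up
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


if __name__ == "__main__":
    for port, (lo, hi) in zip((7000, 7001, 7002), split_slots(3)):
        print(f"{port}: slots {lo}-{hi} ({hi - lo + 1} slots)")
```

For three masters this prints 0-5460 (5461 slots), 5461-10922 (5462 slots), and 10923-16383 (5461 slots), matching the layout above.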

Once the cluster is created, you can add replicas to provide fault tolerance. If a master goes down, one of its replicas will be promoted to take its place.

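# Start the future replica (port 7003) the same way as the other nodes
redis-server --port 7003 --cluster-enabled yes --cluster-config-file nodes-7003.conf --cluster-node-timeout 5000

# Join it to the cluster as a replica of a specific master
redis-cli --cluster add-node 127.0.0.1:7003 127.0.0.1:7000 --cluster-slave --cluster-master-id <master_node_id>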

Repeat this for each master (for example, nodes 7004 and 7005 as replicas of 7001 and 7002) so that every master has a replica. The <master_node_id> is the master’s unique 40-character node ID, which you can find using redis-cli -c -p 7000 CLUSTER NODES.
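Copying node IDs by hand gets tedious with several masters. The Python sketch below parses CLUSTER NODES output, picks out the masters, and builds one add-node command per replica. It assumes the documented line layout (node ID, then the address as ip:port@cport with an optional ,hostname suffix, then comma-separated flags). The helper names, the replica-to-master pairing, and the sample output with its filler IDs are all hypothetical.

```python
from __future__ import annotations


def master_ids(cluster_nodes_output: str) -> dict[str, str]:
    """Map 'ip:port' -> node ID for every master in CLUSTER NODES output."""
    masters: dict[str, str] = {}
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        node_id, address, flags = fields[0], fields[1], fields[2].split(",")
        # Address looks like 127.0.0.1:7000@17000, optionally followed by ,hostname
        client_addr = address.split("@")[0]
        if "master" in flags:
            masters[client_addr] = node_id
    return masters


def add_replica_commands(masters: dict[str, str], pairs: dict[str, str],
                         entry: str) -> list[str]:
    """Build one add-node command per (replica address -> master address) pair."""
    return [
        f"redis-cli --cluster add-node {replica} {entry} "
        f"--cluster-slave --cluster-master-id {masters[master]}"
        for replica, master in pairs.items()
    ]


# Made-up sample output; real node IDs are random 40-character hex strings.
SAMPLE = """\
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 127.0.0.1:7000@17000 myself,master - 0 0 1 connected 0-5460
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 127.0.0.1:7001@17001 master - 0 1700000000000 2 connected 5461-10922
cccccccccccccccccccccccccccccccccccccccc 127.0.0.1:7002@17002 master - 0 1700000000000 3 connected 10923-16383
"""

if __name__ == "__main__":
    ids = master_ids(SAMPLE)
    pairs = {"127.0.0.1:7003": "127.0.0.1:7000",
             "127.0.0.1:7004": "127.0.0.1:7001",
             "127.0.0.1:7005": "127.0.0.1:7002"}
    for cmd in add_replica_commands(ids, pairs, "127.0.0.1:7000"):
        print(cmd)
```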

The core problem Redis Cluster solves is scaling your cache beyond the memory of a single machine while maintaining availability. It does this by sharding your data across multiple master nodes, each responsible for a subset of the 16384 hash slots. Every key belongs to exactly one slot, computed as CRC16(key) mod 16384, and a cluster-aware client sends each command to the master that owns that key’s slot.
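The key-to-slot mapping is simple enough to reproduce. According to the Redis Cluster specification, the slot is the CRC16 (XMODEM variant) of the key modulo 16384. There is one exception: when a key contains a non-empty {...} hash tag, only the part inside the first such braces is hashed, which lets you force related keys into the same slot. The Python below is a minimal sketch of that rule. key_slot and hashed_part are hypothetical names; against a live server you can simply ask with CLUSTER KEYSLOT.

```python
from __future__ import annotations

NUM_SLOTS = 16384


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hashed_part(key: bytes) -> bytes:
    """Return the bytes that get hashed, honoring a non-empty {hash tag}."""
    start = key.find(b"{")
    if start == -1:
        return key
    end = key.find(b"}", start + 1)
    if end == -1 or end == start + 1:  # no closing brace, or an empty tag "{}"
        return key
    return key[start + 1:end]


def key_slot(key: str | bytes) -> int:
    """Slot for a key: CRC16 of the (possibly tagged) key, modulo 16384."""
    raw = key.encode("utf-8") if isinstance(key, str) else key
    return crc16_xmodem(hashed_part(raw)) % NUM_SLOTS


if __name__ == "__main__":
    # Both keys hash only "user:42", so they land in the same slot.
    print(key_slot("{user:42}:profile") == key_slot("{user:42}:orders"))  # True
```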

Internally, cluster state is spread by a gossip protocol over the cluster bus. Each node periodically exchanges information about which nodes are masters, which are replicas, and which hash slots each master owns. Routing itself happens in the client: if a client sends a command to a node that doesn’t own the key’s slot, that node replies with a MOVED error giving the slot number and the address of the node that does (for example, MOVED 12182 127.0.0.1:7002). Clients are expected to cache this slot-to-node mapping to avoid repeated redirects.
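A cluster-aware client keeps that slot-to-node mapping in memory and updates it whenever it sees a redirect. The Python sketch below shows the idea. SlotCache and its methods are hypothetical, not any real client library’s API, and it handles only the MOVED <slot> <host>:<port> form.

```python
from __future__ import annotations

NUM_SLOTS = 16384


class SlotCache:
    """Hypothetical client-side cache mapping hash slot -> 'host:port'."""

    def __init__(self) -> None:
        self._owners: dict[int, str] = {}

    def node_for(self, slot: int) -> str | None:
        """Return the cached owner of a slot, or None if unknown."""
        return self._owners.get(slot)

    def apply_moved(self, error: str) -> tuple[int, str]:
        """Record the new owner from a 'MOVED <slot> <host>:<port>' error."""
        parts = error.lstrip("-").split()  # tolerate the leading '-' of a RESP error
        if len(parts) != 3 or parts[0] != "MOVED":
            raise ValueError(f"not a MOVED redirect: {error!r}")
        slot, address = int(parts[1]), parts[2]
        if not 0 <= slot < NUM_SLOTS:
            raise ValueError(f"slot out of range: {slot}")
        self._owners[slot] = address
        return slot, address


if __name__ == "__main__":
    cache = SlotCache()
    cache.apply_moved("MOVED 12182 127.0.0.1:7002")
    print(cache.node_for(12182))  # 127.0.0.1:7002
```

Updating only the one slot is the simplest policy. A MOVED often means a reshard or failover changed ownership of many slots at once, so real clients commonly refresh the whole map from the server instead (for example with CLUSTER SLOTS, or CLUSTER SHARDS on Redis 7.0+).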

The actual levers you control are primarily in your redis.conf and the redis-cli --cluster commands. Key redis.conf settings include:

  • cluster-enabled yes: Essential to enable cluster mode.
  • cluster-config-file nodes-<port>.conf: This file is automatically managed by Redis and stores the cluster topology. Do not edit it manually.
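  • cluster-port <port>: The cluster bus is a separate TCP port that nodes use to talk to each other (gossip, failure reports, failover votes). By default it is the client port plus 10000 (16379 for port 6379); since Redis 7.0, cluster-port lets you choose a different one.
  • cluster-node-timeout <milliseconds>: How long a node can be unreachable before other nodes start treating it as failing; failover timing depends on this value.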
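If you prefer config files to command-line flags, each node needs its own copy of these settings. The Python sketch below renders redis.conf contents for one port. render_node_conf is a hypothetical helper, and the 5000 ms timeout is an illustrative value, not a recommendation.

```python
from __future__ import annotations


def render_node_conf(port: int, node_timeout_ms: int = 5000,
                     bus_port: int | None = None) -> str:
    """Return redis.conf contents for one cluster node (hypothetical helper)."""
    lines = [
        f"port {port}",
        "cluster-enabled yes",
        # Redis maintains this file itself; we only choose its name.
        f"cluster-config-file nodes-{port}.conf",
        f"cluster-node-timeout {node_timeout_ms}",
    ]
    if bus_port is not None:
        # Only needed when the bus should not use the default port + 10000 (Redis 7.0+).
        lines.append(f"cluster-port {bus_port}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_node_conf(7000), end="")
```

Saved as redis-7000.conf, such a file can be passed directly to the server with redis-server redis-7000.conf.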

The redis-cli --cluster commands are your primary interface for managing the cluster:

  • create: Initializes a new cluster.
  • add-node: Adds a new node (master or replica).
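  • del-node: Removes a node. A master must have no hash slots assigned before it can be removed.
  • reshard: Moves a chosen number of hash slots (and the keys in them) between masters.
  • rebalance: Redistributes slots so that masters hold roughly equal shares.
  • check: Verifies that all slots are covered and that the nodes agree on the slot configuration.

Manual failover is not a redis-cli --cluster subcommand. Instead, send CLUSTER FAILOVER to the replica you want to promote, for example redis-cli -p 7003 CLUSTER FAILOVER.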
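To get a feel for what resharding and rebalancing have to work out, the Python sketch below computes a move plan. Given how many slots each master holds, it decides how many slots each over-full master should hand to each under-full one so that all end up within one slot of each other. This is a simplified, hypothetical planner (rebalance_plan is not part of redis-cli). The real rebalance subcommand also supports weights and a threshold, and it performs the slot migrations itself.

```python
from __future__ import annotations


def rebalance_plan(slot_counts: dict[str, int]) -> list[tuple[str, str, int]]:
    """Return (source, destination, slot count) moves that even out slot counts."""
    if not slot_counts:
        return []
    nodes = sorted(slot_counts)
    total = sum(slot_counts.values())
    base, extra = divmod(total, len(nodes))
    # The first `extra` nodes (in sorted order) end up with one more slot than the rest.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    donors = [[n, slot_counts[n] - target[n]] for n in nodes if slot_counts[n] > target[n]]
    receivers = [[n, target[n] - slot_counts[n]] for n in nodes if slot_counts[n] < target[n]]

    plan: list[tuple[str, str, int]] = []
    d = r = 0
    # Total surplus equals total deficit, so both lists are exhausted together.
    while d < len(donors) and r < len(receivers):
        count = min(donors[d][1], receivers[r][1])
        plan.append((donors[d][0], receivers[r][0], count))
        donors[d][1] -= count
        receivers[r][1] -= count
        if donors[d][1] == 0:
            d += 1
        if receivers[r][1] == 0:
            r += 1
    return plan


if __name__ == "__main__":
    # Three existing masters plus a new, empty (hypothetical) master on port 7006.
    counts = {"127.0.0.1:7000": 5461, "127.0.0.1:7001": 5462,
              "127.0.0.1:7002": 5461, "127.0.0.1:7006": 0}
    for src, dst, n in rebalance_plan(counts):
        print(f"move {n} slots {src} -> {dst}")
```

In this example the three existing masters give 1365, 1366, and 1365 slots to the new one, leaving 4096 on each. Note that redis-cli --cluster rebalance leaves masters that hold no slots alone unless you pass --cluster-use-empty-masters.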

The most surprising aspect of Redis Cluster’s failover is that the nodes decide it among themselves. Replicas are ordinary asynchronous replicas; what is unusual is how one gets promoted. When a node cannot reach a master for longer than cluster-node-timeout, it flags that master as PFAIL (possibly failing) and spreads this view through gossip. Once a majority of masters report the same master as unreachable, it is marked FAIL. Its replicas then hold an election: a replica increments its current epoch and requests votes from the masters, each master grants at most one vote per epoch, and the first replica to collect votes from a majority of masters promotes itself and takes over the failed master’s slots. This majority vote among masters is what allows failover without a separate coordinator such as Redis Sentinel.
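The vote counting at the heart of that election can be modeled in a few lines. The Python sketch below is a deliberately simplified, hypothetical model, not Redis’s implementation. It looks at a single epoch, treats vote requests as arriving one replica at a time, and lets each reachable master grant its one vote to the first replica that asks. It assumes the quorum is a majority of all masters, the failed one included. It also ignores details such as the rank-based delay a replica waits before starting its election.

```python
from __future__ import annotations


def votes_needed(num_masters: int) -> int:
    """Majority of all masters in the cluster, including the one that failed."""
    return num_masters // 2 + 1


def elect(num_masters: int, requests: list[tuple[str, set[str]]]) -> str | None:
    """Return the winning replica for one epoch, or None on a split vote.

    requests: (replica name, masters its vote request reaches), in arrival order.
    """
    needed = votes_needed(num_masters)
    vote_of: dict[str, str] = {}  # master -> replica it voted for in this epoch
    tally: dict[str, int] = {}
    for replica, reachable in requests:
        for master in sorted(reachable):
            if master not in vote_of:  # each master votes at most once per epoch
                vote_of[master] = replica
                tally[replica] = tally.get(replica, 0) + 1
        if tally.get(replica, 0) >= needed:
            return replica
    return None


if __name__ == "__main__":
    # Three masters; m1 has failed, so only m2 and m3 can vote.
    print(elect(3, [("r1", {"m2", "m3"}), ("r2", {"m2", "m3"})]))  # r1
    print(elect(3, [("r1", {"m2"}), ("r2", {"m3"})]))  # None (split vote)
```

In the split-vote case neither replica reaches two votes. In a real cluster the replicas would try again later with a new epoch.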

The next concept to grapple with is how to handle Redis Cluster when your application needs to scale writes beyond what a single master can handle, even with sharding.
