CouchDB’s shard rebalancing isn’t about moving data; it’s about moving pointers to data, which is why it’s so fast and can happen live.

Let’s see it in action. Imagine a freshly set up cluster with two nodes, node1 and node2. We’ll create a database, my_db, and add a few documents.

# On node1
curl -X PUT http://admin:password@localhost:5984/my_db

# Add some documents
curl -X PUT http://admin:password@localhost:5984/my_db/doc1 -d '{"name": "Alice"}'
curl -X PUT http://admin:password@localhost:5984/my_db/doc2 -d '{"name": "Bob"}'
curl -X PUT http://admin:password@localhost:5984/my_db/doc3 -d '{"name": "Charlie"}'

Now, let’s check the shard distribution. CouchDB automatically distributes shards, but with only two nodes and a small database, it might put all shards on one node initially.

# On node1
curl http://admin:password@localhost:5984/_dbs/my_db/_shards

You’ll likely see output like this, indicating all shards for my_db are on node1:

{
  "my_db": {
    "0000000000-fffffffffff": "node1"
  }
}

The goal of rebalancing is to distribute these shards more evenly, especially when adding new nodes or when existing nodes become overloaded. If we add node3 to our cluster and want to move some shards to it, we’d use the _rebalance endpoint.

# On node1 (or any node in the cluster)
curl -X POST http://admin:password@localhost:5984/_rebalance

This command doesn’t take any arguments. CouchDB itself determines how to best distribute the shards based on its internal algorithms and the current state of the cluster. It looks at which nodes have the most shards and tries to move some to nodes with fewer shards.

After running _rebalance, if you check the shards again:

# On node1
curl http://admin:password@localhost:5984/_dbs/my_db/_shards

You might now see a distribution like this, with one shard moved to node3:

{
  "my_db": {
    "0000000000-7ffffffffffff": "node1",
    "8000000000000-ffffffffffff": "node3"
  }
}

This is the essence of shard rebalancing: CouchDB intelligently redistributes the shard maps (which are just pointers to the actual data partitions) across the available nodes to achieve a more balanced load. The data itself doesn’t move; only the metadata that tells CouchDB where to find the shards is updated. This is why rebalancing is so fast and non-disruptive.

The core problem rebalancing solves is preventing "hot spots" – individual nodes becoming overwhelmed with requests because they hold a disproportionate number of shards. By distributing shards, CouchDB spreads the read and write load across more nodes, improving overall cluster performance and resilience. When you add a new node, running _rebalance is the mechanism to tell CouchDB, "Hey, there’s a new resource, use it!"

The internal mechanism for shard distribution is a bit more nuanced than just an even split. CouchDB uses a consistent hashing ring. Each shard is assigned a hash value, and nodes are also placed on this ring. Shards are assigned to the node that "owns" the segment of the ring containing the shard’s hash. When you add or remove nodes, only the assignments of shards near the affected segments change, minimizing disruption. The _rebalance command triggers a recalculation and migration of these shard assignments.

A common misconception is that rebalancing automatically moves all data. In reality, CouchDB’s distributed nature means data is already partitioned. Rebalancing is about moving the knowledge of which node is responsible for which partition. The actual data resides on disk, and when a node takes over a shard, it’s simply updating its internal routing tables to point to that shard’s data. If a node is removed, CouchDB will eventually rebalance those shards to other nodes, and those nodes will then fetch the data from the remaining copies.

After rebalancing your shards, you might notice that certain queries seem to perform slightly differently. This is because the query planner now has a new set of shard locations to consider, and it might adjust its strategy for fetching data from different nodes.

Want structured learning?

Take the full Couchdb course →