CouchDB’s revision history, often seen as a complex maze of _rev values, is actually a core mechanism for its eventual consistency and conflict resolution.
Let’s see it in action. Imagine we have a document representing a user profile.
{
"_id": "user_alice",
"_rev": "1-a1b2c3d4e5f678901234567890abcdef",
"name": "Alice",
"email": "alice@example.com"
}
Now, Alice updates her email. CouchDB doesn’t overwrite the old document; it creates a new revision.
{
"_id": "user_alice",
"_rev": "2-9876543210fedcba09876543210fedcba",
"name": "Alice",
"email": "alice.smith@example.com"
}
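A _rev has two parts. The prefix (1-, 2-) is the generation, a counter that goes up by one with every update. The suffix is an MD5 digest that CouchDB derives from the document’s content and its parent revision. Because each revision points back to its parent, the history forms a revision tree rather than a simple list: an uncontested document has a single branch, and concurrent edits add sibling branches. On a single node, a write based on a stale _rev is rejected with 409 Conflict. When two replicas each accept an edit to the same revision and later replicate with each other, both branches survive as a conflict. Suppose one replica holds this third revision: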
{
"_id": "user_alice",
"_rev": "3-abcdef1234567890abcdef1234567890ab",
"name": "Alice",
"email": "alice.smith@example.com"
}
And on another replica:
{
"_id": "user_alice",
"_rev": "3-0987654321fedcba09876543210fedcba",
"name": "Alice Smith",
"email": "alice.smith@example.com"
}
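Notice that both revisions are at generation 3 but their hashes differ (abcdef... vs 098765...). They are siblings descended from the same parent, and that is what a conflict looks like. CouchDB doesn’t merge them. After replication it stores both, deterministically picks one as the "winner" that a plain GET returns (every node makes the same choice), and lists the other under _conflicts when you request the document with ?conflicts=true. Your application logic then needs to decide which version to keep, or merge them. To resolve the conflict, write the merged result as an update to one of the conflicting revisions, passing its _rev, and delete the others.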
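CouchDB never merges conflicting revisions for you, but a plain read still has to return something, so it picks a winner deterministically: the leaf with the highest generation, with ties broken by the revision ID that sorts highest. Here is a minimal Python sketch of that rule and of the request body you could send to _bulk_docs to resolve the conflict. The helper names (parse_rev, pick_winner, resolution_payload) are hypothetical, and the sketch assumes none of the conflicting leaves is a deletion (deleted leaves lose to live ones in CouchDB).

```python
def parse_rev(rev):
    """Split 'N-hash' into (generation, hash)."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(leaf_revs):
    """Highest generation wins; ties go to the hash that sorts highest."""
    return max(leaf_revs, key=parse_rev)


def resolution_payload(doc_id, leaf_revs, merged_fields):
    """Build a _bulk_docs body: update the winner, delete every other leaf.

    Assumption: merged_fields holds the application's merged document body.
    """
    winner = pick_winner(leaf_revs)
    docs = [{"_id": doc_id, "_rev": winner, **merged_fields}]
    docs += [
        {"_id": doc_id, "_rev": rev, "_deleted": True}
        for rev in leaf_revs
        if rev != winner
    ]
    return {"docs": docs}
```

With the two generation-3 revisions above, pick_winner returns the abcdef... revision, since "a" sorts after "0". Writing the merged body on top of the winner is a convenience, not a requirement; any leaf can serve as the base as long as the remaining leaves are deleted.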
The problem this solves is maintaining data integrity and availability in a distributed, eventually consistent system. By recording every change as a new revision in the document’s revision tree, CouchDB can tell which edits descend from which, detect and present conflicts, and ensure that all nodes eventually agree on the state of the data. The _rev also acts as an optimistic lock: every update must name the revision it is based on, and CouchDB rejects the write if that revision is no longer current. It is your answer to "what version of this document am I looking at?" and "is this the most up-to-date version based on what I know?".
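The "is this the most up-to-date version?" check can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not CouchDB’s implementation: TinyStore and Conflict are hypothetical names, and the digest input used here (the previous _rev plus the JSON body) is a simplification of what CouchDB actually hashes.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write is not based on the current revision."""


class TinyStore:
    """Toy single-node store that enforces _rev matching on every write."""

    def __init__(self):
        self._docs = {}

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        expected = current["_rev"] if current else None
        # The writer must name the revision it read; anything else is stale.
        if doc.get("_rev") != expected:
            raise Conflict(f"expected _rev {expected!r}, got {doc.get('_rev')!r}")

        body = {k: v for k, v in doc.items() if k != "_rev"}
        generation = int(expected.split("-")[0]) + 1 if expected else 1
        # Simplified digest: previous rev + body (an assumption of this sketch).
        raw = json.dumps([expected, body], sort_keys=True).encode()
        new_rev = f"{generation}-{hashlib.md5(raw).hexdigest()}"
        self._docs[doc["_id"]] = {**body, "_rev": new_rev}
        return new_rev
```

A second writer that still holds the generation-1 _rev after someone else has written generation 2 gets a Conflict, which mirrors the 409 response CouchDB returns in that situation.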
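CouchDB does not keep every revision forever. By default it remembers the revision IDs of up to 1000 revisions per document, while the bodies of superseded revisions stay on disk only until the database is compacted. That is enough for replication and conflict detection, but it means revision history is not an audit log, and a frequently updated database can still use a lot of disk between compactions. CouchDB provides a mechanism to prune old revisions: you can limit how many revisions it tracks for each document, and compaction reclaims the space.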
Compaction is started with a POST request to the _compact endpoint of your database. For example, for a database named my_app_db:
curl -X POST "http://localhost:5984/my_app_db/_compact" \
-H "Content-Type: application/json"
The request needs the JSON content type but no body, and on a secured server it needs admin credentials. It returns immediately while compaction runs in the background, rewriting the database file and dropping the bodies of superseded revisions. Compaction doesn’t decide how much revision history is remembered, though. That is controlled per database through the _revs_limit endpoint, which defaults to 1000.
To track at most 100 revisions per document in my_app_db, you’d use:
curl -X PUT "http://localhost:5984/my_app_db/_revs_limit" \
-H "Content-Type: application/json" \
-d '100'
You can read the current value back with a GET on the same endpoint:
curl -X GET "http://localhost:5984/my_app_db/_revs_limit"
Lowering the limit shortens the revision tree CouchDB keeps for each document. Be careful with very low values: replication relies on shared revision history, so replicas that drift apart by more updates than the limit allows can end up with spurious conflicts. You can monitor the progress of a running compaction via the /_active_tasks endpoint.
The most surprising thing is that lowering the revision limit doesn’t immediately free anything. Old revision data stays in the database file until compaction rewrites it, and because compaction builds a new file alongside the old one before swapping them, disk usage can rise temporarily before it drops. Don’t expect a disk space reduction until a full database compaction completes.
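For scripting this, here is a minimal Python sketch that builds the relevant HTTP requests and filters _active_tasks output for a running compaction. It relies on CouchDB’s documented per-database _revs_limit and _compact endpoints. The helper names and BASE_URL are assumptions for illustration, authentication is omitted, and nothing is sent until you pass a request to urllib.request.urlopen.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:5984"  # assumption: local, unauthenticated node
JSON_HEADERS = {"Content-Type": "application/json"}


def revs_limit_request(db, limit):
    """PUT /{db}/_revs_limit with a bare JSON integer as the body."""
    return Request(
        f"{BASE_URL}/{db}/_revs_limit",
        data=json.dumps(limit).encode(),
        headers=JSON_HEADERS,
        method="PUT",
    )


def compact_request(db):
    """POST /{db}/_compact; the JSON content type is required, a body is not."""
    return Request(f"{BASE_URL}/{db}/_compact", headers=JSON_HEADERS, method="POST")


def compactions_for(db, active_tasks):
    """Pick database compaction tasks for db out of a parsed _active_tasks list.

    Clustered nodes report shard paths, so this is a loose substring match.
    """
    return [
        task
        for task in active_tasks
        if task.get("type") == "database_compaction"
        and db in task.get("database", "")
    ]
```

A typical sequence would be to send revs_limit_request("my_app_db", 100), then compact_request("my_app_db"), and then poll /_active_tasks, passing the parsed JSON to compactions_for until the list comes back empty.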
The next thing you’ll likely run into is understanding how view compactions interact with document compactions and the impact on disk I/O.