The _bulk_docs API in CouchDB doesn’t do transactional "bulk inserts" or "bulk updates" in the way you might expect; it’s a single API call that can insert and update many documents at once, and each document in the batch is accepted or rejected on its own. Its real value is cutting many round trips down to one request.
Let’s see it in action. Imagine you have a CouchDB database named my_database and you want to add a couple of new documents and update one existing document.
First, here’s a document we’ll be adding. Notice it has no _id field, so CouchDB will generate one for us.
{
"name": "Alice",
"email": "alice@example.com"
}
Next, another document to add, this time with a pre-defined _id.
{
"_id": "user-bob-123",
"name": "Bob",
"email": "bob@example.com"
}
Finally, an update to an existing document. You must provide the _id and the current _rev of the document you want to modify. Let’s say the document with _id: "user-charlie-456" currently has _rev: "3-abc123def456".
{
"_id": "user-charlie-456",
"_rev": "3-abc123def456",
"name": "Charlie Smith",
"email": "charlie.smith@example.com",
"status": "active"
}
Now, we bundle these into a single request to the _bulk_docs endpoint of our database:
curl -X POST \
-H "Content-Type: application/json" \
-d '{
"docs": [
{
"name": "Alice",
"email": "alice@example.com"
},
{
"_id": "user-bob-123",
"name": "Bob",
"email": "bob@example.com"
},
{
"_id": "user-charlie-456",
"_rev": "3-abc123def456",
"name": "Charlie Smith",
"email": "charlie.smith@example.com",
"status": "active"
}
]
}' \
http://localhost:5984/my_database/_bulk_docs
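For readers who prefer a script over curl, the same request can be assembled with Python’s standard library. This is a minimal sketch, not a client library: build_bulk_request is a hypothetical helper name, and the server URL and database name are the ones assumed in the curl example. The request is only built here; the commented lines show how it might be sent to a running server.

```python
import json
import urllib.request


def build_bulk_request(base_url, db, docs):
    """Build (but do not send) a POST request for <base_url>/<db>/_bulk_docs.

    Hypothetical helper for illustration; it only wraps the docs in the
    {"docs": [...]} envelope shown in the curl example.
    """
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The same three documents as in the curl example.
docs = [
    {"name": "Alice", "email": "alice@example.com"},
    {"_id": "user-bob-123", "name": "Bob", "email": "bob@example.com"},
    {
        "_id": "user-charlie-456",
        "_rev": "3-abc123def456",
        "name": "Charlie Smith",
        "email": "charlie.smith@example.com",
        "status": "active",
    },
]

req = build_bulk_request("http://localhost:5984", "my_database", docs)

# To send it against a running CouchDB (assumed to be at localhost:5984):
#   with urllib.request.urlopen(req) as resp:
#       results = json.load(resp)
```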
The response is a JSON array with one entry per document, and each entry tells you the outcome for that document. For newly inserted documents, you’ll see the generated _id and the new _rev. For updated documents, you’ll see the same _id and a new _rev.
[
{
"ok": true,
"id": "generated-id-for-alice",
"rev": "1-xyz789uvw012"
},
{
"ok": true,
"id": "user-bob-123",
"rev": "1-rst456qwe789"
},
{
"ok": true,
"id": "user-charlie-456",
"rev": "4-def123ghi456"
}
]
The problem _bulk_docs solves is round trips: one request carries many document writes instead of one PUT each. It does not make the batch atomic. CouchDB accepts or rejects each document independently, so if one document fails to be written (e.g., due to a conflict on an update, or a validation error), the others are still saved and nothing is rolled back. The failed document’s entry in the response carries error and reason fields in place of ok and rev, so your client has to check every entry and decide what to retry. If several changes must succeed or fail together, that consistency has to come from your application or data model, for example by keeping the related data in a single document.
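Because outcomes are reported per document, client code usually needs to separate the entries that were written from the ones that were not. The sketch below assumes the response body has already been parsed into a list of dicts, one per submitted document, where a failed entry has error and reason keys. partition_results is a hypothetical helper name, and the sample response is made up for illustration rather than captured from a server.

```python
def partition_results(results):
    """Split parsed _bulk_docs result entries into (succeeded, failed).

    An entry counts as failed if it carries an "error" key; everything
    else is treated as a successful write.
    """
    succeeded = [r for r in results if "error" not in r]
    failed = [r for r in results if "error" in r]
    return succeeded, failed


# Illustrative response: the first two writes succeed, the third hits a
# revision conflict. Values are invented for this example.
sample_results = [
    {"ok": True, "id": "generated-id-for-alice", "rev": "1-xyz789uvw012"},
    {"ok": True, "id": "user-bob-123", "rev": "1-rst456qwe789"},
    {
        "id": "user-charlie-456",
        "error": "conflict",
        "reason": "Document update conflict.",
    },
]

succeeded, failed = partition_results(sample_results)

for entry in failed:
    # A real client would re-fetch the document, merge, and retry here.
    print(f"{entry['id']}: {entry['error']} ({entry['reason']})")
```

The two successful writes stay in the database even though the third entry failed, so only the failed entries need follow-up.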
The key lever you control is the docs array within the JSON payload. Each object in this array represents one document operation.
- If an object has an _id and a _rev, CouchDB attempts to update the existing document. If the _rev doesn’t match the current revision in the database, the update fails with a conflict error.
- If an object has an _id but no _rev, CouchDB attempts to insert a new document with that _id. If a document with that _id already exists, the insert fails with a conflict error.
- If an object has neither _id nor _rev, CouchDB generates a new UUID for the _id and inserts the document.
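The three rules above can be restated as a small client-side check, which is handy for logging what a batch is about to do before it is sent. This is only a sketch: classify_operation and its return labels are hypothetical names, and the function mirrors the rules in the list, not any server code. An object with a _rev but no _id is not covered by the list, so the sketch chooses to reject it.

```python
def classify_operation(doc):
    """Label a docs-array entry according to the three rules above."""
    has_id = "_id" in doc
    has_rev = "_rev" in doc
    if has_id and has_rev:
        return "update"
    if has_id:
        return "insert-with-id"
    if has_rev:
        # Not covered by the rules above; this sketch treats it as a mistake.
        raise ValueError("document has _rev but no _id")
    return "insert-generated-id"


batch = [
    {"name": "Alice", "email": "alice@example.com"},
    {"_id": "user-bob-123", "name": "Bob"},
    {"_id": "user-charlie-456", "_rev": "3-abc123def456", "name": "Charlie Smith"},
]

labels = [classify_operation(doc) for doc in batch]
print(labels)
```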
The surprising thing about _bulk_docs is that it is not a transaction, and it’s not about raw speed through parallelization either. It is more efficient than many individual requests because of reduced network overhead, but it gives no guarantee across multiple document operations: each document in the batch stands or falls on its own.
What most people don’t realize is that the _bulk_docs API is also the mechanism CouchDB’s replicator uses. When CouchDB replicates data between servers, it packages up changes and sends them to the target through this same endpoint, with "new_edits": false set in the request body so that the target stores the incoming revisions as they are instead of generating new ones.
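To make the replication case concrete, a replication-style request body adds "new_edits": false next to the docs array, which asks the target to keep the revisions it is given. The sketch below only builds such a body. build_replication_style_payload is a hypothetical helper, the requirement that every document carries _id and _rev is a check this sketch imposes on itself, and a real replicator sends additional revision history that is omitted here.

```python
import json


def build_replication_style_payload(docs):
    """Wrap docs in a _bulk_docs body with new_edits disabled.

    With new_edits set to false the caller supplies the revisions, so this
    sketch refuses documents that lack an _id or a _rev.
    """
    incomplete = [d for d in docs if "_id" not in d or "_rev" not in d]
    if incomplete:
        raise ValueError(f"{len(incomplete)} document(s) lack _id or _rev")
    return {"docs": docs, "new_edits": False}


payload = build_replication_style_payload(
    [
        {
            "_id": "user-charlie-456",
            "_rev": "4-def123ghi456",
            "name": "Charlie Smith",
            "status": "active",
        }
    ]
)

# Python's False serializes to JSON false in the request body.
body = json.dumps(payload)
print(body)
```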
The next step after mastering _bulk_docs is understanding how to handle the _rev parameter effectively, especially in high-concurrency scenarios where conflict errors are more likely.