CouchDB’s backup and restore process is surprisingly manual and relies on its HTTP API, not a dedicated command-line tool for full database dumps.

Let’s see it in action. Imagine you have a CouchDB instance running and want to back up a database named my_app_db.

A natural first attempt is to use curl to fetch the database URL itself and save the response as a JSON file:

curl -X GET http://localhost:5984/my_app_db > my_app_db_backup.json

This command sends a GET request to the /my_app_db endpoint and redirects the output to a file named my_app_db_backup.json. If you inspect this file, you’ll see something like this:

{
  "db_name": "my_app_db",
  "doc_count": 150,
  "doc_del_count": 5,
  "update_seq": "155-abc123def456",
  "sizes": {
    "active": 123456,
    "file": 789012,
    "external": 987654
  },
  "purge_seq": "155-abc123def456",
  "props": {},
  "instance_start_time": "1678886400000000"
}

This JSON is not your data. It is database metadata: document counts, sizes, and sequence numbers. The documents themselves are served by the _all_docs endpoint, which lists every document ID with its current revision and, if requested, the full document body.

To get the actual documents, we need to use the _all_docs endpoint with the include_docs=true parameter.

curl -X GET 'http://localhost:5984/my_app_db/_all_docs?include_docs=true&limit=10000' > my_app_db_docs.json

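The limit parameter caps how many rows a single response returns, which keeps responses manageable for large databases. If the database holds more documents than your limit, page through them: pass the key of the last row you received as start_key (JSON-encoded, so a document ID such as "doc150" keeps its quotes, URL-encoded as %22doc150%22) and add skip=1 so that row isn’t returned again. Setting one very high limit such as 100000 also works for smaller databases, at the cost of a single very large response.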
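The paging loop can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: it assumes the server accepts unauthenticated requests (add credentials for a secured instance), and the names fetch_json and iter_all_docs are hypothetical. Each page after the first restarts at the last key seen, JSON-encoded as start_key, with skip=1 so that row is not repeated.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body (no authentication is added)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_all_docs(db_url, page_size=1000, fetch=fetch_json):
    """Yield every row of _all_docs?include_docs=true, one page at a time.

    Each page after the first starts at the last key already seen and
    skips that one row, so no row is returned twice.
    """
    params = {"include_docs": "true", "limit": page_size}
    while True:
        url = f"{db_url}/_all_docs?{urllib.parse.urlencode(params)}"
        rows = fetch(url)["rows"]
        yield from rows
        if len(rows) < page_size:
            return  # a short (or empty) page means we reached the end
        # start_key must be JSON-encoded: document IDs are JSON strings.
        params = {
            "include_docs": "true",
            "limit": page_size,
            "start_key": json.dumps(rows[-1]["key"]),
            "skip": 1,
        }
```

For example, `[row["doc"] for row in iter_all_docs("http://localhost:5984/my_app_db")]` collects every document body. If the last page happens to be exactly full, one extra request returns an empty page and the loop stops.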

This my_app_db_docs.json file will contain a structure like this:

{
  "total_rows": 150,
  "offset": 0,
  "rows": [
    {
      "id": "doc1",
      "key": "doc1",
      "value": {
        "rev": "1-a1b2c3d4e5f678901234567890abcdef"
      },
      "doc": {
        "_id": "doc1",
        "_rev": "1-a1b2c3d4e5f678901234567890abcdef",
        "name": "Example Document 1",
        "value": 100
      }
    },
    {
      "id": "doc2",
      "key": "doc2",
      "value": {
        "rev": "2-f0e9d8c7b6a543210fedcba987654321"
      },
      "doc": {
        "_id": "doc2",
        "_rev": "2-f0e9d8c7b6a543210fedcba987654321",
        "name": "Example Document 2",
        "data": {
          "fieldA": "valueA",
          "fieldB": "valueB"
        }
      }
    }
    // ... more documents
  ]
}

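This JSON does contain your actual documents, including their _id and _rev fields, so it is your primary backup data. Attachments appear only as stubs, though; add attachments=true to the query if you need their content in the backup.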
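Because a limit can silently truncate the export, it is worth checking the saved file before trusting it. A minimal Python sketch, assuming the file has the _all_docs shape shown above (the function names are hypothetical): total_rows counts every document in the database, so offset plus the number of rows in the file should reach it.

```python
import json


def is_complete(all_docs):
    """True if this _all_docs response covers every row in the database."""
    offset = all_docs.get("offset") or 0  # offset may be missing or null
    return offset + len(all_docs["rows"]) >= all_docs["total_rows"]


def extract_docs(all_docs):
    """Return the document bodies, skipping rows that carry no "doc"."""
    return [row["doc"] for row in all_docs["rows"] if row.get("doc")]


def load_backup(path):
    """Read a saved _all_docs?include_docs=true response from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, after `backup = load_backup("my_app_db_docs.json")`, refuse to restore if `is_complete(backup)` is false, and otherwise pass `extract_docs(backup)` on to the restore step.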

To restore this database, you’d first create an empty database if it doesn’t exist:

curl -X PUT http://localhost:5984/my_app_db_restored
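In a restore script you usually want "create if missing" rather than an error on a re-run. CouchDB answers 201 or 202 when it creates the database and 412 Precondition Failed when it already exists, so a minimal Python sketch can treat 412 as success. The helper name is hypothetical, and the opener parameter exists only so the function can be exercised without a server.

```python
import urllib.error
import urllib.request


def ensure_database(db_url, opener=urllib.request.urlopen):
    """Create the database with PUT; return True if created, False if it existed.

    Any HTTP error other than 412 (for example 401 on a secured server)
    is re-raised.
    """
    req = urllib.request.Request(db_url, method="PUT")
    try:
        with opener(req):
            return True  # 201 Created or 202 Accepted
    except urllib.error.HTTPError as err:
        if err.code == 412:  # Precondition Failed: database already exists
            return False
        raise
```

Calling `ensure_database("http://localhost:5984/my_app_db_restored")` returns True the first time and False on later runs.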

Then, you’d use the _bulk_docs endpoint to re-insert your documents.

curl -X POST http://localhost:5984/my_app_db_restored/_bulk_docs \
  -H "Content-Type: application/json" \
  -d '{"new_edits": false, "docs": [{"_id": "doc1", "_rev": "1-a1b2c3d4e5f678901234567890abcdef", "name": "Example Document 1", "value": 100}, {"_id": "doc2", "_rev": "2-f0e9d8c7b6a543210fedcba987654321", "name": "Example Document 2", "data": {"fieldA": "valueA", "fieldB": "valueB"}}] }'

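The _bulk_docs endpoint expects a JSON object with a docs array; each element is a document to create or update. Setting "new_edits": false tells CouchDB to store each document under the _rev it already carries instead of generating a new revision, which is what you want when restoring a full backup: document IDs and revision IDs come back exactly as they were. Without it, CouchDB treats any supplied _rev as an update to an existing revision, so on an empty target database every document that carries a _rev is rejected with a conflict error, while documents without a _rev are created with brand-new revision IDs.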
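Building that request in code makes the new_edits flag hard to forget. A minimal Python sketch using only the standard library; the helper names are hypothetical and authentication is left out:

```python
import json
import urllib.request


def bulk_docs_request(db_url, docs, new_edits=False):
    """Build a POST /{db}/_bulk_docs request.

    With new_edits=False, CouchDB stores each document under the _rev it
    already carries instead of generating a new revision.
    """
    body = json.dumps({"docs": docs, "new_edits": new_edits}).encode("utf-8")
    return urllib.request.Request(
        f"{db_url}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_bulk_docs(db_url, docs):
    """Send one batch and return CouchDB's decoded JSON response."""
    with urllib.request.urlopen(bulk_docs_request(db_url, docs)) as resp:
        return json.load(resp)
```

`post_bulk_docs("http://localhost:5984/my_app_db_restored", docs)` sends one batch. Inspect the returned JSON rather than assuming every document was written, since _bulk_docs reports failures per document instead of failing the whole request.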

The _bulk_docs endpoint is optimized for writing many documents at once. You’ll want to process your my_app_db_docs.json file, extract the doc objects from the rows array, and send them to _bulk_docs in batches; 1000 documents per batch is a reasonable starting point.
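Splitting the extracted documents into fixed-size batches needs only a few lines. A minimal Python sketch; the function name batched is our own choice (Python 3.12's itertools.batched does the same job but yields tuples):

```python
from itertools import islice


def batched(docs, size=1000):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while batch := list(islice(it, size)):
        yield batch
```

`list(batched(range(5), 2))` gives `[[0, 1], [2, 3], [4]]`; each list can be sent as the docs array of one _bulk_docs request.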

One subtle but critical aspect of CouchDB backups is handling design documents and their views. The _all_docs?include_docs=true call does include design documents (those whose IDs start with _design/), but not the view indexes built from them; those are rebuilt on the target. When restoring, make sure the design documents are restored before or together with the data documents that depend on them. A query against a view whose design document hasn’t been restored yet returns a not_found error, and the first query after a restore waits while CouchDB builds the index from scratch.
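To restore design documents first, you can separate them from the data documents before batching. A minimal Python sketch that relies only on the _design/ ID prefix (the function name is hypothetical):

```python
def split_design_docs(docs):
    """Split backup documents into (design_docs, data_docs), keeping order.

    Design documents are the ones whose _id starts with "_design/".
    """
    design, data = [], []
    for doc in docs:
        target = design if doc["_id"].startswith("_design/") else data
        target.append(doc)
    return design, data
```

Send the design list to _bulk_docs first, then the data documents in their batches.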

The _view_cleanup endpoint is often called after a restore to delete view index files left over from earlier design documents, so that only the indexes for the current design documents remain.

curl -X POST http://localhost:5984/my_app_db_restored/_view_cleanup -H "Content-Type: application/json"

This deletes index files that no current design document references, which reclaims disk space, especially if you’re restoring over an existing database whose design documents have since changed. CouchDB rejects the request unless it carries the application/json Content-Type header.

Another approach to consider for larger databases or more robust backup strategies is using the _replicator database. You can set up a replication job from your source database to a new, empty target database on a separate CouchDB instance or even back to the same instance (though not ideal for disaster recovery). This leverages CouchDB’s built-in replication mechanism, which is more fault-tolerant and can handle incremental updates.

An example _replicator document for a one-off full backup replication (set "continuous" to true instead for ongoing sync):

{
  "_id": "rep-my_app_db-backup",
  "source": "http://localhost:5984/my_app_db",
  "target": "http://localhost:5984/my_app_db_backup_replica",
  "create_target": true,
  "continuous": false
}

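You would then POST this document to the _replicator database. On a server that requires authentication, the source and target URLs need credentials, because the replicator connects to them like any other client. This is often a more reliable method for large datasets, as CouchDB handles the chunking, retries, and tracking of replicated documents.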
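The same document can be created from a script. A minimal Python sketch, assuming you choose the document ID yourself and that any credentials the server needs are embedded in the source and target URLs (the helper name is hypothetical). PUT /_replicator/{id} is equivalent to POSTing the document with that _id.

```python
import json
import urllib.parse
import urllib.request


def replicator_request(server_url, doc_id, source, target, continuous=False):
    """Build a PUT /_replicator/{doc_id} request for a replication job."""
    doc = {
        "source": source,
        "target": target,
        "create_target": True,   # create the target database if missing
        "continuous": continuous,
    }
    return urllib.request.Request(
        f"{server_url}/_replicator/{urllib.parse.quote(doc_id, safe='')}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Pass the result to `urllib.request.urlopen` to create the job. On CouchDB 2.1 and later, its state can be followed at /_scheduler/docs/_replicator/rep-my_app_db-backup.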

The next hurdle you’ll likely encounter is managing _rev IDs during restore. If your backup contains documents with old _rev IDs and you restore them into a database that already holds newer versions of those documents, a normal _bulk_docs request (one that doesn’t set "new_edits": false) reports a conflict error for each of those documents in its response, while the rest of the batch is still written. You’ll need a strategy to either deduplicate your backup or selectively update documents based on their revision history.
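A first step toward such a strategy is finding out which documents conflicted. Each entry in a _bulk_docs response is either a success ({"ok": true, "id": ..., "rev": ...}) or an object with error and reason fields, so a minimal Python sketch (hypothetical helper name) can sort them:

```python
def summarize_bulk_results(results):
    """Group a _bulk_docs response into (saved_ids, conflict_ids, other_errors).

    other_errors holds (id, error, reason) tuples for non-conflict failures,
    such as a validation function rejecting a document.
    """
    saved, conflicts, errors = [], [], []
    for entry in results:
        if entry.get("error") == "conflict":
            conflicts.append(entry["id"])
        elif "error" in entry:
            errors.append((entry["id"], entry["error"], entry.get("reason")))
        else:
            saved.append(entry["id"])
    return saved, conflicts, errors
```

The conflict IDs are the documents to compare against the target, for example by fetching each one with GET /{db}/{id}, before deciding whether the backup copy should win.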
