CouchDB’s HTTP API, despite its RESTful elegance, can be a surprisingly tricky beast to wrangle for truly reliable clients.

Let’s see it in action. Imagine a simple Python client interacting with a CouchDB instance.

import requests
import json

COUCHDB_URL = "http://localhost:5984"

def get_doc(db_name, doc_id):
    url = f"{COUCHDB_URL}/{db_name}/{doc_id}"
    try:
        response = requests.get(url, timeout=5) # Timeout is crucial!
        response.raise_for_status() # Checks for HTTP errors (4xx or 5xx)
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error fetching document {doc_id} from {db_name}: {e}")
        return None

def update_doc(db_name, doc_id, doc_data):
    url = f"{COUCHDB_URL}/{db_name}/{doc_id}"
    try:
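        # Updates need the current _rev; a PUT without _rev creates the document.
        if '_rev' not in doc_data:
            existing_doc = get_doc(db_name, doc_id)
            if existing_doc:
                doc_data['_rev'] = existing_doc['_rev']
            # If nothing was fetched, fall through and let the PUT create the
            # document. (get_doc also returns None on network errors; in that
            # case the PUT below will most likely fail and be reported too.)

        # json= serializes the body and sets Content-Type: application/json
        response = requests.put(url, json=doc_data, timeout=5)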
        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        print(f"Error updating document {doc_id} in {db_name}: {e}")
        return None

# Example Usage
if __name__ == "__main__":
    db = "my_database"
    doc_id = "my_document"
    initial_data = {"_id": doc_id, "message": "Hello, CouchDB!"}

    # Create or update document
    print("Attempting to create/update document...")
    result = update_doc(db, doc_id, initial_data)
    if result:
        print("Document updated successfully:", result)

    # Retrieve document
    print("\nAttempting to retrieve document...")
    retrieved_doc = get_doc(db, doc_id)
    if retrieved_doc:
        print("Document retrieved:", retrieved_doc)

    # Modify and update
    if retrieved_doc:
        retrieved_doc["message"] = "CouchDB is awesome!"
        print("\nAttempting to update again...")
        result_update_2 = update_doc(db, doc_id, retrieved_doc)
        if result_update_2:
            print("Document updated successfully again:", result_update_2)

    # Example of a potential error (e.g., document not found on get)
    print("\nAttempting to retrieve non-existent document...")
    non_existent_doc = get_doc(db, "non_existent_id")
    if non_existent_doc is None:
        print("As expected, document not found.")

This code snippet demonstrates basic GET and PUT operations. Notice the timeout=5 and response.raise_for_status(). These are early indicators of how we need to think about reliability: network issues, server hiccups, and the inherent statefulness of CouchDB’s document model.

CouchDB’s API is fundamentally about manipulating JSON documents within databases. The core operations are GET (retrieving a document), PUT (creating or updating a document), DELETE (removing a document), and POST (often used for bulk operations or creating documents without specifying an ID). Views, queries, and replication are also exposed via HTTP endpoints. The "state" in CouchDB is primarily the collection of documents, each with a unique ID and a revision history. When you update a document, you’re not just overwriting data; you’re creating a new revision, and CouchDB requires you to provide the _rev of the document you’re modifying to ensure you’re not overwriting someone else’s changes. This optimistic concurrency control is key to its distributed nature.
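
To make the revision idea concrete: a _rev value has the form N-digest, where N counts how many times the document has been revised. The helper below splits one apart for logging or debugging; parse_rev is a hypothetical name, not part of any CouchDB library, and the sample revision is made up. Clients should still treat _rev as an opaque token and send back exactly what the server returned.

```python
def parse_rev(rev):
    """Split a CouchDB revision string such as '2-7051...' into (generation, digest)."""
    generation, _, digest = rev.partition("-")
    if not generation.isdigit() or not digest:
        raise ValueError(f"not a well-formed revision: {rev!r}")
    return int(generation), digest


# Illustrative value only; real revisions come from the server's responses.
generation, digest = parse_rev("2-7051cbe5c8faecd085a3fa619e6e6337")
print(generation)  # 2
```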

The fundamental problem CouchDB’s API tries to solve is providing a robust, scalable, and easy-to-use data store for web applications, especially those needing to handle offline synchronization and distributed data. It achieves this with its document-oriented model, eventual consistency, and built-in replication. The API is the gateway to all these features.

The most surprising thing about CouchDB’s HTTP API is how its handling of revisions, while central to its strength, can also be the source of the most subtle and frustrating client-side errors if not managed meticulously.

To build a reliable client, you must first accept that network requests can and will fail, and write code that is resilient to it. In Python, every error raised by requests derives from requests.exceptions.RequestException (other languages have an equivalent). Don’t stop at catching that base class: distinguish timeouts, connection errors, and HTTP error statuses (4xx client errors and 5xx server errors), because each calls for a different response. For timeouts, a reasonable starting point is 5 to 15 seconds, depending on your network conditions and expected server load.

# Example: More granular error handling
url = f"{COUCHDB_URL}/my_database/my_document"  # reuses COUCHDB_URL from the script above
try:
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    data = response.json()
except requests.exceptions.Timeout:
    print("Request timed out. Server might be overloaded or unreachable.")
except requests.exceptions.ConnectionError:
    print("Failed to connect to CouchDB. Is the server running?")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error occurred: {e.response.status_code} - {e.response.text}")
except requests.exceptions.RequestException as e:
    print(f"An unexpected request error occurred: {e}")
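
Distinguishing error types pays off once you decide which ones to retry. The following is a minimal sketch of retrying transient failures with exponential backoff and jitter. It is one possible approach, not something CouchDB or requests prescribes: retry_with_backoff, its parameters, and the default delays are all assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(operation, retriable, max_attempts=4,
                       base_delay=0.5, max_delay=8.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Only exceptions listed in `retriable` trigger a retry; anything else
    propagates immediately. The delay doubles each attempt, is capped at
    max_delay, and gets random jitter so many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

With requests, a call might look like retry_with_backoff(lambda: requests.get(url, timeout=5), retriable=(requests.exceptions.Timeout, requests.exceptions.ConnectionError)). Reserve this for requests that are safe to repeat. A 404 or 409 is an answer from the server, not a transient fault, and retrying it blindly will not help.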

The _rev field is not just metadata; it’s a CAS (Compare-And-Swap) token. When you PUT a document, CouchDB checks if the provided _rev matches the current revision of the document on the server. If it doesn’t, it means the document has been modified since you last fetched it, and CouchDB will return a 409 Conflict error. Your client must be prepared to handle this. The typical strategy is to re-fetch the document, merge your intended changes with the latest version from the server, and then attempt the PUT again.

# Handling 409 Conflict
def update_doc_with_conflict_resolution(db_name, doc_id, new_data):
    url = f"{COUCHDB_URL}/{db_name}/{doc_id}"
    max_retries = 3
    for attempt in range(max_retries):
        try:
            # First, get the current document to obtain the _rev
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            doc = response.json()
            current_rev = doc.get('_rev')

            if not current_rev:
                print(f"Document {doc_id} found but missing _rev. Cannot update.")
                return None

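            # Merge the intended changes into the latest server version.
            # _rev goes last so a stale _rev inside new_data cannot override it.
            update_payload = {**doc, **new_data, "_rev": current_rev}

            # Attempt the PUT operation (json= also sets the Content-Type header)
            response = requests.put(url, json=update_payload, timeout=5)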
            response.raise_for_status()
            return response.json()

        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 409:
                print(f"Conflict detected for document {doc_id} (attempt {attempt + 1}/{max_retries}). Retrying...")
                # The document was modified by someone else.
                # The loop will re-fetch the latest version and try again.
                continue # Go to the next iteration of the loop
            else:
                print(f"HTTP error during update: {e.response.status_code} - {e.response.text}")
                return None
        except requests.exceptions.RequestException as e:
            print(f"Request error during update: {e}")
            return None
    print(f"Failed to update document {doc_id} after {max_retries} retries due to conflicts.")
    return None

Idempotency is your friend. Because every write must name the revision it replaces, a PUT or DELETE carrying a given _rev can be applied at most once: repeating the request never applies the change twice. Design your client operations around this property. Consider a PUT that fails ambiguously, for example when the connection drops after the server has committed the write but before the client receives the response. Retrying with the same data and _rev is safe, but if the first attempt did land, the retry returns 409 Conflict because the document’s revision has already advanced. Treat that 409 as a cue to re-fetch the document and check whether it already contains your change before deciding to merge and retry.
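
A sketch of the check a client might run when a retried PUT comes back as 409 after an ambiguous failure (the first attempt may have been applied, which advances the revision): re-fetch the document and compare it with what was sent. write_already_applied is a hypothetical helper name and the documents are made-up samples.

```python
def write_already_applied(intended_doc, server_doc):
    """Return True if server_doc holds the same content we tried to write.

    _rev is ignored: the server assigns a new one on every successful write,
    so it differs from the _rev we sent even when our write landed.
    """
    def strip_rev(doc):
        return {k: v for k, v in doc.items() if k != "_rev"}

    return strip_rev(intended_doc) == strip_rev(server_doc)


# Illustrative documents; the revision strings are made up.
sent = {"_id": "my_document", "_rev": "1-aaa", "message": "CouchDB is awesome!"}
fetched = {"_id": "my_document", "_rev": "2-bbb", "message": "CouchDB is awesome!"}
print(write_already_applied(sent, fetched))  # True
```

If the comparison is false, another writer got there first and the usual re-fetch, merge, and retry cycle applies. Documents with attachments or other server-managed fields would need a more careful comparison than this plain equality check.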

Bulk operations through _bulk_docs need particular care because they are not transactional. CouchDB accepts or rejects each document independently and returns an array with one result per document, so a conflict on one document does not roll back the others, and the request as a whole can return a success status even though some documents were rejected. Always inspect the per-document results, and retry only the entries that failed. Breaking large bulk operations into smaller batches also keeps request sizes, timeouts, and retries manageable.
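
A minimal sketch of batching and of checking per-document outcomes. It assumes the usual _bulk_docs response shape, a JSON array with one result object per submitted document. chunked and split_bulk_results are hypothetical helper names, and the sample response is hand-written.

```python
def chunked(docs, batch_size):
    """Yield successive batches of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]


def split_bulk_results(results):
    """Separate a _bulk_docs response into saved and failed entries.

    Each entry is either {"ok": true, "id": ..., "rev": ...}
    or {"id": ..., "error": ..., "reason": ...}.
    """
    saved = [r for r in results if r.get("ok")]
    failed = [r for r in results if "error" in r]
    return saved, failed


# Hand-written sample in the shape of a _bulk_docs response body.
sample = [
    {"ok": True, "id": "a", "rev": "1-abc"},
    {"id": "b", "error": "conflict", "reason": "Document update conflict."},
]
saved, failed = split_bulk_results(sample)
print([r["id"] for r in failed])  # ['b']
```

Each batch from chunked would be sent as {"docs": batch} in the body of a POST to the database’s _bulk_docs endpoint, and the failed entries fed back into per-document conflict handling.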

Creation deserves the same thought. When you create a document with POST and no _id, CouchDB generates the ID, so POSTing the same data twice (for example, in a retry) yields two documents with different IDs. The new_edits=false option takes a different approach: supplied as a query parameter on a document PUT or as a field in a _bulk_docs request body, it tells CouchDB to store the document under the _id and _rev you provide instead of generating a new revision, so repeating the request does not add another revision. This is the mechanism the replicator uses. It bypasses the usual conflict check and can introduce conflicting revisions, so it is less common in application code but powerful for specific scenarios.
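
A related and simpler route to retry-safe creation, sketched here as an assumption rather than something the API mandates: choose the _id on the client and create the document with PUT, so every retry targets the same document. prepare_create is a hypothetical helper, and the URL and database name are the placeholders used earlier.

```python
import uuid


def prepare_create(base_url, db_name, doc):
    """Return (url, body) for creating doc under a client-chosen _id.

    The _id is fixed before the first attempt, so every retry targets the
    same URL instead of letting the server mint a fresh ID each time.
    """
    doc_id = doc.get("_id") or uuid.uuid4().hex
    body = {**doc, "_id": doc_id}
    return f"{base_url}/{db_name}/{doc_id}", body


url, body = prepare_create("http://localhost:5984", "my_database",
                           {"message": "Hello, CouchDB!"})
# Send with: requests.put(url, json=body, timeout=5)
# A retry reuses the same url and body; a 409 then means the document already exists.
```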

Finally, CouchDB’s HTTP API is stateless from the server’s perspective for each request. This is great for scalability but means your client needs to manage state, particularly the _rev values of documents it frequently interacts with. Caching _revs locally can reduce the number of GET requests needed before an update, but you must have a strategy for invalidating that cache when conflicts occur.
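
A small sketch of such a cache, kept in memory and keyed by database and document ID. RevCache and its method names are hypothetical; a real client might add expiry or size limits.

```python
class RevCache:
    """In-memory map of (db_name, doc_id) -> last known _rev."""

    def __init__(self):
        self._revs = {}

    def remember(self, db_name, doc_id, rev):
        # Call after every successful GET or PUT with the rev the server returned.
        self._revs[(db_name, doc_id)] = rev

    def lookup(self, db_name, doc_id):
        # None means "unknown": fetch the document before updating it.
        return self._revs.get((db_name, doc_id))

    def invalidate(self, db_name, doc_id):
        # Call on 409 Conflict: the cached rev is stale by definition.
        self._revs.pop((db_name, doc_id), None)


cache = RevCache()
cache.remember("my_database", "my_document", "1-abc")  # made-up revision
cache.invalidate("my_database", "my_document")
print(cache.lookup("my_database", "my_document"))  # None
```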

The next hurdle you’ll likely face is understanding and efficiently managing CouchDB’s view collation and indexing, especially when dealing with large datasets and complex queries.
