CouchDB attachments are not just files stored alongside documents; they are tied to the document’s JSON metadata and revision history, which is the source of their surprising performance characteristics.

Let’s see this in action. Imagine a CouchDB document representing a user profile with a small avatar image attached.

{
  "_id": "user-123",
  "_rev": "1-abc...",
  "type": "user",
  "username": "alice",
  "_attachments": {
    "avatar.png": {
      "content_type": "image/png",
      "data": "iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IArs4c6QAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3Cuz3zgAAACxSURBVHgB7ZIxCQAgDETn/f/j6G1lMh5q3sI00K0Wk0oV8+i94O0s03iPIS+37w2Yt82Yj/1aK1Qf8A3kR1mJv9Q/H/lP7t2/gV2X8h0f+j47j5eF2b8cQ+aR1uX8lP5J5uJ8/N3uV+g94H0M321h6H/qfV0A0R+q/9n2oV399w3+n72l/X/o/6f3M1fT9zF5/7i31n9qH8P7mP/4Q10rU4z/QAAAABJRU5ErkJggg=="
    }
  }
}

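When you fetch this document with ?attachments=true, the entire JSON, including the base64-encoded image data, is transferred. A plain GET is lighter: CouchDB returns only a stub for each attachment under the reserved _attachments field (content type, length, digest, and "stub": true), not the data itself.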
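As a rough sketch of the two shapes an attachment can take in a document’s JSON, the Python below builds the inline form (base64 under the reserved _attachments field) and the stub form (metadata only). The helper names and the placeholder bytes are illustrative assumptions, and the "md5-" digest is computed locally to mimic the format CouchDB reports, not fetched from a server.

```python
import base64
import hashlib


def inline_attachment(data: bytes, content_type: str) -> dict:
    """Inline form: the bytes travel as base64 inside the JSON body."""
    return {
        "content_type": content_type,
        "data": base64.b64encode(data).decode("ascii"),
    }


def stub_attachment(data: bytes, content_type: str, revpos: int) -> dict:
    """Stub form: metadata only, no attachment bytes."""
    digest = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    return {
        "content_type": content_type,
        "revpos": revpos,
        "digest": "md5-" + digest,
        "length": len(data),
        "stub": True,
    }


# Placeholder bytes standing in for a real PNG file (assumption).
avatar = b"\x89PNG\r\n\x1a\n" + bytes(64)

# Shape of a document carrying the attachment inline.
upload_doc = {
    "_id": "user-123",
    "type": "user",
    "username": "alice",
    "_attachments": {"avatar.png": inline_attachment(avatar, "image/png")},
}

# Shape of the same document with the attachment reduced to a stub.
stub_doc = {
    "_id": "user-123",
    "type": "user",
    "username": "alice",
    "_attachments": {"avatar.png": stub_attachment(avatar, "image/png", 1)},
}
```

The stub carries everything a client needs to decide whether to fetch the bytes (size, type, digest) without paying for the transfer.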

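The core problem CouchDB attachments solve is the desire to keep related binary data directly with its metadata. This seems simple, but it has profound implications. When CouchDB stores a document, it writes the attachment bytes as a binary stream in the same database file and records their content type, length, and digest in the document’s _attachments field. With default settings, document bodies are compressed with snappy, and only attachments with compressible content types (such as text or JSON) are gzipped, so a PNG is stored as-is. When you retrieve the document, CouchDB sends back the JSON body with attachment stubs unless you explicitly ask for the data.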

This means a plain read of the document’s metadata (like the username) does not download the attachment. The cost appears when the data is requested inline, for example with ?attachments=true on a document or alongside include_docs=true on a view. The bytes are then embedded in the JSON as base64, which is about a third larger than the raw data, and this can quickly become a bottleneck. Fetching a document with a 1MB image that way means downloading roughly 1.33MB every single time, even if you only care about the _id and _rev.
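To put a number on the cost of carrying attachment data inside JSON, here is a small sketch of the base64 arithmetic. It assumes the data travels inline as base64 text; the zero-filled buffer is a stand-in for a real image, and the function name is hypothetical.

```python
import base64
import json


def inline_transfer_size(raw_len: int) -> int:
    """Length of the base64 text for raw_len bytes (4 chars per 3 bytes, padded)."""
    return 4 * ((raw_len + 2) // 3)


raw = bytes(1_000_000)  # stand-in for a 1 MB image (all zero bytes)
encoded = base64.b64encode(raw)

# The fields a metadata-only caller actually wanted.
metadata_only = json.dumps({"_id": "user-123", "_rev": "1-abc", "username": "alice"})

print(len(raw))            # raw attachment bytes
print(len(encoded))        # size of the same bytes as inline base64
print(len(metadata_only))  # size of the metadata on its own
```

For a 1 MB attachment the base64 text comes to about 1.33 MB, while the metadata is well under a hundred bytes, which is the gap a caller pays for whenever attachment data rides along inline.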

The primary levers you control are how you design your documents and how you retrieve them. If you have documents that frequently change or are accessed without their associated binary data, you should strongly consider storing attachments separately. This could involve using a dedicated object storage service (like S3) or another database optimized for large binary objects.

The performance cost is directly tied to the size of the attachment and how often its data is moved. Every read that requests attachment data incurs the cost of transferring it. The cost is amplified by CouchDB’s internal mechanisms: replication has to copy attachments to the target along with their documents, and compaction rewrites live attachment data into the new database file. If attachments make up a significant portion of a database’s size, both can become very slow and I/O-heavy. View indexing is less exposed, because map functions see only the attachment stubs, not the binary data.

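A common misconception is that attachments are base64-encoded strings stored inside the document’s JSON. They are not; base64 is only the encoding used when attachment data is sent or returned inline. On disk the bytes are stored as binary, and the document holds a reference to them. An attachment is not an independent file either: it has no ID of its own, it belongs to a specific document revision, and it is replicated, compacted, and deleted together with its document. This coupling is why CouchDB treats attachments as part of the document for all operations.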

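Consider a scenario where you have a document with multiple attachments, or one very large attachment. If you then update any other field in that document, CouchDB generates a new revision of the document. When the update passes the attachment stubs back unchanged, the new revision reuses the stored attachment data instead of writing it again. The expensive cases are clients that re-send the attachment inline on every save, which uploads and stores the bytes again, and clients that drop the _attachments field from the update, which deletes the attachments. Superseded revisions and attachment data are reclaimed only by compaction, so careless updates lead to increased write amplification and can bloat your database over time, even if the binary data itself hasn’t changed.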
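A minimal sketch of an update that changes one field while leaving attachments alone, assuming the client starts from a document fetched with a plain GET, where each attachment appears as a stub. The function name and sample values are hypothetical, and the actual PUT is left out so the snippet runs without a server.

```python
import copy
import json


def update_fields(fetched_doc: dict, changes: dict) -> dict:
    """Return a new body with `changes` applied and attachment stubs intact."""
    updated = copy.deepcopy(fetched_doc)
    for key, value in changes.items():
        if key.startswith("_"):
            # Reserved fields (_id, _rev, _attachments) are passed through as-is.
            raise ValueError(f"refusing to overwrite reserved field {key!r}")
        updated[key] = value
    return updated


# Shape of a document as a plain GET would return it (values are illustrative).
fetched = {
    "_id": "user-123",
    "_rev": "1-abc",
    "type": "user",
    "username": "alice",
    "_attachments": {
        "avatar.png": {
            "content_type": "image/png",
            "revpos": 1,
            "digest": "md5-placeholder",
            "length": 1_000_000,
            "stub": True,
        }
    },
}

body = update_fields(fetched, {"username": "alice_w"})
payload = json.dumps(body)  # what a PUT for this document would carry
```

The payload stays a few hundred bytes even though the attachment is 1 MB, because only the stub is sent back. In real use, pass the stubs through exactly as the server returned them.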

When you need to retrieve an attachment without the document, you use a specific URL pattern: /your_database/your_doc_id/your_attachment_name. This allows CouchDB to stream just the attachment as raw bytes with its stored content type, without base64 encoding. CouchDB still has to look up the document to find the attachment, so the request is not free. It’s the most efficient way to read an attachment, but it doesn’t negate the storage, replication, and compaction costs associated with attachments.
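The URL pattern above can be sketched with the Python standard library. The base URL (CouchDB’s default port on localhost), the database name, and the helper names are assumptions for illustration; the request object is built but not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def attachment_url(base_url: str, db: str, doc_id: str, name: str) -> str:
    """Build /db/doc_id/attachment_name, percent-encoding each path segment."""
    parts = (quote(part, safe="") for part in (db, doc_id, name))
    return base_url.rstrip("/") + "/" + "/".join(parts)


def attachment_request(base_url: str, db: str, doc_id: str, name: str) -> Request:
    """Prepare (but do not send) a GET for a single attachment."""
    return Request(attachment_url(base_url, db, doc_id, name), method="GET")


req = attachment_request("http://localhost:5984", "users", "user-123", "avatar.png")
print(req.full_url)
```

Passing the request to urllib.request.urlopen against a running server would return the raw image bytes rather than JSON.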

The next performance pitfall to be aware of is how CouchDB handles updates to documents that contain attachments.
