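CouchDB’s partitioned databases are an efficient way to scale query performance. The gain comes not from parallelizing a query across more nodes, but from storing related documents together in a single shard, so a query scoped to one partition reads from that shard alone instead of fanning out to every shard in the database.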

Let’s see this in action. Imagine we have a bookings database tracking airline reservations. Without partitioning, a query for all bookings in July is sent to every shard, and without a suitable index it scans every document in the database, even those from January.

// Example: Querying a non-partitioned database
// This would scan the entire 'bookings' database
db.find({
  selector: {
    type: 'booking',
    departure_date: {
      $gte: '2023-07-01',
      $lt: '2023-08-01'
    }
  }
});

Now, let’s create a partitioned version. In a partitioned database, every document _id has the form partition_key:doc_key. We’ll use a partition key made of the departure year and month, so CouchDB stores all bookings for a given month in the same shard.

// Create a partitioned database
// The flag is a query parameter, not a request body
// PUT /bookings_partitioned?partitioned=true

// Add a document to a specific partition
// The partition key is the part of the _id before the colon
// PUT /bookings_partitioned/2023-07:doc_id_1
// Request Body:
// {
//   "type": "booking",
//   "departure_date": "2023-07-15",
//   "passenger": "Alice"
// }
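The ID convention can be wrapped in a few small helpers. This is a minimal sketch: the function names (monthPartition, partitionedDocId, splitDocId) are hypothetical, and the validation is illustrative rather than a copy of CouchDB's own rules.

```javascript
// Hypothetical helpers for working with partitioned document IDs.

// Derive a 'YYYY-MM' partition key from an ISO date string such as '2023-07-15'.
function monthPartition(isoDate) {
  if (!/^\d{4}-\d{2}-\d{2}/.test(isoDate)) {
    throw new Error(`expected an ISO date, got: ${isoDate}`);
  }
  return isoDate.slice(0, 7);
}

// Join a partition key and a document key into a partitioned _id.
function partitionedDocId(partitionKey, docKey) {
  // Illustrative check: a colon in the partition key would make the ID ambiguous.
  if (!partitionKey || partitionKey.includes(':')) {
    throw new Error(`invalid partition key: ${partitionKey}`);
  }
  if (!docKey) throw new Error('document key must not be empty');
  return `${partitionKey}:${docKey}`;
}

// Split a partitioned _id at the first colon.
function splitDocId(id) {
  const i = id.indexOf(':');
  if (i <= 0) throw new Error(`not a partitioned _id: ${id}`);
  return { partitionKey: id.slice(0, i), docKey: id.slice(i + 1) };
}

const booking = { type: 'booking', departure_date: '2023-07-15', passenger: 'Alice' };
const id = partitionedDocId(monthPartition(booking.departure_date), 'doc_id_1');
console.log(id); // 2023-07:doc_id_1
```

Deriving the key from departure_date keeps the partition consistent with the field the July query filters on.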

// Querying the partitioned database for July
// This query targets only the '2023-07' partition:
// POST /bookings_partitioned/_partition/2023-07/_find
// (assuming db is a nano database handle, which exposes partitionedFind)
db.partitionedFind('2023-07', {
  selector: {
    type: 'booking',
    departure_date: {
      $gte: '2023-07-01',
      $lt: '2023-08-01'
    }
  }
  // The partition comes from the request path, not from the selector.
  // The query is most efficient when a partitioned index covers
  // the fields in the selector.
});
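At the HTTP level, a partition-scoped Mango query is a POST to the database's _partition endpoint. The sketch below only builds a description of that request and does not send anything; the function name partitionFindRequest and the shape of the returned object are assumptions made for illustration.

```javascript
// Build (but do not send) a partition-scoped _find request.
// partitionFindRequest is a hypothetical helper, not part of any client library.
function partitionFindRequest(dbName, partitionKey, selector) {
  return {
    method: 'POST',
    path: `/${encodeURIComponent(dbName)}/_partition/${encodeURIComponent(partitionKey)}/_find`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ selector }),
  };
}

const julyRequest = partitionFindRequest('bookings_partitioned', '2023-07', {
  type: 'booking',
  departure_date: { $gte: '2023-07-01', $lt: '2023-08-01' },
});

console.log(julyRequest.method, julyRequest.path);
// POST /bookings_partitioned/_partition/2023-07/_find
```

The resulting object could be handed to fetch or any other HTTP client once the server's base URL is prepended to path.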

The core problem partitioned databases solve is the scatter-gather cost of common queries: in a regular database, every query must consult the index on every shard and merge the results, and that work grows with the dataset. Partitioning is not just a logical grouping; CouchDB places all documents that share a partition key in the same shard. When you query a specific partition, CouchDB only needs to consult that one shard, and within its index only the entries belonging to the partition, drastically reducing the number of documents and index entries it has to process. This is particularly effective for time-series data, multi-tenant applications, or any dataset where queries naturally align with a specific, well-defined subset of the data.

The key lever you control is the partition key. It is the prefix of the document _id, separated from the rest of the ID by a colon (e.g., /my_database/my_partition_key:doc_id). Queries are scoped to a partition through the _partition endpoint (e.g., /my_database/_partition/my_partition_key/_find), and CouchDB uses the key in that path to limit its search. Indexes defined in a partitioned database are partitioned by default, so you do not need to add the partition key to the index fields; index the fields your queries filter and sort on.

Consider a view defined in a _design/bookings document. In a partitioned database, design documents are partitioned by default; the options block makes that explicit:

{
  "_id": "_design/bookings",
  "language": "javascript",
  "options": { "partitioned": true },
  "views": {
    "by_departure_date": {
      "map": "function(doc) { if (doc.type === 'booking') { emit([doc.departure_date, doc.booking_id], null); } }"
    }
  }
}

A partition-scoped request such as GET /bookings_partitioned/_partition/2023-07/_design/bookings/_view/by_departure_date reads only the view entries for the 2023-07 partition. Mango queries sent to /bookings_partitioned/_partition/2023-07/_find work the same way, but they use Mango JSON indexes (created through POST /bookings_partitioned/_index) rather than JavaScript views; if such an index covers departure_date, CouchDB can use it within that partition. The benefit is that CouchDB doesn’t need to consult the index on every shard of the database; it reads only the shard that holds the requested partition.
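A Mango index for the July query could be created with a request along these lines. This is a sketch that builds the request without sending it: createIndexRequest, the index name, and the design document name are hypothetical, and the partitioned flag is spelled out even though it defaults to the database's own setting.

```javascript
// Build (but do not send) a request that creates a Mango JSON index.
// createIndexRequest is a hypothetical helper used only for illustration.
function createIndexRequest(dbName, fields, { name, ddoc, partitioned = true } = {}) {
  return {
    method: 'POST',
    path: `/${encodeURIComponent(dbName)}/_index`,
    body: {
      index: { fields },
      type: 'json',
      partitioned,
      // Optional identifiers; CouchDB generates them when they are omitted.
      ...(name ? { name } : {}),
      ...(ddoc ? { ddoc } : {}),
    },
  };
}

const indexRequest = createIndexRequest('bookings_partitioned', ['departure_date'], {
  name: 'by-departure-date',   // hypothetical index name
  ddoc: 'bookings-mango',      // hypothetical design document name
});

console.log(indexRequest.path); // /bookings_partitioned/_index
console.log(JSON.stringify(indexRequest.body.index)); // {"fields":["departure_date"]}
```

Only departure_date is indexed here; the partition key does not appear in fields because a partitioned index is already organized by partition.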

What most people miss is that partitioning doesn’t parallelize a query across the nodes of a cluster; it does the opposite. A partitioned database is still sharded across the cluster like any other database, but each partition lives entirely in one shard (and that shard’s replicas). The performance gain comes from answering a partition query from a single shard, which reduces I/O, index scanning, and coordination between nodes. You still need a cluster for high availability and for distributing load across nodes, but partitioning is the primary tool for scaling query performance within a well-defined subset of the data.
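A simplified model makes the placement rule concrete. It assumes a shard is chosen by hashing a string with CRC32 and dividing the 32-bit hash space into q equal ranges, hashing only the partition key for a partitioned database and the whole _id otherwise. This illustrates the idea and is not CouchDB's actual placement code; shardFor and its parameters are hypothetical names.

```javascript
// Bitwise CRC-32 (reflected polynomial 0xEDB88320) over the UTF-8 bytes of a string.
function crc32(str) {
  let crc = 0xffffffff;
  for (const byte of new TextEncoder().encode(str)) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Simplified model: map a document to one of q equal-width hash ranges.
// For a partitioned database only the partition key (before the colon) is hashed.
function shardFor(docId, q, partitioned) {
  let hashInput = docId;
  if (partitioned) {
    const i = docId.indexOf(':');
    if (i <= 0) throw new Error(`not a partitioned _id: ${docId}`);
    hashInput = docId.slice(0, i);
  }
  const rangeWidth = 2 ** 32 / q;
  return Math.floor(crc32(hashInput) / rangeWidth);
}

const julyIds = ['2023-07:doc_id_1', '2023-07:doc_id_2', '2023-07:doc_id_3'];
const julyShards = new Set(julyIds.map((id) => shardFor(id, 8, true)));
console.log(julyShards.size); // 1: every July booking maps to the same shard
```

Because the hash input is identical for every document in a partition, a partition query in this model has exactly one shard to visit, while hashing full IDs gives no such guarantee.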

The next step is understanding how to manage view rebuilds and index maintenance across many partitions.
