CouchDB’s map-reduce views are not just for querying; they are a fundamental mechanism for building complex, dynamic data structures within the database itself.

Let’s see this in action. Imagine a simple users database with documents like this:

{"_id": "user:alice", "type": "user", "name": "Alice", "country": "USA", "tags": ["developer", "coffee_lover"]}
{"_id": "user:bob", "type": "user", "name": "Bob", "country": "Canada", "tags": ["designer", "tea_drinker"]}
{"_id": "user:charlie", "type": "user", "name": "Charlie", "country": "USA", "tags": ["developer", "music_fan"]}

We want to find all developers. Suppose the following map function is saved as the developers view in the _design/users design document:

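function (doc) {
  // indexOf rather than includes(): older SpiderMonkey engines shipped with CouchDB lack Array.prototype.includes
  if (doc.type === 'user' && doc.tags && doc.tags.indexOf('developer') !== -1) {
    emit(doc.country, doc.name);
  }
}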

When you query this view with GET /users/_design/users/_view/developers, CouchDB doesn't scan every document every time. Instead, it builds and maintains an index. For the map function above, the index holds one row per emit call, sorted by key (conceptually, not the actual on-disk format):

"USA" -> "Alice"     (emitted by user:alice)
"USA" -> "Charlie"   (emitted by user:charlie)

Bob has no row, because his tags do not include "developer".

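Each call to emit(key, value) adds one row to this index, and CouchDB stores the emitting document's _id alongside it. The key is what CouchDB sorts and groups by, and the value is the data associated with that key. Rows with equal keys are ordered by document _id.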
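To make the map step concrete, here is a minimal sketch that runs the developers map function over the three sample documents in plain Node.js. The `runMap` helper and the stand-in `emit` are hypothetical, not part of CouchDB; the sort uses JavaScript's code-point comparison, a simplification of CouchDB's ICU-based collation.

```javascript
// Sample documents from this section.
const docs = [
  { _id: "user:alice", type: "user", name: "Alice", country: "USA", tags: ["developer", "coffee_lover"] },
  { _id: "user:bob", type: "user", name: "Bob", country: "Canada", tags: ["designer", "tea_drinker"] },
  { _id: "user:charlie", type: "user", name: "Charlie", country: "USA", tags: ["developer", "music_fan"] },
];

// The developers map function; it calls the stand-in emit declared below.
function developersMap(doc) {
  if (doc.type === "user" && doc.tags && doc.tags.indexOf("developer") !== -1) {
    emit(doc.country, doc.name);
  }
}

// Hypothetical stand-in for CouchDB's emit(); runMap rebinds it per document.
let emit = null;

function runMap(mapFn, allDocs) {
  const rows = [];
  for (const doc of allDocs) {
    // Every emit() call appends one row tagged with the emitting document's _id.
    emit = (key, value) => { rows.push({ id: doc._id, key: key, value: value }); };
    mapFn(doc);
  }
  // Sort by key, then by document _id (code-point order, not ICU collation).
  const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  rows.sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  return rows;
}

const index = runMap(developersMap, docs);
console.log(index.map(r => r.key + " -> " + r.value).join("; ")); // USA -> Alice; USA -> Charlie
```

Bob's document runs through the map function too; it simply never calls emit, so it contributes no rows.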

The power comes when you add a reduce function. Let’s say we want to count developers per country.

function (keys, values, rereduce) {
  if (rereduce) {
    // rereduce=true: values are partial counts returned by earlier calls
    // (CouchDB combines these across B-tree nodes and, in a cluster, across shards), so add them up
    return sum(values);
  } else {
    // rereduce=false: values are the raw emitted values for a batch of rows, so count them
    return values.length;
  }
}

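With this reduce function added to the developers view, querying it with GET /users/_design/users/_view/developers?group=true returns one row per distinct key. Only the USA appears, because Bob, the only Canadian user, is not tagged as a developer:

{"rows": [
  {"key": "USA", "value": 2}
]}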
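The grouping step can be sketched the same way. This assumes the rows are already sorted by key (as the index keeps them) and that keys are plain strings; `groupReduce` is a hypothetical helper, and `sum` is defined locally because CouchDB provides it only inside its own JavaScript view server. The `keys` argument is passed as `[key, docid]` pairs, matching the shape CouchDB documents for non-rereduce calls.

```javascript
// CouchDB supplies sum() to view functions; define it so the sketch runs in Node.
function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// The counting reduce function from this section.
function countReduce(keys, values, rereduce) {
  if (rereduce) return sum(values);
  return values.length;
}

// Rows produced by the developers map function, sorted by key.
const rows = [
  { id: "user:alice", key: "USA", value: "Alice" },
  { id: "user:charlie", key: "USA", value: "Charlie" },
];

// Hypothetical group=true: collect runs of equal (string) keys and reduce each run once.
function groupReduce(sortedRows, reduceFn) {
  const result = [];
  let i = 0;
  while (i < sortedRows.length) {
    const key = sortedRows[i].key;
    let j = i;
    while (j < sortedRows.length && sortedRows[j].key === key) j++;
    const group = sortedRows.slice(i, j);
    const value = reduceFn(group.map(r => [r.key, r.id]), group.map(r => r.value), false);
    result.push({ key: key, value: value });
    i = j;
  }
  return { rows: result };
}

const grouped = groupReduce(rows, countReduce);
console.log(JSON.stringify(grouped)); // {"rows":[{"key":"USA","value":2}]}
```

Real CouchDB may split a large group across several calls and merge them with rereduce=true; this sketch reduces each group in a single call.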

CouchDB's index is updated incrementally. CouchDB tracks which documents have changed since the index was last updated, runs the map function only on those documents, and replaces the rows they previously emitted. This is why querying an already built view stays fast as the database grows. Conceptually, the index is a sorted list of (key, document _id, value) rows derived from your documents.
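The incremental-update idea can be modelled in a few lines, under a deliberately simplified assumption: the index remembers which rows each document produced, so a change to one document only re-runs the map function for that document. `ViewIndex` is a hypothetical class, not CouchDB's internal API, and here the map function receives `emit` as a parameter purely as a sketch convention.

```javascript
class ViewIndex {
  constructor(mapFn) {
    this.mapFn = mapFn;
    this.rowsByDoc = new Map(); // document _id -> rows that document emitted
    this.mapCalls = 0;          // how many times the map function has run
  }

  // Apply one changed document: its previous rows are replaced and no other
  // document is re-mapped. A deleted document simply loses its rows.
  update(doc) {
    if (doc._deleted) {
      this.rowsByDoc.delete(doc._id);
      return;
    }
    const rows = [];
    this.mapCalls += 1;
    this.mapFn(doc, (key, value) => rows.push({ id: doc._id, key: key, value: value }));
    this.rowsByDoc.set(doc._id, rows);
  }

  // All rows, sorted by key and then by document _id (code-point order).
  rows() {
    const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
    return [...this.rowsByDoc.values()].flat()
      .sort((a, b) => cmp(a.key, b.key) || cmp(a.id, b.id));
  }
}

// The developers map function, with emit passed in (sketch convention only).
function developers(doc, emit) {
  if (doc.type === "user" && doc.tags && doc.tags.indexOf("developer") !== -1) {
    emit(doc.country, doc.name);
  }
}

const idx = new ViewIndex(developers);
idx.update({ _id: "user:alice", type: "user", name: "Alice", country: "USA", tags: ["developer", "coffee_lover"] });
idx.update({ _id: "user:bob", type: "user", name: "Bob", country: "Canada", tags: ["designer", "tea_drinker"] });
// Hypothetical edit: Bob gains the developer tag; only his document is mapped again.
idx.update({ _id: "user:bob", type: "user", name: "Bob", country: "Canada", tags: ["designer", "developer"] });

console.log(idx.rows().map(r => r.key + ": " + r.value).join(", ")); // Canada: Bob, USA: Alice
console.log(idx.mapCalls); // 3
```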

The emit function is the core of map functions. It can emit multiple key-value pairs from a single document. For example, if you wanted to index users by country and by tag:

function (doc) {
  if (doc.type === 'user') {
    if (doc.country) {
      emit(doc.country, 1); // Emit count for country
    }
    if (doc.tags) {
      doc.tags.forEach(function(tag) {
        emit(tag, 1); // Emit count for each tag
      });
    }
  }
}

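This single map function creates index rows for both countries and tags, so you can query by either. Combined with a reduce function that returns sum(values) (or the built-in _sum), a query with group=true returns a count for each distinct country and each distinct tag. Because countries and tags share one key space, a tag that happened to equal a country name would be counted together with it.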
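For the three sample users, the grouped result can be predicted with a short sketch that sums the emitted 1s per key, which is what sum(values) computes for each group. The `countByKey` helper is hypothetical, and it returns an object rather than an ordered row list because CouchDB sorts mixed-case string keys with ICU collation, which differs from JavaScript's default ordering.

```javascript
const docs = [
  { _id: "user:alice", type: "user", name: "Alice", country: "USA", tags: ["developer", "coffee_lover"] },
  { _id: "user:bob", type: "user", name: "Bob", country: "Canada", tags: ["designer", "tea_drinker"] },
  { _id: "user:charlie", type: "user", name: "Charlie", country: "USA", tags: ["developer", "music_fan"] },
];

// Runs the country-and-tag map logic and sums the emitted values per key.
function countByKey(allDocs) {
  const totals = new Map();
  // Stand-in emit: adds the emitted value to that key's running total.
  const emit = (key, value) => totals.set(key, (totals.get(key) || 0) + value);
  for (const doc of allDocs) {
    if (doc.type === "user") {
      if (doc.country) emit(doc.country, 1);
      if (doc.tags) doc.tags.forEach(tag => emit(tag, 1));
    }
  }
  return Object.fromEntries(totals);
}

const counts = countByKey(docs);
console.log(counts.USA, counts.Canada, counts.developer); // 2 1 2
```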

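The reduce function operates on the rows produced by the map function. With group=true, CouchDB groups rows whose keys are identical and passes each group's values to the reduce function. With group_level=N and array keys, it groups by the first N elements of each key instead. The rereduce flag matters because CouchDB does not always reduce raw rows: it stores partial reductions in the index's B-tree nodes (and, in a cluster, computes them per shard), then combines those partial results by calling the reduce function again with rereduce=true. In that call, keys is null and values holds earlier reduce outputs rather than emitted values.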
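The sketch below shows why a reduce function must handle rereduce correctly. It reduces values in fixed-size chunks (standing in for B-tree nodes or shards) and then combines the partial results with rereduce=true. The chunk size, the extra names, and the `reduceInChunks` helper are hypothetical; the naive version, which ignores the flag, illustrates the bug the flag prevents.

```javascript
function sum(values) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Correct: counts raw values, but sums partial counts on rereduce.
function countReduce(keys, values, rereduce) {
  if (rereduce) return sum(values);
  return values.length;
}

// Buggy: ignores rereduce, so it counts partial results instead of adding them.
function naiveCount(keys, values) {
  return values.length;
}

// Hypothetical model of CouchDB reducing in pieces, then rereducing the pieces.
function reduceInChunks(keys, values, reduceFn, chunkSize) {
  const partials = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    partials.push(reduceFn(keys.slice(i, i + chunkSize), values.slice(i, i + chunkSize), false));
  }
  return reduceFn(null, partials, true); // keys is null on rereduce
}

// Hypothetical values emitted under a single key.
const values = ["Alice", "Charlie", "Dana", "Eve", "Frank"];
const keys = values.map((v, i) => ["USA", "user:" + i]);

console.log(reduceInChunks(keys, values, countReduce, 2)); // 5
console.log(reduceInChunks(keys, values, naiveCount, 2));  // 3 (three chunks, not five rows)
```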

CouchDB stores each view index on disk as an append-only B+tree. Inner nodes hold sorted keys, pointers to child nodes, and the cached reduce value for each subtree; leaf nodes hold the actual (key, value) rows. This structure allows efficient searching, insertion, and deletion of index entries, and the cached reductions let CouchDB answer reduce queries without re-reducing every row.
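A toy two-level structure helps illustrate the idea: sorted leaves hold rows, and a single inner level keeps each leaf's first key plus a cached reduction (a row count). This is a simplification under stated assumptions (fixed `LEAF_SIZE`, one inner level, in memory), not CouchDB's couch_btree implementation. Note that a run of equal keys can span two leaves, so the lookup starts at the last leaf whose first key is strictly smaller than the target.

```javascript
const LEAF_SIZE = 2; // arbitrary, for illustration
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Split sorted rows into leaves and cache each leaf's first key and row count.
function buildTree(sortedRows) {
  const leaves = [];
  for (let i = 0; i < sortedRows.length; i += LEAF_SIZE) {
    const rows = sortedRows.slice(i, i + LEAF_SIZE);
    leaves.push({ firstKey: rows[0].key, count: rows.length, rows: rows });
  }
  return { leaves: leaves };
}

// Binary-search the inner level, then scan forward through matching rows.
function lookup(tree, key) {
  let lo = 0, hi = tree.leaves.length - 1, start = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (cmp(tree.leaves[mid].firstKey, key) < 0) { start = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  const out = [];
  for (let l = start; l < tree.leaves.length; l++) {
    for (const row of tree.leaves[l].rows) {
      const c = cmp(row.key, key);
      if (c > 0) return out; // past the key: stop scanning
      if (c === 0) out.push(row);
    }
  }
  return out;
}

// Total count from the cached per-leaf reductions, without visiting any row.
function totalCount(tree) {
  return tree.leaves.reduce((acc, leaf) => acc + leaf.count, 0);
}

// Rows as emit(doc.country, doc.name) would produce for the sample users.
const tree = buildTree([
  { key: "Canada", value: "Bob" },
  { key: "USA", value: "Alice" },
  { key: "USA", value: "Charlie" },
]);

console.log(lookup(tree, "USA").map(r => r.value).join(", ")); // Alice, Charlie
console.log(totalCount(tree)); // 3
```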

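The startkey and endkey query parameters select a contiguous, inclusive range of rows from the index. Both take JSON values, so string keys need double quotes (URL-encoded in practice). startkey alone does not mean "starts with": in a view keyed by country, ?startkey="US" returns every row whose key sorts at or after "US", which includes "USA" but also any later key such as "Venezuela". To match a prefix, add an upper bound just past it: ?startkey="US"&endkey="US\ufff0".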
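The sketch below contrasts an open-ended startkey with a prefix range, and shows one way to JSON-encode key parameters into a query string. The `rangeQuery` and `viewUrl` helpers, the by_country view name, and the "Venezuela" row are hypothetical; http://localhost:5984 is CouchDB's default local address, assumed here. A real index seeks to the range in its B-tree; `filter` is used only for clarity, with code-point comparison instead of ICU collation.

```javascript
const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

// Inclusive range over rows sorted by key; undefined bounds are open.
function rangeQuery(sortedRows, startkey, endkey) {
  return sortedRows.filter(r =>
    (startkey === undefined || cmp(r.key, startkey) >= 0) &&
    (endkey === undefined || cmp(r.key, endkey) <= 0));
}

// Key parameters must be JSON; other parameters are sent as plain strings.
const KEY_PARAMS = new Set(["key", "startkey", "endkey"]);

function viewUrl(base, params) {
  const qs = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    qs.set(name, KEY_PARAMS.has(name) ? JSON.stringify(value) : String(value));
  }
  return base + "?" + qs.toString();
}

// Rows of a hypothetical view keyed by country.
const rows = [
  { key: "Canada", value: "Bob" },
  { key: "USA", value: "Alice" },
  { key: "USA", value: "Charlie" },
  { key: "Venezuela", value: "Dana" },
];

console.log(rangeQuery(rows, "US").map(r => r.value).join(", "));             // Alice, Charlie, Dana
console.log(rangeQuery(rows, "US", "US\ufff0").map(r => r.value).join(", ")); // Alice, Charlie

const url = viewUrl("http://localhost:5984/users/_design/users/_view/by_country",
  { startkey: "US", endkey: "US\ufff0" });
console.log(url);
```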

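By default, when you query a view, CouchDB first brings the index up to date with any document changes made since its last update, and the request waits until that update finishes. This ensures the results reflect every document written before the query, at the cost of extra latency after heavy writes. You can skip the update and read the index as it stands with stale=ok; newer CouchDB releases deprecate stale, and stale=ok corresponds to update=false&stable=true.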

The mental model to hold is that a view index is a persistent, sorted, and queryable data structure derived from your documents, updated incrementally as documents change. The map output is stored much like a materialized view, but reduce results for a particular grouping or key range are not stored as a finished result set; CouchDB computes them at query time by combining the partial reductions cached in the B-tree.

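Complex keys are a common source of confusion. If you emit([doc.country, tag], 1), CouchDB sorts the arrays element by element: first by country, then by tag within each country. That ordering is what enables multi-dimensional queries and grouping: group_level=1 returns counts per country, group_level=2 (or group=true) returns counts per country-and-tag pair, and startkey=["USA"]&endkey=["USA", {}] selects every row for the USA, because an object sorts after any string.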
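A minimal sketch of array-key ordering and group_level, using the rows that emit([doc.country, tag], 1) would produce for the three sample users. The `compareKeys` and `groupLevel` helpers are hypothetical; string elements are compared by code point (a simplification of CouchDB's ICU collation), and values are summed as a sum(values) reduce would.

```javascript
// Element-by-element comparison; a shorter key that is a prefix sorts first.
function compareKeys(a, b) {
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return a.length - b.length;
}

// Group rows by the first `level` key elements and sum their values.
function groupLevel(rows, level) {
  const sorted = [...rows].sort((x, y) => compareKeys(x.key, y.key));
  const groups = [];
  for (const row of sorted) {
    const key = row.key.slice(0, level);
    const last = groups[groups.length - 1];
    if (last && compareKeys(last.key, key) === 0) last.value += row.value;
    else groups.push({ key: key, value: row.value });
  }
  return groups;
}

const rows = [
  { key: ["USA", "developer"], value: 1 },
  { key: ["USA", "coffee_lover"], value: 1 },
  { key: ["Canada", "designer"], value: 1 },
  { key: ["Canada", "tea_drinker"], value: 1 },
  { key: ["USA", "developer"], value: 1 },
  { key: ["USA", "music_fan"], value: 1 },
];

console.log(JSON.stringify(groupLevel(rows, 1)));
// [{"key":["Canada"],"value":2},{"key":["USA"],"value":4}]
console.log(groupLevel(rows, 2).length); // 5 distinct country-and-tag pairs
```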

The next concept to explore is view cleanup: a POST request to /{db}/_view_cleanup removes old view index files that no current design document uses any longer.
