The Q parameter in CouchDB sharding (written q in CouchDB’s API and configuration) isn’t only about how many shards you have; it also governs how CouchDB decides which shard a document belongs to.
Let’s see it in action. Imagine you have a CouchDB cluster and you’re setting up a sharded database. You create a design document with a _id like _design/mydatabase. Inside this design document, you define a fulltext index.
{
  "_id": "_design/mydatabase",
  "indexes": {
    "my_index": {
      "index": "function (doc) { index('default', doc.title); }"
    }
  },
  "views": {}
}
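When CouchDB indexes documents with this fulltext index, every shard builds the index for the documents it holds, so the index terms end up wherever the document itself lives. This is where Q comes in. If you don’t specify Q when you create the database, CouchDB falls back to the cluster default, which is 2 in CouchDB 3.x (8 in 2.x), not 1. To place a document, CouchDB hashes its _id and maps the hash onto one of the Q shard ranges; it does not take the raw _id modulo the number of shards.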
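To make the hash-to-shard mapping concrete, here is a minimal sketch in Python. It assumes CouchDB hashes the _id with CRC32 and cuts the 32-bit hash space into Q contiguous ranges; the helper names shard_index and shard_range are hypothetical, and this is a simplified model rather than CouchDB's actual code.

```python
import zlib

HASH_SPACE = 2 ** 32  # CRC32 values fall in [0, 2**32)


def shard_index(doc_id: str, q: int) -> int:
    """Return the index (0..q-1) of the hash range that doc_id falls into.

    Assumption: CRC32 over the UTF-8 bytes of the _id, equal-width ranges,
    with the last range absorbing any remainder.
    """
    if q < 1:
        raise ValueError("q must be a positive integer")
    h = zlib.crc32(doc_id.encode("utf-8"))
    step = HASH_SPACE // q
    return min(h // step, q - 1)


def shard_range(index: int, q: int) -> str:
    """Format a range the way shard names usually show it, e.g. 00000000-7fffffff."""
    step = HASH_SPACE // q
    start = index * step
    end = HASH_SPACE - 1 if index == q - 1 else start + step - 1
    return f"{start:08x}-{end:08x}"


if __name__ == "__main__":
    q = 2
    idx = shard_index("user_42", q)
    print(idx, shard_range(idx, q))
```

The same _id always lands in the same range for a given Q, which is what makes the placement deterministic.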
Now, let’s say you want to influence where these index terms go. Q is set when the database is created, not when you query a view or index. For example, to create mydatabase with ten shards, you would send a request like this:
PUT /mydatabase?q=10
Here, q=10 tells CouchDB to split the hash space into 10 ranges, one per shard. For each document it computes hash(doc._id) and stores the document, along with its index terms, in the shard whose range contains that hash. A search then goes to the index’s own endpoint, for example GET /mydatabase/_design/mydatabase/_search/my_index?q=search_term, where the lowercase q is the search query string and has nothing to do with the shard count.
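In CouchDB itself, the shard count is supplied as the q query parameter on the request that creates the database. The sketch below only builds that request so the shape is visible; the host http://localhost:5984 and the helper name create_db_request are assumptions for illustration, and a real cluster would also need admin credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_db_request(base_url: str, db_name: str, q: int) -> Request:
    """Build (but do not send) a PUT request that creates db_name with q shards."""
    if q < 1:
        raise ValueError("q must be a positive integer")
    url = f"{base_url.rstrip('/')}/{db_name}?{urlencode({'q': q})}"
    return Request(url, method="PUT")


# Hypothetical local node; adjust host and port for a real cluster.
req = create_db_request("http://localhost:5984", "mydatabase", 10)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it; that step is left out so the example runs without a server.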
The real power of Q emerges when you have a large number of shards. Let’s say you created the database with Q=128. A fulltext query never needs a Q of its own: CouchDB hashed each _id into one of the 128 ranges at write time, so the documents and their index terms are already spread across all shards, and the query is sent to every shard range.
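A quick way to get a feel for that spread is to simulate it. The sketch below assumes the same simplified model (CRC32 of the _id, 128 equal-width ranges) and made-up document ids of the form doc-N; it counts how many ids fall into each range.

```python
import zlib
from collections import Counter

HASH_SPACE = 2 ** 32


def shard_index(doc_id: str, q: int) -> int:
    """Simplified model: CRC32 of the _id mapped onto q equal-width ranges."""
    step = HASH_SPACE // q
    return min(zlib.crc32(doc_id.encode("utf-8")) // step, q - 1)


def docs_per_shard(doc_ids, q: int) -> Counter:
    """Count how many of the given ids land in each shard range."""
    return Counter(shard_index(doc_id, q) for doc_id in doc_ids)


# Illustrative ids only; real _ids are often UUIDs.
counts = docs_per_shard((f"doc-{i}" for i in range(100_000)), 128)
print("shards used:", len(counts))
print("fewest docs in a shard:", min(counts.values()))
print("most docs in a shard:", max(counts.values()))
```

Comparing the smallest and largest counts shows how evenly the hash distributes a given id scheme across the ranges.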
However, there are scenarios where you might want to control this distribution more granularly, especially during large data migrations or when dealing with specific performance bottlenecks. Q does not let you do that per query or per update. In CouchDB 3.x, the shard count of an existing database changes only through shard splitting (the _reshard API) or by replicating into a new database created with a different Q.
For instance, imagine you’re migrating data and want documents with _ids starting with user_ to go to a specific set of shards. Q cannot do this: the hash scatters _ids across the whole hash space regardless of their prefix, so user_ documents will not cluster in, say, shards 0-15, and choosing Q=16 will not narrow a query to them. Q only changes how many ranges the hash space is cut into. If you need related documents to live together, CouchDB’s partitioned databases (3.0 and later) are the feature built for that.
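The effect of hashing on a shared prefix is easy to check. The sketch below, again assuming CRC32 and equal-width ranges as a simplified model, maps a thousand hypothetical user_N ids onto 16 ranges and collects the set of ranges they hit.

```python
import zlib

HASH_SPACE = 2 ** 32


def shard_index(doc_id: str, q: int) -> int:
    """Simplified model: CRC32 of the _id mapped onto q equal-width ranges."""
    step = HASH_SPACE // q
    return min(zlib.crc32(doc_id.encode("utf-8")) // step, q - 1)


# Hypothetical ids that all share the user_ prefix.
user_ids = [f"user_{i}" for i in range(1000)]
shards_hit = {shard_index(doc_id, 16) for doc_id in user_ids}
print("user_ documents land in shards:", sorted(shards_hit))
```

A common prefix does not keep documents together, because the hash depends on the whole _id.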
The core problem Q addresses is how to deterministically map documents, and the index data built from them (like fulltext terms), to specific shards. By choosing Q, you decide how many ranges the hash space is divided into, and therefore how finely the data and the indexing work are spread across the cluster.
The mental model to build here is that Q is a property of the database, fixed when the database is created, not a query-time override. A query cannot change the sharding calculation. What Q tells CouchDB is, "When you store or look up a document, this is the number of ranges to choose from."
What most people don’t realize is that a query against a sharded database is a scatter-gather operation. The node that receives the request forwards it to one copy of every shard range, each shard answers from its own local index, and the coordinating node merges the partial results into a single response. Q therefore sets how many shard-level requests each query fans out to: a higher Q spreads data and indexing load more thinly, but every query has to touch more shards.
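In a sharded database, the node coordinating a query gathers rows from the shards and merges them into one ordered result. The sketch below shows only that merge step, assuming each shard returns rows as (key, doc_id) tuples already sorted by key; merge_shard_results is a hypothetical name, and plain Python ordering stands in for CouchDB's own collation rules.

```python
import heapq
from itertools import islice


def merge_shard_results(per_shard_rows, limit=None):
    """Merge per-shard sorted row lists into one sorted list.

    per_shard_rows: iterable of lists of (key, doc_id), each sorted by key.
    limit: optional maximum number of rows to return.
    """
    merged = heapq.merge(*per_shard_rows, key=lambda row: row[0])
    if limit is not None:
        merged = islice(merged, limit)
    return list(merged)


# Three shards, one of which has no matching rows.
shard_rows = [
    [("apple", "doc1"), ("cherry", "doc3")],
    [("banana", "doc2")],
    [],
]
print(merge_shard_results(shard_rows))
print(merge_shard_results(shard_rows, limit=2))
```

With more shards the merge has more inputs to interleave, which is one reason a very high Q adds per-query overhead.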
The next thing you’ll likely grapple with is how Q interacts with view collation and index merging across shards, especially when dealing with very high shard counts.