The most surprising thing about Couchbase query plans is that they can look perfectly healthy while the real bottleneck hides in plain sight, masked by a seemingly innocent operation.

Let’s see what a query plan actually contains. Imagine you’re asking Couchbase to find all users in California who joined after January 1st, 2023, and have a "premium" subscription. The response looks something like this:

{
  "requestID": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "signature": {
    "host": "10.0.0.10",
    "port": 8091,
    "node": "10.0.0.10:8091"
  },
  "results": [
    {
      "plan": "{\n  \"@class\": \"com.couchbase.query.execution.v2.QueryPlan\",\n  \"operator\": {\n    \"@class\": \"com.couchbase.query.execution.v2.operator.ParallelOperator\",\n    \"children\": [\n      {\n        \"@class\": \"com.couchbase.query.execution.v2.operator.IndexScanOperator\",\n        \"index\": \"idx_users_state_joined_sub\",\n        \"index_projection\": {\n          \"primary_key\": true\n        },\n        \"keyspace\": \"users\",\n        \"term\": \"SELECT * FROM users WHERE state = 'CA' AND joined_date > '2023-01-01' AND subscription = 'premium'\",\n        \"spans\": [\n          {\n            \"exact\": true,\n            \"range\": [\n              {\n                \"high\": [\n                  \"premium\"\n                ],\n                \"index_start\": [\n                  \"CA\",\n                  null,\n                  null\n                ],\n                \"index_end\": [\n                  \"CA\",\n                  null,\n                  null\n                ],\n                \"low\": [\n                  \"CA\",\n                  null,\n                  null\n                ]\n              }\n            ]\n          }\n        ],\n        \"using_index\": \"idx_users_state_joined_sub\",\n        \"limit\": 10000,\n        \"offset\": 0\n      }\n    ]\n  },\n  \"text\": \"SELECT * FROM users WHERE state = 'CA' AND joined_date > '2023-01-01' AND subscription = 'premium'\"\n}",
      "metrics": {
        "resultCount": 50,
        "resultSize": 12345,
        "errorCount": 0,
        "warningCount": 0,
        "mutationCount": 0,
        "elapsedTime": "15.5ms",
        "executionTime": "12.2ms",
        "timeTaken": "15.8ms",
        "format": "pretty",
        "executionStages": {
          "total": 1,
          "index_scan": 1,
          "mutation_fetch": 0,
          "sort": 0,
          "group": 0,
          "join": 0,
          "filter": 0,
          "map": 0,
          "reduce": 0
        }
      }
    }
  ]
}

The plan field is Couchbase’s recipe for executing your query, and its operator tree lists the steps. Here, the tree is a ParallelOperator with a single child, an IndexScanOperator, which uses the idx_users_state_joined_sub index. The spans tell us the scan seeks on state = 'CA' and then tries to narrow by joined_date and subscription. The metrics give us actual performance numbers: elapsedTime is 15.5ms, resultCount is 50, and index_scan is the only stage recorded in executionStages. Seems fast, right?
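To make that structure concrete, here is a minimal Python sketch that decodes the plan string and prints the operator tree. It assumes the response shape shown above (a plan stored as a JSON string under results[0], with @class and children keys). The helper names walk_operators and summarize_plan are made up for illustration, and the embedded response is a trimmed copy of the sample, not real server output.

```python
import json

# Trimmed copy of the sample response above; only the fields used here are kept.
# As in the sample, the plan itself is a JSON document encoded as a string.
response = {
    "results": [
        {
            "plan": json.dumps({
                "operator": {
                    "@class": "com.couchbase.query.execution.v2.operator.ParallelOperator",
                    "children": [
                        {
                            "@class": "com.couchbase.query.execution.v2.operator.IndexScanOperator",
                            "index": "idx_users_state_joined_sub",
                            "keyspace": "users",
                        }
                    ],
                }
            })
        }
    ]
}


def walk_operators(operator, depth=0):
    """Yield (depth, short_name, operator_dict) for each node, parents first."""
    short_name = operator.get("@class", "?").rsplit(".", 1)[-1]
    yield depth, short_name, operator
    for child in operator.get("children", []):
        yield from walk_operators(child, depth + 1)


def summarize_plan(response):
    """Return one indented line per operator in the first result's plan."""
    # The plan is a string inside the response, so it needs a second decode.
    plan = json.loads(response["results"][0]["plan"])
    lines = []
    for depth, name, op in walk_operators(plan["operator"]):
        suffix = f" (index: {op['index']})" if "index" in op else ""
        lines.append("  " * depth + name + suffix)
    return lines


summary = summarize_plan(response)
print("\n".join(summary))
```

Printing the tree this way makes it easier to spot which operators are present, and which ones you expected but are missing, than reading the escaped string by eye.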

The problem Couchbase solves here is retrieving data efficiently from a distributed, schemaless document database. It has to find the cheapest way to satisfy your SELECT statement, which usually means using indexes.

Here’s how it works internally: When you execute a query, Couchbase’s query optimizer analyzes the query text and the available indexes. It generates a query plan, which is a tree of operations. Each operator in the tree represents a step, like scanning an index, fetching documents, filtering, sorting, or joining. The optimizer tries to pick the sequence of operations that minimizes disk I/O, network traffic, and CPU usage. In releases that include the cost-based optimizer, it bases these choices on optimizer statistics, which are gathered with the UPDATE STATISTICS statement rather than maintained by the indexer; without statistics it falls back to rule-based index selection.

The operator tree is the key. Its building blocks are IndexScanOperator, FetchOperator, FilterOperator, SortOperator, JoinOperator, and AggregateOperator (for GROUP BY, etc.). The goal is to reduce the number of documents being processed as early in the tree as possible. An IndexScanOperator that uses a covering index (meaning it can get all the requested data from the index itself without fetching the full document) is usually the fastest.
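The covering idea can be sketched as a simple set check. This is a toy model in Python, not how the Couchbase planner is implemented: is_covering is a hypothetical helper, and the key list (state, joined_date, subscription) is only an assumption inferred from the index name idx_users_state_joined_sub.

```python
def is_covering(index_keys, projected_fields, filtered_fields):
    """Return True if every field the query touches is stored in the index.

    Toy model: a query is "covered" when the fields it returns and the fields
    it filters on are all index keys, so no document fetch is needed.
    """
    needed = set(projected_fields) | set(filtered_fields)
    return needed <= set(index_keys)


# Assumed key order, inferred from the index name; not confirmed by the plan.
index_keys = ["state", "joined_date", "subscription"]
where_fields = ["state", "joined_date", "subscription"]

# SELECT joined_date ... : everything needed is in the index.
covered = is_covering(index_keys, ["joined_date"], where_fields)

# SELECT email ... : email is not an index key, so documents must be fetched.
needs_fetch = not is_covering(index_keys, ["email"], where_fields)

# SELECT * asks for the whole document; "*" is never an index key in this model.
star_covered = is_covering(index_keys, ["*"], where_fields)

print(covered, needs_fetch, star_covered)
```

In this model the sample query’s SELECT * can never be covered, which is one reason to project only the fields you need when you want the index alone to answer the query.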

The metrics section is where you see the actual cost. elapsedTime is the total time for the request. executionTime is the time spent doing actual work. resultCount tells you how many documents were ultimately returned. executionStages breaks the work down by type of operation. If index_scan is high, your index might be inefficient or not selective enough. If mutation_fetch is high, you’re fetching a lot of full documents rather than answering from the index. If sort is high, you’re probably sorting a large intermediate result set, which is expensive.
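As a rough illustration of reading these numbers programmatically, the Python sketch below converts the duration strings to milliseconds and reports the gap between elapsedTime and executionTime along with the stages that did any work. The helpers to_ms and summarize_metrics are hypothetical names, the metrics dict is copied from the sample above, and the parser only handles single-unit durations such as "15.5ms".

```python
import re

# Conversion factors to milliseconds for the units this sketch understands.
_UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "µs": 1e-3, "ms": 1.0, "s": 1000.0}
_DURATION = re.compile(r"([0-9]*\.?[0-9]+)(ns|us|µs|ms|s)")


def to_ms(duration):
    """Convert a single-unit duration string such as '15.5ms' to milliseconds."""
    match = _DURATION.fullmatch(duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration format: {duration!r}")
    value, unit = match.groups()
    return float(value) * _UNIT_TO_MS[unit]


def summarize_metrics(metrics):
    """Return timings in ms plus the names of stages with a non-zero value."""
    elapsed = to_ms(metrics["elapsedTime"])
    execution = to_ms(metrics["executionTime"])
    stages = metrics.get("executionStages", {})
    active = sorted(
        name for name, value in stages.items() if name != "total" and value > 0
    )
    return {
        "elapsed_ms": elapsed,
        "execution_ms": execution,
        # Time not accounted for by execution, rounded to hide float noise.
        "overhead_ms": round(elapsed - execution, 3),
        "active_stages": active,
    }


# Values copied from the sample response above.
metrics = {
    "resultCount": 50,
    "elapsedTime": "15.5ms",
    "executionTime": "12.2ms",
    "executionStages": {"total": 1, "index_scan": 1, "mutation_fetch": 0, "sort": 0},
}

report = summarize_metrics(metrics)
print(report)
```

Tracking the overhead and the list of active stages across runs is a cheap way to notice when a query that used to be index-only starts fetching or sorting.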

The one thing most people don’t realize is that the spans in an IndexScanOperator are generated based on the order of fields in your index definition, not the order of predicates in your WHERE clause. If you have an index on (state, subscription, joined_date) and query for WHERE state = 'CA' AND subscription = 'premium' AND joined_date > '2023-01-01', Couchbase can use the index very effectively: the two equality predicates pin the leading keys and the range predicate bounds the last one. Writing the same predicates as WHERE subscription = 'premium' AND state = 'CA' AND joined_date > '2023-01-01' does not change which index keys can be used, which is why the spans shown in the plan may not mirror the order of your query text. What makes the scan less efficient is an index whose key order puts a range field ahead of an equality field. If idx_users_state_joined_sub is defined on (state, joined_date, subscription), as its name suggests, the range on joined_date comes before subscription, so subscription can do much less to narrow the scan.
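The effect of key order can be illustrated with a deliberately simplified model of the leading-key rule: walk the index keys in definition order, keep going while each key has an equality predicate, include at most one range predicate, and stop at the first key with no predicate. This Python sketch is an approximation for building intuition, not Couchbase’s actual span generation, and bounding_prefix is a hypothetical helper.

```python
def bounding_prefix(index_keys, predicates):
    """Return the leading index keys that can tightly bound the scan.

    predicates maps a field name to "eq" or "range". The order in which the
    predicates are written is irrelevant; only the index key order matters.
    """
    prefix = []
    for key in index_keys:
        kind = predicates.get(key)
        if kind is None:
            break  # gap in the leading keys: later keys cannot bound the scan
        prefix.append(key)
        if kind == "range":
            break  # keys after a range no longer narrow the scanned region
    return prefix


# Predicates from the example query, written in two different orders.
query_a = {"state": "eq", "subscription": "eq", "joined_date": "range"}
query_b = {"subscription": "eq", "state": "eq", "joined_date": "range"}

equality_first = bounding_prefix(["state", "subscription", "joined_date"], query_a)
reordered_where = bounding_prefix(["state", "subscription", "joined_date"], query_b)
range_in_middle = bounding_prefix(["state", "joined_date", "subscription"], query_a)
range_first = bounding_prefix(["joined_date", "state", "subscription"], query_a)

print(equality_first)   # all three keys bound the scan
print(reordered_where)  # same result: predicate order plays no role
print(range_in_middle)  # subscription is left out of the bound
print(range_first)      # only joined_date bounds the scan
```

In this model, putting equality fields first and the range field last is what lets every predicate contribute to the scan bounds.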

The next thing you’ll want to understand is how to write indexes that perfectly match your common query patterns to ensure the IndexScanOperator is as efficient as possible.
