N1QL joins are often the performance bottleneck in Couchbase applications, and index hints are your secret weapon for steering the query optimizer down the right path.

Let’s see this in action. Imagine we have two buckets, users and orders, and we want to find all orders placed by a specific user.

// users bucket
{
  "type": "user",
  "user_id": "user123",
  "name": "Alice"
}

// orders bucket
{
  "type": "order",
  "order_id": "order456",
  "user_id": "user123",
  "amount": 50.00
}

A naive join query might look like this:

SELECT u.name, o.order_id, o.amount
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE u.user_id = "user123";

Without any hints, Couchbase’s query optimizer will analyze the query and available indexes, then make its best guess. This guess might involve scanning a large portion of one bucket to find matching documents for the join condition.

Now, let’s introduce an index hint. Suppose we have indexes like:

  • users_user_id on users(user_id)
  • orders_user_id on orders(user_id)
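If these indexes don't exist yet, they could be created as plain GSI secondary indexes. This is a minimal sketch. It assumes default index settings (no partitioning or replicas) and the bucket names used above:

```sql
-- Secondary (GSI) index on users.user_id, used to evaluate WHERE u.user_id = ...
CREATE INDEX users_user_id ON users(user_id) USING GSI;

-- Secondary (GSI) index on orders.user_id, used for the join lookups into orders
CREATE INDEX orders_user_id ON orders(user_id) USING GSI;
```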

We can explicitly tell N1QL which index to use by adding a USE INDEX clause directly after the keyspace reference it applies to.

SELECT u.name, o.order_id, o.amount
FROM users u USE INDEX (users_user_id USING GSI)
JOIN orders o ON u.user_id = o.user_id
WHERE u.user_id = "user123";

Here, USE INDEX (users_user_id USING GSI) is the hint. It follows users u, so it tells N1QL: "When you scan the users bucket to evaluate WHERE u.user_id = "user123", use the users_user_id index." This is particularly useful when you have multiple indexes on the user_id field, or when the optimizer might otherwise choose a less efficient index.

Index hints address cases where the query optimizer has suitable indexes available but still makes a suboptimal choice for join processing. This often happens when:

  1. Multiple Indexes on the Same Field: If you have several indexes covering the join fields (e.g., users(user_id), users(user_id, name)), the optimizer might pick one that’s not ideal for this specific query’s selectivity.
  2. Suboptimal Join Order: The optimizer scans orders first and then looks up matching users, when scanning users first and then looking up orders would be far more efficient. (With the rule-based planner, joins are generally evaluated in the order the keyspaces appear in the FROM clause, so this case is often fixed by reordering FROM rather than by an index hint.)
  3. Complex Query Plans: In queries with multiple joins or subqueries, the optimizer’s decision tree can become complex, which can lead to unexpectedly poor performance.
  4. Data Skew: If one user_id has vastly more orders than others, an index that works well for a typical user_id can perform poorly for that one value. A predicate that is usually highly selective suddenly matches a large number of documents.
  5. Index Selectivity Misjudgment: The optimizer might miscalculate how many documents an index will return for a given predicate.

Take users u JOIN orders o ON u.user_id = o.user_id WHERE u.user_id = "user123" with the hint USE INDEX (users_user_id) on users u. Couchbase uses the users_user_id index to find the users document(s) where user_id = "user123". It then takes the user_id from each of those documents and looks up matching orders in the orders bucket, ideally using an index like orders_user_id.

Placement is crucial. A USE INDEX clause applies only to the keyspace reference it follows. After FROM users u, it governs the scan of users. After JOIN orders o (before the ON clause), it governs the lookups into orders for each joined row. A hint on one keyspace says nothing about the other, so if you hint only users, the optimizer still chooses the index for orders on its own.
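To pin both legs of the join, attach a USE INDEX clause to each keyspace reference. On the right-hand side of an ANSI JOIN, the clause goes after the alias and before ON. This sketch also creates users_user_id_name, a hypothetical index that matches case 1 in the list above, so the hint has a real choice to settle:

```sql
-- Hypothetical competing index from case 1: it also leads with user_id,
-- so it could serve WHERE u.user_id = "user123" as well.
CREATE INDEX users_user_id_name ON users(user_id, name) USING GSI;

-- Pin each leg of the join to a specific index.
SELECT u.name, o.order_id, o.amount
FROM users u USE INDEX (users_user_id USING GSI)      -- scan of users
JOIN orders o USE INDEX (orders_user_id USING GSI)    -- lookups into orders
  ON u.user_id = o.user_id
WHERE u.user_id = "user123";
```

Pinning users_user_id is not automatically the right call here. users_user_id_name contains both user_id and name, so it may cover every field the query reads from users and let Couchbase skip fetching those documents. Compare both plans with EXPLAIN before settling on a hint.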

To diagnose whether a hint is needed, prefix your query with EXPLAIN and look at the plan section. Watch for a PrimaryScan3 operator (PrimaryScan in older releases) on a large bucket where an index scan would be more appropriate. Also watch for a join that uses a less selective index. Either one makes the query a prime candidate for a hint. For example, a trimmed plan for the join above might look like this. Real output wraps these operators in additional Sequence and Parallel nodes and includes more fields, such as index_id and namespace.

{
  "plan": {
    "#operator": "Sequence",
    "~children": [
      {
        "#operator": "IndexScan3",
        "as": "u",
        "index": "users_user_id",
        "keyspace": "users",
        "spans": [
          {
            "exact": true,
            "range": [
              {
                "high": "\"user123\"",
                "inclusion": 3,
                "low": "\"user123\""
              }
            ]
          }
        ],
        "using": "gsi"
      },
      {
        "#operator": "Fetch",
        "as": "u",
        "keyspace": "users"
      },
      {
        "#operator": "NestedLoopJoin",
        "alias": "o",
        "on_clause": "((`u`.`user_id`) = (`o`.`user_id`))",
        "~child": {
          "#operator": "Sequence",
          "~children": [
            {
              "#operator": "IndexScan3",
              "as": "o",
              "index": "orders_user_id",
              "keyspace": "orders",
              "spans": [
                {
                  "exact": true,
                  "range": [
                    {
                      "high": "(`u`.`user_id`)",
                      "inclusion": 3,
                      "low": "(`u`.`user_id`)"
                    }
                  ]
                }
              ],
              "using": "gsi"
            },
            {
              "#operator": "Fetch",
              "as": "o",
              "keyspace": "orders"
            }
          ]
        }
      }
    ]
  }
}

Check the index named in each IndexScan3 operator. The one with "as": "u" serves the scan of users, and the one inside the NestedLoopJoin serves the lookups into orders. If either one names the wrong index, or a PrimaryScan3 appears where you expected an IndexScan3, add a hint for that keyspace.

The most subtle point about index hints is their interaction with the ORDER BY clause. Suppose the hinted indexes don't return rows in the order the ORDER BY asks for. Couchbase then still has to sort the joined results after all the index lookups, which shows up as an Order operator in the plan and can negate some of the performance gains. Ideally, the index on the outer (first) keyspace should have leading keys that match the ORDER BY. That gives the planner a chance to use index order and skip the separate sort phase.
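Here is a sketch of the difference, using N1QL's USE INDEX clause with the two indexes above and a hypothetical range of user_id values. Whether the first query actually avoids the Order operator depends on your Couchbase version and planner, so confirm with EXPLAIN:

```sql
-- ORDER BY matches the leading key of the index hinted on the outer keyspace,
-- so the planner may be able to return rows in index order without a sort.
SELECT u.user_id, u.name, o.order_id, o.amount
FROM users u USE INDEX (users_user_id USING GSI)
JOIN orders o USE INDEX (orders_user_id USING GSI)
  ON u.user_id = o.user_id
WHERE u.user_id >= "user100" AND u.user_id < "user200"
ORDER BY u.user_id;

-- ORDER BY a field of the inner keyspace: neither index scan produces rows
-- in amount order, so expect an explicit Order operator after the join.
SELECT u.user_id, u.name, o.order_id, o.amount
FROM users u USE INDEX (users_user_id USING GSI)
JOIN orders o USE INDEX (orders_user_id USING GSI)
  ON u.user_id = o.user_id
WHERE u.user_id >= "user100" AND u.user_id < "user200"
ORDER BY o.amount DESC;
```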

With the hint in place, the plan shows an IndexScan3 on users_user_id satisfying the WHERE predicate. It is followed by lookups into the orders bucket that use the user_id from each matching users document, likely via the orders_user_id index. Only matching documents are fetched, instead of a large portion of either bucket, which drastically reduces the number of documents scanned.

The next thing you’ll likely encounter is optimizing queries with multiple joins, where you might need to hint indexes for each join leg independently.
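As a preview, each additional keyspace simply gets its own USE INDEX clause. In this sketch, products, product_id, title, and products_product_id are hypothetical names that are not part of the example data above:

```sql
SELECT u.name, o.order_id, p.title
FROM users u USE INDEX (users_user_id USING GSI)            -- leg 1: scan of users
JOIN orders o USE INDEX (orders_user_id USING GSI)          -- leg 2: lookups into orders
  ON u.user_id = o.user_id
JOIN products p USE INDEX (products_product_id USING GSI)   -- leg 3: hypothetical keyspace and index
  ON o.product_id = p.product_id
WHERE u.user_id = "user123";
```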
