The Couchbase Index Advisor is your best friend when your queries are crawling and you suspect missing indexes are the culprit. It analyzes a single query, or a whole workload of queries, and recommends the indexes you should create to speed things up.

Let’s see it in action. Imagine you have a users bucket with documents like this:

{
  "name": "Alice",
  "email": "alice@example.com",
  "city": "New York",
  "signup_date": "2023-01-15T10:00:00Z",
  "active": true
}

And you’re running queries like:

SELECT * FROM users WHERE city = "New York";
SELECT name, email FROM users WHERE active = true AND signup_date > "2023-03-01T00:00:00Z";

If these queries are slow, the Index Advisor can help.

How it Works

The Index Advisor examines a query statement you hand it or, in workload mode, queries that have recently run against your Couchbase cluster. It looks for patterns in the WHERE clauses, ORDER BY clauses, and JOIN conditions. By comparing these patterns against the existing indexes, it identifies queries that fall back to a primary scan (or an expensive partial scan) because no suitable index exists.

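Once it identifies such queries, it works out an index (or set of indexes) that would serve them. It considers the kind of predicate applied to each field (equality predicates generally make better leading keys than range predicates), the selectivity of the fields (roughly, how many distinct values they have) where that information is available, and the order in which the fields should appear in the index.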

Using the Index Advisor

You can use the Index Advisor from the Couchbase Web Console, or directly in N1QL through the ADVISE statement and the ADVISOR() function. It is an Enterprise Edition feature, and button labels vary slightly between versions.

  1. Navigate to the Query Workbench: In the Couchbase Web Console, go to Query.
  2. Enter Your Query: Type or paste the query you suspect is slow into the query editor.
  3. Ask for Advice: Click the Index Advisor button in the editor toolbar, alongside Execute and Explain. The workbench runs the advisor against the query in the editor; the query itself is not executed.
  4. View Suggestions: The results panel lists the indexes the query currently uses, followed by the indexes the advisor recommends.

To analyze real application traffic rather than a single query, use the ADVISOR() function, which can start a session that collects queries over a period of time and then reports advice for the whole workload.

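In summary form, the suggestions will look something like this (the raw output is JSON and includes a ready-to-run CREATE INDEX statement for each recommendation; the advisor generates its own index names, so yours will differ):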

Index Name: idx_users_city
Bucket: users
Index Expression: "city"
Query: SELECT * FROM users WHERE city = "New York";

Index Name: idx_users_active_signup
Bucket: users
Index Expression: "active", "signup_date"
Query: SELECT name, email FROM users WHERE active = true AND signup_date > "2023-03-01T00:00:00Z";
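
The same advice can be requested straight from N1QL. The following is a minimal sketch, assuming an Enterprise Edition cluster where the ADVISE statement and the ADVISOR() function are available; the one-hour duration and the session ID placeholder are illustrative values, not taken from any real setup.

```sql
/* Ask for index advice on a single statement.
   The statement is analyzed, not executed. */
ADVISE SELECT * FROM users WHERE city = "New York";

/* Ask for advice on several statements at once. */
SELECT ADVISOR([
  "SELECT * FROM users WHERE city = 'New York'",
  "SELECT name, email FROM users WHERE active = true AND signup_date > '2023-03-01T00:00:00Z'"
]);

/* Workload mode: start a session that collects queries for one hour. */
SELECT ADVISOR({"action": "start", "duration": "1h"});

/* Once the session has finished, fetch the advice using the
   session ID returned by the start call (placeholder shown here). */
SELECT ADVISOR({"action": "get", "session": "<session-id>"});
```

The session form is the closest match to letting the advisor watch real application traffic, while ADVISE is the quickest way to check one query you already suspect.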

Creating the Suggested Indexes

To create these indexes, you can copy the provided CREATE INDEX statements and run them in the Query Workbench:

CREATE INDEX `idx_users_city` ON `users`(`city`);
CREATE INDEX `idx_users_active_signup` ON `users`(`active`, `signup_date`);

Once these indexes are built, Couchbase’s query optimizer will automatically use them for subsequent queries that match the patterns, leading to significant performance improvements.
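
Before relying on the new indexes, it is worth confirming that they are online and that the planner actually picks them. A small sketch, assuming the index names from the statements above and a bucket named users:

```sql
/* List the indexes on the users keyspace and their build state.
   An index is usable once its state is "online". */
SELECT name, state, index_key
FROM system:indexes
WHERE keyspace_id = "users";

/* Show the query plan without running the query. If the index is
   being used, the plan contains an index scan operator that names
   idx_users_city instead of a primary scan. */
EXPLAIN SELECT * FROM users WHERE city = "New York";
```

If the plan still shows a primary scan, the usual suspects are an index that has not finished building or a predicate that does not reference the leading key of the index.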

The Mental Model: What’s Actually Happening?

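When you run a query and no suitable secondary index exists, Couchbase has to fall back to the primary index, if there is one, and perform a primary scan: it walks every document key in the bucket, fetches every single document, and checks whether it matches the query’s criteria. This is incredibly inefficient, especially for large buckets. With no index at all, not even a primary one, most versions simply reject the query with a "no index available" error (newer releases can fall back to a slow sequential scan instead).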

Indexes are like the index in the back of a book. Instead of reading the whole book to find every mention of "New York," you look up "New York" in the index, and it tells you exactly which pages to turn to. In Couchbase, an index is a separate data structure that stores a sorted list of values for one or more fields, along with pointers to the documents containing those values.

When you create idx_users_city on the city field, Couchbase builds a data structure that maps each unique city name to the documents where that city appears. When you query WHERE city = "New York", Couchbase consults idx_users_city, finds "New York" in its sorted list, and directly retrieves the pointers to the relevant documents. This is orders of magnitude faster than scanning every document.

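For composite indexes like idx_users_active_signup on active and signup_date, Couchbase stores entries sorted by the combination of these fields: first by active, then by signup_date within each value of active. This allows it to efficiently satisfy queries that filter on both active and signup_date, or queries that filter on active alone, because active is the leading (first) key of the index. A query that filters only on signup_date generally cannot use this index, since the planner selects an index only when the query has a predicate on its leading key. The order of fields in the CREATE INDEX statement therefore matters: ON users(active, signup_date) is generally better for WHERE active = ? AND signup_date > ? than ON users(signup_date, active), because the equality predicate narrows the scan to one contiguous range of signup_date values.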
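
To make the leading-key rule concrete, here is a sketch of which queries can use the composite index, assuming idx_users_active_signup exists as defined above. The covering index at the end, idx_users_active_signup_cover, is a hypothetical name introduced only for illustration.

```sql
/* Can use idx_users_active_signup: equality on the leading key,
   range on the second key. */
SELECT name, email FROM users
WHERE active = true AND signup_date > "2023-03-01T00:00:00Z";

/* Can use it as well: the predicate is on the leading key. */
SELECT name FROM users WHERE active = true;

/* Cannot use it: there is no predicate on the leading key (active),
   so this query needs a different index, for example one that
   starts with signup_date. */
SELECT name FROM users WHERE signup_date > "2023-03-01T00:00:00Z";

/* Optional covering variant (hypothetical name): by also storing
   name and email, the first query can be answered from the index
   alone, without fetching the documents. */
CREATE INDEX `idx_users_active_signup_cover`
  ON `users`(`active`, `signup_date`, `name`, `email`);
```

A covering index trades extra index size and write cost for skipping the document fetch, so it tends to pay off only for queries that run often.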

The Nuance: Why ARRAY Fields are Tricky

Indexing fields inside JSON arrays is easy to get wrong, so pay close attention to what the Index Advisor suggests for them. If you have documents like this:

{
  "name": "Bob",
  "tags": ["frontend", "javascript", "react"]
}

And you run a query like SELECT * FROM users WHERE ANY tag IN tags SATISFIES tag = "react" END;, you’ll need a specific type of index. A regular index on tags stores each array as a single value, so it cannot look up individual elements and won’t work effectively for ANY clauses. For this pattern the Index Advisor should suggest an array index, which explicitly tells Couchbase how to index the array elements. The variable name in the index definition (tag here) should match the one used in the query’s ANY clause. The correct syntax for this specific ANY query would be:

CREATE INDEX `idx_users_tags` ON `users`(DISTINCT ARRAY tag FOR tag IN tags END);

This tells Couchbase to index each element within the tags array individually, allowing efficient lookups for any matching element.
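
A quick way to check the array case is to ask the advisor and the planner about the ANY query directly. This is a sketch that assumes the users bucket and the idx_users_tags index name used above:

```sql
/* See which index the advisor recommends for the array predicate. */
ADVISE SELECT * FROM users
WHERE ANY tag IN tags SATISFIES tag = "react" END;

/* After creating the index, confirm the plan contains an index scan
   that names idx_users_tags rather than a primary scan. */
EXPLAIN SELECT * FROM users
WHERE ANY tag IN tags SATISFIES tag = "react" END;
```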

The next step after optimizing your indexes is understanding how to tune your N1QL queries for even better performance.
