Couchbase metrics are designed to be exported in a Prometheus-friendly format, but in the setup described here that export has to be switched on before Prometheus can scrape anything.

Let’s get Couchbase humming with Prometheus and Grafana. This isn’t just about pretty dashboards; it’s about having a clear view of what your data is doing, from individual document operations to cluster-wide health.

Here’s a basic setup we’ll walk through:

  1. Couchbase Server: Your data store.
  2. Prometheus: The time-series database that will scrape and store metrics.
  3. Grafana: The visualization tool that will query Prometheus and display dashboards.

First, we need Couchbase to expose metrics in a way Prometheus can understand. Couchbase has a built-in exporter.

Enabling the Couchbase Exporter

On your Couchbase node(s), you’ll need to enable the Prometheus exporter. This is done via the couchbase-cli.

You’ll need a Couchbase username and password to run it.

Now, run this command on each Couchbase node. Replace couchbase_username and couchbase_password with your actual credentials.

/opt/couchbase/bin/couchbase-cli config --cluster=localhost:8091 --user=couchbase_username --password=couchbase_password --services="data,index,query,fts,eventing,analytics" --prometheus-exporter=enabled

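This command tells Couchbase to start its internal Prometheus exporter, which by default listens on port 9984. Verify it by running curl http://localhost:9984/metrics; you should see a wall of text containing metric names prefixed with cb_. If the command or port is not recognized on your installation, check the documentation for your Couchbase version: Couchbase Server 7.0 and later serves Prometheus-format metrics directly from the cluster manager port at http://localhost:8091/metrics, which requires authentication.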
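A wall of text is hard to check by eye. As a minimal sketch, the helper below pulls the distinct metric names out of a response in the Prometheus text format and keeps those with a given prefix. The function name metric_names and the sample payload are hypothetical, made up for illustration; the real output of your node will differ.

```python
def metric_names(exposition_text, prefix="cb_"):
    """Return the sorted, distinct metric names that start with `prefix`."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{labels} value [timestamp]
        # so the name ends at the first "{" or the first whitespace.
        name = line.split("{", 1)[0].split()[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


# Hypothetical sample of what the endpoint might return.
SAMPLE = """\
# HELP cb_kv_ops_total Total key-value operations.
# TYPE cb_kv_ops_total counter
cb_kv_ops_total{bucket="travel",operation="get"} 1027
cb_kv_ops_total{bucket="travel",operation="set"} 311
cb_memory_used_bytes 5.24288e+08
process_cpu_seconds_total 12.5
"""

print(metric_names(SAMPLE))
```

To use it against a live node, save the curl output to a file and pass the file contents to metric_names. An empty result suggests the exporter is not serving Couchbase metrics at that address.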

Configuring Prometheus to Scrape Couchbase

Now, we tell Prometheus where to find these metrics. Edit your prometheus.yml configuration file. You’ll add a new scrape job for Couchbase.

scrape_configs:
  - job_name: 'couchbase'
    static_configs:
      - targets: ['<couchbase_node_ip_1>:9984', '<couchbase_node_ip_2>:9984', '<couchbase_node_ip_3>:9984']

Replace <couchbase_node_ip_1>, <couchbase_node_ip_2>, etc., with the actual IP addresses of your Couchbase nodes. If you’re running Couchbase in a cluster, you’ll want to list all nodes so Prometheus can scrape them independently.
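If the node list changes often, typing targets by hand gets error-prone. The sketch below renders the same scrape job from a list of addresses. The function name couchbase_scrape_config and the 10.0.0.x addresses are placeholders invented for this example, and the port defaults to the 9984 used above.

```python
def couchbase_scrape_config(node_ips, port=9984, job_name="couchbase"):
    """Render a prometheus.yml scrape_configs fragment for the given nodes."""
    # One 'host:port' entry per node, quoted as in the hand-written config.
    targets = ", ".join(f"'{ip}:{port}'" for ip in node_ips)
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
    )


# Placeholder addresses; substitute your own Couchbase nodes.
print(couchbase_scrape_config(["10.0.0.11", "10.0.0.12", "10.0.0.13"]))
```

The output is meant to be pasted or merged into an existing prometheus.yml; it does not account for other jobs already defined under scrape_configs.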

After saving prometheus.yml, reload the Prometheus configuration, either by sending a SIGHUP signal to the Prometheus process or by making an HTTP POST request to its /-/reload endpoint. The HTTP endpoint only works if Prometheus was started with the --web.enable-lifecycle flag:

curl -X POST http://localhost:9090/-/reload

Open the Prometheus UI (http://<prometheus_ip>:9090) and navigate to "Status" -> "Targets". You should see the couchbase job with your nodes listed, each with the state "UP".
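The same check can be scripted against the Prometheus HTTP API, which reports target health at /api/v1/targets. The following is a rough sketch: the helper names unhealthy_targets and fetch_targets and the sample response are illustrative, and the sample includes only the fields the code reads.

```python
import json
import urllib.request


def unhealthy_targets(targets_response, job="couchbase"):
    """Return (instance, last_error) pairs for targets of `job` that are not up."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        (t.get("labels", {}).get("instance", "?"), t.get("lastError", ""))
        for t in active
        if t.get("labels", {}).get("job") == job and t.get("health") != "up"
    ]


def fetch_targets(prometheus_url):
    """Fetch the targets listing from a running Prometheus server."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Trimmed, made-up response used so the example runs without a server.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "couchbase", "instance": "10.0.0.11:9984"},
             "health": "up", "lastError": ""},
            {"labels": {"job": "couchbase", "instance": "10.0.0.12:9984"},
             "health": "down", "lastError": "connection refused"},
        ]
    },
}

print(unhealthy_targets(SAMPLE))
```

Against a live server, call unhealthy_targets(fetch_targets("http://<prometheus_ip>:9090")); an empty list means every Couchbase target is being scraped successfully.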

Setting Up Grafana Dashboards

With Prometheus scraping Couchbase, it’s time for visualization.

  1. Add Prometheus as a Data Source in Grafana:

    • Log in to your Grafana instance.
    • Go to "Configuration" (gear icon) -> "Data Sources".
    • Click "Add data source".
    • Select "Prometheus".
    • Set the URL to your Prometheus server (e.g., http://<prometheus_ip>:9090).
    • Click "Save & Test". You should see a "Data source is working" message.
  2. Import a Couchbase Dashboard:

    • Grafana has a rich community dashboard library. Search for "Couchbase" on the Grafana Dashboards site (grafana.com/grafana/dashboards/).
    • A popular and comprehensive dashboard is ID 7588 (often titled "Couchbase").
    • In Grafana, go to "Dashboards" (four squares icon) -> "Browse".
    • Click "Import".
    • Enter the dashboard ID (7588) and click "Load".
    • On the next screen, select your Prometheus data source from the dropdown.
    • Click "Import".
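The data source from step 1 can also be created without the UI, since Grafana accepts data source definitions over its HTTP API (POST /api/datasources). As a sketch only, the snippet below builds the JSON body for such a request. The function name prometheus_datasource_payload and the address 10.0.0.20 are placeholders, and authenticating the request (for example with an API token) is left out.

```python
import json


def prometheus_datasource_payload(prometheus_url, name="Prometheus"):
    """Build the JSON body for creating a Prometheus data source in Grafana."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        # "proxy" means the Grafana backend, not the browser, queries Prometheus.
        "access": "proxy",
        "isDefault": True,
    }


# Placeholder address; substitute your own Prometheus server.
print(json.dumps(prometheus_datasource_payload("http://10.0.0.20:9090"), indent=2))
```

Scripting this step is mainly useful when Grafana instances are rebuilt often; for a one-off setup the UI steps above are simpler.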

You should now see a detailed dashboard showing metrics like:

  • Cluster Health: Node status, memory usage, disk usage.
  • Bucket Performance: Operations per second (get, set, delete), latency, cache hit rates.
  • Query Performance: Slow queries, query execution times.
  • Index Performance: Index build times, index memory usage.

The metrics exposed by Couchbase are quite granular. For instance, cb_kv_ops_total is a counter that increments for every Key-Value operation. Prometheus’s rate() function is essential here to see operations per second. A query like rate(cb_kv_ops_total{operation="get"}[5m]) will show you the average number of GET operations per second over the last 5 minutes.
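To make the idea behind rate() concrete, here is a simplified sketch of the arithmetic: sum the increases between consecutive samples, treat any drop as a counter reset, and divide by the elapsed time. This is an approximation for illustration, not Prometheus’s implementation; the real rate() also extrapolates to the edges of the window, so its results can differ slightly. The name simple_rate and the sample values are made up.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    `samples` is a list of (timestamp_seconds, counter_value), oldest first.
    Returns None when there are too few samples to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as increase since the reset.
        increase += curr if curr < prev else curr - prev
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Six scrapes, one per minute, of a steadily growing counter.
steady = [(0, 1000), (60, 1600), (120, 2200), (180, 2800), (240, 3400), (300, 4000)]
print(simple_rate(steady))  # 3000 operations over 300 seconds -> 10.0

# The same idea with a restart between the second and third scrape.
with_reset = [(0, 900), (60, 1500), (120, 200), (180, 800)]
print(simple_rate(with_reset))
```

The reset handling is why rate() should be applied to the raw counter rather than to a value that has already been subtracted or aggregated.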

What most people overlook is how to correlate specific bucket performance with cluster-wide resource utilization. Looking at cb_kv_cache_hit_ratio for a particular bucket alongside cb_memory_used_bytes for the entire node can quickly reveal if a memory bottleneck is impacting specific services.
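One way to express that correlation is a small check that only flags buckets when both conditions hold: the node is close to its memory limit and the bucket’s cache hit ratio is low. The sketch below assumes the hit ratio is reported as a fraction between 0 and 1 and that you know the node’s total memory; the function name and both thresholds are hypothetical and should be tuned for your workload.

```python
def memory_pressure_suspects(hit_ratio_by_bucket, memory_used_bytes, memory_total_bytes,
                             min_hit_ratio=0.9, max_memory_fraction=0.85):
    """Return buckets with a low cache hit ratio on a node that is short on memory."""
    memory_fraction = memory_used_bytes / memory_total_bytes
    # If the node has memory headroom, a low hit ratio has some other cause.
    if memory_fraction < max_memory_fraction:
        return []
    return sorted(
        bucket for bucket, ratio in hit_ratio_by_bucket.items()
        if ratio < min_hit_ratio
    )


# Made-up readings: the node is at about 91% memory and one bucket is missing cache.
ratios = {"travel": 0.72, "sessions": 0.99}
print(memory_pressure_suspects(ratios, memory_used_bytes=14.5e9, memory_total_bytes=16e9))
```

In practice the inputs would come from the latest values of cb_kv_cache_hit_ratio per bucket and cb_memory_used_bytes for the node, queried from Prometheus.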

Now you have a robust monitoring setup for your Couchbase cluster. The next step is to configure alerting based on these metrics.
