Couchbase metrics are designed to be exported in a Prometheus-friendly format, but they don’t actually use Prometheus’s exposition format by default.
Let’s get Couchbase humming with Prometheus and Grafana. This isn’t just about pretty dashboards; it’s about having a clear view of what your data is doing, from individual document operations to cluster-wide health.
Here’s a basic setup we’ll walk through:
- Couchbase Server: Your data store.
- Prometheus: The time-series database that will scrape and store metrics.
- Grafana: The visualization tool that will query Prometheus and display dashboards.
First, we need Couchbase to expose metrics in a way Prometheus can understand. Couchbase has a built-in exporter.
Enabling the Couchbase Exporter
On your Couchbase node(s), you’ll need to enable the Prometheus exporter. This is done via the couchbase-cli.
First, ensure you have the necessary credentials. You’ll need a username and password for Couchbase.
Now, run this command on each Couchbase node. Replace couchbase_username and couchbase_password with your actual credentials.
/opt/couchbase/bin/couchbase-cli config --cluster=localhost:8091 --user=couchbase_username --password=couchbase_password --services="data,index,query,fts,eventing,analytics" --prometheus-exporter=enabled
This command tells Couchbase to start its internal Prometheus exporter. By default, it listens on port 9984. You can verify this by running curl http://localhost:9984/metrics on the node. You should see a wall of text containing metric names prefixed with cb_.
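If you would rather check the endpoint from a script than by eye, the output can be parsed for metric names. The following is a minimal sketch, assuming the exporter serves the standard Prometheus text exposition format at the URL given above; the helper names `metric_names` and `fetch_metrics` are made up for this illustration.

```python
import urllib.request


def metric_names(exposition_text: str) -> set[str]:
    """Return the distinct metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like: name{label="v"} value [timestamp]
        # The name ends at the first '{' or at the first whitespace.
        names.add(line.split("{", 1)[0].split()[0])
    return names


def fetch_metrics(url: str = "http://localhost:9984/metrics") -> str:
    """Download the raw metrics page (URL and port taken from the text above)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

Calling `metric_names(fetch_metrics())` on a node and filtering for names that start with `cb_` gives a quick yes/no answer on whether the exporter is serving Couchbase metrics.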
Configuring Prometheus to Scrape Couchbase
Now, we tell Prometheus where to find these metrics. Edit your prometheus.yml configuration file. You’ll add a new scrape job for Couchbase.
scrape_configs:
  - job_name: 'couchbase'
    static_configs:
      - targets: ['<couchbase_node_ip_1>:9984', '<couchbase_node_ip_2>:9984', '<couchbase_node_ip_3>:9984']
Replace <couchbase_node_ip_1>, <couchbase_node_ip_2>, etc., with the actual IP addresses of your Couchbase nodes. If you’re running Couchbase in a cluster, you’ll want to list all nodes so Prometheus can scrape them independently.
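For clusters with more than a handful of nodes, typing the target list by hand gets error-prone. As a rough sketch, the scrape job can be rendered from a list of node addresses. The function name `couchbase_scrape_job` and the example addresses are hypothetical; the port default comes from the exporter section above.

```python
def couchbase_scrape_job(nodes: list[str], port: int = 9984) -> str:
    """Render a prometheus.yml scrape job that lists every node as a target."""
    targets = ", ".join(f"'{host}:{port}'" for host in nodes)
    return (
        "scrape_configs:\n"
        "  - job_name: 'couchbase'\n"
        "    static_configs:\n"
        f"      - targets: [{targets}]\n"
    )


# Example addresses are placeholders, not real nodes.
print(couchbase_scrape_job(["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

If your prometheus.yml already has a scrape_configs section, paste only the job entry under it rather than the whole output.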
After saving prometheus.yml, reload your Prometheus configuration. You can do this by sending a SIGHUP signal to the Prometheus process or, if Prometheus was started with the --web.enable-lifecycle flag, by making an HTTP POST request to its /-/reload endpoint:
curl -X POST http://localhost:9090/-/reload
Go to your Prometheus UI (http://<prometheus_ip>:9090), navigate to "Status" -> "Targets". You should see your couchbase job with your nodes listed, and their state should be "UP".
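The same check can be automated against Prometheus's HTTP API, which reports target health at /api/v1/targets. Below is a small sketch that lists any target in the couchbase job that is not up. The function names are illustrative, and the default URL assumes Prometheus runs locally on port 9090 as in the reload example.

```python
import json
import urllib.request


def down_targets(targets_payload: dict, job: str = "couchbase") -> list[tuple[str, str]]:
    """Return (scrape URL, last error) for each target of `job` that is not 'up'."""
    problems = []
    for target in targets_payload["data"]["activeTargets"]:
        if target["labels"].get("job") != job:
            continue
        if target["health"] != "up":
            problems.append((target["scrapeUrl"], target.get("lastError", "")))
    return problems


def fetch_targets(prometheus_url: str = "http://localhost:9090") -> dict:
    """Call Prometheus's /api/v1/targets endpoint and decode the JSON reply."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

An empty list from `down_targets(fetch_targets())` means every Couchbase node was scraped successfully on the last attempt; otherwise the error string usually points at the cause (connection refused, timeout, and so on).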
Setting Up Grafana Dashboards
With Prometheus scraping Couchbase, it’s time for visualization.
- Add Prometheus as a Data Source in Grafana:
  - Log in to your Grafana instance.
  - Go to "Configuration" (gear icon) -> "Data Sources".
  - Click "Add data source".
  - Select "Prometheus".
  - Set the URL to your Prometheus server (e.g., http://<prometheus_ip>:9090).
  - Click "Save & Test". You should see a "Data source is working" message.
- Import a Couchbase Dashboard:
  - Grafana has a rich community dashboard library. Search for "Couchbase" on the Grafana Dashboards site (grafana.com/grafana/dashboards/).
  - A popular and comprehensive dashboard is ID 7588 (often titled "Couchbase").
  - In Grafana, go to "Dashboards" (four squares icon) -> "Browse".
  - Click "Import".
  - Enter the dashboard ID (7588) and click "Load".
  - On the next screen, select your Prometheus data source from the dropdown.
  - Click "Import".
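If you manage several Grafana instances, the data source step can also be scripted through Grafana's HTTP API (POST /api/datasources). The sketch below only builds the request; the helper names, the example host names, and the use of a bearer token for authentication are assumptions for illustration, so adapt them to how your Grafana is secured.

```python
import json
import urllib.request


def prometheus_datasource(prometheus_url: str, name: str = "Prometheus") -> dict:
    """Build the JSON body describing a Prometheus data source."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend issues the queries, not the browser
        "isDefault": True,
    }


def build_request(grafana_url: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request that creates the data source."""
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` sends it. Keeping construction and sending separate makes it easy to inspect the payload before anything touches a live Grafana.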
You should now see a detailed dashboard showing metrics like:
- Cluster Health: Node status, memory usage, disk usage.
- Bucket Performance: Operations per second (get, set, delete), latency, cache hit rates.
- Query Performance: Slow queries, query execution times.
- Index Performance: Index build times, index memory usage.
The metrics exposed by Couchbase are quite granular. For instance, cb_kv_ops_total is a counter that increments for every Key-Value operation. Prometheus’s rate() function is essential here to see operations per second. A query like rate(cb_kv_ops_total{operation="get"}[5m]) will show you the average number of GET operations per second over the last 5 minutes.
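To make the arithmetic behind rate() concrete, here is a simplified sketch of what it does with a counter's samples inside the window: sum the increases, treat any drop as a counter reset, and divide by the elapsed seconds. The real rate() additionally extrapolates to the edges of the window, which this sketch leaves out; the function name `simple_rate` and the sample values are invented.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a rate")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            increase += curr - prev
        else:
            # Counter reset (e.g. a node restart): the counter began again
            # from zero, so the whole current value counts as new increase.
            increase += curr
    return increase / elapsed


# 120 operations over 120 seconds -> 1.0 operations per second
print(simple_rate([(0, 100), (60, 160), (120, 220)]))
```

The reset handling is why rate() on a counter stays meaningful across Couchbase restarts, whereas subtracting the first raw value from the last would go negative.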
What most people overlook is how to correlate specific bucket performance with cluster-wide resource utilization. Looking at cb_kv_cache_hit_ratio for a particular bucket alongside cb_memory_used_bytes for the entire node can quickly reveal if a memory bottleneck is impacting specific services.
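One way to do this correlation outside a dashboard is to run both instant queries against Prometheus's /api/v1/query endpoint and join the results on the instance label that Prometheus attaches to every scraped series. The sketch below makes several assumptions: that the hit-ratio series carries a `bucket` label, that the ratio is expressed as 0 to 1, and that the thresholds (which are arbitrary here) suit your hardware. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def instant_query(promql: str, prometheus_url: str = "http://localhost:9090") -> list[dict]:
    """Run an instant query and return the result vector."""
    qs = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/query?{qs}", timeout=5) as resp:
        return json.load(resp)["data"]["result"]


def pressured_buckets(hit_ratio_result, memory_result,
                      min_hit_ratio=0.9, max_memory_bytes=8 * 1024**3):
    """Flag buckets with a low cache hit ratio on nodes with high memory use."""
    # Index node memory by instance so each bucket series can look it up.
    memory_by_instance = {
        s["metric"].get("instance"): float(s["value"][1]) for s in memory_result
    }
    flagged = []
    for s in hit_ratio_result:
        instance = s["metric"].get("instance")
        bucket = s["metric"].get("bucket", "")  # label name is an assumption
        ratio = float(s["value"][1])  # Prometheus returns values as strings
        memory = memory_by_instance.get(instance)
        if memory is not None and ratio < min_hit_ratio and memory > max_memory_bytes:
            flagged.append((instance, bucket, ratio, memory))
    return flagged
```

A typical call would be `pressured_buckets(instant_query("cb_kv_cache_hit_ratio"), instant_query("cb_memory_used_bytes"))`; any tuples it returns are buckets worth a closer look on the dashboard.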
Now you have a robust monitoring setup for your Couchbase cluster. The next step is to configure alerting based on these metrics.