CouchDB’s distributed nature means you can scale it out, but getting traffic to all those nodes efficiently is where load balancing comes in.

Here’s one node of a three-node CouchDB cluster answering a request directly:

# On node 1
curl http://localhost:5984/
{
  "couchdb": "Welcome",
  "version": "3.3.1",
  "git_sha": "...",
  "uuid": "...",
  "features": [...]
}

Now, let’s set up HAProxy to distribute requests. We’ll treat CouchDB nodes as backend servers.

HAProxy Configuration

Create a file named haproxy.cfg (packaged installs typically read it from /etc/haproxy/haproxy.cfg):

global
    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000ms
    timeout client  50000ms
    timeout server  50000ms
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend couchdb_frontend
    bind *:5984
    default_backend couchdb_backend

backend couchdb_backend
    balance roundrobin
    option httpchk GET / HTTP/1.1\r\nHost:\ localhost
    server couchdb1 192.168.1.101:5984 check
    server couchdb2 192.168.1.102:5984 check
    server couchdb3 192.168.1.103:5984 check

Explanation:

  • global and defaults: Standard HAProxy setup for logging, user, timeouts, and error pages.
  • frontend couchdb_frontend: Listens on port 5984 for incoming CouchDB traffic.
  • backend couchdb_backend: Defines the pool of CouchDB servers.
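    • balance roundrobin: Hands each new request to the next server in turn, skipping servers that are marked down.
    • option httpchk GET / HTTP/1.1\r\nHost:\ localhost: Makes the health checks HTTP-based. HAProxy periodically sends a GET / request with a Host header to each backend server and treats a 2xx or 3xx response as healthy. CouchDB’s root endpoint returns a 200 OK when the node is serving HTTP.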
    • server couchdbX IP:PORT check: Defines each CouchDB node as a server and enables health checks.
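
Before starting the service, it can help to let HAProxy parse the file and report any errors. A minimal sketch, assuming the configuration was saved to /etc/haproxy/haproxy.cfg (adjust the path if yours differs). The function name check_haproxy_config is only an illustrative wrapper around HAProxy's -c (check) and -f (file) flags:

```shell
# Illustrative helper: validate an HAProxy config file without starting the proxy.
# Assumption: the default path below matches where you saved haproxy.cfg.
check_haproxy_config() {
  cfg="${1:-/etc/haproxy/haproxy.cfg}"
  # -c checks the configuration and exits; -f names the file to read.
  haproxy -c -f "$cfg"
}
```

Calling check_haproxy_config with no argument checks the default path. A non-zero exit status means HAProxy found a problem in the file.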

Start HAProxy:

sudo systemctl start haproxy

Now direct your traffic through HAProxy. From the machine running HAProxy:

curl http://localhost:5984/

You should see the same CouchDB welcome message. HAProxy is now your single entry point, distributing requests in turn across the healthy nodes of your CouchDB cluster.
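
To confirm the proxy keeps answering across repeated requests, a small loop is enough. The sketch below assumes it runs on the HAProxy host and that curl is installed; probe_proxy is a hypothetical helper name. It prints only the HTTP status of each request:

```shell
# Illustrative helper: send several requests through the proxy and print
# the HTTP status code of each one.
# Assumption: HAProxy listens on localhost:5984 unless another URL is given.
probe_proxy() {
  url="${1:-http://localhost:5984/}"
  count="${2:-6}"
  i=1
  while [ "$i" -le "$count" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url")
    echo "request $i: HTTP $code"
    i=$((i + 1))
  done
}
```

Run it as probe_proxy, or probe_proxy http://localhost:5984/ 10 for ten requests. The welcome document looks much the same from every node, so this output does not reveal which backend answered; with option httplog enabled, HAProxy's own log records the server chosen for each request.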

balance roundrobin is a good starting point, but HAProxy only decides which node receives each incoming request; CouchDB’s own sharding and internal replication determine where data lives and how it is read. For more advanced scenarios, you might explore balance leastconn, which favors the server with the fewest active connections and suits long-lived requests such as continuous _changes feeds, or a more specific health check such as CouchDB’s /_up endpoint if the root endpoint isn’t sufficient.
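
One candidate for a more specific health check is CouchDB's /_up endpoint (available in CouchDB 2.0 and later), which returns 200 when the node is ready to serve requests and, per the CouchDB documentation, 404 when the node is in maintenance mode. The sketch below queries it on each node directly, bypassing HAProxy. check_nodes is a hypothetical helper name, and the addresses in the usage note are the ones from haproxy.cfg above:

```shell
# Illustrative helper: print the HTTP status each node returns for /_up.
# Assumption: every node listens on port 5984, as in the backend definition.
check_nodes() {
  for node in "$@"; do
    # -s: silent, -o /dev/null: discard the body, -w: print the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "http://$node:5984/_up")
    echo "$node: HTTP $code"
  done
}
```

For example: check_nodes 192.168.1.101 192.168.1.102 192.168.1.103. A node that prints anything other than HTTP 200 is one a health check against /_up would take out of rotation.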

The real work behind CouchDB’s distributed queries happens in its clustering layer, which splits each database into shards and keeps copies of each shard on several nodes. When a query hits a node, that node acts as the coordinator and might need to communicate with other nodes in the cluster to gather all the necessary data. HAProxy simply ensures that no single CouchDB node becomes a bottleneck for incoming requests.

The most surprising thing about using HAProxy with CouchDB is how little configuration is actually needed on the CouchDB side. CouchDB is designed to be cluster-aware, so as long as your nodes can communicate with each other, HAProxy just needs to know their addresses. The httpchk is key here; it leverages CouchDB’s own basic HTTP interface to determine if a node is alive and responding.

Consider a scenario where you have a very active _all_docs query on a large database. CouchDB has no "leader" node for a shard: whichever node receives the request coordinates it, reads from a copy of each shard (locally or on other nodes), and merges the results before returning them to the client. HAProxy’s job is to ensure that the initial request has the best chance of reaching a responsive node, but the underlying CouchDB cluster handles the data aggregation.
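
From the client's side, such a query looks the same no matter which node ends up coordinating it. A minimal sketch, assuming HAProxy listens on localhost:5984; all_docs is a hypothetical helper name and the database name is whatever the caller passes in:

```shell
# Illustrative helper: fetch the first rows of _all_docs through the proxy.
# Assumptions: HAProxy listens on localhost:5984; the database name is
# supplied by the caller ("mydb" in the usage note is a placeholder).
all_docs() {
  db="$1"
  limit="${2:-5}"
  # limit caps the number of rows returned by the view.
  curl -s "http://localhost:5984/$db/_all_docs?limit=$limit"
}
```

For example, all_docs mydb 10 asks for the first ten rows of a database named mydb. Depending on the database's security settings, the request may also need credentials.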

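The next step for many users is deciding whether they need sticky sessions, especially if they are using CouchDB’s _session API for cookie authentication. The session cookie is signed with a server secret, so any node can validate it only when all nodes share the same secret. If they do not, the cookie is tied to the node that issued it, and that client-server affinity is something you might want to preserve across HAProxy.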
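
A quick way to find out whether stickiness matters in a given cluster is to log in through the proxy and then reuse the cookie on several requests, which round-robin will spread across nodes. The sketch below assumes HAProxy listens on localhost:5984. couch_login and couch_whoami are hypothetical helper names, and the user name, password, and cookie-jar path are placeholders:

```shell
# Illustrative helpers for cookie authentication through the proxy.
# Assumptions: HAProxy listens on localhost:5984; credentials and the
# cookie-jar path are supplied by the caller.
couch_login() {
  user="$1"; pass="$2"; jar="${3:-cookies.txt}"
  # POST /_session with form-encoded credentials; -c saves the session cookie.
  curl -s -c "$jar" \
    --data-urlencode "name=$user" \
    --data-urlencode "password=$pass" \
    http://localhost:5984/_session
}

couch_whoami() {
  jar="${1:-cookies.txt}"
  # -b sends the saved cookie; GET /_session reports the current user context.
  curl -s -b "$jar" http://localhost:5984/_session
}
```

After couch_login alice secret (placeholder credentials), call couch_whoami several times. If some responses do not report the logged-in user, the nodes probably disagree about the cookie, and either aligning their configuration or adding stickiness in HAProxy is worth investigating.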
