CouchDB’s distributed nature means you can scale it out, but getting traffic to all those nodes efficiently is where load balancing comes in.
Here’s CouchDB running, serving requests from a cluster of three nodes:
# On node 1
curl http://localhost:5984/
{
  "couchdb": "Welcome",
  "version": "3.3.1",
  "git_sha": "...",
  "uuid": "...",
  "features": [...]
}
Now, let’s set up HAProxy to distribute requests. We’ll treat CouchDB nodes as backend servers.
HAProxy Configuration
Create a file named haproxy.cfg (packaged installs on most Linux distributions read it from /etc/haproxy/haproxy.cfg):
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend couchdb_frontend
    bind *:5984
    default_backend couchdb_backend

backend couchdb_backend
    balance roundrobin
    option httpchk GET / HTTP/1.1\r\nHost:\ localhost
    server couchdb1 192.168.1.101:5984 check
    server couchdb2 192.168.1.102:5984 check
    server couchdb3 192.168.1.103:5984 check
Explanation:
global and defaults: Standard HAProxy setup for logging, user, timeouts, and error pages.
frontend couchdb_frontend: Listens on port 5984 for incoming CouchDB traffic.
backend couchdb_backend: Defines the pool of CouchDB servers.
balance roundrobin: Distributes requests sequentially to each server.
option httpchk GET / HTTP/1.1\r\nHost:\ localhost: Configures HAProxy to send a GET / request to each backend server to check its health. CouchDB’s root endpoint returns a 200 OK if healthy.
server couchdbX IP:PORT check: Defines each CouchDB node as a server and enables health checks.
Start HAProxy:
sudo systemctl start haproxy
Now, direct your traffic through HAProxy (run this on the HAProxy host, or substitute its address for localhost):
curl http://localhost:5984/
You should see the same CouchDB welcome message. HAProxy is now your single entry point, distributing requests across the healthy nodes of your CouchDB cluster.
balance roundrobin is a good starting point, but CouchDB’s own internal replication and sharding mechanisms also play a crucial role in how data is distributed and accessed. HAProxy only decides which node receives each incoming request. For more advanced scenarios, you might explore leastconn, which favors the server with the fewest active connections and can suit workloads with long-lived requests, or custom health checks if CouchDB’s basic health endpoint isn’t sufficient.
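As a sketch of the leastconn alternative, the backend below changes only the balance line. The server names and addresses are the ones used in the configuration above, and whether leastconn actually helps depends on your workload.

```haproxy
backend couchdb_backend
    # Send each new request to the server with the fewest active connections
    balance leastconn
    option httpchk GET / HTTP/1.1\r\nHost:\ localhost
    server couchdb1 192.168.1.101:5984 check
    server couchdb2 192.168.1.102:5984 check
    server couchdb3 192.168.1.103:5984 check
```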
The actual work behind CouchDB’s distributed queries happens in its clustering layer: each database is split into shards, and copies of each shard are kept on several nodes. When a query hits a node, that node acts as coordinator and may need to contact other nodes in the cluster to gather all the necessary data. HAProxy simply ensures that no single CouchDB node becomes a bottleneck for incoming requests.
The most surprising thing about using HAProxy with CouchDB is how little configuration is actually needed on the CouchDB side. CouchDB is designed to be cluster-aware, so as long as your nodes can communicate with each other, HAProxy just needs to know their addresses. The httpchk is key here; it leverages CouchDB’s own basic HTTP interface to determine if a node is alive and responding.
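CouchDB also exposes a dedicated /_up endpoint intended for load balancer checks. The sketch below points the health check at it, assuming CouchDB 2.0 or later and the same backend as above. The http-check disable-on-404 line is optional: it tells HAProxy to stop sending new traffic to a node that answers 404, which /_up does when the node is placed in maintenance mode.

```haproxy
backend couchdb_backend
    balance roundrobin
    # Probe CouchDB's health endpoint instead of the root URL
    option httpchk GET /_up
    # Treat a 404 from the check as "in maintenance": drain the node, don't mark it failed
    http-check disable-on-404
    server couchdb1 192.168.1.101:5984 check
    server couchdb2 192.168.1.102:5984 check
    server couchdb3 192.168.1.103:5984 check
```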
Consider a scenario where you have a very active _all_docs query on a large database. CouchDB clusters have no leader node: whichever node receives the request coordinates it, reading from its own shard copies where it has them and fetching the rest from the nodes that hold them before returning the merged result to the client. HAProxy’s job is to ensure that the initial request has the best chance of reaching a responsive node, but the underlying CouchDB cluster handles the data aggregation.
The next step for many users is implementing sticky sessions, especially if they are using CouchDB’s _session API for cookie authentication. CouchDB session cookies are signed rather than stored on the server, so any node configured with the same secret can validate them. Stickiness matters mainly when that is not guaranteed, or when you want a client to keep talking to the same node across requests.
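Cookie-based persistence is one common way to get that stickiness in HAProxy. The fragment below is a minimal sketch, not a tuned setup: the cookie name SERVERID and the per-server cookie values are arbitrary choices for illustration, and the addresses are the ones used earlier.

```haproxy
backend couchdb_backend
    balance roundrobin
    option httpchk GET / HTTP/1.1\r\nHost:\ localhost
    # HAProxy adds a SERVERID cookie naming the chosen server.
    # "indirect" strips it from requests before they reach CouchDB;
    # "nocache" marks responses that set the cookie as non-cacheable.
    cookie SERVERID insert indirect nocache
    server couchdb1 192.168.1.101:5984 check cookie couchdb1
    server couchdb2 192.168.1.102:5984 check cookie couchdb2
    server couchdb3 192.168.1.103:5984 check cookie couchdb3
```

A client's first request is still balanced round-robin. Later requests that carry the cookie go to the same server for as long as it passes its health check.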