An external etcd cluster for Kubernetes isn’t just a backup; it’s the primary data store for your entire cluster. Running it on dedicated machines can give you a more resilient foundation than the default stacked setup, where etcd runs on the control plane nodes themselves.
Let’s see what that looks like in practice. Imagine a simple Kubernetes cluster. The API server is the only component that talks to etcd directly; the controller manager and scheduler read and write cluster state through the API server.
Here’s a simplified, illustrative snippet of what that communication looks like from the API server’s perspective when it fetches a Pod definition:
I0810 10:00:00.123456 1 etcd.go:123] GET /registry/pods/default/my-pod
I0810 10:00:00.130000 1 etcd.go:124] etcd: received response for GET /registry/pods/default/my-pod (6ms)
When you run etcd externally, you’re creating a dedicated, highly available key-value store on machines separate from your Kubernetes control plane nodes. This means etcd can be scaled, patched, and managed independently. If you maintain one member at a time so that a majority stays up, the Kubernetes API remains available throughout.
The core problem this solves is decoupling etcd’s lifecycle from that of the Kubernetes control plane. In the default stacked setup, every control plane node also hosts an etcd member, so taking a control plane node down removes an etcd member with it, and once etcd loses quorum the cluster becomes read-only or completely unusable. With an external cluster, the remaining API servers keep working against the etcd cluster even when individual control plane nodes are down for maintenance or experiencing problems.
Here’s a typical diagram of how it’s structured:
+---------------------+     +---------------------+
| Kubernetes API Srv  |     | Kubernetes API Srv  |
+---------------------+     +---------------------+
           |                           |
           |   (etcd client traffic)   |
           v                           v
+---------------------+     +---------------------+     +---------------------+
|     etcd Node 1     | <-> |     etcd Node 2     | <-> |     etcd Node 3     |
+---------------------+     +---------------------+     +---------------------+
Each etcd node is a full member of the cluster, participating in Raft consensus. Raft needs a majority of members to agree, so in a three-node cluster the remaining two nodes can still elect a new leader and continue serving requests if one node fails.
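The majority rule behind this can be sketched with a little shell arithmetic. This is only an illustration of the quorum math; the function names quorum_size and fault_tolerance are made up for this example and are not part of etcd or etcdctl.

```shell
#!/bin/sh
# Quorum arithmetic for a majority-based (Raft) cluster such as etcd.
# Function names are hypothetical helpers for illustration only.

# quorum_size N: smallest number of members that forms a majority of N.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many members can fail while a majority remains.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few common cluster sizes.
for n in 1 3 4 5; do
  echo "members=$n quorum=$(quorum_size "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Three members tolerate one failure and five tolerate two, while a fourth member adds no tolerance over three. That is why odd cluster sizes are the usual choice.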
To set this up, you’ll need to provision separate machines (virtual or physical) for your etcd cluster. A minimum of three nodes is recommended for high availability. On each node, you’ll install the etcd binary.
The critical part is configuring etcd for clustering. This involves setting up peer discovery and client endpoints. For a static cluster, you’d typically use a configuration like this on each etcd node:
# /etc/etcd/etcd.conf.yml
name: etcd-node-1
data-dir: /var/lib/etcd
listen-peer-urls: http://<node1-ip>:2380
listen-client-urls: http://<node1-ip>:2379,http://127.0.0.1:2379
advertise-client-urls: http://<node1-ip>:2379
initial-advertise-peer-urls: http://<node1-ip>:2380
initial-cluster: etcd-node-1=http://<node1-ip>:2380,etcd-node-2=http://<node2-ip>:2380,etcd-node-3=http://<node3-ip>:2380
initial-cluster-state: new
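Replace <node1-ip>, <node2-ip>, and <node3-ip> with the actual IP addresses of your etcd nodes. On the other two nodes, change name and the listen and advertise URLs to that node’s own name and IP; initial-cluster stays identical everywhere because it lists all members. Use initial-cluster-state: new when bootstrapping a brand-new cluster. A member that joins an already running cluster uses existing instead.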
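Because the initial-cluster value must be identical on every node, it can help to generate it once instead of typing it three times. The following is a minimal sketch, assuming you keep your members as name=ip pairs. The helper name build_initial_cluster and the 192.0.2.x addresses (a documentation-only range) are placeholders, not values from a real cluster.

```shell
#!/bin/sh
# Build the comma-separated initial-cluster value from "name=ip" pairs.
# build_initial_cluster is a hypothetical helper, not an etcd tool.
# Usage: build_initial_cluster <scheme> <peer-port> name=ip [name=ip ...]
# The body runs in a subshell ( ... ) so its variables stay local.
build_initial_cluster() (
  scheme=$1
  port=$2
  shift 2
  result=""
  for pair in "$@"; do
    name=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    # Prepend a comma only when result already holds an entry.
    result="${result:+$result,}${name}=${scheme}://${ip}:${port}"
  done
  printf '%s\n' "$result"
)

# Placeholder member names and addresses for illustration.
build_initial_cluster http 2380 \
  etcd-node-1=192.0.2.11 etcd-node-2=192.0.2.12 etcd-node-3=192.0.2.13
```

The printed string can be pasted into initial-cluster in each node’s config file. Passing https as the scheme produces the TLS form of the same list.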
Once etcd is running and healthy, you’ll configure your Kubernetes API server to use this external cluster. This is done via the --etcd-servers flag. Instead of pointing to localhost:2379, it will point to the client endpoints of your external etcd cluster.
# Example kube-apiserver startup flag
--etcd-servers=http://<etcd-node1-ip>:2379,http://<etcd-node2-ip>:2379,http://<etcd-node3-ip>:2379
Running an external etcd cluster also means managing its TLS certificates, both for traffic between etcd peers and for traffic between the Kubernetes API server and etcd. This involves generating a CA certificate, a server certificate for each etcd node, and a client certificate for the API server. The http:// URLs in the examples above keep them short; once TLS is enabled, the URLs become https://, and the API server is given the CA and its client certificate through the --etcd-cafile, --etcd-certfile, and --etcd-keyfile flags. The etcdctl tool is invaluable here for checking cluster health:
ETCDCTL_API=3 etcdctl --endpoints=https://<etcd-node1-ip>:2379,https://<etcd-node2-ip>:2379,https://<etcd-node3-ip>:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint health
This command verifies that each etcd endpoint is reachable and healthy.
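If you want to turn that check into a yes/no answer about quorum, one option is to count the healthy endpoints in the command’s output. The sketch below assumes that each healthy endpoint is reported on a line containing "is healthy", which may vary between etcdctl versions, and it runs against hand-written sample text rather than output captured from a real cluster. The helper names count_healthy and has_quorum are hypothetical.

```shell
#!/bin/sh
# Summarize `etcdctl endpoint health` output against the quorum size.
# Assumption: healthy endpoints appear on lines containing " is healthy".

# count_healthy: read health output on stdin, print the healthy line count.
# grep -c exits non-zero when the count is 0, so "|| true" keeps that
# case from looking like a failure.
count_healthy() {
  grep -c ' is healthy' || true
}

# has_quorum TOTAL HEALTHY: succeed if HEALTHY is a majority of TOTAL.
has_quorum() {
  [ "$2" -ge $(( $1 / 2 + 1 )) ]
}

# Hand-written sample text for illustration (not from a real cluster).
sample='https://192.0.2.11:2379 is healthy: successfully committed proposal: took = 2.1ms
https://192.0.2.12:2379 is healthy: successfully committed proposal: took = 2.4ms
https://192.0.2.13:2379 is unhealthy: failed to commit proposal: context deadline exceeded'

healthy=$(printf '%s\n' "$sample" | count_healthy)
if has_quorum 3 "$healthy"; then
  echo "$healthy/3 endpoints healthy: quorum holds"
else
  echo "$healthy/3 endpoints healthy: quorum lost"
fi
```

Against a live cluster you would pipe the real command into count_healthy in place of the sample text. Adding 2>&1 to the etcdctl command is a reasonable precaution in case your version writes some of its report to stderr.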
A point that is easy to overlook about running an external etcd cluster is that your Kubernetes control plane nodes don’t need to run etcd at all. This is a significant shift from the default installation, where an etcd member runs on every control plane node. Removing etcd from control plane nodes reduces their resource footprint and attack surface, and crucially, a failed or misconfigured control plane node no longer takes an etcd member down with it. It also lets you scale and patch the etcd data layer on its own schedule, independent of control plane node maintenance.
After you have configured your API server to point to the external etcd cluster and verified its health, note that the controller manager and scheduler need no etcd configuration of their own. They reach cluster state only through the API server, so --etcd-servers is set on kube-apiserver alone. The next immediate task is to make sure every API server instance in the control plane points to the same external etcd endpoints and uses valid client certificates.