The most surprising thing about etcd upgrades is that they are designed to be non-disruptive, even though etcd is the beating heart of Kubernetes.

Let’s watch etcd do its thing during an upgrade. Imagine we have a three-node etcd cluster, with members etcd-0, etcd-1, and etcd-2.

# On etcd-0:
ETCDCTL_API=3 etcdctl member list -w table

This command shows us the current state of our etcd cluster. We’ll see something like:

+------------------+---------+--------+-----------------------+-----------------------+------------+
|        ID        | STATUS  |  NAME  |      PEER ADDRS       |     CLIENT ADDRS      | IS LEARNER |
+------------------+---------+--------+-----------------------+-----------------------+------------+
| 3990636538471772 | started | etcd-0 | http://10.0.1.10:2380 | http://10.0.1.10:2379 |      false |
| 7341851309492402 | started | etcd-1 | http://10.0.1.11:2380 | http://10.0.1.11:2379 |      false |
| 9182924429416449 | started | etcd-2 | http://10.0.1.12:2380 | http://10.0.1.12:2379 |      false |
+------------------+---------+--------+-----------------------+-----------------------+------------+

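Now, we want to upgrade etcd from version 3.5.9 to 3.5.10. The key to a rolling upgrade is that etcd members running different versions can operate together while the upgrade is in progress, as long as the newer version can still communicate with the older one. Two things make this work: the peer protocol is backward compatible between adjacent releases, and the cluster only needs a quorum (a majority of members) to stay available, so one member at a time can be taken down and restarted.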
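Before starting, it can help to sanity-check that the planned jump is one a rolling upgrade is expected to handle. The sketch below assumes the common guidance of staying within the same minor release or moving up by exactly one minor version; the function name upgrade_path_ok is made up for this example, and it deliberately ignores the patch number.

```shell
# upgrade_path_ok FROM TO
# Succeeds when TO has the same major version as FROM and the minor version
# is either unchanged or exactly one higher. Patch versions are not compared.
# (Hypothetical helper for illustration; not part of etcd.)
upgrade_path_ok() {
  from_major=${1%%.*}; from_rest=${1#*.}; from_minor=${from_rest%%.*}
  to_major=${2%%.*};   to_rest=${2#*.};   to_minor=${to_rest%%.*}
  [ "$from_major" -eq "$to_major" ] || return 1
  step=$(( to_minor - from_minor ))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}

# Example:
# upgrade_path_ok 3.5.9 3.5.10 && echo "rolling upgrade path looks fine"
```

For the 3.5.9 to 3.5.10 upgrade in this walkthrough the check passes trivially, since only the patch version changes.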

Here’s how it works:

  1. Upgrade one member: We stop etcd-0, upgrade its binary to 3.5.10, and restart it. Crucially, etcd’s peer protocol is backward compatible, so the new 3.5.10 etcd-0 can still talk to the older 3.5.9 members (etcd-1, etcd-2).
  2. Maintain quorum: As long as a majority of etcd members (the quorum) are running and can communicate, the cluster remains operational. With three members, two are needed for quorum, so the cluster keeps serving requests while one member is down for its upgrade. Don’t stop a second member until the first has rejoined and reports healthy.
  3. Repeat: We then move to etcd-1, upgrade it, and restart. Now, two members are on 3.5.10, and one is on 3.5.9. The cluster still functions.
  4. Final upgrade: Finally, we upgrade etcd-2. Once all members are on 3.5.10, the cluster is fully upgraded.
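The quorum arithmetic behind step 2 is just "more than half". A minimal sketch, with function names chosen for this example:

```shell
# quorum_size N: smallest majority of an N-member cluster.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many members can be down while quorum still holds.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 3 5; do
  echo "members=$n quorum=$(quorum_size "$n") can_lose=$(fault_tolerance "$n")"
done
```

For the three-member cluster here, quorum is 2 and exactly one member can be offline, which is why the upgrade has to proceed one member at a time.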

The actual process involves stopping the etcd service, replacing the binary, and starting it again. For example, on a system using systemd:

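# On etcd-0 (example):
sudo systemctl stop etcd
# Replace the etcd binary with the new version (paths are examples)
sudo cp /usr/local/bin/etcd-new /usr/local/bin/etcd
sudo systemctl start etcd
# Confirm the member is serving again before touching the next one
ETCDCTL_API=3 etcdctl endpoint health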
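A freshly restarted member can take a moment to rejoin, so it is worth polling rather than checking once. The helper below is a sketch: wait_healthy is a made-up name, the retry count and delay are arbitrary defaults, and it relies only on etcdctl endpoint health exiting non-zero when the endpoint is not healthy.

```shell
# wait_healthy ENDPOINT [ATTEMPTS] [DELAY_SECONDS]
# Polls `etcdctl endpoint health` until it succeeds or attempts run out.
# (Hypothetical helper; defaults of 30 attempts and 2 seconds are assumptions.)
wait_healthy() {
  endpoint=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if etcdctl --endpoints="$endpoint" endpoint health >/dev/null 2>&1; then
      echo "healthy: $endpoint"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts: $endpoint" >&2
  return 1
}

# Example, using the client URL of etcd-0 from the member list:
# wait_healthy http://10.0.1.10:2379 && echo "safe to upgrade the next member"
```

Gating each step on this check keeps two members from being down at the same time, which would cost the cluster its quorum.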

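The critical part is ensuring each member’s etcd configuration (usually in /etc/etcd/etcd.conf.yml or passed via command-line flags) is left unchanged by the upgrade, especially the data directory and the peer settings such as --initial-cluster and --listen-peer-urls. Only the binary should change; the member must come back with the same identity and the same data.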

The etcdctl command-line tool should also be upgraded to match the cluster version. An older etcdctl will often keep working against a newer cluster, particularly across a patch release, but it can lack newer subcommands or flags. So, after upgrading the etcd binaries, you’d also upgrade etcdctl.

# On your workstation or a control plane node:
ETCDCTL_API=3 etcdctl version

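This command shows the version of the etcdctl client you’re using, not the version of the servers. You’ll want it to match the upgraded cluster version, which you can read from the VERSION column of etcdctl endpoint status -w table.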
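If you want to script that check, one option is to parse the first line of the etcdctl version output. This sketch assumes the output begins with a line of the form "etcdctl version: 3.5.10"; the function names are invented for the example.

```shell
# client_version: reads `etcdctl version` output on stdin and prints the
# client version, e.g. "3.5.10". Assumes a line "etcdctl version: X.Y.Z".
client_version() {
  awk -F': ' '/^etcdctl version/ { print $2 }'
}

# check_client_matches EXPECTED: compares the local etcdctl against EXPECTED.
# (Hypothetical helper for illustration.)
check_client_matches() {
  expected=$1
  actual=$(etcdctl version | client_version)
  if [ "$actual" = "$expected" ]; then
    echo "etcdctl $actual matches"
  else
    echo "etcdctl is $actual, expected $expected" >&2
    return 1
  fi
}

# Example:
# check_client_matches 3.5.10
```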

The levers you control are primarily the etcd binary version on each node, the order and pacing of the restarts, and keeping each member’s peer URLs unchanged so the members can find each other again after a restart. The peer protocol handles the rest, allowing for seamless transitions as long as quorum is maintained.

The mental model here is one of graceful degradation and forward/backward compatibility. etcd is designed so that a minority of nodes can be temporarily out of sync or even offline without affecting the cluster’s ability to serve reads and writes, provided the majority remains available.

A common pitfall is forgetting to upgrade etcdctl itself. If you manage an upgraded etcd cluster with an older etcdctl, you may run into confusing errors or find that newer options are missing, and the larger the version gap, the more likely that becomes.

The next concept you’ll likely encounter is upgrading the Kubernetes control plane components that rely on etcd, such as the API server, controller manager, and scheduler.
