etcd leases are the unsung heroes of ephemeral state management, ensuring that critical information doesn’t linger indefinitely, hogging resources or becoming stale.
Let’s see how this plays out with a simple example. Imagine we have a service that needs to register its presence with a short-lived key. If the service crashes or becomes unresponsive, we want that registration to automatically disappear.
# Start a single-node etcd instance
# (Assuming you have the etcd and etcdctl binaries in your PATH.)
# This is a simplified example for demonstration; a real cluster setup is more involved.
etcd --listen-client-urls http://127.0.0.1:2379 --advertise-client-urls http://127.0.0.1:2379 \
--listen-peer-urls http://127.0.0.1:2380 --initial-advertise-peer-urls http://127.0.0.1:2380 \
--name my-node --initial-cluster my-node=http://127.0.0.1:2380 \
--initial-cluster-token etcd-cluster-1 --initial-cluster-state new
# In a separate terminal, confirm the node is up
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 member list
# Now, let's create a lease with a 10-second TTL
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 lease grant 10
The output will look something like this:
lease 0a1b2c3d4e5f6789 granted with TTL(10s)
Here 0a1b2c3d4e5f6789 is your lease ID. etcdctl prints it in hexadecimal and expects it in the same form. Now, let's attach a key to this lease:
# Put a key 'my-service/heartbeat' with value 'alive' attached to the lease
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 put --lease=0a1b2c3d4e5f6789 my-service/heartbeat "alive"
etcdctl prints OK to confirm the write. If you check the key's value:
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 get my-service/heartbeat
etcdctl prints the key followed by its value, alive. Now, wait for more than 10 seconds.
# Wait for 12 seconds
sleep 12
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 get my-service/heartbeat
The output will be empty, indicating the key has been automatically deleted. This is the core of lease management: the key is tied to the lease, and when the lease expires, the key goes with it.
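To make that lifecycle concrete, here is a toy, in-memory Python model of the behaviour just demonstrated. A key attached to a lease disappears once the lease's TTL elapses, while a key with no lease stays put. Everything here (the ToyLeaseStore class, its manual clock and its methods) is a hypothetical illustration and not etcd's client API.

```python
class ToyLeaseStore:
    """Hypothetical toy model of etcd lease semantics (not etcd's API)."""

    def __init__(self):
        self.now = 0.0      # simulated clock, in seconds
        self.next_id = 1
        self.expiry = {}    # lease_id -> absolute expiry time
        self.keys = {}      # key -> (value, lease_id or None)

    def grant(self, ttl):
        # Record an absolute deadline of now + TTL for the new lease.
        lease_id = self.next_id
        self.next_id += 1
        self.expiry[lease_id] = self.now + ttl
        return lease_id

    def put(self, key, value, lease=None):
        if lease is not None and lease not in self.expiry:
            raise KeyError(f"lease {lease} not found")
        self.keys[key] = (value, lease)

    def advance(self, seconds):
        # Move the simulated clock forward, then drop every lease whose
        # deadline has passed together with all keys attached to it.
        self.now += seconds
        expired = {lid for lid, t in self.expiry.items() if t <= self.now}
        for lid in expired:
            del self.expiry[lid]
        self.keys = {k: v for k, v in self.keys.items() if v[1] not in expired}

    def get(self, key):
        entry = self.keys.get(key)
        return entry[0] if entry else None


store = ToyLeaseStore()
lease = store.grant(10)
store.put("my-service/heartbeat", "alive", lease=lease)
print(store.get("my-service/heartbeat"))  # alive
store.advance(12)
print(store.get("my-service/heartbeat"))  # None
```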
The problem leases solve is managing the lifecycle of dynamic, potentially short-lived configuration or state. In distributed systems, services often need to announce their presence, provide health checks, or store temporary data. Without a mechanism like leases, you’d need to implement custom cleanup logic in every service, which is complex and error-prone. A crashed service would leave behind stale entries, leading to incorrect routing, wasted resources, or security vulnerabilities. Leases offload this cleanup to etcd itself.
Conceptually, each lease has a deadline. When a lease is granted, etcd records an expiry time of now plus the TTL. If that deadline passes without the lease being renewed, etcd (specifically, the current leader) revokes the lease, which deletes every key attached to it. Crucially, you can renew ("keep alive") a lease before it expires, which resets its deadline to a full TTL from the moment of renewal. This is how a healthy service keeps its registration alive.
# Grant a new lease with a 30-second TTL
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 lease grant 30
# Let's say the lease ID is 'abcdef1234567890'
# Put a key associated with this lease
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 put --lease=abcdef1234567890 my-service/status "healthy"
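# Now, renew the lease after 15 seconds. etcdctl calls renewal a keep-alive;
# --once sends a single renewal and exits.
sleep 15
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 lease keep-alive --once abcdef1234567890
etcdctl confirms the renewal with the lease's reset TTL, in a line like lease abcdef1234567890 keepalived with TTL(30). If successful, your key my-service/status will remain in etcd for another 30 seconds from the renewal time. Without --once, lease keep-alive keeps renewing in the foreground until you stop it. This is the heartbeat mechanism in action.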
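The timing rule behind the heartbeat can be sketched as a small Python function. It uses a simplified model: the hypothetical function key_alive_at measures times in seconds since the grant. Each renewal that arrives before the deadline resets it to the renewal time plus the TTL. A renewal that arrives after the deadline cannot bring the lease back.

```python
def key_alive_at(ttl, renewals, t):
    """Hypothetical toy model: does a key attached to a lease granted at
    time 0 with the given TTL still exist at time t, given the times at
    which the lease was renewed?"""
    deadline = ttl
    for r in sorted(renewals):
        if r > t:
            break
        if r >= deadline:
            # The lease had already expired; a late keep-alive cannot revive it.
            return False
        deadline = r + ttl  # each successful renewal grants a full TTL again
    return t < deadline


heartbeats = list(range(10, 61, 10))     # renew every 10 s until a crash at t=60
print(key_alive_at(30, heartbeats, 89))  # True: deadline is 60 + 30 = 90
print(key_alive_at(30, heartbeats, 90))  # False: the lease has expired
print(key_alive_at(30, [15], 40))        # True: renewing at 15 s moved the deadline to 45 s
print(key_alive_at(30, [], 40))          # False: never renewed, expired at 30 s
```

Renewing well inside the TTL leaves room for a missed heartbeat. For example, etcd's Go client schedules keep-alives at roughly a third of the TTL.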
The exact levers you control are the TTL when granting a lease and the lease ID when attaching keys or renewing. You can also revoke a lease explicitly, which deletes all of its attached keys immediately instead of waiting for the TTL to run out.
# Revoke a lease immediately
ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 lease revoke abcdef1234567890
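A single lease can carry any number of keys. That is why one revoke, or one expiry, can clean up everything a service registered. The toy Python model below illustrates this one-to-many relationship. The ToyLessor class and the sample endpoint value are hypothetical.

```python
class ToyLessor:
    """Hypothetical toy model: one lease carries many keys, and revoking
    the lease deletes all of them at once."""

    def __init__(self):
        self.attached = {}  # lease_id -> set of keys attached to that lease
        self.kv = {}        # key -> value

    def grant(self, lease_id):
        self.attached[lease_id] = set()

    def put(self, key, value, lease_id):
        # A key belongs to at most one lease: re-putting it with another
        # lease detaches it from the old one.
        for keys in self.attached.values():
            keys.discard(key)
        self.attached[lease_id].add(key)
        self.kv[key] = value

    def revoke(self, lease_id):
        # Removing the lease removes every key attached to it in one step.
        for key in self.attached.pop(lease_id):
            self.kv.pop(key, None)


store = ToyLessor()
store.grant(0xABCDEF1234567890)
store.put("my-service/status", "healthy", 0xABCDEF1234567890)
store.put("my-service/endpoint", "10.0.0.5:8080", 0xABCDEF1234567890)  # hypothetical value
store.revoke(0xABCDEF1234567890)
print(store.kv)  # {}
```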
A common pattern is to use leases for distributed locks. A client trying to acquire a lock would:
- Grant a lease.
- Attempt to create a key (e.g., /locks/myresource) with put --lease=<lease-id>, inside a transaction that succeeds only if the key does not already exist. A plain put would not fail here; it would silently overwrite the current holder's key.
- If the transaction succeeds, the client holds the lock, and its lease (kept alive by renewals) keeps the key alive.
- If the transaction fails because the key already exists, the client needs to wait and retry.
- If the client holding the lock crashes, its lease expires, the key is deleted, and the lock is automatically released.
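The whole lock pattern depends on the difference between a blind put and a create-if-absent transaction. Here is a toy Python model of it. The ToyKV class and its methods are hypothetical. With etcdctl, the conditional write corresponds roughly to a txn whose compare is create("/locks/myresource") = "0", meaning the key does not currently exist.

```python
import threading


class ToyKV:
    """Hypothetical toy store contrasting a blind put with an atomic
    create-if-absent transaction."""

    def __init__(self):
        self._lock = threading.Lock()  # makes each operation atomic across threads
        self._data = {}                # key -> (value, lease_id)

    def put(self, key, value, lease_id):
        # Blind put: silently overwrites whoever held the key before.
        with self._lock:
            self._data[key] = (value, lease_id)

    def txn_create(self, key, value, lease_id):
        # In spirit: IF the key does not exist THEN put ELSE fail.
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = (value, lease_id)
            return True

    def owner(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def expire_lease(self, lease_id):
        # Simulates the holder's lease running out: its keys are deleted.
        with self._lock:
            self._data = {k: v for k, v in self._data.items() if v[1] != lease_id}


kv = ToyKV()
print(kv.txn_create("/locks/myresource", "client-a", lease_id=1))  # True: A holds the lock
print(kv.txn_create("/locks/myresource", "client-b", lease_id=2))  # False: B must wait and retry
kv.expire_lease(1)                                                   # A crashes; its lease expires
print(kv.txn_create("/locks/myresource", "client-b", lease_id=2))  # True: B acquires the lock
kv.put("/locks/myresource", "client-c", lease_id=3)                 # a blind put steals it
print(kv.owner("/locks/myresource"))                                 # client-c
```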
The most surprising thing about leases is that etcd doesn't poll each lease individually. Instead, it keeps each lease's absolute expiry time in a priority queue (a min-heap) ordered by deadline. Periodically (about every 500 ms in recent releases), the lessor on the leader takes whatever has expired off the front of that queue and revokes it. As a result, the overhead stays low even with thousands of leases, because each check looks only at the leases whose deadlines have passed rather than scanning every lease.
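A priority queue of deadlines is easy to sketch with Python's heapq. This simplified illustration assumes that renewals push a fresh entry and that stale entries are skipped lazily. The ExpiryQueue class is hypothetical, and etcd's own Go implementation differs in detail.

```python
import heapq


class ExpiryQueue:
    """Hypothetical min-heap of (expiry_time, lease_id) pairs. Finds expired
    leases without scanning every lease on each tick."""

    def __init__(self):
        self._heap = []
        self._expiry = {}  # lease_id -> current (authoritative) expiry time

    def schedule(self, lease_id, expiry):
        # Granting or renewing pushes a new entry; older entries for the
        # same lease become stale and are ignored when popped.
        self._expiry[lease_id] = expiry
        heapq.heappush(self._heap, (expiry, lease_id))

    def revoke(self, lease_id):
        self._expiry.pop(lease_id, None)

    def pop_expired(self, now):
        # Only entries at the head of the heap with deadline <= now are examined.
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expiry, lease_id = heapq.heappop(self._heap)
            if self._expiry.get(lease_id) == expiry:  # skip stale entries
                del self._expiry[lease_id]
                expired.append(lease_id)
        return expired


q = ExpiryQueue()
q.schedule("a", 10)
q.schedule("b", 30)
q.schedule("c", 20)
q.schedule("a", 40)       # lease "a" renewed: new deadline 40
print(q.pop_expired(25))  # ['c']  (the old entry for "a" at 10 is stale)
print(q.pop_expired(45))  # ['b', 'a']
```

Each check costs O(log n) per entry it actually pops. A tick where nothing has expired only looks at the head of the heap.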
The next thing you’ll run into is how to robustly implement distributed locking using these leases, particularly handling the race condition between checking for an existing lock and acquiring it.