etcd’s Go client ships a distributed mutex in its concurrency package, so an application that already depends on etcd can build robust distributed locking directly into its logic without running a separate lock service for simple locking needs.

Let’s see this in action. Imagine two application instances, app-1 and app-2, both trying to acquire a lock for a critical resource, say, updating a shared configuration file.

package main

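import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints: []string{"localhost:2379"}, // Your etcd endpoint
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session grants a lease and keeps it alive in the background.
	// The TTL is given in seconds.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(30))
	if err != nil {
		log.Printf("Failed to create session: %v", err)
		return
	}
	// Closing the session revokes the lease, which also removes any lock key still attached to it.
	defer session.Close()

	mutex := concurrency.NewMutex(session, "/my-app/locks/config-update")

	// This deadline bounds only the wait for the lock, not the critical section.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	fmt.Println("Attempting to acquire lock...")
	if err := mutex.Lock(ctx); err != nil {
		// Return instead of calling log.Fatalf so the deferred cleanup runs.
		log.Printf("Failed to acquire lock: %v", err)
		return
	}
	fmt.Println("Lock acquired!")

	// --- Critical Section ---
	fmt.Println("Performing critical operation...")
	time.Sleep(5 * time.Second) // Simulate work
	fmt.Println("Critical operation finished.")
	// --- End Critical Section ---

	// Release with a fresh context: the acquisition deadline may already have passed.
	unlockCtx, cancelUnlock := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancelUnlock()
	if err := mutex.Unlock(unlockCtx); err != nil {
		log.Printf("Failed to release lock: %v", err)
		return
	}
	fmt.Println("Lock released.")
}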

When app-1 runs, it successfully acquires the lock. If app-2 runs concurrently, its mutex.Lock(ctx) call blocks until app-1 releases the lock or app-2’s context expires. The mutex lives in the client’s concurrency package: concurrency.NewSession grants a lease and keeps it alive, and concurrency.NewMutex creates a mutex object bound to that session and to a specific etcd key prefix (/my-app/locks/config-update in this case). The Lock method writes an ephemeral key (a key attached to the session’s lease) under that prefix. If an older key already exists under the prefix (meaning another client holds the lock or is ahead in line), Lock blocks until those keys are deleted, either explicitly or because their leases expired. Unlock simply deletes the mutex’s key, freeing the lock. The lease itself is revoked only when the session is closed.

The primary problem solved by distributed locks is preventing race conditions in distributed systems. When multiple processes or threads need to access a shared resource (like a database record, a file, or a critical code section), a distributed lock ensures that only one process can access it at a time, preventing data corruption or inconsistent states. The core mechanism leverages etcd’s leases and ephemeral keys, meaning keys attached to a lease. When a client acquires a lock, it creates such a key under its session’s lease. If the client crashes or disconnects, its keepalives stop, the lease expires once its TTL runs out, and etcd automatically deletes the key, releasing the lock. This provides a crucial safety net against deadlocks caused by client failures.

When you call mutex.Lock(ctx), the client library performs a series of operations against etcd. The lease already exists at this point: concurrency.NewSession granted it with the session’s TTL (60 seconds by default, configurable with concurrency.WithTTL) and renews it in the background. Lock first runs a transaction that writes a key named after the lease ID in hexadecimal (e.g., /my-app/locks/config-update/<lease-id>), attached to that lease, but only if the key does not exist yet. The same transaction reads the key with the lowest create revision under the prefix /my-app/locks/config-update/, which identifies the current owner. If that owner is your key, you hold the lock. Otherwise your client waits: it watches the key with the highest create revision below its own and, when that key is deleted (either by the owner unlocking or the lease expiring), checks again until no older key remains. Waiters therefore form a queue ordered by create revision. They do not race to re-create a key, and a release wakes only the next client in line. The Unlock operation is straightforward: it deletes the mutex’s key. The lease stays with the session until session.Close() revokes it.
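The ordering rule can be modeled without a running cluster. The following is a minimal in-memory sketch, not the etcd API: the store type, its methods, and the key names lock/a, lock/b, and lock/c are all hypothetical and only mimic how create revisions decide who holds the lock and which key each waiter would watch.

```go
package main

import "fmt"

// store is a toy stand-in for an etcd keyspace. It tracks only the create
// revision of each key and is NOT the etcd client API.
type store struct {
	rev  int64            // global revision counter, bumped on every write
	keys map[string]int64 // key -> create revision
}

func newStore() *store { return &store{keys: make(map[string]int64)} }

// enqueue models the "put only if the create revision is 0" transaction:
// the key is written if absent, and its create revision is returned either way.
func (s *store) enqueue(key string) int64 {
	if r, ok := s.keys[key]; ok {
		return r
	}
	s.rev++
	s.keys[key] = s.rev
	return s.rev
}

// remove models a key deletion (an unlock or a lease expiry).
func (s *store) remove(key string) {
	if _, ok := s.keys[key]; ok {
		s.rev++
		delete(s.keys, key)
	}
}

// owner returns the key with the lowest create revision: the lock holder.
func (s *store) owner() (string, bool) {
	best, bestRev, found := "", int64(0), false
	for k, r := range s.keys {
		if !found || r < bestRev {
			best, bestRev, found = k, r, true
		}
	}
	return best, found
}

// predecessor returns the key a waiter with create revision myRev would
// watch: the newest key that is still older than its own.
func (s *store) predecessor(myRev int64) (string, bool) {
	best, bestRev, found := "", int64(0), false
	for k, r := range s.keys {
		if r < myRev && (!found || r > bestRev) {
			best, bestRev, found = k, r, true
		}
	}
	return best, found
}

func main() {
	s := newStore()
	revA := s.enqueue("lock/a")
	revB := s.enqueue("lock/b")
	revC := s.enqueue("lock/c")

	holder, _ := s.owner()
	_, aWaits := s.predecessor(revA)
	waitB, _ := s.predecessor(revB)
	waitC, _ := s.predecessor(revC)
	fmt.Println("holder:", holder, "| a has to wait:", aWaits)
	fmt.Println("b watches:", waitB, "| c watches:", waitC)

	s.remove("lock/a") // the holder unlocks
	holder, _ = s.owner()
	fmt.Println("holder after release:", holder)
}
```

In this model each waiter depends on exactly one older key, which is why releasing the lock wakes a single client instead of every waiter at once.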

The context’s deadline or cancellation is critical. If acquiring the lock takes longer than the context allows, the Lock call returns an error and removes its pending key from the queue, preventing your application from blocking indefinitely. This is essential for maintaining responsiveness. The concurrency.NewMutex constructor takes a *concurrency.Session and the lock name (which becomes an etcd key prefix). The Lock and Unlock methods operate on this mutex object.

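A common pitfall is not setting a reasonable TTL for the session’s lease (concurrency.WithTTL, in seconds). The session renews the lease in the background, so a critical section that runs longer than the TTL does not lose the lock by itself. The lock is lost when renewals stop for longer than the TTL, for example during a network partition or a long process pause: etcd then deletes the key, another client can acquire the lock, and the original holder may still be working, leading to a race condition. A short TTL makes this more likely, and the channel returned by session.Done() tells you when it has happened. Conversely, setting the TTL too high means that a lock held by a crashed client is not released until the remaining TTL runs out.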

You can also use TryLock to attempt to acquire the lock without blocking. If the lock is already held, TryLock returns immediately with concurrency.ErrLocked (or with a context error if the context is done), allowing your application to take an alternative action instead of waiting.

The underlying mechanism relies on etcd’s transactions, which provide atomic compare-and-swap (CAS) semantics. When a client tries to acquire a lock, it runs a transaction that creates its key only if the key’s create revision is 0 (that is, the key does not exist yet). The client whose key has the lowest create revision under the prefix is the lock holder. The watch mechanism ensures that clients waiting for the lock are notified when the key ahead of them is deleted, so they do not need to poll.

The next step is understanding how to handle lock renewal in long-running critical sections.
