A distributed lock is essentially a mechanism that ensures only one process or thread across multiple machines can access a shared resource at any given time, preventing race conditions and data corruption. The real magic, and the terrifying part, is how these systems stay consistent when network partitions happen.

Imagine you have a critical microservice that updates a shared database table. Without a distributed lock, if two instances of this service receive requests simultaneously, they might both try to update the same row, leading to data loss or inconsistencies. A distributed lock solves this by having a single, agreed-upon "owner" of the lock that is allowed to perform the operation.

Here’s a simplified scenario in Python using redis-py and the redis-py-cluster client, assuming you have a Redis cluster set up. We’ll simulate acquiring a lock, performing an operation, and then releasing it.

import redis
import time
import uuid

# Assuming your Redis cluster is running on these nodes
# In a real scenario, you'd use redis_cluster.RedisCluster()
# This is a simplified example for a single Redis instance for clarity
r = redis.Redis(host='localhost', port=6379, db=0)

RESOURCE_KEY = "my_shared_resource"
LOCK_TIMEOUT = 10  # Lock expires after 10 seconds
LOCK_ID = str(uuid.uuid4()) # Unique identifier for this lock holder

def acquire_lock(resource_key, lock_id, timeout):
    """Attempts to acquire a distributed lock."""
    # NX: Only set the key if it does not already exist.
    # PX: Set the specified expire time, in milliseconds.
    return r.set(resource_key, lock_id, nx=True, px=timeout * 1000)

def release_lock(resource_key, lock_id):
    """Releases a distributed lock if we are the owner."""
    # Use a Lua script for atomic check-and-delete
    script = """
    if redis.call("get", KEYS[1]) == ARGV[1]
    then
        return redis.call("del", KEYS[1])
    else
        return 0
    end
    """
    return r.eval(script, 1, resource_key, lock_id)

def critical_operation():
    """The operation that needs to be protected by the lock."""
    print(f"[{LOCK_ID}] Performing critical operation...")
    time.sleep(5) # Simulate work
    print(f"[{LOCK_ID}] Critical operation finished.")

if __name__ == "__main__":
    print(f"[{LOCK_ID}] Attempting to acquire lock for {RESOURCE_KEY}...")
    if acquire_lock(RESOURCE_KEY, LOCK_ID, LOCK_TIMEOUT):
        print(f"[{LOCK_ID}] Lock acquired successfully!")
        try:
            critical_operation()
        finally:
            print(f"[{LOCK_ID}] Releasing lock...")
            if release_lock(RESOURCE_KEY, LOCK_ID):
                print(f"[{LOCK_ID}] Lock released successfully.")
            else:
                print(f"[{LOCK_ID}] Failed to release lock (might have expired or been taken by another).")
    else:
        print(f"[{LOCK_ID}] Could not acquire lock. Another process is likely holding it.")

This code demonstrates the basic acquire/release pattern. The set command with NX (Not Exists) and PX (milliseconds expiry) is the core of acquiring the lock. NX ensures only one client can set the key, and PX prevents deadlocks by automatically releasing the lock if the client holding it crashes. The release_lock uses a Lua script to atomically check if the current client is indeed the owner (by comparing LOCK_ID) before deleting the key, preventing a scenario where a client re-acquires a lock, and then a stale release_lock command deletes the new owner’s lock.

The fundamental problem distributed locks solve is mutual exclusion in a distributed environment. Without them, concurrent access to shared resources across multiple nodes would lead to unpredictable states. The underlying mechanism, often Redis, provides the atomic operations necessary for this.

Here’s how it works internally:

  1. Acquisition: A client wanting to access a resource tries to SET a specific key (e.g., lock:my_resource) to a unique value (its client ID) only if the key does not already exist (NX). It also sets an expiry time (PX in milliseconds). If the SET command returns OK, the client has acquired the lock. If it returns nil (or false), another client holds the lock.
  2. Operation: The client holding the lock performs its critical operation on the shared resource.
  3. Release: The client holding the lock attempts to DEL the key. However, this must be done carefully. If the client’s operation took longer than the lock’s expiry time, the lock might have expired and been acquired by another client. To prevent accidental deletion of another client’s lock, the release operation must atomically check if the key’s value still matches the client’s unique ID before deleting it. This is typically done with a Lua script executed on the Redis server.

The real challenge, and where "split-brain" becomes a concern, is when the network between nodes (or between clients and the lock service) becomes unreliable. If a client acquires a lock, but then its connection to the lock service is severed before it can release the lock, or even before it finishes its operation, that lock will remain held until it expires. If the client thinks it lost the lock but hasn’t, and another client acquires it, you can have two clients acting as if they own the lock.

The most surprising true thing about distributed locks is that even with systems like Redis, achieving perfect safety against split-brain scenarios without introducing significant latency or complexity is incredibly difficult, and most simple implementations are actually unsafe under certain network conditions. The common "set key NX PX" pattern is vulnerable.

Consider this scenario:

  1. Client A acquires lock my_lock with a 10-second TTL.
  2. Client A’s operation takes 12 seconds.
  3. At second 10, the lock expires in Redis.
  4. At second 11, Client B acquires lock my_lock.
  5. At second 12, Client A finishes its operation and proceeds to release the lock. If Client A simply calls DEL my_lock, it will delete Client B’s lock, causing a data corruption. This is why the atomic Lua script for release is crucial.

However, even with the atomic release, what if the network partition happens after Client A successfully acquires the lock but before its operation is fully committed, and the lock expires? Another client might then acquire the lock. The core problem is ensuring that a lock holder is actually the only one able to proceed with the critical operation.

The Redlock algorithm, proposed by Salvatore Fanciu, is a more robust approach that uses multiple independent Redis instances to mitigate single-point-of-failure issues and improve safety against network partitions. It involves acquiring locks on a majority of N Redis instances. However, Redlock itself has been subject to significant debate regarding its safety guarantees.

A more practical and widely adopted approach for safer distributed locking often involves a dedicated distributed consensus service like ZooKeeper or etcd, which are designed from the ground up to handle network partitions and provide strong consistency guarantees. These systems use algorithms like Paxos or Raft.

For instance, using Apache ZooKeeper, you’d create an ephemeral, sequential node under a designated lock path. The client that creates the node with the lowest sequence number among all active clients acquires the lock. If the client disconnects, its ephemeral node is automatically removed, releasing the lock.

// Example using Apache ZooKeeper (conceptual Java)
import org.apache.zookeeper.*;
import java.io.IOException;
import java.util.List;
import java.util.Collections;
import java.util.concurrent.CountDownLatch;

public class ZkDistributedLock {
    private ZooKeeper zk;
    private String lockPath = "/my_lock";
    private String nodeCreated;
    private final CountDownLatch connectedSignal = new CountDownLatch(1);

    public ZkDistributedLock(String connectString) throws IOException, InterruptedException {
        zk = new ZooKeeper(connectString, 5000, new Watcher() {
            public void process(WatchedEvent event) {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connectedSignal.countDown();
                }
            }
        });
        connectedSignal.await();
        // Ensure the lock path exists
        if (zk.exists(lockPath, false) == null) {
            zk.create(lockPath, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }

    public void acquireLock() throws KeeperException, InterruptedException {
        // Create an ephemeral, sequential node
        nodeCreated = zk.create(lockPath + "/lock-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        String[] split = nodeCreated.split("/");
        String selfNode = split[split.length - 1];

        while (true) {
            List<String> children = zk.getChildren(lockPath, false);
            Collections.sort(children); // Lexicographical sort is sufficient for sequential nodes

            if (selfNode.equals(children.get(0))) {
                // We are the lowest sequential node, we got the lock
                System.out.println("Lock acquired: " + nodeCreated);
                return;
            } else {
                // Find the node immediately preceding ours
                String predecessor = null;
                for (String child : children) {
                    if (child.compareTo(selfNode) < 0) {
                        predecessor = child;
                    } else {
                        break;
                    }
                }
                // Watch the predecessor node for deletion
                final String watchNode = lockPath + "/" + predecessor;
                final CountDownLatch latch = new CountDownLatch(1);
                zk.exists(watchNode, new Watcher() {
                    public void process(WatchedEvent event) {
                        if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                            latch.countDown();
                        }
                    }
                });
                latch.await(); // Wait until the predecessor is deleted
            }
        }
    }

    public void releaseLock() throws KeeperException, InterruptedException {
        if (nodeCreated != null) {
            zk.delete(nodeCreated, -1);
            System.out.println("Lock released: " + nodeCreated);
            nodeCreated = null;
        }
    }

    public void close() throws InterruptedException {
        zk.close();
    }

    public static void main(String[] args) throws Exception {
        ZkDistributedLock lock = new ZkDistributedLock("localhost:2181");
        try {
            lock.acquireLock();
            // --- Critical Section ---
            System.out.println("Performing critical operation...");
            Thread.sleep(5000); // Simulate work
            System.out.println("Critical operation finished.");
            // --- End Critical Section ---
        } finally {
            lock.releaseLock();
            lock.close();
        }
    }
}

The most critical aspect of ZooKeeper’s approach is the use of ephemeral nodes and sequential node creation. When a client creates an ephemeral node, ZooKeeper guarantees that if the client’s session expires or the connection is lost, the node will be automatically deleted. By creating sequential nodes and having clients watch the node immediately preceding theirs, they only proceed when the preceding node is deleted. This mechanism inherently handles network partitions because if a client holding a lock goes down, its ephemeral node is cleaned up, allowing the next in line to acquire the lock.

The one thing that most people don’t realize is that the "safety" of a distributed lock often comes down to the guarantees of the underlying coordination service, not just the lock implementation itself. Simple Redis locks, while easy to implement, are generally considered unsafe for critical operations where absolute correctness is paramount and network partitions are a real possibility.

The next concept you’ll likely grapple with is how to handle lock renewal and re-entrancy in distributed systems, especially when operations might exceed the initial lock timeout.

Want structured learning?

Take the full Distributed Systems course →