Distributed systems often use load balancing to distribute incoming traffic across multiple servers.

Let’s see how this plays out in practice with a simple HTTP server setup. Imagine we have three backend servers: 192.168.1.101, 192.168.1.102, and 192.168.1.103. A load balancer, say at 192.168.1.1, receives a stream of incoming requests from clients.

Here’s a simplified view of requests hitting the load balancer and being forwarded:

Client A -> 192.168.1.1 (request 1) -> 192.168.1.101 Client B -> 192.168.1.1 (request 2) -> 192.168.1.102 Client C -> 192.168.1.1 (request 3) -> 192.168.1.103 Client D -> 192.168.1.1 (request 4) -> 192.168.1.101 Client E -> 192.168.1.1 (request 5) -> 192.168.1.102

This is a typical scenario for Round-Robin, the simplest load balancing algorithm. Each incoming request is assigned to the next server in a circular list. It’s straightforward, guarantees even distribution over time, and requires no state beyond the list of available servers.

However, Round-Robin breaks down when individual client connections need to be sticky, meaning all requests from a single client must go to the same server. This is crucial for applications that maintain session state on the server, like shopping carts or user authentication. If a client’s requests are split across servers, their session data might be lost, leading to a broken experience.

This is where algorithms like Least Connections come into play. Instead of just cycling through servers, it sends the next request to the server with the fewest active connections.

Client A -> 192.168.1.1 (req 1) -> 192.168.1.101 (1 conn) Client B -> 192.168.1.1 (req 2) -> 192.168.1.102 (1 conn) Client A -> 192.168.1.1 (req 3) -> 192.168.1.101 (2 conns) Client C -> 192.168.1.1 (req 4) -> 192.168.1.103 (1 conn) Client B -> 192.168.1.1 (req 5) -> 192.168.1.102 (2 conns) Client A -> 192.168.1.1 (req 6) -> 192.168.1.103 (2 conns)

In this scenario, assuming server 192.168.1.101 becomes overloaded, the load balancer would start directing more traffic to 192.168.1.102 and 192.168.1.103. This is better for distributing load based on current activity, but it still doesn’t solve the session stickiness problem directly if the load balancer itself is being scaled or restarted.

The real challenge emerges when the backend server pool changes. If 192.168.1.102 goes down, or if we add 192.168.1.104, a simple Round-Robin or Least Connections algorithm would re-distribute all traffic. For a system with sticky sessions, this means every active client connection would be broken and potentially need to re-authenticate or lose their session state.

This is precisely the problem Consistent Hashing aims to solve. Instead of a linear list, Consistent Hashing maps both servers and requests onto a conceptual ring. When a server is added or removed, only a small fraction of keys (and thus, client connections) need to be remapped.

Imagine our servers and requests as points on a circle.

      ^ 0/360
     / \
    /   \
 90 <-----> 270
    \   /
     \ /
      v 180

Let’s say we hash our server IPs and client request identifiers (e.g., user IDs) to positions on this ring. A request is routed to the next server clockwise on the ring from its hashed position.

If 192.168.1.101 is at 30, 192.168.1.102 at 120, and 192.168.1.103 at 240. A request for User ID XYZ hashes to 75. It goes to 192.168.1.102 (the next server clockwise from 75). Another request for User ID ABC hashes to 200. It goes to 192.168.1.103 (the next server clockwise from 200).

Now, if 192.168.1.102 goes down, only the requests that would have gone to 192.168.1.102 (i.e., those hashing between 30 and 120) are now remapped. They will be directed to 192.168.1.103 (the next server clockwise from 120). Requests that were already going to 192.168.1.101 or 192.168.1.103 are unaffected. This minimizes disruption.

To improve distribution further, especially with a small number of servers, Consistent Hashing often uses virtual nodes. Instead of placing each physical server once on the ring, we place it multiple times. For example, 192.168.1.101 might have virtual nodes at 30, 150, and 270. This makes the distribution of keys more uniform across the physical servers, even when servers are added or removed.

If you’re implementing this, you’ll likely use libraries that handle the hashing and ring management. For instance, in Python, you might use uhashring. The core idea is that the hash function itself is applied to both server identifiers and request identifiers, and the positions on the ring are determined by the output of this hash function. The "consistent" part comes from the fact that if you add or remove just one server, the minimum number of keys that need to change their assigned server is guaranteed.

When adding a new server, say 192.168.1.104 at hash position 180, only requests that hash between 120 and 180 will be affected, and they will now go to 192.168.1.104 instead of 192.168.1.103. This is a massive improvement over remapping almost everything.

Many modern load balancers (like Nginx with its hash directive or cloud provider load balancers) allow you to configure consistent hashing, often using the client’s IP address or a specific request header as the key to hash.

The most common pitfall with consistent hashing is not using enough virtual nodes. If you have very few virtual nodes per physical server, you might still see uneven distribution when servers are added or removed, or even under normal operation, because the hash space isn’t granularly mapped.

The next problem you’ll run into is managing the health of these servers and ensuring the load balancer only uses healthy instances.

Want structured learning?

Take the full Distributed Systems course →