Envoy’s load balancing algorithms aren’t about picking the least busy upstream server, but about picking the server that will remain least busy for the longest time.
Let’s see how this plays out with actual traffic. Imagine we have a simple HTTP service, my-service, with three upstream instances: 10.0.0.1:8080, 10.0.0.2:8080, and 10.0.0.3:8080. We’re sending requests to Envoy, which is configured to load balance to these upstreams.
Here’s a snippet of Envoy’s configuration (in YAML) that sets up a basic cluster with round-robin load balancing:
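static_resources:
  clusters:
  - name: my_service
    connect_timeout: 0.25s
    type: STATIC  # the endpoints are fixed IP addresses, so no DNS resolution is needed
    lb_policy: ROUND_ROBIN
    load_assignment:  # v3 API; replaces the older hosts field
      cluster_name: my_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.1
                port_value: 8080
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.2
                port_value: 8080
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.3
                port_value: 8080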
When requests start flowing, Envoy, in ROUND_ROBIN mode, will send the first request to 10.0.0.1:8080, the second to 10.0.0.2:8080, and the third to 10.0.0.3:8080. The fourth request will go back to 10.0.0.1:8080, and so on. It’s a simple, predictable cycle.
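The cycle described above can be sketched in a few lines of Python. This is only an illustration of the rotation, not Envoy’s implementation; the HOSTS list and the round_robin helper are hypothetical names chosen for this example.

```python
from itertools import cycle

# Hypothetical host list mirroring the three upstreams in the cluster config.
HOSTS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


def round_robin(hosts):
    """Return an iterator that yields hosts in a fixed, repeating order."""
    return cycle(hosts)


picker = round_robin(HOSTS)
first_four = [next(picker) for _ in range(4)]
for number, host in enumerate(first_four, start=1):
    print(f"request {number} -> {host}")
```

The fourth request lands on the first host again, which is the wrap-around the text describes.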
Now, let’s switch to LEAST_REQUEST. We change lb_policy to LEAST_REQUEST. Envoy now needs to track the "load" on each upstream. By default, Envoy considers the load to be the number of currently active requests to an upstream host. If a request takes 100ms to process, it contributes to the "active request count" for that duration.
Consider this idealized scenario, in which no request finishes before the next one arrives and Envoy always picks the host with the fewest active requests:
- Request 1 arrives. Envoy picks 10.0.0.1:8080 (all have 0 active requests). 10.0.0.1 now has 1 active request.
- Request 2 arrives. Envoy picks 10.0.0.2:8080 (as it has fewer active requests than 10.0.0.1). 10.0.0.2 now has 1 active request.
- Request 3 arrives. Envoy picks 10.0.0.3:8080. 10.0.0.3 now has 1 active request.
- Request 4 arrives. All upstreams have 1 active request, so every host is an equally good choice. Suppose Envoy sends it to 10.0.0.1:8080. 10.0.0.1 now has 2 active requests.
- Request 5 arrives. Envoy sees 10.0.0.2 and 10.0.0.3 have 1 active request, while 10.0.0.1 has 2. It picks 10.0.0.2:8080. 10.0.0.2 now has 2 active requests.
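The five-request walkthrough above can be reproduced with a strict least-request picker. The sketch assumes that no request completes during the run and that ties go to the first listed host; both are simplifications chosen for illustration, and pick_least_loaded is a hypothetical helper rather than anything from Envoy.

```python
# Hypothetical host list mirroring the three upstreams in the cluster config.
HOSTS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


def pick_least_loaded(active):
    """Return the host with the fewest active requests.

    min() returns the first minimal entry, so ties go to the host that
    appears earliest in the dict (an assumption made for this sketch).
    """
    return min(active, key=active.get)


active = {host: 0 for host in HOSTS}  # active request count per host
picks = []
for _ in range(5):
    host = pick_least_loaded(active)
    active[host] += 1  # the request stays in flight for the whole run
    picks.append(host)

for number, host in enumerate(picks, start=1):
    print(f"request {number} -> {host}")
print(active)
```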
This continues, aiming to distribute the concurrent load. However, LEAST_REQUEST has a nuance: when all hosts have equal weights, Envoy does not scan every host for the absolute minimum. Instead, it picks a small number of hosts at random and sends the request to the one with the fewest active requests, a technique often called "power of two choices". The number of hosts sampled is configurable via choice_count, which defaults to 2. Sampling takes constant time regardless of cluster size, balances nearly as well as a full scan, and keeps many Envoy instances from all piling onto the same momentarily idle host.
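To get a feel for why sampling just two hosts is enough, here is a small simulation comparing purely random selection with a two-choice picker. It is a sketch under stated assumptions: ten hypothetical hosts, requests that never complete, and candidates sampled without replacement. None of this mirrors Envoy’s internals; only the choice_count default of 2 comes from the text.

```python
import random

# Ten hypothetical hosts; the addresses are made up for this example.
HOSTS = [f"10.0.0.{i}:8080" for i in range(1, 11)]


def pick_random(active, rng):
    """Baseline: pick any host uniformly at random."""
    return rng.choice(list(active))


def pick_p2c(active, rng, choice_count=2):
    """Sample choice_count hosts and keep the one with the fewest active requests."""
    candidates = rng.sample(list(active), choice_count)
    return min(candidates, key=active.get)


def simulate(pick, requests=10_000, seed=42):
    """Send requests that never finish and return the final per-host counts."""
    rng = random.Random(seed)
    active = {host: 0 for host in HOSTS}
    for _ in range(requests):
        active[pick(active, rng)] += 1
    return active


def spread(active):
    """Gap between the most and least loaded host."""
    return max(active.values()) - min(active.values())


random_load = simulate(pick_random)
p2c_load = simulate(pick_p2c)
print("random spread:    ", spread(random_load))
print("two-choice spread:", spread(p2c_load))
```

The two-choice spread should come out far smaller than the random one, even though each decision looks at only two hosts.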
Finally, RING_HASH. This is for stateful services where you want requests for a specific piece of data to always go to the same upstream instance. Think user sessions or caching. We configure this by setting lb_policy: RING_HASH on the cluster and telling Envoy what to hash with a hash_policy on the route. Common hash inputs are a request header, a cookie, or the client’s source IP.
static_resources:
  clusters:
  - name: my_service
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: RING_HASH
    ring_hash_lb_config:
      hash_function: XX_HASH  # the default; MURMUR_HASH_2 is the alternative
      minimum_ring_size: 1024  # the default; a larger ring spreads keys more evenly
    load_assignment:
      cluster_name: my_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.1
                port_value: 8080
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.2
                port_value: 8080
        - endpoint:
            address:
              socket_address:
                address: 10.0.0.3
                port_value: 8080
With lb_policy: RING_HASH, Envoy uses consistent hashing. Each upstream host is assigned multiple points on a virtual ring. When a request comes in, Envoy hashes a specific part of the request (e.g., a cookie value, or a header) to get a point on that same ring. The upstream host responsible for the first point encountered clockwise from the request’s hash point on the ring receives the request.
If the route’s hash_policy names a specific header, say x-user-id, then all requests with x-user-id: 123 will go to the same upstream for as long as the set of hosts stays the same. Requests with x-user-id: 456 will likewise consistently go to their assigned upstream. This is crucial for maintaining state.
Ring hash is a Ketama-style consistent hashing scheme, and that property is what matters here: when an upstream host is added to or removed from a set of N hosts, only about 1/N of keys (requests) are remapped to different hosts, minimizing disruption. With a naive scheme such as hash modulo host count, adding or removing a host would remap almost all requests. There is no flag to switch this behavior on; the tunable parts are the hash function and the ring size (minimum_ring_size and maximum_ring_size).
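The clockwise lookup on the ring can be sketched with a sorted list and a binary search. This is a minimal illustration, not Envoy’s implementation: the HashRing class, the use of SHA-256, and the 100 points per host are all assumptions made for the example.

```python
import bisect
import hashlib

# Hypothetical host list mirroring the three upstreams in the cluster config.
HOSTS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]


def stable_hash(value):
    """Map a string to a 64-bit integer (SHA-256 is an assumption for this sketch)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, hosts, points_per_host=100):
        # Place several points per host on the ring, sorted by hash value.
        self._ring = sorted(
            (stable_hash(f"{host}#{i}"), host)
            for host in hosts
            for i in range(points_per_host)
        )
        self._points = [point for point, _ in self._ring]

    def pick(self, key):
        """Return the host owning the first point at or after the key's hash."""
        index = bisect.bisect_left(self._points, stable_hash(key))
        return self._ring[index % len(self._ring)][1]  # wrap past the end


full_ring = HashRing(HOSTS)
smaller_ring = HashRing(HOSTS[:2])  # same ring with 10.0.0.3 removed
user_ids = [f"user-{i}" for i in range(1000)]
moved = [u for u in user_ids if full_ring.pick(u) != smaller_ring.pick(u)]
print(f"{len(moved)} of {len(user_ids)} keys moved after removing one host")
```

Only the keys that were owned by the removed host change owner; every key that mapped to a surviving host stays where it was.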
The most surprising true thing about Envoy’s load balancing is that ROUND_ROBIN isn’t always fair when upstream hosts have different processing speeds or connection capacities. Every host gets the same number of requests per rotation, so a slower host accumulates in-flight requests and can become overloaded while a faster host, receiving exactly the same share, sits partly idle.
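A toy simulation makes the imbalance concrete. It assumes two hypothetical hosts, one request arriving per tick, and each host finishing one tick of work per tick; the service times of 1 and 4 ticks are invented for illustration.

```python
def simulate_round_robin(service_ticks, total_ticks):
    """Return the unfinished work (in ticks) left on each host.

    One request arrives per tick and is assigned in strict rotation.
    Every host then completes one tick of work.
    """
    backlog = [0] * len(service_ticks)
    for tick in range(total_ticks):
        host = tick % len(service_ticks)      # strict rotation
        backlog[host] += service_ticks[host]  # the new request adds work
        backlog = [max(0, work - 1) for work in backlog]  # every host works one tick
    return backlog


# Hypothetical speeds: host 0 needs 1 tick per request, host 1 needs 4.
backlog = simulate_round_robin(service_ticks=[1, 4], total_ticks=100)
print("unfinished work per host:", backlog)
```

Both hosts receive 50 requests, yet the fast host ends with nothing queued while the slow host’s backlog keeps growing.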
This is why LEAST_REQUEST exists, but even then, its default "active requests" metric can be misleading if requests have vastly different durations. A few long-lived requests can keep a host’s active count high for a long time and steer traffic away from it, even when other upstreams are doing more actual work on short-lived requests. For more fine-grained control, Envoy also supports per-endpoint weights (load_balancing_weight), which turn ROUND_ROBIN into a weighted round robin, and RING_HASH hash policies based on request headers, cookies, or source IP.
The next concept to explore is health checking and how Envoy uses it to dynamically adjust its load balancing decisions.