Envoy’s rate limiting isn’t about dropping packets; it’s about telling clients "hold on a sec" before the upstream services behind Envoy get overwhelmed.
Let’s see it in action. Imagine a simple API service behind Envoy. We want each Envoy instance to let through no more than 10 requests per second.
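# envoy.yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      # HTTP filters live inside the HTTP connection manager network filter.
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: my_api
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: my_api_cluster }
          http_filters:
          - name: envoy.filters.http.local_ratelimit
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
              stat_prefix: local_ratelimit
              token_bucket:
                max_tokens: 10
                tokens_per_fill: 10
                fill_interval: 1s
              # Both fractions default to 0%, which would leave the filter inactive.
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value: { numerator: 100, denominator: HUNDRED }
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value: { numerator: 100, denominator: HUNDRED }
          # The router must be the last HTTP filter in the chain.
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: my_api_cluster
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: my_api_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }
Here, envoy.filters.http.local_ratelimit is configured with a token_bucket that holds up to 10 tokens and refills 10 tokens every second. Each incoming request consumes one token. If the bucket is empty, the request is rejected with 429 Too Many Requests. Three details are easy to miss. The filter sits inside the HTTP connection manager and ahead of the router, because the router must come last. filter_enabled and filter_enforced both default to 0%, so without them the filter does nothing. And the bucket is shared by every request passing through the filter, so it caps total traffic per Envoy instance rather than per client IP. There is no domain field here: domains belong to the global rate limit filter, where they select a set of rules in the rate limit service rather than match hostnames.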
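To make the token bucket behavior concrete, here is a minimal Python sketch. It is not Envoy's implementation; the class and method names are made up for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Toy token bucket mirroring max_tokens / tokens_per_fill / fill_interval."""

    def __init__(self, max_tokens, tokens_per_fill, fill_interval, now=0.0):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval  # seconds
        self.tokens = max_tokens            # assumption: the bucket starts full
        self.last_fill = now

    def allow(self, now):
        """Consume a token and return True if a request at time `now` may proceed."""
        # Add tokens for every whole fill interval elapsed since the last fill.
        intervals = int((now - self.last_fill) // self.fill_interval)
        if intervals > 0:
            refill = intervals * self.tokens_per_fill
            self.tokens = min(self.max_tokens, self.tokens + refill)
            self.last_fill += intervals * self.fill_interval
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # the proxy would answer 429 Too Many Requests


bucket = TokenBucket(max_tokens=10, tokens_per_fill=10, fill_interval=1.0)
burst = [bucket.allow(now=0.5) for _ in range(12)]
print(burst.count(True), burst.count(False))  # 10 2
print(bucket.allow(now=1.0))                  # True, the bucket refilled at t=1s
```

Setting max_tokens higher than tokens_per_fill would allow short bursts above the steady rate, which is the usual reason the two values are separate.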
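This local rate limiter is per-instance: each Envoy process keeps its own bucket and doesn’t coordinate with anything else. That makes it fast, but the limit isn’t shared, so a fleet of N Envoy instances admits up to N × 10 requests per second in total, and traffic spread across instances can still overwhelm the upstream. For global rate limiting, you need the Envoy Rate Limit Service (RLS).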
The RLS works by having Envoy ask a central service, "Can this request proceed?" The RLS evaluates a set of rules defined in its configuration. If the rules allow it, RLS tells Envoy "yes," and Envoy forwards the request. If not, RLS tells Envoy "no," and Envoy returns a 429.
Here’s a snippet of how Envoy talks to RLS:
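# envoy.yaml (additional config)
listeners:
- name: listener_0
  # ... other listener config ...
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        # ... stat_prefix, route_config ...
        http_filters:
        - name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: my_api.com
            timeout: 0.5s
            rate_limit_service:
              transport_api_version: V3
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_service
        # ... other filters, ending with the router ...
clusters:
- name: rate_limit_service
  connect_timeout: 1s
  type: STRICT_DNS
  lb_policy: ROUND_ROBIN
  # The RLS speaks gRPC, so this cluster must use HTTP/2.
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config:
        http2_protocol_options: {}
  load_assignment:
    cluster_name: rate_limit_service
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: 127.0.0.1, port_value: 8081 } # Assuming RLS runs on 8081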
Envoy calls the RLS with a RateLimitRequest that includes the domain and a list of descriptors. Each descriptor is a list of key-value entries that the RLS uses to identify the "thing" being rate-limited. For example, a descriptor with the single entry {key: "remote_address", value: "192.168.1.100"} means rate-limit by IP address.
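As a rough illustration of the request shape, the sketch below builds a plain-dict stand-in for a RateLimitRequest. These are not the real protobuf classes; the function names and the "abc123" API key are hypothetical, and the rule that a missing header yields no descriptor mirrors Envoy's default handling of header-based actions.

```python
def build_descriptor(remote_ip, headers):
    """Build one descriptor (a list of key/value entries) for a request.

    Returns None when the x-api-key header is missing, in which case
    no descriptor is sent for this rule.
    """
    api_key = headers.get("x-api-key")
    if api_key is None:
        return None
    return [
        {"key": "remote_address", "value": remote_ip},
        {"key": "api_key", "value": api_key},
    ]


def build_request(domain, remote_ip, headers):
    """Plain-dict stand-in for a RateLimitRequest: a domain plus descriptors."""
    descriptor = build_descriptor(remote_ip, headers)
    descriptors = [] if descriptor is None else [{"entries": descriptor}]
    return {"domain": domain, "descriptors": descriptors}


request = build_request("my_api.com", "192.168.1.100", {"x-api-key": "abc123"})
print(request["domain"], request["descriptors"])
```

The ordering of entries matters: the service matches them in sequence, so the same keys in a different order describe a different thing.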
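The RLS itself is a separate service. It does not decide which descriptors exist, though: Envoy builds them from rate_limits actions attached to a route or virtual host, and the RLS configuration only says which limit applies to each descriptor.
# envoy.yaml (inside a route or virtual host in route_config)
rate_limits:
- actions:
  - remote_address: {} # Rate limit by client IP
  - request_headers:
      header_name: "x-api-key"
      descriptor_key: "api_key" # Rate limit by API key header
  - generic_key:
      descriptor_key: "route"
      descriptor_value: "/products" # Rate limit specific routes
The matching limit lives in the RLS configuration. The layout below assumes the reference implementation, envoyproxy/ratelimit, which reads its Redis connection settings from environment variables rather than from this file.
# rls.yaml
domain: my_api.com
descriptors:
- key: remote_address # no value: matches any client IP
  descriptors:
  - key: api_key # no value: matches any x-api-key
    descriptors:
    - key: route
      value: "/products"
      rate_limit:
        unit: second
        requests_per_unit: 100
With this setup, each request on the route produces one descriptor with three entries: the client’s IP, the x-api-key value, and the fixed route entry. The RLS looks up the my_api.com domain, walks the nested descriptors, and counts the request against a limit of 100 per second for that combination of IP and API key. If the x-api-key header is missing, Envoy emits no descriptor for this rate_limits entry, so the request is not checked against this limit. The requests_per_unit value is the actual limit.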
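The lookup from descriptor to limit can be sketched in Python. This is a simplified illustration, not the service's code: RULES is a hypothetical in-memory rule tree with nested keys, and a rule without a value is treated as matching any value.

```python
RULES = {
    "domain": "my_api.com",
    "descriptors": [
        {
            "key": "remote_address",  # no value: matches any client IP
            "descriptors": [
                {
                    "key": "api_key",  # no value: matches any API key
                    "descriptors": [
                        {
                            "key": "route",
                            "value": "/products",
                            "rate_limit": {"unit": "second", "requests_per_unit": 100},
                        }
                    ],
                }
            ],
        }
    ],
}


def find_limit(rules, domain, entries):
    """Walk the rule tree one entry at a time; return the matching rate_limit or None."""
    if rules["domain"] != domain:
        return None
    level = rules["descriptors"]
    node = None
    for entry in entries:
        node = next(
            (r for r in level
             if r["key"] == entry["key"] and r.get("value") in (None, entry["value"])),
            None,
        )
        if node is None:
            return None  # no rule for this entry, so no limit applies
        level = node.get("descriptors", [])
    return node.get("rate_limit") if node else None


entries = [
    {"key": "remote_address", "value": "192.168.1.100"},
    {"key": "api_key", "value": "abc123"},
    {"key": "route", "value": "/products"},
]
print(find_limit(RULES, "my_api.com", entries))
# {'unit': 'second', 'requests_per_unit': 100}
```

A descriptor that stops short of a rule carrying a rate_limit, or that names a different route, finds no limit and would be allowed through.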
The most surprising thing is how simple the counting is. The reference implementation does not use a sliding window: each limit is a fixed-window counter in Redis. The key combines the domain, the descriptor entries, and the start of the current time window, and every request atomically increments that key, which expires once the window has passed. This is cheap and easy to reason about, but a "per second" limit is really a limit per clock-aligned second, so a client can briefly send close to twice the limit in a burst that straddles a window boundary.
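The idea of counters kept in time buckets can be sketched as follows. A Python dict stands in for Redis here, and the class name and key format are made up for illustration; a real service would use an atomic increment with an expiry on each key.

```python
class WindowCounter:
    """Counts hits per (key, time bucket); a dict stands in for Redis."""

    def __init__(self, requests_per_unit, unit_seconds=1):
        self.requests_per_unit = requests_per_unit
        self.unit_seconds = unit_seconds
        self.counters = {}

    def hit(self, key, now):
        """Record one request at time `now`; return True if it is within the limit."""
        bucket_start = int(now // self.unit_seconds) * self.unit_seconds
        bucket_key = f"{key}_{bucket_start}"
        # With Redis this would be an atomic increment plus an expiry on the key.
        self.counters[bucket_key] = self.counters.get(bucket_key, 0) + 1
        return self.counters[bucket_key] <= self.requests_per_unit


counter = WindowCounter(requests_per_unit=100)
late = [counter.hit("remote_address_192.168.1.100", now=0.9) for _ in range(100)]
early = [counter.hit("remote_address_192.168.1.100", now=1.1) for _ in range(100)]
print(all(late), all(early))  # True True
```

In this sketch each bucket is counted independently, so the 200 requests above are all accepted even though they arrive within 0.2 seconds of each other.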
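When Envoy receives a RateLimitResponse from the RLS, it checks the overall_code field. OK means allow; OVER_LIMIT means deny with a 429. If the RLS is unavailable or times out, the failure_mode_deny flag on Envoy’s ratelimit filter decides what happens: by default it is false and the request is allowed through, while setting it to true makes Envoy reject the request with a 500.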
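The decision logic, including the fail-open default, can be summarized in a short Python sketch. This only models the behavior described above: decide and call_rls are hypothetical names, and failure_mode_deny mirrors the flag of the same name on Envoy's ratelimit filter.

```python
from enum import Enum


class Code(Enum):
    OK = 1
    OVER_LIMIT = 2


def decide(call_rls, failure_mode_deny=False):
    """Return "forward", "429", or "500" for one request.

    `call_rls` is a hypothetical callable that returns a Code, or raises
    when the rate limit service times out or cannot be reached.
    """
    try:
        code = call_rls()
    except Exception:
        # Service unreachable: fail open unless failure_mode_deny is set.
        return "500" if failure_mode_deny else "forward"
    return "429" if code is Code.OVER_LIMIT else "forward"


def rls_down():
    raise TimeoutError("rate limit service unavailable")


print(decide(lambda: Code.OK))                   # forward
print(decide(lambda: Code.OVER_LIMIT))           # 429
print(decide(rls_down))                          # forward
print(decide(rls_down, failure_mode_deny=True))  # 500
```

Failing open keeps the API available when the rate limit service is down, at the cost of running without limits during the outage; which trade-off is right depends on what the limit is protecting.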
Once you’ve got global rate limiting sorted, the next challenge is implementing dynamic rate limits based on user tier or service-level agreements, which often involves integrating with a user database or an external policy engine.