Rate-Limit Traffic with Envoy and the Rate Limit Service (2026)

Envoy’s rate limiting isn’t about dropping packets; it’s about telling clients "hold on a sec" (an HTTP 429 Too Many Requests) before your upstream services get overwhelmed.

Let’s see it in action. Imagine a simple API service behind Envoy. We want to cap the traffic each Envoy instance forwards to it at 10 requests per second.

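# envoy.yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address: { address: 0.0.0.0, port_value: 10000 }
    filter_chains:
    - filters:
      # HTTP filters live inside the HTTP connection manager, not directly in filter_chains.
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: my_api
              domains: ["*"]
              routes:
              - match: { prefix: "/" }
                route: { cluster: my_api_cluster }
          http_filters:
          - name: envoy.filters.http.local_ratelimit
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
              stat_prefix: local_ratelimit
              token_bucket:
                max_tokens: 10
                tokens_per_fill: 10
                fill_interval: 1s
              # Both default to 0% of requests, which would leave the filter inert.
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value: { numerator: 100, denominator: HUNDRED }
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value: { numerator: 100, denominator: HUNDRED }
          # The router must be the last HTTP filter.
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: my_api_cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: my_api_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }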

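Here, envoy.filters.http.local_ratelimit is configured with a token bucket that holds at most 10 tokens (max_tokens) and receives 10 more (tokens_per_fill) every second (fill_interval), never exceeding that maximum. Each incoming request consumes one token; if the bucket is empty, the request is denied with a 429 Too Many Requests. This is a single bucket shared by every request on the listener, not one bucket per client IP. Also note filter_enabled and filter_enforced: both default to 0% of requests, so leaving them out produces a filter that never checks or rejects anything.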
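To make the refill behavior concrete, here is a simplified Python model of the bucket configured above (max_tokens: 10, tokens_per_fill: 10, fill_interval: 1s). It is a sketch, not Envoy’s implementation: it assumes tokens arrive in whole fills at fixed intervals, and it takes the current time as an explicit `now` argument so it can be exercised without a real clock. The TokenBucket class and its allow() method are hypothetical names.

```python
class TokenBucket:
    """Simplified model of a token bucket like local_ratelimit's (not Envoy's code).

    The bucket starts full. Every `fill_interval` seconds, `tokens_per_fill`
    tokens are added, capped at `max_tokens`. Each request takes one token;
    a request that finds the bucket empty is rejected.
    """

    def __init__(self, max_tokens: int, tokens_per_fill: int, fill_interval: float) -> None:
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval  # seconds
        self.tokens = max_tokens            # the bucket starts full
        self.last_fill = 0.0                # time of the most recent fill, in seconds

    def _refill(self, now: float) -> None:
        # Add one batch of tokens for every whole interval since the last fill.
        fills = int((now - self.last_fill) // self.fill_interval)
        if fills > 0:
            self.tokens = min(self.max_tokens, self.tokens + fills * self.tokens_per_fill)
            self.last_fill += fills * self.fill_interval

    def allow(self, now: float) -> bool:
        """True means forward the request; False means respond with 429."""
        self._refill(now)
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(max_tokens=10, tokens_per_fill=10, fill_interval=1.0)
burst = [bucket.allow(now=0.5) for _ in range(12)]
print(burst.count(True), burst.count(False))  # 10 2
print(bucket.allow(now=1.0))                  # True: a fill arrived at t = 1 s
```

Because the bucket starts full, all 10 tokens can be spent in the same instant; the limit bounds the average rate, not the spacing between requests.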

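This limiter is exactly what its name says: local. Each Envoy instance keeps its own bucket and coordinates with nothing else. That makes it fast, since no request waits on a network call, but it cannot enforce a fleet-wide limit: with three Envoy instances behind a load balancer, the upstream can receive up to 30 requests per second, and a client spreading its requests across instances gets three times the budget. For a limit shared across all instances, you need global rate limiting backed by a rate limit service (RLS), such as Envoy’s reference implementation, envoyproxy/ratelimit.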
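A quick simulation shows the difference. The Python sketch below uses a hypothetical PerSecondLimiter, a plain per-second counter standing in for each instance’s bucket (within a single second, a full bucket of 10 tokens admits the same 10 requests), and spreads 30 requests from one client round-robin across three instances: first with three independent local limiters, then with one shared counter playing the role of the RLS.

```python
class PerSecondLimiter:
    """Hypothetical stand-in for one limiter: admits `limit` requests per whole second."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.window = None  # the whole second currently being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window = int(now)
        if window != self.window:  # a new second: start counting from zero
            self.window, self.count = window, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def admitted(limiters: list, requests: int, now: float = 0.0) -> int:
    """Send `requests` requests round-robin across `limiters`; return how many pass."""
    return sum(limiters[i % len(limiters)].allow(now) for i in range(requests))


# Three Envoy instances, each enforcing its own local 10 requests/second.
local = [PerSecondLimiter(10) for _ in range(3)]
print(admitted(local, 30))  # 30: each instance saw only 10 requests

# The same three instances consulting one shared counter, as with an RLS.
shared = PerSecondLimiter(10)
print(admitted([shared, shared, shared], 30))  # 10
```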

The RLS works by having Envoy ask a central service, "Can this request proceed?" The RLS evaluates a set of rules defined in its configuration. If the rules allow it, RLS tells Envoy "yes," and Envoy forwards the request. If not, RLS tells Envoy "no," and Envoy returns a 429.

Here’s a snippet of how Envoy talks to RLS:

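# envoy.yaml (additional config)
listeners:
- name: listener_0
  # ... address and other listener config as before ...
  filter_chains:
  - filters:
    - name: envoy.filters.network.http_connection_manager
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
        stat_prefix: ingress_http
        route_config:
          virtual_hosts:
          - name: my_api
            domains: ["*"]
            rate_limits:            # each entry produces one descriptor for the RLS
            - actions:
              - remote_address: {}  # ("remote_address", <client IP>)
            - actions:              # ("api_key", <header value>); no descriptor if the header is absent
              - request_headers: { header_name: x-api-key, descriptor_key: api_key }
            # ... routes to my_api_cluster as before ...
        http_filters:
        - name: envoy.filters.http.ratelimit
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
            domain: my_api.com
            timeout: 0.5s
            failure_mode_deny: false  # let requests through if the RLS is unreachable
            rate_limit_service:
              transport_api_version: V3
              grpc_service:
                envoy_grpc:
                  cluster_name: rate_limit_service
        # ... envoy.filters.http.router stays last, as before ...
clusters:
- name: rate_limit_service
  connect_timeout: 1s
  type: STATIC
  typed_extension_protocol_options:  # the RLS speaks gRPC, which needs HTTP/2
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      explicit_http_config: { http2_protocol_options: {} }
  load_assignment:
    cluster_name: rate_limit_service
    endpoints:
    - lb_endpoints:
      - endpoint: { address: { socket_address: { address: 127.0.0.1, port_value: 8081 } } }  # Assuming RLS runs on 8081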

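Envoy calls the RLS’s ShouldRateLimit gRPC method with a RateLimitRequest that carries the domain and a list of descriptors. Each descriptor is an ordered list of key-value entries that identifies the "thing" being rate-limited; for example, [{key: "remote_address", value: "192.168.1.100"}] means rate-limit by client IP. The ratelimit filter does not invent descriptors on its own: it builds them from the rate_limits actions in the route configuration, one descriptor per rate_limits entry.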
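The descriptor-building step can be sketched in Python. This is a simplified model, not Envoy’s code: it handles only the remote_address, request_headers, and generic_key actions, represents actions as dicts that mirror Envoy’s rate_limits YAML, assumes header names are already lowercase, and returns a dict in the JSON shape of a RateLimitRequest (a domain plus descriptors, each holding a list of entries). build_descriptor and build_request are hypothetical names.

```python
from typing import Optional

Entry = tuple[str, str]  # one (key, value) pair inside a descriptor


def build_descriptor(actions: list[dict], remote_ip: str,
                     headers: dict[str, str]) -> Optional[list[Entry]]:
    """Turn one rate_limits entry's actions into a descriptor, or None to skip it."""
    entries: list[Entry] = []
    for action in actions:
        if "remote_address" in action:
            entries.append(("remote_address", remote_ip))
        elif "request_headers" in action:
            cfg = action["request_headers"]
            value = headers.get(cfg["header_name"])  # assumes lowercase header names
            if value is None:
                return None  # header absent: this descriptor is not sent at all
            entries.append((cfg["descriptor_key"], value))
        elif "generic_key" in action:
            cfg = action["generic_key"]
            entries.append((cfg.get("descriptor_key", "generic_key"), cfg["descriptor_value"]))
        else:
            raise ValueError(f"action not modeled in this sketch: {action}")
    return entries


def build_request(domain: str, rate_limits: list[dict], remote_ip: str,
                  headers: dict[str, str]) -> dict:
    """Collect one descriptor per rate_limits entry into a RateLimitRequest-shaped dict."""
    descriptors = []
    for rate_limit in rate_limits:
        entries = build_descriptor(rate_limit["actions"], remote_ip, headers)
        if entries is not None:
            descriptors.append({"entries": [{"key": k, "value": v} for k, v in entries]})
    return {"domain": domain, "descriptors": descriptors}


rate_limits = [
    {"actions": [{"remote_address": {}}]},
    {"actions": [{"request_headers": {"header_name": "x-api-key", "descriptor_key": "api_key"}}]},
]
print(build_request("my_api.com", rate_limits, "192.168.1.100", headers={}))
# {'domain': 'my_api.com', 'descriptors': [{'entries': [{'key': 'remote_address', 'value': '192.168.1.100'}]}]}
```

Without an x-api-key header, only the remote_address descriptor is sent, so a keyless client is still limited by IP.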

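The RLS itself is a separate service with its own configuration, which defines which descriptors to match and what limits to apply. With the reference implementation, envoyproxy/ratelimit, that configuration is a set of YAML files, each declaring one domain, and the counters typically live in Redis.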

# rls.yaml (config for the envoyproxy/ratelimit service)
# Redis is configured through environment variables, not this file,
# e.g. REDIS_SOCKET_TYPE=tcp and REDIS_URL=localhost:6379.
domain: my_api.com
descriptors:
  # ("remote_address", <client IP>): no value, so each IP gets its own counter
  - key: remote_address
    rate_limit:
      unit: second
      requests_per_unit: 100
  # ("api_key", <x-api-key header>): each API key gets its own counter
  - key: api_key
    rate_limit:
      unit: second
      requests_per_unit: 100
  # ("route", "/products"): value given, so only that exact value matches
  - key: route
    value: /products
    rate_limit:
      unit: second
      requests_per_unit: 100

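The RLS matches each incoming descriptor against this config within the my_api.com domain. The descriptors themselves come from rate_limits actions in Envoy’s route configuration: remote_address produces ("remote_address", <client IP>), a request_headers action on x-api-key produces ("api_key", <key>) only when the header is present, and a generic_key action with descriptor_key route and descriptor_value /products, attached to the /products route, produces ("route", "/products"). A config entry with only a key matches any value and keeps a separate counter for each one, so every IP and every API key is limited independently; an entry with a value matches only that value. requests_per_unit is the actual limit, and if any descriptor in a request is over its limit, the RLS answers OVER_LIMIT.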
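On the RLS side, matching walks the descriptor tree one entry at a time, preferring a node whose key and value both match and falling back to a key-only node. The Python sketch below models that lookup. The CONFIG dict and the find_limit function are illustrative assumptions modeled on envoyproxy/ratelimit’s config format, not its actual code.

```python
from typing import Optional

# A hypothetical RLS config for domain my_api.com, written as a dict.
# A node with a "value" matches only that value; a node without one
# matches any value (each distinct value then gets its own counter).
CONFIG = {
    "domain": "my_api.com",
    "descriptors": [
        {"key": "remote_address", "rate_limit": {"unit": "second", "requests_per_unit": 100}},
        {"key": "api_key", "rate_limit": {"unit": "second", "requests_per_unit": 100}},
        {"key": "route", "value": "/products",
         "rate_limit": {"unit": "second", "requests_per_unit": 100}},
    ],
}


def find_limit(config: dict, entries: list[tuple[str, str]]) -> Optional[dict]:
    """Walk the descriptor tree entry by entry; return the matched rate_limit or None."""
    nodes = config["descriptors"]
    node = None
    for key, value in entries:
        exact = [n for n in nodes if n["key"] == key and n.get("value") == value]
        key_only = [n for n in nodes if n["key"] == key and "value" not in n]
        candidates = exact or key_only  # an exact key+value match wins
        if not candidates:
            return None  # no rule for this descriptor, so the RLS does not limit it
        node = candidates[0]
        nodes = node.get("descriptors", [])  # nested entries match one level deeper
    return node.get("rate_limit") if node is not None else None


print(find_limit(CONFIG, [("remote_address", "192.168.1.100")]))
# {'unit': 'second', 'requests_per_unit': 100}
print(find_limit(CONFIG, [("route", "/orders")]))
# None
```

A descriptor with two entries, such as api_key followed by route, only matches if the config nests a route node under api_key; with the flat config above it gets no limit.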

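The most surprising thing is how simple the counting is. The reference RLS does not use a sliding window; it uses fixed windows. For each descriptor it builds a cache key from the domain, the descriptor’s entries, and the current time truncated to the limit’s unit, then atomically increments that key in Redis (INCRBY) and sets an expiry on it. Every request in the same second shares one counter, and the count starts over at the next second. The trade-off is burstiness at window edges: a client can send 100 requests at the very end of one second and another 100 at the start of the next, so 200 requests pass within a few milliseconds even though the limit is 100 per second.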
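Here is that fixed-window scheme as a Python sketch, with an in-memory dict standing in for Redis. It illustrates the idea under stated assumptions and is not envoyproxy/ratelimit’s code: the key format, the UNIT_SECONDS table, and the should_rate_limit function are hypothetical, and expiry is left out because each new window simply uses a new key.

```python
from collections import defaultdict

UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# In-memory stand-in for Redis: cache key -> hit count.
counters: dict[str, int] = defaultdict(int)


def should_rate_limit(domain: str, entries: list[tuple[str, str]],
                      unit: str, limit: int, now: float) -> bool:
    """Fixed-window check: True means this hit is over the limit."""
    divider = UNIT_SECONDS[unit]
    window_start = int(now) // divider * divider  # time truncated to the unit
    key = "_".join([domain, *(f"{k}_{v}" for k, v in entries), str(window_start)])
    counters[key] += 1  # Redis would do INCRBY plus EXPIRE on this key
    return counters[key] > limit


ip = [("remote_address", "192.168.1.100")]
end_of_second_0 = [should_rate_limit("my_api.com", ip, "second", 100, now=0.99) for _ in range(100)]
start_of_second_1 = [should_rate_limit("my_api.com", ip, "second", 100, now=1.01) for _ in range(100)]
allowed = end_of_second_0.count(False) + start_of_second_1.count(False)
print(allowed)  # 200: both windows were fresh, so nothing was limited
```

Because the client’s IP is part of the key, a second client gets a fresh counter in the same window.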

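When Envoy receives a RateLimitResponse from the RLS, it checks the overall_code field: OK means forward the request, and OVER_LIMIT means reply with a 429. If the RLS cannot be reached or does not answer within the filter’s timeout (20 ms if unset), the failure_mode_deny field on Envoy’s ratelimit filter decides: it defaults to false, which lets the request through, while true rejects it.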
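The decision Envoy makes with the response fits in a few lines. In this Python sketch, the Code enum mirrors the UNKNOWN/OK/OVER_LIMIT values of RateLimitResponse, None stands for a failed or timed-out RPC, and the 500 returned when failure_mode_deny is true is an assumption about Envoy’s default error status. envoy_status is a hypothetical name.

```python
from enum import Enum
from typing import Optional


class Code(Enum):
    """Mirrors the RateLimitResponse.Code values in envoy.service.ratelimit.v3."""
    UNKNOWN = 0
    OK = 1
    OVER_LIMIT = 2


def envoy_status(overall_code: Optional[Code], failure_mode_deny: bool = False) -> Optional[int]:
    """Return None to forward the request upstream, or the status sent to the client.

    overall_code is None when the RLS call failed or exceeded the filter's timeout.
    """
    if overall_code is None:
        # RLS unavailable: fail open by default, fail closed if configured to.
        return 500 if failure_mode_deny else None  # 500 is an assumed default
    if overall_code is Code.OVER_LIMIT:
        return 429
    return None  # in this sketch, any other code lets the request through


print(envoy_status(Code.OK), envoy_status(Code.OVER_LIMIT), envoy_status(None, failure_mode_deny=True))
# None 429 500
```

Failing open is the safer default for most APIs: an RLS outage then degrades to "no global limit" instead of "no traffic".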

Once you’ve got global rate limiting sorted, the next challenge is implementing dynamic rate limits based on user tier or service-level agreements, which often involves integrating with a user database or an external policy engine.