Inject Faults and Delays with Envoy Fault Injection Filter (2026)

Envoy’s fault injection filter doesn’t just simulate failures; it lets you architect for them, turning chaos into a predictable testing ground.

Let’s see it in action. Imagine a simple HTTP service, service-a, that calls service-b. We want to test how service-a handles slow responses from service-b.

Here’s a basic Envoy configuration for service-a’s listener:

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: service_b_cluster
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: service_b_cluster
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_b_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: service-b # Replace with actual service-b address
                port_value: 8080

Now, let’s add fault injection to simulate a 500ms delay for 10% of requests to service-b. We’ll insert the envoy.filters.http.fault filter before the router filter.

# ... (previous listener config) ...
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
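              delay:
                percentage:
                  numerator: 10
                  denominator: HUNDRED # enum: HUNDRED, TEN_THOUSAND, or MILLION
                fixed_delay: 0.5s # durations are written in seconds
              abort: # We'll use this later, but it's good to see it
                percentage:
                  numerator: 0
                  denominator: HUNDRED
                http_status: 503
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router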
# ... (cluster config remains the same) ...

With this configuration, if you send 100 requests through the listener on port 10000, roughly 10 of them will be held by Envoy for 500ms before being forwarded to service-b. Requests are selected at random, so the exact count varies from run to run. The response time for the delayed requests will be roughly 500ms longer than usual.

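This isn’t just about making things slow; it’s about understanding your system’s resilience. What happens if service-a’s client-side timeout for calls to service-b is 300ms? With the above config, the delayed 10% of requests will exceed it, and service-a has to decide what to return to its own caller, often a 503 or 504. Note that the delay is injected before the router filter runs, so it should not count against a route timeout enforced by this same Envoy; the timeout under test needs to live in the calling application or in a proxy further downstream.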

You can also inject HTTP errors. Let’s say we want to simulate service-b returning a 503 Service Unavailable error for 5% of requests. We’d modify the fault filter:

# ... (listener config) ...
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
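              # No delay block in this example: fixed_delay must be greater than zero,
              # so omit delay entirely instead of setting it to 0s
              abort:
                percentage:
                  numerator: 5
                  denominator: HUNDRED
                http_status: 503
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router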
# ... (cluster config remains the same) ...

Now, about 5% of requests sent through this listener will get a 503 response directly from Envoy, without Envoy even attempting to contact service-b. This is useful for testing how callers such as service-a handle transient errors from a dependency.

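The fault filter is remarkably granular. You can restrict faults to requests that match specific headers, and because the method and path are exposed as the :method and :path pseudo-headers, the same mechanism covers methods and URIs. For instance, to inject delays only for requests whose x-test-fault header starts with delay: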

# ... (listener config) ...
          http_filters:
          - name: envoy.filters.http.fault
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
              delay:
                percentage:
                  numerator: 100 # 100% of matching requests
                  denominator: HUNDRED
                fixed_delay: 0.5s
              abort:
                percentage:
                  numerator: 0
                  denominator: HUNDRED
                http_status: 503
              # Header matchers are set on HTTPFault itself and gate both delay and abort
              headers:
              - name: "x-test-fault"
                string_match:
                  prefix: "delay" # Matches "x-test-fault: delay-something"
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
# ... (cluster config remains the same) ...

This allows for very precise testing. You can spin up a few pods of service-b, route a specific test request through Envoy with the fault filter enabled, and observe the behavior of service-a under those exact conditions.
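Faults can also be scoped to a single route rather than the whole listener. The fragment below is a minimal sketch, assuming Envoy's v3 API: it uses typed_per_filter_config on a route to supply fault settings only for paths under /checkout. The /checkout prefix and the 2s delay are hypothetical values chosen for illustration, and the fault filter still has to be listed in http_filters for the per-route settings to take effect.

```yaml
# Sketch only: the /checkout prefix and the 2s delay are illustrative assumptions.
route_config:
  name: local_route
  virtual_hosts:
  - name: local_service
    domains: ["*"]
    routes:
    - match:
        prefix: "/checkout"
      route:
        cluster: service_b_cluster
      # Per-route fault settings, keyed by the fault filter's name in http_filters
      typed_per_filter_config:
        envoy.filters.http.fault:
          "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
          delay:
            percentage:
              numerator: 50
              denominator: HUNDRED
            fixed_delay: 2s
    # Catch-all route; no per-route fault settings here
    - match:
        prefix: "/"
      route:
        cluster: service_b_cluster
```

Routes are matched in order, so the more specific /checkout route has to come before the catch-all.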

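Delay and abort are independent settings on HTTPFault, not alternatives. If both are configured, Envoy evaluates the delay first. Once the delay has elapsed, or if the request was not selected for a delay, Envoy evaluates the abort against its own percentage. A single request can therefore be delayed and then aborted, receive only one of the two faults, or proceed normally. Header matchers apply to the filter as a whole, so a request that fails them gets neither fault.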
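As a sketch of a filter entry with both actions configured, the fragment below sets a delay and an abort side by side. The percentages and the 1s delay are arbitrary illustrations, not recommendations.

```yaml
# Sketch only: percentages and durations are illustrative.
- name: envoy.filters.http.fault
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
    delay: # considered first
      percentage:
        numerator: 20
        denominator: HUNDRED
      fixed_delay: 1s
    abort: # considered after the delay step
      percentage:
        numerator: 10
        denominator: HUNDRED
      http_status: 503
```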

A key insight is that fault injection can be applied at any hop in your service mesh. You can configure it on the ingress gateway to simulate failures for every external client, or on a service's sidecar proxy to simulate failures of the upstream services it depends on. This distributed nature allows you to test failure modes at various points in your request path.

When you configure a fixed_delay, Envoy doesn’t block a worker thread while it waits. It sets a timer, holds the request and its downstream connection open, and forwards the request to the upstream cluster once the timer fires. From the caller's point of view, this mimics network latency or slow processing in the upstream service.
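Because delayed requests stay open inside Envoy for the full duration, a high delay percentage under load can tie up many connections at once. One way to bound this is the max_active_faults field on HTTPFault, which caps how many faults may be active concurrently. The value 50 below is an arbitrary illustration.

```yaml
# Sketch only: the cap of 50 is an illustrative assumption.
- name: envoy.filters.http.fault
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
    max_active_faults: 50 # once 50 faults are in flight, further requests pass through unfaulted
    delay:
      percentage:
        numerator: 10
        denominator: HUNDRED
      fixed_delay: 0.5s
```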

The most surprising thing about Envoy’s fault injection is how it can be used to test not just if your system fails, but how gracefully it fails. By injecting specific HTTP status codes (like 500, 503, or even 429), you can verify that your application’s retry logic, circuit breakers, and fallback mechanisms are functioning as intended. You’re not just breaking things; you’re building a more robust system by proactively understanding its breaking points.
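To exercise several status codes without editing the config for each one, the abort can be driven by request headers. The sketch below assumes the filter's header_abort mode, in which the x-envoy-fault-abort-request request header carries the status code to return. The percentage field is still required and is set to 100 here purely for illustration.

```yaml
# Sketch only: status code comes from the x-envoy-fault-abort-request header.
- name: envoy.filters.http.fault
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
    abort:
      header_abort: {} # read the status code from the request header
      percentage:
        numerator: 100
        denominator: HUNDRED
```

With this in place, a test request carrying x-envoy-fault-abort-request: 429 should receive a 429 from Envoy, while requests without the header pass through. Since any client that can set the header can trigger the fault, this mode is best kept to test environments.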

Once you’ve mastered fault injection, the next logical step is to explore weighted clusters to gradually shift traffic to a new version of a service while simultaneously injecting faults into the older version.