Scrape Envoy Metrics with Prometheus (2026)

Envoy’s metrics aren’t just about counting requests; they’re a real-time, granular view into the distributed system’s behavior, and Prometheus is how you make sense of that firehose.

Let’s see Envoy’s metrics in action. Imagine a simple Envoy configuration serving a backend:

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: some_service
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: some_service
    connect_timeout: 0.25s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: some_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
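
The listing above has no admin section, and the /stats/prometheus endpoint is served by the admin interface. A minimal sketch of the top-level block that would sit alongside static_resources follows; the loopback address and port 9901 are assumptions chosen for illustration, not values taken from the configuration above.

```yaml
# Top-level bootstrap block, a sibling of static_resources.
# Assumption: admin is bound to loopback on port 9901 (a common convention).
admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901
```

Binding to loopback keeps the admin interface, which also exposes mutating endpoints, off the network; a Prometheus server on another host would need a different bind address plus network-level protection.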

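Now, if we point Prometheus at Envoy’s admin interface, we can start scraping the /stats/prometheus endpoint. Port 9901 is the conventional choice, but Envoy has no built-in default; the port comes from the admin block of the bootstrap. The stat_prefix in the HttpConnectionManager configuration, ingress_http in this case, identifies that connection manager’s stats. In Envoy’s native naming these are http.ingress_http.downstream_cx_total and http.ingress_http.downstream_rq_total. In the Prometheus output Envoy replaces dots with underscores, prepends envoy_, and moves the prefix into a label, so you’d see envoy_http_downstream_cx_total{envoy_http_conn_manager_prefix="ingress_http"}. Cluster stats follow the same pattern: cluster.some_service.upstream_cx_rx_bytes_total becomes envoy_cluster_upstream_cx_rx_bytes_total{envoy_cluster_name="some_service"}.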
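
On the Prometheus side, the scrape described above could be configured roughly as follows. This is a sketch under assumptions: the job name, the 15s interval, and the 127.0.0.1:9901 target are illustrative and depend on where Envoy's admin interface actually listens.

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: envoy                   # hypothetical job name
    metrics_path: /stats/prometheus   # Envoy's Prometheus-format stats endpoint
    scrape_interval: 15s              # assumption; tune to your retention and load
    static_configs:
      - targets: ["127.0.0.1:9901"]   # assumption: admin address and port
```

The metrics_path override matters because Prometheus requests /metrics by default, which Envoy's admin interface does not serve.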

The core problem Envoy’s metrics solve is providing visibility into network traffic and request handling without requiring changes to the upstream services themselves. Envoy sits in the request path as a proxy, and its metrics expose what it sees: connection attempts, successful connections, request counts, response codes, latency, and much more, broken down by listener, HTTP connection manager, and cluster. Finer per-route or per-endpoint detail is opt-in and has to be configured explicitly. This allows you to monitor the health and performance of your service mesh or API gateway layer independently.

Internally, Envoy maintains counters, gauges, and histograms for various events. When Prometheus scrapes the /stats/prometheus endpoint, Envoy serializes these internal metrics into the Prometheus exposition format. For example, a histogram for request duration might look like this:

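envoy_http_downstream_rq_time_bucket{envoy_http_conn_manager_prefix="ingress_http",le="100"} 150
envoy_http_downstream_rq_time_bucket{envoy_http_conn_manager_prefix="ingress_http",le="200"} 200
envoy_http_downstream_rq_time_bucket{envoy_http_conn_manager_prefix="ingress_http",le="+Inf"} 250
envoy_http_downstream_rq_time_count{envoy_http_conn_manager_prefix="ingress_http"} 250
envoy_http_downstream_rq_time_sum{envoy_http_conn_manager_prefix="ingress_http"} 12345.67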

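Buckets are cumulative: 150 requests completed in 100 ms or less, 200 requests in 200 ms or less, and all 250 fall under +Inf. The _count series is the total number of requests observed and _sum is their combined duration, so the mean latency is _sum divided by _count (here 12345.67 / 250 ≈ 49.4 ms). Percentiles are not stored; they can only be estimated from the bucket counts.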
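
The average and a percentile estimate can both be derived from these series with PromQL. The recording rules below are a sketch, assuming the histogram is exposed as envoy_http_downstream_rq_time with an envoy_http_conn_manager_prefix label (Envoy's Prometheus naming for the connection manager's request-time histogram). The group name, rule names, and 5m window are hypothetical choices.

```yaml
# Prometheus rules file fragment
groups:
  - name: envoy_latency            # hypothetical group name
    rules:
      # Mean request time in ms over 5 minutes: rate of _sum over rate of _count.
      - record: envoy:http_downstream_rq_time:avg_ms_5m
        expr: |
          sum by (envoy_http_conn_manager_prefix) (rate(envoy_http_downstream_rq_time_sum[5m]))
          /
          sum by (envoy_http_conn_manager_prefix) (rate(envoy_http_downstream_rq_time_count[5m]))
      # Estimated 95th percentile in ms, interpolated from the cumulative buckets.
      - record: envoy:http_downstream_rq_time:p95_ms_5m
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, envoy_http_conn_manager_prefix) (rate(envoy_http_downstream_rq_time_bucket[5m]))
          )
```

Using rate() on both series gives a recent average rather than one since Envoy started, and the le label has to survive the aggregation for histogram_quantile() to work.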

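The key levers you control are in the Envoy configuration. The stat_prefix determines how a connection manager’s stats are identified, and the bootstrap’s stats_config determines what is emitted: stats_matcher includes or excludes stats by name, stats_tags controls how labels are extracted from stat names, and histogram_bucket_settings overrides bucket boundaries. Envoy has no separate Prometheus stats sink; the Prometheus format is served by the admin endpoint itself, which accepts query parameters such as usedonly and filter to trim a scrape. Some stats are opt-in, for example a cluster’s track_cluster_stats. Every additional stat costs memory in Envoy and time series in Prometheus, so granularity is a trade-off against load.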
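
As an illustration of narrowing what Envoy emits, the bootstrap accepts a top-level stats_config block. The sketch below assumes the listener and cluster names from the example configuration; the specific prefixes and bucket boundaries are illustrative choices, not recommendations.

```yaml
# Top-level bootstrap block, a sibling of static_resources.
stats_config:
  stats_matcher:
    # Only stats matching one of these patterns are created;
    # everything else, including server-level stats, is dropped.
    inclusion_list:
      patterns:
        - prefix: http.ingress_http.
        - prefix: cluster.some_service.
  histogram_bucket_settings:
    # Custom bucket boundaries (ms) for the request-time histogram.
    - match:
        suffix: downstream_rq_time
      buckets: [5, 10, 25, 50, 100, 200, 500, 1000]
```

An inclusion list is the more aggressive option; an exclusion_list that removes only known-noisy stats is usually the safer starting point, since a stat that was never created cannot be recovered at query time.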

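A common point of confusion is the difference between downstream and upstream metrics. Downstream metrics, like downstream_cx_total, count connections that clients open to Envoy, and are reported per listener and per HTTP connection manager. Upstream metrics, like upstream_cx_total, count connections that Envoy opens to a specific upstream cluster. In the Prometheus output these appear as envoy_listener_downstream_cx_total, envoy_http_downstream_cx_total, and envoy_cluster_upstream_cx_total{envoy_cluster_name="some_service"}. This distinction is crucial for debugging network paths: a healthy downstream count with a flat upstream count points at Envoy or its routing, not at the backend.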

Rather than a single request_denied_total counter with a reason label, core Envoy records requests that are dropped before they reach an upstream service in the stats of whichever component rejected them. Circuit breakers increment cluster counters such as upstream_rq_pending_overflow and upstream_cx_overflow, the rate limit filter increments ratelimit.over_limit, and the RBAC filter increments rbac.denied. Monitoring these counters helps pinpoint issues within Envoy’s policy enforcement, and tells you which policy is responsible.
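
One way to watch for requests rejected inside Envoy is an alert on a circuit-breaker counter. The rule below is a sketch, assuming the cluster counter upstream_rq_pending_overflow is scraped as envoy_cluster_upstream_rq_pending_overflow with an envoy_cluster_name label. The alert name, threshold, duration, and severity are hypothetical.

```yaml
# Prometheus rules file fragment
groups:
  - name: envoy_rejections           # hypothetical group name
    rules:
      - alert: EnvoyCircuitBreakerOverflow
        # Fires when Envoy has been rejecting requests at the
        # pending-request circuit breaker for a sustained period.
        expr: |
          sum by (envoy_cluster_name) (
            rate(envoy_cluster_upstream_rq_pending_overflow[5m])
          ) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Envoy is shedding requests to {{ $labels.envoy_cluster_name }} at the circuit breaker"
```

The same pattern applies to the other rejection counters; only the metric name and the grouping label change.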

The next step is to explore how to use these metrics to build sophisticated alerting and dashboards in Grafana, leveraging PromQL queries to aggregate and analyze the vast amount of data Envoy exposes.