Deploy Envoy Proxy to Production with Best Practices (2026)

Envoy Proxy, when deployed in production, isn’t just a network proxy; it’s the central nervous system for your service communication, offering observability and traffic control that are often more comprehensive than what application-level logic provides.

Let’s see Envoy in action, managing traffic for a simple microservice architecture. Imagine two services: frontend and backend.

Basic Setup:

frontend service: Listens on localhost:8080.
backend service: Listens on localhost:9090.

We’ll deploy Envoy as a sidecar to the frontend service. The frontend will talk to Envoy on localhost:10000, and Envoy will route traffic to the backend service on localhost:9090.

Envoy Configuration (envoy.yaml):

static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 10000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: backend_service
              domains: ["*"] # Matches any host header
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: backend_service
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
  clusters:
  - name: backend_service
    type: STRICT_DNS
    connect_timeout: 0.25s
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: backend_service
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 9090

Running Envoy:

envoy -c envoy.yaml

Running the Backend Service:

# backend.py
from http.server import BaseHTTPRequestHandler, HTTPServer

class SimpleHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from backend!")

if __name__ == "__main__":
    server = HTTPServer(('localhost', 9090), SimpleHTTPRequestHandler)
    print("Starting backend server on port 9090...")
    server.serve_forever()

python backend.py

Running the Frontend Application (modified to talk to Envoy):

# frontend.py
import requests

if __name__ == "__main__":
    try:
        response = requests.get("http://localhost:10000/") # Talking to Envoy
        print(f"Frontend received: {response.text}")
    except requests.exceptions.RequestException as e:
        print(f"Frontend failed to connect: {e}")

python frontend.py

Output from frontend.py:

Frontend received: Hello from backend!

Here, Envoy on localhost:10000 received the request from frontend.py and, based on the route_config, forwarded it to the backend_service cluster, whose single endpoint is 127.0.0.1:9090. The response from the backend is then relayed back to the frontend.

The Mental Model: Envoy as a Universal Translator and Traffic Cop

At its core, Envoy acts as a universal translator and traffic cop for network requests.

Translator: Envoy understands and speaks various network protocols (HTTP/1.1, HTTP/2, gRPC over HTTP/2, raw TCP). Applications don’t need to be experts in all of them. Your application can speak HTTP/1.1 to Envoy, and Envoy can speak HTTP/2 to another service; bridging plain HTTP to gRPC additionally needs a dedicated filter such as the gRPC-JSON transcoder. This decouples application logic from network protocol complexity.
Traffic Cop: Envoy sits at the edge of services (or between them) and directs traffic. It uses listeners to accept incoming connections, filters to inspect and modify requests/responses, and clusters to define upstream services. The route configuration is the set of rules that tells Envoy which cluster to send a request to based on its properties (e.g., host, path, headers).

Key Components in Action:

Listeners: These are the "doors" that Envoy opens. listener_0 on port 10000 is listening for incoming connections.
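HTTP Connection Manager (HCM): This is a powerful network filter that handles HTTP-specific logic. It parses HTTP requests, applies routing rules, and runs the HTTP filter chain, which ends in the router filter that forwards each request to the chosen upstream cluster. stat_prefix is the prefix used in the names of the statistics it emits.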
Route Configuration: This is the brain of the traffic cop. local_route defines how requests are directed.
- virtual_hosts: A collection of rules. We have one, backend_service, that matches any host ("*").
- routes: Specific rules within a virtual host. The rule here is simple: if the request prefix is "/" (i.e., any path), route it to the backend_service cluster.
Clusters: These define the destinations for requests. backend_service is configured with:
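- type: STRICT_DNS: Envoy will periodically resolve the DNS name specified in load_assignment. Because the endpoint here is a literal IP address, there is nothing to resolve, and type: STATIC would work equally well.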
- connect_timeout: How long Envoy will wait to establish a connection to an upstream host before giving up. 0.25s is a common starting point.
- lb_policy: ROUND_ROBIN: If there were multiple endpoints for backend_service, Envoy would distribute requests evenly among them.
- load_assignment: Specifies the actual network addresses for the cluster. Here, it’s 127.0.0.1:9090.
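To make the virtual host, route, and cluster chain concrete, here is a minimal Python sketch of the lookup. It is a simplified model under stated assumptions, not Envoy’s implementation: it handles only exact domains, the catch-all "*", and prefix matches, and the names ROUTE_CONFIG and pick_cluster are hypothetical.

```python
# Simplified model of Envoy's route lookup; not Envoy's actual implementation.
from typing import Optional

# Mirrors the route_config section of envoy.yaml.
ROUTE_CONFIG = {
    "name": "local_route",
    "virtual_hosts": [
        {
            "name": "backend_service",
            "domains": ["*"],
            "routes": [
                {"match": {"prefix": "/"}, "route": {"cluster": "backend_service"}},
            ],
        }
    ],
}


def pick_cluster(route_config: dict, host: str, path: str) -> Optional[str]:
    """Return the cluster name for a request, or None if nothing matches."""
    for vhost in route_config["virtual_hosts"]:
        # Only exact names and the catch-all "*" are modeled here; this
        # sketch takes the first virtual host whose domains match.
        if host not in vhost["domains"] and "*" not in vhost["domains"]:
            continue
        # Routes are checked in order; the first matching prefix wins.
        for route in vhost["routes"]:
            if path.startswith(route["match"]["prefix"]):
                return route["route"]["cluster"]
        return None  # virtual host matched, but no route did
    return None


print(pick_cluster(ROUTE_CONFIG, "localhost:10000", "/api/users"))
```

With the single catch-all virtual host and the "/" prefix from this article, every request resolves to backend_service, which is why the frontend call succeeded regardless of path.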

Best Practices for Production Deployment

1. Health Checking: Envoy can actively check the health of its upstream clusters, removing unhealthy instances from the load balancing pool.

Configuration Snippet (add to backend_service cluster):

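    health_checks:
    - timeout: 1s
      interval: 5s
      unhealthy_threshold: 3 # Consecutive failures before a host is ejected
      healthy_threshold: 2   # Consecutive successes before it is readmitted
      http_health_check:
        path: "/healthz" # Your service needs to expose a /healthz endpoint

This tells Envoy to send an HTTP GET request to /healthz on each backend instance every 5 seconds, with a 1-second timeout. The v3 API requires both thresholds: a host leaves the load-balancing pool after 3 consecutive failed checks and returns after 2 consecutive successes. Per Envoy’s documentation, an HTTP response with an unexpected status code (anything other than 200 by default) marks the host unhealthy immediately, so unhealthy_threshold mainly governs timeouts and connection failures.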
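The backend shown earlier answers every GET with 200, so it has no dedicated health endpoint. The sketch below extends it with an explicit /healthz route and probes it the way an HTTP health checker would. The names HealthAwareHandler, start_backend, and probe are hypothetical, and a real service would likely return a non-200 status when its dependencies are down.

```python
# Sketch: the tutorial's backend with an added /healthz route.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A 200 response is what an HTTP health check expects by default.
        body = b"ok" if self.path == "/healthz" else b"Hello from backend!"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def start_backend(port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthAwareHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(port: int, path: str = "/healthz"):
    """Issue one GET and return (status, body), like a single health check."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=2)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read().decode()
    finally:
        conn.close()


server = start_backend()  # start_backend(9090) would match envoy.yaml
print(probe(server.server_port))
server.shutdown()
server.server_close()
```

Keeping the health route separate from normal traffic lets the service report readiness without side effects on business endpoints.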

2. TLS Termination/Origination: Envoy is excellent at handling TLS. You can terminate TLS from clients and re-encrypt to upstream services, or just terminate from clients.

Configuration Snippet (add to listener_0):

    listener_filters:
    - name: envoy.filters.listener.tls_inspector
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
    - filter_chain_match:
        transport_protocol: "tls"
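      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificates:
            - certificate_chain:
                filename: "/etc/envoy/certs/server.crt"
              private_key:
                filename: "/etc/envoy/certs/server.key"
      filters:
      - name: envoy.filters.network.http_connection_manager
        # ... rest of HCM config ...

This adds a tls_inspector listener filter to detect TLS connections and then applies a filter chain whose transport_socket carries a DownstreamTlsContext with the server certificate and key for terminating TLS. The older tls_context field belongs to the v2 API and is not accepted by v3 configurations like the one in this article.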

3. Observability (Metrics, Logging, Tracing): Envoy emits detailed metrics, can log requests, and can integrate with distributed tracing systems.

Configuration Snippet (add to HttpConnectionManager typed_config):

          access_log:
          - name: envoy.access_loggers.file
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
              path: "/var/log/envoy/access.log"

# Metrics need no per-listener switch: Envoy collects stats globally, and
# stat_prefix only namespaces the HCM's own counters.
# They are served by the admin interface, which is a top-level section of
# envoy.yaml (a sibling of static_resources). There is no default admin
# port; 9901 is a common convention.
admin:
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 9901
# Then scrape http://127.0.0.1:9901/stats/prometheus

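You’ll need to configure Envoy’s admin section to expose these metrics. The admin interface also offers endpoints that can shut down or reconfigure Envoy, so keep it bound to loopback or otherwise restricted.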
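For a quick look at the scraped data without a full Prometheus setup, a few lines of Python can pull series out of the text exposition format. This is a minimal sketch: parse_prometheus is a hypothetical helper, it assumes no trailing timestamps on sample lines, and the SAMPLE text is illustrative rather than captured from a real Envoy.

```python
# Sketch: read values out of Prometheus text exposition format.
def parse_prometheus(text: str) -> dict:
    """Map each 'name{labels}' series to its float value, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and "# HELP" / "# TYPE" metadata
        # Assumes no trailing timestamp, so the value is the last field.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Illustrative input only; a real scrape contains many more series.
SAMPLE = """\
# TYPE envoy_cluster_upstream_rq_total counter
envoy_cluster_upstream_rq_total{envoy_cluster_name="backend_service"} 42
"""

for series, value in parse_prometheus(SAMPLE).items():
    print(series, value)
```

To run it against a live proxy, fetch the body of http://127.0.0.1:9901/stats/prometheus (for example with urllib.request) and pass the decoded text to parse_prometheus.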

4. Circuit Breaking: Prevent cascading failures by limiting the number of concurrent connections, pending requests, or retries to upstream services.

Configuration Snippet (add to backend_service cluster):

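    circuit_breakers:
      thresholds:
      - priority: DEFAULT # Applies to normal traffic; HIGH only covers routes marked high priority
        max_connections: 100
        max_pending_requests: 10
        max_requests: 50

This limits the backend_service cluster to 100 concurrent connections, 10 pending requests, and 50 active requests per Envoy instance. The thresholds are tracked per priority, and routes use DEFAULT unless they are explicitly marked HIGH, so a HIGH-only entry would leave ordinary traffic on Envoy’s built-in defaults. When the pending or active request limit overflows, Envoy fails the extra request immediately (a 503 for HTTP) instead of queueing it.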
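The fail-fast behaviour of a concurrency threshold can be illustrated with a small counter. The sketch below models only a max_requests limit; it is not how Envoy implements circuit breaking, and RequestLimiter is a hypothetical name.

```python
# Simplified model of a max_requests threshold; not Envoy's implementation.
import threading


class RequestLimiter:
    """Admits at most max_requests concurrent requests and rejects the rest."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self._active = 0
        self._lock = threading.Lock()

    def try_start(self) -> bool:
        with self._lock:
            if self._active >= self.max_requests:
                return False  # overflow: the caller should fail fast
            self._active += 1
            return True

    def finish(self) -> None:
        with self._lock:
            self._active = max(0, self._active - 1)


limiter = RequestLimiter(max_requests=2)
print([limiter.try_start() for _ in range(3)])  # third request overflows
limiter.finish()                                 # one request completes
print(limiter.try_start())                       # capacity is available again
```

Rejecting immediately, rather than queueing, is what keeps a slow upstream from tying up resources in every caller.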

5. Rate Limiting: Protect services from being overwhelmed by excessive traffic. Envoy can integrate with external rate-limiting services or use its own local rate limiter.

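Configuration Snippet (add to the http_filters list in HttpConnectionManager typed_config, before the router filter):

          http_filters:
          - name: envoy.filters.http.ratelimit # Must come before envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.ratelimit.v3.RateLimit
              domain: "my_service_domain" # Unique identifier for this service's rate limits
              rate_limit_service:
                transport_api_version: V3
                grpc_service: {envoy_grpc: {cluster_name: rate_limit_cluster}} # placeholder cluster name

Rate limiting is an HTTP filter, not a field on the connection manager itself. The external filter shown here also needs rate_limits actions on the route or virtual host to produce descriptors, plus a cluster (rate_limit_cluster is a placeholder name) that points at your rate limit service. For limits enforced inside Envoy, use the envoy.filters.http.local_ratelimit filter instead.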
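Envoy’s local rate limiter is configured as a token bucket. The sketch below shows the idea in Python; the parameter names max_tokens, tokens_per_fill, and fill_interval follow the fields of that filter’s token_bucket setting, while the TokenBucket class and the injectable clock are assumptions made for illustration and testing.

```python
# Sketch of a token bucket; a conceptual model, not Envoy's implementation.
import time


class TokenBucket:
    def __init__(self, max_tokens, tokens_per_fill, fill_interval, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.tokens_per_fill = tokens_per_fill
        self.fill_interval = fill_interval  # seconds between refills
        self._clock = clock
        self._tokens = max_tokens           # start with a full bucket
        self._last_fill = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate limited."""
        now = self._clock()
        fills = int((now - self._last_fill) // self.fill_interval)
        if fills > 0:
            # Add tokens for every whole interval elapsed, capped at max_tokens.
            self._tokens = min(self.max_tokens, self._tokens + fills * self.tokens_per_fill)
            self._last_fill += fills * self.fill_interval
        if self._tokens > 0:
            self._tokens -= 1
            return True
        return False


now = [0.0]  # fake clock so the demo is deterministic
bucket = TokenBucket(max_tokens=2, tokens_per_fill=2, fill_interval=1.0, clock=lambda: now[0])
print([bucket.allow() for _ in range(3)])  # bucket holds 2 tokens, third call is limited
now[0] = 1.0                               # one fill interval passes
print(bucket.allow())
```

The bucket size sets how large a burst is tolerated, while tokens_per_fill divided by fill_interval sets the sustained rate.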

6. Graceful Shutdown: Ensure Envoy stops accepting new connections and drains existing ones before the process exits. The orchestrator (Kubernetes, Nomad) usually coordinates this, using Envoy’s admin interface to start the drain.

A plain SIGTERM makes Envoy exit without waiting for in-flight requests, so it should arrive only after draining. From a pre-stop hook, call POST /drain_listeners?graceful (or POST /healthcheck/fail) on the admin interface, wait for the drain period, and then let the process be terminated. This is crucial for avoiding dropped requests during deployments.
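One common pattern is a pre-stop hook that asks Envoy to start draining through its admin interface before the orchestrator terminates the process. The following is a minimal sketch: the admin address 127.0.0.1:9901, the 15-second wait, and the helper names admin_post and pre_stop are all assumptions to adapt to your deployment.

```python
# Sketch of a pre-stop hook; host, port, and wait time are assumptions.
import http.client
import time


def admin_post(path: str, host: str = "127.0.0.1", port: int = 9901) -> int:
    """POST to Envoy's admin interface and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=2)
    try:
        conn.request("POST", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


def pre_stop(drain_seconds: float = 15.0, host: str = "127.0.0.1",
             port: int = 9901, sleep=time.sleep) -> bool:
    """Ask Envoy to drain, then wait. Returns False if admin is unreachable."""
    try:
        # Make Envoy's own health check endpoint report failure.
        admin_post("/healthcheck/fail", host, port)
        # Begin gracefully draining listeners.
        admin_post("/drain_listeners?graceful", host, port)
    except (OSError, http.client.HTTPException) as exc:
        print(f"admin interface unreachable: {exc}")
        return False
    sleep(drain_seconds)  # give in-flight requests time to finish
    return True


if __name__ == "__main__":
    pre_stop()
```

The wait should be shorter than the orchestrator’s termination grace period, so that the drain completes before the process is force-killed.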

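7. Configuration Management: Use dynamic configuration instead of static files for easier updates and management in large deployments. Envoy consumes dynamic configuration through its xDS APIs, typically served by a control plane that sources its data from systems such as Consul, etcd, or the Kubernetes API.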

The most surprising thing about Envoy’s power is its ability to act as a "service mesh data plane" without an explicit control plane for basic routing and load balancing. Even with a static configuration, you gain immediate benefits in observability and resilience.

Envoy’s configuration is declarative, meaning you describe the desired state of your network proxying, and Envoy works to achieve it. This separation of concerns allows your application code to focus purely on business logic, offloading network complexities to Envoy.

The next step in mastering Envoy is often understanding how to implement advanced traffic management patterns like canary deployments or A/B testing using its sophisticated routing capabilities.