Envoy’s downstream connection was terminated because the upstream service it was trying to reach either actively refused the connection or closed it prematurely. From the upstream service’s perspective, Envoy is the client, so this error means the upstream decided it didn’t want to talk to Envoy, or at least not right then.

1. Upstream Service Overloaded/Crashing

This is the most common culprit. The upstream service is so swamped it can’t even accept new connections, or it’s crashing and dropping existing ones.

  • Diagnosis: Check the upstream service’s logs for errors like "too many open files," "resource temporarily unavailable," or stack traces indicating crashes. Monitor its CPU, memory, and network socket usage. If using Kubernetes, kubectl top pod <pod-name> -n <namespace> and kubectl logs <pod-name> -n <namespace> are your friends.
  • Fix: Scale up the upstream service. If in Kubernetes, increase the replicas count in your Deployment or StatefulSet. If it’s a standalone binary, increase the number of worker processes or threads. This gives the service more capacity to handle incoming connections.
  • Why it works: More instances or resources mean less load per instance, allowing it to accept and process connections without timing out or crashing.
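
To put rough numbers on "less load per instance," the sketch below estimates a replica count using Little’s law (in-flight requests ≈ request rate × average latency). The function name, its parameters, and the default 70% utilization target are illustrative assumptions, not values taken from Envoy or Kubernetes; measure your own per-instance capacity before trusting the result.

```javascript
// Rough replica estimate based on Little's law:
//   in-flight requests ≈ requests per second × average latency (in seconds).
// All names and defaults here are hypothetical; substitute measured values.
function replicasNeeded(requestsPerSecond, avgLatencyMs, perInstanceConcurrency, targetUtilization = 0.7) {
  if (perInstanceConcurrency <= 0 || targetUtilization <= 0 || targetUtilization > 1) {
    throw new RangeError('perInstanceConcurrency must be > 0 and targetUtilization in (0, 1]');
  }
  // Average number of requests being processed at any moment.
  const inFlight = (requestsPerSecond * avgLatencyMs) / 1000;
  // Leave headroom so each instance is not run at 100% of its capacity.
  const usablePerInstance = perInstanceConcurrency * targetUtilization;
  return Math.max(1, Math.ceil(inFlight / usablePerInstance));
}
```

For example, at 2000 requests per second with 50 ms average latency, about 100 requests are in flight at once; if one instance comfortably handles 25 concurrent requests, replicasNeeded(2000, 50, 25) suggests 6 replicas at the 70% target.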

2. Incorrect Upstream Service Port/IP Configuration

Envoy is trying to connect to a port or IP address that the upstream service isn’t actually listening on.

  • Diagnosis: Verify the address and port_value (under socket_address) in your Envoy cluster configuration. Then, on the upstream host, check which IP address and port the service is bound to, for example with ss -ltnp. From the Envoy node (or a pod in the same network if using Kubernetes), run curl -v <upstream_ip>:<upstream_port>. You should see a successful HTTP/1.1 200 OK or similar, not "Connection refused."
  • Fix: Correct the address and port_value in your Envoy cluster configuration to match the address and port where the upstream service is actually reachable. For example, if your service listens on 0.0.0.0:8080, point Envoy at the host’s routable IP or DNS name on port 8080 (0.0.0.0 is a bind address, not a destination). If the service is bound only to 127.0.0.1, it cannot be reached from another host or pod, so rebind it to an externally reachable interface.
  • Why it works: Envoy will now send its connection requests to the correct network endpoint where the upstream service is actively listening.

3. Network Connectivity Issues (Firewall, Security Group, Network Policy)

A firewall, security group, or Kubernetes Network Policy is blocking connections between Envoy and the upstream service.

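  • Diagnosis: From the Envoy pod/instance, try telnet <upstream_ip> <upstream_port> or nc -vz <upstream_ip> <upstream_port>. A timeout usually means packets are being dropped somewhere along the path; an immediate "Connection refused" means the host was reached but nothing accepted the connection (or a rule actively rejected it). If the path is blocked, check firewall rules on any intervening network devices, cloud provider security groups (e.g., AWS Security Groups, Azure NSGs), and Kubernetes Network Policies.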
  • Fix: Adjust firewall rules, security groups, or Network Policies to explicitly allow traffic from Envoy’s IP address/range to the upstream service’s IP address/range on the required port. For example, a Kubernetes Network Policy might look like this:
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-envoy-to-upstream
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: my-upstream-app # Label of your upstream pods
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: envoy # Label of your Envoy pods
        ports:
        - protocol: TCP
          port: 8080 # The port your upstream service listens on
    
  • Why it works: This explicitly permits the network packets to traverse from Envoy to the upstream service, satisfying the connectivity requirement.
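
When telnet or nc is not installed in the Envoy container, a few lines of Node.js can perform the same TCP check. The sketch below is a minimal illustration, not a replacement for proper network tooling: the function names and the 3-second default timeout are assumptions, and the mapping from error code to likely cause is a rule of thumb.

```javascript
const net = require('node:net');

// Map a socket error code to the most likely cause from Envoy's point of view.
function classifyConnectError(code) {
  switch (code) {
    case 'ECONNREFUSED':
      return 'refused: host reachable, nothing listening on that port (or a reject rule)';
    case 'ETIMEDOUT':
      return 'timeout: packets are probably being dropped by a firewall or policy';
    case 'EHOSTUNREACH':
    case 'ENETUNREACH':
      return 'unreachable: no route to the upstream address';
    case 'ENOTFOUND':
      return 'dns: the upstream hostname did not resolve';
    default:
      return `other: ${code}`;
  }
}

// Attempt a TCP connection and resolve with a short diagnosis string.
function probe(host, port, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    // The 'timeout' event fires if the handshake has not completed in time.
    socket.setTimeout(timeoutMs);
    socket.once('connect', () => {
      socket.destroy();
      resolve('open: TCP handshake succeeded');
    });
    socket.once('timeout', () => {
      socket.destroy();
      resolve(classifyConnectError('ETIMEDOUT'));
    });
    socket.once('error', (err) => {
      socket.destroy();
      resolve(classifyConnectError(err.code));
    });
  });
}
```

Run it from the Envoy pod with something like probe('<upstream_ip>', 8080).then(console.log), substituting your own address and port. A "refused" result points back to the port/IP checks in section 2, while a "timeout" result points to the firewall and policy checks in this section.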

4. Upstream Service Not Ready/Initialized

The upstream service’s pods are running, but the application inside hasn’t fully started and isn’t yet accepting connections. This is common during deployments or restarts.

  • Diagnosis: Check the upstream service’s application logs for initialization messages. If in Kubernetes, run kubectl get pod <pod-name> -n <namespace> -o yaml and look at the Ready condition under status.conditions and the ready field under status.containerStatuses; kubectl describe pod <pod-name> -n <namespace> lists readiness probe failures in its events. If the probe is failing, the pod won’t be considered ready to receive traffic.
  • Fix: Ensure the upstream service has a robust readiness probe configured that accurately reflects when the application is ready to accept connections. If the application takes a long time to start, increase initialDelaySeconds or add a startupProbe so the pod is not marked ready too early.
  • Why it works: Kubernetes removes pods that fail their readiness probe from the Service’s endpoints, so an Envoy that discovers its upstream endpoints through Kubernetes stops routing to them. A proper probe therefore prevents Envoy from sending requests to an uninitialized service. Envoy’s own active health checks, if configured on the cluster, serve the same purpose.
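
A readiness probe is only as accurate as the endpoint it calls. As a minimal sketch, assuming a plain Node.js HTTP service, the handler below returns 503 until startup work has finished and 200 afterwards. The /ready path, the markReady function, and the initialize() routine mentioned in the comments are hypothetical names chosen for illustration.

```javascript
const http = require('node:http');

// Readiness state: false until startup work (config load, DB pool, cache warmup) completes.
let ready = false;

function markReady() {
  ready = true;
}

// Status code a readiness probe should receive right now.
function readinessStatus() {
  return ready ? 200 : 503;
}

// '/ready' is a hypothetical probe path; match it to the path in your readinessProbe.
function handleRequest(req, res) {
  if (req.url === '/ready') {
    const status = readinessStatus();
    res.writeHead(status, { 'Content-Type': 'text/plain' });
    res.end(status === 200 ? 'ready\n' : 'starting\n');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello\n');
}

const server = http.createServer(handleRequest);

// Start listening first, then flip the flag once initialization is done, e.g.:
//   server.listen(8080, () => initialize().then(markReady));
// where initialize() stands in for the application's own startup routine.
```

Because the server is listening before the flag flips, the probe gets a clear 503 instead of a refused connection while the application is still starting.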

5. Upstream Service Graceful Shutdown Issues

The upstream service is shutting down but not handling SIGTERM (or equivalent) correctly. Instead of finishing existing requests and closing gracefully, it’s abruptly terminating connections.

  • Diagnosis: Observe the upstream service’s logs during a deployment or restart. Look for messages indicating it’s shutting down. If you see errors about active connections being terminated before the process exits, it’s likely a shutdown issue.
  • Fix: Implement proper graceful shutdown handling in the upstream application. This typically involves:
    • Catching SIGTERM (or SIGINT).
    • Stopping the listener from accepting new connections.
    • Waiting for a configured timeout (e.g., 30 seconds) for existing requests to complete.
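    • Closing the listener and exiting.
    For example, in Node.js with Express:
    const express = require('express');
    const app = express();
    const PORT = process.env.PORT || 8080; // assumed default port

    const server = app.listen(PORT, () => console.log('Server listening'));

    process.on('SIGTERM', () => {
      console.log('SIGTERM signal received: closing HTTP server');
      // Stop accepting new connections; the callback runs once open connections have ended.
      server.close(() => {
        console.log('HTTP server closed');
        process.exit(0);
      });
      // Force exit if connections are still open after 30 seconds.
      setTimeout(() => process.exit(1), 30000).unref();
    });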
    
  • Why it works: By properly handling shutdown signals, the upstream service ensures that ongoing requests are completed before closing connections, preventing premature termination errors for Envoy.

6. Upstream Service Resource Exhaustion (File Descriptors, Ephemeral Ports)

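The upstream service has run out of file descriptors, so it cannot accept new connections: every accepted socket consumes one descriptor. Ephemeral ports are not consumed by accepting Envoy’s incoming connections; they run out when a process opens many outbound connections, so check them if the upstream service makes its own outgoing calls (to a database or another service) under high connection churn.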

  • Diagnosis: Check the upstream service’s OS-level limits. ulimit -n shows the file descriptor limit for the current shell; for the running process, read /proc/<upstream_pid>/limits instead. Use sysctl net.ipv4.ip_local_port_range to see the ephemeral port range. If these are low and the service is experiencing high connection churn, this could be the cause. Run netstat -anp | grep <upstream_pid> and look for a large number of connections in CLOSE_WAIT or TIME_WAIT.
  • Fix: Increase the nofile limit in /etc/security/limits.conf for the upstream service’s user, or set LimitNOFILE= in the service’s systemd unit file (systemd services generally do not go through PAM, so limits.conf does not apply to them). Also, consider widening the ephemeral port range if it’s very small. Example for limits.conf:
    <upstream_user> soft nofile 65536
    <upstream_user> hard nofile 65536
    
    And for sysctl:
    sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
    
    (Remember to make sysctl changes persistent by editing /etc/sysctl.conf).
  • Why it works: Increasing these OS-level limits provides the upstream service with more resources to manage its network connections, preventing it from failing to accept new ones due to exhaustion.
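
To see how close a running process is to its descriptor limit, you can compare the number of entries in /proc/<pid>/fd with the "Max open files" row of /proc/<pid>/limits. The sketch below does this in Node.js; it is Linux-only, the function names are illustrative, and reading another user’s /proc/<pid>/fd usually requires root.

```javascript
const fs = require('node:fs');

// Extract the soft and hard "Max open files" limits from /proc/<pid>/limits text.
// Values are numbers, or Infinity when the kernel reports "unlimited".
function parseMaxOpenFiles(limitsText) {
  const match = limitsText.match(/^Max open files\s+(\S+)\s+(\S+)/m);
  if (!match) return null;
  const toNumber = (value) => (value === 'unlimited' ? Infinity : Number(value));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

// Linux only: compare open descriptors with the soft limit for a running process.
function fdUsage(pid) {
  const limits = parseMaxOpenFiles(fs.readFileSync(`/proc/${pid}/limits`, 'utf8'));
  // Each entry in /proc/<pid>/fd is one open descriptor (files, sockets, pipes).
  const open = fs.readdirSync(`/proc/${pid}/fd`).length;
  return {
    open,
    soft: limits ? limits.soft : null,
    ratio: limits ? open / limits.soft : null,
  };
}
```

A ratio that stays close to 1 under load suggests the process is hitting its nofile limit; a count that keeps climbing while traffic is flat points to a descriptor leak in the application rather than a limit that is too low.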

The next error you’ll likely see after fixing this is UpstreamConnectionTermination if the upstream service is still actively closing connections, or potentially a different Envoy error if the underlying problem shifts.
