Envoy’s `no healthy upstream` error means that Envoy, acting as a proxy, couldn’t find any healthy instances of the service you’re trying to reach.
Here’s a breakdown of common causes and how to fix them:
1. Service Not Registered with Discovery Service (e.g., Kubernetes, Consul, Eureka)
Diagnosis: Check your service discovery system to ensure the target service and its healthy instances are actually registered.
- Kubernetes: Run `kubectl get endpoints <your-service-name> -n <your-namespace>` and look at the `ENDPOINTS` column. If it’s empty or `<none>`, the service isn’t discovering healthy pods.
- Consul: Run `consul catalog services` or check the Consul UI.
- Eureka: Check the Eureka dashboard for registered services and their health status.
Fix:
- Kubernetes: Ensure your service definition (`Service` object) correctly selects the pods that are running your application. Verify that the pods themselves are healthy (e.g., `kubectl get pods -n <your-namespace> -l app=<your-app-label>`). If pods are unhealthy, fix the pod issues (e.g., restart them, fix application errors). If the `Service` selector is wrong, update it.
- Consul/Eureka: Ensure your service registration mechanism is correctly configured to report healthy instances. This might involve fixing the health check configuration for your service or ensuring the registration agent is running and healthy.
Why it works: Envoy relies on a discovery service to know where to send traffic. If the service discovery system doesn’t know about your service or marks all its instances as unhealthy, Envoy has no backend to route to, hence the error.
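As a rough aid for the Kubernetes check above, the following sketch filters the table printed by `kubectl get endpoints` and lists Services that have no ready endpoints. The function name `services_without_endpoints` is made up for illustration, and the sketch assumes the default table layout (name in the first column, `ENDPOINTS` in the second).

```shell
# Hypothetical helper: reads `kubectl get endpoints` table output on stdin
# and prints the name of every Service whose ENDPOINTS column is <none>.
services_without_endpoints() {
  # NR > 1 skips the header row; $1 is NAME, $2 is ENDPOINTS.
  awk 'NR > 1 && $2 == "<none>" { print $1 }'
}
```

It could be used as `kubectl get endpoints -n <your-namespace> | services_without_endpoints`; any name it prints is a Service that Envoy would have no backends for.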
2. Incorrect Cluster.name in Envoy Configuration
Diagnosis: Examine your Envoy `clusters.yaml` (or equivalent configuration) and compare the `name` field of the relevant cluster definition against the cluster name that your route configuration references.
- Look for a line like `name: "my-backend-service"` in your `clusters.yaml`.
- Then look at the route that handles the request. Envoy matches the request’s `Host` header against the `domains` of a virtual host, and the matching route names its target in its `cluster` field. If the route says `cluster: api.example.com` but your cluster is named `my-backend-service`, the two won’t match.

Fix: Ensure the `name` field in your Envoy cluster configuration exactly matches the `cluster` value used by the route. The name is only a label and doesn’t have to equal the hostname in the `Host` header, although many setups name the cluster after the DNS name of the backend service to keep the configuration readable.
- Example Fix: If Envoy receives a request with `Host: api.internal.net`, and the matching route specifies `cluster: api.internal.net`, your cluster definition should look like:

      clusters:
      - name: api.internal.net
        connect_timeout: 0.25s
        type: STRICT_DNS
        dns_refresh_rate: 1s
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: api.internal.net
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: "api.internal.net"  # Envoy will resolve this DNS name
                    port_value: 8080

Why it works: Envoy uses the cluster name as a key to look up backend configurations. If the route points at a name that no cluster carries, it cannot proceed.
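To confirm that a cluster with a given name is actually loaded, one option is to search the text output of the admin `/clusters` endpoint. The sketch below assumes the admin interface listens on `127.0.0.1:9901` and that every line of the text format starts with `<cluster-name>::`. The helper name `cluster_is_loaded` is hypothetical.

```shell
# Hypothetical helper: reads the text output of Envoy's admin /clusters
# endpoint on stdin and succeeds if any line belongs to the cluster "$1".
cluster_is_loaded() {
  # Each line of the text format starts with "<cluster-name>::".
  # index() does a literal prefix comparison, so dots in the name are not
  # treated as regex wildcards.
  awk -v prefix="$1::" 'index($0, prefix) == 1 { found = 1 } END { exit !found }'
}
```

For example, `curl -s http://127.0.0.1:9901/clusters | cluster_is_loaded api.internal.net && echo loaded` prints `loaded` only when Envoy knows a cluster by exactly that name.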
3. Envoy type: STRICT_DNS and DNS Resolution Issues
Diagnosis: If your cluster type is STRICT_DNS, Envoy will perform DNS lookups for the hostname specified in load_assignment.endpoints[0].lb_endpoints[0].endpoint.address.socket_address.address.
- From the Envoy proxy itself, try resolving the hostname:

      kubectl exec -it <envoy-pod-name> -n <envoy-namespace> -- nslookup <backend-hostname>
      kubectl exec -it <envoy-pod-name> -n <envoy-namespace> -- dig <backend-hostname>

- If these commands fail or return no A records, Envoy won’t be able to find an IP address.
Fix:
- Ensure the DNS server Envoy is configured to use is functional and can resolve the backend service’s hostname.
- Verify that the `address` field in your `load_assignment` correctly specifies the DNS name that should resolve to your backend service.
- If using Kubernetes `Service` objects, `STRICT_DNS` typically works by resolving the Kubernetes service DNS name (e.g., `<service-name>.<namespace>.svc.cluster.local`). Ensure this name is correct and that the Kubernetes DNS (like CoreDNS) is healthy.
Why it works: STRICT_DNS requires Envoy to actively resolve the provided hostname to an IP address. If that resolution fails, Envoy has no endpoints to connect to.
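The DNS check can be turned into something scriptable. This is a minimal sketch, assuming `dig` is available in the Envoy container; the helper name `count_a_records` is invented here, and it only counts IPv4 answers.

```shell
# Hypothetical helper: reads `dig +short <name> A` output on stdin and prints
# how many lines look like IPv4 addresses (CNAME lines are ignored).
count_a_records() {
  # grep -c exits non-zero when the count is 0, so mask that with `|| true`.
  grep -Ec '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || true
}
```

A possible invocation is `kubectl exec <envoy-pod-name> -n <envoy-namespace> -- dig +short <backend-hostname> A | count_a_records`. The `-t` flag is left out on purpose, since a TTY adds carriage returns that would stop the lines from matching. A result of `0` means a `STRICT_DNS` cluster pointing at that name would have no endpoints.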
4. Incorrect hosts in load_assignment (for STATIC or ORIGINAL_DST types)
Diagnosis: If your cluster type is STATIC or ORIGINAL_DST, Envoy doesn’t use DNS discovery. It relies on hardcoded endpoints or dynamically inferred ones.
- STATIC: Check the `load_assignment.endpoints` section for the correct IP addresses and ports of your backend services.
  - Example:

        clusters:
        - name: my-static-backend
          type: STATIC
          load_assignment:
            cluster_name: my-static-backend
            endpoints:
            - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: "10.0.1.5"  # Hardcoded IP
                      port_value: 8080

  - Verify the IPs and ports are reachable from the Envoy pod: `kubectl exec -it <envoy-pod-name> -n <envoy-namespace> -- ping 10.0.1.5` (if ping is available) or `telnet 10.0.1.5 8080`.
- ORIGINAL_DST: This type relies on the connection’s original destination address, which the kernel preserves when traffic is redirected to Envoy (for example by iptables). Ensure your Envoy setup correctly configures the kernel for this. This is less common for direct "no cluster found" errors but can happen if the network setup is broken.
Fix:
- STATIC: Update the `address` and `port_value` in your `load_assignment` to point to the correct, reachable IP addresses and ports of your backend services.
- ORIGINAL_DST: Troubleshoot your network configuration, iptables rules, and Envoy’s original destination listener filter (`envoy.filters.listener.original_dst`) setup. This often involves ensuring traffic is correctly DNATed before reaching Envoy.
Why it works: For STATIC clusters, Envoy only knows about the IPs you’ve explicitly listed. If those IPs are wrong or unreachable, it has no backends. ORIGINAL_DST relies on the OS’s network stack to have done its job correctly.
5. Health Check Failures
Diagnosis: Envoy actively health-checks an upstream cluster when `health_checks` is configured for it (active health checking is not enabled by default). If all instances in a cluster fail their health checks, Envoy will temporarily remove them from the active pool, leading to this error.
- Check Envoy’s logs for health check failure messages.
- Look at the Envoy Admin API for cluster and endpoint health status:

      curl http://127.0.0.1:9901/clusters
      curl "http://127.0.0.1:9901/clusters?format=json"

- In the text output, look at each host’s `health_flags` value. `healthy` means the host is usable, while flags such as `/failed_active_hc` mean it failed active health checking. If no host in the cluster is `healthy`, all are unhealthy.
Fix: Investigate why the health checks are failing.
- Application Issues: The backend service might be crashing, overloaded, or not responding correctly to its health endpoint.
- Network Issues: Firewalls or network policies might be blocking Envoy’s health check probes to the backend service.
- Health Check Configuration: The health check configuration in Envoy might be too aggressive (e.g., too short a timeout, too few retries) or pointing to the wrong health endpoint. Adjust `interval`, `timeout`, `unhealthy_threshold`, `healthy_threshold` in your cluster’s `health_checks` section.

      clusters:
      - name: my-backend-service
        # ... other config ...
        health_checks:
        - timeout: 0.5s
          interval: 5s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: "/healthz"
            # ... other http options ...

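Why it works: Envoy is designed to be resilient. It stops sending traffic to endpoints that fail health checks to avoid sending requests to dead services. If all endpoints fail, you get this error, unless panic mode applies: with the default `healthy_panic_threshold` of 50%, Envoy may ignore health status and balance across all hosts once too few of them are healthy.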
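To see which hosts Envoy currently considers unhealthy, the admin `/clusters` text output can be filtered per cluster. The sketch below assumes host lines of the form `<cluster>::<ip:port>::health_flags::<flags>`, where a value of `healthy` means no failure flags are set. The helper name `unhealthy_hosts` is hypothetical.

```shell
# Hypothetical helper: reads the text output of Envoy's admin /clusters
# endpoint on stdin and prints "<address> <flags>" for every host of
# cluster "$1" whose health_flags value is not "healthy".
unhealthy_hosts() {
  # Host lines look like: <cluster>::<ip:port>::health_flags::<flags>
  # Splitting on "::" breaks for IPv6 addresses; this sketch assumes IPv4.
  awk -F '::' -v cluster="$1" \
    '$1 == cluster && $3 == "health_flags" && $4 != "healthy" { print $2, $4 }'
}
```

Running `curl -s http://127.0.0.1:9901/clusters | unhealthy_hosts my-backend-service` would list each failing host next to its flags. A flag such as `/failed_active_hc` points at the active health check rather than at service discovery.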
6. Incorrect Host Header Rewriting or Mismatch in VirtualHost
Diagnosis: Envoy routes based on the incoming request’s Host header (or other matching criteria) to a VirtualHost, which then selects a Cluster. If the Host header is modified incorrectly before Envoy or if your VirtualHost’s domains don’t match what’s being sent, Envoy might not find the correct VirtualHost and thus no cluster.
- Check the `domains` in your `http_connection_manager.route_config.virtual_hosts`.
  - Example: `domains: ["api.example.com"]`
- Inspect the actual `Host` header arriving at Envoy. Use `tcpdump` on the Envoy pod or check upstream logs if Envoy is forwarding the header.
Fix:
- Ensure the `domains` list in your `VirtualHost` configuration includes the exact hostname that clients are sending in their `Host` header.
- If you have a load balancer in front of Envoy, ensure it’s not stripping or altering the `Host` header in a way that breaks the match.
- If you’re using `host_rewrite_literal` (`host_rewrite` in older API versions) in your routes, keep in mind that it changes the `Host` header sent to the upstream after the route has already been selected, so it doesn’t affect which `VirtualHost` matches. Ensure the rewritten value is one your backend accepts.
Why it works: The VirtualHost acts as a primary dispatcher. If the incoming request’s Host doesn’t match any domains defined for the VirtualHost Envoy is using, it cannot proceed to select a cluster.
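To reason about whether a given `Host` value would match a `domains` entry, the matching forms can be approximated in a few lines of shell. This is a simplified sketch, not Envoy’s implementation: it handles exact names, a leading `*` (suffix wildcard), a trailing `*` (prefix wildcard), and the catch-all `*`, and it ignores case handling and the search order across several virtual hosts. The function name `host_matches_domain` is made up for illustration.

```shell
# Hypothetical helper: succeeds if host "$1" matches domain pattern "$2".
# Supported pattern forms: exact name, "*suffix", "prefix*", and "*".
# Note: it sets the global variables host, pattern, suffix and prefix.
host_matches_domain() {
  host=$1 pattern=$2
  case "$pattern" in
    "*")                       # catch-all matches any host
      return 0 ;;
    "*"?*)                     # suffix wildcard, e.g. *.example.com
      suffix=${pattern#\*}
      # ?* requires at least one character in place of the wildcard.
      case "$host" in ?*"$suffix") return 0 ;; esac
      return 1 ;;
    ?*"*")                     # prefix wildcard, e.g. api.*
      prefix=${pattern%\*}
      case "$host" in "$prefix"?*) return 0 ;; esac
      return 1 ;;
    *)                         # exact match
      [ "$host" = "$pattern" ] ;;
  esac
}
```

For instance, `host_matches_domain api.example.com '*.example.com'` succeeds, while `host_matches_domain api.example.com:8080 api.example.com` fails, because a `Host` value that carries a port is a different string under this kind of comparison. That is worth checking when clients connect on a non-default port.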
After fixing this, the next error you’ll likely encounter is a 503 Service Unavailable with an `upstream connect error` message, which appears when Envoy can find the cluster and pick an endpoint but the connection to that endpoint fails.