Your Flask app suddenly stopped responding to requests, and the load balancer is reporting unhealthy instances. The load balancer marks an instance unhealthy when its health check probes fail to get a valid HTTP response (typically `200 OK`) from the application within the configured timeout.
Common Causes and Fixes
- Port Mismatch: The load balancer is trying to connect to a different port than the one your Flask app is listening on.
  - Diagnosis: Check your Flask application’s `app.run()` call or your WSGI server configuration (e.g., Gunicorn, uWSGI). For `app.run()`, look for the `port` argument. If using Gunicorn, check its `--bind` option.

    ```
    # Example for Flask's built-in server
    # Look for something like this in your app.py:
    # app.run(host='0.0.0.0', port=5000)

    # Example for Gunicorn
    # Look for something like this in your deployment script or systemd service file:
    # gunicorn --bind 0.0.0.0:5000 your_app:app
    ```

  - Fix: Ensure the port in your Flask app’s configuration matches the port your load balancer’s health check probes. If your Flask app runs on port `5000`, the health check should also target port `5000`.
  - Why it works: The health check probe must reach the exact network port on which your application is listening for incoming HTTP requests.
- Firewall Blocking: A firewall (on the instance itself or in the network) is preventing the health check probes from reaching your application’s port.
  - Diagnosis: On the instance, use `ufw` or `firewalld` to check active rules. In cloud environments, check security groups or network ACLs.

    ```
    # On the instance (Debian/Ubuntu)
    sudo ufw status verbose

    # On the instance (CentOS/RHEL)
    sudo firewall-cmd --list-all

    # On AWS, check EC2 Security Groups for inbound rules on the application port.
    # On GCP, check VPC Firewall Rules.
    ```

  - Fix: Add an inbound rule that allows traffic on your application’s port (e.g., `5000`) from the load balancer’s IP range or health check IP addresses.

    ```
    # Example for ufw (this opens the port to any source; restrict the source where you can)
    sudo ufw allow 5000/tcp

    # Example for firewalld
    sudo firewall-cmd --permanent --add-port=5000/tcp
    sudo firewall-cmd --reload
    ```

  - Why it works: Firewalls act as gatekeepers; explicitly allowing traffic on the application port lets the health check probes through.
- Application Not Starting/Crashing: Your Flask application failed to start or crashed shortly after starting, so nothing is listening on the port at all.
  - Diagnosis: Check the application logs for startup errors, such as unhandled exceptions during import, configuration problems, or missing dependencies.

    ```
    # If using systemd, follow the service's journal
    sudo journalctl -u your-app.service -f

    # If running directly, tail your log file
    tail -f /var/log/your-app.log
    ```

  - Fix: Resolve the underlying error in your application code or dependencies. For example, if a database connection fails at startup, fix the connection string or make sure the database is reachable.
  - Why it works: The application must be running to accept connections; fixing startup errors keeps it alive and listening.
- Incorrect Health Check Path: The load balancer is configured to check a URL path that doesn’t exist or doesn’t return a `200 OK` status code.
  - Diagnosis: Review your load balancer’s health check configuration. Common default paths are `/` or `/health`. If you haven’t defined a specific endpoint, the load balancer might be probing a route that doesn’t exist.
  - Fix: Implement a dedicated health check endpoint in your Flask app.

    ```python
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route('/health')
    def health_check():
        # You can add more sophisticated checks here,
        # e.g., database connectivity, external service status.
        return jsonify({"status": "ok"}), 200

    if __name__ == '__main__':
        app.run(host='0.0.0.0', port=5000)
    ```

    Then configure your load balancer to probe `/health`.
  - Why it works: An endpoint that explicitly signals a healthy state (by returning `200 OK`) gives the load balancer the clear signal it needs to consider the instance healthy.
- Slow Application Response: Your application is running but takes too long to respond to the health check probe, so the load balancer times out.
  - Diagnosis: Check the load balancer’s health check "timeout" and "interval" settings. Also look in your application logs for long-running requests, especially to the health check endpoint if one is defined.
  - Fix: Make your health check endpoint as fast as possible. If it performs checks (such as database queries), make sure they are efficient or cacheable. Alternatively, increase the load balancer’s health check timeout if that is acceptable.

    ```python
    # Example of a fast health check
    @app.route('/health')
    def health_check():
        # No external calls, just a quick check
        return jsonify({"status": "ok"}), 200
    ```

    In your load balancer settings, you might increase the timeout from `5s` to `10s`.
  - Why it works: The health check must complete within the load balancer’s time window; reducing latency or widening the window lets the probe succeed.
- WSGI Server Configuration Issues: If you’re using a production WSGI server such as Gunicorn or uWSGI, a misconfiguration can stop it from binding to the right port or handling requests.
  - Diagnosis: Check your WSGI server’s configuration files or command-line arguments. Make sure it’s bound to the correct IP address (`0.0.0.0` to accept connections from other hosts such as the load balancer; `127.0.0.1` accepts only local connections) and port.

    ```
    # Example Gunicorn command
    gunicorn --workers 4 --bind 0.0.0.0:5000 your_app:app

    # Example Gunicorn config file (gunicorn_config.py), loaded with:
    #   gunicorn -c gunicorn_config.py your_app:app
    # bind = "0.0.0.0:5000"
    # workers = 4
    ```

  - Fix: Correct the `bind` address and port in your WSGI server configuration or command, and make sure the number of workers suits your instance’s resources.
  - Why it works: The WSGI server is the intermediary that receives requests from the load balancer and passes them to your Flask app, so it must be listening on the address and port the load balancer probes.
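Several of the causes above (wrong port, wrong path, blocked port, slow response) can be told apart by sending the same request the load balancer sends. A minimal sketch using only the Python standard library, assuming you run it from a host in the load balancer’s network; `probe` is a hypothetical helper, and it ignores the intervals, thresholds, and source IPs a real load balancer uses.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0, expected_status=200):
    """Send one GET, roughly like a load balancer health check.

    Returns (ok, detail) where detail describes what happened.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        # A non-2xx answer still proves something is listening on the port
        # (e.g. 404 points at a wrong health check path).
        status = exc.code
    except (urllib.error.URLError, OSError) as exc:
        # Refused connection, timeout, or DNS failure: nothing answered
        # (wrong port, firewall, crashed app, or a too-slow response).
        return False, f"no response: {exc}"
    elapsed = time.monotonic() - start
    detail = f"HTTP {status} in {elapsed:.2f}s"
    return status == expected_status, detail
```

For example, `probe("http://10.0.1.23:5000/health", timeout=5)` (a hypothetical instance address) returns `(False, "no response: ...")` for a closed or filtered port and `(False, "HTTP 404 ...")` for a wrong path. Set `timeout` to the load balancer’s own timeout so a slow endpoint fails here as well.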
Once the health checks pass, the next problems you may hit are in your application’s actual request handling, such as 5xx errors from unhandled exceptions or 404 Not Found responses for missing routes.
The most surprising thing about health and readiness endpoints is that they aren’t just for detecting a dead instance; they’re also crucial for replacing a live one gracefully.
Consider this Flask app with a simple /health endpoint:
```python
import random
import time

from flask import Flask, jsonify

app = Flask(__name__)

# Simulated readiness state. This is a per-process global: with several
# Gunicorn workers, each worker has its own copy.
is_ready = True

@app.route('/')
def index():
    # Simulate some work
    time.sleep(random.uniform(0.1, 0.5))
    return "Hello, World!"

@app.route('/health')
def health_check():
    # This endpoint is checked by the load balancer
    if is_ready:
        return jsonify({"status": "ok", "ready": True}), 200
    # Not ready yet, or shutting down gracefully: the 503 status code is
    # what tells the load balancer to stop routing new traffic here.
    return jsonify({"status": "degraded", "ready": False}), 503

# POST, because this changes server state. In production, protect this
# route (or trigger the same logic from a signal handler) so clients can't call it.
@app.route('/shutdown_gracefully', methods=['POST'])
def shutdown_gracefully():
    global is_ready
    print("Initiating graceful shutdown...")
    is_ready = False
    # In a real app, you'd signal your workers to stop accepting new requests
    # and finish processing in-flight ones.
    return "Shutting down gracefully. No new requests will be accepted.", 200

if __name__ == '__main__':
    # In production, use a WSGI server like Gunicorn
    app.run(host='0.0.0.0', port=5000)
```
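The example above hard-codes `is_ready = True`, so it never reports itself as still starting. A minimal sketch of start-up readiness, assuming the warm-up work (connection pools, cache warming) is stubbed out with `time.sleep`; `ready`, `warm_up`, and `health_response` are hypothetical names, not Flask APIs. A `threading.Event` only coordinates threads inside one process, so under Gunicorn each worker warms up and reports readiness on its own.

```python
import threading
import time

# Hypothetical readiness gate: set once start-up work has finished.
ready = threading.Event()


def warm_up(duration=0.2):
    """Stand-in for real start-up work such as filling a connection pool."""
    time.sleep(duration)
    ready.set()


def health_response():
    """Return (body, status) for the /health view to send."""
    if ready.is_set():
        return {"status": "ok", "ready": True}, 200
    # 503 keeps the load balancer from routing traffic here yet.
    return {"status": "starting", "ready": False}, 503


# Warm up in the background so the server can answer probes immediately.
threading.Thread(target=warm_up, daemon=True).start()
```

A Flask view would then be `body, status = health_response()` followed by `return jsonify(body), status`. Running the warm-up in a thread means the probe gets a fast `503` instead of timing out while the instance initializes.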
When a load balancer probes /health, it typically looks for a 200 OK. But what if your app needs a moment to initialize, or is in the middle of a controlled shutdown?
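Here’s how it works in practice with a load balancer that performs active health checks (such as AWS ALB, Google Cloud Load Balancing, or NGINX Plus; open-source Nginx only detects failures passively from real traffic):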
- Startup: When a new instance starts, it might run background tasks (e.g., creating database connection pools, warming caches). Until they finish, `/health` should return `503 Service Unavailable`; the load balancer, seeing a failing status code, won’t send the instance traffic. Once initialization is complete, `/health` returns `200 OK` with `ready: true`, and the load balancer begins routing traffic to the instance. A `200 OK` whose body merely says `ready: false` is not enough on its own: many load balancers (AWS ALB, for example) judge health only by the status code, and the rest inspect the body only if you configure a response-content match.
- Scaling Down/Deployments: When you take an instance out of service (for a deployment, or to scale down), you don’t just terminate it. Instead, you trigger a graceful shutdown, for example by calling a `/shutdown_gracefully` endpoint (or having your deployment system do it). This sets `is_ready` to `False`, so `/health` returns `503` and the load balancer stops sending new traffic. The load balancer only reacts after its configured number of consecutive failed probes, so new requests can still arrive for roughly the probe interval multiplied by that threshold. Meanwhile, the instance keeps processing the requests it has already received until they are complete. Once all in-flight requests are done, the instance can be safely terminated.
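The scaling-down sequence can be sketched in plain Python: fail the health check, keep serving until the load balancer has had time to notice, then wait for in-flight requests to finish. This is a minimal sketch under stated assumptions: `InFlightTracker`, `graceful_shutdown`, and the `mark_not_ready` callback are hypothetical names, and `probe_interval` / `unhealthy_threshold` should mirror your load balancer’s settings. Many load balancers add their own draining window on top of this (AWS target groups call it the deregistration delay).

```python
import threading
import time


class InFlightTracker:
    """Counts requests in progress so shutdown can wait for them."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def __enter__(self):
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()
        return False  # never swallow exceptions from the request

    def wait_for_drain(self, timeout):
        """Block until no requests are in flight; False if timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


def graceful_shutdown(tracker, mark_not_ready, probe_interval,
                      unhealthy_threshold, drain_timeout):
    # 1. Make /health fail so the load balancer stops picking this instance.
    mark_not_ready()
    # 2. The load balancer needs `unhealthy_threshold` failed probes before it
    #    stops routing, so new requests may still arrive during this window.
    time.sleep(probe_interval * unhealthy_threshold)
    # 3. Let requests that already arrived finish before the process exits.
    return tracker.wait_for_drain(drain_timeout)
```

Each request handler would run its work inside `with tracker:`. With a 10-second interval and a threshold of 2, step 2 waits 20 seconds, which is why terminating immediately after flipping `is_ready` can still cut off requests.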
The key is that the health check endpoint can communicate more than just "am I alive?". It can communicate "am I ready to receive new traffic?". This readiness state is what enables zero-downtime deployments and scaling.
Many developers only think of the `200 OK` response for health checks. They don’t realize that returning a different status code (such as `503 Service Unavailable`) is the mechanism that lets load balancers orchestrate operational tasks like zero-downtime deployments. A JSON payload such as `{"ready": false}` is useful for humans and tooling, but a load balancer acts on it only if it is configured to match response content. The load balancer interprets these signals to manage traffic flow, so that in-flight requests can finish and new requests go to healthy instances during an update or scaling event.
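To see why the status code is what matters, here is a simplified model of how a load balancer might track one target. It is an illustration, not any product’s actual algorithm: `TargetHealth` is a hypothetical class, the matcher looks only at status codes (as AWS ALB’s does), and the threshold defaults are assumptions. A state change requires several consecutive results that disagree with the current state, so a `200` with `{"ready": false}` in the body still counts as a pass.

```python
from dataclasses import dataclass, field


@dataclass
class TargetHealth:
    """Simplified model of a load balancer's view of one target."""

    healthy_threshold: int = 3    # consecutive passes to become healthy
    unhealthy_threshold: int = 2  # consecutive failures to become unhealthy
    healthy: bool = False         # new targets start out unhealthy
    _streak: int = field(default=0, init=False, repr=False)

    def record(self, status_code, success_codes=(200,)):
        """Feed one probe result (status code only); return current state."""
        passed = status_code in success_codes
        if passed == self.healthy:
            # Result agrees with the current state; reset the opposing streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = passed
            self._streak = 0
        return self.healthy
```

In this model, a new instance receives traffic only after three passing probes, and an instance that starts returning `503` keeps receiving traffic until its second consecutive failure, which is the window the graceful shutdown has to cover.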
The next logical step after implementing robust health checks is understanding how to integrate these endpoints into your CI/CD pipeline for automated deployments.