Add Health and Readiness Endpoints to Flask for Load Balancers (2026)

If your Flask app has stopped receiving traffic and the load balancer is reporting unhealthy instances, its health check probes are failing to get a successful HTTP response (usually 200 OK) from your application instances.

Common Causes and Fixes

Port Mismatch: The load balancer is trying to connect to a different port than your Flask app is listening on.
- Diagnosis: Check your Flask application’s app.run() call or your WSGI server configuration (e.g., Gunicorn, uWSGI). For app.run(), look for the port argument. If using Gunicorn, check its --bind option.
```
# Example for Flask's built-in server
# Look for something like this in your app.py:
# app.run(host='0.0.0.0', port=5000)

# Example for Gunicorn
# Look for something like this in your deployment script or systemd service file:
# gunicorn --bind 0.0.0.0:5000 your_app:app
```
- Fix: Ensure the port specified in your Flask app’s configuration matches the port your load balancer is configured to probe. If your Flask app runs on port 5000, your load balancer’s health check should also target port 5000.
- Why it works: The health check probe needs to reach the exact network port where your application is actively listening for incoming HTTP requests.
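Before changing load balancer settings, it can help to test the port directly. The sketch below is a plain TCP check in Python. The function name port_is_open is hypothetical and not part of Flask or any load balancer tooling. It only answers "is something listening here?", not whether the app returns 200.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and tries each address it finds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

Run it on the instance against 127.0.0.1 first, then from a machine in the load balancer’s network against the instance’s private IP. If the first check succeeds and the second fails, suspect a firewall rather than the port setting.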
Firewall Blocking: A firewall (either on the instance itself or a network firewall) is preventing the health check probes from reaching your application’s port.
- Diagnosis: On the instance, use ufw or firewalld to check active rules. For cloud environments, check security groups or network ACLs.
```
# On the instance (Debian/Ubuntu)
sudo ufw status verbose

# On the instance (CentOS/RHEL)
sudo firewall-cmd --list-all

# On AWS, check EC2 Security Groups for inbound rules on the application port.
# On GCP, check VPC Firewall Rules.
```
- Fix: Add an inbound rule to allow traffic on your application’s port (e.g., 5000) from the load balancer’s IP range or health check IP addresses.
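```
# Example for ufw: allow only the load balancer's subnet
# (replace 10.0.0.0/16 with your load balancer's address range)
sudo ufw allow from 10.0.0.0/16 to any port 5000 proto tcp

# Example for firewalld: the same restriction as a rich rule
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/16" port port="5000" protocol="tcp" accept'
sudo firewall-cmd --reload

# Simpler but broader: open the port to every source
# sudo ufw allow 5000/tcp
# sudo firewall-cmd --permanent --add-port=5000/tcp
```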
- Why it works: Firewalls act as gatekeepers; by explicitly allowing traffic on the application port, you permit the health check probes to pass through.
Application Not Starting/Crashing: Your Flask application failed to start or crashed shortly after starting, meaning it’s not listening on the port at all.
- Diagnosis: Check application logs for startup errors. This could be unhandled exceptions during import, configuration issues, or missing dependencies.
```
# If using systemd, check journalctl
sudo journalctl -u your-app.service -f

# If running directly, tail your log file
tail -f /var/log/your-app.log
```
- Fix: Resolve the underlying error in your application code or dependencies. For example, if a database connection fails at startup, fix the connection string or ensure the database is accessible.
- Why it works: The application must be running and responsive to accept connections; fixing startup errors ensures it stays alive.
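Startup failures are quicker to diagnose when the app refuses to start with a specific message instead of failing later inside a request. A minimal sketch, assuming your configuration comes from environment variables. The names in REQUIRED_SETTINGS and both helper functions are hypothetical placeholders.

```python
import os
import sys

# Hypothetical settings: replace with the variables your app actually needs.
REQUIRED_SETTINGS = ("DATABASE_URL", "SECRET_KEY")


def missing_settings(names=REQUIRED_SETTINGS, env=None):
    """Return the required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def validate_or_exit():
    missing = missing_settings()
    if missing:
        # Written to stderr so it shows up in journalctl or your log file.
        print(f"Startup aborted, missing settings: {', '.join(missing)}", file=sys.stderr)
        sys.exit(1)
```

Calling validate_or_exit() at the top of your app module or factory puts the exact cause in the logs right at startup, instead of leaving you to infer it from failing health checks.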
Incorrect Health Check Path: The load balancer is configured to check a URL path that doesn’t exist or doesn’t return a 200 OK status code.
- Diagnosis: Review your load balancer’s health check configuration. Common default paths are / or /health. If you haven’t defined a specific endpoint, the load balancer might be probing a non-existent route.
- Fix: Implement a dedicated health check endpoint in your Flask app.
```
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/health')
def health_check():
    # You can add more sophisticated checks here,
    # e.g., database connectivity, external service status.
    return jsonify({"status": "ok"}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```
  Then, configure your load balancer to probe /health.
- Why it works: A defined endpoint that explicitly signals a healthy state (returning 200 OK) provides the load balancer with the clear signal it needs to consider the instance healthy.
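Before pointing the load balancer at /health, you can roughly reproduce what its probe does. This is a standard-library-only sketch that assumes a check passes only on HTTP 200 within the timeout (200 is the AWS ALB default success code; other load balancers may accept a wider range). The function name probe is hypothetical.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0, expected_status: int = 200) -> bool:
    """Return True if GET url answers with expected_status within timeout seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == expected_status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx, so a 503 from /health lands here.
        return err.code == expected_status
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return False
```

For example, probe("http://10.0.1.23:5000/health") uses a hypothetical private IP. One difference to keep in mind: urlopen follows redirects, which most load balancer probes do not. A /health route that redirects can therefore pass here and still fail in production.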
Slow Application Response: Your application is technically running but takes too long to respond to the health check probe, causing the load balancer to time out.
- Diagnosis: Check the load balancer’s health check configuration for its "timeout" or "interval" settings. Also, examine your application logs for long-running requests, especially to the health check endpoint if one is defined.
- Fix: Optimize your health check endpoint to be as fast as possible. If it performs checks (like database queries), ensure those are efficient or cacheable. Alternatively, increase the load balancer’s health check timeout if acceptable.
```
# Example of a fast health check
@app.route('/health')
def health_check():
    # No external calls, just a quick check
    return jsonify({"status": "ok"}), 200
```
  In your load balancer settings, you might increase the timeout from 5s to 10s. Keep the timeout shorter than the check interval, which most load balancers require.
- Why it works: The health check must complete within the load balancer’s defined time window; reducing latency or increasing the window ensures a successful probe.
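If /health must check a dependency, one way to keep it fast is to cache the result for a few seconds, so frequent probes (several load balancer nodes may each probe every instance) don’t each run a query. A sketch under stated assumptions: check_database is a hypothetical stand-in for your real check, and the 10-second TTL is an arbitrary example value.

```python
import threading
import time
from typing import Callable

from flask import Flask, jsonify


class CachedCheck:
    """Run an expensive check at most once per `ttl` seconds and reuse the result."""

    def __init__(self, check: Callable[[], bool], ttl: float = 10.0):
        self._check = check
        self._ttl = ttl
        self._lock = threading.Lock()
        self._last_result = False
        self._last_run = float("-inf")  # forces a real check on the first call

    def __call__(self) -> bool:
        with self._lock:
            now = time.monotonic()
            if now - self._last_run >= self._ttl:
                try:
                    self._last_result = bool(self._check())
                except Exception:
                    # Any error from the dependency counts as unhealthy.
                    self._last_result = False
                self._last_run = now
            return self._last_result


def check_database() -> bool:
    # Hypothetical placeholder: replace with a cheap query such as SELECT 1
    # run through your real connection pool, with its own short timeout.
    return True


app = Flask(__name__)
database_ok = CachedCheck(check_database, ttl=10.0)


@app.route('/health')
def health_check():
    if database_ok():
        return jsonify({"status": "ok"}), 200
    return jsonify({"status": "degraded"}), 503
```

Holding the lock while the check runs means concurrent probes wait for one check instead of all running it. The trade-off is that /health can report a stale result for up to ttl seconds.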
WSGI Server Configuration Issues: If you’re using a production WSGI server like Gunicorn or uWSGI, misconfiguration can cause it to not properly bind to the port or handle requests.
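- Diagnosis: Check the configuration files or command-line arguments for your WSGI server. Ensure it’s bound to the correct port and to 0.0.0.0 (all interfaces). A server bound to 127.0.0.1 accepts connections only from the instance itself, so the load balancer’s probes can’t reach it. Note that Gunicorn binds to 127.0.0.1:8000 when --bind is omitted.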
```
# Example Gunicorn command
gunicorn --workers 4 --bind 0.0.0.0:5000 your_app:app

# Example Gunicorn config file (gunicorn_config.py)
# bind = "0.0.0.0:5000"
# workers = 4
```
- Fix: Correct the bind address and port in your WSGI server configuration or command. Ensure the number of workers is appropriate for your instance’s resources.
- Why it works: The WSGI server is the intermediary that receives requests from the load balancer and passes them to your Flask app; it must be correctly configured to listen and forward requests.
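The commented gunicorn_config.py fragment above can be expanded into a file Gunicorn loads with -c. Gunicorn config files are plain Python. bind, workers, timeout, and graceful_timeout are real Gunicorn settings, but the values here are starting points to tune, not recommendations for your workload.

```python
# gunicorn_config.py: load with  gunicorn -c gunicorn_config.py your_app:app
import multiprocessing

# Listen on all interfaces so the load balancer can reach the app.
bind = "0.0.0.0:5000"

# Gunicorn's docs suggest (2 x CPU cores) + 1 workers as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Workers that stay silent longer than this many seconds are killed and restarted.
timeout = 30

# After a graceful shutdown or restart signal, workers get this many seconds
# to finish the requests they are serving.
graceful_timeout = 30
```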

After fixing these, you may still see problems in your application’s actual request handling, such as 5xx errors from unhandled exceptions or 404 Not Found responses for missing routes.

The most surprising thing about health and readiness endpoints is that they aren’t just for detecting dead instances; they’re also crucial for replacing instances gracefully.

Consider this Flask app with a simple /health endpoint:

```
from flask import Flask, jsonify
import time
import random

app = Flask(__name__)

# Simulated readiness flag. It lives in process memory, so under Gunicorn
# each worker has its own copy and /shutdown_gracefully only affects the
# worker that happened to receive the request.
is_ready = True

@app.route('/')
def index():
    # Simulate some work
    time.sleep(random.uniform(0.1, 0.5))
    return "Hello, World!"

@app.route('/health')
def health_check():
    # This endpoint is checked by the load balancer
    if is_ready:
        return jsonify({"status": "ok", "ready": True}), 200
    # Not ready yet or shutting down gracefully: the 503 status is what
    # makes the load balancer stop routing new requests here.
    return jsonify({"status": "degraded", "ready": False}), 503

# POST rather than GET because this changes state. In production, require a
# token or expose this route only on an internal network.
@app.route('/shutdown_gracefully', methods=['POST'])
def shutdown_gracefully():
    global is_ready
    print("Initiating graceful shutdown...")
    is_ready = False
    # In a real app, keep serving requests (including in-flight ones) and stop
    # the process only after the load balancer has marked this instance unhealthy.
    return "Shutting down gracefully. The load balancer will stop routing new requests here.", 200

if __name__ == '__main__':
    # In production, use a WSGI server like Gunicorn
    app.run(host='0.0.0.0', port=5000)
```

When a load balancer probes /health, it typically looks for a 200 OK. But what if your app needs a moment to initialize, or is in the middle of a controlled shutdown?
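For the initialization case, one approach is to run warm-up work in a background thread and report 503 until it finishes. A sketch, assuming the hypothetical warm_up function is replaced with your real initialization. threading.Event is used instead of a bare global so the flag is explicit and can also be waited on.

```python
import threading
import time

from flask import Flask, jsonify

app = Flask(__name__)

# Set once initialization finishes; is_set() is safe to call from any thread.
ready = threading.Event()


def warm_up():
    # Hypothetical initialization work: replace the sleep with real steps
    # such as opening a connection pool or priming a cache.
    time.sleep(2)
    ready.set()


# Run initialization in the background so the server can answer /health
# (with 503) while it runs. Under Gunicorn each worker imports the module and
# runs its own warm-up. Avoid --preload here: threads started before the fork
# do not exist in the worker processes, so the flag would never be set there.
threading.Thread(target=warm_up, daemon=True).start()


@app.route('/health')
def health_check():
    if ready.is_set():
        return jsonify({"status": "ok", "ready": True}), 200
    return jsonify({"status": "starting", "ready": False}), 503
```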

Here’s how it works in practice with a load balancer that actively probes instances (like AWS ALB, a GCP load balancer, or NGINX Plus; open-source Nginx only does passive health checks):

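Startup: When a new instance starts, it might have initialization work to finish first (e.g., database connection pooling, cache warming). Until that work completes, /health should return 503 Service Unavailable. The load balancer sees a non-200 status and won’t send traffic. A 200 OK with a ready: false flag only has the same effect if your load balancer is configured to match on the response body; AWS ALB, for example, checks only the status code. Once initialization is complete, /health returns 200 OK with ready: true, and after the configured number of consecutive successful checks the load balancer begins routing traffic to this instance.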
Scaling Down/Deployments: When you want to take an instance out of service (e.g., for a deployment, or to scale down), you don’t just terminate it. Instead, you trigger a graceful shutdown, for example by calling a /shutdown_gracefully endpoint yourself or having your deployment system do it. This endpoint sets is_ready to False, so /health starts returning 503. The load balancer stops sending new traffic only after enough consecutive checks have failed (roughly its unhealthy threshold multiplied by the check interval), so until then the instance must keep serving new requests normally. It also finishes any requests it has already received. Once the load balancer has stopped routing to it and all in-flight requests are done, the instance can be safely terminated.

The key is that the health check endpoint can communicate more than just "am I alive?". It can communicate "am I ready to receive new traffic?". This readiness state is what enables zero-downtime deployments and scaling.

Many developers only think of the 200 OK response for health checks. They don’t realize that returning a different status code, such as 503 Service Unavailable, is the mechanism that allows load balancers to orchestrate complex operational tasks like zero-downtime deployments. A JSON payload indicating a readiness state ({"ready": false}) helps humans and load balancers that match on the response body, but most decide on the status code alone. The load balancer interprets these signals to manage traffic flow, so no user request is dropped during an update or scaling event, provided each instance keeps serving until the load balancer has stopped routing to it.
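How long an instance should keep serving after /health starts returning 503 can be estimated from the load balancer’s health check settings. A rough sketch, assuming the instance answers 503 immediately (so probe timeouts don’t add to the total) and ignoring propagation delays. The function and parameter names are hypothetical.

```python
def drain_wait_seconds(interval: float, unhealthy_threshold: int,
                       longest_request: float) -> float:
    """Rough upper bound on how long to keep serving after /health flips to 503."""
    # Worst case: the flip happens just after a passing check, so the load
    # balancer needs `unhealthy_threshold` more checks, one every `interval`
    # seconds, before it stops routing. A request routed just before that
    # moment still needs up to `longest_request` seconds to finish.
    return interval * unhealthy_threshold + longest_request
```

With a 10-second interval, an unhealthy threshold of 2, and requests that take at most 5 seconds, drain_wait_seconds(10, 2, 5) gives 25 seconds. If you deregister instances explicitly instead (AWS target groups, for example, have a deregistration delay setting), that delay plays a similar role.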

The next logical step after implementing robust health checks is understanding how to integrate these endpoints into your CI/CD pipeline for automated deployments.