Rate-Limit Flask Endpoints with Flask-Limiter (2026)

Flask-Limiter lets you slap rate limits on your Flask endpoints, but it’s not just about blocking requests; it’s about controlling the flow and protecting your service from abuse.

Here’s a basic Flask app with a rate-limited endpoint:

from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    storage_uri="memory://", # For demonstration only; use Redis in production
)

@app.route("/slow")
@limiter.limit("1 per minute", error_message="You've hit the rate limit!")
def slow_route():
    return jsonify(message="This is a slow route.")

@app.route("/fast")
@limiter.limit("100 per hour")
def fast_route():
    return jsonify(message="This is a fast route.")

if __name__ == "__main__":
    app.run(debug=True)

When you hit /slow more than once within a minute, Flask-Limiter automatically returns a 429 Too Many Requests response. The error_message passed to limiter.limit() becomes the description in that response; without it, the description is the limit that was breached (here, 1 per 1 minute).

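The real magic is in how you define these limits. The limiter.limit() decorator is your primary tool. Its string argument is a count followed by a time unit, written as "1 per minute" or "100/hour"; an optional multiplier such as "5 per 2 minutes" is also accepted. You can combine several limits in one string, separated by a comma, semicolon, or pipe. For instance, limiter.limit("100 per hour, 5000 per day") applies both constraints, and a request is rejected as soon as either one is exceeded.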
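To make the notation concrete, here is a simplified stand-in parser that turns a limit string into (max requests, window in seconds) pairs. It is not the parser Flask-Limiter uses (that lives in the underlying limits library); the names parse_limits, UNIT_SECONDS, and LIMIT_RE are made up for this sketch, and it only handles seconds through days.

```python
import re

# Seconds per unit; a reduced table for this sketch only.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# Matches "100 per hour", "100/hour", and "5 per 2 minutes".
LIMIT_RE = re.compile(
    r"^\s*(\d+)\s*(?:/|per)\s*(\d+)?\s*(second|minute|hour|day)s?\s*$",
    re.IGNORECASE,
)

def parse_limits(spec):
    """Return a list of (max_requests, window_seconds) tuples."""
    parsed = []
    for part in re.split(r"[,;|]", spec):
        match = LIMIT_RE.match(part)
        if match is None:
            raise ValueError(f"cannot parse limit: {part!r}")
        amount, multiple, unit = match.groups()
        window = int(multiple or 1) * UNIT_SECONDS[unit.lower()]
        parsed.append((int(amount), window))
    return parsed

print(parse_limits("100 per hour, 5000 per day"))
# [(100, 3600), (5000, 86400)]
```

Reading a limit as "at most N requests in any window of W seconds" is the mental model the rest of this article relies on.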

Key configuration options for Limiter:

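key_func: A callable that returns the string identifying a unique client. get_remote_address is common, but you might pass a function that returns request.headers.get('X-API-Key'), or a combination of values if you have authenticated users.
error_message: The message returned when a limit is exceeded. This is a keyword argument of limiter.limit(), not of Limiter itself; for an app-wide custom response, register a Flask error handler for status 429.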
storage_uri: Where the rate limit counters are stored. memory:// is fine for testing, but for production you need a shared backend like Redis (redis://localhost:6379/0). This is crucial because multiple Flask workers (or servers) need to share the same rate limit state. With memory://, each worker keeps its own independent counter, so a client whose requests are spread across N workers can make up to N times the configured number of requests.
strategy: How requests are counted against the limit. The default is fixed-window, which counts requests in a window of fixed length and resets the count when that window expires (the window is typically anchored to the first counted request, not to the wall clock). moving-window counts requests over a rolling window instead, which prevents bursts around the window boundary at the cost of storing a timestamp for every request.

Let’s look at moving-window in action:

from flask import Flask, jsonify
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)
limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    storage_uri="redis://localhost:6379/0", # Needs a reachable Redis server and the redis client package
    strategy="moving-window",
)

@app.route("/api/data")
@limiter.limit("5 per minute", error_message="You've hit the rate limit!")
def get_data():
    return jsonify(data="some sensitive data")

if __name__ == "__main__":
    app.run(debug=True)

With strategy="moving-window" and a limit of "5 per minute", Flask-Limiter tracks the requests made in the last 60 seconds. If you make 5 requests at 00:00:01, a 6th request at 00:00:02 is denied, and so is one at 00:00:59, because all five earlier requests are still inside the rolling window. The next request is accepted only once the oldest one has aged out, at about 00:01:01. This is more robust than fixed-window, where a client can send 5 requests just before the window resets and 5 more just after, so 10 requests get through within a few seconds.
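The difference between the two strategies is easy to see in a small simulation. The classes below are a plain-Python sketch, not Flask-Limiter's implementation: FixedWindow and MovingWindow are hypothetical names, time is passed in as integer seconds, and the fixed window is assumed to start at the first request it sees.

```python
from collections import deque

class FixedWindow:
    """Counter that resets once the window opened by the first hit has elapsed."""
    def __init__(self, limit, period):
        self.limit, self.period = limit, period
        self.window_start = None
        self.count = 0

    def allow(self, now):
        if self.window_start is None or now - self.window_start >= self.period:
            self.window_start, self.count = now, 0  # open a fresh window
        if self.count < self.limit:
            self.count += 1
            return True
        return False

class MovingWindow:
    """Keeps timestamps of accepted requests from the last `period` seconds."""
    def __init__(self, limit, period):
        self.limit, self.period = limit, period
        self.hits = deque()

    def allow(self, now):
        while self.hits and now - self.hits[0] >= self.period:
            self.hits.popleft()  # forget requests older than the window
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False

# One request at t=0 opens the window, four more arrive at t=59,
# then five arrive at t=60: nine requests inside two seconds.
timestamps = [0] + [59] * 4 + [60] * 5

fixed = FixedWindow(limit=5, period=60)
moving = MovingWindow(limit=5, period=60)
fixed_allowed = sum(fixed.allow(t) for t in timestamps)
moving_allowed = sum(moving.allow(t) for t in timestamps)
print(fixed_allowed, moving_allowed)  # 10 6
```

The fixed window lets all ten requests through because the counter resets at t=60, while the moving window admits only one more request at t=60, when the request from t=0 has aged out.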

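You can also apply limits globally with the default_limits argument of Limiter, or to a group of routes by applying limiter.limit() to a Blueprint.

# ... (previous imports and app = Flask(__name__))

# Default limit for every route that has no limit of its own
limiter = Limiter(
    app=app,
    key_func=get_remote_address,
    default_limits=["1000 per day"],
    storage_uri="memory://", # For demonstration only; use Redis in production
)

@app.route("/public")
@limiter.limit("10 per minute") # Replaces the default limit on this route
def public_route():
    return jsonify(message="Public access.")

@app.route("/admin")
@limiter.limit("5 per hour", override_defaults=False) # Applies in addition to the default
def admin_route():
    return jsonify(message="Admin access.")

How the two kinds of limit combine matters: a limit set with limiter.limit() replaces the defaults for that route unless you pass override_defaults=False, in which case both are checked and breaching either one produces a 429. Routes without a decorator get only the default limits, which are counted separately for each route. If you want a single budget shared by the whole application, Limiter also accepts an application_limits argument.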

The aspect of rate limiting that most often trips people up is shared state in distributed deployments. If you’re running multiple instances of your Flask app behind a load balancer, each instance needs to consult a single, shared rate limit counter. This is why an external store like Redis (redis://localhost:6379/0) for storage_uri is non-negotiable in production. With memory://, each Flask worker process keeps its own set of counters, so a client whose requests land on different workers can exceed the configured limit several times over.
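A quick simulation shows how much the storage choice matters. This is a toy model rather than Flask-Limiter code: each "worker" is just a Counter, requests are dealt out round-robin the way a load balancer might, and the names serve, LIMIT, and client-1 are invented for the example.

```python
from collections import Counter
from itertools import cycle

LIMIT = 5  # allowed requests per client in the current window

def serve(num_requests, worker_stores):
    """Round-robin requests across workers; each worker checks its own store."""
    allowed = 0
    workers = cycle(worker_stores)
    for _ in range(num_requests):
        store = next(workers)
        if store["client-1"] < LIMIT:
            store["client-1"] += 1
            allowed += 1
    return allowed

# memory:// style: four workers, four private counters.
private_stores = [Counter() for _ in range(4)]
private_allowed = serve(40, private_stores)

# Redis style: four workers that all consult one shared counter.
shared_store = Counter()
shared_allowed = serve(40, [shared_store] * 4)

print(private_allowed, shared_allowed)  # 20 5
```

With private counters the client gets 20 requests through a limit of 5, which is exactly the N-times-the-limit effect; with one shared counter the limit holds.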

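Beyond fixed-window and moving-window, the underlying limits library has offered other strategies. One is fixed-window-elastic-expiry, which extends the window every time a request arrives, so a client that keeps hammering an endpoint stays blocked until it backs off. Newer releases also provide sliding-window-counter, which approximates a moving window with less storage. Which of these are available depends on the installed version, so check the documentation for the release you are running.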

The next thing you’ll likely want to explore is custom rate limit keys beyond just the IP address, such as API keys or user IDs, to implement per-user or per-API-key rate limiting.