Rate-Limit FastAPI Endpoints with SlowAPI (2026)

FastAPI endpoints can be rate-limited by integrating the slowapi library.

Let’s see slowapi in action. Imagine a simple FastAPI application with an endpoint that we want to protect from excessive requests.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from slowapi import Limiter
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

app = FastAPI()

# Initialize the limiter.
# With no storage_uri, counters live in process memory, which is fine for this demo.
# For production, pass e.g. storage_uri="redis://localhost:6379" (or a Memcached URI)
# so that every instance shares the same counts.
limiter = Limiter(key_func=get_remote_address)

# Register the limiter on the app so slowapi can find it.
# This alone does not apply a limit to every route.
app.state.limiter = limiter

# Define a rate limit: 5 requests per minute per IP address.
# The route decorator must sit above @limiter.limit, and the endpoint must accept `request`.
@app.get("/")
@limiter.limit("5/minute")
async def read_root(request: Request):
    return {"message": "Hello! You are within the rate limit."}

# Custom exception handler for rate limit exceeded.
# Handlers must return a Response; exc.detail describes the limit that was hit.
@app.exception_handler(RateLimitExceeded)
async def rate_limit_exceeded_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"detail": f"Rate limit exceeded: {exc.detail}"},
    )

# Example of another endpoint with a different rate limit
@app.get("/items/{item_id}")
@limiter.limit("10/hour")
async def read_item(item_id: int, request: Request):
    return {"item_id": item_id, "message": "This item endpoint has a different limit."}

This code sets up a FastAPI application and applies rate limiting to two endpoints. The Limiter is initialized with get_remote_address as the key function, so limits are tracked per client IP address. The @limiter.limit("5/minute") decorator restricts read_root to at most 5 requests per minute from any single IP, while read_item allows 10 per hour. When a limit is exceeded, the custom exception handler rate_limit_exceeded_handler turns the RateLimitExceeded exception into a 429 Too Many Requests response with a short explanatory message.

The fundamental problem slowapi solves is preventing abuse and ensuring fair usage of your API resources. Without rate limiting, a single user or a malicious actor could overwhelm your server with requests, leading to degraded performance, increased costs, and denial of service for legitimate users. slowapi provides a straightforward way to define and enforce these limits.

Internally, slowapi delegates to the limits library, which uses a storage backend to keep track of request counts (and, for some strategies, timestamps) for each unique key, in our case client IP addresses. When a request comes in, the limiter looks up the client’s key in storage; if the request would exceed the limit for the current time window, it raises a RateLimitExceeded exception. The choice of storage backend is crucial for scalability. The default in-memory storage is fine for development or a single process, but each process keeps its own counters, so multi-worker deployments and multiple instances need a shared backend such as Redis or Memcached, selected with the storage_uri argument to Limiter.
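The lookup-and-count cycle described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not slowapi’s code: `InMemoryCounterStore` and its `hit` method are hypothetical names, the exception class is a local stand-in for slowapi’s RateLimitExceeded, and each window is assumed to start at a key’s first request.

```python
import time
from typing import Dict, Optional, Tuple


class RateLimitExceeded(Exception):
    """Local stand-in for slowapi's exception of the same name."""


class InMemoryCounterStore:
    """Hypothetical per-key fixed-window counter; state lives in one process only."""

    def __init__(self) -> None:
        # client key -> (start of current window, requests counted in it)
        self._counters: Dict[str, Tuple[float, int]] = {}

    def hit(self, key: str, limit: int, window: float, now: Optional[float] = None) -> int:
        """Count one request for `key`, raising if it exceeds `limit` per `window` seconds."""
        now = time.monotonic() if now is None else now
        start, count = self._counters.get(key, (now, 0))
        if now - start >= window:
            start, count = now, 0  # previous window is over: start a new one
        if count >= limit:
            raise RateLimitExceeded(f"{key} exceeded {limit} requests per {window}s")
        self._counters[key] = (start, count + 1)
        return count + 1


store = InMemoryCounterStore()
for t in range(5):  # five requests from one IP within the first minute
    store.hit("203.0.113.7", limit=5, window=60, now=float(t))
try:
    store.hit("203.0.113.7", limit=5, window=60, now=5.0)
except RateLimitExceeded as exc:
    print("blocked:", exc)  # blocked: 203.0.113.7 exceeded 5 requests per 60s
print(store.hit("198.51.100.2", limit=5, window=60, now=5.0))  # 1: separate key, separate count
```

Because the dictionary lives inside one process, a second worker would keep its own independent counts, which is why a shared backend matters once you scale out.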

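The key levers you control are the rate limit strings themselves and the storage backend. The rate limit string format is intuitive: count/unit, where unit can be second, minute, hour, or day (the underlying limits library also accepts month and year). Assigning the limiter to app.state.limiter only registers it with the application; to apply a limit to every route, pass default_limits (for example default_limits=["100/minute"]) to Limiter and add slowapi’s SlowAPIMiddleware to the app. Per-route limits use the @limiter.limit() decorator as shown. You can also stack several decorators on a single route; slowapi checks every one of them and rejects a request as soon as any limit is exhausted, so in practice the most restrictive limit is usually the one that triggers.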
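To make the string format and stacked limits concrete, here is a small stdlib-only sketch. `parse_limit` and `MultiLimit` are hypothetical helpers, not slowapi or limits APIs, and the parser only understands the count/unit form with the four units listed above. Each limit is tracked with its own log of timestamps (a moving window, chosen here for simplicity), and a request is accepted only when every limit still has room.

```python
import re
from collections import deque
from typing import Deque, List, Tuple

# Seconds per unit, for the units listed above.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
LIMIT_RE = re.compile(r"^\s*(\d+)\s*/\s*(second|minute|hour|day)s?\s*$")


def parse_limit(spec: str) -> Tuple[int, int]:
    """Parse a 'count/unit' string such as '5/minute' into (count, window_seconds)."""
    match = LIMIT_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognised rate limit: {spec!r}")
    return int(match.group(1)), UNIT_SECONDS[match.group(2)]


class MultiLimit:
    """Hypothetical checker that enforces several limits at once, like stacked decorators."""

    def __init__(self, specs: List[str]) -> None:
        self.limits = [parse_limit(spec) for spec in specs]
        # One log of accepted-request timestamps per limit.
        self.logs: List[Deque[float]] = [deque() for _ in self.limits]

    def allow(self, now: float) -> bool:
        for (_, window), log in zip(self.limits, self.logs):
            while log and log[0] <= now - window:
                log.popleft()  # timestamp has left this limit's window
        # Reject if ANY limit is full; a rejected request is not recorded.
        if any(len(log) >= count for (count, _), log in zip(self.limits, self.logs)):
            return False
        for log in self.logs:
            log.append(now)
        return True


print(parse_limit("10/hour"))  # (10, 3600)
checker = MultiLimit(["2/second", "3/minute"])
print([checker.allow(t) for t in (0.0, 0.5, 0.7, 2.0, 3.0)])  # [True, True, False, True, False]
```

In this run the third request is refused by the 2/second limit, and the fifth by the 3/minute limit even though the per-second window had room again.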

The key_func is another powerful customization point. It receives the incoming request and returns a string identifying who shares a quota. While get_remote_address is common, you might want to limit based on authenticated user IDs, API keys, or specific request headers. For example, if you use JWT authentication, your key_func could extract the user ID from the token so that limits follow the user rather than the IP address.
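A custom key function might look like the sketch below. The function name `api_key_or_ip` and the `X-API-Key` header are assumptions for illustration; the function only reads `request.headers` and `request.client.host`, which a Starlette/FastAPI Request provides, so it could be passed as `Limiter(key_func=api_key_or_ip)`. The demo uses `SimpleNamespace` objects in place of real requests so it runs without a server.

```python
from types import SimpleNamespace
from typing import Any


def api_key_or_ip(request: Any) -> str:
    """Hypothetical key_func: limit per API key when present, otherwise per client IP.

    Uses only `request.headers` and `request.client`, which Starlette/FastAPI
    Request objects provide.
    """
    api_key = request.headers.get("X-API-Key")  # header name is an assumption
    if api_key:
        return f"apikey:{api_key}"
    # request.client may be None (e.g. under some test transports).
    host = request.client.host if request.client else "127.0.0.1"
    return f"ip:{host}"


# Stand-in request objects so the example runs without a server.
with_key = SimpleNamespace(headers={"X-API-Key": "abc123"}, client=SimpleNamespace(host="203.0.113.7"))
anonymous = SimpleNamespace(headers={}, client=SimpleNamespace(host="203.0.113.7"))
print(api_key_or_ip(with_key))   # apikey:abc123
print(api_key_or_ip(anonymous))  # ip:203.0.113.7
```

The `apikey:` and `ip:` prefixes keep the two namespaces from colliding. In a real deployment the key should be validated before it is used, otherwise a client could sidestep its quota by sending a fresh random value with every request.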

The strategy parameter of Limiter (not shown in the basic example) controls how requests are counted within a window. It takes a string: the default is "fixed-window", and "moving-window" is also available; these correspond to the limits library’s FixedWindowRateLimiter and MovingWindowRateLimiter. Each trades accuracy against storage and computation. A fixed window needs only one counter per key but can admit up to twice the limit in a burst that straddles the boundary between two windows, whereas a moving window records a timestamp per request and smooths this out at a higher cost.
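The boundary burst is easiest to see side by side. The two classes below are simplified, hypothetical sketches of the two ideas, not the limits library’s FixedWindowRateLimiter or MovingWindowRateLimiter; for clarity, the fixed window here is aligned to multiples of the window length.

```python
from collections import deque
from typing import Deque, Dict


class FixedWindow:
    """Counts requests per aligned window, e.g. [0, 60), [60, 120)."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit, self.window = limit, window
        self.counts: Dict[int, int] = {}

    def allow(self, now: float) -> bool:
        bucket = int(now // self.window)  # index of the window `now` falls into
        if self.counts.get(bucket, 0) >= self.limit:
            return False
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return True


class MovingWindow:
    """Keeps one timestamp per accepted request and counts those in the last `window` seconds."""

    def __init__(self, limit: int, window: int) -> None:
        self.limit, self.window = limit, window
        self.hits: Deque[float] = deque()

    def allow(self, now: float) -> bool:
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()  # drop timestamps older than the window
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True


# Ten requests straddling a minute boundary: five at t=59s, five at t=61s.
times = [59.0] * 5 + [61.0] * 5
fixed, moving = FixedWindow(5, 60), MovingWindow(5, 60)
print(sum(fixed.allow(t) for t in times))   # 10: each window accepts five
print(sum(moving.allow(t) for t in times))  # 5: the last 60s already holds five
```

With a 5/minute limit, the fixed window accepts all ten requests sent within two seconds of each other, while the moving window accepts only five. The price is memory: the moving window stores a timestamp per accepted request instead of a single counter.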

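When you configure slowapi with a Redis backend (storage_uri="redis://..."), the limits library keeps its counters in Redis. For a fixed-window limit like 5/minute, this amounts to incrementing a per-key counter and setting its expiry on the first hit (INCR plus EXPIRE, which limits runs together in a Lua script so the two steps are atomic); the moving window instead keeps a list of request timestamps per key. The time a client must wait before retrying is derived from when the current window resets, which for a fixed window is the key’s expiry. RateLimitExceeded itself carries no retry_after attribute; slowapi’s built-in handler reports the wait in a Retry-After header when the Limiter is created with headers_enabled=True.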
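The INCR-and-EXPIRE pattern can be sketched against any client that exposes redis-py’s `incr`, `expire`, and `ttl` methods. To keep the example runnable without a Redis server, `FakeRedis` is a hypothetical in-memory stand-in with a manually advanced clock, and the key format `rl:<ip>:<path>` is an assumption; neither reflects how limits actually names or stores its keys.

```python
from typing import Any, Dict, Optional, Tuple


class FakeRedis:
    """Hypothetical in-memory stand-in for the redis-py calls used below."""

    def __init__(self) -> None:
        self.now = 0.0  # manually advanced clock, in seconds
        # key -> (integer value, absolute expiry time or None)
        self._data: Dict[str, Tuple[int, Optional[float]]] = {}

    def _live(self, key: str) -> Optional[Tuple[int, Optional[float]]]:
        item = self._data.get(key)
        if item is not None and item[1] is not None and item[1] <= self.now:
            del self._data[key]  # expire lazily on access
            return None
        return item

    def incr(self, key: str) -> int:
        value, expires_at = self._live(key) or (0, None)
        self._data[key] = (value + 1, expires_at)  # INCR keeps any existing TTL
        return value + 1

    def expire(self, key: str, seconds: int) -> bool:
        item = self._live(key)
        if item is None:
            return False
        self._data[key] = (item[0], self.now + seconds)
        return True

    def ttl(self, key: str) -> int:
        item = self._live(key)
        if item is None:
            return -2  # Redis convention: key does not exist
        if item[1] is None:
            return -1  # Redis convention: key has no expiry
        return int(item[1] - self.now)


def hit(client: Any, key: str, limit: int, window: int) -> Tuple[bool, int]:
    """Fixed-window check via INCR + EXPIRE; returns (allowed, retry_after_seconds)."""
    count = client.incr(key)
    if count == 1:
        # First request in a new window starts its countdown. This call is not
        # atomic with INCR; a server-side Lua script can run both steps together.
        client.expire(key, window)
    if count > limit:
        return False, max(client.ttl(key), 0)
    return True, 0


r = FakeRedis()
key = "rl:203.0.113.7:/"
print([hit(r, key, limit=5, window=60) for _ in range(6)][-1])  # (False, 60)
r.now = 60.0  # the key has expired, so a fresh window begins
print(hit(r, key, limit=5, window=60))  # (True, 0)
```

Because INCR and EXPIRE are separate round trips here, a crash between them would leave a counter that never expires; that is the gap an atomic Lua script closes.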

The next step in securing your API with rate limiting is to consider distributed rate limiting across multiple instances of your FastAPI application.