FastAPI endpoints can be rate-limited by integrating the slowapi library.

Let’s see slowapi in action. Imagine a simple FastAPI application with an endpoint that we want to protect from excessive requests.

from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from slowapi import Limiter
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

app = FastAPI()

# Initialize the limiter
# We're using the in-memory storage for simplicity here.
# For production, consider Redis or Memcached for shared rate limiting across instances.
limiter = Limiter(key_func=get_remote_address)

# Register the limiter on the app so slowapi can find it at request time
app.state.limiter = limiter

# Define a rate limit: 5 requests per minute per IP address
# The route decorator goes above the limit decorator, and the endpoint must accept `request`.
@app.get("/")
@limiter.limit("5/minute")
async def read_root(request: Request):
    return {"message": "Hello! You are within the rate limit."}

# Custom exception handler for rate limit exceeded
# A handler must return a response object; exc.detail describes the limit that was hit.
@app.exception_handler(RateLimitExceeded)
async def rate_limit_exceeded_handler(request: Request, exc: RateLimitExceeded):
    return JSONResponse(
        status_code=429,
        content={"detail": f"Rate limit exceeded: {exc.detail}"},
    )

# Example of another endpoint with a different rate limit
@app.get("/items/{item_id}")
@limiter.limit("10/hour")
async def read_item(item_id: int, request: Request):
    return {"item_id": item_id, "message": "This item endpoint has a different limit."}

This code sets up a FastAPI application and applies rate limiting to its endpoints. The Limiter is initialized with get_remote_address as the key function, so limits are enforced per client IP address. The @limiter.limit("5/minute") decorator on read_root restricts that endpoint to at most 5 requests per minute from any single IP. Each rate-limited endpoint must accept a request: Request parameter, because slowapi derives the client key from it. When a limit is exceeded, slowapi raises RateLimitExceeded, and the custom handler rate_limit_exceeded_handler turns that into a 429 Too Many Requests response with an explanatory message.

The fundamental problem slowapi solves is preventing abuse and ensuring fair usage of your API resources. Without rate limiting, a single user or a malicious actor could overwhelm your server with requests, leading to degraded performance, increased costs, and denial of service for legitimate users. slowapi provides a straightforward way to define and enforce these limits.

Internally, slowapi uses a storage backend to keep track of request counts and timestamps for each unique key (in our case, IP addresses). When a request comes in, it looks up the client’s key in the storage. If the number of requests within the defined time window exceeds the limit, it raises a RateLimitExceeded exception. The choice of storage backend is crucial for scalability. The default in-memory storage is fine for single-process applications or development, but each process keeps its own counters, so a deployment with several workers effectively multiplies the limit by the number of workers. For multi-process deployments or distributed systems, you’ll need a shared backend like Redis or Memcached.

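The key levers you control are the rate limit strings themselves and the storage backend. The rate limit string format is count/unit (also accepted as "count per unit"), where unit can be second, minute, hour, or day; the underlying limits library also accepts month and year. Limits are applied per route with the @limiter.limit() decorator as shown. Assigning the limiter to app.state.limiter does not limit anything by itself; it only makes the limiter available to slowapi at request time. For limits that apply to every route, pass default_limits to the Limiter constructor and add slowapi's SlowAPIMiddleware to the app. You can also stack multiple decorators on a single route. Every stacked limit is checked, so a request is rejected as soon as any one of them is exceeded.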
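To make the limit string format concrete, here is a minimal sketch of a parser for the simple count/unit form. It is an illustration only, not the parser slowapi uses internally; the names parse_limit and UNIT_SECONDS are hypothetical, and the sketch covers only the four units listed above.

```python
import re
from typing import Tuple

# Seconds per unit; only the four units named in the text are covered here.
UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# Accepts "5/minute" as well as the longer "5 per minute" spelling.
_LIMIT_RE = re.compile(
    r"^\s*(\d+)\s*(?:/|per)\s*(second|minute|hour|day)s?\s*$", re.IGNORECASE
)


def parse_limit(limit: str) -> Tuple[int, int]:
    """Return (max_requests, window_seconds) for one limit string."""
    match = _LIMIT_RE.match(limit)
    if match is None:
        raise ValueError(f"unrecognized rate limit string: {limit!r}")
    count, unit = match.groups()
    return int(count), UNIT_SECONDS[unit.lower()]


print(parse_limit("5/minute"))
print(parse_limit("10/hour"))
```

Read this way, "5/minute" means at most 5 requests in any 60-second window, and "10/hour" means at most 10 requests in a 3600-second window.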

The key_func is another powerful customization point. While get_remote_address is common, you might want to limit based on authenticated user IDs, API keys, or even specific request headers. You can define a custom function that takes the incoming request and returns a unique string identifier for the entity you want to rate limit. For example, if you have user authentication, you could create a key_func that extracts the user_id from the JWT.
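As a sketch of a custom key function, the example below keys on an API key header and falls back to the client IP. The function name api_key_or_ip and the X-API-Key header are assumptions made for illustration; the stand-in request objects only mimic the two attributes the function reads, so the block runs without a web server.

```python
from types import SimpleNamespace


def api_key_or_ip(request) -> str:
    """Rate-limit key: the caller's API key if present, else the client IP."""
    api_key = request.headers.get("X-API-Key")  # header name is an assumption
    if api_key:
        return f"key:{api_key}"
    # request.client can be None in some setups, so guard against it.
    host = request.client.host if request.client else "unknown"
    return f"ip:{host}"


# Stand-in objects that mimic request.headers and request.client.host.
with_key = SimpleNamespace(
    headers={"X-API-Key": "abc123"}, client=SimpleNamespace(host="10.0.0.1")
)
without_key = SimpleNamespace(headers={}, client=SimpleNamespace(host="10.0.0.1"))

print(api_key_or_ip(with_key))
print(api_key_or_ip(without_key))
```

A function like this would be passed as Limiter(key_func=api_key_or_ip). Prefixing the key with its kind keeps an API key from ever colliding with an IP address in the storage backend.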

The strategy parameter of Limiter (not shown in the basic example) controls how the rate limit window is calculated. It takes the name of a strategy from the underlying limits library. The default is "fixed-window", which is cheap in both memory and computation. Alternatives include "moving-window" and, depending on the installed version of limits, variants such as "fixed-window-elastic-expiry". Each has different trade-offs in terms of accuracy and overhead. For instance, a fixed window can allow a burst of up to twice the limit around the boundary between two windows, whereas a moving window smooths this out at the cost of tracking individual request timestamps.
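The boundary burst is easier to see with numbers. The sketch below implements both ideas in plain Python; the classes FixedWindow and MovingWindow are simplified illustrations written for this example, not the classes slowapi uses, and time is passed in explicitly so the result is deterministic.

```python
from collections import defaultdict, deque


class FixedWindow:
    """Counts hits per key inside aligned windows (0-60s, 60-120s, ...)."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.counts = defaultdict(int)  # (key, window index) -> accepted hits

    def hit(self, key: str, now: float) -> bool:
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


class MovingWindow:
    """Keeps timestamps of accepted hits and counts those in the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)  # key -> timestamps of accepted hits

    def hit(self, key: str, now: float) -> bool:
        stamps = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


# Ten requests within two seconds, straddling the 60-second boundary.
times = [59.0, 59.2, 59.4, 59.6, 59.8, 60.0, 60.2, 60.4, 60.6, 60.8]
fixed, moving = FixedWindow(5, 60), MovingWindow(5, 60)
fixed_accepted = sum(fixed.hit("1.2.3.4", t) for t in times)
moving_accepted = sum(moving.hit("1.2.3.4", t) for t in times)
print(fixed_accepted, moving_accepted)  # 10 5
```

With a limit of 5 per 60 seconds, the fixed window accepts all ten requests because five land in each window, while the moving window accepts only the first five.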

When you configure slowapi with a Redis backend (by passing a storage_uri such as redis://localhost:6379 to Limiter), the counts and timestamps are stored and retrieved through Redis commands. For a limit like 5/minute, the underlying library might use Redis commands like INCR and EXPIRE, or a Lua script, to atomically increment a counter and set an expiration time. The Retry-After value that slowapi can send to clients (when the limiter is created with headers_enabled=True) is derived from the time until the current window resets, which for a Redis-backed counter corresponds to the remaining lifetime of the relevant key.
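The INCR and EXPIRE pattern can be sketched without a Redis server. TinyStore below is a hypothetical in-memory stand-in that mimics the semantics of those commands plus TTL, and check is an illustrative helper; neither is part of slowapi or of any Redis client. In real Redis, INCR followed by EXPIRE is two separate commands, which is why a Lua script is often used to make the pair atomic.

```python
import math


class TinyStore:
    """In-memory stand-in that mimics Redis INCR, EXPIRE and TTL semantics."""

    def __init__(self):
        self.values = {}   # key -> counter
        self.expires = {}  # key -> absolute expiry time

    def _purge(self, key, now):
        # Expired keys disappear, just as they would in Redis.
        if key in self.expires and self.expires[key] <= now:
            self.values.pop(key, None)
            self.expires.pop(key, None)

    def incr(self, key, now):
        self._purge(key, now)
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def expire(self, key, seconds, now):
        self.expires[key] = now + seconds

    def ttl(self, key, now):
        return self.expires[key] - now


def check(store, key, limit, window, now):
    """Return (allowed, retry_after_seconds) for one request at time `now`."""
    count = store.incr(key, now)
    if count == 1:
        # First hit of a new window: start the countdown.
        store.expire(key, window, now)
    if count > limit:
        return False, math.ceil(store.ttl(key, now))
    return True, 0


store = TinyStore()
results = [check(store, "rl:1.2.3.4", 5, 60, now=t) for t in (0, 1, 2, 3, 4, 10)]
after_reset = check(store, "rl:1.2.3.4", 5, 60, now=61)
print(results[-1])   # (False, 50): sixth request, 50 seconds left in the window
print(after_reset)   # (True, 0): the key expired, so counting restarts
```

The retry value falls out of the key's remaining lifetime: the window started at time 0, the rejected request arrived at time 10, so the client needs to wait 50 seconds.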

The next step in securing your API with rate limiting is to consider distributed rate limiting across multiple instances of your FastAPI application.
