Benchmark FastAPI Performance with wrk and Locust (2026)

FastAPI is designed for speed, but how fast is it really? Let’s find out by benchmarking its performance using wrk and locust.

First, we need a simple FastAPI application to test.

# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    return {"Hello": "World"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, q: str | None = None):
    return {"item_id": item_id, "q": q}

Save this as main.py. To run it, we’ll use uvicorn:

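uvicorn main:app --host 0.0.0.0 --port 8000

This starts the application listening on all interfaces on port 8000; from the same machine, reach it at http://127.0.0.1:8000. We leave out --reload here: it is a development feature that watches your source files and restarts the server on changes, and it has no place in a benchmark.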

Benchmarking with `wrk`

wrk is a modern HTTP benchmarking tool capable of generating significant load. It’s great for raw throughput testing.

First, install wrk:

# On macOS
brew install wrk

# On Debian/Ubuntu
sudo apt update && sudo apt install wrk

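Now, let’s hit our root endpoint. This run keeps 100 concurrent connections open, spread across 4 threads, for 10 seconds. wrk has no fixed request count: each connection sends its next request as soon as the previous response arrives, so the total depends on how fast the server answers.

wrk -t4 -c100 -d10s http://127.0.0.1:8000/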

-t4: Use 4 threads.
-c100: Maintain 100 concurrent connections.
-d10s: Run the benchmark for 10 seconds.

The output reports requests per second (RPS), latency statistics, and transfer rate. Our {"Hello": "World"} endpoint does almost no work, so its RPS is effectively a ceiling for this setup; endpoints that serialize more data or wait on a database will come in lower.
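If you want to compare runs (say, before and after a code change), it helps to pull the headline numbers out of wrk’s text report instead of eyeballing them. The sketch below assumes wrk’s default summary lines, `N requests in 10.01s, ... read` and `Requests/sec: X`; the `parse_wrk_summary` helper is hypothetical, not part of wrk.

```python
import re

# wrk prints durations with a unit suffix (e.g. "10.01s", "5.00m").
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_wrk_summary(output: str) -> dict[str, float]:
    """Pull total requests, test duration, and requests/sec from wrk's text report.

    Assumes wrk's default summary lines ("N requests in 10.01s, ... read" and
    "Requests/sec: X"); raises ValueError if either line is missing.
    """
    total = re.search(r"(\d+) requests in ([\d.]+)(us|ms|s|m|h)\b", output)
    rps = re.search(r"Requests/sec:\s+([\d.]+)", output)
    if total is None or rps is None:
        raise ValueError("output does not look like a wrk summary")
    return {
        "requests": float(total.group(1)),
        "duration_s": float(total.group(2)) * _UNIT_SECONDS[total.group(3)],
        "requests_per_sec": float(rps.group(1)),
    }
```

Redirect wrk’s output to a file (for example `wrk ... > run.txt`) and pass the file’s contents to the function to log results per run.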

To test the endpoint with a path parameter and a query string (quote the URL so your shell does not treat ? as a glob character):

wrk -t4 -c100 -d10s "http://127.0.0.1:8000/items/5?q=test"

wrk is excellent for measuring the peak throughput your application sustains under synthetic load: every connection fires its next request as soon as the previous response arrives, with no think time. It’s simple, fast, and provides a good baseline.
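A useful cross-check on any wrk report is Little’s law. In a closed-loop test like this one, each of the C connections almost always has exactly one request in flight, so throughput and mean latency W are tied together. With C = 100 and a hypothetical mean latency of 10 ms, you would expect roughly 10,000 requests per second:

```latex
\text{RPS} \approx \frac{C}{W} = \frac{100}{0.010\ \text{s}} = 10\,000\ \text{requests/s}
```

If the reported RPS sits well below C / W, connections are spending time outside the request/response cycle, commonly because of socket errors and reconnects or because wrk itself has run out of client-side CPU.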

Benchmarking with `locust`

locust is a more sophisticated load testing tool that allows you to define user behavior in Python code. This makes it ideal for simulating realistic user scenarios.

First, install locust:

pip install locust

Create a locustfile.py:

# locustfile.py
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)  # Simulate users waiting 1-5 seconds between tasks

    @task
    def index(self):
        self.client.get("/")

    @task
    def view_item(self):
        item_id = 1
        self.client.get(f"/items/{item_id}?q=test_query")

This locustfile defines a user that will hit the root endpoint and the /items/ endpoint. The wait_time simulates think time between user actions.

To run locust, execute:

locust -f locustfile.py

This starts a web UI on port 8089; open http://localhost:8089 in your browser. You’ll be prompted for the number of users to simulate, the spawn rate (how many users are started per second; Locust versions before 1.0 called this the hatch rate), and the host to test, such as http://127.0.0.1:8000. Start small, say 10 users at a spawn rate of 2.

As locust ramps up, you’ll see statistics on RPS, response times, and failure rates. You can increase the number of users to see how your application scales.

Locust is powerful because you can define complex user flows, think times, and failure conditions, giving you a more nuanced view of performance under various conditions. You can also run it headlessly for automated testing:

locust -f locustfile.py --headless -u 1000 -r 100 --run-time 5m -H http://127.0.0.1:8000 --only-summary

This command runs locust with 1000 users, spawned at 100 users per second (-r is short for --spawn-rate), for 5 minutes against http://127.0.0.1:8000, and prints only a summary at the end.
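Because every simulated user pauses between tasks, the headless run above does not generate anywhere near 1000 requests per second. By Little’s law, each user completes one request per (mean think time + mean response time), so throughput is roughly users divided by that cycle time. The helper below is a hypothetical sketch of that arithmetic, assuming the uniform `between(1, 5)` wait from the locustfile and one request per task, which holds for the two tasks shown.

```python
def estimate_locust_rps(users: int, wait_min: float, wait_max: float,
                        mean_response_s: float = 0.0) -> float:
    """Rough steady-state requests/sec for Locust users that make one request
    per task and then wait a uniformly random between(wait_min, wait_max) seconds.

    Each user cycles through: request (mean_response_s), then think time, so by
    Little's law throughput = users / (mean think time + mean response time).
    """
    if users < 0 or wait_min < 0 or wait_max < wait_min or mean_response_s < 0:
        raise ValueError("invalid parameters")
    mean_wait = (wait_min + wait_max) / 2  # mean of a uniform distribution
    cycle_s = mean_wait + mean_response_s
    if cycle_s == 0:
        raise ValueError("zero cycle time: users would send as fast as possible")
    return users / cycle_s


if __name__ == "__main__":
    # The headless run above: 1000 users, between(1, 5), 10 ms responses assumed.
    print(f"{estimate_locust_rps(1000, 1, 5, mean_response_s=0.01):.0f} req/s")
```

So once all 1000 users are running, the offered load is only about 330 requests per second. To push the server as hard as wrk does, you would need far more users or a much shorter `wait_time`.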

The key difference between wrk and locust is the level of simulation. wrk is about pushing raw requests; locust is about simulating user behavior. For FastAPI, which is built on asynchronous I/O, understanding how it handles concurrent users and their actions (simulated by locust) is often more insightful than just raw request throughput (measured by wrk).

One crucial aspect of FastAPI performance lies in its asynchronous nature. When you define an endpoint with async def, every await on an I/O operation hands control back to the event loop. While one request waits for a database query or an external API call, the server can switch to processing another request. This only works if the awaited library is itself asynchronous: a synchronous call made inside async def never yields and holds up every other request on that worker. Traditional synchronous (WSGI) frameworks work differently: a worker thread or process is blocked, unable to handle other requests, until its I/O operation completes. The uvicorn server runs ASGI applications like FastAPI on an event loop; by default it starts a single process, and with --workers N it runs N worker processes, each with its own event loop.
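The difference between yielding and blocking can be seen without FastAPI at all. This stdlib-only sketch uses `asyncio.sleep` as a stand-in for awaited I/O and `time.sleep` as a stand-in for a blocking call; the handler names are hypothetical. Ten concurrent "requests" that each wait 0.1 s finish in about 0.1 s total when they await, and take at least 1 s when they block.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def handler_awaiting(delay: float) -> None:
    # Awaiting yields to the event loop, so other handlers run meanwhile.
    await asyncio.sleep(delay)


async def handler_blocking(delay: float) -> None:
    # time.sleep never yields: the whole event loop stalls for `delay`.
    time.sleep(delay)


async def serve(handler: Callable[[float], Awaitable[None]],
                n_requests: int, delay: float) -> float:
    """Run n_requests handler calls concurrently; return wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n_requests)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"awaiting: {asyncio.run(serve(handler_awaiting, 10, 0.1)):.2f}s")
    print(f"blocking: {asyncio.run(serve(handler_blocking, 10, 0.1)):.2f}s")
```

This is also why FastAPI runs endpoints declared with plain def in a thread pool: a blocking call inside async def stalls every request on that worker, while the same call inside def only occupies one pool thread.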

Understanding the interplay between your async functions, the uvicorn workers, and the event loop is critical for optimizing performance. CPU-bound work inside an async function still blocks the event loop, and so does any synchronous blocking call. Offload such work with loop.run_in_executor: the default thread pool suits blocking I/O and synchronous libraries, while CPU-bound Python code usually needs a ProcessPoolExecutor, because on a standard (GIL-enabled) CPython build threads cannot run Python bytecode in parallel.

The next step in performance tuning is profiling your application to pinpoint specific bottlenecks, for example with cProfile from the standard library or a sampling profiler such as py-spy, which can attach to a running uvicorn process without code changes.
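For a first pass, the standard library’s `cProfile` and `pstats` modules are enough to profile a suspect function in isolation, outside the server, so the numbers reflect only that code. `slow_serialize` and `profile_top` below are hypothetical names for illustration; cProfile records per-function call counts and times, which works best for synchronous code paths like this one.

```python
import cProfile
import io
import json
import pstats


def slow_serialize(n: int) -> int:
    # Hypothetical hot spot: JSON-encode many small dicts, as a busy endpoint might.
    total = 0
    for i in range(n):
        total += len(json.dumps({"item_id": i, "q": "test"}))
    return total


def profile_top(func, *args, limit: int = 5) -> str:
    """Run func(*args) under cProfile; return the top `limit` rows by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_top(slow_serialize, 50_000))
```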
