How FastAPI Async Request Handling Works with Starlette (2026)

FastAPI’s asynchronous request handling can look like magic, but it comes down to cooperation between Python’s async/await keywords, the asyncio event loop run by the ASGI server (such as Uvicorn), and Starlette, the ASGI toolkit FastAPI is built on, which routes each request to your handler.

Let’s see this in action. Imagine a simple FastAPI app:

from fastapi import FastAPI
import asyncio

app = FastAPI()

@app.get("/slow_task/{item_id}")
async def read_item(item_id: int, q: str | None = None):
    await asyncio.sleep(5)  # Simulate a 5-second I/O bound operation
    return {"item_id": item_id, "q": q}

@app.get("/fast_task/")
async def read_root():
    return {"message": "This is fast!"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Save this as main.py and run it with uvicorn main:app --reload. Then open http://localhost:8000/slow_task/1?q=test in one browser tab and http://localhost:8000/fast_task/ in another, and something interesting happens. The /fast_task/ request doesn’t wait for the 5 seconds to pass; it returns immediately. This is because await asyncio.sleep(5) in read_item doesn’t block the entire server. Instead, it tells the event loop, "I’m going to be busy doing nothing for 5 seconds. Wake me up when it’s done, and in the meantime, feel free to run other tasks."
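To reproduce the effect without a server or a browser, here is a minimal sketch in plain asyncio. It assumes two hypothetical stand-in coroutines, slow_handler and fast_handler, in place of the real endpoints, shortens the wait to 0.5 seconds, and uses asyncio.gather to play the role of the server handling both requests at once.

```python
import asyncio
import time
from collections.abc import Awaitable


async def slow_handler(item_id: int) -> dict:
    # Hypothetical stand-in for read_item: a shortened simulated I/O wait
    await asyncio.sleep(0.5)
    return {"item_id": item_id}


async def fast_handler() -> dict:
    # Hypothetical stand-in for read_root: nothing to wait for
    return {"message": "This is fast!"}


async def simulate_two_requests() -> list[tuple[str, float]]:
    start = time.perf_counter()
    finished: list[tuple[str, float]] = []

    async def track(name: str, handler: Awaitable[dict]) -> None:
        await handler
        finished.append((name, time.perf_counter() - start))

    # The slow "request" is started first, like the first browser tab
    await asyncio.gather(track("slow", slow_handler(1)), track("fast", fast_handler()))
    return finished


for name, elapsed in asyncio.run(simulate_two_requests()):
    print(f"{name} finished after {elapsed:.2f}s")
```

Even though the slow request starts first, the fast one completes almost instantly, because the slow one is suspended on its await rather than holding the loop.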

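FastAPI is built on Starlette, and the resulting application is an ASGI app that an ASGI server such as Uvicorn runs. ASGI (Asynchronous Server Gateway Interface) is the Python standard interface between asynchronous web servers and applications. When a request comes in, the ASGI server, which runs the event loop, receives it and calls the application, which routes it to your handler. If the handler is defined with async def, FastAPI awaits the coroutine directly on the event loop. If it is defined with plain def, FastAPI runs it in a worker thread pool instead, so that it cannot block the loop.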
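For a feel of what the ASGI server actually calls, here is a sketch of a bare ASGI application plus a hypothetical fake_server_call that stands in for Uvicorn. The app signature (scope, receive, send) and the http.response.start / http.response.body messages follow the ASGI HTTP spec; the scope dictionary is a hand-built minimal subset, not everything a real server sends. A FastAPI or Starlette app is an ASGI callable with this same signature.

```python
import asyncio


async def app(scope: dict, receive, send) -> None:
    # Minimal ASGI application: called once per HTTP request
    if scope["type"] != "http":
        return  # this sketch ignores other scope types, such as "lifespan"
    body = f"path={scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def fake_server_call(path: str) -> list[dict]:
    # Hypothetical stand-in for the ASGI server: build a scope, supply receive/send
    sent: list[dict] = []

    async def receive() -> dict:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: dict) -> None:
        sent.append(message)  # a real server would write these to the socket

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    await app(scope, receive, send)
    return sent


messages = asyncio.run(fake_server_call("/fast_task/"))
for message in messages:
    print(message)
```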

The core mechanism is the await, not asyncio.sleep() itself. In our read_item endpoint, await asyncio.sleep(5) suspends the coroutine and yields control back to the event loop. The event loop, which runs whichever tasks are ready, sees that the /fast_task/ request is ready to go and executes its handler. Once the 5 seconds are up, the sleep completes, and the event loop resumes the read_item coroutine from where it left off, allowing it to return the response. This is the essence of non-blocking I/O: while one request is waiting for an external resource (like a database or an external API, simulated here by asyncio.sleep), the server can handle other requests that are ready to run.
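To see what "yielding control" means one level down, here is a sketch with a hypothetical helper, simple_sleep. It is similar in spirit to how CPython implements asyncio.sleep for positive delays: create a pending Future, ask the loop to resolve it later with call_later, and await it. While the waiter is suspended on that Future, the other task gets its turn.

```python
import asyncio


async def simple_sleep(delay: float, result=None):
    # Hypothetical, simplified sleep: park this coroutine on a pending Future
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # Ask the loop to resolve the Future once `delay` seconds have passed
    handle = loop.call_later(delay, future.set_result, result)
    try:
        return await future  # suspends here; the loop runs other ready tasks
    finally:
        handle.cancel()  # avoid resolving the Future later if we were cancelled


async def main() -> list[str]:
    events: list[str] = []

    async def waiter() -> None:
        events.append("waiter: suspending")
        value = await simple_sleep(0.1, "woke up")
        events.append(f"waiter: {value}")

    async def other() -> None:
        events.append("other: ran while waiter was suspended")

    await asyncio.gather(waiter(), other())
    return events


events = asyncio.run(main())
print(events)
# ['waiter: suspending', 'other: ran while waiter was suspended', 'waiter: woke up']
```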

In Python, async def defines a coroutine function; calling it returns a coroutine object, which can pause its execution and resume later. The await keyword, used inside a coroutine, waits for another awaitable (such as a coroutine, a Task, or a Future) to complete. If the awaited operation actually has to wait, for example on a timer or a network socket, the coroutine suspends and control returns to the event loop, which switches to another ready task. Awaiting something that finishes without waiting does not hand control to the loop at all.
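A small sketch of two details from the paragraph above: calling a coroutine function only creates a coroutine object, and awaiting a coroutine that never waits on anything does not give other tasks a turn. The function names are illustrative only.

```python
import asyncio
import inspect

trace: list[str] = []


async def no_suspend() -> int:
    # Awaits nothing pending, so it never hands control to the loop
    trace.append("inner")
    return 1


async def worker_a() -> None:
    trace.append("A start")
    coro = no_suspend()  # calling it only creates a coroutine object
    assert inspect.iscoroutine(coro)
    trace.append("A created coroutine")  # "inner" has not run yet
    await coro  # runs it to completion without yielding to the loop
    trace.append("A end")


async def worker_b() -> None:
    trace.append("B")


async def main() -> None:
    await asyncio.gather(worker_a(), worker_b())


asyncio.run(main())
print(trace)  # ['A start', 'A created coroutine', 'inner', 'A end', 'B']
```

Task B only runs after A has finished, because nothing A awaited ever suspended.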

The mental model you need is one of a single thread managing many concurrent tasks. The event loop is the manager. When a task says "I’m going to wait," the manager doesn’t just sit there; it immediately looks for another task that’s ready to work. This pays off for I/O-bound workloads, where a request spends most of its time waiting on the network or disk, because no thread sits idle doing nothing but waiting.
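The manager-and-tasks picture can be observed directly. In this sketch, two hypothetical worker tasks each hand control back with await asyncio.sleep(0), which yields to the loop for one iteration. The loop alternates between them, and every step runs on the same thread.

```python
import asyncio
import threading


async def worker(name: str, steps: int, log: list[tuple[str, int]]) -> None:
    for i in range(steps):
        log.append((f"{name}{i}", threading.get_ident()))
        await asyncio.sleep(0)  # "I'm going to wait": hand control back to the loop


async def main() -> list[tuple[str, int]]:
    log: list[tuple[str, int]] = []
    await asyncio.gather(worker("A", 3, log), worker("B", 3, log))
    return log


log = asyncio.run(main())
print([step for step, _ in log])  # ['A0', 'B0', 'A1', 'B1', 'A2', 'B2']
print(len({tid for _, tid in log}))  # 1: every step ran on the same thread
```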

If you define a route handler with async def, you should also await any I/O inside it using async-capable libraries. If you call a blocking function (one that performs I/O or heavy computation without awaiting) directly inside an async def handler, it blocks the event loop, and every other request stalls until it returns, negating the benefits of asynchronous handling. For blocking I/O that can’t be made asynchronous, run it in a thread pool with asyncio.to_thread() (Python 3.9+) or loop.run_in_executor(), or declare the handler with plain def so that FastAPI runs it in its thread pool. In the standard CPython build, the GIL means threads do little for CPU-heavy work; passing a ProcessPoolExecutor to loop.run_in_executor() is the usual choice there.
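One way to see the difference is to run a heartbeat task that ticks every 10 ms while a blocking call runs, once directly on the loop and once through asyncio.to_thread(). The names blocking_io, heartbeat, and count_ticks are hypothetical, and time.sleep stands in for any blocking library call. Exact tick counts depend on the machine, but the inline call should leave the heartbeat almost completely starved.

```python
import asyncio
import time


def blocking_io(seconds: float) -> str:
    # Stand-in for a blocking library call: holds its thread the whole time
    time.sleep(seconds)
    return "done"


async def heartbeat(ticks: list[float], stop: asyncio.Event) -> None:
    # Records a tick every ~10 ms, but only while the event loop is free to run it
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def count_ticks(offload: bool) -> int:
    ticks: list[float] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat task start
    if offload:
        await asyncio.to_thread(blocking_io, 0.3)  # runs in a worker thread
    else:
        blocking_io(0.3)  # runs on the event loop thread and freezes it
    stop.set()
    await hb
    return len(ticks)


blocked = asyncio.run(count_ticks(offload=False))
offloaded = asyncio.run(count_ticks(offload=True))
print("blocking call inside async def:", blocked)
print("offloaded with asyncio.to_thread:", offloaded)
```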

The efficiency of this system comes from how it handles concurrency. Instead of dedicating a thread to every concurrent request, which costs memory for each thread's stack and time for context switches, a single thread with an event loop can manage thousands of concurrent I/O-bound operations by switching to another task whenever one is waiting.
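As a rough illustration of the scale claim, this sketch starts 1,000 simulated requests that each wait 0.1 seconds on one event loop. Done one after another they would take about 100 seconds; run concurrently, they should finish in roughly the time of a single wait, all on one thread. fake_io_request is a hypothetical stand-in for real I/O, and timings will vary by machine.

```python
import asyncio
import threading
import time


async def fake_io_request() -> int:
    # Hypothetical request handler: 100 ms of simulated I/O wait
    await asyncio.sleep(0.1)
    return threading.get_ident()  # which thread ran this coroutine


async def run_many(n: int) -> tuple[float, set[int]]:
    start = time.perf_counter()
    thread_ids = await asyncio.gather(*(fake_io_request() for _ in range(n)))
    return time.perf_counter() - start, set(thread_ids)


elapsed, threads = asyncio.run(run_many(1000))
print(f"1000 waits of 0.1s took {elapsed:.2f}s on {len(threads)} thread(s)")
```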

The next thing you’ll run into is understanding how to properly manage dependencies and background tasks that need to run even after the request has finished.