FastAPI can stream files to the client in small chunks without ever loading the whole file into memory, which is a surprisingly efficient way to serve large assets.

Let’s see it in action. Imagine you have a large video file, say big_video.mp4, sitting in your application’s static directory.

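from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
import os

app = FastAPI()

@app.get("/stream-video/{filename}")
async def stream_video(filename: str):
    # Assuming your video is in a 'static' folder; basename() keeps
    # names like "../secret" from escaping that folder
    file_path = os.path.join("static", os.path.basename(filename))
    if not os.path.isfile(file_path):
        # HTTPException sets a real 404 status; returning a (dict, 404) tuple would not
        raise HTTPException(status_code=404, detail="File not found")

    def file_iterator(path: str):
        # A plain (sync) generator: Starlette iterates it in a threadpool,
        # so the blocking open()/read() calls don't stall the event loop
        with open(path, "rb") as f:
            while chunk := f.read(8192):  # Read in 8KB chunks
                yield chunk

    return StreamingResponse(file_iterator(file_path), media_type="video/mp4")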

# To run this:
# 1. Save the code as main.py
# 2. Create a 'static' directory in the same folder.
# 3. Place a video file named 'big_video.mp4' inside the 'static' directory.
# 4. Run with: uvicorn main:app --reload
# 5. Access in your browser or with curl: http://127.0.0.1:8000/stream-video/big_video.mp4
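The endpoint accepts any filename but always labels the response video/mp4. Here is a minimal sketch of deriving the type from the file extension instead, using the standard library’s mimetypes module. guess_media_type is a hypothetical helper name, and falling back to application/octet-stream for unknown extensions is an assumption about what you’d want:

```python
import mimetypes


def guess_media_type(filename: str, default: str = "application/octet-stream") -> str:
    """Return a Content-Type for filename, falling back to a generic binary type."""
    # guess_type() looks only at the extension; it returns (None, None) if unknown
    media_type, _encoding = mimetypes.guess_type(filename)
    return media_type or default


# Hypothetical use inside the endpoint:
#   return StreamingResponse(file_iterator(file_path),
#                            media_type=guess_media_type(filename))
print(guess_media_type("big_video.mp4"))
```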

This StreamingResponse is the core mechanism. Instead of reading the entire file into memory and then sending it, the endpoint reads the file in small chunks (8KB in this example) and sends each chunk as soon as it has been read. Memory use per request therefore stays roughly constant no matter how large the file is, which keeps the server from running out of RAM when many clients download large files at once. The media_type parameter sets the Content-Type header, telling the browser what kind of content it’s receiving so it can start playing the video as soon as enough data has arrived (for MP4, this requires the file’s index, the moov atom, to sit at the start of the file).
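To see the chunking in isolation, here is a stdlib-only sketch outside FastAPI. iter_file and demo are hypothetical names, and a 1 MiB temporary file stands in for big_video.mp4. The loop only holds one chunk of at most 8 KB at a time, yet the chunks add up to the whole file:

```python
import os
import tempfile

CHUNK_SIZE = 8192  # same 8 KB chunk size as the endpoint


def iter_file(path: str, chunk_size: int = CHUNK_SIZE):
    """Yield a file's contents in chunks of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def demo() -> tuple[int, int, int]:
    # Write a 1 MiB throwaway file as a stand-in for a large video
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(1024 * 1024))
        path = tmp.name
    try:
        chunks = total = largest = 0
        for chunk in iter_file(path):
            chunks += 1
            total += len(chunk)
            largest = max(largest, len(chunk))
        return chunks, total, largest
    finally:
        os.remove(path)


print(demo())  # (128, 1048576, 8192)
```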

The problem this solves is serving large files efficiently. Naive application code might read the whole file into RAM (for example with a single f.read()) before sending it. For a 1GB video, that’s 1GB of RAM per concurrent download, which quickly becomes unsustainable. StreamingResponse sidesteps this by treating the file as a stream of bytes. file_iterator is a generator: StreamingResponse asks it for one chunk at a time, each yield hands over a chunk and suspends the generator, and the generator resumes only when the next chunk is requested. Whether other requests can be served while a chunk is being read depends on how the generator is written. open() and f.read() are ordinary blocking calls, so inside an async def generator they stall the event loop for the duration of every read. With a plain def generator, Starlette (the framework underneath FastAPI) runs each iteration in a threadpool, and the event loop stays free to handle other requests.
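Here is a stdlib analogue of that threadpool trick, assuming Python 3.9+ for asyncio.to_thread. It is not Starlette’s code (Starlette has its own iterate_in_threadpool helper built on anyio), and blocking_chunks, drain_in_threadpool and heartbeat are hypothetical names; time.sleep stands in for a slow disk read. The heartbeat coroutine keeps ticking while the generator blocks, which would not happen if next() were called directly on the event loop:

```python
import asyncio
import time

_DONE = object()  # sentinel returned by next() once the generator is exhausted


def blocking_chunks(n: int, delay: float = 0.05):
    """A sync generator whose 'reads' block, like open()/f.read() on disk."""
    for i in range(n):
        time.sleep(delay)  # stands in for a blocking disk read
        yield f"chunk-{i}".encode()


async def drain_in_threadpool(gen) -> list[bytes]:
    """Pull each item from a sync generator in a worker thread."""
    out = []
    while True:
        item = await asyncio.to_thread(next, gen, _DONE)
        if item is _DONE:
            return out
        out.append(item)


async def heartbeat(ticks: list[int], stop: asyncio.Event) -> None:
    """Runs on the event loop; it only ticks if the loop isn't blocked."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0.01)


async def main() -> tuple[list[bytes], int]:
    ticks: list[int] = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    chunks = await drain_in_threadpool(blocking_chunks(5))
    stop.set()
    await hb
    return chunks, len(ticks)


chunks, tick_count = asyncio.run(main())
print(chunks)
print("heartbeat ticks while reading:", tick_count)
```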

The levers you actually control are the chunk size passed to f.read() and the media_type. Smaller chunks mean more reads, more loop iterations, and more send calls per file. Over HTTP/1.1, a StreamingResponse without a Content-Length is sent with chunked transfer encoding, so each chunk also carries a few bytes of framing; the HTTP headers themselves are sent only once per response. Larger chunks cut that per-chunk overhead but hold more memory per connection and delay the first byte slightly, since a whole chunk must be read before anything is sent. The media_type must accurately reflect the file type for the client to interpret it correctly.
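To put the framing cost in numbers, here is a back-of-the-envelope sketch. It assumes HTTP/1.1 chunked transfer encoding with one HTTP chunk per yielded piece and no trailers (HTTP/2 frames data differently). chunked_framing_overhead is a hypothetical helper:

```python
def chunk_frame_overhead(size: int) -> int:
    # "<size in hex>\r\n" before the data plus "\r\n" after it
    return len(format(size, "x")) + 4


def chunked_framing_overhead(total_bytes: int, chunk_size: int) -> int:
    """Framing bytes that chunked transfer encoding adds around the payload."""
    full, rest = divmod(total_bytes, chunk_size)
    overhead = full * chunk_frame_overhead(chunk_size)
    if rest:
        overhead += chunk_frame_overhead(rest)
    return overhead + len(b"0\r\n\r\n")  # terminating zero-length chunk


ONE_GIB = 2**30
for cs in (8192, 65536):
    reads = -(-ONE_GIB // cs)  # ceiling division: number of f.read() calls
    print(cs, reads, chunked_framing_overhead(ONE_GIB, cs))
# 8192 131072 1048581
# 65536 16384 147461
```

For a 1 GiB file that is about 1 MiB of framing at 8 KB chunks versus about 144 KiB at 64 KB, roughly 0.1% versus 0.014% of the payload. The bigger practical difference is the 8× change in the number of reads and send calls.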

What most people don’t realize is that StreamingResponse isn’t just for files. It accepts any iterator or async iterator that yields bytes (or str), including data generated on the fly, such as real-time logs, dynamic CSV reports, or compressed data streams. Note that the async keyword alone does not make file reading non-blocking: a regular open()/f.read() inside an async def generator still blocks the event loop while each disk read runs. To keep the server responsive during disk I/O, either use a plain def generator, which Starlette iterates in a threadpool, or use an async file library such as aiofiles.
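As an example of data generated on the fly, here is a sketch of a CSV report produced row by row with the standard csv module. iter_csv and the /report.csv route in the comment are hypothetical. Each yielded piece is one encoded CSV line, so as long as rows is itself lazy, the full report never exists in memory at once:

```python
import csv
import io
from typing import Iterable, Iterator


def iter_csv(header: list[str], rows: Iterable[list]) -> Iterator[bytes]:
    """Yield a CSV document one encoded line at a time."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default line terminator is "\r\n"

    def flush() -> bytes:
        # Take what the writer produced, then empty the buffer for the next row
        data = buffer.getvalue().encode("utf-8")
        buffer.seek(0)
        buffer.truncate(0)
        return data

    writer.writerow(header)
    yield flush()
    for row in rows:
        writer.writerow(row)
        yield flush()


# Hypothetical FastAPI usage:
#   @app.get("/report.csv")
#   def report():
#       rows = ([i, i * i] for i in range(1_000_000))
#       return StreamingResponse(iter_csv(["n", "square"], rows), media_type="text/csv")
for line in iter_csv(["n", "square"], ([i, i * i] for i in range(3))):
    print(line)
```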

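The next common hurdle is handling partial content requests (e.g., seeking within a video). That requires reading the client’s Range header, advertising Accept-Ranges: bytes, and answering with 206 Partial Content plus a Content-Range header that states which bytes are being sent.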
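A simplified sketch of that Range handling, under stated assumptions: it supports only a single bytes=start-end range (including the open-ended bytes=start- and suffix bytes=-N forms), and it returns None both for malformed and for unsatisfiable ranges, whereas a complete implementation would ignore the former and answer the latter with 416 Range Not Satisfiable. parse_single_range, content_range and iter_file_range are hypothetical names:

```python
import re
from typing import Iterator, Optional, Tuple

_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_single_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse one 'bytes=...' range into an inclusive (start, end) pair, or None."""
    m = _RANGE_RE.match(header.strip())
    if not m:
        return None  # malformed, or several comma-separated ranges
    start_s, end_s = m.groups()
    if not start_s and not end_s:
        return None
    if not start_s:
        # Suffix form "bytes=-N": the last N bytes of the file
        length = int(end_s)
        if length == 0:
            return None
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(start_s)
        # Open-ended "bytes=start-" runs to the end; clamp ends past EOF
        end = min(int(end_s), file_size - 1) if end_s else file_size - 1
    if start > end or start >= file_size:
        return None
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    return f"bytes {start}-{end}/{file_size}"


def iter_file_range(path: str, start: int, end: int, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield bytes start..end (inclusive) of a file in chunks."""
    with open(path, "rb") as f:
        f.seek(start)
        remaining = end - start + 1
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break
            remaining -= len(chunk)
            yield chunk


print(parse_single_range("bytes=0-499", 10_000))
```

Inside the endpoint, a satisfiable range could then be served with something like StreamingResponse(iter_file_range(file_path, start, end), status_code=206, media_type="video/mp4", headers={"Content-Range": content_range(start, end, size), "Accept-Ranges": "bytes", "Content-Length": str(end - start + 1)}), where size comes from os.path.getsize().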
