Stream Large Responses from Flask with Generator Functions (2026)

Flask’s Response object accepts any iterable as its body, so you can stream data to the client without holding the entire response in memory by passing it a Python generator.

Let’s see it in action. Imagine you have a very large dataset, or you’re fetching data from an external API that returns a lot of information. You don’t want to load all of that into your Flask application’s memory before sending it to the client. Instead, you can yield chunks of data as they become available.

Here’s a basic Flask app demonstrating this:

from flask import Flask, Response, stream_with_context
import time

app = Flask(__name__)

def generate_large_data():
    for i in range(10):
        yield f"Data chunk {i}\n"
        time.sleep(0.5) # Simulate data fetching delay

@app.route('/stream')
def stream_data():
    return Response(stream_with_context(generate_large_data()), mimetype='text/plain')

if __name__ == '__main__':
    app.run(debug=True)

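When you request /stream with curl -N (the -N flag turns off curl’s output buffering), you won’t get the whole output at once. Instead, you’ll see "Data chunk 0" appear, then after half a second, "Data chunk 1", and so on, until all ten chunks are delivered. This demonstrates that Flask is sending data incrementally. Browsers may buffer a small text/plain response before rendering it, so the effect is easiest to observe from the command line.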

The key is that the Response object accepts any iterable as its body, including the generator object returned by calling generate_large_data(). The WSGI server iterates over it, and each yielded item is sent as a piece of the HTTP response body. The stream_with_context wrapper keeps the request context (which backs request and session) active while the generator runs. By default Flask tears that context down once the response object has been handed back to the WSGI server, which is before the body is iterated, so a generator that touches request or session without the wrapper would fail. The example generator above never touches either, so the wrapper is optional there, but it is a sensible default.
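To make the role of stream_with_context concrete, here is a minimal sketch of a generator that reads from request while streaming. The /greet route and the name query parameter are hypothetical, chosen only for illustration.

```python
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)


@app.route("/greet")  # hypothetical route, for illustration only
def greet():
    # The decorator keeps the request context alive while the body is iterated.
    @stream_with_context
    def generate():
        yield "Hello "
        # This line runs during iteration, after the view has already returned.
        yield request.args.get("name", "world")
        yield "!"

    return Response(generate(), mimetype="text/plain")
```

Requesting /greet?name=Ada should stream the three pieces in order. Removing the decorator is expected to make the second yield fail, because request is no longer bound by the time the body is consumed.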

The core problem this solves is memory exhaustion for large responses. If you were to build a string or list containing all the data before returning it, a massive dataset could easily crash your server. Streaming allows you to serve data as it’s produced, keeping memory usage low and improving perceived performance for the client, as they start receiving data sooner.
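As one illustration of keeping memory flat, the sketch below turns an iterable of rows into CSV text one line at a time, reusing a single small buffer. The iter_csv helper and its header argument are assumptions made for this example, not part of Flask; the resulting generator could be passed to Response with mimetype='text/csv'.

```python
import csv
import io
import itertools


def iter_csv(rows, header=("id", "name")):
    """Yield CSV text one line at a time instead of building the whole file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # chain() avoids copying `rows` into a list, so `rows` may itself be lazy.
    for row in itertools.chain([header], rows):
        writer.writerow(row)
        yield buffer.getvalue()
        # Reset the buffer so it never holds more than one line.
        buffer.seek(0)
        buffer.truncate(0)


if __name__ == "__main__":
    for line in iter_csv([(1, "a"), (2, "b")]):
        print(repr(line))
```

Only one row is held in the buffer at any moment, so peak memory depends on the widest row rather than on the number of rows.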

Internally, the Response object keeps your generator as its body iterable and does not consume it up front. Once the view returns, the WSGI server begins iterating over the response. Each string you yield is encoded to bytes and becomes a piece of the response body. The mimetype sets the Content-Type header so the client knows how to interpret the data.
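The iteration step can be sketched without Flask at all. The following is a simplified stand-in for what a WSGI server does with a streamed body, not the code of any real server; wsgi_app and drive are hypothetical names, and a real server would write each chunk to the socket rather than collect it in a list.

```python
def generate_large_data():
    for i in range(3):
        yield f"Data chunk {i}\n"


def wsgi_app(environ, start_response):
    """A bare WSGI application whose body is a lazy generator of bytes."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return (chunk.encode("utf-8") for chunk in generate_large_data())


def drive(app):
    """Play the server's role: call the app, then pull the body piece by piece."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = app({"REQUEST_METHOD": "GET", "PATH_INFO": "/stream"}, start_response)
    sent = []
    try:
        for chunk in body:
            sent.append(chunk)  # a real server would write this to the socket
    finally:
        # WSGI requires the server to call close() on the body if it exists.
        close = getattr(body, "close", None)
        if close is not None:
            close()
    return captured["status"], sent


if __name__ == "__main__":
    print(drive(wsgi_app))
```

The status and headers are fixed before the first chunk is produced, which is why a streaming response cannot change its status code halfway through.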

The time.sleep(0.5) in the example stands in for real-world delays, such as database queries, external API calls, or expensive computations, where data is not immediately available. The generator pattern lets each chunk go out as soon as it is ready rather than waiting for the slowest one. Note that a streaming request still occupies a worker thread or process for its whole duration, so it is the response that is incremental, not the server that is non-blocking.

One common pitfall is forgetting stream_with_context. If your generator reads request-specific data (such as authentication headers, query parameters, or session values) without it, Flask raises a RuntimeError along the lines of "Working outside of request context". The error appears once iteration starts, not when the view returns, because the generator body runs only after the request context has been popped. stream_with_context keeps that context alive until the generator is exhausted or closed, making these resources available.
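An alternative that sidesteps the pitfall is to read everything you need from request inside the view, while the context is still active, and pass plain values into the generator. The sketch below assumes a hypothetical /count route with a limit query parameter; count_up is likewise an illustrative name.

```python
from flask import Flask, Response, request

app = Flask(__name__)


def count_up(limit):
    # Uses only its argument, so it needs no Flask context while it runs.
    for i in range(limit):
        yield f"{i}\n"


@app.route("/count")  # hypothetical route, for illustration only
def count():
    # Read request data here, before the view returns.
    limit = request.args.get("limit", default=5, type=int)
    return Response(count_up(limit), mimetype="text/plain")
```

This keeps the generator easy to test in isolation. stream_with_context remains the better fit when the generator calls helpers that themselves depend on request or session.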

Another detail is how the body is framed on the wire. Because the total length is not known up front, no Content-Length header is sent; on HTTP/1.1 the WSGI server typically uses Transfer-Encoding: chunked instead, and on HTTP/1.0 it signals the end of the body by closing the connection. Flask does not set this header itself; the server in front of it does. Every HTTP/1.1 client is required to understand chunked encoding, so you would only run into problems with a very old or specialized client.
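For readers curious what chunked framing looks like, here is a rough illustration of the HTTP/1.1 wire format: each piece is prefixed with its length in hexadecimal, and a zero-length chunk ends the body. The frame_chunked function is a hypothetical helper written for this article; it is not what Flask or any particular server runs, and it omits optional features such as trailers.

```python
def frame_chunked(chunks):
    """Wrap an iterable of byte strings in HTTP/1.1 chunked framing."""
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would be read as the end of the body.
            continue
        size_line = f"{len(chunk):X}\r\n".encode("ascii")
        yield size_line + chunk + b"\r\n"
    # Terminating chunk: size zero followed by an empty trailer section.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    for piece in frame_chunked([b"Data chunk 0\n", b"Data chunk 1\n"]):
        print(piece)
```

You never need to produce this framing yourself when streaming from Flask; the sketch only shows why the client can tell where each piece ends without knowing the total length in advance.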

The next hurdle you’ll likely face is handling errors gracefully within the generator itself.