Write WSGI Middleware for Flask Applications (2026)

WSGI middleware lets you wrap your Flask application in other WSGI applications, effectively adding pre- or post-processing steps to every request and response.

Here’s a basic Flask app:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == '__main__':
    app.run(debug=True)

Now, let’s create a simple middleware that logs the request method and path and times the call to the downstream application. This middleware is itself a WSGI application.

import time

class LoggingMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        start_time = time.time()
        method = environ.get('REQUEST_METHOD', 'UNKNOWN')
        path = environ.get('PATH_INFO', '/')

        print(f"Incoming request: {method} {path}")

        # Call the downstream WSGI application
        response_body = self.app(environ, start_response)

        end_time = time.time()
        duration = (end_time - start_time) * 1000  # milliseconds
        print(f"Request processed in {duration:.2f}ms")

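        # Yield the response body, then close the downstream iterable
        # (PEP 3333 requires middleware to forward close())
        try:
            for data in response_body:
                yield data
        finally:
            if hasattr(response_body, 'close'):
                response_body.close()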

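To use this middleware, wrap your Flask app instance. The complete module below also moves the duration logging into a finally block, so the timing covers the whole response body and is printed even if iteration fails: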

from flask import Flask
import time

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

class LoggingMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        start_time = time.time()
        method = environ.get('REQUEST_METHOD', 'UNKNOWN')
        path = environ.get('PATH_INFO', '/')

        print(f"Incoming request: {method} {path}")

        # Call the downstream WSGI application
        # This returns an iterable (the response body)
        response_iterable = self.app(environ, start_response)

        # Pass each chunk through as it is produced so streaming responses
        # still work. Logging in `finally` means the timing covers the whole
        # body and is printed even if the downstream iterable raises.
        try:
            for data in response_iterable:
                yield data
        finally:
            # PEP 3333: middleware must call close() on the downstream iterable
            if hasattr(response_iterable, 'close'):
                response_iterable.close()
            end_time = time.time()
            duration = (end_time - start_time) * 1000  # milliseconds
            print(f"Request processed in {duration:.2f}ms")

# Wrap the Flask app with the middleware
application = LoggingMiddleware(app)

if __name__ == '__main__':
    # `app.run()` starts Flask's development server with the bare `app` object,
    # so the `application` wrapper above is never called.
    # To exercise the middleware, serve `application` with a WSGI server such as Gunicorn.
    print("To run this with middleware, use a WSGI server like Gunicorn:")
    print("gunicorn -w 4 my_app:application")
    # For quick local testing (not for production), wrap `app.wsgi_app` instead
    # so that app.run() also goes through the middleware:
    # app.wsgi_app = LoggingMiddleware(app.wsgi_app)
    # app.run(debug=True)

When deployed with a WSGI server like Gunicorn, you’d typically point the server at your middleware-wrapped application object. For example, if your code is in my_app.py and the wrapped app is named application, you’d run:

gunicorn -w 4 my_app:application

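The application variable in my_app.py is an instance of LoggingMiddleware that wraps the original Flask app. When Gunicorn receives a request, it calls application(environ, start_response), which invokes LoggingMiddleware.__call__. Because __call__ contains yield, that call only creates a generator; its body runs once Gunicorn starts iterating over the result. At that point the middleware logs the request details, calls the original Flask app (self.app), yields each chunk of the response body, and logs the duration once the body is exhausted.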
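The deferred execution of a generator-based __call__ is easy to miss, so here is a minimal sketch of it. TimingMiddleware and slow_streaming_app are hypothetical stand-ins (a condensed LoggingMiddleware that appends to a list instead of printing, and a tiny streaming app), not part of Flask or Gunicorn.

```python
import time


class TimingMiddleware:
    """Hypothetical, condensed variant of LoggingMiddleware that records
    messages in a list instead of printing them."""

    def __init__(self, app, log):
        self.app = app
        self.log = log

    def __call__(self, environ, start_response):
        start = time.perf_counter()
        self.log.append("request received")
        result = self.app(environ, start_response)
        try:
            for chunk in result:
                yield chunk
        finally:
            # Forward close() to the downstream iterable, then log the timing.
            if hasattr(result, "close"):
                result.close()
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.log.append(f"done in {elapsed_ms:.2f}ms")


def slow_streaming_app(environ, start_response):
    """Stand-in WSGI app that produces its body in three slow chunks."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    for part in (b"a", b"b", b"c"):
        time.sleep(0.01)  # simulate work done while streaming
        yield part


log = []
body_iter = TimingMiddleware(slow_streaming_app, log)({}, lambda *args: None)
log_before = list(log)      # still empty: the generator body has not started
body = b"".join(body_iter)  # iterating is what drives the whole pipeline
print(log_before, body, log[0])
```

The list is empty right after the call and only fills in once the body is consumed, which is why the duration is logged in finally rather than immediately after self.app(...) returns.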

The key to WSGI middleware is understanding the __call__ method. It must accept environ (a dictionary containing request information) and start_response (a callable provided by the WSGI server), and it must return an iterable of byte strings that make up the response body. The middleware’s __call__ method is responsible for either directly generating a response or calling the next WSGI application in the chain (which could be another middleware or the final application) and then processing its response.
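To make the calling convention concrete, the sketch below plays the role of the WSGI server by hand. simple_app and call_wsgi are hypothetical names used only for illustration; setup_testing_defaults comes from the standard library's wsgiref.util module.

```python
from wsgiref.util import setup_testing_defaults


def simple_app(environ, start_response):
    """A bare WSGI application (hypothetical stand-in for a Flask app)."""
    body = f"{environ['REQUEST_METHOD']} {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_wsgi(app, path="/", method="GET"):
    """Act as the server: build environ, supply start_response, collect the body.

    Returns a (status, headers, body_bytes) tuple.
    """
    environ = {"REQUEST_METHOD": method, "SCRIPT_NAME": "", "PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = list(headers)

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        # Servers call close() on the returned iterable if it has one.
        if hasattr(result, "close"):
            result.close()
    return captured["status"], captured["headers"], body


status, headers, body = call_wsgi(simple_app, path="/hello")
print(status, body)  # 200 OK b'GET /hello'
```

Any middleware can be passed to call_wsgi in place of simple_app, since it exposes the same interface.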

A common pattern is to have middleware that modifies the response after the downstream application has run. For instance, to add a custom header:

class CustomHeaderMiddleware:
    def __init__(self, app, header_name, header_value):
        self.app = app
        self.header_name = header_name
        self.header_value = header_value

    def __call__(self, environ, start_response):
        # The downstream application calls this wrapper in place of the
        # server's start_response.
        def custom_start_response(status, headers, exc_info=None):
            # Copy the list so the caller's headers are not mutated, then add ours
            headers = list(headers) + [(self.header_name, self.header_value)]
            return start_response(status, headers, exc_info)

        # Call the downstream WSGI application with our wrapped start_response
        # and return its iterable unchanged, so the server can still call
        # close() on it.
        return self.app(environ, custom_start_response)

# To use this, you'd chain it:
# application = CustomHeaderMiddleware(LoggingMiddleware(app), 'X-Powered-By', 'MyAwesomeApp')

In CustomHeaderMiddleware, we don’t directly call start_response. Instead, we create a new callable, custom_start_response, which internally calls the original start_response but injects our custom header into the headers list. This allows us to intercept and modify the response headers before they are sent to the client.

The power of WSGI middleware lies in its composability. You can stack multiple middleware components, each performing a specific task (authentication, logging, compression, request modification, response modification), to build a request processing pipeline without cluttering your core application logic. Because each middleware component depends only on the WSGI interface, it can be reused across different frameworks and applications.
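The order in which stacked middleware runs can be sketched as follows. tagging_middleware, core_app, and the "demo.trace" environ key are hypothetical names invented for this illustration.

```python
def tagging_middleware(name):
    """Hypothetical factory: returns a middleware that records when it runs."""
    def wrap(app):
        def middleware(environ, start_response):
            environ.setdefault("demo.trace", []).append(f"{name}:request")

            def tagged_start_response(status, headers, exc_info=None):
                environ["demo.trace"].append(f"{name}:response")
                return start_response(status, headers, exc_info)

            return app(environ, tagged_start_response)
        return middleware
    return wrap


def core_app(environ, start_response):
    """Stand-in for the innermost application."""
    environ.setdefault("demo.trace", []).append("app")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# The last wrapper applied is the outermost one, so it sees the request first
# and the response headers last.
application = tagging_middleware("outer")(tagging_middleware("inner")(core_app))

environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
body = b"".join(application(environ, lambda status, headers, exc_info=None: None))
print(environ["demo.trace"])
# ['outer:request', 'inner:request', 'app', 'inner:response', 'outer:response']
```

Requests travel from the outermost wrapper inward, and response headers travel back out in the reverse order.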

What most people don’t realize is that Flask’s app object is a WSGI application. When you run app.run(), Flask uses its built-in development server, which is WSGI-compliant. However, this development server is not intended for production. In production, you’ll use a dedicated WSGI server like Gunicorn, uWSGI, or Waitress, which will correctly call the __call__ method of your outermost WSGI application (which would be your final middleware if you’re using any).
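Since Flask's __call__ simply delegates to app.wsgi_app, a common way to attach middleware is to wrap that attribute instead of the app object itself. The sketch below assumes Flask is installed; PathRecorder is a hypothetical middleware written for this example.

```python
from flask import Flask


class PathRecorder:
    """Hypothetical middleware that records each requested path."""

    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app
        self.seen = []

    def __call__(self, environ, start_response):
        self.seen.append(environ.get("PATH_INFO", "/"))
        return self.wsgi_app(environ, start_response)


app = Flask(__name__)


@app.route("/")
def hello_world():
    return "Hello, World!"


# Replacing app.wsgi_app keeps `app` a Flask object, so app.run(),
# app.test_client(), and extensions keep working and all go through
# the middleware.
recorder = PathRecorder(app.wsgi_app)
app.wsgi_app = recorder

response = app.test_client().get("/")
print(response.status_code, recorder.seen)  # 200 ['/']
```

With this arrangement a WSGI server can be pointed at app directly, and the development server goes through the middleware as well.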

The next concept you’ll likely encounter is creating more complex middleware that needs to interact with the response body, such as for compression or HTML modification, which often involves buffering the response.
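As a preview, here is a rough sketch of what buffering looks like, assuming gzip compression with the standard library's gzip module. BufferingGzipMiddleware and demo_app are hypothetical names, and the sketch skips details a production implementation would need (streaming responses, responses that are already encoded, minimum-size thresholds).

```python
import gzip


class BufferingGzipMiddleware:
    """Sketch: buffer the whole downstream body, then gzip it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Pass through untouched if the client did not ask for gzip.
        if "gzip" not in environ.get("HTTP_ACCEPT_ENCODING", ""):
            return self.app(environ, start_response)

        captured = {}
        chunks = []

        def deferred_start_response(status, headers, exc_info=None):
            # Hold status and headers back until the compressed size is known.
            captured["status"] = status
            captured["headers"] = list(headers)
            captured["exc_info"] = exc_info
            return chunks.append  # legacy write() callable feeds the buffer too

        result = self.app(environ, deferred_start_response)
        try:
            for chunk in result:
                chunks.append(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()

        compressed = gzip.compress(b"".join(chunks))
        # The original Content-Length no longer matches, so replace it.
        headers = [(k, v) for k, v in captured["headers"]
                   if k.lower() != "content-length"]
        headers += [("Content-Encoding", "gzip"),
                    ("Content-Length", str(len(compressed))),
                    ("Vary", "Accept-Encoding")]
        start_response(captured["status"], headers, captured["exc_info"])
        return [compressed]


def demo_app(environ, start_response):
    """Stand-in application with a repetitive, compressible body."""
    body = b"hello " * 100
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


sent = {}


def fake_start_response(status, headers, exc_info=None):
    sent["status"], sent["headers"] = status, dict(headers)


environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/",
           "HTTP_ACCEPT_ENCODING": "gzip"}
body = b"".join(BufferingGzipMiddleware(demo_app)(environ, fake_start_response))
print(sent["headers"]["Content-Encoding"], gzip.decompress(body) == b"hello " * 100)
```

The trade-off is that the whole response sits in memory before the first byte is sent, which is why buffering middleware is usually limited to small or bounded responses.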