Profile Flask Application Performance with Werkzeug and py-spy (2026)

Werkzeug’s debugger is your first line of defense when a Flask application raises an exception, but it tells you nothing about where the time goes. For performance tuning, you need a profiler.

Let’s start with an example. Imagine a Flask app with a route that simulates some heavy computation:

from flask import Flask
import time

app = Flask(__name__)

@app.route('/slow')
def slow_route():
    time.sleep(2)  # Simulate two seconds of blocking work
    return "Done!"

if __name__ == '__main__':
    app.run(debug=True, port=5000)

When you hit http://127.0.0.1:5000/slow in your browser, it’ll take two seconds to respond. But why two seconds? Is it time.sleep? Is it something else within the Flask request lifecycle?
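Before reaching for a profiler, a quick first check is to time the view itself. The following is a minimal sketch using only the standard library; the decorator name timed and the stand-in function slow_view are hypothetical, not part of Flask or of the example above.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes, using a monotonic clock."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            wrapper.last_elapsed = elapsed  # kept so callers can inspect it
            print(f"{func.__name__} took {elapsed:.3f}s")

    wrapper.last_elapsed = None
    return wrapper


@timed
def slow_view():
    # Stand-in for the body of slow_route; a shorter sleep keeps the demo quick.
    time.sleep(0.05)
    return "Done!"


if __name__ == "__main__":
    slow_view()
```

In a Flask app the decorator would sit between @app.route and the function definition, so the route registers the wrapped function. This tells you how long the view took, but not which call inside it was responsible, which is the gap a profiler fills.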

This is where py-spy comes in. It’s a sampling profiler that can attach to a running Python process without modifying its code.

First, install py-spy:

pip install py-spy

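Now, in a separate terminal, find your Flask application’s process ID (PID). The example above starts the server through app.run, so it is launched as a plain Python script rather than through flask run. Assuming the file is saved as app.py (the filename is an assumption), the search might look something like this:

ps aux | grep "python app.py"

Note that with debug=True the Werkzeug reloader runs two processes: a parent that watches for file changes and a child that serves requests. Either attach to the child’s PID or pass --subprocesses to py-spy so that the child is profiled as well.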

Let’s say the PID is 12345. You can then run py-spy to record what your application is doing. On macOS, and on Linux systems that restrict ptrace, you may need to prefix the command with sudo:

py-spy record -o profile.svg --pid 12345

This command tells py-spy to record for a while (it will keep running until you press Ctrl+C) and output the results as an SVG file named profile.svg. Open this SVG in your browser. You’ll see a flame graph.

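The flame graph aggregates the call stacks that py-spy sampled. The wider a bar, the more samples contained that function, and so the more time was spent in it. One caveat applies to this example: time.sleep blocks without using CPU, and py-spy skips idle threads by default, so the sleeping request may barely register. Re-run the recording with the --idle flag (py-spy record --idle -o profile.svg --pid 12345) to include it. Also, because time.sleep is implemented in C, the time is attributed to the slow_route frame at the line that calls time.sleep, unless you pass --native on a platform that supports it. Either way, the widest bar tells you where the bottleneck is.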
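To make the link between samples and bar width concrete, here is a small sketch that collapses stack samples into the "folded" text form that flame graph tools commonly consume: one line per distinct stack, followed by how often it was seen. The sample data and frame names are made up for illustration.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse root-first stack samples into 'root;...;leaf count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


# Hypothetical samples, each listed root-first, as if taken at fixed intervals.
samples = [
    ("wsgi_app", "dispatch_request", "slow_route"),
    ("wsgi_app", "dispatch_request", "slow_route"),
    ("wsgi_app", "dispatch_request", "slow_route"),
    ("wsgi_app", "dispatch_request", "fast_route"),
]

for line in fold_stacks(samples):
    print(line)
# wsgi_app;dispatch_request;fast_route 1
# wsgi_app;dispatch_request;slow_route 3
```

In a flame graph built from these lines, slow_route would be drawn three times as wide as fast_route, because it appeared in three times as many samples.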

To understand the full mental model, think about how Python executes code. When a request comes in, Flask, through Werkzeug, handles routing, calls your view function, and returns a response. Each step involves function calls. Sampling profiling with py-spy is like taking rapid-fire snapshots of the active call stack. By aggregating these snapshots, py-spy builds a picture of which functions consume the most time: CPU time by default, or wall-clock time when idle threads are included with --idle.
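The snapshot idea can be sketched in a few lines of Python. This toy sampler is only an illustration of the principle, not how py-spy is implemented: py-spy reads the target process’s memory from outside, whereas this sketch runs inside the process and uses the CPython-specific sys._current_frames(). The names sample_stacks and busy_work are hypothetical.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, duration=0.5, interval=0.01):
    """Snapshot one thread's call stack at a fixed interval."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        frame = sys._current_frames().get(thread_id)
        stack = []
        while frame is not None:  # walk from the leaf frame up to the root
            stack.append(frame.f_code.co_name)
            frame = frame.f_back
        if stack:
            samples.append(tuple(reversed(stack)))  # store root-first
        time.sleep(interval)
    return samples


def busy_work(stop):
    # CPU-bound loop standing in for a slow view function.
    total = 0
    while not stop.is_set():
        total += sum(i * i for i in range(1000))
    return total


stop = threading.Event()
worker = threading.Thread(target=busy_work, args=(stop,))
worker.start()
samples = sample_stacks(worker.ident)
stop.set()
worker.join()

# Count which function was executing (the leaf frame) in each snapshot.
leaf_counts = collections.Counter(stack[-1] for stack in samples)
print(leaf_counts.most_common(3))
```

Because the sampler only looks at the stack every few milliseconds, very short calls can be missed entirely. That is the trade-off of sampling: low overhead in exchange for a statistical picture.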

Recording is not the only mode. You can also use py-spy top to see a live, top-like view of your Python process, showing which functions are currently consuming the most time. This is great for interactive debugging.

py-spy top --pid 12345

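You’ll see output along these lines, updating in real time. Treat it as illustrative: the exact header and columns depend on your py-spy version, and real output also shows the file and line number for each function.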

Total Samples: 150
%Own Time  %Total Time  Function
---------- -----------  ------------------------------------------
100.00%    100.00%      time.sleep
 0.00%      0.00%      _thread.bootstrap_inner
 0.00%      0.00%      builtins.exec

This confirms that time.sleep is the dominant function. For more complex applications, you might see a breakdown of your own application’s functions, Flask’s internal functions, or even library calls, helping you pinpoint where to optimize.
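The two percentage columns are easy to confuse. As a rough sketch, "own" time counts samples where a function was the one executing, while "total" time counts samples where it appeared anywhere on the stack. The function below computes both from made-up samples; own_and_total and the frame names are hypothetical.

```python
from collections import Counter


def own_and_total(samples):
    """Return {function: (own_pct, total_pct)} from root-first stack samples."""
    if not samples:
        return {}
    own = Counter()
    total = Counter()
    for stack in samples:
        own[stack[-1]] += 1      # leaf frame: executing when sampled
        for name in set(stack):  # count each function once per sample
            total[name] += 1
    n = len(samples)
    return {name: (100 * own[name] / n, 100 * total[name] / n) for name in total}


# Hypothetical samples, each listed root-first.
samples = [
    ("handle_request", "slow_route", "parse_payload"),
    ("handle_request", "slow_route", "parse_payload"),
    ("handle_request", "slow_route"),
    ("handle_request", "fast_route"),
]

for name, (own_pct, total_pct) in sorted(own_and_total(samples).items()):
    print(f"{own_pct:6.2f}% {total_pct:6.2f}%  {name}")
```

A function with high total time but low own time, like handle_request here, is mostly waiting on its callees. The functions with high own time are where the work is actually done.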

The key levers you control are:

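The target process: You specify the PID with --pid.
The duration of the profiling: record runs until you press Ctrl+C, unless you pass --duration with a number of seconds. top runs continuously.
The output format: An SVG flame graph by default, another format such as speedscope via --format, or live terminal output from top.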
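If you run py-spy from a script, these levers map onto command-line flags. The helper below is a sketch that only assembles the argument list; py_spy_record_cmd is a hypothetical name, and it is worth confirming the flags against py-spy record --help for your installed version.

```python
import shlex


def py_spy_record_cmd(pid, output="profile.svg", duration=None,
                      fmt="flamegraph", idle=False, subprocesses=False):
    """Build the argument list for a py-spy record invocation."""
    cmd = ["py-spy", "record", "--pid", str(pid),
           "--output", output, "--format", fmt]
    if duration is not None:
        cmd += ["--duration", str(duration)]  # seconds; omitted means until Ctrl+C
    if idle:
        cmd.append("--idle")                  # include sleeping or blocked threads
    if subprocesses:
        cmd.append("--subprocesses")          # also profile child processes
    return cmd


cmd = py_spy_record_cmd(12345, duration=30, idle=True)
print(shlex.join(cmd))
# py-spy record --pid 12345 --output profile.svg --format flamegraph --duration 30 --idle
```

The list can be passed to subprocess.run(cmd, check=True) as-is. Building it as a list rather than a string avoids shell quoting problems with output paths.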

One thing that often surprises people is how much time goes to seemingly innocuous library calls or to Python’s own internals. You might expect your application code to dominate, but sometimes a poorly optimized dependency or an unexpected interaction is the culprit, and py-spy will reveal it. For instance, you might see significant time spent in json.loads if you’re deserializing large JSON payloads on every request.
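Once the profile points at a call like json.loads, it helps to measure that call in isolation before changing anything. The following is a minimal sketch; the payload shape and size are arbitrary assumptions, and time_call is a hypothetical helper.

```python
import json
import time

# Arbitrary synthetic payload: 50,000 small records serialized to one string.
payload = json.dumps(
    [{"id": i, "name": f"user-{i}", "tags": ["a", "b", "c"]} for i in range(50_000)]
)


def time_call(func, *args, repeat=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_call(json.loads, payload)
print(f"json.loads on {len(payload) / 1e6:.1f} MB took {elapsed * 1000:.1f} ms")
```

Taking the best of several runs reduces noise from other activity on the machine. The same helper can then time an alternative, such as parsing a smaller payload or a different library, so the comparison rests on measurements rather than guesses.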

After identifying bottlenecks with py-spy, the next step is often to dive into the specific code sections revealed and optimize them, or to investigate alternative libraries or algorithms.