Criterion is the de facto standard for benchmarking Rust code, offering much more than just timing your functions.
Let’s see it in action. Imagine you have a function that calculates Fibonacci numbers recursively. A naive implementation might look like this:
fn fibonacci_recursive(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}
To benchmark this with Criterion, you’d first add criterion to your Cargo.toml:
[dev-dependencies]
criterion = "0.5"
[[bench]]
name = "my_benchmark"
harness = false
Then, create a benches/my_benchmark.rs file:
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci_recursive(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

fn benchmark_fibonacci(c: &mut Criterion) {
    c.bench_function("fibonacci_recursive 20", |b| {
        b.iter(|| fibonacci_recursive(black_box(20)))
    });
}

criterion_group!(benches, benchmark_fibonacci);
criterion_main!(benches);
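Running cargo bench will now compile and execute this benchmark. Criterion warms up, runs your function many times, collects statistics, and prints a report. In the time line, the three values are the lower bound, the best estimate, and the upper bound of a confidence interval for the time per iteration; the change line compares against the previous run in the same way. You’ll see output like this (simplified):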
fibonacci_recursive 20
time: [15.234 µs 15.456 µs 15.789 µs]
change: [-1.234% +0.456% +2.345%] (p = 0.98)
No change in performance detected.
Found 12 outliers among 100 measurements.
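The core problem Criterion solves is providing reliable performance measurements in the face of modern CPU complexities like caching, branch prediction, and varying instruction pipelines, on top of ordinary noise from the operating system. A single manual measurement such as println!("Time: {:?}", Instant::now() - start); is wildly unreliable, because one sample tells you nothing about how much the timing varies from run to run. Criterion collects many samples and performs statistical analysis to account for this noise.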
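To make that noise concrete, here is a minimal standard-library sketch (not part of Criterion) that repeats the naive manual timing and reports the spread. The helper names time_once and timing_spread are made up for this illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Same naive recursive function as in the article.
fn fibonacci_recursive(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

// Hypothetical helper: time a single call, as a quick manual check would.
fn time_once(n: u64) -> Duration {
    let start = Instant::now();
    // black_box keeps the optimizer from precomputing or discarding the call.
    black_box(fibonacci_recursive(black_box(n)));
    start.elapsed()
}

// Hypothetical helper: repeat the manual timing and return (min, median, max).
fn timing_spread(n: u64, samples: usize) -> (Duration, Duration, Duration) {
    assert!(samples > 0, "need at least one sample");
    let mut times: Vec<Duration> = (0..samples).map(|_| time_once(n)).collect();
    times.sort();
    (times[0], times[times.len() / 2], times[times.len() - 1])
}
```

Calling timing_spread(20, 100) and printing the three values will typically show a visible gap between the fastest and slowest sample, even though the work is identical each time. That gap is what a single manual measurement hides.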
Internally, c.bench_function (a method on Criterion) registers and runs a benchmark under the given name. The closure you provide (|b| { b.iter(|| fibonacci_recursive(black_box(20))) }) is where the magic happens. b.iter repeatedly calls the inner closure (|| fibonacci_recursive(black_box(20))) and times those calls until Criterion has enough data. black_box is crucial; it’s an identity function that the optimizer treats as opaque, so the compiler can neither precompute the result from the constant argument nor discard the call as unused. Without it, you risk measuring code that was optimized away.
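As a rough mental model, the timing loop can be sketched with the standard library alone. This is a simplified stand-in under stated assumptions, not Criterion's actual implementation: the names time_iterations and mean_time_per_call are hypothetical, and the real library also handles warm-up, sampling, and analysis.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Same naive recursive function as in the article.
fn fibonacci_recursive(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

// Hypothetical helper: call `routine` a fixed number of times and time the
// whole loop. Each result goes through black_box so it counts as "used".
fn time_iterations<T>(iterations: u32, mut routine: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(routine());
    }
    start.elapsed()
}

// Hypothetical helper: average time per call for fibonacci_recursive(20).
fn mean_time_per_call(iterations: u32) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let total = time_iterations(iterations, || fibonacci_recursive(black_box(20)));
    total / iterations
}
```

Timing a whole batch of calls and dividing, rather than timing each call separately, keeps the cost of reading the clock from dominating short routines.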
The criterion_group! macro defines a benchmark group, and criterion_main! generates the main function that runs all defined groups. You can group related benchmarks together for better organization. For instance, if you had an iterative Fibonacci function, you could add another bench_function to the same benchmark_fibonacci function or create a new benchmark function and add it to the criterion_group!.
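For reference, an iterative version might look like the sketch below. The name fibonacci_iterative is an assumption for illustration; it could be registered with a second c.bench_function call next to the recursive one. Before comparing timings, it is worth checking that both versions return the same values.

```rust
// Same naive recursive function as in the article.
fn fibonacci_recursive(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci_recursive(n - 1) + fibonacci_recursive(n - 2),
    }
}

// Hypothetical iterative counterpart: walks the sequence once, keeping only
// the last two values. Note that u64 overflows for large n (beyond about 90).
fn fibonacci_iterative(n: u64) -> u64 {
    let (mut current, mut next) = (0u64, 1u64);
    for _ in 0..n {
        let sum = current + next;
        current = next;
        next = sum;
    }
    current
}

// Hypothetical helper: confirm both implementations agree up to `limit`.
fn implementations_agree(limit: u64) -> bool {
    (0..=limit).all(|n| fibonacci_recursive(n) == fibonacci_iterative(n))
}
```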
bench_with_input is another useful method, available both on Criterion and on benchmark groups. It passes an input value to the benchmark closure and labels the measurement with a BenchmarkId. Calling it in a loop over several inputs produces a separate measurement for each one, which is invaluable for understanding how performance scales with input size. For example:
fn benchmark_fibonacci_with_inputs(c: &mut Criterion) {
    let mut group = c.benchmark_group("Fibonacci");
    // Explicit u64 inputs match the signature of fibonacci_recursive.
    for n in [10u64, 20, 30].iter() {
        group.bench_with_input(
            criterion::BenchmarkId::new("fibonacci_recursive", n),
            n,
            |b, &n_val| b.iter(|| fibonacci_recursive(black_box(n_val))),
        );
    }
    group.finish();
}
This would create a benchmark group named "Fibonacci" with three individual benchmarks for fibonacci_recursive with inputs 10, 20, and 30.
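To see why these three inputs should produce very different timings, it helps to count how many calls the naive recursion makes. The sketch below uses a hypothetical helper, recursive_call_count, that mirrors the structure of fibonacci_recursive but returns the number of invocations instead of the Fibonacci value.

```rust
// Hypothetical helper: number of calls fibonacci_recursive(n) performs,
// counting the initial call itself.
fn recursive_call_count(n: u64) -> u64 {
    match n {
        // The base cases return immediately: exactly one call.
        0 | 1 => 1,
        // Otherwise: this call plus everything the two sub-calls do.
        _ => 1 + recursive_call_count(n - 1) + recursive_call_count(n - 2),
    }
}

// Hypothetical helper: call counts for the inputs used in the benchmark group.
fn call_counts_for_benchmark_inputs() -> [(u64, u64); 3] {
    [
        (10, recursive_call_count(10)), // 177 calls
        (20, recursive_call_count(20)), // 21,891 calls
        (30, recursive_call_count(30)), // 2,692,537 calls
    ]
}
```

Each step of 10 in the input multiplies the work by roughly 120, so the measured times in the group should grow exponentially rather than linearly.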
The most surprising aspect is how much statistical rigor Criterion applies. It doesn’t just take the average time. It uses bootstrap resampling to estimate confidence intervals, applies a t-test to decide whether the difference from the previous run or a saved baseline is statistically significant, and classifies and reports outliers. This means you can trust its "no change" conclusions more than a quick manual timing.
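As an illustration of the outlier idea, the sketch below flags samples that fall outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). This is a generic textbook technique written for this article, not Criterion's own code, and the function names are hypothetical; Criterion's exact classification rules may differ.

```rust
// Hypothetical helper: percentile of an already sorted, non-empty slice,
// using linear interpolation between neighbouring values.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lower = rank.floor() as usize;
    let upper = rank.ceil() as usize;
    sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower as f64)
}

// Hypothetical helper: count samples outside Tukey's fences.
fn count_outliers(samples: &[f64]) -> usize {
    if samples.is_empty() {
        return 0;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));

    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;
    let (low_fence, high_fence) = (q1 - 1.5 * iqr, q3 + 1.5 * iqr);

    sorted
        .iter()
        .filter(|&&x| x < low_fence || x > high_fence)
        .count()
}
```

For example, in a set of timings clustered around 15.4 µs with a single 40 µs sample (say, from a context switch), only the 40 µs sample is counted. Reporting such samples instead of silently averaging them in is what makes the summary trustworthy.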
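Beyond the terminal summary, Criterion can generate detailed HTML reports with plots that let you visually inspect timing distributions and changes between runs (in recent versions this is enabled through the html_reports Cargo feature). The reports are written under target/criterion and are invaluable for deep dives into performance bottlenecks. Note that Criterion measures wall-clock time by default and does not profile memory on its own; other metrics require a custom measurement or an external profiler.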
The next step after mastering basic benchmarking is exploring parameter sweeps and custom metrics to understand complex performance characteristics.