BenchmarkDotNet is the de facto standard for benchmarking .NET code, but its real power lies in its ability to reveal performance bottlenecks you didn’t even know existed.
Let’s say you want to compare two ways of converting a string to an integer:
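using System;
using BenchmarkDotNet.Attributes;

public class StringParsing
{
    // Despite its name, this benchmark calls int.Parse(string), not int.TryParse.
    [Benchmark]
    public int ParseWithTryParse()
    {
        return int.Parse("12345");
    }

    // Same input, parsed through the ReadOnlySpan<char> overload of int.Parse.
    [Benchmark]
    public int ParseWithTryParseSpan()
    {
        ReadOnlySpan<char> span = "12345";
        return int.Parse(span);
    }
}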
To run this, you’d create a Program.cs with:
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        var summary = BenchmarkRunner.Run<StringParsing>();
    }
}
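Then run the project in Release configuration (for example, dotnet run -c Release); BenchmarkDotNet treats unoptimized Debug builds as unsuitable for measurement. It builds the benchmarks into a separate process, runs warmup and measurement iterations that each invoke your method many times, and prints a detailed report.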
Here’s a snippet of what that report might look like:
| Method | Mean | Error | StdDev |
|---|---|---|---|
| ParseWithTryParse | 20.345 ns | 0.123 ns | 0.110 ns |
| ParseWithTryParseSpan | 15.123 ns | 0.098 ns | 0.087 ns |
This tells you ParseWithTryParseSpan is faster in this run. It says nothing yet about memory, because allocation columns only appear once a memory diagnoser is enabled. And timing is just the surface.
The real magic happens when you add diagnostic tools. Let’s add [MemoryDiagnoser] to our class:
using System;
using BenchmarkDotNet.Attributes;

// [MemoryDiagnoser] is a class-level attribute: it applies to every benchmark in the class.
[MemoryDiagnoser]
public class StringParsing
{
    [Benchmark]
    public int ParseWithTryParse()
    {
        return int.Parse("12345");
    }

    [Benchmark]
    public int ParseWithTryParseSpan()
    {
        ReadOnlySpan<char> span = "12345";
        return int.Parse(span);
    }
}
Now, the report includes memory allocation details:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---|---|---|---|---|---|
| ParseWithTryParse | 20.345 ns | 0.123 ns | 0.110 ns | 0.0012 | 0.0000 | 16 B |
| ParseWithTryParseSpan | 15.123 ns | 0.098 ns | 0.087 ns | 0.0000 | 0.0000 | 0 B |
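The Allocated column is crucial: it reports the managed memory allocated per operation. In this sample output, int.Parse("12345") shows 16 bytes per call while the ReadOnlySpan<char> version shows none. Treat those figures as an illustration of how to read the column rather than as a measured result: on current .NET runtimes, int.Parse(string) reads the characters of the existing string in place instead of copying them, so a real run will typically report no allocation for either method. When the column does show a non-zero value, that is your cue to look for hidden copies, boxing, or closures on the code path.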
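One way to sanity-check an Allocated figure outside BenchmarkDotNet is to read the runtime’s per-thread allocation counter around the call. The sketch below is only an illustration, not what BenchmarkDotNet does internally: the class name AllocationCheck and the helper MeasureAllocatedBytes are hypothetical, and the approach assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 or later).

```csharp
using System;

public static class AllocationCheck
{
    // Hypothetical helper: returns the managed bytes allocated on the current
    // thread while running `action` `iterations` times.
    public static long MeasureAllocatedBytes(Action action, int iterations)
    {
        // Warm up once so one-time initialization (culture data, static
        // constructors) is not counted as a per-call allocation.
        action();

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        return after - before;
    }

    public static void Main()
    {
        const string text = "12345";
        const int iterations = 1000;

        long viaString = MeasureAllocatedBytes(() => int.Parse(text), iterations);
        long viaSpan = MeasureAllocatedBytes(() => int.Parse(text.AsSpan()), iterations);

        Console.WriteLine($"int.Parse(string): {viaString} B over {iterations} calls");
        Console.WriteLine($"int.Parse(span):   {viaSpan} B over {iterations} calls");
    }
}
```

If both lines print the same total, the two overloads allocate the same amount on your runtime, whatever a sample table suggests. BenchmarkDotNet’s own measurement is more careful than this, so use the sketch only as a quick cross-check.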
To go deeper, you can use [DisassemblyDiagnoser] to see the machine code the JIT generated for each benchmark. This is invaluable for understanding why one method is faster than another at the instruction level. It can reveal whether a call was inlined, whether bounds checks were eliminated, and whether a loop was unrolled or vectorized.
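As a minimal sketch of enabling the disassembly output, the class below applies [DisassemblyDiagnoser] at class level. The class and method names are hypothetical, and the maxDepth named argument assumes a reasonably recent BenchmarkDotNet version (older releases used different constructor parameters), so check it against the version you have installed.

```csharp
using System;
using BenchmarkDotNet.Attributes;

// maxDepth: 2 asks for the benchmark method plus the methods it calls directly.
// Assumption: the installed BenchmarkDotNet version exposes this parameter name.
[DisassemblyDiagnoser(maxDepth: 2)]
public class StringParsingDisassembly
{
    // Stored in a field so the input is not a compile-time constant at the call site.
    private readonly string text = "12345";

    [Benchmark(Baseline = true)]
    public int ParseString() => int.Parse(text);

    [Benchmark]
    public int ParseSpan() => int.Parse(text.AsSpan());
}
```

The disassembly is exported alongside the usual results files, so you can compare the two listings side by side.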
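You can also attach a profiler to the benchmark run: [EtwProfiler] on Windows (from the BenchmarkDotNet.Diagnostics.Windows package) or the cross-platform [EventPipeProfiler]. Both capture a trace that you can open in a tool such as PerfView or speedscope. This helps diagnose costs that do not sit in the benchmarked method itself but in the runtime, the garbage collector, or the code it calls.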
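One profiler attribute that ships with BenchmarkDotNet is [EventPipeProfiler]. The sketch below shows how it might be applied with the CPU sampling profile. The class and method names are hypothetical, and the exact trace files produced depend on the BenchmarkDotNet version.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;

// Captures a CPU-sampling trace of the benchmark process via EventPipe.
[EventPipeProfiler(EventPipeProfile.CpuSampling)]
public class StringParsingProfiled
{
    private readonly string text = "12345";

    [Benchmark]
    public int ParseString() => int.Parse(text);
}
```

The run log should report where the trace was written; open that file in a trace viewer to see which frames dominate the samples.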
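The [HardwareCounters] diagnoser goes lower still, reporting CPU performance counters such as retired instructions, total cycles, cache misses, and branch mispredictions. It relies on ETW, so it works only on Windows, requires the BenchmarkDotNet.Diagnostics.Windows package, and needs an elevated process. These counters explain slowdowns that come from how the CPU fetches data and predicts branches rather than from how much work the code does.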
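A classic way to see hardware counters at work is to run the same branchy loop over sorted and unsorted data. The sketch below assumes Windows, an elevated process, and the BenchmarkDotNet.Diagnostics.Windows package; the class name, method names, and array size are hypothetical.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Diagnosers;

// Requests two counters; each one becomes an extra column in the summary.
[HardwareCounters(HardwareCounter.BranchMispredictions, HardwareCounter.BranchInstructions)]
public class BranchPrediction
{
    private const int N = 32767;
    private int[] sorted = Array.Empty<int>();
    private int[] unsorted = Array.Empty<int>();

    [GlobalSetup]
    public void Setup()
    {
        var random = new Random(42); // fixed seed so every run sees the same data
        unsorted = Enumerable.Range(0, N).Select(_ => random.Next(256)).ToArray();
        sorted = unsorted.OrderBy(x => x).ToArray();
    }

    // The comparison is predictable on sorted data and close to random on
    // unsorted data. The JIT may still emit branchless code for this loop,
    // in which case the two rows will look alike.
    private static int CountLarge(int[] data)
    {
        int count = 0;
        foreach (int value in data)
        {
            if (value >= 128)
            {
                count++;
            }
        }
        return count;
    }

    [Benchmark(Baseline = true)]
    public int Sorted() => CountLarge(sorted);

    [Benchmark]
    public int Unsorted() => CountLarge(unsorted);
}
```

Both methods do the same amount of work and return the same count, so any difference in Mean should line up with the misprediction column rather than with the instruction count.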
Per-iteration timings need no separate diagnoser: BenchmarkDotNet’s log lists the duration of every warmup and measurement iteration, and the summary notes any outliers it removed. Attributes such as [MinColumn] and [MaxColumn] add the extremes to the results table. Iterations that take significantly longer than the rest often point to a garbage collection or some other non-deterministic event affecting that particular run.
BenchmarkDotNet’s power comes from its ability to layer these diagnostics. You start with basic timing and allocation, then peel back the layers with disassembly and hardware counters to understand the root cause of performance differences. The key is to treat the report not just as a "faster/slower" indicator, but as a diagnostic tool for the underlying machine code and hardware interactions.
The Allocated column in the MemoryDiagnoser output reports bytes allocated per operation. The garbage collection counts live in the separate Gen0, Gen1, and Gen2 columns, which show the number of collections per 1,000 operations. A non-zero Gen0 value for a fast method indicates that it creates many short-lived objects, which can still hurt overall application performance even if the benchmark itself is quick.
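To see the memory columns move, it helps to benchmark a method that allocates a short-lived object next to one that does not. The sketch below uses a hypothetical BufferChecksum class that does the same work on a heap array and on a stackalloc buffer; the 256-byte size is arbitrary.

```csharp
using System;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser] // class-level: adds the memory columns for every benchmark here
public class BufferChecksum
{
    // Allocates a fresh array on the managed heap on every call.
    [Benchmark(Baseline = true)]
    public int HeapBuffer()
    {
        byte[] buffer = new byte[256];
        return FillAndSum(buffer);
    }

    // Uses stack memory, so no managed allocation is expected.
    [Benchmark]
    public int StackBuffer()
    {
        Span<byte> buffer = stackalloc byte[256];
        return FillAndSum(buffer);
    }

    // Writes 0..255 into the buffer and returns the sum of its elements.
    private static int FillAndSum(Span<byte> buffer)
    {
        int sum = 0;
        for (int i = 0; i < buffer.Length; i++)
        {
            buffer[i] = (byte)i;
            sum += buffer[i];
        }
        return sum;
    }
}
```

On most runtimes HeapBuffer should typically show a non-zero Allocated value and a non-zero Gen0 rate while StackBuffer shows neither, although a JIT that can prove the array never escapes may remove the heap allocation.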
The next step after mastering basic benchmarking is understanding how to configure BenchmarkDotNet for different scenarios, such as warm-up strategies and custom job configurations.