Rust binaries can be surprisingly large and slow if you don’t pay attention to how you’re building them for production.

Let’s see this in action. Imagine we have a simple Rust program:

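fn main() {
    // u64 is required: the total (499,999,500,000) overflows the default i32,
    // which panics in a debug build with "attempt to add with overflow".
    let mut sum: u64 = 0;
    for i in 0..1_000_000u64 {
        sum += i;
    }
    println!("Sum: {}", sum);
}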

If we build this with the default settings (cargo build, which uses the dev profile), we get an unoptimized executable that carries full debug information. Let's compare its size and performance with a release build.

First, the default build:

cargo build
ls -lh target/debug/rust_optimization_example

You'll likely see a file size of a few megabytes, much of it debug information, and the loop runs with no optimizations applied.

Now, let’s build for release:

cargo build --release
ls -lh target/release/rust_optimization_example

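The release binary should be noticeably smaller, and the loop itself runs much faster; with optimizations enabled, LLVM may reduce it to a few instructions or even compute the result at compile time. Why? Because cargo build --release does a lot more than just skip debug information.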
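To compare the two builds yourself, you can time the loop from inside the program. The following is a minimal sketch, not a rigorous benchmark: it moves the loop into a hypothetical sum_to function, times it with std::time::Instant, and passes the bound through std::hint::black_box, a best-effort hint that discourages the release build from folding the whole computation into a constant. Run it once with cargo run and once with cargo run --release.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: the same loop as the example program.
/// u64 keeps the total (499_999_500_000) from overflowing.
fn sum_to(n: u64) -> u64 {
    let mut sum = 0u64;
    for i in 0..n {
        sum += i;
    }
    sum
}

fn main() {
    // black_box hides the bound from the optimizer (best effort), so the
    // release build is less likely to precompute the answer at compile time.
    let n = black_box(1_000_000u64);

    let start = Instant::now();
    let sum = sum_to(n);
    let elapsed = start.elapsed();

    // Heuristic only: debug_assertions is on in the default dev profile
    // and off in the default release profile.
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };
    println!("Sum: {sum} ({profile} build, took {elapsed:?})");
}
```

A single run of such a short loop is noisy; for real measurements a benchmarking harness such as the criterion crate gives more reliable numbers.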

The key is the release profile. Cargo ships built-in defaults for it, and you can override them in a [profile.release] section of your Cargo.toml. When you run cargo build --release, Cargo builds with this profile; plain cargo build uses the dev profile instead.

Here’s what happens under the hood:

  • Optimization Levels: The most significant factor is the optimization level. The release profile defaults to opt-level = 3, which tells rustc, through its LLVM backend, to apply aggressive optimizations such as loop unrolling, function inlining, dead code elimination, and instruction reordering, all aimed at producing the fastest possible code. The dev profile, by contrast, defaults to opt-level = 0, which prioritizes fast compilation and debuggability over runtime performance.
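  • Debug Information: The release profile defaults to debug = false, so no debug information is generated for your code, and since Rust 1.77 Cargo also defaults to strip = "debuginfo" in release builds, which removes the debug information that comes with the precompiled standard library. This substantially reduces binary size. It also drops the line-number and variable information that maps compiled code back to your source, which makes reverse engineering somewhat harder, although function symbol names remain unless you strip them as well.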
  • Link-Time Optimization (LTO): The release profile does not enable full LTO by default. Its default, lto = false, only performs "thin local" LTO across the codegen units of each crate. Setting lto = true (equivalent to "fat") optimizes across crate boundaries instead of treating each crate (Rust's compilation unit) in isolation. With this global view of the program, the compiler can inline and eliminate dead code more aggressively, which can improve performance and shrink the binary at the cost of noticeably longer link times. lto = "thin" is a cheaper middle ground.
  • Codegen Units: The release profile defaults to codegen-units = 16, while the dev profile uses 256 for incremental builds so that many units can be compiled in parallel. Setting codegen-units = 1 makes the compiler optimize the entire crate as a single unit. Compilation takes longer and loses parallelism, but the optimizer can work across functions that would otherwise land in different units.
  • Panic Behavior: Both the dev and release profiles default to panic = "unwind". Setting panic = "abort" makes a panic terminate the process immediately instead of unwinding the stack. Because the compiler can omit the cleanup code (landing pads) needed for unwinding, this usually yields a smaller binary and can slightly improve performance. The trade-off is that destructors (Drop implementations) do not run when a panic occurs, and std::panic::catch_unwind can no longer recover from one.
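The panic trade-off can be observed directly. This sketch, in which panic_was_caught is a hypothetical helper, uses cfg!(panic = "abort") (available since Rust 1.60) to report the strategy the binary was built with, then tries to recover from a panic with std::panic::catch_unwind. Under the default unwind strategy the final line reports that the panic was caught; built with panic = "abort", the process terminates at the panic and that line is never reached.

```rust
use std::panic;

/// Hypothetical helper: returns true if a panic inside `f` was caught.
/// Only meaningful under panic = "unwind"; with panic = "abort" the
/// process terminates before catch_unwind can return.
fn panic_was_caught<F: FnOnce() + panic::UnwindSafe>(f: F) -> bool {
    panic::catch_unwind(f).is_err()
}

fn main() {
    // cfg!(panic = "...") reflects the panic strategy this binary was built with.
    let strategy = if cfg!(panic = "abort") { "abort" } else { "unwind" };
    println!("panic strategy: {strategy}");

    // The default panic hook still prints the panic message to stderr.
    let caught = panic_was_caught(|| panic!("simulated failure"));

    // Reached only when unwinding is enabled.
    println!("panic caught: {caught}");
}
```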

To override the defaults, add a [profile.release] section to your Cargo.toml:

[package]
name = "rust_optimization_example"
version = "0.1.0"
edition = "2021"

[profile.release]
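opt-level = 3       # Optimization level (0-3, "s", "z"); 3 is already the release default
lto = true          # Full cross-crate link-time optimization (default: false)
codegen-units = 1   # Number of codegen units (default: 16)
panic = "abort"     # Panic strategy (default: "unwind")
strip = true        # Strip symbols and debug info (default: "debuginfo" since Rust 1.77)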

strip = true is equivalent to strip = "symbols" and is not the default. It has roughly the same effect as running the strip utility on the finished binary: it removes the symbol table along with debug information, further reducing binary size and making the binary harder to inspect, at the cost of less informative backtraces and profiler output.
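To see what stripping costs in diagnostics, you can capture a backtrace from inside the program and compare the output of a default release build with one built with strip = true. This is a minimal sketch using std::backtrace::Backtrace (stable since Rust 1.65); capture_here is a hypothetical helper, and the exact frames printed depend on the platform, the build settings, and inlining.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Hypothetical helper, marked inline(never) so it is more likely
/// to show up as its own frame in the captured backtrace.
#[inline(never)]
fn capture_here() -> Backtrace {
    // force_capture ignores the RUST_BACKTRACE / RUST_LIB_BACKTRACE variables.
    Backtrace::force_capture()
}

fn main() {
    let bt = capture_here();
    match bt.status() {
        // Builds that keep symbols typically show function names here;
        // a build with strip = true resolves far fewer frames.
        BacktraceStatus::Captured => println!("{bt}"),
        other => println!("backtrace not available: {other:?}"),
    }
}
```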

Choosing the right optimization levels and configurations can make the difference between a binary that's slow and bloated and one that's lean, fast, and production-ready. For instance, opt-level = "s" optimizes for binary size, and opt-level = "z" does so more aggressively (it also turns off loop vectorization). Both trade some raw speed for a smaller binary, which is useful for embedded systems or when download size is a concern.

The next challenge you’ll face is understanding how to effectively profile these optimized binaries to find the actual bottlenecks, rather than just relying on general optimizations.
