Rust’s build system, Cargo, can become a significant bottleneck for large teams working in monorepos.
Let’s see how Cargo handles a large monorepo, simulating a common scenario. Imagine a workspace with several crates, say app_server, api_client, and shared_utils.
```toml
# Cargo.toml in the workspace root
[workspace]
members = [
    "crates/app_server",
    "crates/api_client",
    "crates/shared_utils",
]
```
When you run cargo build --workspace, Cargo traverses the dependency graph. If app_server depends on api_client, which in turn depends on shared_utils, Cargo will build shared_utils first, then api_client, and finally app_server. This seems straightforward, but in a monorepo with hundreds of crates and complex interdependencies, the build times can explode.
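The ordering described above is a topological sort of the dependency graph. The following is a minimal sketch of that idea, not Cargo's actual scheduler; the function name build_order and the shape of the dependency map are assumptions made for illustration.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the crates in an order where every crate appears after all of
/// its dependencies, or None if the graph contains a cycle.
/// `deps` maps a crate name to the workspace crates it depends on
/// (a hypothetical, hand-built map; Cargo derives this from manifests).
fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Depth-first visit: a crate is emitted only after all of its dependencies.
    fn visit<'a>(
        krate: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        done: &mut BTreeSet<&'a str>,
        in_progress: &mut BTreeSet<&'a str>,
        order: &mut Vec<String>,
    ) -> bool {
        if done.contains(krate) {
            return true;
        }
        // Seeing a crate that is already on the current path means a cycle.
        if !in_progress.insert(krate) {
            return false;
        }
        if let Some(children) = deps.get(krate) {
            for &dep in children {
                if !visit(dep, deps, done, in_progress, order) {
                    return false;
                }
            }
        }
        in_progress.remove(krate);
        done.insert(krate);
        order.push(krate.to_string());
        true
    }

    let mut order = Vec::new();
    let mut done = BTreeSet::new();
    let mut in_progress = BTreeSet::new();
    for &krate in deps.keys() {
        if !visit(krate, deps, &mut done, &mut in_progress, &mut order) {
            return None;
        }
    }
    Some(order)
}
```

Given a map in which app_server depends on api_client and api_client depends on shared_utils, build_order returns shared_utils, api_client, app_server, matching the order described above.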
The core problem is not just the number of crates, but the redundant work Cargo can end up doing. The same dependency may be compiled more than once when crates request different feature sets, profiles, or compiler flags, and a change to a widely used crate forces every crate that depends on it to rebuild, even when the change is small. For a team of 50 Rust developers, a 30-minute build time for a simple change is unacceptable.
Here’s how to mitigate this:
1. Cargo Workspaces: You’re already using this, but it’s the foundational piece. A workspace groups related crates under a root Cargo.toml; each member keeps its own Cargo.toml, while the workspace shares a single Cargo.lock and a single target directory. That sharing is what enables unified dependency resolution and reuse of build artifacts across members.
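2. Incremental Compilation: Incremental compilation is enabled by default for the dev profile (it is off by default for release builds) and is crucial for day-to-day work. The compiler stores intermediate artifacts and reuses the parts of a crate that are unaffected by a change. Separately, Cargo skips a crate entirely when neither the crate nor its dependencies have changed since the last build.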
3. Parallel Compilation: By default, Cargo runs as many parallel jobs as there are logical CPUs. You can set the number explicitly with cargo build -j 8 (where 8 is the number of jobs). Broadly, only crates whose dependencies are already built can be compiled at the same time, so a long dependency chain limits how much the extra cores help.
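4. Shared Target Directory: Crates in a workspace already share one target directory at the workspace root by default. Duplicated compilation appears when crates are built outside that workspace (for example, a crate excluded from it, or several separate workspaces in one repository), because each build then gets its own target directory. In that case you can point them at a common location by setting the CARGO_TARGET_DIR environment variable or by setting build.target-dir in a .cargo/config.toml file: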
```toml
# .cargo/config.toml
[build]
target-dir = "target" # This will be relative to the workspace root
```
With this in place, every build that picks up the config writes to the same target directory, so common dependencies compiled with the same version, features, and profile can be built once and reused instead of being recompiled per directory.
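5. Dependency Pinning and Version Management: In a monorepo, it’s essential to have a consistent view of dependencies. Use a single version for each external dependency across all crates in the workspace. Cargo unifies semver-compatible requirements on its own, but incompatible versions (say, 1.x and 2.x of the same crate) are each compiled separately. Declaring shared versions once under [workspace.dependencies] in the root Cargo.toml and inheriting them in member crates with workspace = true keeps them in sync; tools like cargo-edit or manual updates in Cargo.toml files also work.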
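To check whether the workspace really uses one version per dependency, cargo tree -d lists crates that appear in multiple versions. The sketch below illustrates the same idea by scanning Cargo.lock text directly. It is a simplified, line-based reader written for illustration, not a full TOML parser; the function names duplicate_versions and quoted_value are hypothetical, and it assumes each [[package]] table lists name before version, as Cargo writes it.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Scans Cargo.lock text and returns every package name that is locked at
/// more than one version, together with those versions.
/// Assumption: `name = "..."` precedes `version = "..."` in each table.
fn duplicate_versions(lockfile: &str) -> BTreeMap<String, BTreeSet<String>> {
    let mut versions: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    let mut current_name: Option<String> = None;

    for line in lockfile.lines() {
        let line = line.trim();
        if line == "[[package]]" {
            // A new package table starts; forget the previous name.
            current_name = None;
        } else if let Some(value) = quoted_value(line, "name") {
            current_name = Some(value.to_string());
        } else if let Some(value) = quoted_value(line, "version") {
            // Only record versions that belong to a named package.
            if let Some(name) = current_name.take() {
                versions.entry(name).or_default().insert(value.to_string());
            }
        }
    }

    // Keep only the packages that resolved to more than one version.
    versions.retain(|_, found| found.len() > 1);
    versions
}

/// Extracts VALUE from a line of the form `key = "VALUE"`.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim_start();
    rest.strip_prefix('"')?.strip_suffix('"')
}
```

Any name in the returned map is a candidate for alignment: each extra version is a separate compilation of that crate.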
6. cargo-watch and Incremental Rebuilds: For development, manually re-running cargo build --workspace after every edit is inefficient. A tool like cargo-watch monitors file changes and re-runs a command of your choice; Cargo then rebuilds only what is out of date.
```sh
# Example: rebuild app_server (and its dependencies) whenever a watched file changes
cargo watch -x "build -p app_server"
```
This command watches files in the current directory and runs cargo build -p app_server whenever a change is detected. The -p flag limits the build to the specified package and the crates it depends on.
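Which crates actually need rebuilding after an edit follows from the dependency graph: the changed crate plus everything that depends on it, directly or transitively. The sketch below computes that set under stated assumptions. It is not how Cargo or cargo-watch is implemented, and the function name affected_by and the dependency map are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the changed crate plus every crate that depends on it, directly
/// or transitively. `deps` maps a crate to the workspace crates it depends
/// on (a hand-built map used only for this illustration).
fn affected_by<'a>(
    changed: &'a str,
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> BTreeSet<&'a str> {
    let mut affected = BTreeSet::new();
    affected.insert(changed);
    let mut pending = vec![changed];

    // Walk the graph "upwards": from a crate to the crates that use it.
    while let Some(current) = pending.pop() {
        for (&krate, krate_deps) in deps {
            // `insert` returns false for crates we have already recorded.
            if krate_deps.contains(&current) && affected.insert(krate) {
                pending.push(krate);
            }
        }
    }
    affected
}
```

With app_server depending on api_client and api_client depending on shared_utils, a change to api_client affects api_client and app_server, while a change to shared_utils affects all three crates. This is why edits to widely shared crates are the expensive ones.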
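7. Local Registry for Internal Crates: Internal crates that live inside the workspace are normally referenced as path dependencies and need no registry. If you have many internal crates that are shared across repositories, or that must be consumed at fixed versions, consider a private registry or a local registry, which Cargo can be pointed at through its registry and source-replacement configuration. This gives internal crates the same versioned resolution as external ones, which can help when dealing with a very large number of them.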
8. Build Caching with sccache: For even more aggressive caching, especially in CI environments, integrate sccache. sccache is a compiler cache: it wraps the compiler invocation, stores the outputs, and returns them from the cache when the same compilation is requested again. You can configure Cargo to use sccache by setting the RUSTC_WRAPPER environment variable:
```sh
# In your CI environment or local shell
export RUSTC_WRAPPER=sccache
cargo build --workspace
```
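sccache can also share its cache across machines if you configure a remote storage backend (e.g., S3 or Redis). Note that sccache cannot cache incrementally compiled crates, so CI builds that rely on it usually also set CARGO_INCREMENTAL=0.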
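The idea behind a compiler cache can be sketched in a few lines: derive a key from everything that influences the output, and reuse the stored artifact when the key has been seen before. The following toy model is only an illustration. It is not sccache's actual key derivation or storage format, and CompilerCache and fake_compile are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A toy in-memory compiler cache keyed by the inputs of a compilation.
struct CompilerCache {
    artifacts: HashMap<u64, Vec<u8>>,
    hits: u32,
    misses: u32,
}

impl CompilerCache {
    fn new() -> Self {
        CompilerCache { artifacts: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Combines compiler version, flags, and source text into one key.
    /// Assumption: these three inputs fully determine the output here.
    fn key(compiler_version: &str, flags: &[&str], source: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        compiler_version.hash(&mut hasher);
        flags.hash(&mut hasher);
        source.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached artifact if present; otherwise "compiles" and stores it.
    fn compile(&mut self, compiler_version: &str, flags: &[&str], source: &str) -> Vec<u8> {
        let key = Self::key(compiler_version, flags, source);
        if let Some(artifact) = self.artifacts.get(&key) {
            self.hits += 1;
            return artifact.clone();
        }
        self.misses += 1;
        let artifact = fake_compile(source);
        self.artifacts.insert(key, artifact.clone());
        artifact
    }
}

/// Stand-in for invoking the real compiler; it just returns the source bytes.
fn fake_compile(source: &str) -> Vec<u8> {
    source.as_bytes().to_vec()
}
```

Compiling the same source twice with identical flags produces one miss followed by one hit, while changing any flag produces a new key and a fresh compilation. A shared remote backend applies the same lookup across machines rather than within one process.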
The most surprising thing about optimizing Cargo for monorepos is that the biggest gains often come from ensuring Cargo doesn’t do work it doesn’t need to, rather than just making the compilation process faster. This means aggressively sharing build artifacts and ensuring Cargo has a clear, unified view of your project’s dependencies. A common pitfall is building crates into separate target directories, for example by invoking Cargo outside the workspace, which leads to massive duplication of effort.
The next challenge you’ll likely face is optimizing test execution time across a large monorepo.