Caching dependencies in CircleCI for Node.js and Python projects can dramatically speed up your builds by avoiding redundant downloads.
Here’s a typical Node.js build with caching:
```yaml
jobs:
  build:
    docker:
      - image: cimg/node:18.17.0
    steps:
      - checkout
      - restore_cache:
          keys:
            - node-deps-{{ checksum "package-lock.json" }}
            - node-deps-
      - run: npm install
      - save_cache:
          key: node-deps-{{ checksum "package-lock.json" }}
          paths:
            - node_modules/
```
And here’s a Python equivalent:
```yaml
jobs:
  build:
    docker:
      - image: cimg/python:3.11.4
    steps:
      - checkout
      - restore_cache:
          keys:
            # pipenv's lockfile is Pipfile.lock (poetry.lock belongs to Poetry)
            - python-deps-{{ checksum "Pipfile.lock" }}
            - python-deps-
      - run: |
          python -m pip install --user pipenv
          # No --system here, so pipenv creates a virtualenv that can be cached
          pipenv install --deploy
      - save_cache:
          key: python-deps-{{ checksum "Pipfile.lock" }}
          paths:
            - ~/.cache/pip # For pip
            - ~/.local/share/virtualenvs # For pipenv
```
The core idea is to store your downloaded dependencies on CircleCI’s infrastructure. When a new build starts, restore_cache tries each entry in its keys list. The most specific key is usually based on a checksum of your lockfile (like package-lock.json or Pipfile.lock). If a matching cache is found, it is downloaded and unpacked into the paths it was saved from. The npm install or pipenv install step still runs, but with the dependencies already in place it has little or nothing to download, so it finishes much faster. If no cache is found, the install runs from scratch. Finally, save_cache uploads the listed paths under the checksum-based key for future builds. Caches are immutable: if a cache with that key already exists, save_cache skips the upload. The fallback keys (node-deps- and python-deps-) cover the case where the lockfile has changed and the exact key misses. They restore an older cache, so the install only has to fetch what actually changed.
The keys are evaluated in order, and the first one that matches a saved cache wins. A key matches any cache whose key starts with it, and when several caches share that prefix, the most recently saved one is restored. The {{ checksum "..." }} template is crucial: it produces a different key whenever the contents of your dependency file change, so an exact-key hit means the cache was built from the same lockfile. For Python, ~/.cache/pip is pip’s default cache location on Linux, and ~/.local/share/virtualenvs is where pipenv creates virtual environments by default. Note that pipenv install --system installs into the system Python and creates no virtual environment, which is why the example above omits that flag.
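To make the ordering concrete, here is a sketch of a longer fallback chain for the Node.js steps. The per-branch key layout is an illustrative assumption, not something the configuration above requires; {{ .Branch }} is one of CircleCI’s built-in key templates.

```yaml
# Sketch only: the key names are illustrative, not required by CircleCI.
- restore_cache:
    keys:
      # 1. Exact match: same branch, same lockfile contents.
      - node-deps-{{ .Branch }}-{{ checksum "package-lock.json" }}
      # 2. Prefix match: most recent cache saved on this branch.
      - node-deps-{{ .Branch }}-
      # 3. Prefix match: most recent cache saved on any branch.
      - node-deps-
- run: npm install
- save_cache:
    key: node-deps-{{ .Branch }}-{{ checksum "package-lock.json" }}
    paths:
      - node_modules/
```

Ordering the keys from most to least specific matters: if the bare node-deps- prefix came first, it would match before the exact key was ever tried.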
When using pip with a requirements.txt file, the setup looks slightly different:
```yaml
jobs:
  build:
    docker:
      - image: cimg/python:3.11.4
    steps:
      - checkout
      - restore_cache:
          keys:
            - python-pip-{{ checksum "requirements.txt" }}
            - python-pip-
      - run: |
          python -m pip install --upgrade pip
          # --user installs into ~/.local instead of the system site-packages
          pip install --user -r requirements.txt
      - save_cache:
          key: python-pip-{{ checksum "requirements.txt" }}
          paths:
            - ~/.cache/pip
```
Here, the cache key is tied to requirements.txt, and the path is ~/.cache/pip, where pip keeps downloaded packages and locally built wheels. pip install still runs on every build, but packages already in the cache are installed from disk rather than downloaded again. The --user flag installs packages into the user’s home directory (under ~/.local), which avoids permission errors when the container user cannot write to the system site-packages directory.
The interaction between restore_cache, save_cache, and paths is worth understanding. On save, CircleCI creates a compressed archive of the specified paths and uploads it; on restore, it downloads that archive and unpacks it. It does not inspect what is inside. If your node_modules directory contains unrelated files, or was modified in ways package-lock.json does not track, those contents are archived and restored as-is. The only mechanism for cache invalidation is the key itself. If no key matches, nothing is restored.
One common pitfall is leaving necessary directories out of the paths for save_cache. For example, if your Python project uses Poetry and you only cache ~/.cache/pip, the virtual environment that Poetry creates and installs into is not saved, so every build reinstalls the packages. You need to add the virtual environment directory itself, or whichever directory your tool installs into (such as ~/.local for pip install --user). Similarly, for Node.js, global npm packages or other build artifacts outside node_modules must be listed explicitly if later steps depend on them.
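As one way to avoid that pitfall with Poetry, the sketch below asks Poetry to create the virtual environment inside the project directory and caches that single directory. It assumes the poetry command is already available on the image and that the project has a poetry.lock file; the key prefix poetry-deps- is a made-up name.

```yaml
jobs:
  build:
    docker:
      - image: cimg/python:3.11.4
    steps:
      - checkout
      - restore_cache:
          keys:
            - poetry-deps-{{ checksum "poetry.lock" }}
            - poetry-deps-
      - run: |
          # Keep the virtualenv at ./.venv so there is one known path to cache.
          poetry config virtualenvs.in-project true
          poetry install --no-interaction
      - save_cache:
          key: poetry-deps-{{ checksum "poetry.lock" }}
          paths:
            # Relative paths are resolved against the job's working directory.
            - .venv
```

Caching the in-project .venv keeps the paths list independent of where Poetry would otherwise place its environments.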
The most surprising thing about CircleCI caching is that it’s not just about speed; it also bears on build reproducibility. When a cache is restored through an exact checksum key, you get the same installed dependencies that a previous build saved for that lockfile. A cache restored through a fallback key carries no such guarantee, which is why the install step still has to run and reconcile it with the current lockfile.
The next concept you’ll likely grapple with is optimizing cache invalidation. You might find yourself with stale caches that are slowing down builds or causing unexpected failures, leading you to explore more advanced cache key strategies or manual cache management.
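One simple starting point for manual invalidation is a version prefix on every key. The sketch below uses a literal v1 prefix, which is an arbitrary naming choice rather than a CircleCI requirement.

```yaml
# Sketch only: bump v1 to v2 in every key below to stop matching old caches.
- restore_cache:
    keys:
      - v1-node-deps-{{ checksum "package-lock.json" }}
      - v1-node-deps-
- run: npm install
- save_cache:
    key: v1-node-deps-{{ checksum "package-lock.json" }}
    paths:
      - node_modules/
```

Bumping the prefix does not delete the old caches. They simply stop matching any key, and the next build saves a fresh cache under the new prefix.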