CDK’s bundling option can feel like a black box, but it’s essentially a way to run arbitrary commands inside a Docker container to prepare your Lambda assets.
Let’s say you have a Python Lambda function that needs a native dependency, like psycopg2, which isn’t included in the standard Lambda Python runtime. You also want to ensure your project is built in a clean, reproducible environment.
Here’s how we can set this up in CDK:
from aws_cdk import (
    BundlingOptions,
    BundlingOutput,
    Duration,
    Stack,
    aws_lambda as lambda_,
)
from constructs import Construct
import os


class LambdaDockerBundleStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Define the path to your Lambda function code
        lambda_path = os.path.join(os.path.dirname(__file__), "lambda_src")

        # Create the Lambda function with custom bundling
        my_lambda = lambda_.Function(
            self, "MyBundledLambda",
            runtime=lambda_.Runtime.PYTHON_3_9,
            handler="handler.main",
            code=lambda_.Code.from_asset(
                lambda_path,
                bundling=BundlingOptions(
                    image=lambda_.Runtime.PYTHON_3_9.bundling_image,
                    command=[
                        "bash", "-c",
                        "pip install --no-cache-dir psycopg2-binary -t /asset-output && cp -au . /asset-output"
                    ],
                    user="root",  # Sometimes needed to avoid permission errors in the container
                    # pip writes loose files, so CDK should zip /asset-output itself
                    output_type=BundlingOutput.NOT_ARCHIVED,
                ),
            ),
            memory_size=256,
            timeout=Duration.seconds(30),
        )
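In this example, we’re telling CDK to use the bundling image it associates with the Python 3.9 runtime (an AWS-provided build image that mirrors the Lambda environment). The command is the crucial part:
- "bash", "-c": Executes the string that follows as a bash command.
- "pip install --no-cache-dir psycopg2-binary -t /asset-output": Installs the psycopg2-binary package directly into the /asset-output directory, which is where CDK expects the bundled code to land. --no-cache-dir skips pip’s download cache, which a throwaway container has no use for.
- "&& cp -au . /asset-output": Copies all the files from your original Lambda source directory (which is mounted into the Docker container at /asset-input by default) into /asset-output. The -a flag preserves permissions and ownership, and -u copies a file only when it is newer than the destination copy or the destination is missing.
Setting the user to "root" is sometimes necessary to avoid permission errors inside the container. By default CDK runs the container as your host user, which may not be able to write everywhere pip wants to; the trade-off is that the output files can end up owned by root on Linux hosts. For the output type, note that the BundlingOutput enum lives in the top-level aws_cdk module and that its members are ARCHIVED, NOT_ARCHIVED, AUTO_DISCOVER, and SINGLE_FILE. pip install -t produces a directory of loose files, so NOT_ARCHIVED, which tells CDK to zip the contents of /asset-output, is the value that fits. ARCHIVED is for commands that themselves write a single .zip or .jar file into /asset-output.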
When you synthesize this CDK stack, CDK pulls the bundling image if it isn’t already present, runs the specified command in a container, and then packages the contents of /asset-output as your Lambda deployment artifact. Because pip runs inside a Lambda-like Linux environment, it selects a psycopg2-binary wheel built for the container’s platform rather than for your workstation, so the native extension can load in your Lambda function.
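To make the black box a little less opaque, here is a rough sketch of the kind of docker run invocation that a bundling step corresponds to. The helper docker_run_args is hypothetical, the image name and host paths are placeholders, and the exact flags CDK passes may differ between versions, so treat it as an illustration of the mounts and working directory rather than a reproduction of CDK’s internals.

```python
from typing import List


def docker_run_args(image: str, source_dir: str, output_dir: str,
                    command: List[str], user: str = "root") -> List[str]:
    """Approximate the `docker run` argv for one bundling step (illustrative only)."""
    return [
        "docker", "run", "--rm",
        "-u", user,
        "-v", f"{source_dir}:/asset-input",    # your Lambda source directory
        "-v", f"{output_dir}:/asset-output",   # where the finished bundle must land
        "-w", "/asset-input",                  # default working directory
        image,
        *command,
    ]


args = docker_run_args(
    image="public.ecr.aws/sam/build-python3.9",    # assumed image name
    source_dir="/abs/path/lambda_src",             # placeholder host path
    output_dir="/abs/path/cdk.out/asset-staging",  # placeholder host path
    command=[
        "bash", "-c",
        "pip install --no-cache-dir psycopg2-binary -t /asset-output && cp -au . /asset-output",
    ],
)
print(" ".join(args))
```

Reading the argument list makes it clear why the . in the copy step refers to your source code: the container starts in /asset-input.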
The real magic here is that the bundling object is highly flexible. You’re not limited to just pip install. You can run any shell command, execute build scripts, compile code, or even run other package managers.
Imagine you needed to compile a Rust binary for your Lambda. You could use a Docker image with the Rust toolchain installed and a command like:
# DockerImage, BundlingOptions, and BundlingOutput are imported from the top-level aws_cdk module
bundling=BundlingOptions(
    image=DockerImage.from_registry("rust:latest"),
    command=[
        "bash", "-c",
        "cargo build --release && cp target/release/my_binary /asset-output/"
    ],
    user="root",
    output_type=BundlingOutput.NOT_ARCHIVED,
)
This allows you to bring your own build environment and tooling into the CDK deployment process, making it incredibly powerful for complex Lambda function setups.
A common pitfall is forgetting to copy your actual Lambda handler code (handler.py in our example) into /asset-output. The cp -au . /asset-output part is essential for this. If you omit it, your Lambda function will only contain the installed packages, not your code.
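One way to catch this early is a small check you run against the bundled output directory before deploying. The sketch below is not part of CDK; handler_file_present is a hypothetical helper, and it assumes the simple "module.function" handler format used in this example.

```python
import os
import tempfile


def handler_file_present(bundle_dir: str, handler: str) -> bool:
    """Return True if the module named in a 'module.function' handler exists in bundle_dir."""
    module_path = handler.rsplit(".", 1)[0]  # "handler.main" -> "handler"
    file_path = os.path.join(bundle_dir, *module_path.split("/")) + ".py"
    return os.path.isfile(file_path)


with tempfile.TemporaryDirectory() as bundle:
    # Simulate a bundle that only contains installed packages
    os.mkdir(os.path.join(bundle, "psycopg2"))
    missing = handler_file_present(bundle, "handler.main")

    # Simulate the effect of the cp step copying handler.py into the bundle
    open(os.path.join(bundle, "handler.py"), "w").close()
    present = handler_file_present(bundle, "handler.main")

print(missing, present)
```

The first check fails because only the package directory exists, and the second succeeds once handler.py has been copied in.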
Another detail is understanding the working directory within the Docker container. By default, your Lambda code is mounted at /asset-input. The . in cp -au . refers to this directory. The /asset-output directory is where the final bundle should be placed.
A lesser-known feature is the local option in the bundling configuration. It takes an object implementing ILocalBundling, and CDK calls that object’s try_bundle method first: if it returns True, Docker is skipped entirely, and if it returns False, CDK falls back to the Docker image. This can speed up builds on machines that already have the right toolchain installed. However, for dependencies like native libraries that must be built in a specific environment, relying on the Docker image is the robust approach.
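As I understand it, local bundling follows a "try locally first, fall back to Docker" order. The sketch below models only that control flow in plain Python; bundle_asset, try_local, and run_docker are hypothetical names for illustration and are not CDK APIs.

```python
from typing import Callable, List


def bundle_asset(try_local: Callable[[str], bool],
                 run_docker: Callable[[str], None],
                 output_dir: str) -> str:
    """Try the local bundler first; fall back to Docker if it reports failure."""
    if try_local(output_dir):
        return "local"
    run_docker(output_dir)
    return "docker"


docker_calls: List[str] = []
used = bundle_asset(
    try_local=lambda out: False,  # pretend no suitable local toolchain exists
    run_docker=lambda out: docker_calls.append(out),
    output_dir="/asset-output",
)
print(used, docker_calls)
```

In a real stack, the local attempt would be supplied through the local bundling option, and the Docker step is the image and command you already configured.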
The next thing you’ll likely encounter is managing multiple dependencies and build steps. You might want to chain commands using && or even execute a custom shell script within the Docker container for more complex build logic.
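A minimal sketch of the chaining idea: keep the build steps in a list and join them with && so that a failing step stops the rest. The helper chain_steps is hypothetical, and the requirements.txt file is an assumption about your project layout.

```python
from typing import List


def chain_steps(steps: List[str]) -> List[str]:
    """Join shell steps with '&&' so a failing step aborts the remaining ones."""
    return ["bash", "-c", " && ".join(steps)]


command = chain_steps([
    "pip install --no-cache-dir -r requirements.txt -t /asset-output",  # assumed file
    "cp -au . /asset-output",
])
print(command)
```

The resulting list can be passed as the command value in the bundling configuration, which keeps long build pipelines readable as they grow.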