You can launch EC2 instances that automatically run a script on their first boot, and it’s way more powerful than you might think.
Here’s a new EC2 instance launching and immediately downloading a package, configuring a service, and starting it, all without you touching it after the initial aws ec2 run-instances command:
aws ec2 run-instances \
--image-id ami-0abcdef1234567890 \
--instance-type t2.micro \
--key-name MyKeyPair \
--security-group-ids sg-0123456789abcdef0 \
--user-data file://bootstrap.sh \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=MyBootstrappedInstance}]'
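And here’s the bootstrap.sh script that makes it happen. It uses apt-get, so it assumes an Ubuntu or Debian AMI; on Amazon Linux you’d use dnf or yum instead: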
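#!/bin/bash
# Exit at the first failing command instead of silently carrying on
set -euo pipefail
# Keep apt from waiting on interactive prompts during an unattended install
export DEBIAN_FRONTEND=noninteractive
# Update package lists
apt-get update -y
# Install Nginx
apt-get install -y nginx
# Create a simple index.html
echo "<h1>Hello from User Data!</h1>" > /var/www/html/index.html
# Start Nginx now and enable it on future boots
systemctl enable --now nginx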
When you run that aws ec2 run-instances command, the AWS CLI reads bootstrap.sh, base64-encodes it, and attaches it to the new instance as user data. The instance then executes it as the root user during its first boot. This is incredibly useful for setting up servers automatically: installing software, configuring services, downloading application code, or even registering the instance with a load balancer.
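Because EC2 stores user data with the instance in base64-encoded form, you can check exactly what an existing instance was launched with by asking for its userData attribute and decoding it. Below is a minimal sketch. It assumes the AWS CLI is configured with credentials that allow ec2:DescribeInstanceAttribute, and `show_user_data` is a hypothetical helper name, not an AWS command:

```shell
#!/bin/bash
# Print the decoded user data of an existing instance.
# Assumes the AWS CLI is configured and allowed to call ec2:DescribeInstanceAttribute.
# show_user_data is a hypothetical helper name.
show_user_data() {
  local instance_id="$1"
  local encoded
  encoded=$(aws ec2 describe-instance-attribute \
    --instance-id "$instance_id" \
    --attribute userData \
    --query 'UserData.Value' \
    --output text) || return 1
  # With --output text, the CLI prints "None" when no user data was set.
  if [ -z "$encoded" ] || [ "$encoded" = "None" ]; then
    echo "No user data set for $instance_id" >&2
    return 1
  fi
  # EC2 returns the attribute base64-encoded; decode it back to the original script.
  printf '%s' "$encoded" | base64 --decode
}
```

For example, `show_user_data i-0123456789abcdef0` (a placeholder ID) should print the bootstrap.sh contents shown above.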
The core idea is that EC2 instances have a special metadata service, the Instance Metadata Service (IMDS), reachable from inside the instance at the link-local IP address 169.254.169.254. User data is one of the pieces of information you can provide when launching an instance, and an agent on the instance (cloud-init, on most Linux AMIs) fetches this data from the metadata service on boot.
The agent then executes the script. It’s important to note that, by default, user data scripts run only on the first boot. If you stop and start an instance, user data does not re-run. For repeated configuration or updates, you’d typically use cloud-init features that run scripts on every boot, or configuration management tools like Ansible, Chef, or Puppet that are installed via user data and then manage the instance state.
The cloud-init service, which is pre-installed on most Amazon Linux, Ubuntu, and other popular AMIs, is the workhorse here. It’s the service that actually reads and executes the user data. You can provide cloud-init with directives in various formats (shell scripts, cloud-config YAML), and cloud-init handles running them in the correct order and with the right permissions.
Here’s how cloud-init processes user data. When an instance boots, cloud-init checks for user data and decides what kind it is by looking at how it starts. A shell script (first line beginning with #!) is simply executed. Cloud-config data (first line #cloud-config) is parsed as YAML, and cloud-init performs the actions it describes, such as installing packages, creating files, or running commands. This is why the #!/bin/bash shebang at the top of your script is crucial: it marks the user data as a script and tells the system which interpreter to run it with.
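To make that dispatch concrete, here is a simplified bash illustration that classifies a user data file by its first line. It is a sketch, not cloud-init’s actual implementation, and it covers only a few of the formats cloud-init recognizes. `classify_user_data` is a hypothetical name:

```shell
#!/bin/bash
# Simplified illustration of how cloud-init tells user data formats apart:
# it looks at how the data starts. This is NOT cloud-init's implementation and
# covers only a few of the formats it recognizes.
# classify_user_data is a hypothetical helper name.
classify_user_data() {
  local first_line=""
  # Read just the first line of the file given as $1.
  IFS= read -r first_line < "$1"
  case "$first_line" in
    '#!'*)                    echo "shell-script" ;;
    '#cloud-config-archive'*) echo "cloud-config-archive" ;;
    '#cloud-config'*)         echo "cloud-config" ;;
    '#include'*)              echo "include" ;;
    '#cloud-boothook'*)       echo "boothook" ;;
    'Content-Type:'*)         echo "mime-multipart" ;;
    *)                        echo "unknown" ;;
  esac
}
```

Run against bootstrap.sh, it prints shell-script. If you strip the shebang line, the same file comes back as unknown, which mirrors why cloud-init would not treat it as a script.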
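You can also access other instance metadata from your user data scripts. For example, to get the instance ID using IMDSv2 (the token-based version of the metadata service, which some newer AMIs such as Amazon Linux 2023 require by default):
# IMDSv2: request a session token first, then send it with each metadata request
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
echo "This instance ID is: $INSTANCE_ID" >> /var/log/user-data.log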
This allows your bootstrap scripts to be dynamic and aware of their environment. They can fetch their own instance ID, private IP address, public IP address (if the instance has one), and even the region they are running in.
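One way to put those values to work is to collect them once at boot into an environment file that other scripts can source. Below is a minimal sketch that assumes IMDSv2 is reachable. The helper names `imds_token`, `imds_get` and `write_instance_facts` are hypothetical, and the metadata paths are the standard instance-id, local-ipv4, placement/region and public-ipv4 categories:

```shell
#!/bin/bash
# Sketch: gather a few instance facts through IMDSv2 and write them to an
# environment file that other scripts or services can source.
# imds_token, imds_get and write_instance_facts are hypothetical helper names.
IMDS_BASE=http://169.254.169.254/latest

imds_token() {
  # Ask IMDSv2 for a session token valid for 6 hours (21600 seconds).
  curl -s -f -X PUT "$IMDS_BASE/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

imds_get() {
  # $1: session token, $2: path under /latest, e.g. meta-data/instance-id.
  # -f makes curl exit non-zero and print nothing on HTTP errors such as 404.
  curl -s -f -H "X-aws-ec2-metadata-token: $1" "$IMDS_BASE/$2"
}

write_instance_facts() {
  local out="$1" token
  token=$(imds_token) || return 1
  {
    echo "INSTANCE_ID=$(imds_get "$token" meta-data/instance-id)"
    echo "PRIVATE_IP=$(imds_get "$token" meta-data/local-ipv4)"
    echo "REGION=$(imds_get "$token" meta-data/placement/region)"
    # meta-data/public-ipv4 returns 404 when the instance has no public IP,
    # so an empty value is expected in that case.
    echo "PUBLIC_IP=$(imds_get "$token" meta-data/public-ipv4)"
  } > "$out"
}
```

From user data you might call `write_instance_facts /etc/instance-facts.env` (a hypothetical path) and later source that file wherever the values are needed.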
The maximum size for user data is 16 KB, measured on the raw data before base64 encoding. If your script is larger, you’ll need to store it in S3 and then have your user data script download it.
#!/bin/bash
aws s3 cp s3://my-bootstrap-scripts-bucket/complex_setup.sh /tmp/complex_setup.sh
chmod +x /tmp/complex_setup.sh
/tmp/complex_setup.sh
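This is a common pattern for more involved bootstrap processes. The small user data script acts as a downloader and executor for a larger, more complex script stored in S3. For it to work, the instance needs an IAM instance profile that allows s3:GetObject on that object, and the AMI needs the AWS CLI (Amazon Linux includes it; on other AMIs you may have to install it first).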
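Before launching, it can help to check a script against the 16 KB limit so you know whether the S3 approach is needed. Below is a small sketch. It treats 16 KB as 16384 bytes of raw data, and `check_user_data_size` is a hypothetical helper name:

```shell
#!/bin/bash
# Check a user data file against the 16 KB limit (raw bytes, before base64).
# check_user_data_size is a hypothetical helper name.
MAX_USER_DATA_BYTES=16384

check_user_data_size() {
  local file="$1" size
  # wc -c counts bytes; tr strips the padding some wc implementations add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_USER_DATA_BYTES" ]; then
    echo "$file is $size bytes, over the limit; move it to S3" >&2
    return 1
  fi
  echo "$file is $size bytes; OK"
}
```

A typical use is `check_user_data_size bootstrap.sh && aws ec2 run-instances ... --user-data file://bootstrap.sh`.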
cloud-init also writes logs that are invaluable for debugging. /var/log/cloud-init.log records what cloud-init itself did, and /var/log/cloud-init-output.log usually contains the standard output and standard error of your user data script.
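When a boot goes wrong, a quick first step is to pull the warnings and errors out of cloud-init.log. Below is a small sketch. It assumes cloud-init’s usual `module.py[LEVEL]: message` line format, and `cloud_init_problems` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: list WARNING and ERROR lines from the cloud-init log.
# Assumes the usual "module.py[LEVEL]: message" line format.
# cloud_init_problems is a hypothetical helper name.
cloud_init_problems() {
  local log="${1:-/var/log/cloud-init.log}"
  # -n prefixes line numbers; grep exits non-zero when nothing matches.
  grep -n -E '\[(WARNING|ERROR)\]' "$log"
}
```

If your own script failed, its last messages usually sit at the end of /var/log/cloud-init-output.log, so `tail -n 50 /var/log/cloud-init-output.log` is a useful companion. On recent cloud-init versions, `cloud-init status --long` also summarizes whether boot finished and whether errors occurred.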
If you stop and start an instance, the user data script will not run again. To run something on every boot, you can create a systemd service that runs your script at startup (the older /etc/rc.local approach is deprecated on most modern distributions), or use cloud-init’s own every-boot hooks: the bootcmd directive in cloud-config, or executable scripts placed in /var/lib/cloud/scripts/per-boot/. The runcmd directive is often mistaken for one of these, but like a user data shell script it runs only once per instance. Here is the earlier Nginx setup written as cloud-config using runcmd:
#cloud-config
runcmd:
- [ apt-get, update, -y ]
- [ apt-get, install, nginx, -y ]
- [ systemctl, start, nginx ]
- [ systemctl, enable, nginx ]
This YAML can be passed as user data instead of a shell script; the #cloud-config first line is what tells cloud-init to parse it as YAML. cloud-init then runs the runcmd commands on the instance’s first boot only. bootcmd, by contrast, runs on every boot, but very early in the boot process.
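As an example of the per-boot directory, user data can drop an executable script into /var/lib/cloud/scripts/per-boot/ so that it runs on every later boot. The sketch below rewrites the Nginx page with the current public IP, which can change after a stop and start. The directory is a parameter, with the real path as the default, so you can try the function outside EC2. `install_per_boot_script` and `10-refresh-index.sh` are hypothetical names:

```shell
#!/bin/bash
# User data sketch: install a script into cloud-init's per-boot directory so it
# runs on every boot, not only the first. The directory is a parameter (with the
# real path as default) so the function can be tried outside EC2.
# install_per_boot_script and 10-refresh-index.sh are hypothetical names.
install_per_boot_script() {
  local dir="${1:-/var/lib/cloud/scripts/per-boot}"
  mkdir -p "$dir"
  # Quoted 'EOF' keeps $TOKEN etc. unexpanded until the script actually runs.
  cat > "$dir/10-refresh-index.sh" <<'EOF'
#!/bin/bash
# Runs on every boot: the public IP can change after a stop/start.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
PUBLIC_IP=$(curl -s -f -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/public-ipv4)
echo "<h1>Hello from ${PUBLIC_IP:-an instance without a public IP}</h1>" > /var/www/html/index.html
EOF
  # cloud-init only runs executable files from this directory.
  chmod 755 "$dir/10-refresh-index.sh"
}
```

In real user data you would call `install_per_boot_script` with no argument. With cloud-init’s default module order, per-boot scripts run before the user data script in the same boot stage. A file installed this way therefore first fires on the second boot, so run it once directly at the end of your user data if you also need it on the first boot.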
The most surprising thing about user data is that its lifecycle is tied to the instance’s creation, not its running state. You provide user data at launch, and by default it executes once. If you need to re-run configuration or perform actions on every boot, you must set that up explicitly, for example from within your user data script or with cloud-init’s bootcmd directive or per-boot scripts directory, which are designed for repeated execution. It’s not a magic "run this on reboot" button out of the box for shell scripts.
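The next thing you’ll likely encounter is managing secrets in your bootstrap process. Embedding credentials directly in user data is a security risk: user data is not encrypted, any process on the instance can read it from the metadata service, and anyone allowed to describe the instance’s attributes can read it too. Instead, store secrets in AWS Secrets Manager or Systems Manager Parameter Store, grant the instance’s IAM role access to them, and fetch them from within your bootstrap scripts.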
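Here is a minimal sketch of fetching secrets at boot instead of embedding them. It assumes the AWS CLI is installed with a default region configured (for example via AWS_DEFAULT_REGION). It also assumes the instance profile allows ssm:GetParameter (plus kms:Decrypt for SecureString parameters) or secretsmanager:GetSecretValue. The helper names and the `/myapp/db-password` parameter name are hypothetical:

```shell
#!/bin/bash
# Sketch: fetch secrets at boot instead of embedding them in user data.
# Assumes the AWS CLI, a configured default region, and an instance profile
# with the matching read permissions. Helper names are hypothetical.
get_ssm_secret() {
  # Reads a (possibly SecureString) parameter from Systems Manager Parameter Store.
  aws ssm get-parameter --name "$1" --with-decryption \
    --query 'Parameter.Value' --output text
}

get_secrets_manager_secret() {
  # Reads the SecretString of a Secrets Manager secret.
  aws secretsmanager get-secret-value --secret-id "$1" \
    --query 'SecretString' --output text
}
```

A script could then use `DB_PASSWORD=$(get_ssm_secret /myapp/db-password)`. Avoid `set -x` around such lines, because the traced assignment would write the secret into /var/log/cloud-init-output.log.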