ECS tasks most often fail to send logs to CloudWatch because the task execution IAM role lacks permission to publish logs, but a bad log configuration, a missing log group, or blocked network access can cause the same symptom.
Here’s how to fix it:
Cause 1: Missing logs:CreateLogStream and logs:PutLogEvents Permissions
Diagnosis:
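Your ECS task definition is configured to send logs to CloudWatch, but the role the awslogs driver relies on is missing the permissions to create log streams and write log events. On Fargate that role is the task execution role (the executionRoleArn in your task definition); on the EC2 launch type it is the IAM role of the container instance. The task role (taskRoleArn) is used by your application code and does not affect log delivery.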
Check:
Open the IAM console and find the role named in your task definition’s executionRoleArn (on the EC2 launch type, also check the container instance role). Examine its attached policies and look for one that grants CloudWatch Logs permissions, such as the AWS managed policy AmazonECSTaskExecutionRolePolicy.
Fix: Add the following statements to the IAM policy attached to your ECS task execution role:
{
  "Effect": "Allow",
  "Action": [
    "logs:CreateLogStream",
    "logs:PutLogEvents"
  ],
  "Resource": "*"
}
This lets the awslogs log driver create log streams in your CloudWatch Logs log group and push log events to those streams. For tighter scoping, replace "*" with the ARN of your log group.
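To confirm that a policy covers both actions, a small check over the policy JSON can help. The following is a minimal sketch, assuming you have already copied the policy document out of the IAM console (or fetched it some other way) and parsed it into a Python dict. The function name missing_log_actions is hypothetical. It only looks at Allow statements and their Action entries; it ignores Deny statements, NotAction, Resource, and Condition, so treat an empty result as a hint rather than proof.

```python
from fnmatch import fnmatchcase

# The two actions the awslogs driver needs, as named in the fix above.
REQUIRED_ACTIONS = ("logs:CreateLogStream", "logs:PutLogEvents")


def _as_list(value):
    """IAM accepts either a single item or a list for Statement and Action."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def missing_log_actions(policy_document):
    """Return the required actions that no Allow statement appears to cover.

    Simplification: Deny, NotAction, Resource and Condition are not evaluated.
    """
    allowed_patterns = []
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") == "Allow":
            allowed_patterns += _as_list(statement.get("Action", []))
    # IAM action names are case-insensitive and may contain * wildcards.
    return [
        action
        for action in REQUIRED_ACTIONS
        if not any(fnmatchcase(action.lower(), p.lower()) for p in allowed_patterns)
    ]


if __name__ == "__main__":
    example = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ["logs:CreateLogStream"], "Resource": "*"}
        ],
    }
    print(missing_log_actions(example))
```

An empty list means both actions are matched by at least one Allow statement in that single policy; a role with several attached policies needs each one checked.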
Cause 2: Incorrect Log Configuration in Task Definition
Diagnosis:
The logConfiguration section within your ECS task definition might be malformed, pointing to a non-existent log group, or using an invalid region.
Check:
Inspect the logConfiguration block in your ECS task definition JSON.
Example of a correct configuration:
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "/ecs/my-application-logs",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "ecs"
  }
}
Fix:
Ensure logDriver is set to "awslogs". Verify that awslogs-group names an existing log group: the driver does not create it for you unless you also set the awslogs-create-group option to "true" and the role has the logs:CreateLogGroup permission. Confirm awslogs-region is the region where that log group lives, which is normally the region of your ECS cluster. The awslogs-stream-prefix is optional on the EC2 launch type but required on Fargate, and it is good practice for organizing logs in either case.
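The checks above can be sketched as a small validator. This is an illustrative helper, not part of any AWS SDK: the name check_log_configuration and the launch_type parameter are assumptions. It takes the logConfiguration block as a Python dict (for example, parsed from your task definition JSON) and encodes the rule that awslogs-stream-prefix is required on Fargate.

```python
def check_log_configuration(log_config, launch_type="FARGATE"):
    """Return a list of human-readable problems found in a logConfiguration dict."""
    problems = []
    if log_config.get("logDriver") != "awslogs":
        problems.append('logDriver is not "awslogs"')

    options = log_config.get("options") or {}
    # Both options are needed for the driver to know where to send logs.
    for key in ("awslogs-group", "awslogs-region"):
        if not options.get(key):
            problems.append(f"options is missing {key}")

    # The stream prefix is optional on EC2 but required on Fargate.
    if launch_type == "FARGATE" and not options.get("awslogs-stream-prefix"):
        problems.append("awslogs-stream-prefix is required on Fargate")
    return problems


if __name__ == "__main__":
    config = {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/my-application-logs",
            "awslogs-region": "us-east-1",
        },
    }
    for problem in check_log_configuration(config):
        print(problem)
```

The validator only checks the shape of the block. It cannot tell whether the log group actually exists or whether the region is the right one; those are covered by the causes that follow.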
Cause 3: Log Group Not Created or Incorrectly Named
Diagnosis:
If the awslogs-group in your task definition doesn’t exist and the driver isn’t allowed to create it (awslogs-create-group is not set to "true", or the role lacks logs:CreateLogGroup, which the managed AmazonECSTaskExecutionRolePolicy does not include), or if there’s a typo in the log group name, your logs won’t be sent.
Check: Go to the CloudWatch console, navigate to "Log groups" under "Logs". Search for the log group name specified in your task definition.
Fix:
Manually create the log group in the CloudWatch console with the exact name specified in your task definition (e.g., /ecs/my-application-logs). Alternatively, ensure the awslogs-group name in your task definition is spelled correctly.
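Typos in log group names are easy to miss by eye. As a rough sketch, the comparison can be automated once you have the list of existing log group names, for instance copied from the CloudWatch console. The helper name diagnose_log_group and the 0.8 similarity cutoff are assumptions chosen for illustration. Log group names are case-sensitive, so the comparison is exact.

```python
from difflib import get_close_matches


def diagnose_log_group(configured_name, existing_names):
    """Report whether the configured log group exists, suggesting near matches."""
    if configured_name in existing_names:
        return "exists"
    # Near matches usually indicate a typo in the task definition.
    suggestions = get_close_matches(configured_name, existing_names, n=3, cutoff=0.8)
    if suggestions:
        return "not found; did you mean: " + ", ".join(suggestions)
    return "not found"


if __name__ == "__main__":
    existing = ["/ecs/my-application-logs", "/aws/lambda/foo"]
    print(diagnose_log_group("/ecs/my-aplication-logs", existing))
```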
Cause 4: Incorrect Region Specified in Task Definition
Diagnosis:
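The awslogs-region in your task definition’s logConfiguration does not match the region where the log group exists, which is usually the region where your ECS cluster runs. The driver then writes to (or fails against) a region other than the one where you are looking for your logs.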
Check:
Compare the awslogs-region value in your task definition with the region of your ECS cluster.
Fix:
Update the awslogs-region in your task definition to match the correct AWS region. For example, if your cluster is in us-west-2, change it to "awslogs-region": "us-west-2".
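The comparison can be sketched in a few lines, since the region is embedded in the cluster ARN. The helper names region_from_arn and region_mismatch are hypothetical, and the account ID and cluster name in the usage example are placeholders.

```python
def region_from_arn(arn):
    """Extract the region field (the fourth colon-separated part) from an ARN."""
    parts = arn.split(":")
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def region_mismatch(cluster_arn, log_config):
    """Return (cluster_region, log_region) if they differ, otherwise None."""
    cluster_region = region_from_arn(cluster_arn)
    log_region = log_config.get("options", {}).get("awslogs-region")
    return None if cluster_region == log_region else (cluster_region, log_region)


if __name__ == "__main__":
    # Placeholder ARN for illustration only.
    cluster_arn = "arn:aws:ecs:us-west-2:123456789012:cluster/demo"
    log_config = {"logDriver": "awslogs", "options": {"awslogs-region": "us-east-1"}}
    print(region_mismatch(cluster_arn, log_config))
```

A mismatch is not always a bug: sending logs to another region on purpose is possible, as long as the log group exists there.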
Cause 5: ECS Agent Issues or Task State
Diagnosis:
Sometimes, the ECS agent on the EC2 instance (if using EC2 launch type) or the Fargate agent might be experiencing transient issues, preventing it from establishing the connection to CloudWatch Logs. The task might also be in a state where it cannot perform actions, like STOPPING or PENDING.
Check:
Check the status of your ECS tasks. If they are stuck in PENDING or STOPPING states, investigate the ECS service events for more details. For EC2 launch types, check the ECS agent logs on the EC2 instance (often found at /var/log/ecs/ecs-agent.log).
Fix:
For Fargate, try recreating the task or service. For EC2, restarting the ECS agent on the affected instance might help. Ensure your ECS service is configured to allow tasks to reach a RUNNING state and stay there.
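When many tasks are involved, it can help to list only those that are not running together with whatever reason ECS recorded. The sketch below assumes a dict shaped like the response of the ECS DescribeTasks API (a "tasks" list whose items carry taskArn, lastStatus, stoppedReason, and per-container reason fields). The function name summarize_tasks is hypothetical, and the sample values are invented placeholders, not real ECS output.

```python
def summarize_tasks(describe_tasks_response):
    """Yield (task ARN, lastStatus, reasons) for every task that is not RUNNING."""
    for task in describe_tasks_response.get("tasks", []):
        if task.get("lastStatus") == "RUNNING":
            continue
        # Task-level reason first, then any container-level reasons.
        reasons = [task["stoppedReason"]] if task.get("stoppedReason") else []
        reasons += [c["reason"] for c in task.get("containers", []) if c.get("reason")]
        yield task.get("taskArn"), task.get("lastStatus"), reasons


if __name__ == "__main__":
    # Placeholder response for illustration only.
    response = {
        "tasks": [
            {"taskArn": "task-a", "lastStatus": "RUNNING"},
            {
                "taskArn": "task-b",
                "lastStatus": "STOPPED",
                "stoppedReason": "example task-level reason",
                "containers": [{"name": "app", "reason": "example container reason"}],
            },
        ]
    }
    for arn, status, reasons in summarize_tasks(response):
        print(arn, status, reasons)
```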
Cause 6: Network Connectivity Issues
Diagnosis: The ECS tasks, especially if running in a private subnet without a NAT gateway or VPC endpoint for CloudWatch Logs, might not have network access to the CloudWatch Logs API endpoints.
Check:
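If your tasks are in private subnets, verify the route tables associated with those subnets. Ensure there’s either a default route to a NAT gateway or an interface VPC endpoint for CloudWatch Logs (service name com.amazonaws.<region>.logs), so that tasks can reach logs.<region>.amazonaws.com.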
Fix: Configure a NAT gateway in a public subnet of your VPC and add a default route from your private subnets to it. Alternatively, create an interface VPC endpoint for CloudWatch Logs in the relevant subnets, with a security group that allows inbound HTTPS (port 443) from your tasks.
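The route table part of this check can be sketched as follows. The code assumes a dict shaped like one entry of the "RouteTables" list returned by the EC2 DescribeRouteTables API; the function name has_nat_default_route and the NAT gateway ID in the example are placeholders. It cannot detect an interface VPC endpoint, because interface endpoints do not appear as routes, so a False result only rules out the NAT gateway path.

```python
def has_nat_default_route(route_table):
    """Return True if the route table has an active 0.0.0.0/0 route to a NAT gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("NatGatewayId")
            and route.get("State") == "active"
        ):
            return True
    return False


if __name__ == "__main__":
    # Placeholder route table for illustration only.
    table = {
        "Routes": [
            {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-example", "State": "active"},
        ]
    }
    print(has_nat_default_route(table))
```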
Once you have worked through these points and logs are flowing, your next concern will likely be how to filter and search them efficiently within CloudWatch.