Container Insights is a CloudWatch feature that collects, aggregates, and summarizes metrics and logs from your containerized applications and microservices.
Here’s a manifest for a simple Nginx deployment on an EKS cluster, which we’ll then monitor with Container Insights:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: LoadBalancer
This setup deploys three Nginx pods and exposes them via a LoadBalancer service. Now, let’s enable Container Insights.
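Before moving on, the manifest above can be applied and checked with kubectl. This is a minimal sketch that assumes the manifest was saved as nginx.yaml (a hypothetical filename) and that kubectl already points at your EKS cluster:

```shell
# Hypothetical filename; point this at wherever you saved the manifest.
MANIFEST="${MANIFEST:-nginx.yaml}"

deploy_nginx() {
  # Create or update the Deployment and the Service.
  kubectl apply -f "$MANIFEST" &&
  # Wait until all three replicas report ready.
  kubectl rollout status deployment/nginx-deployment --timeout=120s &&
  # Show the Service, including the load balancer address once it is assigned.
  kubectl get service nginx-service
}
```

Running deploy_nginx once should be enough; the load balancer address can take a few minutes to appear.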
The core of Container Insights relies on DaemonSets that run a collector on each EKS node. The CloudWatch agent gathers performance metrics, while Fluent Bit (or Fluentd in older setups) gathers container logs; both forward their data to CloudWatch.
To enable Container Insights, you need to deploy the CloudWatch agent as a DaemonSet. This DaemonSet is typically managed by an AWS-provided Helm chart or a CloudFormation template.
IAM permissions are crucial and often overlooked: your EKS nodes need an IAM role that allows them to send data to CloudWatch Logs and CloudWatch metrics. With that in place, here’s how you’d typically install the agent using the AWS CLI and kubectl.
First, create a namespace for the agent:
kubectl create namespace amazon-cloudwatch
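For the IAM prerequisite, one common approach is to attach the AWS managed policy CloudWatchAgentServerPolicy to the node role. The sketch below assumes that approach; NODE_ROLE_NAME is a hypothetical placeholder, and your cluster may instead use IAM roles for service accounts:

```shell
# Hypothetical role name; use the IAM role attached to your EKS node group.
NODE_ROLE_NAME="${NODE_ROLE_NAME:-my-eks-node-role}"
# AWS managed policy that lets the agent write to CloudWatch.
POLICY_ARN="arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"

attach_cloudwatch_policy() {
  aws iam attach-role-policy \
    --role-name "$NODE_ROLE_NAME" \
    --policy-arn "$POLICY_ARN"
}

show_attached_policies() {
  # Lists the managed policies on the role so you can confirm the attachment.
  aws iam list-attached-role-policies --role-name "$NODE_ROLE_NAME"
}
```

Call attach_cloudwatch_policy once your AWS credentials are configured, then show_attached_policies to confirm the policy is listed.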
Next, apply the CloudWatch agent DaemonSet manifest. You can get the latest version of this manifest from the official AWS documentation. The simplified example here focuses on the key parts; the published manifest also defines supporting resources such as a service account, RBAC rules, and a ConfigMap, and its details change between releases:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cloudwatch-agent
  namespace: amazon-cloudwatch
spec:
  selector:
    matchLabels:
      app: cloudwatch-agent
  template:
    metadata:
      labels:
        app: cloudwatch-agent
    spec:
      containers:
      - name: cloudwatch-agent
        image: public.ecr.aws/amazoncloudwatch/cloudwatch-agent:latest # Use a specific, stable version in production
        command:
        - /opt/aws/bin/cloudwatch-agent-ctl
        - -a
        - fetch-config
        - -m
        - ec2
        - -c
        - ssm:AmazonCloudWatch-ContainerInsights-EKS
        - -f
        - /etc/cloudwatch-agent/container-insights-fluent-bit-config.yaml
        volumeMounts:
        - name: host-log
          mountPath: /var/log
        - name: host-docker
          mountPath: /var/lib/docker
        - name: host-run
          mountPath: /var/run/docker.sock
      volumes:
      - name: host-log
        hostPath:
          path: /var/log
      - name: host-docker
        hostPath:
          path: /var/lib/docker
      - name: host-run
        hostPath:
          path: /var/run/docker.sock
The critical part of this configuration is the fetch-config action. It tells the agent to load a predefined Container Insights configuration from AWS Systems Manager (SSM) Parameter Store. In the -c argument, the ssm: prefix selects Parameter Store as the source, and AmazonCloudWatch-ContainerInsights-EKS is the name of the parameter that holds the configuration tailored for EKS. This configuration includes rules for parsing logs and collecting metrics such as CPU, memory, network, and disk I/O from containers, pods, and nodes.
Once the DaemonSet is running, you should start seeing metrics and logs appearing in CloudWatch under the "Container Insights" section. You’ll find aggregated views for your cluster, nodes, pods, and services.
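If nothing shows up, it helps to confirm that the agent pods are actually running on every node. A minimal sketch, assuming the namespace, DaemonSet name, and app label from the example manifest above:

```shell
NAMESPACE="amazon-cloudwatch"
DAEMONSET="cloudwatch-agent"

check_agent_rollout() {
  # Wait for one ready agent pod per node, then list the pods with their nodes.
  kubectl rollout status "daemonset/$DAEMONSET" -n "$NAMESPACE" --timeout=120s &&
  kubectl get pods -n "$NAMESPACE" -l "app=$DAEMONSET" -o wide
}

tail_agent_logs() {
  # Recent agent output; permission errors usually point at the node IAM role.
  kubectl logs -n "$NAMESPACE" -l "app=$DAEMONSET" --tail=50
}
```

Run check_agent_rollout first; if it times out or pods are crash-looping, tail_agent_logs is usually the quickest way to see why.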
The system works by having the per-node collector pods gather container runtime statistics (from Docker or containerd) and container logs. The logs go to CloudWatch Logs for aggregation, and the performance data is published as CloudWatch metrics with dimensions such as ClusterName, PodName, and ContainerName for detailed performance analysis.
The container-insights-fluent-bit-config.yaml file named in the example command contains the parsing rules and output destinations. It’s configured to recognize specific log formats and to collect metrics from the container runtime API.
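To see what actually landed in CloudWatch, you can query both destinations from the AWS CLI. This sketch assumes the default ContainerInsights metric namespace and the /aws/containerinsights/ log group prefix; CLUSTER_NAME is a hypothetical placeholder for your cluster's name:

```shell
# Hypothetical cluster name; replace with your own.
CLUSTER_NAME="${CLUSTER_NAME:-my-eks-cluster}"

list_cluster_metrics() {
  # Metric names published for this cluster under the ContainerInsights namespace.
  aws cloudwatch list-metrics \
    --namespace ContainerInsights \
    --dimensions "Name=ClusterName,Value=$CLUSTER_NAME" \
    --query 'Metrics[].MetricName' \
    --output text
}

list_insights_log_groups() {
  # Log groups created for this cluster by Container Insights.
  aws logs describe-log-groups \
    --log-group-name-prefix "/aws/containerinsights/$CLUSTER_NAME" \
    --query 'logGroups[].logGroupName' \
    --output text
}
```

An empty result from either function a few minutes after deployment usually means the agent is not sending data, which is most often an IAM problem.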
What makes Container Insights especially useful is that it doesn’t just collect metrics; it also correlates them with your Kubernetes resource definitions. When you look at a pod’s metrics in CloudWatch, you’ll see dimensions and metadata that map directly back to Kubernetes concepts such as PodName, Namespace, ClusterName, and even ReplicaSet and Deployment names, allowing you to trace performance issues from the node all the way up to your application deployment.
The next concept you’ll want to explore is setting up custom metrics and log forwarding for your applications, beyond what Container Insights provides out of the box.
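As an illustration of querying by those Kubernetes dimensions, the sketch below requests average pod CPU utilization for the Nginx pods. It assumes the pod_cpu_utilization metric and the ClusterName, Namespace, and PodName dimension set; the cluster name is a hypothetical placeholder, and the PodName value your cluster reports may differ from the Deployment name used here:

```shell
# Hypothetical values; adjust to match your cluster and workload.
CLUSTER_NAME="${CLUSTER_NAME:-my-eks-cluster}"
K8S_NAMESPACE="${K8S_NAMESPACE:-default}"
POD_NAME="${POD_NAME:-nginx-deployment}"

pod_dimensions() {
  # Builds the --dimensions arguments from the Kubernetes identifiers.
  printf 'Name=ClusterName,Value=%s Name=Namespace,Value=%s Name=PodName,Value=%s\n' \
    "$CLUSTER_NAME" "$K8S_NAMESPACE" "$POD_NAME"
}

pod_cpu_average() {
  # $1 and $2 are ISO 8601 start and end timestamps, e.g. 2024-01-01T00:00:00Z.
  # The unquoted substitution is intentional: each Name=...,Value=... pair
  # must become its own argument (values must not contain spaces).
  aws cloudwatch get-metric-statistics \
    --namespace ContainerInsights \
    --metric-name pod_cpu_utilization \
    --dimensions $(pod_dimensions) \
    --start-time "$1" --end-time "$2" \
    --period 300 --statistics Average
}
```

Calling pod_cpu_average with a start and end timestamp returns five-minute averages for that window, provided the dimension values match what the agent published.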