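EC2 instances can run out of memory just like your laptop, but CloudWatch, AWS’s monitoring service, doesn’t track memory by default. The standard EC2 metrics are gathered from outside the guest operating system, so memory usage only shows up if something running inside the instance reports it.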

Here’s a quick walkthrough of how you’d set up custom memory metrics. First, on your EC2 instance, you need the CloudWatch agent installed. On Amazon Linux 2 and later it is available from the package repository (sudo yum install amazon-cloudwatch-agent). Otherwise, you’d typically download the package for your platform using wget and then install it with sudo rpm -U ./amazon-cloudwatch-agent.rpm (for Amazon Linux/CentOS) or sudo dpkg -i -E ./amazon-cloudwatch-agent.deb (for Ubuntu).

Once installed, you need a configuration file. This is usually a JSON file, say /opt/aws/amazon-cloudwatch-agent/bin/config.json. Here’s a snippet focusing on memory:

{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "namespace": "EC2Memory",
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "mem": {
        "measurement": [
          "mem_used_percent"
        ]
      }
    }
  }
}

The metrics_collection_interval tells the agent how often to gather data (60 seconds here). The namespace is how you’ll find these metrics in CloudWatch; I’ve chosen EC2Memory (if you leave it out, the agent publishes under CWAgent). Under metrics_collected, mem tells it we want memory stats, and mem_used_percent is the specific metric we’re after. There is no setting for total memory; the agent reads it from the operating system. append_dimensions is crucial: InstanceId: "${aws:InstanceId}" automatically tags the metric with the EC2 instance it came from, so you can easily filter and alarm on specific machines. It sits directly under metrics rather than inside mem because that is where the agent documents support for the ${aws:InstanceId} placeholder; an append_dimensions block inside mem is meant for fixed key-value pairs.
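If you manage more than a handful of instances, you may prefer to generate this file rather than edit it by hand. The following is a minimal sketch in Python, assuming you only want mem_used_percent; the function names build_memory_config and render_config are made up for illustration, and the sketch places append_dimensions at the metrics level, which is where the agent documentation describes the ${aws:InstanceId} placeholder.

```python
import json


def build_memory_config(namespace="EC2Memory", interval_seconds=60):
    """Return a CloudWatch agent config (as a dict) that collects memory usage.

    Hypothetical helper: the key names mirror the agent's JSON config format.
    """
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": namespace,
            # Assumption: metrics-level append_dimensions, so the agent
            # resolves the placeholder from instance metadata.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
            },
        },
    }


def render_config(config):
    """Serialize the config dict to the JSON text the agent expects."""
    return json.dumps(config, indent=2)


config_text = render_config(build_memory_config())
```

Writing config_text to the path you pass to the agent (for example the config.json above) is left out on purpose, since that step usually needs root and depends on how you deploy files.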

After saving that config file, you’d start the agent with:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s

The -a fetch-config action loads the configuration, -m ec2 specifies it’s running on EC2, -c file:... points to your config, and -s restarts the agent so the new configuration takes effect.

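Now, if you go to the CloudWatch console, under "Metrics," you should find a new namespace called "EC2Memory" after a few minutes. Within that, you’ll see metrics like mem_used_percent dimensioned by InstanceId. If nothing appears, check that the instance’s IAM role allows cloudwatch:PutMetricData (the managed CloudWatchAgentServerPolicy covers this). You can then create alarms based on the metric, for instance an alarm that triggers if mem_used_percent stays above 80% for five consecutive one-minute periods.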
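The alarm described above can also be created from code. Here is a hedged sketch using boto3's put_metric_alarm; the helper names build_memory_alarm and create_alarm, the alarm name, and the instance ID are placeholders, and the dimensions must match exactly what your agent publishes or the alarm will sit in INSUFFICIENT_DATA.

```python
def build_memory_alarm(instance_id, threshold_percent=80.0, minutes=5):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    Hypothetical helper: with 60-second periods, `minutes` evaluation
    periods means the threshold must be breached for that many minutes.
    """
    return {
        "AlarmName": f"high-memory-{instance_id}",
        "Namespace": "EC2Memory",
        "MetricName": "mem_used_percent",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,
        "EvaluationPeriods": minutes,
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported here so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder instance ID, for illustration only.
params = build_memory_alarm("i-0123456789abcdef0")
```

Calling create_alarm(params) would register the alarm; add an AlarmActions list (for example an SNS topic ARN) if you want it to notify anyone.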

The problem this solves is blind spots. EC2 instances can become unresponsive or crash due to memory exhaustion, and without custom metrics, you’re flying blind. You might see CPU pegged, but that doesn’t tell the whole story.

Internally, the CloudWatch agent is a daemon that runs on the instance. It is written in Go and builds on Telegraf-style input plugins; on Linux, the memory figures ultimately come from the /proc filesystem (/proc/meminfo). The agent then batches these metrics and sends them to the CloudWatch API. The append_dimensions part is key because it leverages EC2 instance metadata (${aws:InstanceId}) to automatically enrich the metrics with the instance’s identity, making them actionable in CloudWatch.
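To make the batching idea concrete, here is a rough Python sketch of what "collect, tag, batch, send" looks like against the CloudWatch API. This is not how the agent itself is implemented; make_datum, batches, publish, and the batch_size of 20 are all hypothetical choices for illustration. Only put_metric_data is a real boto3 call.

```python
from datetime import datetime, timezone


def make_datum(instance_id, name, value, when=None):
    """Build one entry for the MetricData list of put_metric_data."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": when or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": "Percent",
    }


def batches(data, batch_size=20):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def publish(data, namespace="EC2Memory"):
    """Send datapoints in batches (needs boto3 and credentials)."""
    import boto3  # local import keeps the helpers usable without boto3

    client = boto3.client("cloudwatch")
    for chunk in batches(data):
        client.put_metric_data(Namespace=namespace, MetricData=chunk)


# Three made-up readings for a placeholder instance ID.
samples = [
    make_datum("i-0123456789abcdef0", "mem_used_percent", v)
    for v in (41.5, 42.0, 43.25)
]
```

Batching matters because each API call has overhead and a cost, so sending several datapoints per request is cheaper than one request per reading.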

The exact levers you control are the collection interval, the specific memory metrics you choose (e.g., mem_used_percent, mem_available_percent), and the CloudWatch namespace. Swap usage is collected through a separate swap section (swap_used_percent) rather than under mem. You can also add other system metrics, such as disk I/O (diskio) or network traffic (net), to the same configuration file.

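Most people don’t realize that the mem_used_percent metric reported by the agent is roughly (total_memory - available_memory) / total_memory * 100; the exact accounting comes from the agent’s underlying collector and may not match this formula to the decimal. What’s often misunderstood is that "available memory" on Linux already counts reclaimable page cache and buffers, so memory the kernel is merely using for caching should not inflate the used percentage the way a low "free" figure would suggest. Either way, you might want to alarm on mem_available_percent falling below a genuinely low threshold, which is a more direct indicator of memory pressure because it tracks how much memory can still be handed to processes without swapping.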
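The arithmetic is easier to see on concrete numbers. The sketch below parses a made-up /proc/meminfo excerpt and applies the (total - available) / total formula from the paragraph above; the sample values and the function names are invented for illustration, and this is not the agent’s own code.

```python
# Made-up excerpt in /proc/meminfo format (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:          400000 kB
MemAvailable:    5000000 kB
Buffers:          200000 kB
Cached:          4600000 kB
"""


def parse_meminfo(text):
    """Turn 'Key:   12345 kB' lines into a {key: int_kB} dictionary."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if rest.strip():
            values[key.strip()] = int(rest.split()[0])
    return values


def memory_percentages(info):
    """Compute used/available/free percentages from parsed meminfo values."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return {
        "used_percent": (total - available) / total * 100,
        "available_percent": available / total * 100,
        "free_percent": info["MemFree"] / total * 100,
    }


result = memory_percentages(parse_meminfo(SAMPLE_MEMINFO))
# With the sample above: used 37.5%, available 62.5%, free 5%.
```

Note how only 5% of memory is strictly free while 62.5% is available: most of the gap is cache the kernel can reclaim, which is why "free" alone is a poor alarm signal.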

Once you’ve got memory alarms set up, the next step is usually to automate remediation, perhaps by triggering an Auto Scaling action or a Lambda function to investigate or restart the offending process.
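As a starting point for that kind of automation, here is a minimal sketch of a Lambda handler that receives the alarm through an SNS topic and works out which instance is affected. It assumes the alarm notification is delivered as JSON in the SNS message body with a Trigger.Dimensions list using lowercase name/value keys; treat that shape, and the helper names, as assumptions to verify against a real notification before relying on them.

```python
import json


def instance_from_alarm_message(message):
    """Return the InstanceId dimension from an alarm notification, or None."""
    trigger = message.get("Trigger", {})
    for dim in trigger.get("Dimensions", []):
        if dim.get("name") == "InstanceId":
            return dim.get("value")
    return None


def handler(event, context):
    """Collect the instances whose memory alarms just entered ALARM state."""
    affected = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        instance_id = instance_from_alarm_message(message)
        if instance_id:
            # Placeholder: a real handler might gather diagnostics or
            # restart a service on this instance here.
            affected.append(
                {"instance_id": instance_id, "alarm": message.get("AlarmName")}
            )
    return affected
```

Keeping the parsing separate from the action makes it easy to test the handler locally with a hand-written event before wiring it to a real topic.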
