T3 instances don’t actually have unlimited CPU, even in "unlimited" mode; they draw on a finite balance of CPU credits.

Here’s how you can keep an eye on your T3 instance’s CPU credit balance before it dips too low, and what to do when it does.

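T3 instances operate on a burstable CPU model. They earn CPU credits at a fixed hourly rate, bank them while CPU usage is below the baseline, and spend them while usage is above it. One CPU credit is one vCPU running at 100% for one minute. If an instance consistently spends more credits than it earns, its CPU credit balance will deplete. What happens at zero depends on the credit mode: in standard mode, CPU performance is throttled to the baseline level until the instance accrues more credits; in unlimited mode, which is the default for T3, the instance keeps bursting and is billed for surplus credits it does not earn back. Either way, the result is unexpected: performance degradation or extra charges for applications that require sustained CPU.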
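The arithmetic behind depletion can be sketched in a few lines of shell. This is a rough illustration rather than an AWS tool: the function names are made up for this example, and the t3.medium figures used in the last line (2 vCPUs, 24 credits earned per hour, 576-credit maximum balance) are assumptions taken from the published T3 credit table, so check them against your own instance size.

```shell
# One CPU credit = one vCPU at 100% utilization for one minute.
# Credits spent per hour by an instance with $1 vCPUs at $2 percent
# average utilization (integer arithmetic, rounds down).
credits_spent_per_hour() {
    echo $(( $1 * $2 * 60 / 100 ))
}

# Whole hours until a balance of $1 credits reaches zero, given an earn
# rate of $2 credits/hour, $3 vCPUs and $4 percent average utilization.
# Prints "never" when the instance earns at least as much as it spends.
hours_until_empty() {
    spent=$(credits_spent_per_hour "$3" "$4")
    net=$(( spent - $2 ))
    if [ "$net" -le 0 ]; then
        echo "never"
    else
        echo $(( $1 / net ))
    fi
}

# Hypothetical t3.medium with a full balance, running at 50% on both vCPUs.
hours_until_empty 576 24 2 50
```

With these inputs the instance spends 60 credits per hour against 24 earned, so the full balance lasts 16 hours. At 20% utilization spending equals earning and the function prints "never".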

Monitoring CPU Credit Balance

The most effective way to monitor your T3 instance’s CPU credit balance is by using Amazon CloudWatch. CloudWatch collects metrics from your EC2 instances, including CPU credit usage.

1. Identify the Relevant Metric: The specific CloudWatch metric you need is CPUCreditBalance. This metric reports the number of CPU credits available to your instance.

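2. Set Up a CloudWatch Alarm: You’ll want to create a CloudWatch alarm that triggers when the CPUCreditBalance falls below a certain threshold. Size the threshold to the instance, because the maximum credit balance depends on the instance size: a t3.medium tops out at 576 credits, a t3.large at 864, and a t3.2xlarge at 4608. Roughly 20% of the maximum is a good starting point. The 1000 credits used in the example below suits a t3.2xlarge; on a t3.large or smaller the balance can never reach 1000, so the alarm would fire permanently. A sensible threshold gives you a buffer before the balance runs out.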

  • To create an alarm via the AWS CLI:

    aws cloudwatch put-metric-alarm \
        --alarm-name T3-CPU-Credit-Low-Alarm \
        --alarm-description "Alarm when T3 instance CPU credit balance is low" \
        --metric-name CPUCreditBalance \
        --namespace AWS/EC2 \
        --statistic Average \
        --period 300 \
        --threshold 1000 \
        --comparison-operator LessThanThreshold \
        --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
        --evaluation-periods 2 \
        --datapoints-to-alarm 2 \
        --treat-missing-data notBreaching \
        --alarm-actions arn:aws:sns:us-east-1:123456789012:MySNSTopic
    
    • Replace i-0123456789abcdef0 with your instance ID.
    • Replace us-east-1 and 123456789012 with your region and AWS account ID.
    • Replace arn:aws:sns:us-east-1:123456789012:MySNSTopic with the ARN of your SNS topic to receive notifications.

3. Configure Notifications: When the alarm triggers, you’ll want to be notified. Configure the alarm to send a notification to an Amazon SNS topic. You can then subscribe your email address or other endpoints to this SNS topic.
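The alarm threshold from step 2 can be derived from the instance size instead of being guessed. The sketch below is one way to do that; the function names are hypothetical, the maximum balances are assumptions copied from the published T3 credit table, and the 20% default is only a starting point.

```shell
# Maximum CPU credit balance per T3 size (24 hours of earned credits).
max_credit_balance() {
    case "$1" in
        t3.nano)    echo 144 ;;
        t3.micro)   echo 288 ;;
        t3.small)   echo 576 ;;
        t3.medium)  echo 576 ;;
        t3.large)   echo 864 ;;
        t3.xlarge)  echo 2304 ;;
        t3.2xlarge) echo 4608 ;;
        *) echo "unknown instance type: $1" >&2; return 1 ;;
    esac
}

# Alarm threshold as $2 percent (default 20) of the maximum balance,
# rounded down to a whole credit.
alarm_threshold() {
    max=$(max_credit_balance "$1") || return 1
    echo $(( max * ${2:-20} / 100 ))
}

alarm_threshold t3.medium
alarm_threshold t3.2xlarge 25
```

The two calls print 115 and 1152. Pass the result to --threshold in the put-metric-alarm command.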

Common Causes of CPU Credit Depletion and Their Fixes

When you receive an alert that your CPUCreditBalance is low, it means your instance is spending CPU credits faster than it earns them. Here are the most common reasons and how to address them:

  1. Unexpectedly High CPU Load:

    • Diagnosis: Connect to your instance via SSH and run top or htop. Look for processes consuming a disproportionate amount of CPU. Also, check CloudWatch metrics for CPUUtilization to see the overall trend.
    • Fix:
      • Optimize the application: Identify the bottleneck in your application code and optimize it. This might involve refactoring code, improving database queries, or optimizing algorithms.
      • Scale up the instance: If the high CPU load is legitimate and necessary for your workload, consider upgrading to a larger instance type (e.g., from t3.medium to t3.large or m5.large).
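        # The instance type can only be changed while the instance is stopped; start it again afterwards.
        aws ec2 stop-instances --instance-ids i-0123456789abcdef0
        aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
        aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type '{ "Value": "t3.large" }'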
        
      • Why it works: Optimizing the application reduces the demand for CPU. Scaling up provides more baseline CPU performance and a higher credit earning rate.
    • When to consider: This is the most frequent cause.
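The CloudWatch side of this diagnosis can also be done from the command line. The helper below is a sketch under a few assumptions: the function name is invented for this example, it relies on GNU date for the "-d @epoch" form, and it expects the AWS CLI to be configured with credentials and a default region.

```shell
# Print 5-minute average values of an EC2 CloudWatch metric, oldest first.
# $1 = instance ID, $2 = metric name (default CPUUtilization),
# $3 = look-back window in hours (default 3).
# Assumes GNU date and a configured AWS CLI.
ec2_metric_averages() {
    metric="${2:-CPUUtilization}"
    end_epoch=$(date -u +%s)
    start_epoch=$(( end_epoch - ${3:-3} * 3600 ))
    aws cloudwatch get-metric-statistics \
        --namespace AWS/EC2 \
        --metric-name "$metric" \
        --dimensions "Name=InstanceId,Value=$1" \
        --start-time "$(date -u -d "@$start_epoch" +%Y-%m-%dT%H:%M:%SZ)" \
        --end-time "$(date -u -d "@$end_epoch" +%Y-%m-%dT%H:%M:%SZ)" \
        --period 300 \
        --statistics Average \
        --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
        --output text
}

# Usage (not run here, since it needs AWS access):
#   ec2_metric_averages i-0123456789abcdef0
#   ec2_metric_averages i-0123456789abcdef0 CPUCreditBalance 12
```

Running it once for CPUUtilization and once for CPUCreditBalance over the same window makes it easy to see when the load started draining the balance.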
  2. Background Processes or Scheduled Tasks:

    • Diagnosis: Check your instance’s crontab (crontab -l for the current user, or check /etc/cron.* directories) and systemd timers (systemctl list-timers --all) for any jobs that might be running frequently or consuming significant CPU. Also, check for agents like monitoring agents, log shippers, or backup software that might be misconfigured or running intensive tasks.
    • Fix:
      • Reschedule or disable unnecessary tasks: Adjust the schedule of non-critical tasks to run during off-peak hours or disable them if they are not needed.
      • Configure resource limits: For some agents, you might be able to configure CPU limits.
      • Why it works: By reducing the frequency or intensity of background tasks, you lower the overall CPU demand on the instance.
    • When to consider: If no single application process is obviously at fault.
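Reading through cron directories by hand gets tedious, so a small helper can list only the active entries. This is a minimal sketch: the function name is hypothetical, it only covers drop-in directories such as /etc/cron.d (not per-user crontabs or systemd timers), and it also prints variable assignments such as MAILTO lines because it filters nothing but comments and blank lines.

```shell
# Print every non-comment, non-blank line from the files in a cron
# directory, prefixed with the file name. $1 defaults to /etc/cron.d.
list_cron_entries() {
    dir="${1:-/etc/cron.d}"
    for f in "$dir"/*; do
        # Skip sub-directories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        awk -v name="$(basename "$f")" \
            '/^[ \t]*(#|$)/ { next } { print name ": " $0 }' "$f"
    done
}

# Usage:
#   list_cron_entries              # scans /etc/cron.d
#   list_cron_entries /etc/cron.hourly
```

For an agent that runs as a systemd service, a CPU cap can usually be applied with systemctl set-property and the CPUQuota setting, assuming the unit name is known.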
  3. Misconfiguration of Bursting Behavior:

    • Diagnosis: While less common, it’s possible that an application is designed to burst aggressively and continuously, negating the benefit of the credit system. Analyze the CPUCreditUsage metric alongside CPUUtilization. If CPUCreditUsage stays above the instance’s earn rate (a t3.medium, for example, earns 24 credits per hour, or 2 per five-minute period) while CPUUtilization stays above the baseline, a burstable instance is simply not designed for this workload.
    • Fix:
      • Modify application behavior: If possible, adjust the application to be less CPU-intensive or to burst in shorter, less frequent intervals.
      • Switch to a fixed-performance instance: For workloads that require sustained high CPU, consider migrating to an instance family like m5, c5, or r5, which offer consistent CPU performance without a credit system.
        # Example of modifying instance type via AWS CLI (stop the instance first)
        aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type '{ "Value": "m5.large" }'
        
      • Why it works: Either the application is made more efficient, or the underlying hardware is changed to one that doesn’t rely on credits for performance.
    • When to consider: If the workload is genuinely CPU-bound and the bursting model isn’t a good fit.
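How an instance behaves once its credits run out depends on its credit mode, standard or unlimited, so it is worth checking that setting when bursting looks misconfigured. The wrappers below are a sketch around two AWS CLI commands; the function names are invented for this example and a configured AWS CLI is assumed.

```shell
# Print the credit mode ("standard" or "unlimited") of instance $1.
get_credit_mode() {
    aws ec2 describe-instance-credit-specifications \
        --instance-ids "$1" \
        --query 'InstanceCreditSpecifications[0].CpuCredits' \
        --output text
}

# Set the credit mode of instance $1 to $2 ("standard" or "unlimited").
set_credit_mode() {
    case "$2" in
        standard|unlimited) ;;
        *) echo "mode must be standard or unlimited" >&2; return 1 ;;
    esac
    aws ec2 modify-instance-credit-specification \
        --instance-credit-specification "InstanceId=$1,CpuCredits=$2"
}

# Usage (not run here, since it needs AWS access):
#   get_credit_mode i-0123456789abcdef0
#   set_credit_mode i-0123456789abcdef0 standard
```

Standard mode caps cost at the price of throttling; unlimited mode keeps performance at the price of surplus credit charges. Which one is right depends on whether the workload can tolerate running at baseline.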
  4. High Number of Idle Instances Accumulating Credits:

    • Diagnosis: CPU credits belong to a single instance and cannot be pooled or transferred. A fleet can therefore hold plenty of credits on idle T3 instances while a busy one runs dry. A freshly launched T3 instance also starts with a zero credit balance, so a CPU-intensive workload placed on a new instance has nothing banked to burst with. Compare CPUCreditBalance across the fleet to spot this pattern.
    • Fix:
      • Move the workload, not the credits: Stopping an idle instance and attaching its Elastic IP address or EBS volume to a new instance does not carry its credits over. Instead, run the CPU-intensive workload on an instance that already has a healthy balance, or launch the new instance with enough vCPUs and memory for the workload.
      • Why it works: This isn’t a direct fix for depletion but a way to put the load where the credits already are when you have a mix of active and idle T3s. The fundamental solution remains addressing the CPU load on the instance that is depleting its credits.
    • When to consider: When you have a fleet of T3s and notice some are consistently maxed out while others are idle.
  5. Under-provisioned Instance Type for the Workload:

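    • Diagnosis: Review the CPUUtilization metric history and compare it with the instance’s baseline (20% for a t3.medium, 30% for a t3.large). Any sustained average above the baseline spends credits faster than they are earned; if utilization is consistently at 70-80% for extended periods, the instance type is clearly too small for the sustained workload, even if individual processes aren’t showing extreme spikes.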
    • Fix:
      • Upgrade to a larger T3 instance type: If your workload is burstable but requires more raw CPU power than your current T3 offers, move to a larger T3 instance (e.g., t3.large to t3.xlarge).
        # Stop the instance before changing its type, then start it again.
        aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type '{ "Value": "t3.xlarge" }'
        
      • Why it works: Larger T3 instances have higher baseline CPU performance and earn CPU credits at a faster rate, providing more capacity to handle bursts.
    • When to consider: When sustained CPU utilization is high, indicating the baseline performance is insufficient.
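Choosing the next size up can be framed as matching sustained demand to baseline capacity. The sketch below expresses both in "vCPU-percent" (number of vCPUs times average utilization). It rests on assumptions: the function name is hypothetical, the capacities are vCPU count times baseline percentage from the published T3 table, and the workload is assumed to spread evenly across vCPUs.

```shell
# Print the smallest T3 size whose baseline capacity covers a sustained
# demand of $1 vCPU-percent, or "none" if even t3.2xlarge is too small.
# Capacity = vCPUs x baseline %, e.g. t3.large = 2 x 30 = 60.
smallest_t3_for_demand() {
    demand="$1"
    for entry in t3.nano:10 t3.micro:20 t3.small:40 t3.medium:40 \
                 t3.large:60 t3.xlarge:160 t3.2xlarge:320; do
        size="${entry%%:*}"
        capacity="${entry##*:}"
        if [ "$demand" -le "$capacity" ]; then
            echo "$size"
            return 0
        fi
    done
    echo "none"
    return 1
}

# Hypothetical t3.large averaging 45% on its 2 vCPUs: demand = 90.
smallest_t3_for_demand $(( 2 * 45 ))
```

For this input the function prints t3.xlarge. A result of "none" suggests the workload belongs on a fixed-performance family instead. Note that t3.small and t3.medium have the same baseline capacity and differ only in memory.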
  6. Network-intensive Workloads Causing CPU Spikes:

    • Diagnosis: Sometimes, high network I/O can indirectly lead to increased CPU usage due to packet processing, firewall rules, or network-related application logic. Check network-related metrics in CloudWatch (NetworkIn, NetworkOut) and correlate them with CPUUtilization.
    • Fix:
      • Optimize network configurations: Ensure your network configuration is efficient. For high-throughput scenarios, consider instances with enhanced networking capabilities or offload features.
      • Offload processing: If possible, offload network processing to dedicated network appliances or services.
      • Why it works: Reducing the CPU overhead associated with network traffic frees up CPU cycles for application tasks.
    • When to consider: If network traffic is abnormally high and coincides with CPU spikes.

Once your CPUCreditBalance stays consistently above your alarm threshold, the instance is no longer at risk of throttling, and CPUUtilization can rise above the baseline whenever the workload needs it.
