IaC Testing: Terratest, InSpec, Packer Secrets

You can’t actually test infrastructure code before you apply it to production; you can only test the effects of applying it.

Here’s how to get a grip on that:

Let’s say you’re managing AWS resources with Terraform. You’ve written some new code to provision a new RDS instance. When you run terraform plan, it tells you what will happen. It doesn’t change anything: it evaluates the configuration, refreshes its view of the existing resources, and reports the difference. The real "test" happens when you run terraform apply.

Consider this scenario: you’re updating an existing security group. You’ve added a new ingress rule to allow traffic on port 8080.

resource "aws_security_group" "webserver" {
  name        = "webserver-sg"
  description = "Allow TLS inbound traffic"
  vpc_id      = "vpc-0123456789abcdef0"

  ingress {
    description = "HTTP"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "HTTPS"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    description = "New App Port" # <-- This is new
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"] # <-- And this is a specific internal network
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "webserver-sg"
  }
}

When you run terraform plan, it will show you something like this (abbreviated; the exact layout varies by Terraform and provider version):

Terraform will perform the following actions:

  # aws_security_group.webserver will be updated in-place
  ~ resource "aws_security_group" "webserver" {
        id          = "sg-0abcdef1234567890"
        name        = "webserver-sg"
      ~ ingress     = [
          + {
              + cidr_blocks = [
                  + "10.0.0.0/16",
                ]
              + description = "New App Port"
              + from_port   = 8080
              + protocol    = "tcp"
              + to_port     = 8080
                # (remaining attributes omitted)
            },
            # (2 unchanged elements hidden)
        ]
        # (unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

This plan output is your first "test." It works like static analysis: it tells you what changes Terraform intends to make without changing anything in AWS. Terraform only makes read-only API calls to refresh its view of the current state (what AWS reports), then compares that with your desired state (the code).
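Because the plan is just data, you can also check it mechanically before anyone runs apply. The following is a minimal sketch, assuming you have exported the plan with terraform plan -out=tfplan followed by terraform show -json tfplan > plan.json. The function name wide_open_ingress, the allowed_ports default, and the sample data are illustrative assumptions, not part of any Terraform tooling.

```python
import json


def load_plan(path):
    """Load a plan exported with `terraform show -json tfplan > plan.json`."""
    with open(path) as handle:
        return json.load(handle)


def wide_open_ingress(plan, allowed_ports=frozenset({80, 443})):
    """Return (address, from_port, to_port) for every planned security group
    ingress rule that is open to 0.0.0.0/0 on a port outside allowed_ports."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        # "after" is null for resources that are being destroyed.
        after = (rc.get("change") or {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
                continue
            ports = range(rule["from_port"], rule["to_port"] + 1)
            if any(port not in allowed_ports for port in ports):
                findings.append((rc["address"], rule["from_port"], rule["to_port"]))
    return findings


# Hand-written sample mirroring the security group above (not real plan output).
sample_plan = {
    "resource_changes": [{
        "address": "aws_security_group.webserver",
        "type": "aws_security_group",
        "change": {"actions": ["update"], "after": {"ingress": [
            {"from_port": 80, "to_port": 80, "protocol": "tcp",
             "cidr_blocks": ["0.0.0.0/0"]},
            {"from_port": 443, "to_port": 443, "protocol": "tcp",
             "cidr_blocks": ["0.0.0.0/0"]},
            {"from_port": 8080, "to_port": 8080, "protocol": "tcp",
             "cidr_blocks": ["10.0.0.0/16"]},
        ]}},
    }]
}
```

With the sample above, wide_open_ingress(sample_plan) finds nothing, because port 8080 is limited to 10.0.0.0/16. Change that CIDR to 0.0.0.0/0 and the rule is reported. This only checks intent; it says nothing about whether traffic will actually flow after apply.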

The real test happens with terraform apply. This command sends API calls to AWS to make the changes. If your code is correct and the AWS API is available, the security group rule will be added.

But what if the effect of that rule is wrong?

This is where testing gets interesting. You’ve added the rule, but your application still can’t connect. Why?

Common Reasons the Connection Still Fails After a Successful Apply:

Network ACLs (NACLs) Blocking Traffic: NACLs operate at the subnet level and are stateless. A common mistake is forgetting that NACLs, not just security groups, can block traffic.
- Diagnosis: Check the NACLs associated with the subnet(s) your RDS instance is in. Look for rules that deny traffic on port 8080.
```
aws ec2 describe-network-acls --network-acl-ids <your-nacls-id> --query 'NetworkAcls[0].Entries[*].{RuleNumber:RuleNumber,Protocol:Protocol,PortRange:PortRange,CidrBlock:CidrBlock,Egress:Egress,RuleAction:RuleAction}' --output table
```
- Fix: Add an explicit inbound ALLOW rule for TCP port 8080 from your application’s CIDR block (e.g., 10.0.0.0/16) to the NACL. Pick a rule number that is lower than any DENY rule matching the same traffic and is not already in use. Because NACLs are stateless, also confirm that an outbound rule allows the return traffic on the client’s ephemeral ports.
```
aws ec2 create-network-acl-entry --network-acl-id <your-nacls-id> --rule-number 100 --protocol 6 --port-range From=8080,To=8080 --cidr-block 10.0.0.0/16 --rule-action allow --ingress
```
- Why it works: NACLs are evaluated in ascending rule-number order, and the first matching rule decides. By adding an explicit ALLOW rule with a lower number, you ensure traffic is permitted before any potential DENY rules are encountered.
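To see why rule order matters, it can help to replay the evaluation by hand. Here is a rough sketch, assuming entries shaped like the Entries list returned by aws ec2 describe-network-acls (IPv4 only). The function name nacl_decision and the sample entries are made up for illustration.

```python
import ipaddress


def nacl_decision(entries, peer_ip, port, protocol="6", egress=False):
    """Return 'allow' or 'deny' for one packet direction.

    Entries are checked in ascending RuleNumber order and the first match
    decides. peer_ip is the source for inbound rules and the destination
    for outbound rules. Protocol "6" is TCP and "-1" means all protocols.
    """
    peer = ipaddress.ip_address(peer_ip)
    candidates = [e for e in entries if e["Egress"] == egress and e.get("CidrBlock")]
    for entry in sorted(candidates, key=lambda e: e["RuleNumber"]):
        if entry["Protocol"] not in ("-1", protocol):
            continue
        if peer not in ipaddress.ip_network(entry["CidrBlock"]):
            continue
        port_range = entry.get("PortRange")
        if port_range and not (port_range["From"] <= port <= port_range["To"]):
            continue
        return entry["RuleAction"]
    return "deny"  # nothing matched: treat as denied


# Illustrative entries: an allow for 8080 placed ahead of a broad deny.
sample_entries = [
    {"RuleNumber": 90, "Protocol": "6", "RuleAction": "allow", "Egress": False,
     "CidrBlock": "10.0.0.0/16", "PortRange": {"From": 8080, "To": 8080}},
    {"RuleNumber": 100, "Protocol": "-1", "RuleAction": "deny", "Egress": False,
     "CidrBlock": "0.0.0.0/0"},
]
```

Swapping the two rule numbers in sample_entries flips the result for port 8080 from allow to deny, which is the failure mode described above. Calling the function with egress=True against the same list returns deny as well, a reminder that the return path needs its own rule.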
IAM Permissions: IAM does not control raw network traffic, so a missing IAM permission will not block a TCP connection by itself. It matters when the connection depends on an AWS API call or on IAM-based authentication, for example RDS IAM database authentication, which requires the rds-db:connect action.
- Diagnosis: Examine the IAM policies attached to the role of the service attempting to make the connection. The command below reads an inline policy; use aws iam list-attached-role-policies to find managed policies.
```
aws iam get-role-policy --role-name <your-role-name> --policy-name <your-policy-name>
```
- Fix: Grant the action the failing call actually needs (for example, rds-db:connect for IAM database authentication). If the symptom is a plain timeout rather than an access-denied error, look at the network path instead of IAM.
- Why it works: IAM policies control which API actions principals (users, roles) can perform on AWS resources. Granting the missing action lets the authenticated call succeed once the network path is open.
Route 53 or DNS Resolution Issues: If your application is connecting via a hostname, DNS might be resolving to the wrong IP address, or the record might not exist.
- Diagnosis: Use dig or nslookup to check DNS resolution.
```
dig <your-rds-hostname>
```
- Fix: Update your Route 53 record set to point to the correct IP address or CNAME.
```
aws route53 change-resource-record-sets --hosted-zone-id <your-hosted-zone-id> --change-batch '{
  "Comment": "Update RDS endpoint",
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "<your-rds-hostname>",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [
          {"Value": "<new-rds-endpoint>"}
        ]
      }
    }
  ]
}'
```
- Why it works: Correct DNS resolution is fundamental for establishing network connections. By updating the DNS record, you ensure clients can find the correct network endpoint.
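The same DNS check can be scripted so it runs after every apply. This is a small sketch using only the Python standard library; the function names and the example addresses are assumptions for illustration.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses the local resolver
    gives for hostname. Raises socket.gaierror if the name does not resolve."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def outside_cidr(addresses, cidr):
    """Return the addresses that do not fall inside the expected network."""
    network = ipaddress.ip_network(cidr)
    return [a for a in addresses if ipaddress.ip_address(a) not in network]


# Typical use, with <your-rds-hostname> replaced by the real name:
#   unexpected = outside_cidr(resolve("<your-rds-hostname>"), "10.0.0.0/16")
#   if unexpected:
#       print("resolves outside the VPC range:", unexpected)
```

An address outside the VPC range usually means the name points at a public endpoint or a stale record. Keep in mind that the answer depends on where you run the check: a private hosted zone resolves differently inside and outside the VPC.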
Subnet Routing Configuration: Subnets in the same VPC reach each other by default through the local route that every route table contains. Routing becomes a factor when your application and the RDS instance are in different VPCs or networks (VPC peering, a transit gateway, a VPN), where each side needs an explicit route to the other.
- Diagnosis: Check the route tables for the subnets involved and confirm there is an active route covering the destination CIDR.
```
aws ec2 describe-route-tables --route-table-ids <your-route-table-id> --query 'RouteTables[0].Routes[*].{DestinationCidrBlock:DestinationCidrBlock,GatewayId:GatewayId,InstanceId:InstanceId,NetworkInterfaceId:NetworkInterfaceId,Origin:Origin,State:State,VpcPeeringConnectionId:VpcPeeringConnectionId}' --output table
```
- Fix: Add a route in the application’s subnet route table that directs traffic destined for the remote network’s CIDR block to the appropriate target, such as a VPC peering connection. The remote side needs a matching return route.
```
aws ec2 create-route --route-table-id <your-route-table-id> --destination-cidr-block <peer-vpc-cidr> --vpc-peering-connection-id <your-peering-connection-id>
```
- Why it works: Route tables dictate where traffic leaving a subnet is sent. Ensuring an active route exists in both directions lets packets travel between the two networks.
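When a route table has several entries, the most specific matching route (longest prefix) is the one that applies. The sketch below replays that selection over routes shaped like the Routes list from aws ec2 describe-route-tables. The function name select_route and the sample routes, including the igw-0example and pcx-0example IDs, are hypothetical.

```python
import ipaddress


def select_route(routes, dest_ip):
    """Return the route with the longest matching prefix for dest_ip,
    or None if no IPv4 route covers it. Check the returned route's State:
    a 'blackhole' route still matches but drops the traffic."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [
        route for route in routes
        if route.get("DestinationCidrBlock")
        and dest in ipaddress.ip_network(route["DestinationCidrBlock"])
    ]
    if not matches:
        return None
    return max(
        matches,
        key=lambda r: ipaddress.ip_network(r["DestinationCidrBlock"]).prefixlen,
    )


# Illustrative route table for a VPC with CIDR 10.0.0.0/16.
sample_routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example", "State": "active"},
    {"DestinationCidrBlock": "10.1.0.0/16", "VpcPeeringConnectionId": "pcx-0example",
     "State": "active"},
]
```

For a destination such as 10.0.1.25 the local route wins over the default route, while 10.1.4.9 goes to the peering connection. If you remove the peering entry, that same destination falls through to the internet gateway, which is a common reason a cross-VPC connection silently times out.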
RDS Instance Configuration (Public Accessibility, VPC Settings): The RDS instance itself might not be configured to accept connections from your application’s IP range, or it might be in a private subnet without proper routing.
- Diagnosis: Check the RDS instance’s network and security settings in the AWS console or via the CLI.
```
aws rds describe-db-instances --db-instance-identifier <your-db-instance-id> --query 'DBInstances[*].{DBInstanceIdentifier:DBInstanceIdentifier,VpcSecurityGroups:VpcSecurityGroups,DBSubnetGroup:DBSubnetGroup,PubliclyAccessible:PubliclyAccessible}'
```
- Fix: Ensure the security groups attached to the RDS instance allow inbound traffic on the database port from your application’s CIDR range or security group, and that its DB subnet group places it in subnets your application can reach. PubliclyAccessible only matters for clients connecting from outside the VPC. A NAT Gateway or VPC Endpoint affects outbound access and does not help with inbound connections.
- Why it works: The RDS instance is the ultimate destination. Its own network configuration and security group rules must permit the incoming connection.
Application Configuration: The application itself might be misconfigured, expecting to connect to a different host or port, or it might have internal firewall rules.
- Diagnosis: Review the application’s configuration files (e.g., application.properties, .env, connection strings) and logs for connection errors.
- Fix: Correct the application’s connection string, hostname, port, or any internal firewall settings.
- Why it works: The application must be told where to connect. Incorrect application configuration will prevent it from reaching even a correctly provisioned resource.
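A quick way to rule out this last cause is to compare what the application is configured to use with what you provisioned. Here is a minimal sketch, assuming the application takes a URL-style connection string. The function name endpoint_mismatch and the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def endpoint_mismatch(connection_url, expected_host, expected_port):
    """Return human-readable problems found when comparing a URL-style
    connection string with the endpoint that was provisioned."""
    parts = urlsplit(connection_url)
    problems = []
    # urlsplit lowercases the hostname, so compare case-insensitively.
    if (parts.hostname or "") != expected_host.lower():
        problems.append(f"host is {parts.hostname!r}, expected {expected_host!r}")
    if parts.port != expected_port:
        problems.append(f"port is {parts.port!r}, expected {expected_port!r}")
    return problems


# Example with a made-up connection string:
#   endpoint_mismatch("postgresql://app:secret@db.internal.example:5432/orders",
#                     "db.internal.example", 5432)
```

An empty list means the configured host and port match what you expected. The expected values could come from a Terraform output, so the check compares the application against the infrastructure code rather than against someone’s memory.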

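Once the port is correctly opened, the next error you’ll typically hit is connection refused, which means packets reach the host but no service is listening on that port. A connection timed out, by contrast, usually means packets are still being dropped somewhere along the path (security group, NACL, or routing).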
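Since the only real test is the effect of the change, a small probe run from the application’s network after apply tells you which of those failure modes you are in. This is a sketch using the standard library only; the function name probe and its return labels are illustrative choices.

```python
import socket


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome.

    'open'    - something accepted the connection
    'refused' - the host answered, but nothing is listening on the port
    'timeout' - no answer in time, which usually points at a firewall,
                security group, NACL, or routing problem
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"


# Typical use from a host inside 10.0.0.0/16, with the target's address
# filled in:
#   print(probe("<target-host>", 8080))
```

Run it from where the application runs, not from your laptop, because security groups, NACLs, and route tables all depend on the source of the traffic.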