Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear them down 5 minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.

The Infrastructure Testing Pyramid#

                    ┌─────────────────┐
                    │   Integration   │  Real cloud resources
                    │   (Terratest)   │  Expensive, slow (10-30 min)
                    │   Run: nightly  │  Catches: actual API behavior
                   ┌┴─────────────────┴┐
                   │    Plan-Based     │  Real plan output, no apply
                   │  (Conftest/OPA)   │  Moderate (1-3 min)
                   │  Run: every PR    │  Catches: policy violations
                  ┌┴───────────────────┴┐
                  │    Cost Estimation   │  Plan output → cost analysis
                  │    (Infracost)       │  Moderate (1-2 min)
                  │    Run: every PR     │  Catches: budget overruns
                 ┌┴─────────────────────┴┐
                 │    Static Analysis     │  No cloud access needed
                 │  (tflint, checkov,     │  Fast (seconds)
                 │   terraform validate)  │  Catches: syntax, config errors
                  │  Run: every commit     │
                  └────────────────────────┘

Each level catches different classes of errors. Skipping a level means those errors reach the next level (which is slower and more expensive to run) or reach production.

Level 1: Static Analysis (Seconds)#

Static analysis checks code without executing it or connecting to any cloud API. It runs on every commit in pre-commit hooks or early in CI.

terraform validate#

Checks HCL syntax and basic resource configuration:

terraform init -backend=false    # initialize providers without backend
terraform validate               # check syntax and resource references

Catches: missing required arguments, invalid resource types, broken references, type mismatches. Does not catch: values that are syntactically valid but logically wrong, such as a CIDR block that overlaps an existing network or an AMI ID copied from the wrong region.

tflint#

Catches provider-specific errors that validate misses:

tflint --init          # download provider-specific rulesets
tflint --recursive     # lint all modules

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

Catches: invalid instance types (t3.superxlarge does not exist), deprecated resource arguments, naming convention violations, variables without descriptions.

checkov#

Scans for security misconfigurations and compliance issues:

checkov -d . --framework terraform

Catches: unencrypted S3 buckets, public security groups, missing logging, databases without backups, KMS keys without rotation. Checkov has 2,500+ built-in policies covering CIS benchmarks, SOC2, PCI-DSS, and HIPAA.

terraform fmt#

Not a test per se, but enforces consistent formatting:

terraform fmt -check -recursive -diff

Run this first in CI. If formatting fails, the PR has style issues that should be fixed before deeper analysis.

Static Analysis Pipeline#

#!/bin/bash
# pre-commit or CI script
set -e

echo "=== Format check ==="
terraform fmt -check -recursive -diff

echo "=== Validate ==="
terraform init -backend=false
terraform validate

echo "=== tflint ==="
tflint --init
tflint --recursive

echo "=== Checkov ==="
checkov -d . --framework terraform --quiet

echo "=== All static checks passed ==="

Total runtime: 5-30 seconds. No cloud credentials needed. No API calls.

Level 2: Cost Estimation (1-2 Minutes)#

Cost estimation runs terraform plan and analyzes the planned resources against pricing data. It catches budget surprises before they reach production.

Infracost#

# Generate plan
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Estimate cost
infracost breakdown --path=plan.json --format=json --out-file=cost.json
infracost output --path=cost.json --format=table

Output example:

Project: infrastructure/compute

 Name                                     Monthly Qty  Unit   Monthly Cost
 ─────────────────────────────────────────────────────────────────────────
 aws_instance.app
 ├─ Instance usage (t3.large)                     730  hours        $60.74
 ├─ root_block_device
 │  └─ Storage (gp3, 50 GB)                        50  GB           $4.00
 └─ ebs_block_device[0]
    └─ Storage (gp3, 200 GB)                       200  GB          $16.00

 aws_rds_cluster.main
 ├─ Aurora capacity units                         730  ACU-hours   $87.60
 └─ Storage                                        50  GB           $5.00

 OVERALL TOTAL                                                    $173.34

Cost Guardrails#

Add policy checks for cost:

# Fail if monthly cost exceeds threshold
COST=$(jq '.totalMonthlyCost | tonumber' cost.json)
THRESHOLD=500
if (( $(echo "$COST > $THRESHOLD" | bc -l) )); then
  echo "ERROR: Estimated monthly cost \$$COST exceeds threshold \$$THRESHOLD"
  exit 1
fi
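
In a PR workflow, the delta usually matters more than the absolute total. Infracost can diff against a baseline generated from the main branch; a sketch (the file names are illustrative):

# On the main branch: save a cost baseline
git checkout main
terraform plan -out=tfplan && terraform show -json tfplan > plan-base.json
infracost breakdown --path=plan-base.json --format=json --out-file=base.json

# Back on the PR branch: report only what the change adds or removes
git checkout -
terraform plan -out=tfplan && terraform show -json tfplan > plan.json
infracost diff --path=plan.json --compare-to=base.json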

What Cost Estimation Catches#

 Issue                           Example                                       Without Cost Check
 ─────────────────────────────────────────────────────────────────────────────────────────────────
 Oversized instances             r5.4xlarge instead of t3.large                Discovered on first bill
 Missing spot/reserved pricing   On-demand for always-on workloads             Overpaying by 40-70%
 Storage accumulation            500 GB EBS per instance × 20 instances        $800/mo in EBS alone
 NAT gateway surprise            NAT per AZ + high throughput                  $100-500/mo unplanned
 Data transfer                   Cross-region replication, internet egress     Largest surprise cost

Level 3: Plan-Based Testing (1-3 Minutes)#

Plan-based testing runs terraform plan, converts the output to JSON, and evaluates it against policy rules. The plan is never applied — no resources are created.

Conftest with OPA#

# Generate plan JSON
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Test against policies
conftest test plan.json --policy policies/

Policy examples:

# policies/tags.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  # Check for required tags
  tags := resource.change.after.tags
  not tags.Environment
  msg := sprintf("Resource %s missing 'Environment' tag", [resource.address])
}

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  tags := resource.change.after.tags
  not tags.ManagedBy
  msg := sprintf("Resource %s missing 'ManagedBy' tag", [resource.address])
}

# policies/security.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  resource.change.after.type == "ingress"
  resource.change.after.from_port != 443
  resource.change.after.from_port != 80
  msg := sprintf(
    "Security group rule %s allows 0.0.0.0/0 on port %d (only 80 and 443 allowed)",
    [resource.address, resource.change.after.from_port]
  )
}

# policies/cost.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  instance_type := resource.change.after.instance_type
  expensive := {"r5.4xlarge", "r5.8xlarge", "m5.8xlarge", "c5.9xlarge"}
  expensive[instance_type]
  msg := sprintf(
    "Instance %s uses expensive type %s — requires approval",
    [resource.address, instance_type]
  )
}

What Plan-Based Testing Catches#

 Category                Examples
 ──────────────────────────────────────────────────────────────────────────────────
 Missing tags            Resources created without required tags
 Security violations     Open security groups, unencrypted resources, public access
 Naming violations       Resources not matching naming conventions
 Size constraints        Instances larger than approved sizes
 Destructive changes     Resources being replaced or destroyed (flag for review)
 Drift-related changes   Resources changing that were not in the code diff
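
Most of these map directly onto Rego rules like the ones above. Destructive changes are worth flagging even without OPA; a minimal sketch that pulls them straight out of the plan JSON with jq:

# List every resource the plan would destroy or replace
# (replacements show up as ["delete","create"] in the actions list)
DESTRUCTIVE=$(jq -r '
  .resource_changes[]
  | select(.change.actions | index("delete"))
  | .address
' plan.json)

if [ -n "$DESTRUCTIVE" ]; then
  echo "WARNING: this plan destroys or replaces:"
  echo "$DESTRUCTIVE"
fi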

Level 4: Integration Testing (10-30 Minutes)#

Integration testing creates real infrastructure, validates it works, and tears it down. This is expensive in time and money — reserve it for nightly runs, pre-release validation, or module certification.

Terratest#

package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestNetworkingModule(t *testing.T) {
    t.Parallel()

    opts := &terraform.Options{
        TerraformDir: "../infrastructure/networking",
        Vars: map[string]interface{}{
            "environment": "test",
            "vpc_cidr":    "10.99.0.0/16",
        },
    }

    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    // Verify VPC was created
    vpcId := terraform.Output(t, opts, "vpc_id")
    assert.Contains(t, vpcId, "vpc-")

    // Verify the module created exactly two private subnets
    subnetIds := terraform.OutputList(t, opts, "private_subnet_ids")
    assert.Equal(t, 2, len(subnetIds))

    // Verify each subnet lives in the new VPC and is actually private
    vpcSubnets := aws.GetSubnetsForVpc(t, vpcId, "us-east-1")
    vpcSubnetIds := make([]string, 0, len(vpcSubnets))
    for _, s := range vpcSubnets {
        vpcSubnetIds = append(vpcSubnetIds, s.Id)
    }
    for _, subnetId := range subnetIds {
        assert.Contains(t, vpcSubnetIds, subnetId)
        assert.False(t, aws.IsPublicSubnet(t, subnetId, "us-east-1"))
    }
}
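
Run it with an extended timeout; go test defaults to a 10-minute limit, which a real apply-and-destroy cycle usually exceeds (the 45m below is an assumed allowance, not a measured one):

cd test
go test -v -timeout 45m -run TestNetworkingModule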

When to Run Integration Tests#

 Trigger                           What to Test                      Why
 ────────────────────────────────────────────────────────────────────────────────────────────────
 Nightly scheduled run             All modules                       Catch provider API changes, drift in AMI IDs, expired certificates
 Before tagging a module release   The module being released         Verify it works against real APIs before consumers adopt it
 After a major provider upgrade    All modules using that provider   Verify compatibility with new API behaviors
 After a significant refactoring   The refactored module             Verify the refactoring did not break functionality

Integration Test Cost Management#

  • Run in a dedicated test account with billing alerts
  • Use the smallest viable resource sizes (t3.micro, db.t3.micro)
  • Set aggressive timeouts, and always defer terraform.Destroy() so cleanup runs even when a test fails
  • Tag all test resources with Environment = "test" and a TTL tag
  • Run a nightly sweeper that destroys any resources older than 24 hours in the test account (a sketch follows this list)
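
A minimal sketch of that sweeper for EC2, assuming test resources carry the Environment = "test" tag; extend it with similar loops for RDS, load balancers, and anything else the tests create:

#!/bin/bash
# Nightly sweeper: terminate test-tagged EC2 instances running > 24 hours
set -euo pipefail

CUTOFF=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S)  # GNU date

# ISO 8601 timestamps compare correctly as strings
STALE=$(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=test" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[?LaunchTime<='${CUTOFF}'][].InstanceId" \
  --output text)

if [ -n "$STALE" ]; then
  echo "Terminating stale test instances: $STALE"
  aws ec2 terminate-instances --instance-ids $STALE
fi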

Choosing What to Test Where#

 What You Want to Verify           Test Level    Tool                 Cost
 ──────────────────────────────────────────────────────────────────────────────────
 Valid HCL syntax                  Static        terraform validate   Free, instant
 Provider-specific config errors   Static        tflint               Free, instant
 Security misconfigurations        Static        checkov              Free, instant
 Required tags present             Plan-based    conftest             Free, 1-3 min
 No open security groups           Plan-based    conftest             Free, 1-3 min
 No accidental destroys            Plan-based    conftest             Free, 1-3 min
 Monthly cost within budget        Plan-based    infracost            Free tier, 1-2 min
 Resources actually work           Integration   terratest            Cloud costs, 10-30 min
 Cross-resource connectivity       Integration   terratest            Cloud costs, 10-30 min
 Module output contracts           Integration   terratest            Cloud costs, 10-30 min

The 80/20 rule: Static analysis and plan-based testing catch 80% of issues at 1% of the cost. Integration testing catches the remaining 20% at 99% of the cost. Invest heavily in levels 1-3 before spending on level 4.

The Agent Testing Workflow#

When an agent writes or modifies Terraform:

1. Write the changes
2. Run: terraform fmt (fix formatting)
3. Run: terraform validate (catch syntax errors)
4. Run: tflint (catch provider-specific issues)
5. Run: checkov (catch security issues)
   ─── Fix any errors found in steps 2-5 ───
6. Run: terraform plan -out=tfplan
7. Run: conftest test (policy checks on plan)
8. Run: infracost breakdown (cost estimate)
9. Present plan summary + cost estimate to human
10. WAIT for approval
11. On approval: terraform apply tfplan

Steps 2-5 are automated and self-correcting — the agent fixes issues it finds. Steps 6-8 produce information for the human. Step 9 is the safety gate. Steps 2-8 together take 2-5 minutes and catch the vast majority of issues before a human ever sees the plan.
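
As a sketch, steps 2-8 collapse into a single gate script; every command is taken from the earlier sections, and the policies/ directory is the one from the Conftest examples:

#!/bin/bash
set -e

terraform fmt -recursive                       # step 2: fix formatting in place
terraform init -backend=false
terraform validate                             # step 3: syntax errors
tflint --init && tflint --recursive            # step 4: provider-specific issues
checkov -d . --framework terraform --quiet     # step 5: security issues

terraform init                                 # reconnect the real backend
terraform plan -out=tfplan                     # step 6: the plan
terraform show -json tfplan > plan.json
conftest test plan.json --policy policies/     # step 7: policy checks
infracost breakdown --path=plan.json           # step 8: cost estimate

echo "Plan and cost estimate ready for review"  # step 9: the human gate
# terraform apply tfplan                        # step 11: only after approval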