Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear them down 5 minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.

The Infrastructure Testing Pyramid#

                    ┌─────────────────┐
                    │   Integration   │  Real cloud resources
                    │   (Terratest)   │  Expensive, slow (10-30 min)
                    │   Run: nightly  │  Catches: actual API behavior
                   ┌┴─────────────────┴┐
                   │    Plan-Based     │  Real plan output, no apply
                   │  (Conftest/OPA)   │  Moderate (1-3 min)
                   │  Run: every PR    │  Catches: policy violations
                  ┌┴───────────────────┴┐
                  │    Cost Estimation   │  Plan output → cost analysis
                  │    (Infracost)       │  Moderate (1-2 min)
                  │    Run: every PR     │  Catches: budget overruns
                 ┌┴─────────────────────┴┐
                 │    Static Analysis     │  No cloud access needed
                 │  (tflint, checkov,     │  Fast (seconds)
                 │   terraform validate)  │  Catches: syntax, config errors
                  │  Run: every commit     │
                  └────────────────────────┘

Each level catches different classes of errors. Skipping a level means those errors reach the next level (which is slower and more expensive to run) or reach production.

Level 1: Static Analysis (Seconds)#

Static analysis checks code without executing it or connecting to any cloud API. It runs on every commit in pre-commit hooks or early in CI.

terraform validate#

Checks HCL syntax and basic resource configuration:

terraform init -backend=false    # initialize providers without backend
terraform validate               # check syntax and resource references

Catches: missing required arguments, invalid resource types, broken references, type mismatches. Does not catch: values that are syntactically valid but logically wrong, such as a CIDR block that overlaps an existing network or an AMI ID copied from the wrong region.

tflint#

Catches provider-specific errors that validate misses:

tflint --init          # download provider-specific rulesets
tflint --recursive     # lint all modules

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

Catches: invalid instance types (t3.superxlarge does not exist), deprecated resource arguments, naming convention violations, variables without descriptions.

checkov#

Scans for security misconfigurations and compliance issues:

checkov -d . --framework terraform

Catches: unencrypted S3 buckets, public security groups, missing logging, databases without backups, KMS keys without rotation. Checkov has 2,500+ built-in policies covering CIS benchmarks, SOC2, PCI-DSS, and HIPAA.

terraform fmt#

Not a test per se, but enforces consistent formatting:

terraform fmt -check -recursive -diff

Run this first in CI. If formatting fails, the PR has style issues that should be fixed before deeper analysis.

Static Analysis Pipeline#

#!/bin/bash
# pre-commit or CI script
set -e

echo "=== Format check ==="
terraform fmt -check -recursive -diff

echo "=== Validate ==="
terraform init -backend=false
terraform validate

echo "=== tflint ==="
tflint --init
tflint --recursive

echo "=== Checkov ==="
checkov -d . --framework terraform --quiet

echo "=== All static checks passed ==="

Total runtime: 5-30 seconds. No cloud credentials needed. No API calls.

Level 2: Cost Estimation (1-2 Minutes)#

Cost estimation runs terraform plan and analyzes the planned resources against pricing data. It catches budget surprises before they reach production.

Infracost#

# Generate plan
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Estimate cost
infracost breakdown --path=plan.json --format=json --out-file=cost.json
infracost output --path=cost.json --format=table

Output example:

Project: infrastructure/compute

 Name                                     Monthly Qty  Unit   Monthly Cost
 ─────────────────────────────────────────────────────────────────────────
 aws_instance.app
 ├─ Instance usage (t3.large)                     730  hours        $60.74
 ├─ root_block_device
 │  └─ Storage (gp3, 50 GB)                        50  GB           $4.00
 └─ ebs_block_device[0]
    └─ Storage (gp3, 200 GB)                       200  GB          $16.00

 aws_rds_cluster.main
 ├─ Aurora capacity units                         730  ACU-hours   $87.60
 └─ Storage                                        50  GB           $5.00

 OVERALL TOTAL                                                    $173.34

Cost Guardrails#

Add policy checks for cost:

# Fail if monthly cost exceeds threshold
COST=$(jq '.totalMonthlyCost | tonumber' cost.json)
THRESHOLD=500
if (( $(echo "$COST > $THRESHOLD" | bc -l) )); then
  echo "ERROR: Estimated monthly cost \$$COST exceeds threshold \$$THRESHOLD"
  exit 1
fi
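
In a PR workflow, the delta usually matters more than the absolute total. Infracost can diff against a baseline generated from the main branch; a sketch (the file names are illustrative):

# On the main branch: save a cost baseline
git checkout main
terraform plan -out=tfplan && terraform show -json tfplan > plan-base.json
infracost breakdown --path=plan-base.json --format=json --out-file=base.json

# Back on the PR branch: report only what the change adds or removes
git checkout -
terraform plan -out=tfplan && terraform show -json tfplan > plan.json
infracost diff --path=plan.json --compare-to=base.json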

What Cost Estimation Catches#

 Issue                           Example                                       Without Cost Check
 ─────────────────────────────────────────────────────────────────────────────────────────────────
 Oversized instances             r5.4xlarge instead of t3.large                Discovered on first bill
 Missing spot/reserved pricing   On-demand for always-on workloads             Overpaying by 40-70%
 Storage accumulation            500 GB EBS per instance × 20 instances        $800/mo in EBS alone
 NAT gateway surprise            NAT per AZ + high throughput                  $100-500/mo unplanned
 Data transfer                   Cross-region replication, internet egress     Largest surprise cost

Level 3: Plan-Based Testing (1-3 Minutes)#

Plan-based testing runs terraform plan, converts the output to JSON, and evaluates it against policy rules. The plan is never applied — no resources are created.

Conftest with OPA#

# Generate plan JSON
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Test against policies
conftest test plan.json --policy policies/

Policy examples:

# policies/tags.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  # Check for required tags
  tags := resource.change.after.tags
  not tags.Environment
  msg := sprintf("Resource %s missing 'Environment' tag", [resource.address])
}

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  tags := resource.change.after.tags
  not tags.ManagedBy
  msg := sprintf("Resource %s missing 'ManagedBy' tag", [resource.address])
}

# policies/security.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  resource.change.after.type == "ingress"
  resource.change.after.from_port != 443
  resource.change.after.from_port != 80
  msg := sprintf(
    "Security group rule %s allows 0.0.0.0/0 on port %d (only 80 and 443 allowed)",
    [resource.address, resource.change.after.from_port]
  )
}

# policies/cost.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  instance_type := resource.change.after.instance_type
  expensive := {"r5.4xlarge", "r5.8xlarge", "m5.8xlarge", "c5.9xlarge"}
  expensive[instance_type]
  msg := sprintf(
    "Instance %s uses expensive type %s — requires approval",
    [resource.address, instance_type]
  )
}

What Plan-Based Testing Catches#

 Category                Examples
 ──────────────────────────────────────────────────────────────────────────────────
 Missing tags            Resources created without required tags
 Security violations     Open security groups, unencrypted resources, public access
 Naming violations       Resources not matching naming conventions
 Size constraints        Instances larger than approved sizes
 Destructive changes     Resources being replaced or destroyed (flag for review)
 Drift-related changes   Resources changing that were not in the code diff
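
Most of these map directly onto Rego rules like the ones above. Destructive changes are worth flagging even without OPA; a minimal sketch that pulls them straight out of the plan JSON with jq:

# List every resource the plan would destroy or replace
# (replacements show up as ["delete","create"] in the actions list)
DESTRUCTIVE=$(jq -r '
  .resource_changes[]
  | select(.change.actions | index("delete"))
  | .address
' plan.json)

if [ -n "$DESTRUCTIVE" ]; then
  echo "WARNING: this plan destroys or replaces:"
  echo "$DESTRUCTIVE"
fi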

Level 4: Integration Testing (10-30 Minutes)#

Integration testing creates real infrastructure, validates it works, and tears it down. This is expensive in time and money — reserve it for nightly runs, pre-release validation, or module certification.

Terratest#

package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestNetworkingModule(t *testing.T) {
    t.Parallel()

    opts := &terraform.Options{
        TerraformDir: "../infrastructure/networking",
        Vars: map[string]interface{}{
            "environment": "test",
            "vpc_cidr":    "10.99.0.0/16",
        },
    }

    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    // Verify VPC was created
    vpcId := terraform.Output(t, opts, "vpc_id")
    assert.Contains(t, vpcId, "vpc-")

    // Verify the module created exactly two private subnets
    subnetIds := terraform.OutputList(t, opts, "private_subnet_ids")
    assert.Equal(t, 2, len(subnetIds))

    // Verify each subnet lives in the new VPC and is actually private
    vpcSubnets := aws.GetSubnetsForVpc(t, vpcId, "us-east-1")
    vpcSubnetIds := make([]string, 0, len(vpcSubnets))
    for _, s := range vpcSubnets {
        vpcSubnetIds = append(vpcSubnetIds, s.Id)
    }
    for _, subnetId := range subnetIds {
        assert.Contains(t, vpcSubnetIds, subnetId)
        assert.False(t, aws.IsPublicSubnet(t, subnetId, "us-east-1"))
    }
}
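
Run it with an extended timeout; go test defaults to a 10-minute limit, which a real apply-and-destroy cycle usually exceeds (the 45m below is an assumed allowance, not a measured one):

cd test
go test -v -timeout 45m -run TestNetworkingModule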

When to Run Integration Tests#

 Trigger                           What to Test                      Why
 ────────────────────────────────────────────────────────────────────────────────────────────────
 Nightly scheduled run             All modules                       Catch provider API changes, drift in AMI IDs, expired certificates
 Before tagging a module release   The module being released         Verify it works against real APIs before consumers adopt it
 After a major provider upgrade    All modules using that provider   Verify compatibility with new API behaviors
 After a significant refactoring   The refactored module             Verify the refactoring did not break functionality

Integration Test Cost Management#

  • Run in a dedicated test account with billing alerts
  • Use the smallest viable resource sizes (t3.micro, db.t3.micro)
  • Set aggressive timeouts, and always defer terraform.Destroy() so cleanup runs even when a test fails
  • Tag all test resources with Environment = "test" and a TTL tag
  • Run a nightly sweeper that destroys any resources older than 24 hours in the test account (a sketch follows this list)
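
A minimal sketch of that sweeper for EC2, assuming test resources carry the Environment = "test" tag; extend it with similar loops for RDS, load balancers, and anything else the tests create:

#!/bin/bash
# Nightly sweeper: terminate test-tagged EC2 instances running > 24 hours
set -euo pipefail

CUTOFF=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%S)  # GNU date

# ISO 8601 timestamps compare correctly as strings
STALE=$(aws ec2 describe-instances \
  --filters "Name=tag:Environment,Values=test" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[?LaunchTime<='${CUTOFF}'][].InstanceId" \
  --output text)

if [ -n "$STALE" ]; then
  echo "Terminating stale test instances: $STALE"
  aws ec2 terminate-instances --instance-ids $STALE
fi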

Choosing What to Test Where#

 What You Want to Verify           Test Level    Tool                 Cost
 ──────────────────────────────────────────────────────────────────────────────────
 Valid HCL syntax                  Static        terraform validate   Free, instant
 Provider-specific config errors   Static        tflint               Free, instant
 Security misconfigurations        Static        checkov              Free, instant
 Required tags present             Plan-based    conftest             Free, 1-3 min
 No open security groups           Plan-based    conftest             Free, 1-3 min
 No accidental destroys            Plan-based    conftest             Free, 1-3 min
 Monthly cost within budget        Plan-based    infracost            Free tier, 1-2 min
 Resources actually work           Integration   terratest            Cloud costs, 10-30 min
 Cross-resource connectivity       Integration   terratest            Cloud costs, 10-30 min
 Module output contracts           Integration   terratest            Cloud costs, 10-30 min

The 80/20 rule: Static analysis and plan-based testing catch 80% of issues at 1% of the cost. Integration testing catches the remaining 20% at 99% of the cost. Invest heavily in levels 1-3 before spending on level 4.

The Agent Testing Workflow#

When an agent writes or modifies Terraform:

1. Write the changes
2. Run: terraform fmt (fix formatting)
3. Run: terraform validate (catch syntax errors)
4. Run: tflint (catch provider-specific issues)
5. Run: checkov (catch security issues)
   ─── Fix any errors found in steps 2-5 ───
6. Run: terraform plan -out=tfplan
7. Run: conftest test (policy checks on plan)
8. Run: infracost breakdown (cost estimate)
9. Present plan summary + cost estimate to human
10. WAIT for approval
11. On approval: terraform apply tfplan

Steps 2-5 are automated and self-correcting — the agent fixes issues it finds. Steps 6-8 produce information for the human. Step 9 is the safety gate. Steps 2-8 together take 2-5 minutes and catch the vast majority of issues before a human ever sees the plan.
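
As a sketch, steps 2-8 collapse into a single gate script; every command is taken from the earlier sections, and the policies/ directory is the one from the Conftest examples:

#!/bin/bash
set -e

terraform fmt -recursive                       # step 2: fix formatting in place
terraform init -backend=false
terraform validate                             # step 3: syntax errors
tflint --init && tflint --recursive            # step 4: provider-specific issues
checkov -d . --framework terraform --quiet     # step 5: security issues

terraform init                                 # reconnect the real backend
terraform plan -out=tfplan                     # step 6: the plan
terraform show -json tfplan > plan.json
conftest test plan.json --policy policies/     # step 7: policy checks
infracost breakdown --path=plan.json           # step 8: cost estimate

echo "Plan and cost estimate ready for review"  # step 9: the human gate
# terraform apply tfplan                        # step 11: only after approval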