Testing Infrastructure Code#

Infrastructure code has a unique testing challenge: the thing you are testing is expensive to instantiate. You cannot spin up a VPC, an RDS instance, and an EKS cluster for every pull request and tear them down five minutes later without significant cost and time. But you also cannot ship untested infrastructure changes to production without risk.

The solution is the same as in software engineering: a testing pyramid. Fast, cheap tests at the bottom catch most errors. Slower, expensive tests at the top catch the rest. The key is knowing what to test at which level.

The Infrastructure Testing Pyramid#

                    ┌─────────────────┐
                    │   Integration   │  Real cloud resources
                    │   (Terratest)   │  Expensive, slow (10-30 min)
                    │   Run: nightly  │  Catches: actual API behavior
                   ┌┴─────────────────┴┐
                   │    Plan-Based     │  Real plan output, no apply
                   │  (Conftest/OPA)   │  Moderate (1-3 min)
                   │  Run: every PR    │  Catches: policy violations
                  ┌┴───────────────────┴┐
                  │    Cost Estimation   │  Plan output → cost analysis
                  │    (Infracost)       │  Moderate (1-2 min)
                  │    Run: every PR     │  Catches: budget overruns
                 ┌┴─────────────────────┴┐
                 │    Static Analysis     │  No cloud access needed
                 │  (tflint, checkov,     │  Fast (seconds)
                 │   terraform validate)  │  Catches: syntax, config errors
                  │  Run: every commit     │
                 └───────────────────────┘

Each level catches different classes of errors. Skipping a level means those errors reach the next level (which is slower and more expensive to run) or reach production.

Level 1: Static Analysis (Seconds)#

Static analysis checks code without executing it or connecting to any cloud API. It runs on every commit in pre-commit hooks or early in CI.

terraform validate#

Checks HCL syntax and basic resource configuration:

terraform init -backend=false    # initialize providers without backend
terraform validate               # check syntax and resource references

Catches: missing required arguments, invalid resource types, broken references, type mismatches. Does not catch: values that are syntactically valid but logically wrong.
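In CI it helps to machine-read the results: terraform validate -json emits a diagnostics list that is easy to turn into build annotations. A sketch in Python — the field names (valid, diagnostics, severity, summary, range.filename) follow Terraform's documented JSON output, but summarize_validate itself is an illustrative helper, not part of any tool:

```python
import json

def summarize_validate(raw: str) -> tuple[bool, list[str]]:
    """Condense `terraform validate -json` output into (ok, messages)."""
    data = json.loads(raw)
    messages = []
    for diag in data.get("diagnostics", []):
        # Each diagnostic carries a severity, a summary, and (usually) a range
        # pointing at the offending file.
        where = (diag.get("range") or {}).get("filename", "<unknown>")
        messages.append(f"{diag['severity']}: {where}: {diag['summary']}")
    return data.get("valid", False), messages
```

In a pipeline, feed it the stdout of terraform validate -json and fail the job when the first element is false.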

tflint#

Catches provider-specific errors that validate misses:

tflint --init          # download provider-specific rulesets
tflint --recursive     # lint all modules

# .tflint.hcl
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

rule "terraform_naming_convention" {
  enabled = true
  format  = "snake_case"
}

rule "terraform_documented_variables" {
  enabled = true
}

Catches: invalid instance types (t3.superxlarge does not exist), deprecated resource arguments, naming convention violations, variables without descriptions.

checkov#

Scans for security misconfigurations and compliance issues:

checkov -d . --framework terraform

Catches: unencrypted S3 buckets, public security groups, missing logging, databases without backups, KMS keys without rotation. Checkov has 2,500+ built-in policies covering CIS benchmarks, SOC2, PCI-DSS, and HIPAA.

terraform fmt#

Not a test per se, but enforces consistent formatting:

terraform fmt -check -recursive -diff

Run this first in CI. If formatting fails, the PR has style issues that should be fixed before deeper analysis.

Static Analysis Pipeline#

#!/bin/bash
# pre-commit or CI script
set -e

echo "=== Format check ==="
terraform fmt -check -recursive -diff

echo "=== Validate ==="
terraform init -backend=false
terraform validate

echo "=== tflint ==="
tflint --init
tflint --recursive

echo "=== Checkov ==="
checkov -d . --framework terraform --quiet

echo "=== All static checks passed ==="

Total runtime: 5-30 seconds. No cloud credentials needed. No API calls.

Level 2: Cost Estimation (1-2 Minutes)#

Cost estimation runs terraform plan and analyzes the planned resources against pricing data. It catches budget surprises before they reach production.

Infracost#

# Generate plan
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Estimate cost
infracost breakdown --path=plan.json --format=json --out-file=cost.json
infracost output --path=cost.json --format=table

Output example:

Project: infrastructure/compute

 Name                                     Monthly Qty  Unit        Monthly Cost
 ──────────────────────────────────────────────────────────────────────────────
 aws_instance.app
 ├─ Instance usage (t3.large)                     730  hours             $60.74
 ├─ root_block_device
 │  └─ Storage (gp3, 50 GB)                        50  GB                 $4.00
 └─ ebs_block_device[0]
    └─ Storage (gp3, 200 GB)                      200  GB                $16.00

 aws_rds_cluster.main
 ├─ Aurora capacity units                         730  ACU-hours         $87.60
 └─ Storage                                        50  GB                 $5.00

 OVERALL TOTAL                                                          $173.34

Cost Guardrails#

Add policy checks for cost:

# Fail if monthly cost exceeds threshold
COST=$(jq '.totalMonthlyCost | tonumber' cost.json)
THRESHOLD=500
if (( $(echo "$COST > $THRESHOLD" | bc -l) )); then
  echo "ERROR: Estimated monthly cost \$$COST exceeds threshold \$$THRESHOLD"
  exit 1
fi
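Once the guardrail grows beyond a single threshold, a small script is easier to maintain than shell arithmetic. A Python sketch of the same check extended with per-project limits, assuming infracost's breakdown JSON shape (a string totalMonthlyCost at the top level and per-project totals under projects[].breakdown):

```python
import json

def check_cost(cost_json: str, overall_limit: float,
               project_limit: float) -> list[str]:
    """Return a list of budget violations for an infracost breakdown document."""
    data = json.loads(cost_json)
    violations = []
    # Infracost emits monetary values as strings, so convert before comparing.
    total = float(data["totalMonthlyCost"])
    if total > overall_limit:
        violations.append(f"overall ${total:.2f} exceeds ${overall_limit:.2f}")
    for project in data.get("projects", []):
        ptotal = float(project["breakdown"]["totalMonthlyCost"])
        if ptotal > project_limit:
            violations.append(
                f"project {project['name']}: ${ptotal:.2f} "
                f"exceeds ${project_limit:.2f}"
            )
    return violations
```

In CI, read cost.json, print each violation, and exit nonzero if any are returned.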

What Cost Estimation Catches#

| Issue | Example | Without Cost Check |
|---|---|---|
| Oversized instances | r5.4xlarge instead of t3.large | Discovered on first bill |
| Missing spot/reserved pricing | On-demand for always-on workloads | Overpaying by 40-70% |
| Storage accumulation | 500 GB EBS per instance × 20 instances | $800/mo in EBS alone |
| NAT gateway surprise | NAT per AZ + high throughput | $100-500/mo unplanned |
| Data transfer | Cross-region replication, internet egress | Largest surprise cost |

Level 3: Plan-Based Testing (1-3 Minutes)#

Plan-based testing runs terraform plan, converts the output to JSON, and evaluates it against policy rules. The plan is never applied — no resources are created.

Conftest with OPA#

# Generate plan JSON
terraform plan -out=tfplan
terraform show -json tfplan > plan.json

# Test against policies
conftest test plan.json --policy policies/

Policy examples:

# policies/tags.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  # Check for required tags
  tags := resource.change.after.tags
  not tags.Environment
  msg := sprintf("Resource %s missing 'Environment' tag", [resource.address])
}

deny[msg] {
  resource := input.resource_changes[_]
  actions := resource.change.actions
  actions[_] == "create"

  tags := resource.change.after.tags
  not tags.ManagedBy
  msg := sprintf("Resource %s missing 'ManagedBy' tag", [resource.address])
}

# policies/security.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_security_group_rule"
  resource.change.after.cidr_blocks[_] == "0.0.0.0/0"
  resource.change.after.type == "ingress"
  resource.change.after.from_port != 443
  resource.change.after.from_port != 80
  msg := sprintf(
    "Security group rule %s allows 0.0.0.0/0 on port %d (only 80 and 443 allowed)",
    [resource.address, resource.change.after.from_port]
  )
}

# policies/cost.rego
package main

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_instance"
  instance_type := resource.change.after.instance_type
  expensive := {"r5.4xlarge", "r5.8xlarge", "m5.8xlarge", "c5.9xlarge"}
  expensive[instance_type]
  msg := sprintf(
    "Instance %s uses expensive type %s — requires approval",
    [resource.address, instance_type]
  )
}

What Plan-Based Testing Catches#

| Category | Examples |
|---|---|
| Missing tags | Resources created without required tags |
| Security violations | Open security groups, unencrypted resources, public access |
| Naming violations | Resources not matching naming conventions |
| Size constraints | Instances larger than approved sizes |
| Destructive changes | Resources being replaced or destroyed (flag for review) |
| Drift-related changes | Resources changing that were not in the code diff |
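The destructive-changes check is simple enough to automate directly against the plan JSON, without writing Rego. A Python sketch, relying only on the documented resource_changes/change.actions structure of Terraform's JSON plan format:

```python
import json

def destructive_changes(plan_json: str) -> list[str]:
    """Flag resources the plan would destroy or replace.

    In Terraform's JSON plan, "delete" alone in change.actions is a destroy;
    "delete" paired with "create" is a replacement.
    """
    flagged = []
    plan = json.loads(plan_json)
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if "delete" not in actions:
            continue
        kind = "replace" if "create" in actions else "destroy"
        flagged.append(f"{kind}: {rc['address']}")
    return flagged
```

Run it on the output of terraform show -json tfplan. A nonzero result should flag the PR for human review rather than hard-fail the build, since some destroys are intentional.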

Level 4: Integration Testing (10-30 Minutes)#

Integration testing creates real infrastructure, validates it works, and tears it down. This is expensive in time and money — reserve it for nightly runs, pre-release validation, or module certification.

Terratest#

package test

import (
    "testing"

    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestNetworkingModule(t *testing.T) {
    t.Parallel()

    opts := &terraform.Options{
        TerraformDir: "../infrastructure/networking",
        Vars: map[string]interface{}{
            "environment": "test",
            "vpc_cidr":    "10.99.0.0/16",
        },
    }

    defer terraform.Destroy(t, opts)
    terraform.InitAndApply(t, opts)

    // Verify VPC was created
    vpcId := terraform.Output(t, opts, "vpc_id")
    assert.Contains(t, vpcId, "vpc-")

    // Verify the module created exactly two private subnets
    subnetIds := terraform.OutputList(t, opts, "private_subnet_ids")
    assert.Equal(t, 2, len(subnetIds))

    // Verify each subnet belongs to the VPC
    vpcSubnets := aws.GetSubnetsForVpc(t, vpcId, "us-east-1")
    vpcSubnetIds := make([]string, 0, len(vpcSubnets))
    for _, s := range vpcSubnets {
        vpcSubnetIds = append(vpcSubnetIds, s.Id)
    }
    for _, subnetId := range subnetIds {
        assert.Contains(t, vpcSubnetIds, subnetId)
    }

    // Verify DNS hostnames are enabled, via a module output
    // (Terratest's Vpc struct does not expose this attribute)
    assert.Equal(t, "true", terraform.Output(t, opts, "enable_dns_hostnames"))
}

When to Run Integration Tests#

| Trigger | What to Test | Why |
|---|---|---|
| Nightly scheduled run | All modules | Catch provider API changes, drift in AMI IDs, expired certificates |
| Before tagging a module release | The module being released | Verify it works against real APIs before consumers adopt it |
| After a major provider upgrade | All modules using that provider | Verify compatibility with new API behaviors |
| After a significant refactoring | The refactored module | Verify the refactoring did not break functionality |

Integration Test Cost Management#

  • Run in a dedicated test account with billing alerts
  • Use the smallest viable resource sizes (t3.micro, db.t3.micro)
  • Set aggressive test timeouts (e.g. go test -timeout 45m), and always defer terraform.Destroy() so cleanup runs even when assertions fail
  • Tag all test resources with Environment = "test" and a TTL tag
  • Run a nightly sweeper that destroys any resources older than 24 hours in the test account
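The last two bullets combine into a sweeper. A minimal sketch of the selection logic in Python — the resource dicts and field names ("id", "tags", "launched_at") are hypothetical stand-ins for whatever the cloud API listing returns, and the actual destroy calls are deliberately omitted:

```python
from datetime import datetime, timedelta, timezone

def resources_to_sweep(resources, max_age_hours=24):
    """Select test-account resources past their allowed lifetime.

    Each resource is a dict with "id", "tags", and "launched_at" (a
    timezone-aware datetime). Returns the ids to destroy.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=max_age_hours)
    doomed = []
    for r in resources:
        # Never touch anything outside the test fleet, even in a test account.
        if r["tags"].get("Environment") != "test":
            continue
        if r["launched_at"] < cutoff:
            doomed.append(r["id"])
    return doomed
```

The tag filter is the important design choice: the sweeper is destructive, so it opts resources in by tag rather than sweeping everything it can see.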

Choosing What to Test Where#

| What You Want to Verify | Test Level | Tool | Cost |
|---|---|---|---|
| Valid HCL syntax | Static | terraform validate | Free, instant |
| Provider-specific config errors | Static | tflint | Free, instant |
| Security misconfigurations | Static | checkov | Free, instant |
| Required tags present | Plan-based | conftest | Free, 1-3 min |
| No open security groups | Plan-based | conftest | Free, 1-3 min |
| No accidental destroys | Plan-based | conftest | Free, 1-3 min |
| Monthly cost within budget | Plan-based | infracost | Free tier, 1-2 min |
| Resources actually work | Integration | terratest | Cloud costs, 10-30 min |
| Cross-resource connectivity | Integration | terratest | Cloud costs, 10-30 min |
| Module output contracts | Integration | terratest | Cloud costs, 10-30 min |

The 80/20 rule: Static analysis and plan-based testing catch 80% of issues at 1% of the cost. Integration testing catches the remaining 20% at 99% of the cost. Invest heavily in levels 1-3 before spending on level 4.

The Agent Testing Workflow#

When an agent writes or modifies Terraform:

1. Write the changes
2. Run: terraform fmt (fix formatting)
3. Run: terraform validate (catch syntax errors)
4. Run: tflint (catch provider-specific issues)
5. Run: checkov (catch security issues)
   ─── Fix any errors found in steps 2-5 ───
6. Run: terraform plan -out=tfplan
7. Run: conftest test (policy checks on plan)
8. Run: infracost breakdown (cost estimate)
9. Present plan summary + cost estimate to human
10. WAIT for approval
11. On approval: terraform apply tfplan

Steps 2-5 are automated and self-correcting — the agent fixes issues it finds. Steps 6-8 produce information for the human. Step 9 is the safety gate. Steps 2-8 together take 2-5 minutes and catch the vast majority of issues before a human ever sees the plan.
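The gated portion of the workflow (steps 2-8) can be sketched as a small driver. The commands and paths are illustrative, not prescriptive; the key property is that apply is absent from the list — the pipeline ends by handing the plan to a human:

```python
import subprocess

# Steps 2-8 of the workflow above, each as (label, command).
CHECKS = [
    ("fmt", ["terraform", "fmt", "-check", "-recursive"]),
    ("validate", ["terraform", "validate"]),
    ("tflint", ["tflint", "--recursive"]),
    ("checkov", ["checkov", "-d", ".", "--framework", "terraform", "--quiet"]),
    ("plan", ["terraform", "plan", "-out=tfplan"]),
    ("conftest", ["conftest", "test", "plan.json", "--policy", "policies/"]),
    ("infracost", ["infracost", "breakdown", "--path=plan.json"]),
]

def run_checks(checks, runner=subprocess.call) -> bool:
    """Run each check in order, stopping at the first failure.

    Returns True only when every check passed, i.e. the plan is ready to
    present to a human. Apply is deliberately not here: it happens only
    after explicit approval.
    """
    for label, cmd in checks:
        if runner(cmd) != 0:
            print(f"FAILED: {label}")
            return False
    print("All checks passed — awaiting human approval before apply")
    return True
```

The injectable runner keeps the gating logic testable without any of the tools installed; in CI the default subprocess.call executes the real commands.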