Terraform Cost Management#

The most expensive line in your cloud bill was written in a .tf file. A single instance_type choice, a forgotten NAT Gateway, or an over-provisioned RDS instance can cost thousands per month, and none of that cost shows up in terraform plan. Plan shows what changes. It does not show what the change costs.

This article covers how to write cost-aware Terraform and catch expensive decisions before they reach production.

The Cost Visibility Gap#

terraform plan output:

# aws_instance.app will be created
+ resource "aws_instance" "app" {
    + instance_type = "r6g.2xlarge"
    ...
  }

Plan: 1 to add, 0 to change, 0 to destroy.

What the plan does not tell you: at the on-demand rate of $0.4032/hr, an r6g.2xlarge costs about $294/month (730 hours). If the workload fits on a t3.medium at $0.0416/hr, the same service costs about $30/month.
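The arithmetic behind those figures is the hourly price multiplied by roughly 730 hours per month. A minimal sketch in Terraform's own expression language, using only the two on-demand prices quoted above; the local and output names are illustrative and not part of any module:

```hcl
locals {
  hours_per_month = 730 # 8,760 hours per year / 12

  # On-demand hourly prices in USD, copied from the text above.
  hourly_price_usd = {
    "r6g.2xlarge" = 0.4032
    "t3.medium"   = 0.0416
  }

  # Monthly cost per instance type, rounded down to whole dollars.
  monthly_price_usd = {
    for type, price in local.hourly_price_usd :
    type => floor(price * local.hours_per_month)
  }
}

output "monthly_price_usd" {
  value = local.monthly_price_usd
}
```

Evaluating the output gives 294 for r6g.2xlarge and 30 for t3.medium, a difference of about $264 per month for one instance.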

Infracost: Cost Estimates in the Workflow#

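Infracost reads your Terraform code or plan and estimates the monthly cost of each resource it has pricing data for. Usage-based charges, such as NAT Gateway data processing, are reported as depending on usage unless you supply usage estimates.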

Setup#

# Install
brew install infracost  # macOS
# or
curl -fsSL https://raw.githubusercontent.com/infracost/infracost/master/scripts/install.sh | sh

# Register for free API key
infracost auth login

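# Generate a cost estimate from the Terraform directory
cd infrastructure/
infracost breakdown --path=.

# Or estimate from a saved plan
terraform plan -out=tfplan
terraform show -json tfplan > plan.json
infracost breakdown --path=plan.json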

Example Output#

Project: infrastructure

Name                                     Monthly Qty  Unit         Monthly Cost
aws_instance.app
├─ Instance usage (Linux/UNIX, on-demand, r6g.2xlarge)   730  hours              $294.34
├─ root_block_device
│  └─ Storage (general purpose SSD, gp3)                  50  GB                   $4.00
└─ ebs_block_device[0]
   └─ Storage (general purpose SSD, gp3)                 200  GB                  $16.00

aws_nat_gateway.main
├─ NAT gateway                                           730  hours               $32.85
└─ Data processed                                  Monthly cost depends on usage

aws_db_instance.main
├─ Database instance (on-demand, db.r6g.large)           730  hours              $175.20
└─ Storage (general purpose SSD, gp3)                    100  GB                  $11.50

OVERALL TOTAL                                                                    $533.89

──────────────────────────────────
12 cloud resources were detected:
∙ 3 were estimated, 9 were free or had no cost data.

Infracost in CI/CD#

# GitHub Actions — post cost estimate as PR comment
- name: Infracost Breakdown
  run: |
    infracost breakdown --path=. \
      --format=json \
      --out-file=/tmp/infracost.json

- name: Infracost Comment
  run: |
    infracost comment github \
      --path=/tmp/infracost.json \
      --repo=${{ github.repository }} \
      --pull-request=${{ github.event.pull_request.number }} \
      --github-token=${{ secrets.GITHUB_TOKEN }} \
      --behavior=update

Infracost Policy (Budget Guardrails)#

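# GitHub Actions: fail the job if the estimate exceeds a threshold.
# A sketch that reads totalMonthlyCost from the JSON written by the breakdown step;
# organization-wide guardrails can also be configured in Infracost Cloud.
- name: Infracost Budget Check
  run: |
    total=$(jq -r '.totalMonthlyCost' /tmp/infracost.json)
    echo "Estimated monthly cost: \$${total}"
    awk -v t="$total" 'BEGIN { exit (t > 1000) }'  # fail if estimated cost > $1000/month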

The Most Expensive Resources#

Resources that cause the biggest bill surprises:

AWS#

| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| NAT Gateway | One per AZ in dev | $32/mo each, idle | Use 1 in dev, per-AZ in prod only |
| RDS Multi-AZ | Enabled in dev | 2x instance cost | multi_az = false for dev |
| EBS volumes | gp3 200GB per instance | $16/mo each | Right-size, delete unattached |
| EKS cluster | Cluster fee exists even empty | $73/mo | Cannot avoid, factor into budget |
| Elastic IP | Allocated but unattached | $3.65/mo each | Release if unused (attached public IPv4 addresses are billed too) |
| CloudWatch Logs | High-verbosity logging | $0.50/GB ingested | Reduce log levels in dev |
| Data transfer | Cross-AZ traffic | $0.01/GB in each direction | Place communicating resources in same AZ |

Azure#

| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| AKS node pool | Standard_D4s_v5 in dev | $140/mo per node | Use Standard_B2s for dev |
| Azure Firewall | Always-on in dev | $912/mo | Use NSGs in dev, Firewall in prod only |
| Log Analytics | Ingesting everything | $2.76/GB | Configure data collection rules |
| App Gateway v2 | Running in dev | $175/mo | Use simple LB in dev |
| Premium SSD | P30 (1TB) for small workloads | $122/mo | Use Standard SSD or right-size |

GCP#

| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| GKE cluster | Management fee | $73/mo | Cannot avoid (or use Autopilot) |
| Cloud NAT | Per-VM charge + data | $32/mo base | Limit in dev |
| Cloud SQL | db-custom-4-16384 in dev | $230/mo | Use db-f1-micro or shared-core for dev |
| Cloud Armor | Per-policy + per-request | $7/mo + usage | Dev does not need DDoS protection |
| Persistent Disk | SSD 500GB per node | $85/mo each | Right-size, use standard PD in dev |

Right-Sizing Patterns#

Environment-Based Sizing#

variable "environment" {
  type = string
}

locals {
  sizing = {
    dev = {
      instance_type     = "t3.small"
      db_instance       = "db.t3.micro"
      node_count        = 1
      disk_size         = 20
      multi_az          = false
      nat_gateway_count = 1
    }
    staging = {
      instance_type     = "t3.medium"
      db_instance       = "db.t3.small"
      node_count        = 2
      disk_size         = 50
      multi_az          = false
      nat_gateway_count = 1
    }
    prod = {
      instance_type     = "r6g.large"
      db_instance       = "db.r6g.large"
      node_count        = 3
      disk_size         = 200
      multi_az          = true
      nat_gateway_count = 3 # one per AZ
    }
  }

  config = local.sizing[var.environment]
}
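Because local.sizing is indexed directly by var.environment, a typo such as "production" fails with an invalid-index error that does not say which values are allowed. One way to surface a clearer message is a validation block on the variable. The sketch below is a drop-in variant of the declaration above and assumes the three environment names used in the sizing map:

```hcl
variable "environment" {
  type        = string
  description = "Deployment environment; selects a row of local.sizing."

  validation {
    # Keep this list in sync with the keys of local.sizing.
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}
```

The check runs before any resource is planned, so an unknown environment can never fall through to unexpected sizing.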

Conditional Expensive Resources#

# NAT Gateway: 1 in dev, per-AZ in prod
resource "aws_nat_gateway" "main" {
  count     = local.config.nat_gateway_count
  subnet_id = aws_subnet.public[count.index].id
  # ...
}

# WAF: prod only
resource "aws_wafv2_web_acl" "main" {
  count = var.environment == "prod" ? 1 : 0
  # ...
}

# Multi-AZ RDS: prod only
resource "aws_db_instance" "main" {
  instance_class = local.config.db_instance
  multi_az       = local.config.multi_az
  # ...
}
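With a variable NAT Gateway count, anything that refers to a gateway has to cope with there being fewer gateways than subnets. A sketch of one approach, assuming a count-based aws_route_table.private with one table per private subnet (not shown in the original) and subnets listed in a consistent AZ order:

```hcl
resource "aws_route" "private_default" {
  count = length(aws_route_table.private)

  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"

  # dev: one gateway, so every index maps to 0.
  # prod: one gateway per AZ, so index i maps to gateway i.
  nat_gateway_id = aws_nat_gateway.main[count.index % length(aws_nat_gateway.main)].id
}

# A resource with count = 0 or 1 is a list; one() returns its single
# element, or null when the resource was not created (non-prod here).
output "waf_acl_arn" {
  value = one(aws_wafv2_web_acl.main[*].arn)
}
```

The trade-off in dev is that all private subnets share one gateway, so traffic from the other AZs crosses an AZ boundary and the gateway is a single point of failure. That is usually acceptable outside production.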

Tagging for Cost Allocation#

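Without tags, your cloud bill is a single number. With tags, you can attribute costs to teams, projects, and environments. On AWS, user-defined tags also have to be activated as cost allocation tags in the Billing console before they appear in cost reports.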

Required Tags#

locals {
  required_tags = {
    Environment = var.environment
    Project     = var.project
    Team        = var.team
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}

# AWS — apply via provider default_tags
provider "aws" {
  default_tags {
    tags = local.required_tags
  }
}

# Azure: set tags on each resource
resource "azurerm_resource_group" "main" {
  name     = "${var.project}-${var.environment}-rg"
  location = var.location
  tags     = local.required_tags
}

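# GCP: apply via labels. Label keys and values must be lowercase,
# so normalize them (values with spaces or other symbols still need cleaning).
resource "google_compute_instance" "app" {
  labels = { for k, v in local.required_tags : lower(k) => lower(v) }
  # ...
}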
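Where a resource needs tags of its own in addition to the required set, merge() keeps the shared tags in one place. A small sketch; the resource name and the Component tag are illustrative assumptions, not part of the original configuration:

```hcl
resource "azurerm_resource_group" "data" {
  name     = "${var.project}-${var.environment}-data-rg"
  location = var.location

  # Later arguments win on key conflicts, so resource-specific
  # tags can override a required tag if they reuse its key.
  tags = merge(local.required_tags, {
    Component = "data"
  })
}
```

On AWS the same effect comes from setting tags on the resource itself, since the provider combines them with default_tags.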

Enforcing Tags with Policy#

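# AWS: SCP to deny creating resources without a CostCenter tag.
# This only works for actions that accept tags at creation time; verify that
# each action supports the aws:RequestTag condition key before attaching it.
resource "aws_organizations_policy" "require_tags" {
  name = "require-cost-allocation-tags"
  type = "SERVICE_CONTROL_POLICY"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyUntaggedResources"
      Effect    = "Deny"
      Action    = ["ec2:RunInstances", "rds:CreateDBInstance", "s3:CreateBucket"]
      Resource  = "*"
      Condition = {
        # "Null" with "true" matches requests where the tag key is absent
        "Null" = {
          "aws:RequestTag/CostCenter" = "true"
        }
      }
    }]
  })
}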
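Defining the policy does not enforce anything until it is attached to an organization root, an organizational unit, or an account. A minimal sketch of the attachment; the variable name workloads_ou_id is a hypothetical input, not something defined elsewhere in this article:

```hcl
variable "workloads_ou_id" {
  type        = string
  description = "ID of the organizational unit (or account) the SCP should apply to."
}

resource "aws_organizations_policy_attachment" "require_tags" {
  policy_id = aws_organizations_policy.require_tags.id
  target_id = var.workloads_ou_id
}
```

Attaching to a small test OU first is a cautious way to roll this out, because a Deny statement that matches more than intended blocks resource creation for every account underneath the target.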

Reserved Capacity Decisions#

When to Reserve#

| Signal | Action |
|---|---|
| Resource running 24/7 for 6+ months | Consider 1-year reserved |
| Resource running 24/7 for 12+ months with stable sizing | Consider 3-year reserved |
| Workload is bursty or experimental | Stay on-demand |
| Planning to change instance size | Wait until sizing is stable |
| Using spot-tolerant workloads (batch, CI/CD) | Use spot instances, not reserved |

Terraform and Reserved Instances#

Reserved capacity (AWS RIs, Azure Reserved VM Instances, GCP CUDs) is a billing construct that is usually purchased outside Terraform. Terraform creates on-demand instances, and the billing discount applies automatically if a matching reservation exists.

# This is an on-demand instance in Terraform
# If you have a matching RI, AWS applies the discount automatically
resource "aws_instance" "app" {
  instance_type = "r6g.large"  # matches your RI? discount applies
  # ...
}

Do not use instance_market_options for reserved instances; that block is for spot instances. Reservations are typically purchased through the AWS/Azure/GCP billing console rather than declared alongside the instances they cover.
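For contrast, this is roughly what instance_market_options is for: requesting a spot instance for an interruption-tolerant workload. A sketch that assumes a recent AWS provider version with this block available; the resource name and the ami_id variable are illustrative:

```hcl
variable "ami_id" {
  type        = string
  description = "AMI for the worker (hypothetical input for this sketch)."
}

resource "aws_instance" "batch_worker" {
  ami           = var.ami_id
  instance_type = "t3.small"

  instance_market_options {
    market_type = "spot"

    spot_options {
      # One-time request: the instance is not relaunched after interruption.
      spot_instance_type             = "one-time"
      instance_interruption_behavior = "terminate"
      # max_price is omitted, so the on-demand price acts as the cap.
    }
  }
}
```

Spot capacity can be reclaimed at short notice, so this fits batch jobs and CI runners rather than anything that holds state.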

Agent Cost Awareness Workflow#

When an agent writes Terraform that creates cloud resources:

  1. Choose the smallest viable size — start with t3.small / Standard_B2s / e2-small for dev. Scale up based on load testing, not guessing
  2. Check per-environment sizing — dev should be significantly smaller than prod
  3. Count the NAT Gateways — one per environment for dev, per-AZ for prod
  4. Check for always-on expensive resources — firewalls, application gateways, premium features
  5. Add cost allocation tags — every resource should be attributable to a team and project
  6. Recommend Infracost — if not already in the CI/CD pipeline, suggest adding it
  7. Flag resources over $100/month — call out expensive resources explicitly in plan summaries
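Steps 1 and 2 can be partly encoded in the configuration itself, so an oversized dev instance fails at plan time instead of relying on review. A sketch using a lifecycle precondition, assuming Terraform 1.3 or later (for startswith), the local.config sizing lookup shown earlier, and a hypothetical var.ami_id; the "t3 only outside prod" rule is an illustrative policy taken from the sizing map, not a general recommendation:

```hcl
resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = local.config.instance_type

  lifecycle {
    precondition {
      # prod may use any type; every other environment must stay on t3.
      condition     = var.environment == "prod" || startswith(local.config.instance_type, "t3.")
      error_message = "Non-prod environments must use a t3 instance type; adjust local.sizing."
    }
  }
}
```

A guard like this complements Infracost rather than replacing it: the precondition blocks a known-bad choice, while the cost estimate catches expensive resources nobody thought to write a rule for.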