Terraform Cost Management#
The most expensive line in your cloud bill was written in a .tf file. A single instance_type choice, a forgotten NAT Gateway, or an over-provisioned RDS instance can cost thousands per month — and none of these show up in terraform plan. Plan shows what changes. It does not show what it costs.
This article covers how to write cost-aware Terraform and catch expensive decisions before they reach production.
The Cost Visibility Gap#
terraform plan output:
  # aws_instance.app will be created
  + resource "aws_instance" "app" {
      + instance_type = "r6g.2xlarge"
      ...
    }
Plan: 1 to add, 0 to change, 0 to destroy.
What the plan does not tell you: r6g.2xlarge costs $0.4032/hr, which over a 730-hour month is about $294. A t3.medium would handle the workload at $0.0416/hr, or about $30/month.
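The monthly figures come from multiplying the hourly rate by roughly 730 hours per month. A minimal sketch of that arithmetic, which can be evaluated with `terraform console` or `terraform apply` in an empty directory. The local names are hypothetical, and the rates are simply the ones quoted above, which vary by region and change over time:

```hcl
# Hypothetical locals for illustration. The hourly rates are the on-demand
# figures quoted in this article, not values fetched from any pricing API.
locals {
  hours_per_month = 730 # average hours in a month (8760 / 12)

  hourly_rate = {
    "r6g.2xlarge" = 0.4032
    "t3.medium"   = 0.0416
  }

  # Monthly cost per instance type: hourly rate x hours per month
  monthly_cost = {
    for itype, rate in local.hourly_rate : itype => rate * local.hours_per_month
  }
}

output "monthly_cost" {
  # Roughly 294 USD for r6g.2xlarge and 30 USD for t3.medium
  value = local.monthly_cost
}

output "monthly_savings" {
  # Difference between the two types, roughly 264 USD per month per instance
  value = local.monthly_cost["r6g.2xlarge"] - local.monthly_cost["t3.medium"]
}
```

Multiply the savings by the number of instances and environments to see why a single `instance_type` line matters.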
Infracost: Cost Estimates in the Workflow#
Infracost reads your Terraform plan and estimates the monthly cost of each resource it has pricing data for.
Setup#
# Install
brew install infracost # macOS
# or
curl -fsSL https://raw.githubusercontent.com/infracost/infracost/master/scripts/install.sh | sh
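# Register for a free API key
infracost auth login
# Generate a cost estimate from a saved plan
cd infrastructure/
terraform plan -out=tfplan
terraform show -json tfplan > plan.json
infracost breakdown --path=plan.json
# Alternatively, let Infracost parse the directory directly:
# infracost breakdown --path=.
Example Output#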
Project: infrastructure

Name                                                    Monthly Qty  Unit   Monthly Cost

aws_instance.app
├─ Instance usage (Linux/UNIX, on-demand, r6g.2xlarge)          730  hours       $294.34
├─ root_block_device
│  └─ Storage (general purpose SSD, gp3)                         50  GB            $4.00
└─ ebs_block_device[0]
   └─ Storage (general purpose SSD, gp3)                        200  GB           $16.00

aws_nat_gateway.main
├─ NAT gateway                                                  730  hours        $32.85
└─ Data processed                                       Monthly cost depends on usage

aws_db_instance.main
├─ Database instance (on-demand, db.r6g.large)                  730  hours       $175.20
└─ Storage (general purpose SSD, gp3)                           100  GB           $11.50

OVERALL TOTAL                                                                    $533.89
──────────────────────────────────
12 cloud resources were detected:
∙ 3 were estimated, 9 were free or had no cost data.
Infracost in CI/CD#
# GitHub Actions: post the cost estimate as a PR comment
# (assumes an earlier step installed Infracost and set INFRACOST_API_KEY)
- name: Infracost Breakdown
  run: |
    infracost breakdown --path=. \
      --format=json \
      --out-file=/tmp/infracost.json
- name: Infracost Comment
  run: |
    infracost comment github \
      --path=/tmp/infracost.json \
      --repo=${{ github.repository }} \
      --pull-request=${{ github.event.pull_request.number }} \
      --github-token=${{ secrets.GITHUB_TOKEN }} \
      --behavior=update
Infracost Policy (Budget Guardrails)#
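# infracost.yml: fail the PR if cost exceeds a threshold
# NOTE: treat the keys below as illustrative and verify them against the
# current Infracost documentation. Thresholds are commonly enforced through
# Infracost Cloud guardrails or a policy check on the JSON output.
version: 0.1
policies:
  - path: infrastructure/
    max_monthly_cost: 1000 # fail if estimated cost > $1000/month
The Most Expensive Resources#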
Resources that cause the biggest bill surprises:
AWS#
| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| NAT Gateway | One per AZ in dev | $32/mo each, idle | Use 1 in dev, per-AZ in prod only |
| RDS Multi-AZ | Enabled in dev | 2x instance cost | multi_az = false for dev |
| EBS volumes | gp3 200GB per instance | $16/mo each | Right-size, delete unattached |
| EKS cluster | Cluster fee exists even empty | $73/mo | Cannot avoid, factor into budget |
| Elastic IP | Allocated but unattached | $3.65/mo each | Release if unused (since February 2024 AWS bills every public IPv4 address, attached or not) |
| CloudWatch Logs | High-verbosity logging | $0.50/GB ingested | Reduce log levels in dev |
| Data transfer | Cross-AZ traffic | $0.01/GB in each direction | Place communicating resources in same AZ |
Azure#
| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| AKS node pool | Standard_D4s_v5 in dev | $140/mo per node | Use Standard_B2s for dev |
| Azure Firewall | Always-on in dev | $912/mo | Use NSGs in dev, Firewall in prod only |
| Log Analytics | Ingesting everything | $2.76/GB | Configure data collection rules |
| App Gateway v2 | Running in dev | $175/mo | Use simple LB in dev |
| Premium SSD | P30 (1TB) for small workloads | $122/mo | Use Standard SSD or right-size |
GCP#
| Resource | Common Mistake | Monthly Cost | Fix |
|---|---|---|---|
| GKE cluster | Management fee | $73/mo | Cannot avoid (or use Autopilot) |
| Cloud NAT | Per-VM charge + data | $32/mo base | Limit in dev |
| Cloud SQL | db-custom-4-16384 in dev | $230/mo | Use db-f1-micro or shared-core for dev |
| Cloud Armor | Per-policy + per-request | $7/mo + usage | Dev does not need DDoS protection |
| Persistent Disk | SSD 500GB per node | $85/mo each | Right-size, use standard PD in dev |
Right-Sizing Patterns#
Environment-Based Sizing#
variable "environment" {
  type = string
}
locals {
  sizing = {
    dev = {
      instance_type     = "t3.small"
      db_instance       = "db.t3.micro"
      node_count        = 1
      disk_size         = 20
      multi_az          = false
      nat_gateway_count = 1
    }
    staging = {
      instance_type     = "t3.medium"
      db_instance       = "db.t3.small"
      node_count        = 2
      disk_size         = 50
      multi_az          = false
      nat_gateway_count = 1
    }
    prod = {
      instance_type     = "r6g.large"
      db_instance       = "db.r6g.large"
      node_count        = 3
      disk_size         = 200
      multi_az          = true
      nat_gateway_count = 3 # one per AZ
    }
  }
  config = local.sizing[var.environment]
}
Conditional Expensive Resources#
# NAT Gateway: 1 in dev, per-AZ in prod
resource "aws_nat_gateway" "main" {
  count     = local.config.nat_gateway_count
  subnet_id = aws_subnet.public[count.index].id
  # ...
}
# WAF: prod only
resource "aws_wafv2_web_acl" "main" {
  count = var.environment == "prod" ? 1 : 0
  # ...
}
# Multi-AZ RDS: prod only
resource "aws_db_instance" "main" {
  instance_class = local.config.db_instance
  multi_az       = local.config.multi_az
  # ...
}
Tagging for Cost Allocation#
Without tags, your cloud bill is a single number. With tags, you can attribute costs to teams, projects, and environments.
Required Tags#
locals {
  required_tags = {
    Environment = var.environment
    Project     = var.project
    Team        = var.team
    ManagedBy   = "terraform"
    CostCenter  = var.cost_center
  }
}
# AWS: apply via provider default_tags
provider "aws" {
  default_tags {
    tags = local.required_tags
  }
}
# Azure: apply via the tags argument on each resource
resource "azurerm_resource_group" "main" {
  name     = "${var.project}-${var.environment}-rg"
  location = var.location
  tags     = local.required_tags
}
# GCP: apply via labels (label keys and values must be lowercase)
resource "google_compute_instance" "app" {
  labels = { for k, v in local.required_tags : lower(k) => lower(v) }
  # ...
}
Enforcing Tags with Policy#
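Terraform itself can catch a missing tag value before any cloud-side policy is involved. A minimal sketch, assuming `cost_center` (referenced in the required tags above) is declared as a plain string variable. The validation rule is a hypothetical example, not a standard:

```hcl
# Hypothetical declaration for var.cost_center with a plan-time check.
variable "cost_center" {
  type        = string
  description = "Cost center used for billing attribution"

  validation {
    # Reject empty or whitespace-only values so the CostCenter tag is never blank
    condition     = length(trimspace(var.cost_center)) > 0
    error_message = "cost_center must be a non-empty string."
  }
}
```

This only protects resources created through this configuration. An organization-level policy, as in the AWS example that follows, also covers resources created outside Terraform.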
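# AWS: SCP to deny creating resources without a CostCenter tag
# (takes effect only once attached to an OU or account, for example with
# aws_organizations_policy_attachment)
resource "aws_organizations_policy" "require_tags" {
  name = "require-cost-allocation-tags"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyUntaggedResources"
      Effect = "Deny"
      Action = ["ec2:RunInstances", "rds:CreateDBInstance", "s3:CreateBucket"]
      # Caution: with Resource = "*", ec2:RunInstances is also evaluated against
      # AMIs, subnets, and other resources that never carry request tags.
      # Consider scoping to instance and volume ARNs, and confirm that each
      # action supports aws:RequestTag before rolling this out.
      Resource = "*"
      Condition = {
        "Null" = {
          "aws:RequestTag/CostCenter" = "true"
        }
      }
    }]
  })
}
Reserved Capacity Decisions#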
When to Reserve#
| Signal | Action |
|---|---|
| Resource running 24/7 for 6+ months | Consider 1-year reserved |
| Resource running 24/7 for 12+ months with stable sizing | Consider 3-year reserved |
| Workload is bursty or experimental | Stay on-demand |
| Planning to change instance size | Wait until sizing is stable |
| Using spot-tolerant workloads (batch, CI/CD) | Use spot instances, not reserved |
Terraform and Reserved Instances#
Reserved instances (AWS RIs, Azure Reserved VM Instances, GCP CUDs) are billing constructs — they are not managed by Terraform. Terraform creates on-demand instances, and the billing discount applies automatically if a matching reservation exists.
# This is an on-demand instance in Terraform
# If you have a matching RI, AWS applies the discount automatically
resource "aws_instance" "app" {
  instance_type = "r6g.large" # matches your RI? discount applies
  # ...
}
Do not use instance_market_options for reserved instances; that block is for spot instances. Reserved instances are managed through the AWS/Azure/GCP billing console, not Terraform.
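For the spot-tolerant workloads listed in the table above, `instance_market_options` is the relevant block. A minimal sketch, assuming a hypothetical `ci_runner` instance and `ami_id` variable. The interruption settings are illustrative choices, not recommendations:

```hcl
# Hypothetical input: AMI for the CI runner image
variable "ami_id" {
  type = string
}

# Spot instance for interruptible work such as CI jobs or batch processing
resource "aws_instance" "ci_runner" {
  ami           = var.ami_id
  instance_type = "t3.medium"

  instance_market_options {
    market_type = "spot"

    spot_options {
      # One-time request: the instance is not relaunched after interruption
      spot_instance_type             = "one-time"
      instance_interruption_behavior = "terminate"
    }
  }

  tags = {
    Name = "ci-runner-spot"
  }
}
```

Because AWS can reclaim spot capacity at short notice, this pattern suits only jobs that can be retried.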
Agent Cost Awareness Workflow#
When an agent writes Terraform that creates cloud resources:
- Choose the smallest viable size: start with t3.small / Standard_B2s / e2-small for dev. Scale up based on load testing, not guessing
- Check per-environment sizing: dev should be significantly smaller than prod
- Count the NAT Gateways: one per environment for dev, per-AZ for prod
- Check for always-on expensive resources: firewalls, application gateways, premium features
- Add cost allocation tags: every resource should be attributable to a team and project
- Recommend Infracost: if not already in the CI/CD pipeline, suggest adding it
- Flag resources over $100/month: call out expensive resources explicitly in plan summaries
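Some of these checks can be encoded in the configuration so that they fail at plan time instead of relying on review. A standalone sketch, assuming hypothetical `instance_type` and `ami_id` variables and an arbitrary rule that non-prod environments stay on `t3` types. It relies on `precondition` blocks, which need Terraform 1.2 or later, and the rule should be adapted to your own sizing policy:

```hcl
variable "environment" {
  type = string

  validation {
    # Fail early on typos instead of an obscure map-lookup error later
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

# Hypothetical inputs for this sketch
variable "instance_type" {
  type    = string
  default = "t3.small"
}

variable "ami_id" {
  type = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = var.instance_type

  lifecycle {
    precondition {
      # Outside prod, only allow instance types in the t3 family
      condition     = var.environment == "prod" || can(regex("^t3\\.", var.instance_type))
      error_message = "Non-prod environments are limited to t3 instance types."
    }
  }
}
```

With `environment = "dev"` and `instance_type = "r6g.2xlarge"`, the plan stops with the error message above before anything is created.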