# Agent-Oriented Terraform
Most Terraform code is written by humans for humans. It favors abstraction, DRY principles, and deep module nesting — patterns that make sense when a human maintains a mental model of the codebase. Agents do not maintain mental models. They read code fresh each time, trace references to resolve dependencies, and reason about the full resource graph in a single context window.
The patterns that make Terraform elegant for humans make it expensive for agents. Deep module nesting multiplies the files an agent must read. Variable threading through three layers of modules hides dependencies behind indirection. Complex for_each over maps of objects creates resources that are invisible until runtime. The agent spends most of its context on navigation, not comprehension.
This article describes Terraform patterns optimized for agent authorship and maintenance — flatter, more explicit, with state decomposition that enables parallel operations.
## The Problem with Human-Oriented Abstraction
Consider a typical human-authored infrastructure setup:
```hcl
# Human-oriented: one line calls 40+ resources across 6 nested modules
module "platform" {
  source      = "./modules/platform"
  environment = var.environment
  config      = local.platform_config[var.environment]
}
```

What an agent sees: one module call with an opaque `config` object. To understand what resources exist, the agent must:
- Read `modules/platform/main.tf` — find 3 child module calls
- Read `modules/platform/modules/networking/main.tf` — find VPC, subnets, NAT, routes
- Read `modules/platform/modules/compute/main.tf` — find EKS, node groups
- Read `modules/platform/modules/database/main.tf` — find RDS, security groups
- Trace `var.config` through all 4 levels to understand what values each resource gets
- Trace outputs back up 3 levels to see what is exposed
That is 8-12 files and 3,000-10,000 tokens of context just to understand what already exists — before making any changes. The human who wrote this can hold the mental model. The agent reconstructs it from scratch every session.
## What Agents Struggle With
| Pattern | Why It Is Hard for Agents | Token Cost |
|---|---|---|
| Deep module nesting (3+ levels) | Each level requires reading additional files, tracing variables and outputs through layers | 2,000-5,000 per level |
| `for_each` over complex maps | Resources are invisible until the map is mentally unrolled; the agent cannot see resource addresses without evaluating the expression | 500-2,000 per `for_each` |
| Variable threading through modules | A value flows root → module A → module B → resource; the agent must chase the chain to find where a value originates or is consumed | 1,000-3,000 per chain |
| `dynamic` blocks with conditionals | Generated blocks are invisible in the code; the agent must evaluate conditions to know which blocks exist | 500-1,500 per dynamic block |
| `local` expressions with nested `merge`/`lookup` | Intermediate values hide the final computed result; the agent cannot see what a resource actually gets without evaluating the full chain | 500-1,000 per complex local |
## What Agents Handle Easily
| Pattern | Why It Works | Token Cost |
|---|---|---|
| Direct resource references (`aws_vpc.main.id`) | One hop — the agent reads the resource and knows everything | 50-100 per reference |
| Explicit resource blocks (no dynamic generation) | Every resource is visible in the code; `terraform state list` matches what the agent reads | 100-300 per resource |
| Flat file organization (all resources in one directory) | No file navigation needed; the full picture is in 2-4 files | 500-2,000 total |
| Simple `for_each` over a list of strings | Resources are predictable; `aws_subnet.private["us-east-1a"]` is obvious from the code | 100-200 per `for_each` |
| Outputs defined next to the resources they expose | No tracing required; the output is next to the source | 50-100 per output |
## Linear Terraform Patterns
Linear Terraform is explicit, flat, and readable top-to-bottom. Every resource is visible. Every dependency is a direct single-hop reference. The agent reads one directory of files and sees the complete infrastructure.
### Pattern 1: Flat Resource Definitions
Instead of nesting modules three levels deep, define resources directly:
```hcl
# networking.tf — all networking resources, explicit and visible
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name        = "production-vpc"
    Environment = "production"
    ManagedBy   = "terraform"
  }
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "us-east-1a"
  tags              = { Name = "production-private-us-east-1a" }
}

resource "aws_subnet" "private_b" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1b"
  tags              = { Name = "production-private-us-east-1b" }
}

resource "aws_subnet" "public_a" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.101.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
  tags                    = { Name = "production-public-us-east-1a" }
}

resource "aws_subnet" "public_b" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.102.0/24"
  availability_zone       = "us-east-1b"
  map_public_ip_on_launch = true
  tags                    = { Name = "production-public-us-east-1b" }
}
```

A human looks at this and says "not DRY — you repeated the subnet block four times." An agent looks at this and sees: four subnets, each with its explicit CIDR and AZ, each directly referencing `aws_vpc.main.id`. No loops to unroll. No maps to dereference. The dependency graph is visible in the code.
The cost of repetition is low when agents write and maintain the code. Changing a CIDR across all subnets is a simple find-and-replace — trivial for an agent. Adding a subnet is copying a block and changing three values — also trivial.
### Pattern 2: Direct References Over Output Chains
```hcl
# Human pattern: outputs chained through modules
# To find what subnet the EKS cluster uses, trace:
#   module.platform.module.networking.aws_subnet.private → output subnet_ids →
#   module.platform output networking_subnet_ids → module.platform.module.compute input

# Agent pattern: direct reference, one hop
resource "aws_eks_cluster" "main" {
  name     = "production"
  role_arn = aws_iam_role.eks_cluster.arn

  vpc_config {
    subnet_ids = [
      aws_subnet.private_a.id,
      aws_subnet.private_b.id,
    ]
    security_group_ids = [aws_security_group.eks_cluster.id]
  }
}
```

Every reference is `resource_type.name.attribute` — one hop. The agent does not need to read module outputs, follow variable threading, or understand module composition. The EKS cluster uses these specific subnets and this specific security group.
### Pattern 3: Simple `for_each` When Repetition Is Truly Uniform
Linear does not mean never using loops. When resources are genuinely identical except for one key, `for_each` over a simple list is clear:
```hcl
# Good: simple for_each, resources are predictable
variable "availability_zones" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

resource "aws_nat_gateway" "main" {
  for_each      = toset(var.availability_zones)
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = aws_subnet.public[each.key].id
  tags          = { Name = "nat-${each.key}" }
}
```

The resource addresses are obvious: `aws_nat_gateway.main["us-east-1a"]`, `aws_nat_gateway.main["us-east-1b"]`. The agent knows exactly what resources exist without evaluating complex expressions.
```hcl
# Bad: for_each over a complex map of objects
variable "services" {
  type = map(object({
    port     = number
    protocol = string
    health   = optional(object({ path = string, interval = number }))
    scaling  = optional(object({ min = number, max = number, target_cpu = number }))
  }))
}

resource "aws_ecs_service" "main" {
  for_each = var.services
  # ... 30 lines of each.value.this and each.value.that
  # with optional nested objects requiring try() and coalesce()
}
```

This is a loop generating invisible resources whose attributes depend on a complex nested object. The agent cannot know what services exist, what ports they use, or what their scaling config is without evaluating the variable. Write the services explicitly instead.
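The explicit version trades the map for visible blocks. A sketch with hypothetical service names and illustrative arguments; the cluster and task definition resources are assumed to exist elsewhere in the same directory:

```hcl
# Each service is a visible resource block with its own explicit configuration
resource "aws_ecs_service" "api" {
  name            = "api"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.api.arn
  desired_count   = 2
}

resource "aws_ecs_service" "worker" {
  name            = "worker"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.worker.arn
  desired_count   = 1
}
```

Now `terraform state list` shows `aws_ecs_service.api` and `aws_ecs_service.worker`, the same names the agent reads in the code.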
### Pattern 4: File Organization by Concern, Not Abstraction
```
# Agent-oriented: flat, by concern
infrastructure/
├── networking/
│   ├── providers.tf
│   ├── backend.tf           # key = "networking/terraform.tfstate"
│   ├── vpc.tf               # VPC, subnets, route tables, NAT, IGW
│   ├── security-groups.tf
│   ├── outputs.tf
│   └── variables.tf
├── database/
│   ├── providers.tf
│   ├── backend.tf           # key = "database/terraform.tfstate"
│   ├── rds.tf               # RDS instance, parameter group, subnet group
│   ├── data.tf              # remote_state for networking (read VPC/subnet IDs)
│   ├── outputs.tf
│   └── variables.tf
├── compute/
│   ├── providers.tf
│   ├── backend.tf           # key = "compute/terraform.tfstate"
│   ├── eks.tf               # EKS cluster, node groups, IRSA roles
│   ├── data.tf              # remote_state for networking and database
│   ├── outputs.tf
│   └── variables.tf
└── application/
    ├── providers.tf
    ├── backend.tf           # key = "application/terraform.tfstate"
    ├── helm.tf              # Helm releases
    ├── k8s.tf               # Kubernetes resources
    ├── data.tf              # remote_state for compute
    └── variables.tf
```

Each directory is an independent root module with its own state file. An agent working on the database does not need to read networking code — it reads the `data.tf` file, which declares exactly what values it imports from the networking state.
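Each `backend.tf` pins its own state key. A sketch of the networking one, reusing the `myorg-tfstate` bucket from the cross-state example in this article and assuming a DynamoDB table named `terraform-locks` for state locking:

```hcl
# networking/backend.tf — own state file, own lock
terraform {
  backend "s3" {
    bucket         = "myorg-tfstate"
    key            = "networking/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks" # assumed lock table name
    encrypt        = true
  }
}
```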
## State Decomposition for Parallelism
The single most impactful structural change for agent-managed infrastructure: separate state files per concern.
### The Monolith State Problem
```
infrastructure/
└── main.tf    # 80 resources, 1 state file, 1 lock
```

Problems:

- Single lock: Only one `plan` or `apply` can run at a time. Two agents cannot work in parallel. Two CI pipelines block each other.
- Large blast radius: Every `apply` could modify any of 80 resources. A mistake affects everything.
- Slow plans: Terraform refreshes all 80 resources on every plan, even if you only changed one.
- Conflict-prone: Any code change is potentially a merge conflict because everything is in one directory.
### Decomposed State
```
networking/   → 15 resources, own state, own lock
database/     → 10 resources, own state, own lock
compute/      → 20 resources, own state, own lock
application/  → 35 resources, own state, own lock
```

Benefits:

- Parallel operations: Four agents (or four CI jobs) can plan and apply simultaneously
- Bounded blast radius: An `apply` in `database/` cannot affect networking or compute
- Fast plans: Each plan refreshes only 10-20 resources instead of 80
- Independent changes: Code changes to `database/` do not conflict with changes to `compute/`
### Cross-State References
Decomposed state requires explicit data sharing between root modules. Use `terraform_remote_state` or, better, SSM Parameter Store read through data sources:
```hcl
# database/data.tf — read networking outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "myorg-tfstate"
    key    = "networking/terraform.tfstate"
    region = "us-east-1"
  }
}

# database/rds.tf — use networking values
resource "aws_db_subnet_group" "main" {
  name       = "production-db-subnets"
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids
}
```

The dependency is explicit: database depends on networking's outputs. The agent can see exactly what crosses the boundary.
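For this to work, the networking root module must expose the value as an output. A sketch of the producing side, which would live in `networking/outputs.tf` next to the subnet resources from Pattern 1:

```hcl
# networking/outputs.tf — the contract other root modules read
output "private_subnet_ids" {
  description = "IDs of the private subnets, consumed by database/ and compute/"
  value = [
    aws_subnet.private_a.id,
    aws_subnet.private_b.id,
  ]
}
```

The output name is the cross-state contract: renaming it breaks every consumer, so treat changes to it like an API change.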
### Apply Ordering
With decomposed state, apply order matters:
```
networking (no dependencies)
        ↓
database (depends on networking)    compute (depends on networking)
        ↓                                   ↓
application (depends on database + compute)
```

Networking first. Database and compute in parallel (both depend only on networking). Application last. An agent orchestrating this applies networking, then fans out to database and compute simultaneously, then applies application.
## When Modules Still Make Sense
Linear does not mean “never use modules.” Modules are valuable when they provide genuine reuse across multiple root modules or when a community module handles edge cases you should not re-derive:
### Use Modules For
- Community modules with significant edge case handling: `terraform-aws-modules/vpc/aws` handles 30+ edge cases (NAT gateway placement, IPv6, VPC endpoints) that you would otherwise miss.
- Cross-project reuse: If your organization deploys the same RDS pattern across 10 projects, a shared module prevents drift between them.
- Encapsulating vendor-specific complexity: A module that wraps the 15 resources needed for an EKS cluster is easier to consume than writing those 15 resources in every project.
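Consuming such a module still follows the linear rules: one level deep, explicit inputs visible at the call site. A sketch using the community VPC module, with inputs taken from its documented interface and illustrative values:

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # pin a major version; adjust to what you have tested

  name = "production"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
}
```

An agent reading this sees every input it would otherwise have to trace; nothing is threaded in from a distant locals block.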
### Do Not Use Modules For
- DRY within a single project: If you have 3 similar subnets, write 3 resource blocks. The "savings" of a module is 10 lines of HCL. The cost is an additional directory, variable threading, and output wiring.
- Abstracting away detail: A module called `platform` that creates everything is not an abstraction — it is a hiding place. The agent cannot reason about what it creates without reading inside it.
- One-level indirection "just in case": Wrapping every resource in a module for hypothetical future reuse adds overhead with no current benefit.
### Module Depth Rule
If you use modules, limit to one level of nesting. The root module calls child modules. Child modules do not call their own child modules.
```
# Good: one level
root/
├── main.tf        # calls module "vpc", module "eks", module "rds"
└── modules/
    ├── vpc/       # contains resources, no child modules
    ├── eks/       # contains resources, no child modules
    └── rds/       # contains resources, no child modules

# Bad: three levels
root/
├── main.tf        # calls module "platform"
└── modules/
    └── platform/  # calls module "networking", module "compute"
        ├── networking/   # calls module "vpc", module "subnets"
        │   ├── vpc/
        │   └── subnets/
        └── compute/
```

One level means the agent reads the root module and one directory per component. Three levels means the agent reads the root module, the platform module, the networking module, the vpc module, the subnets module — five directories to understand a VPC.
## The Human Role Shift
When agents write and maintain Terraform, the human role changes from writer to reviewer:
| Task | Human-Authored Era | Agent-Authored Era |
|---|---|---|
| Writing HCL | Human writes, reviews own work | Agent writes, human reviews plan output |
| Deciding structure | Human chooses module boundaries | Human approves decomposition, agent implements |
| Applying changes | Human runs `apply` or approves CI | Human reviews plan, approves apply (unchanged) |
| Debugging drift | Human investigates manually | Agent investigates, presents findings with options |
| Refactoring | Human plans and executes moves | Agent proposes moves, human approves, agent executes |
For the reviewer role, linear Terraform is strictly better. The plan output maps directly to visible resources in the code. No “module.platform will be updated” black boxes — every resource change is traceable to an explicit resource block.
## Migration Path: Existing Nested Code
You do not need to rewrite your Terraform overnight. Migrate incrementally:
1. New root modules: Write new infrastructure in the linear pattern. Do not extend the existing monolith.
2. State decomposition: Extract independent concerns into their own root modules using `terraform state mv` and `moved` blocks.
3. Module flattening: When modifying a deeply nested section, flatten it as part of the change. Replace the module call with explicit resources.
4. Gradual: Each pull request makes one section more linear. Over weeks, the codebase shifts.
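The module-flattening step can often avoid manual state surgery: a `moved` block (Terraform 1.1+) records the rename in code, so the next plan shows a move instead of a destroy-and-recreate. A sketch for flattening the platform module's VPC, assuming the nested addresses from the earlier example:

```hcl
# After replacing the module call with an explicit resource block,
# tell Terraform the existing state object has a new address
moved {
  from = module.platform.module.networking.aws_vpc.main
  to   = aws_vpc.main
}
```

Note that `moved` works within a single state file; moving resources between decomposed state files still requires `terraform state mv`.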
The goal is not purity — it is reducing the context cost for whoever (human or agent) works on the code next.