AWS Terraform Patterns
AWS is the most common Terraform target and the most complex. It has more services, more configuration options, and more subtle gotchas than Azure or GCP. This article covers the AWS-specific patterns that agents need to write correct, secure Terraform — with emphasis on the mistakes that cause real production issues.
IAM: The Foundation of Everything
Every AWS resource that does anything needs IAM permissions. The two patterns agents must know: service roles (letting AWS services act on your behalf) and IRSA (letting Kubernetes pods assume IAM roles).
Service Role Pattern
# The role: "who can assume this role?"
resource "aws_iam_role" "lambda_exec" {
  name = "my-lambda-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
    }]
  })
}

# The policy: "what can this role do?"
resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

# Custom policy for specific permissions
resource "aws_iam_policy" "lambda_s3_access" {
  name = "lambda-s3-read-access"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:ListBucket"]
      Resource = [
        aws_s3_bucket.data.arn,
        "${aws_s3_bucket.data.arn}/*",
      ]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_s3" {
  role       = aws_iam_role.lambda_exec.name
  policy_arn = aws_iam_policy.lambda_s3_access.arn
}

Gotcha: The assume_role_policy (trust policy) defines WHO can use the role. The attached policies define WHAT the role can do. Confusing these is the #1 IAM mistake.
Gotcha: S3 permissions need both the bucket ARN (for s3:ListBucket) and the object ARN with the /* suffix (for s3:GetObject). If the /* is missing, the apply still succeeds, but every s3:GetObject call is denied at runtime.
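One way to make the bucket/object split harder to get wrong is to give each ARN its own statement. The sketch below uses the aws_iam_policy_document data source in place of jsonencode. The names lambda_s3_access_split and lambda-s3-read-access-split are illustrative, and the block assumes the aws_s3_bucket.data bucket defined later in this article.

```hcl
# Sketch only: one statement per ARN shape, so the action/resource pairing is explicit.
data "aws_iam_policy_document" "lambda_s3_access" {
  # Bucket-level action: applies to the bucket ARN itself.
  statement {
    sid       = "ListBucket"
    effect    = "Allow"
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.data.arn]
  }

  # Object-level action: applies to keys inside the bucket, hence the /* suffix.
  statement {
    sid       = "ReadObjects"
    effect    = "Allow"
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.data.arn}/*"]
  }
}

# Hypothetical policy name; attach it with aws_iam_role_policy_attachment as usual.
resource "aws_iam_policy" "lambda_s3_access_split" {
  name   = "lambda-s3-read-access-split"
  policy = data.aws_iam_policy_document.lambda_s3_access.json
}
```

The resulting permissions match the single-statement version; the difference is that a missing /* shows up as an obviously wrong resources line during review.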
IRSA (IAM Roles for Service Accounts)
IRSA lets Kubernetes pods assume IAM roles without storing credentials:
# Enable OIDC provider for the EKS cluster
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

# Role that a specific K8s service account can assume
resource "aws_iam_role" "app_role" {
  name = "my-app-pod-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.eks.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:default:my-app"
          "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:aud" = "sts.amazonaws.com"
        }
      }
    }]
  })
}

# Kubernetes service account annotated with the IAM role
resource "kubernetes_service_account" "app" {
  metadata {
    name      = "my-app"
    namespace = "default"
    annotations = {
      "eks.amazonaws.com/role-arn" = aws_iam_role.app_role.arn
    }
  }
}

Gotcha: The Condition in the trust policy must match the exact namespace and service account name. A typo means the pod silently fails to assume the role.
Gotcha: The OIDC thumbprint changes when EKS rotates certificates. Monitor for InvalidIdentityToken errors.
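Because the namespace and service account name must match exactly, it can help to define them once and reuse them. The following is a minimal sketch, assuming the aws_eks_cluster.main and aws_iam_openid_connect_provider.eks resources from the example above. The locals, the data source name app_trust, and the role name app_role_from_locals are hypothetical.

```hcl
locals {
  # Issuer host without the scheme, as the condition keys expect it.
  oidc_host = replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")

  # Hypothetical single source of truth for the workload identity.
  app_namespace       = "default"
  app_service_account = "my-app"
}

data "aws_iam_policy_document" "app_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    # Only this namespace/service account pair may assume the role.
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_host}:sub"
      values   = ["system:serviceaccount:${local.app_namespace}:${local.app_service_account}"]
    }

    # Only tokens issued for STS are accepted.
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_host}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "app_role_from_locals" {
  name               = "my-app-pod-role-from-locals" # hypothetical
  assume_role_policy = data.aws_iam_policy_document.app_trust.json
}
```

If the kubernetes_service_account metadata reads its name and namespace from the same locals, the trust policy and the service account cannot drift apart through a typo.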
Security Groups
Principle: Separate Groups by Purpose
# One security group per logical role
resource "aws_security_group" "alb" {
  name   = "production-alb-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
    description = "HTTPS from internet"
  }
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }
}

resource "aws_security_group" "app" {
  name   = "production-app-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # only from ALB
    description     = "App port from ALB only"
  }
  # Terraform removes AWS's default allow-all egress rule, so outbound
  # access (for example, to the database) must be declared explicitly.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
    description = "Allow all outbound"
  }
}

resource "aws_security_group" "database" {
  name   = "production-database-sg"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # only from app
    description     = "PostgreSQL from app tier only"
  }
}

Gotcha: Security group rules referencing other security groups create implicit dependencies. Terraform usually handles this, but circular references (A allows B, B allows A) require aws_security_group_rule resources instead of inline rules.
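The circular case can be sketched as follows: both groups are created without inline rules, and the mutual references live in standalone aws_security_group_rule resources. The group names service_a and service_b and the port 8080 are hypothetical, and aws_vpc.main is assumed to exist as in the example above.

```hcl
# Both groups are created empty, so neither depends on the other.
resource "aws_security_group" "service_a" {
  name   = "service-a-sg" # hypothetical
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "service_b" {
  name   = "service-b-sg" # hypothetical
  vpc_id = aws_vpc.main.id
}

# A accepts traffic from B.
resource "aws_security_group_rule" "a_from_b" {
  type                     = "ingress"
  security_group_id        = aws_security_group.service_a.id
  source_security_group_id = aws_security_group.service_b.id
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  description              = "Service A port from service B"
}

# B accepts traffic from A.
resource "aws_security_group_rule" "b_from_a" {
  type                     = "ingress"
  security_group_id        = aws_security_group.service_b.id
  source_security_group_id = aws_security_group.service_a.id
  from_port                = 8080
  to_port                  = 8080
  protocol                 = "tcp"
  description              = "Service B port from service A"
}
```

Each rule depends on both groups, but the groups no longer depend on each other, so the dependency graph has no cycle. Avoid mixing inline ingress/egress blocks and standalone rule resources on the same group, since the two approaches overwrite each other.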
Gotcha: The default security group of a VPC allows all traffic between members. If you do not tighten it, any resource in the VPC can talk to any other.
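A minimal sketch of tightening the default group, assuming the aws_vpc.main resource used throughout this article. The aws_default_security_group resource does not create a new group; it adopts the VPC's existing default group and manages its rules.

```hcl
# Adopts the VPC's default security group. With no ingress or egress blocks
# declared, Terraform removes every rule, so the group allows no traffic.
resource "aws_default_security_group" "default" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "default-locked-down" # hypothetical tag value
  }
}
```

Resources that were relying on the default group will lose connectivity after this applies, so check what is attached to it first.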
S3 Buckets
resource "aws_s3_bucket" "data" {
  bucket = "myorg-production-data"
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket_versioning" "data" {
  bucket = aws_s3_bucket.data.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "data" {
  bucket                  = aws_s3_bucket.data.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Gotcha: Since AWS provider v4, bucket properties are separate resources (aws_s3_bucket_versioning, aws_s3_bucket_server_side_encryption_configuration, etc.), not inline arguments. Agents using inline arguments will get deprecation warnings or errors.
Gotcha: Always include aws_s3_bucket_public_access_block with all four flags set to true unless the bucket genuinely needs public access. Checkov will flag this.
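Lifecycle rules follow the same separate-resource pattern. The sketch below is illustrative rather than part of the original configuration: it assumes the aws_s3_bucket.data and aws_s3_bucket_versioning.data resources above, and the 90-day retention and the rule id are hypothetical values.

```hcl
resource "aws_s3_bucket_lifecycle_configuration" "data" {
  bucket = aws_s3_bucket.data.id

  # Noncurrent-version rules only make sense once versioning is enabled.
  depends_on = [aws_s3_bucket_versioning.data]

  rule {
    id     = "expire-noncurrent-versions" # hypothetical rule id
    status = "Enabled"

    # An empty filter applies the rule to every object in the bucket.
    filter {}

    # Delete old versions 90 days after they stop being current (assumed value).
    noncurrent_version_expiration {
      noncurrent_days = 90
    }
  }
}
```

With versioning enabled, old object versions otherwise accumulate indefinitely, so a rule of this kind is a common companion to aws_s3_bucket_versioning.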
Common AWS Terraform Gotchas
| Gotcha | Symptom | Fix |
|---|---|---|
| Missing depends_on for IAM | AccessDeniedException during apply | Add depends_on to resources that need IAM roles/policies |
| EKS cluster creation timeout | Apply hangs for 15+ minutes | Normal: EKS takes 10-15 min. Set timeout > 20 min |
| Subnet tag missing for EKS LB | ALB/NLB in EKS doesn’t find subnets | Add kubernetes.io/role/elb and kubernetes.io/role/internal-elb tags |
| RDS final_snapshot_identifier required | Cannot destroy RDS without setting snapshot name | Set skip_final_snapshot = true only for dev |
| S3 bucket name globally unique | BucketAlreadyExists error | Prefix with org name and environment |
| NAT Gateway costs with no traffic | $32/mo/AZ even idle | Use 1 NAT Gateway in dev, per-AZ in prod |
| Default VPC security group | Unexpected open access | Import and tighten the default SG or explicitly ignore it |
| Auto-scaling desired_capacity drift | Plan shows changes every run | Add ignore_changes = [desired_capacity] |
| EBS volumes outlive instances | Orphaned volumes after destroy | Check for delete_on_termination = true on root volumes |
| Cross-AZ data transfer costs | Surprise charges on bill | Place communicating resources in same AZ when possible |
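Several of these fixes are one or two lines of configuration. The fragments below sketch four of them (IAM depends_on, EKS subnet tags, the RDS final snapshot, and desired_capacity drift). They assume the aws_iam_role.lambda_exec role, its two policy attachments, and aws_vpc.main from earlier in this article. Every variable, CIDR block, availability zone, file path, and resource name introduced here is hypothetical.

```hcl
# Hypothetical inputs for this sketch.
variable "environment" { type = string }
variable "ami_id" { type = string }
variable "db_password" {
  type      = string
  sensitive = true
}

# IAM: create the function only after both policy attachments exist.
resource "aws_lambda_function" "worker" {
  function_name = "my-worker"
  role          = aws_iam_role.lambda_exec.arn
  runtime       = "python3.12"
  handler       = "handler.main"
  filename      = "build/worker.zip" # hypothetical artifact path

  depends_on = [
    aws_iam_role_policy_attachment.lambda_basic,
    aws_iam_role_policy_attachment.lambda_s3,
  ]
}

# EKS load balancers: public subnets serve internet-facing LBs,
# private subnets serve internal LBs.
resource "aws_subnet" "public_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.0.0/24"
  availability_zone = "us-east-1a"
  tags              = { "kubernetes.io/role/elb" = "1" }
}

resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = "10.0.10.0/24"
  availability_zone = "us-east-1a"
  tags              = { "kubernetes.io/role/internal-elb" = "1" }
}

# RDS: skip the final snapshot only in dev; name it everywhere else.
resource "aws_db_instance" "main" {
  identifier        = "production-postgres"
  engine            = "postgres"
  instance_class    = "db.t3.medium"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password

  skip_final_snapshot       = var.environment == "dev"
  final_snapshot_identifier = var.environment == "dev" ? null : "production-postgres-final"
}

# Auto Scaling: let scaling policies own desired_capacity after creation.
resource "aws_launch_template" "app" {
  name_prefix   = "production-app-"
  image_id      = var.ami_id
  instance_type = "t3.medium"
}

resource "aws_autoscaling_group" "app" {
  name                = "production-app-asg"
  min_size            = 2
  max_size            = 6
  desired_capacity    = 2
  vpc_zone_identifier = [aws_subnet.private_a.id]

  launch_template {
    id      = aws_launch_template.app.id
    version = "$Latest"
  }

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}
```

Note that ignore_changes belongs inside a lifecycle block, and that it makes Terraform ignore later changes to desired_capacity in the configuration as well as changes made by scaling policies.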