# Terraform Cloud Architecture Patterns
The three-tier architecture — networking, managed Kubernetes, managed database — is the most common pattern for production deployments on any major cloud. The concepts are identical across AWS, Azure, and GCP. The Terraform code is not. Resource names differ, required arguments differ, default behaviors differ, and the gotchas that catch agents and humans are cloud-specific.
This article shows the real Terraform for each layer on each cloud, side by side, so agents can write correct infrastructure code for whichever cloud the user deploys to.
## Concept Mapping
| Concept | AWS | Azure | GCP |
|---|---|---|---|
| Virtual network | VPC | Virtual Network (VNET) | VPC Network |
| Private subnet | Subnet + route table | Subnet (within VNET) | Subnetwork |
| Internet gateway | Internet Gateway | (implicit in VNET) | (implicit in VPC) |
| NAT for private subnets | NAT Gateway + EIP | NAT Gateway | Cloud NAT + Router |
| Managed Kubernetes | EKS | AKS | GKE |
| Node pools | Managed Node Groups | Agent Pools (default + user) | Node Pools |
| Managed database | RDS / Aurora | Azure Database for PostgreSQL Flexible Server | Cloud SQL |
| IAM for workloads | IRSA (IAM Roles for Service Accounts) | Azure AD Workload Identity | GKE Workload Identity |
| DNS | Route 53 | Azure DNS | Cloud DNS |
| Object storage | S3 | Blob Storage | Cloud Storage |
| Load balancer | ALB / NLB | Azure Load Balancer | Cloud Load Balancing |
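The examples in this article assume providers are already configured. A minimal sketch, with version pins and regions that are illustrative rather than prescribed by the patterns themselves:

```hcl
terraform {
  required_providers {
    aws     = { source = "hashicorp/aws", version = "~> 5.0" }
    azurerm = { source = "hashicorp/azurerm", version = "~> 3.0" }
    google  = { source = "hashicorp/google", version = "~> 5.0" }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "azurerm" {
  features {} # required even when empty
}

provider "google" {
  project = var.project_id
  region  = "us-central1"
}
```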
## Layer 1: Networking

### AWS VPC

```hcl
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "production-vpc" }
}
resource "aws_subnet" "private_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.1.0/24"
availability_zone = "us-east-1a"
tags = { Name = "private-us-east-1a", "kubernetes.io/role/internal-elb" = "1" }
}
resource "aws_subnet" "private_b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.2.0/24"
availability_zone = "us-east-1b"
tags = { Name = "private-us-east-1b", "kubernetes.io/role/internal-elb" = "1" }
}
resource "aws_subnet" "public_a" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.101.0/24"
availability_zone = "us-east-1a"
map_public_ip_on_launch = true
tags = { Name = "public-us-east-1a", "kubernetes.io/role/elb" = "1" }
}
resource "aws_subnet" "public_b" {
vpc_id = aws_vpc.main.id
cidr_block = "10.0.102.0/24"
availability_zone = "us-east-1b"
map_public_ip_on_launch = true
tags = { Name = "public-us-east-1b", "kubernetes.io/role/elb" = "1" }
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
resource "aws_eip" "nat" {
domain = "vpc"
}
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public_a.id
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
resource "aws_route_table_association" "private_a" {
subnet_id = aws_subnet.private_a.id
route_table_id = aws_route_table.private.id
}
resource "aws_route_table_association" "private_b" {
subnet_id = aws_subnet.private_b.id
route_table_id = aws_route_table.private.id
}
resource "aws_route_table_association" "public_a" {
subnet_id = aws_subnet.public_a.id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "public_b" {
subnet_id = aws_subnet.public_b.id
route_table_id = aws_route_table.public.id
}AWS gotchas:
- Subnets need `kubernetes.io/role/elb` and `kubernetes.io/role/internal-elb` tags for EKS load balancer discovery
- NAT Gateway costs ~$32/mo per AZ even with zero traffic — use one for dev, per-AZ for prod (see the sketch after this list)
- The VPC's default security group allows all outbound traffic — tighten it
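For production, a per-AZ NAT Gateway removes the cross-AZ single point of failure. A minimal sketch of the pattern, reusing the two public subnets above; the `for_each` map and resource names are illustrative, not part of the baseline:

```hcl
locals {
  public_subnets = {
    a = aws_subnet.public_a.id
    b = aws_subnet.public_b.id
  }
}

resource "aws_eip" "nat_per_az" {
  for_each = local.public_subnets
  domain   = "vpc"
}

resource "aws_nat_gateway" "per_az" {
  for_each      = local.public_subnets
  allocation_id = aws_eip.nat_per_az[each.key].id
  subnet_id     = each.value
}

# Each private subnet then gets its own route table pointing at the NAT
# Gateway in the same AZ (one aws_route_table + association per AZ).
```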
### Azure VNET

```hcl
resource "azurerm_resource_group" "main" {
name = "production-rg"
location = "eastus"
}
resource "azurerm_virtual_network" "main" {
name = "production-vnet"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
address_space = ["10.0.0.0/16"]
}
resource "azurerm_subnet" "aks" {
name = "aks-subnet"
resource_group_name = azurerm_resource_group.main.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.0.1.0/24"]
}
resource "azurerm_subnet" "database" {
name = "database-subnet"
resource_group_name = azurerm_resource_group.main.name
virtual_network_name = azurerm_virtual_network.main.name
address_prefixes = ["10.0.2.0/24"]
delegation {
name = "postgresql-delegation"
service_delegation {
name = "Microsoft.DBforPostgreSQL/flexibleServers"
actions = ["Microsoft.Network/virtualNetworks/subnets/join/action"]
}
}
}
resource "azurerm_nat_gateway" "main" {
name = "production-natgw"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
sku_name = "Standard"
}
resource "azurerm_public_ip" "nat" {
name = "nat-public-ip"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
allocation_method = "Static"
sku = "Standard"
}
resource "azurerm_nat_gateway_public_ip_association" "main" {
nat_gateway_id = azurerm_nat_gateway.main.id
public_ip_address_id = azurerm_public_ip.nat.id
}
resource "azurerm_subnet_nat_gateway_association" "aks" {
subnet_id = azurerm_subnet.aks.id
nat_gateway_id = azurerm_nat_gateway.main.id
}Azure gotchas:
- Everything requires a Resource Group — create it first
- Subnets for Azure Database for PostgreSQL Flexible Server need a `delegation` block
- AKS with Azure CNI requires a subnet large enough for pods (each pod gets a VNET IP) — a /24 is only 251 usable IPs (a sizing sketch follows this list)
- No explicit Internet Gateway — VNETs have implicit internet access via the Azure backbone
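The sizing check can be encoded directly in Terraform. This sketch assumes the AKS defaults (30 pods per node with Azure CNI) and the node counts used later in this article; the numbers are illustrative:

```hcl
locals {
  max_nodes         = 6  # max_count from the AKS node pool below
  max_pods_per_node = 30 # AKS default for Azure CNI

  # Each node consumes 1 IP plus one per pod; Azure reserves 5 IPs per subnet.
  required_ips = local.max_nodes * (local.max_pods_per_node + 1) + 5 # = 191
  # A /24 (251 usable) covers this, but doubling max_pods_per_node would not fit.
}
```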
### GCP VPC

```hcl
resource "google_compute_network" "main" {
name = "production-vpc"
auto_create_subnetworks = false # custom mode — we define our own subnets
}
resource "google_compute_subnetwork" "gke" {
name = "gke-subnet"
region = "us-central1"
network = google_compute_network.main.id
ip_cidr_range = "10.0.1.0/24"
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "10.1.0.0/16" # GKE pods get secondary range
}
secondary_ip_range {
range_name = "services"
ip_cidr_range = "10.2.0.0/20" # GKE services get secondary range
}
}
resource "google_compute_subnetwork" "database" {
name = "database-subnet"
region = "us-central1"
network = google_compute_network.main.id
ip_cidr_range = "10.0.2.0/24"
}
resource "google_compute_router" "main" {
name = "production-router"
region = "us-central1"
network = google_compute_network.main.id
}
resource "google_compute_router_nat" "main" {
name = "production-nat"
router = google_compute_router.main.name
region = "us-central1"
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}GCP gotchas:
- `auto_create_subnetworks = false` is required for custom networking — the default creates a subnet in every region, which you probably do not want
- GKE requires secondary IP ranges on the subnet for pods and services (VPC-native mode)
- Cloud NAT requires a Cloud Router — two resources instead of one (a scoping sketch follows this list)
- No Internet Gateway resource — internet access is implicit for instances with external IPs or via Cloud NAT
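If you do not want every subnet NATed, Cloud NAT can also be scoped to specific subnetworks. A sketch, as an alternative to the `ALL_SUBNETWORKS_ALL_IP_RANGES` NAT above (limiting egress to the GKE subnet here is an illustrative choice):

```hcl
resource "google_compute_router_nat" "scoped" {
  name                               = "production-nat-scoped"
  router                             = google_compute_router.main.name
  region                             = "us-central1"
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "LIST_OF_SUBNETWORKS"

  # Only the GKE subnet gets NAT; the database subnet stays fully private.
  subnetwork {
    name                    = google_compute_subnetwork.gke.id
    source_ip_ranges_to_nat = ["ALL_IP_RANGES"]
  }
}
```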
## Layer 2: Managed Kubernetes

### AWS EKS

```hcl
resource "aws_iam_role" "eks_cluster" {
name = "production-eks-cluster-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "eks.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy_attachment" "eks_cluster" {
role = aws_iam_role.eks_cluster.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
resource "aws_eks_cluster" "main" {
name = "production"
role_arn = aws_iam_role.eks_cluster.arn
version = "1.29"
vpc_config {
subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
endpoint_private_access = true
endpoint_public_access = true
}
depends_on = [aws_iam_role_policy_attachment.eks_cluster]
}
resource "aws_iam_role" "eks_nodes" {
name = "production-eks-node-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ec2.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy_attachment" "eks_worker" {
role = aws_iam_role.eks_nodes.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
}
resource "aws_iam_role_policy_attachment" "eks_cni" {
role = aws_iam_role.eks_nodes.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}
resource "aws_iam_role_policy_attachment" "ecr_read" {
role = aws_iam_role.eks_nodes.name
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}
resource "aws_eks_node_group" "main" {
cluster_name = aws_eks_cluster.main.name
node_group_name = "production-nodes"
node_role_arn = aws_iam_role.eks_nodes.arn
subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
instance_types = ["t3.large"]
scaling_config {
desired_size = 3
max_size = 6
min_size = 2
}
depends_on = [
aws_iam_role_policy_attachment.eks_worker,
aws_iam_role_policy_attachment.eks_cni,
aws_iam_role_policy_attachment.ecr_read,
]
}EKS gotchas:
- EKS requires explicit IAM roles for both the cluster and the node group — 2 roles, 4 policy attachments minimum (pod-level IAM via IRSA is sketched after this list)
- `depends_on` is required for the IAM attachments, or cluster/node group creation races ahead of the policies
- EKS cluster creation takes 10-15 minutes
- The cluster endpoint defaults to public — set `endpoint_public_access = false` for private clusters
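The node role above grants permissions to every pod on a node. For pod-level permissions, the IRSA pattern from the concept table ties an IAM role to a Kubernetes service account via the cluster's OIDC issuer. A minimal sketch; the role name and the `default:app` service account are assumptions for illustration:

```hcl
# Requires the hashicorp/tls provider for the certificate data source.
data "tls_certificate" "eks" {
  url = aws_eks_cluster.main.identity[0].oidc[0].issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  url             = aws_eks_cluster.main.identity[0].oidc[0].issuer
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
}

resource "aws_iam_role" "app" {
  name = "app-irsa-role" # hypothetical role for one workload

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRoleWithWebIdentity"
      Principal = { Federated = aws_iam_openid_connect_provider.eks.arn }
      Condition = {
        StringEquals = {
          # Scope the role to one namespace/service-account pair
          "${replace(aws_eks_cluster.main.identity[0].oidc[0].issuer, "https://", "")}:sub" = "system:serviceaccount:default:app"
        }
      }
    }]
  })
}
```

The pod then assumes the role by annotating its Kubernetes service account with `eks.amazonaws.com/role-arn`.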
### Azure AKS

```hcl
resource "azurerm_kubernetes_cluster" "main" {
name = "production-aks"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
dns_prefix = "production"
kubernetes_version = "1.29"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D2s_v5"
vnet_subnet_id = azurerm_subnet.aks.id
min_count = 2
max_count = 6
enable_auto_scaling = true
}
identity {
type = "SystemAssigned"
}
network_profile {
network_plugin = "azure" # Azure CNI — pods get VNET IPs
service_cidr = "10.3.0.0/16"
dns_service_ip = "10.3.0.10"
}
oidc_issuer_enabled = true # for Workload Identity
workload_identity_enabled = true
}AKS gotchas:
- AKS is significantly simpler than EKS — no separate IAM roles to create; `identity { type = "SystemAssigned" }` handles it (a Workload Identity sketch follows this list)
- Azure CNI gives pods real VNET IPs, which means the subnet must be large enough — plan for `max_pods_per_node × max_nodes` IPs
- `service_cidr` must not overlap with the VNET address space
- AKS creates a second resource group for node resources (`MC_*`) — do not delete it
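The `oidc_issuer_enabled` and `workload_identity_enabled` flags only enable the machinery; each workload still needs a user-assigned identity federated to its Kubernetes service account. A sketch, where the identity name and the `default:app` subject are illustrative:

```hcl
resource "azurerm_user_assigned_identity" "app" {
  name                = "app-identity" # hypothetical per-workload identity
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}

resource "azurerm_federated_identity_credential" "app" {
  name                = "app-federated-credential"
  resource_group_name = azurerm_resource_group.main.name
  parent_id           = azurerm_user_assigned_identity.app.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.main.oidc_issuer_url
  subject             = "system:serviceaccount:default:app"
}
```

The pod opts in via the `azure.workload.identity/use: "true"` label and a service account annotated with the identity's client ID.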
### GCP GKE

```hcl
resource "google_service_account" "gke_nodes" {
account_id = "gke-node-sa"
display_name = "GKE Node Service Account"
}
resource "google_project_iam_member" "gke_log_writer" {
project = var.project_id
role = "roles/logging.logWriter"
member = "serviceAccount:${google_service_account.gke_nodes.email}"
}
resource "google_project_iam_member" "gke_metric_writer" {
project = var.project_id
role = "roles/monitoring.metricWriter"
member = "serviceAccount:${google_service_account.gke_nodes.email}"
}
resource "google_container_cluster" "main" {
name = "production"
location = "us-central1"
network = google_compute_network.main.id
subnetwork = google_compute_subnetwork.gke.id
ip_allocation_policy {
cluster_secondary_range_name = "pods"
services_secondary_range_name = "services"
}
min_master_version = "1.29"
# We manage node pools separately
remove_default_node_pool = true
initial_node_count = 1
workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}
}
resource "google_container_node_pool" "main" {
name = "production-nodes"
location = "us-central1"
cluster = google_container_cluster.main.name
node_count = 3
autoscaling {
min_node_count = 2
max_node_count = 6
}
node_config {
machine_type = "e2-standard-2"
service_account = google_service_account.gke_nodes.email
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
workload_metadata_config {
mode = "GKE_METADATA" # enables Workload Identity
}
}
}GKE gotchas:
- `remove_default_node_pool = true` plus a separate node pool is the recommended pattern — the default pool cannot be customized
- `initial_node_count = 1` is required even with `remove_default_node_pool` — it creates one node, then immediately deletes it
- Secondary IP ranges must be pre-configured on the subnet (done in the networking layer)
- Workload Identity requires both cluster config (`workload_pool`) and node config (`GKE_METADATA`) — a per-workload sketch follows this list
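As on the other clouds, the cluster-level config only enables Workload Identity; each workload needs a Google service account bound to its Kubernetes service account. A sketch, where `app-sa` and the `default/app` namespace/service-account pair are illustrative:

```hcl
resource "google_service_account" "app" {
  account_id   = "app-sa" # hypothetical per-workload service account
  display_name = "App Workload Identity SA"
}

resource "google_service_account_iam_member" "app_wi" {
  service_account_id = google_service_account.app.name
  role               = "roles/iam.workloadIdentityUser"
  # Grant the KSA "app" in namespace "default" the right to impersonate
  member = "serviceAccount:${var.project_id}.svc.id.goog[default/app]"
}
```

The Kubernetes service account is then annotated with `iam.gke.io/gcp-service-account` pointing at the Google service account's email.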
## Layer 3: Managed Database

### AWS RDS PostgreSQL

```hcl
resource "aws_db_subnet_group" "main" {
name = "production-db-subnets"
subnet_ids = [aws_subnet.private_a.id, aws_subnet.private_b.id]
}
resource "aws_security_group" "database" {
name = "production-database-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_eks_cluster.main.vpc_config[0].cluster_security_group_id]
}
}
resource "aws_db_instance" "main" {
identifier = "production-postgres"
engine = "postgres"
engine_version = "15.4"
instance_class = "db.t3.medium"
allocated_storage = 50
max_allocated_storage = 200
storage_encrypted = true
db_name = "appdb"
username = "dbadmin"
password = var.db_password # use Secrets Manager in production
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [aws_security_group.database.id]
multi_az = true
publicly_accessible = false
backup_retention_period = 7
skip_final_snapshot = false
final_snapshot_identifier = "production-postgres-final"
lifecycle {
prevent_destroy = true
}
}Azure Database for PostgreSQL Flexible#
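The `# use Secrets Manager in production` comment can be made concrete by generating and storing the password in Terraform rather than passing it in as a variable. A sketch, where the secret name is an assumed convention and the `random` provider is required:

```hcl
resource "random_password" "db" {
  length  = 32
  special = false
}

resource "aws_secretsmanager_secret" "db_password" {
  name = "production/db-password" # assumed naming convention
}

resource "aws_secretsmanager_secret_version" "db_password" {
  secret_id     = aws_secretsmanager_secret.db_password.id
  secret_string = random_password.db.result
}

# In aws_db_instance, reference the generated value:
#   password = random_password.db.result
```

Newer AWS provider versions also support `manage_master_user_password = true` on `aws_db_instance`, which has RDS create and rotate the secret itself.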
resource "azurerm_private_dns_zone" "postgres" {
name = "production.postgres.database.azure.com"
resource_group_name = azurerm_resource_group.main.name
}
resource "azurerm_private_dns_zone_virtual_network_link" "postgres" {
name = "postgres-vnet-link"
resource_group_name = azurerm_resource_group.main.name
private_dns_zone_name = azurerm_private_dns_zone.postgres.name
virtual_network_id = azurerm_virtual_network.main.id
}
resource "azurerm_postgresql_flexible_server" "main" {
name = "production-postgres"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
sku_name = "GP_Standard_D2s_v3"
version = "15"
storage_mb = 65536
delegated_subnet_id = azurerm_subnet.database.id
private_dns_zone_id = azurerm_private_dns_zone.postgres.id
administrator_login = "dbadmin"
administrator_password = var.db_password
backup_retention_days = 7
geo_redundant_backup_enabled = false
lifecycle {
prevent_destroy = true
}
depends_on = [azurerm_private_dns_zone_virtual_network_link.postgres]
}
resource "azurerm_postgresql_flexible_server_database" "main" {
name = "appdb"
server_id = azurerm_postgresql_flexible_server.main.id
charset = "UTF8"
collation = "en_US.utf8"
}GCP Cloud SQL PostgreSQL#
resource "google_compute_global_address" "private_ip" {
name = "private-ip-address"
purpose = "VPC_PEERING"
address_type = "INTERNAL"
prefix_length = 16
network = google_compute_network.main.id
}
resource "google_service_networking_connection" "private_vpc" {
network = google_compute_network.main.id
service = "servicenetworking.googleapis.com"
reserved_peering_ranges = [google_compute_global_address.private_ip.name]
}
resource "google_sql_database_instance" "main" {
name = "production-postgres"
database_version = "POSTGRES_15"
region = "us-central1"
settings {
tier = "db-custom-2-8192" # 2 vCPU, 8 GB RAM
disk_size = 50
disk_autoresize = true
availability_type = "REGIONAL" # HA with automatic failover
ip_configuration {
ipv4_enabled = false
private_network = google_compute_network.main.id
}
backup_configuration {
enabled = true
point_in_time_recovery_enabled = true
start_time = "03:00"
}
}
deletion_protection = true
depends_on = [google_service_networking_connection.private_vpc]
}
resource "google_sql_database" "main" {
name = "appdb"
instance = google_sql_database_instance.main.name
}
resource "google_sql_user" "admin" {
name = "dbadmin"
instance = google_sql_database_instance.main.name
password = var.db_password
}Cloud-Specific Complexity Comparison#
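All three database examples reference `var.db_password`, and the GCP examples reference `var.project_id`, without declaring them. A minimal declaration sketch (descriptions are assumptions):

```hcl
variable "project_id" {
  type        = string
  description = "GCP project to deploy into"
}

variable "db_password" {
  type      = string
  sensitive = true # keeps the value out of plan output, not out of state
}
```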
## Cloud-Specific Complexity Comparison

| Aspect | AWS | Azure | GCP |
|---|---|---|---|
| Networking setup | Explicit (IGW, NAT, routes, associations) | Moderate (NAT + delegation) | Moderate (Router + NAT, secondary ranges) |
| Kubernetes IAM | Complex (2 roles, 4+ policies, depends_on) | Simple (SystemAssigned identity) | Moderate (service account, 2 role bindings) |
| Kubernetes creation time | 10-15 min | 5-10 min | 5-10 min |
| Database private access | Security group + subnet group | Private DNS zone + delegation + link | VPC peering + private network |
| Total resources for 3-tier | ~25-30 | ~15-20 | ~18-22 |
| Total lines of Terraform | ~250-300 | ~150-200 | ~200-250 |
AWS requires the most explicit configuration (especially IAM). Azure is the most opinionated (less to configure but fewer options). GCP falls in the middle (secondary IP ranges and service networking add complexity but the rest is streamlined).
## Agents: Choosing the Right Pattern
When a user says “set up infrastructure on [cloud],” use these as starting templates. Adjust sizing, naming, and CIDR ranges to match the user’s requirements. The patterns above represent production-ready baselines — they include encryption, private networking, multi-AZ, backups, and lifecycle protection.
For dev/test environments, reduce: single-AZ, smaller instances, `skip_final_snapshot = true`, `prevent_destroy` removed (a sketch of this toggle follows). For production, keep all safety measures and add monitoring, alerting, and cost tags.
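A common way to express the dev/prod split is a single environment variable driving the risky settings. A sketch, where the variable name and the RDS mapping are illustrative; the same locals apply on any cloud:

```hcl
variable "environment" {
  type    = string
  default = "dev"

  validation {
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "environment must be \"dev\" or \"prod\"."
  }
}

locals {
  is_prod = var.environment == "prod"
}

# Applied to the RDS example above:
#   multi_az            = local.is_prod
#   skip_final_snapshot = !local.is_prod
#
# Note: lifecycle.prevent_destroy must be a literal, so it cannot be toggled
# by a variable; keep it in prod-only modules or workspaces instead.
```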