Multi-Account Cloud Architecture with Terraform#

Single-account cloud deployments work for learning and prototypes. Production systems need multiple accounts (AWS), subscriptions (Azure), or projects (GCP) for isolation — security boundaries, blast radius control, billing separation, and compliance requirements.

Terraform manages multi-account architectures well, but the patterns differ significantly from single-account work. Provider configuration, state isolation, cross-account references, and IAM trust relationships all need explicit design.

Why Multiple Accounts#

| Reason | Single-Account Problem | Multi-Account Solution |
|---|---|---|
| Blast radius | Misconfigured IAM affects everything | Damage limited to one account |
| Billing | Cannot attribute costs to teams | Per-account billing and budgets |
| Compliance | PCI data mixed with dev workloads | Separate accounts for regulated workloads |
| Service limits | Default quota of 5 VPCs per region is shared by every workload | Each account has its own quotas |
| Access control | Complex IAM policies to isolate teams | Account boundary is the strongest isolation |
| Testing | Dev resources can affect production | Dev credentials cannot reach prod resources unless a cross-account role explicitly allows it |

AWS Organizations#

Organization Structure#

Organization Root
├── Core OU
│   ├── Management Account (billing, org management)
│   ├── Security Account (GuardDuty, SecurityHub, audit logs)
│   └── Networking Account (Transit Gateway, shared VPCs)
├── Workload OU
│   ├── Production OU
│   │   ├── App-A Production Account
│   │   └── App-B Production Account
│   └── Non-Production OU
│       ├── App-A Development Account
│       └── App-A Staging Account
└── Sandbox OU
    └── Developer Sandbox Accounts

Terraform for AWS Organizations#

resource "aws_organizations_organization" "main" {
  feature_set = "ALL"

  enabled_policy_types = [
    "SERVICE_CONTROL_POLICY",
    "TAG_POLICY",
  ]
}

resource "aws_organizations_organizational_unit" "core" {
  name      = "Core"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "production" {
  name      = "Production"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

# Create a workload account
resource "aws_organizations_account" "app_production" {
  name      = "app-a-production"
  email     = "aws+app-a-prod@example.com"
  parent_id = aws_organizations_organizational_unit.production.id
  role_name = "OrganizationAccountAccessRole"  # cross-account admin role

  lifecycle {
    prevent_destroy = true  # accounts cannot be easily recreated
  }
}

Service Control Policies (SCPs)#

SCPs set permission boundaries for entire OUs:

resource "aws_organizations_policy" "deny_root_actions" {
  name    = "deny-root-user-actions"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyRootUser"
      Effect    = "Deny"
      Action    = "*"
      Resource  = "*"
      Condition = {
        StringLike = {
          "aws:PrincipalArn" = "arn:aws:iam::*:root"
        }
      }
    }]
  })
}

resource "aws_organizations_policy" "deny_region" {
  name    = "restrict-regions"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "DenyNonApprovedRegions"
      Effect    = "Deny"
      NotAction = [
        "iam:*", "sts:*", "organizations:*",
        "support:*", "budgets:*",
      ]
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1", "us-west-2", "eu-west-1"]
        }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "deny_region_workloads" {
  policy_id = aws_organizations_policy.deny_region.id
  target_id = aws_organizations_organizational_unit.workloads.id
}
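
The root-user policy defined above is never attached in this listing, and an SCP has no effect until it is attached to a root, OU, or account. A minimal sketch, assuming the policy should cover the whole Workloads OU (the resource name `deny_root_workloads` is illustrative):

```hcl
# Attach the root-user deny policy to the Workloads OU.
# Assumption: every account under Workloads should be covered. Attach at a
# narrower OU if some accounts still need root access for break-glass tasks.
resource "aws_organizations_policy_attachment" "deny_root_workloads" {
  policy_id = aws_organizations_policy.deny_root_actions.id
  target_id = aws_organizations_organizational_unit.workloads.id
}
```

SCPs attached to an OU are inherited by its child OUs and accounts. They never restrict principals in the management account itself.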

Cross-Account Provider Aliasing#

The key pattern for multi-account Terraform: use assume_role in provider blocks to operate in different accounts from a single Terraform configuration.

# Default provider — management account
provider "aws" {
  region = "us-east-1"
}

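# Provider for the networking account.
# Assumes aws_organizations_account.networking is declared like app_production above.
# Provider arguments must be known at plan time, so these references only resolve
# once the accounts exist in state. On a first run, create the accounts in a
# separate configuration or pass the account IDs in as variables.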
provider "aws" {
  alias  = "networking"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.networking.id}:role/OrganizationAccountAccessRole"
  }
}

# Provider for the production account
provider "aws" {
  alias  = "production"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.app_production.id}:role/OrganizationAccountAccessRole"
  }
}

# Create a VPC in the networking account
resource "aws_vpc" "shared" {
  provider   = aws.networking
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "shared-vpc" }
}

# Create resources in the production account
resource "aws_iam_role" "app_role" {
  provider = aws.production
  name     = "app-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

Gotcha: The assumed role must already exist in the target account. AWS Organizations creates OrganizationAccountAccessRole automatically in accounts it creates (not in accounts invited into the organization), but that role grants full administrator access. Use it to bootstrap, then create least-privilege roles for Terraform.
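
A least-privilege Terraform role might look like the sketch below. The variable `management_account_id`, the `terraform-ci` role it refers to, and the permission list are all hypothetical placeholders. The real permission set depends on what the configuration manages.

```hcl
# Hypothetical names throughout: var.management_account_id, the
# "terraform-ci" role in the management account, and the actions listed
# below are placeholders for illustration only.
variable "management_account_id" {
  type        = string
  description = "Account ID that hosts the principal running Terraform"
}

resource "aws_iam_role" "terraform_deployer" {
  provider = aws.production
  name     = "terraform-deployer"

  # Trust policy: only the named CI role in the management account may
  # assume this role. That role must already exist, or IAM rejects the policy.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        AWS = "arn:aws:iam::${var.management_account_id}:role/terraform-ci"
      }
    }]
  })
}

# Scope permissions to what this configuration manages instead of
# granting AdministratorAccess.
resource "aws_iam_role_policy" "terraform_deployer" {
  provider = aws.production
  name     = "terraform-deployer-permissions"
  role     = aws_iam_role.terraform_deployer.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ecs:*", "ec2:Describe*", "logs:*"]
      Resource = "*"
    }]
  })
}
```

Once this role exists, the `assume_role` block of the production provider can point at `terraform-deployer` instead of OrganizationAccountAccessRole.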

Gotcha: With provider aliases, a single state file holds resources from every account, and the credentials that apply it can assume a role in each one. If that state or those credentials are compromised, all accounts are exposed. Consider separate state files per account.

Azure Management Groups#

Hierarchy Structure#

Tenant Root Group
├── Platform
│   ├── Identity (Azure AD, DNS)
│   ├── Management (monitoring, automation)
│   └── Connectivity (hub VNETs, ExpressRoute, Firewall)
├── Landing Zones
│   ├── Production
│   │   ├── App-A-Prod Subscription
│   │   └── App-B-Prod Subscription
│   └── Non-Production
│       ├── App-A-Dev Subscription
│       └── App-A-Staging Subscription
└── Sandbox
    └── Developer Sandboxes

Terraform for Management Groups#

resource "azurerm_management_group" "platform" {
  display_name = "Platform"
}

resource "azurerm_management_group" "landing_zones" {
  display_name = "Landing Zones"
}

resource "azurerm_management_group" "production" {
  display_name               = "Production"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

# Policy assignment at the management group level.
# Uses the built-in definition "Require a tag on resources", which takes only tagName.
resource "azurerm_management_group_policy_assignment" "require_tags" {
  name                 = "require-cost-center-tag"
  management_group_id  = azurerm_management_group.landing_zones.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99"

  parameters = jsonencode({
    tagName = { value = "CostCenter" }
  })
}
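
The hierarchy above places subscriptions under management groups, but the listing only creates the groups. One way to attach a subscription is sketched below, assuming `var.app_production_subscription_id` holds the bare subscription GUID:

```hcl
# Assumption: var.app_production_subscription_id is the bare GUID. The
# association resource expects the full "/subscriptions/<id>" resource ID.
resource "azurerm_management_group_subscription_association" "app_prod" {
  management_group_id = azurerm_management_group.production.id
  subscription_id     = "/subscriptions/${var.app_production_subscription_id}"
}
```

Policies assigned at Landing Zones then apply to the subscription through inheritance.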

Multi-Subscription Provider Configuration#

# Default provider — platform subscription
provider "azurerm" {
  features {}
  subscription_id = var.platform_subscription_id
}

# Provider for each workload subscription
provider "azurerm" {
  alias           = "app_production"
  features {}
  subscription_id = var.app_production_subscription_id
}

provider "azurerm" {
  alias           = "app_dev"
  features {}
  subscription_id = var.app_dev_subscription_id
}

# Hub VNET in platform subscription
resource "azurerm_virtual_network" "hub" {
  provider            = azurerm
  name                = "hub-vnet"
  resource_group_name = azurerm_resource_group.connectivity.name
  location            = "eastus"
  address_space       = ["10.0.0.0/16"]
}

# Spoke VNET in production subscription
resource "azurerm_virtual_network" "spoke_prod" {
  provider            = azurerm.app_production
  name                = "app-prod-vnet"
  resource_group_name = azurerm_resource_group.prod_networking.name
  location            = "eastus"
  address_space       = ["10.1.0.0/16"]
}

# VNET peering: hub to spoke
resource "azurerm_virtual_network_peering" "hub_to_prod" {
  provider                  = azurerm
  name                      = "hub-to-app-prod"
  resource_group_name       = azurerm_resource_group.connectivity.name
  virtual_network_name      = azurerm_virtual_network.hub.name
  remote_virtual_network_id = azurerm_virtual_network.spoke_prod.id
  allow_forwarded_traffic   = true
}

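Gotcha: Azure VNET peering must be created from both sides. You need one azurerm_virtual_network_peering resource in the hub subscription and another in the spoke subscription. Until both exist, the peering stays in the Initiated state instead of Connected.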
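
The spoke-side half of the peering can be sketched as follows, reusing the resources and provider alias from the listing above (the resource and peering names are illustrative):

```hcl
# Spoke-to-hub half of the peering, created in the production subscription.
resource "azurerm_virtual_network_peering" "prod_to_hub" {
  provider                  = azurerm.app_production
  name                      = "app-prod-to-hub"
  resource_group_name       = azurerm_resource_group.prod_networking.name
  virtual_network_name      = azurerm_virtual_network.spoke_prod.name
  remote_virtual_network_id = azurerm_virtual_network.hub.id
  allow_forwarded_traffic   = true
}
```

The identity running Terraform needs permission to modify both VNETs, since each half references a network in the other subscription.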

GCP Organizations#

Hierarchy Structure#

Organization (example.com)
└── Folders
    ├── Platform
    │   ├── networking-prod (Shared VPC host)
    │   ├── security-prod (audit logs, SCC)
    │   └── monitoring-prod (Cloud Monitoring workspace)
    ├── Production
    │   ├── app-a-prod
    │   └── app-b-prod
    ├── Non-Production
    │   ├── app-a-dev
    │   └── app-a-staging
    └── Sandbox
        └── developer sandboxes

Terraform for GCP Organization#

resource "google_folder" "platform" {
  display_name = "Platform"
  parent       = "organizations/${var.org_id}"
}

resource "google_folder" "production" {
  display_name = "Production"
  parent       = "organizations/${var.org_id}"
}

# Create a project in the production folder
resource "google_project" "app_prod" {
  name            = "App A Production"
  project_id      = "myorg-app-a-prod"
  folder_id       = google_folder.production.name
  billing_account = var.billing_account_id

  labels = {
    environment = "production"
    team        = "app-a"
  }
}

# Enable required APIs in the new project
resource "google_project_service" "app_prod_apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
  ])

  project            = google_project.app_prod.project_id
  service            = each.value
  disable_on_destroy = false
}

Organization Policies#

# Restrict VM external IPs at the organization level
resource "google_organization_policy" "deny_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"

  list_policy {
    deny {
      all = true
    }
  }
}

# Allow specific regions only
resource "google_organization_policy" "allowed_locations" {
  org_id     = var.org_id
  constraint = "gcp.resourceLocations"

  list_policy {
    allow {
      values = ["in:us-locations", "in:eu-locations"]
    }
  }
}

Shared VPC Pattern#

GCP’s Shared VPC lets a host project own the network and service projects use it:

# Host project owns the VPC
resource "google_compute_shared_vpc_host_project" "host" {
  project = google_project.networking.project_id
}

# Service project uses the shared VPC
resource "google_compute_shared_vpc_service_project" "app_prod" {
  host_project    = google_project.networking.project_id
  service_project = google_project.app_prod.project_id

  depends_on = [google_compute_shared_vpc_host_project.host]
}
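
Attaching a service project does not by itself let its principals place resources on the shared network. They also need the `roles/compute.networkUser` role on the host project or on individual subnets. A subnet-level grant might look like this sketch, where the subnetwork resource and the service account email are hypothetical:

```hcl
# Hypothetical: google_compute_subnetwork.app_prod and the service account
# email below are not defined anywhere in this article.
resource "google_compute_subnetwork_iam_member" "app_prod_network_user" {
  project    = google_project.networking.project_id
  region     = google_compute_subnetwork.app_prod.region
  subnetwork = google_compute_subnetwork.app_prod.name
  role       = "roles/compute.networkUser"
  member     = "serviceAccount:terraform@${google_project.app_prod.project_id}.iam.gserviceaccount.com"
}
```

Granting the role per subnet rather than on the whole host project keeps each service project limited to the subnets meant for it.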

State Isolation Strategy#

One State File Per Account#

The safest pattern: each account/subscription/project has its own Terraform root module and state file.

terraform/
├── organization/          # org structure, SCPs, policies
│   ├── main.tf
│   └── backend.tf         # state: s3://tf-state/organization/
├── platform/
│   ├── networking/        # shared VPCs, Transit Gateway
│   │   └── backend.tf     # state: s3://tf-state/platform/networking/
│   └── security/          # GuardDuty, SecurityHub
│       └── backend.tf     # state: s3://tf-state/platform/security/
└── app-a/
    ├── production/        # app-a prod account resources
    │   └── backend.tf     # state: s3://tf-state/app-a/production/
    └── development/
        └── backend.tf     # state: s3://tf-state/app-a/development/

Advantages:

  • Compromising one state file does not expose other accounts
  • State lock contention is per-account (no blocking between teams)
  • Each team can apply independently
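
Each backend.tf in the layout above could look roughly like this. The bucket name and region are taken from the directory comments and the remote-state example. The `encrypt` setting is an assumption, since the article does not show its actual backend settings.

```hcl
# app-a/production/backend.tf
# Assumption: bucket and region match the layout above. Backend blocks
# cannot use variables, so every root module hard-codes its own key.
terraform {
  backend "s3" {
    bucket  = "tf-state"
    key     = "app-a/production/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}
```

State locking is configured separately and depends on the Terraform version in use (`dynamodb_table` on older releases, `use_lockfile` on recent ones).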

Cross-account references use terraform_remote_state:

# In app-a/production/main.tf — read networking outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "tf-state"
    key    = "platform/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
  # ...
}
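
For that lookup to work, the networking root module has to export the value as a root-level output. A sketch, assuming the subnets are declared as `aws_subnet.private` with `count` (a hypothetical resource name):

```hcl
# platform/networking/outputs.tf
# Hypothetical: aws_subnet.private is assumed to be declared with count
# in the networking root module.
output "private_subnet_ids" {
  description = "Private subnet IDs consumed by workload accounts"
  value       = aws_subnet.private[*].id
}
```

Note that terraform_remote_state needs read access to the entire networking state object, not just the outputs, so grant that access deliberately.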

Single State with Provider Aliases (Small Scale)#

For small organizations (2-3 accounts), a single Terraform config with provider aliases is simpler:

# All accounts in one config: simpler but less isolated
provider "aws" { region = "us-east-1" }

provider "aws" {
  alias = "prod"
  assume_role { role_arn = var.prod_role }
}

provider "aws" {
  alias = "dev"
  assume_role { role_arn = var.dev_role }
}

When to use: 3 or fewer accounts, one person managing infrastructure, no compliance requirements for state isolation.

When to stop using: The moment a second team needs to apply independently, or when compliance requires separate state access controls.

Landing Zone Patterns#

A landing zone is the baseline configuration applied to every new account/subscription/project. It includes networking, IAM, logging, and security baselines.

Landing Zone Checklist#

Every new account should have:

| Component | AWS | Azure | GCP |
|---|---|---|---|
| Networking | VPC with private subnets | VNET peered to hub | Shared VPC service project |
| IAM baseline | Break-glass role, CI/CD role | Managed identity for automation | Service account for Terraform |
| Logging | CloudTrail → central S3 | Activity Log → central Log Analytics | Audit Log → central BigQuery |
| Security | GuardDuty enabled, SecurityHub | Defender for Cloud | Security Command Center |
| Cost controls | Budget alarm, cost allocation tags | Budget alert, resource tags | Budget alert, labels |
| DNS | Route53 subdomain delegation | Private DNS zone linked to hub | Cloud DNS zone |
| Encryption | Default EBS encryption, KMS key | Customer-managed key | CMEK for sensitive services |

Terraform Module for Landing Zone#

module "account_baseline" {
  source = "./modules/account-baseline"

  account_id   = aws_organizations_account.new_account.id
  account_name = "app-b-production"
  environment  = "production"
  vpc_cidr     = "10.2.0.0/16"

  providers = {
    aws = aws.new_account
  }
}

The module creates the VPC, IAM roles, CloudTrail, GuardDuty enablement, budget alerts, and default encryption — everything needed before the first workload deploys.
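
The inside of such a module might start like the partial sketch below. The file path, resource names, and the choice of resources are assumptions. CloudTrail, IAM roles, and budgets are left out for brevity.

```hcl
# modules/account-baseline/main.tf (partial sketch, resource set is an assumption)
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

variable "account_id" { type = string }
variable "account_name" { type = string }
variable "environment" { type = string }
variable "vpc_cidr" { type = string }

# Every resource below uses the provider the caller passes in through
# `providers = { aws = ... }`, so it lands in the new account.
resource "aws_vpc" "baseline" {
  cidr_block = var.vpc_cidr

  tags = {
    Name        = "${var.account_name}-vpc"
    Environment = var.environment
  }
}

# Turn on GuardDuty in the provider's region.
resource "aws_guardduty_detector" "this" {
  enable = true
}

# Encrypt new EBS volumes by default in the provider's region.
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}
```

Because the module declares no provider block of its own, the same code can be applied to any account by changing only the alias passed in `providers`.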

Common Gotchas#

| Gotcha | Symptom | Fix |
|---|---|---|
| Account email reuse | Cannot create account: email already used | Each AWS account needs a unique email (use + aliases) |
| SCP blocks Terraform | AccessDenied on resources that should work | Check SCPs: an SCP deny, or a missing SCP allow, overrides any IAM allow |
| Cross-account assume role fails | AccessDenied: User is not authorized to perform sts:AssumeRole | Trust policy on the target role must allow the source account/role, and the caller's IAM policy must allow sts:AssumeRole on that role |
| Provider alias forgotten | Resources created in wrong account | Always specify provider = aws.alias for cross-account resources |
| State bucket in wrong account | State accessible to the wrong teams | Put state bucket in the management/security account |
| VNET peering one-sided | Peering shows Initiated, not Connected | Create peering from both sides |
| GCP API not enabled in new project | "API not enabled" error on first resource | Add google_project_service for all needed APIs |
| Organization policy blocks resource | Cryptic error about constraint violation | Check org policies at folder and org level |