Multi-Account Cloud Architecture with Terraform#
Single-account cloud deployments work for learning and prototypes. Production systems need multiple accounts (AWS), subscriptions (Azure), or projects (GCP) for isolation — security boundaries, blast radius control, billing separation, and compliance requirements.
Terraform manages multi-account architectures well, but the patterns differ significantly from single-account work. Provider configuration, state isolation, cross-account references, and IAM trust relationships all need explicit design.
Why Multiple Accounts#
| Reason | Single Account Problem | Multi-Account Solution |
|---|---|---|
| Blast radius | Misconfigured IAM affects everything | Damage limited to one account |
| Billing | Cannot attribute costs to teams | Per-account billing and budgets |
| Compliance | PCI data mixed with dev workloads | Separate accounts for regulated workloads |
| Service limits | Default quota of 5 VPCs per region is shared by every workload | Each account has its own quotas |
| Access control | Complex IAM policies to isolate teams | Account boundary is the strongest isolation |
| Testing | Dev resources can affect production | Dev credentials cannot touch prod resources unless a cross-account role grants access |
AWS Organizations#
Organization Structure#
Organization Root
├── Core OU
│   ├── Management Account (billing, org management)
│   ├── Security Account (GuardDuty, SecurityHub, audit logs)
│   └── Networking Account (Transit Gateway, shared VPCs)
├── Workload OU
│   ├── Production OU
│   │   ├── App-A Production Account
│   │   └── App-B Production Account
│   └── Non-Production OU
│       ├── App-A Development Account
│       └── App-A Staging Account
└── Sandbox OU
    └── Developer Sandbox Accounts

Terraform for AWS Organizations#
resource "aws_organizations_organization" "main" {
  feature_set = "ALL"
  enabled_policy_types = [
    "SERVICE_CONTROL_POLICY",
    "TAG_POLICY",
  ]
}

resource "aws_organizations_organizational_unit" "core" {
  name      = "Core"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "workloads" {
  name      = "Workloads"
  parent_id = aws_organizations_organization.main.roots[0].id
}

resource "aws_organizations_organizational_unit" "production" {
  name      = "Production"
  parent_id = aws_organizations_organizational_unit.workloads.id
}

# Create a workload account
resource "aws_organizations_account" "app_production" {
  name      = "app-a-production"
  email     = "aws+app-a-prod@example.com"
  parent_id = aws_organizations_organizational_unit.production.id
  role_name = "OrganizationAccountAccessRole" # cross-account admin role
  lifecycle {
    prevent_destroy = true # accounts cannot be easily recreated
  }
}

Service Control Policies (SCPs)#
SCPs set permission boundaries for entire OUs:
resource "aws_organizations_policy" "deny_root_actions" {
  name = "deny-root-user-actions"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DenyRootUser"
      Effect   = "Deny"
      Action   = "*"
      Resource = "*"
      Condition = {
        StringLike = {
          "aws:PrincipalArn" = "arn:aws:iam::*:root"
        }
      }
    }]
  })
}

resource "aws_organizations_policy" "deny_region" {
  name = "restrict-regions"
  content = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "DenyNonApprovedRegions"
      Effect = "Deny"
      NotAction = [
        "iam:*", "sts:*", "organizations:*",
        "support:*", "budgets:*",
      ]
      Resource = "*"
      Condition = {
        StringNotEquals = {
          "aws:RequestedRegion" = ["us-east-1", "us-west-2", "eu-west-1"]
        }
      }
    }]
  })
}

resource "aws_organizations_policy_attachment" "deny_region_workloads" {
  policy_id = aws_organizations_policy.deny_region.id
  target_id = aws_organizations_organizational_unit.workloads.id
}

Cross-Account Provider Aliasing#
The key pattern for multi-account Terraform: use assume_role in provider blocks to operate in different accounts from a single Terraform configuration.
# Default provider: management account
provider "aws" {
  region = "us-east-1"
}

# Provider for the networking account
# (assumes an aws_organizations_account.networking resource, defined like app_production above)
provider "aws" {
  alias  = "networking"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.networking.id}:role/OrganizationAccountAccessRole"
  }
}

# Provider for the production account
provider "aws" {
  alias  = "production"
  region = "us-east-1"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.app_production.id}:role/OrganizationAccountAccessRole"
  }
}

# Create a VPC in the networking account
resource "aws_vpc" "shared" {
  provider   = aws.networking
  cidr_block = "10.0.0.0/16"
  tags       = { Name = "shared-vpc" }
}

# Create resources in the production account
resource "aws_iam_role" "app_role" {
  provider = aws.production
  name     = "app-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ecs-tasks.amazonaws.com" }
    }]
  })
}

Gotcha: The assumed role must exist in the target account. OrganizationAccountAccessRole is created automatically when you create an account through AWS Organizations, but it gives full admin access. Create least-privilege roles for Terraform.
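A least-privilege role for Terraform could look roughly like the sketch below. It reuses the aws.production alias from the example above; the role name terraform-deployer, the variable management_account_id, the calling role ci-terraform (assumed to already exist in the management account), and the permitted actions are all hypothetical placeholders to adapt.

```hcl
# Hypothetical input: the account that runs Terraform
variable "management_account_id" {
  type        = string
  description = "ID of the account whose CI role may assume the deployer role (assumption)"
}

# Role that Terraform assumes in the production account instead of
# OrganizationAccountAccessRole. Bootstrapped once through the aws.production alias.
resource "aws_iam_role" "terraform_deployer" {
  provider = aws.production
  name     = "terraform-deployer" # hypothetical name

  # Trust policy: only one named role in the management account may assume this role.
  # "ci-terraform" is a placeholder and must exist before this policy is created.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::${var.management_account_id}:role/ci-terraform" }
    }]
  })
}

# Permissions limited to what this configuration manages.
# The action list is illustrative only; derive the real one from your resources.
resource "aws_iam_role_policy" "terraform_deployer" {
  provider = aws.production
  name     = "terraform-deployer-scope"
  role     = aws_iam_role.terraform_deployer.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ec2:Describe*", "ecs:*"]
      Resource = "*"
    }]
  })
}
```

Once the role exists, the provider's assume_role block can point at it rather than at OrganizationAccountAccessRole.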
Gotcha: Provider aliasing means one Terraform state file references multiple accounts. If that state is compromised, all accounts are exposed. Consider separate state files per account.
Azure Management Groups#
Hierarchy Structure#
Tenant Root Group
├── Platform
│   ├── Identity (Azure AD, DNS)
│   ├── Management (monitoring, automation)
│   └── Connectivity (hub VNETs, ExpressRoute, Firewall)
├── Landing Zones
│   ├── Production
│   │   ├── App-A-Prod Subscription
│   │   └── App-B-Prod Subscription
│   └── Non-Production
│       ├── App-A-Dev Subscription
│       └── App-A-Staging Subscription
└── Sandbox
    └── Developer Sandboxes

Terraform for Management Groups#
resource "azurerm_management_group" "platform" {
  display_name = "Platform"
}

resource "azurerm_management_group" "landing_zones" {
  display_name = "Landing Zones"
}

resource "azurerm_management_group" "production" {
  display_name               = "Production"
  parent_management_group_id = azurerm_management_group.landing_zones.id
}

# Policy assignment at the management group level
resource "azurerm_management_group_policy_assignment" "require_tags" {
  name                 = "require-cost-center-tag"
  management_group_id  = azurerm_management_group.landing_zones.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/1e30110a-5ceb-460c-a204-c1c3969c6d62"
  parameters = jsonencode({
    tagName = { value = "CostCenter" }
  })
}

Multi-Subscription Provider Configuration#
# Default provider: platform subscription
provider "azurerm" {
  features {}
  subscription_id = var.platform_subscription_id
}

# Provider for each workload subscription
provider "azurerm" {
  alias = "app_production"
  features {}
  subscription_id = var.app_production_subscription_id
}

provider "azurerm" {
  alias = "app_dev"
  features {}
  subscription_id = var.app_dev_subscription_id
}

# Hub VNET in platform subscription
resource "azurerm_virtual_network" "hub" {
  provider            = azurerm
  name                = "hub-vnet"
  resource_group_name = azurerm_resource_group.connectivity.name
  location            = "eastus"
  address_space       = ["10.0.0.0/16"]
}

# Spoke VNET in production subscription
resource "azurerm_virtual_network" "spoke_prod" {
  provider            = azurerm.app_production
  name                = "app-prod-vnet"
  resource_group_name = azurerm_resource_group.prod_networking.name
  location            = "eastus"
  address_space       = ["10.1.0.0/16"]
}

# VNET peering: hub to spoke
resource "azurerm_virtual_network_peering" "hub_to_prod" {
  provider                  = azurerm
  name                      = "hub-to-app-prod"
  resource_group_name       = azurerm_resource_group.connectivity.name
  virtual_network_name      = azurerm_virtual_network.hub.name
  remote_virtual_network_id = azurerm_virtual_network.spoke_prod.id
  allow_forwarded_traffic   = true
}

Gotcha: Azure VNET peering must be created from both sides. You need an azurerm_virtual_network_peering resource in both the hub and spoke subscriptions.
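The reverse direction might be sketched as follows. It reuses the azurerm.app_production alias, the prod_networking resource group, and the two virtual networks from the example above. The resource label prod_to_hub and the peering name are illustrative choices, and the identity running Terraform is assumed to have network permissions in both subscriptions.

```hcl
# VNET peering: spoke to hub, created in the production subscription
resource "azurerm_virtual_network_peering" "prod_to_hub" {
  provider                  = azurerm.app_production
  name                      = "app-prod-to-hub" # hypothetical name
  resource_group_name       = azurerm_resource_group.prod_networking.name
  virtual_network_name      = azurerm_virtual_network.spoke_prod.name
  remote_virtual_network_id = azurerm_virtual_network.hub.id
  allow_forwarded_traffic   = true
}
```

With both resources applied, the peering state should move from Initiated to Connected on each side.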
GCP Organizations#
Hierarchy Structure#
Organization (example.com)
├── Folders
│   ├── Platform
│   │   ├── networking-prod (Shared VPC host)
│   │   ├── security-prod (audit logs, SCC)
│   │   └── monitoring-prod (Cloud Monitoring workspace)
│   ├── Production
│   │   ├── app-a-prod
│   │   └── app-b-prod
│   ├── Non-Production
│   │   ├── app-a-dev
│   │   └── app-a-staging
│   └── Sandbox
│       └── developer sandboxes

Terraform for GCP Organization#
resource "google_folder" "platform" {
  display_name = "Platform"
  parent       = "organizations/${var.org_id}"
}

resource "google_folder" "production" {
  display_name = "Production"
  parent       = "organizations/${var.org_id}"
}

# Create a project in the production folder
resource "google_project" "app_prod" {
  name            = "App A Production"
  project_id      = "myorg-app-a-prod"
  folder_id       = google_folder.production.name
  billing_account = var.billing_account_id
  labels = {
    environment = "production"
    team        = "app-a"
  }
}

# Enable required APIs in the new project
resource "google_project_service" "app_prod_apis" {
  for_each = toset([
    "compute.googleapis.com",
    "container.googleapis.com",
    "sqladmin.googleapis.com",
  ])
  project            = google_project.app_prod.project_id
  service            = each.value
  disable_on_destroy = false
}

Organization Policies#
# Restrict VM external IPs at the organization level
resource "google_organization_policy" "deny_external_ip" {
  org_id     = var.org_id
  constraint = "compute.vmExternalIpAccess"
  list_policy {
    deny {
      all = true
    }
  }
}

# Allow specific regions only
resource "google_organization_policy" "allowed_locations" {
  org_id     = var.org_id
  constraint = "gcp.resourceLocations"
  list_policy {
    allow {
      values = ["in:us-locations", "in:eu-locations"]
    }
  }
}

Shared VPC Pattern#
GCP’s Shared VPC lets a host project own the network and service projects use it:
# Host project owns the VPC
resource "google_compute_shared_vpc_host_project" "host" {
  project = google_project.networking.project_id
}

# Service project uses the shared VPC
resource "google_compute_shared_vpc_service_project" "app_prod" {
  host_project    = google_project.networking.project_id
  service_project = google_project.app_prod.project_id
  depends_on      = [google_compute_shared_vpc_host_project.host]
}

State Isolation Strategy#
One State File Per Account#
The safest pattern: each account/subscription/project has its own Terraform root module and state file.
terraform/
├── organization/            # org structure, SCPs, policies
│   ├── main.tf
│   └── backend.tf           # state: s3://tf-state/organization/
├── platform/
│   ├── networking/          # shared VPCs, Transit Gateway
│   │   └── backend.tf       # state: s3://tf-state/platform/networking/
│   └── security/            # GuardDuty, SecurityHub
│       └── backend.tf       # state: s3://tf-state/platform/security/
├── app-a/
│   ├── production/          # app-a prod account resources
│   │   └── backend.tf       # state: s3://tf-state/app-a/production/
│   └── development/
│       └── backend.tf       # state: s3://tf-state/app-a/development/

Advantages:
- Compromising one state file does not expose other accounts
- State lock contention is per-account (no blocking between teams)
- Each team can apply independently
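As a minimal sketch of what one of these root modules might contain, the networking module could pair its own backend with an output that other root modules are allowed to read. The bucket name and key follow the directory tree above. The aws_subnet.private resource (assumed to be declared with count elsewhere in the module) is a hypothetical placeholder, and encrypt = true is an added suggestion rather than something the layout requires.

```hcl
# platform/networking/backend.tf
terraform {
  backend "s3" {
    bucket  = "tf-state"
    key     = "platform/networking/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true # server-side encryption for the state object
  }
}

# platform/networking/outputs.tf
# Only values exported as outputs are visible to other root modules.
output "private_subnet_ids" {
  description = "IDs of the private subnets that workload accounts may deploy into"
  value       = aws_subnet.private[*].id # hypothetical resource, assumed to use count
}
```

Keeping the exported outputs small limits how much of the networking state other teams depend on.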
Cross-account references use terraform_remote_state:
# In app-a/production/main.tf: read networking outputs
data "terraform_remote_state" "networking" {
  backend = "s3"
  config = {
    bucket = "tf-state"
    key    = "platform/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "app" {
  subnet_id = data.terraform_remote_state.networking.outputs.private_subnet_ids[0]
  # ...
}

Single State with Provider Aliases (Small Scale)#
For small organizations (2-3 accounts), a single Terraform config with provider aliases is simpler:
# All accounts in one config: simpler but less isolated
provider "aws" { region = "us-east-1" }
provider "aws" {
  alias = "prod"
  assume_role { role_arn = var.prod_role }
}
provider "aws" {
  alias = "dev"
  assume_role { role_arn = var.dev_role }
}

When to use: 3 or fewer accounts, one person managing infrastructure, no compliance requirements for state isolation.
When to stop using: The moment a second team needs to apply independently, or when compliance requires separate state access controls.
Landing Zone Patterns#
A landing zone is the baseline configuration applied to every new account/subscription/project. It includes networking, IAM, logging, and security baselines.
Landing Zone Checklist#
Every new account should have:
| Component | AWS | Azure | GCP |
|---|---|---|---|
| Networking | VPC with private subnets | VNET peered to hub | Shared VPC service project |
| IAM baseline | Break-glass role, CI/CD role | Managed identity for automation | Service account for Terraform |
| Logging | CloudTrail → central S3 | Activity Log → central Log Analytics | Audit Log → central BigQuery |
| Security | GuardDuty enabled, SecurityHub | Defender for Cloud | Security Command Center |
| Cost controls | Budget alarm, cost allocation tags | Budget alert, resource tags | Budget alert, labels |
| DNS | Route53 subdomain delegation | Private DNS zone linked to hub | Cloud DNS zone |
| Encryption | Default EBS encryption, KMS key | Customer-managed key | CMEK for sensitive services |
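Two of the AWS rows, logging and cost controls, might be sketched as below. This assumes the configuration runs with credentials for the new account and that the central bucket already has a bucket policy allowing CloudTrail to write to it. The variable names, trail name, budget amount, and threshold are all hypothetical.

```hcl
variable "central_log_bucket" {
  type        = string
  description = "Name of the central CloudTrail bucket (assumed to exist in the logging account)"
}

variable "budget_email" {
  type        = string
  description = "Address that receives budget notifications (hypothetical)"
}

# Logging row: ship API activity from every region to the central bucket
resource "aws_cloudtrail" "baseline" {
  name                  = "account-baseline-trail"
  s3_bucket_name        = var.central_log_bucket
  is_multi_region_trail = true
}

# Cost controls row: notify when forecasted monthly spend passes 80% of the limit
resource "aws_budgets_budget" "monthly" {
  name         = "monthly-cost"
  budget_type  = "COST"
  limit_amount = "1000" # illustrative amount
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = [var.budget_email]
  }
}
```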
Terraform Module for Landing Zone#
module "account_baseline" {
  source       = "./modules/account-baseline"
  account_id   = aws_organizations_account.new_account.id
  account_name = "app-b-production"
  environment  = "production"
  vpc_cidr     = "10.2.0.0/16"
  providers = {
    aws = aws.new_account
  }
}

The module creates the VPC, IAM roles, CloudTrail, GuardDuty enablement, budget alerts, and default encryption: everything needed before the first workload deploys.
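The inside of such a module is not shown in this article. A partial sketch, under the assumption that it lives at modules/account-baseline/main.tf and covers only the VPC, default EBS encryption, and GuardDuty, might look like this. Resource labels and tag names are illustrative.

```hcl
# modules/account-baseline/main.tf (hypothetical layout)
terraform {
  required_providers {
    # Declaring the provider lets the caller pass an aliased configuration
    # through the module's providers argument.
    aws = {
      source = "hashicorp/aws"
    }
  }
}

# Inputs matching the arguments passed by the caller
variable "account_id" { type = string }
variable "account_name" { type = string }
variable "environment" { type = string }
variable "vpc_cidr" { type = string }

# Networking baseline
resource "aws_vpc" "this" {
  cidr_block = var.vpc_cidr
  tags = {
    Name        = "${var.account_name}-vpc"
    Environment = var.environment
  }
}

# Encryption baseline: new EBS volumes in the provider's region are encrypted by default
resource "aws_ebs_encryption_by_default" "this" {
  enabled = true
}

# Security baseline: turn on GuardDuty in the provider's region
resource "aws_guardduty_detector" "this" {
  enable = true
}
```

Because the module contains no provider blocks of its own, the same code can be instantiated once per account by passing a different aliased provider each time.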
Common Gotchas#
| Gotcha | Symptom | Fix |
|---|---|---|
| Account email reuse | Cannot create account — email already used | Each AWS account needs a unique email (use + aliases) |
| SCP blocks Terraform | AccessDenied on resources that should work | Check SCPs; an explicit SCP deny overrides any IAM allow |
| Cross-account assume role fails | AccessDenied: User is not authorized to perform sts:AssumeRole | Trust policy on target role must allow source account/role |
| Provider alias forgotten | Resources created in wrong account | Always specify provider = aws.alias for cross-account resources |
| State bucket in wrong account | State accessible to the wrong teams | Put state bucket in the management/security account |
| VNET peering one-sided | Peering shows Initiated not Connected | Create peering from both sides |
| GCP API not enabled in new project | API not enabled on first resource | Add google_project_service for all needed APIs |
| Organization policy blocks resource | Cryptic error about constraint violation | Check org policies at folder and org level |