Remote Backends

Every team beyond a single developer needs remote state, so that everyone plans and applies against the same source of truth. The three major backends:

S3 + DynamoDB (AWS):

terraform {
  backend "s3" {
    bucket         = "myorg-tfstate"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
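
If the same configuration must target different buckets or keys per environment, the backend block can be left partially empty and the remaining values supplied at init time (a sketch; the values are illustrative):

# backend.tf contains only: terraform { backend "s3" {} }
terraform init \
  -backend-config="bucket=myorg-tfstate" \
  -backend-config="key=prod/network/terraform.tfstate" \
  -backend-config="region=us-east-1" \
  -backend-config="dynamodb_table=terraform-locks"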

Azure Blob Storage:

terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "myorgtfstate"
    container_name       = "tfstate"
    key                  = "prod/network/terraform.tfstate"
  }
}

Google Cloud Storage:

terraform {
  backend "gcs" {
    bucket = "myorg-tfstate"
    prefix = "prod/network"
  }
}

All three support state locking: S3 via the DynamoDB table, Azure via blob leases, and GCS via a lock file written alongside the state object. Always enable encryption at rest and restrict access with IAM, because the state file records every resource attribute, including sensitive values, in plain text.
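
The DynamoDB lock table needs exactly one attribute: a string partition key named LockID, which is the key name the S3 backend expects. A minimal sketch of the supporting resources (names and settings are illustrative):

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# Versioning lets you recover an overwritten or corrupted state file
resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = "myorg-tfstate"
  versioning_configuration {
    status = "Enabled"
  }
}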

State Locking: Why It Matters

Two concurrent terraform apply runs against the same state file can corrupt it. Both read the same starting state, compute independent plans, and write conflicting results; whichever finishes last wins, and the state file ends up describing infrastructure that matches neither plan. Locking prevents this by acquiring an exclusive lock before any operation that writes state.

When a lock gets stuck (crashed process, killed CI job), you see an error like:

Error: Error locking state: Error acquiring the state lock
Lock Info:
  ID:        a1b2c3d4-e5f6-7890-abcd-ef1234567890
  Path:      myorg-tfstate/prod/terraform.tfstate
  Operation: OperationTypeApply
  Who:       runner@github-actions
  Created:   2026-02-20 14:32:01.234567 +0000 UTC

First verify the operation is truly dead (check CI, check who ran it). Then force-unlock:

terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890

Never force-unlock while another operation is genuinely running.
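
In CI it is often better to wait briefly for a contested lock than to fail outright; the -lock-timeout flag makes Terraform retry for the given duration before giving up:

# Wait up to five minutes for the competing lock to be released
terraform apply -lock-timeout=5m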

State Manipulation Commands

List managed resources:

terraform state list
# aws_vpc.main
# aws_subnet.public["us-east-1a"]
# module.ecs.aws_ecs_cluster.this
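
In a large state, state list accepts a resource address as a filter, and -id looks up the address that owns a known provider ID:

# Only resources inside the ecs module
terraform state list module.ecs

# Find the state address for a known provider ID
terraform state list -id=vpc-0abc123def456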

Inspect a resource:

terraform state show aws_vpc.main

Move a resource (renaming or refactoring into modules):

# Rename a resource
terraform state mv aws_instance.web aws_instance.app

# Move into a module
terraform state mv aws_instance.app module.compute.aws_instance.app

# Move between modules
terraform state mv module.old.aws_rds_cluster.db module.new.aws_rds_cluster.db
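
Because state mv rewrites state immediately, it is worth previewing the move first; recent Terraform versions support a -dry-run flag that prints what would move without changing anything:

# Preview the move without touching state
terraform state mv -dry-run aws_instance.web aws_instance.app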

Remove from state (Terraform stops managing it, but the real resource stays):

terraform state rm aws_s3_bucket.logs

This is useful when you want to hand a resource off to another Terraform configuration or manage it manually.
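
Terraform 1.7+ adds a declarative equivalent, the removed block, which keeps the removal in version control the same way import blocks do (a minimal sketch):

removed {
  from = aws_s3_bucket.logs

  lifecycle {
    destroy = false   # forget the resource; do not destroy it
  }
}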

Handling State Drift

State drift means real infrastructure no longer matches what the state file records. Detect drift with a refresh-only plan:

terraform plan -refresh-only

This shows what Terraform would update in the state file (not in your infrastructure) to match reality. Apply it to sync state:

terraform apply -refresh-only

After refreshing, a normal terraform plan reveals the difference between your .tf config and the now-accurate state. You then decide: update the config to match reality, or let Terraform revert the drift.
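
For drift you expect to keep happening (for example, tags written by an external system), an ignore_changes lifecycle rule tells Terraform to stop reverting specific attributes. A hypothetical sketch:

resource "aws_instance" "app" {
  ami           = "ami-0abc1234"   # illustrative values
  instance_type = "t3.micro"

  lifecycle {
    # Accept external changes to tags instead of reverting them
    ignore_changes = [tags]
  }
}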

Importing Existing Infrastructure

For resources created outside Terraform, use terraform import:

# Import a VPC
terraform import aws_vpc.main vpc-0abc123def456

# Import an RDS instance
terraform import aws_db_instance.main mydb-instance

# Import into a module
terraform import module.vpc.aws_vpc.this vpc-0abc123def456

Write the resource block first, run import, then iterate with terraform plan until it shows no changes.
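
The import only attaches state; your configuration still has to describe the resource. A minimal starting block might look like this (the CIDR is an assumption, to be corrected against plan output):

resource "aws_vpc" "main" {
  # Refine attributes after each plan until no changes remain
  cidr_block = "10.0.0.0/16"
}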

Terraform 1.5+ supports declarative import blocks:

import {
  to = aws_vpc.main
  id = "vpc-0abc123def456"
}

Run terraform plan and Terraform shows what it would import. This is more auditable than the CLI command because the import intent lives in version control.
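
Terraform 1.5+ can also draft the matching resource blocks for you. The -generate-config-out flag (introduced as experimental) writes HCL for every import block that has no configuration yet; review the output before committing it:

# Generate HCL for resources referenced by import blocks
terraform plan -generate-config-out=generated.tf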

Moved Blocks for Refactoring

When you rename a resource or move it into a module, the moved block tells Terraform it is the same resource under a new address, avoiding destroy-and-recreate:

moved {
  from = aws_instance.web
  to   = aws_instance.app
}

moved {
  from = aws_instance.app
  to   = module.compute.aws_instance.app
}

Once every environment has applied the change, the moved blocks can be removed. Keep them for at least one release cycle so all environments pick up the refactoring first.

Workspaces vs Directory Structure

Workspaces maintain separate state files from the same configuration:

terraform workspace new staging
terraform workspace new prod
terraform workspace select staging

Access the workspace name in config with terraform.workspace. Good when environments differ only in sizing. Bad when environments differ structurally.
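
A typical sizing switch keyed off the workspace name (resource values are illustrative):

locals {
  # Staging stays small; prod scales out
  web_count = terraform.workspace == "prod" ? 5 : 1
}

resource "aws_instance" "web" {
  count         = local.web_count
  ami           = "ami-0abc1234"
  instance_type = terraform.workspace == "prod" ? "m5.large" : "t3.micro"
}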

Directory structure uses separate root modules per environment:

envs/
  staging/
    main.tf        # calls shared modules
    backend.tf     # key = "staging/terraform.tfstate"
    terraform.tfvars
  prod/
    main.tf
    backend.tf     # key = "prod/terraform.tfstate"
    terraform.tfvars
modules/
  vpc/
  ecs/

Each environment is fully independent, with its own state file. You promote a change from staging to prod by copying the module version bump into the prod root. More files, but zero risk of applying the wrong workspace to the wrong environment.
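
Each root module stays a thin wrapper around the shared modules; envs/staging/main.tf might look like this (module inputs and the vpc_id output are assumptions):

# envs/staging/main.tf
module "vpc" {
  source     = "../../modules/vpc"
  cidr_block = "10.10.0.0/16"
  env        = "staging"
}

module "ecs" {
  source       = "../../modules/ecs"
  cluster_name = "staging-cluster"
  vpc_id       = module.vpc.vpc_id   # assumes the vpc module exports this output
}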

Emergency State Operations

When things go very wrong, you may need direct state access:

# Download state to a local file
terraform state pull > emergency-backup.tfstate

# Edit the file (carefully) and push it back
terraform state push emergency-backup.tfstate

Use state pull/push only when normal commands fail. Always back up first, and always verify with terraform plan after pushing. The state file is JSON; if you must hand-edit it, change only the specific broken attribute and increment the top-level serial field so the backend accepts the edited copy as newer.

# Backup before any emergency work
terraform state pull > backup-$(date +%Y%m%d-%H%M%S).tfstate
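
Before pushing an edited file, compare its lineage and serial against what the backend currently holds (this sketch assumes jq is installed):

# Compare lineage and serial between the edited file and the backend
jq '{lineage, serial}' emergency-backup.tfstate
terraform state pull | jq '{lineage, serial}'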