Diagnosing Common Terraform Problems

Stuck State Lock

A CI job was cancelled, a laptop lost network, or a process crashed mid-apply. Terraform refuses to run:

Error acquiring the state lock
Lock Info:
  ID:        f8e7d6c5-b4a3-2109-8765-43210fedcba9
  Operation: OperationTypeApply
  Who:       deploy@ci-runner
  Created:   2026-02-20 09:15:22 +0000 UTC

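Verify the lock holder is truly dead, because releasing a lock that a live process still holds lets two runs write state at the same time. Check CI job status, then: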

terraform force-unlock f8e7d6c5-b4a3-2109-8765-43210fedcba9

If the lock was from a crashed apply, the state may be partially updated. Run terraform plan immediately after unlocking to see the current situation.
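
To avoid copying the lock ID by hand, a script can pull it out of the error text. The following is a minimal shell sketch, not an official tool: the function name extract_lock_id is hypothetical, and it assumes the ID is printed as a 36-character UUID after "ID:", as in the message above.

```shell
#!/bin/sh
# Print the first lock ID found in Terraform's lock error text (read from stdin).
# Assumption: the ID appears as "ID: <36-character UUID>" on a single line.
extract_lock_id() {
  sed -n 's/.*ID:[[:space:]]*\([0-9a-fA-F-]\{36\}\).*/\1/p' | head -n 1
}

# Hypothetical usage, only after confirming the lock holder is dead:
#   lock_id=$(terraform plan -no-color -input=false 2>&1 | extract_lock_id)
#   [ -n "$lock_id" ] && terraform force-unlock "$lock_id"
```

The helper only parses text. Deciding that the holder is really gone remains a manual check.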

Provider Authentication Failures

Error: error configuring S3 Backend: no valid credential sources found

Check in order: (1) environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, or AWS_PROFILE), (2) shared credentials file (~/.aws/credentials), (3) instance role or OIDC token. Common traps:

Running in CI without OIDC configured or credentials injected
AWS_PROFILE set to a profile that does not exist on the CI runner
MFA-protected profiles that cannot work non-interactively
Expired SSO session: run aws sso login --profile your-profile

For Azure: check ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID. For GCP: check GOOGLE_CREDENTIALS or GOOGLE_APPLICATION_CREDENTIALS path.
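
The AWS checks above can be scripted as a first diagnostic step. This is a rough sketch under stated assumptions: the function name aws_credential_source and its output labels are hypothetical, and AWS_WEB_IDENTITY_TOKEN_FILE (the variable the AWS SDKs read for an OIDC web identity token) is not mentioned in the original text.

```shell
#!/bin/sh
# Report which AWS credential source the current environment points at.
# The labels printed here are illustrative, not anything Terraform emits.
aws_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env-keys"
  elif [ -n "${AWS_ACCESS_KEY_ID:-}" ] || [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    # Only one half of the key pair is set, a common CI misconfiguration
    echo "env-keys-incomplete"
  elif [ -n "${AWS_PROFILE:-}" ]; then
    echo "profile:${AWS_PROFILE}"
  elif [ -n "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "oidc-web-identity"
  else
    # Nothing explicit: shared credentials file or instance role will be tried
    echo "default-chain"
  fi
}
```

Running aws sts get-caller-identity afterwards shows which identity the credential chain actually resolves to.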

Dependency Cycles

Error: Cycle: aws_security_group.a, aws_security_group.b

Two resources reference each other. Security groups whose inline ingress or egress blocks name each other are the classic case. Fix by declaring the groups without inline rules and moving each rule into its own resource:

resource "aws_security_group" "a" {
  name   = "sg-a"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group" "b" {
  name   = "sg-b"
  vpc_id = aws_vpc.main.id
}

resource "aws_security_group_rule" "a_from_b" {
  type                     = "ingress"
  security_group_id        = aws_security_group.a.id
  source_security_group_id = aws_security_group.b.id
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
}

resource "aws_security_group_rule" "b_from_a" {
  type                     = "ingress"
  security_group_id        = aws_security_group.b.id
  source_security_group_id = aws_security_group.a.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
}

Separate rule resources break the cycle because Terraform can create both groups first, then both rules.

Plan Shows Unexpected Changes

When terraform plan shows changes you did not make, investigate the cause:

State drift. Someone modified the resource outside Terraform. Run terraform plan -refresh-only to see what drifted, then decide whether to update your config or let Terraform revert the change.

Provider upgrade changed defaults. A provider update may interpret attributes differently. Pin provider versions and review changelogs before upgrading.

Lifecycle blocks prevent unwanted changes:

resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"

  lifecycle {
    ignore_changes = [ami]   # AMI updates handled by a separate process
  }
}

resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true   # Block accidental deletion
  }
}

ignore_changes tells Terraform to disregard differences in the listed attributes when it plans updates. prevent_destroy makes any plan that would destroy the resource fail with an error.
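
Drift is easier to catch when a scheduled job checks for it. A minimal sketch, assuming the job runs terraform plan with the -detailed-exitcode flag, which exits 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes. The helper name plan_status and its labels are hypothetical.

```shell
#!/bin/sh
# Translate the exit status of `terraform plan -detailed-exitcode` into a label.
plan_status() {
  case "$1" in
    0) echo "clean" ;;    # configuration and real infrastructure agree
    2) echo "changes" ;;  # the plan is non-empty: drift or pending edits
    *) echo "error" ;;    # 1 (or anything unexpected) means the plan failed
  esac
}

# Hypothetical usage in a scheduled job:
#   terraform plan -detailed-exitcode -input=false >/dev/null 2>&1
#   plan_status "$?"
```

A job that alerts on "changes" surfaces out-of-band edits before the next real apply does.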

“Resource Already Exists” on Apply

Error: error creating S3 Bucket (my-bucket): BucketAlreadyOwnedByYou

The resource exists in AWS but not in Terraform state. Two options:

Import it (Terraform starts managing it):

terraform import aws_s3_bucket.logs my-bucket

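Clear a stale state entry (if the address in state still points at an object left over from a previous partial apply). terraform state rm only drops the entry from state; it does not delete anything in AWS:

# Drop the stale reference from state
terraform state rm aws_s3_bucket.logs
# Then import the existing bucket, or plan/apply again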
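
Choosing between the two options starts with knowing whether the address is already in state. The sketch below reads the output of terraform state list on stdin and reports which path applies; the function name decide_import and its labels are hypothetical.

```shell
#!/bin/sh
# Read `terraform state list` output on stdin and report whether the given
# address is already tracked. Matching is exact and whole-line.
decide_import() {
  addr=$1
  if grep -Fxq -- "$addr"; then
    echo "already-managed"   # state has an entry: check whether it is stale
  else
    echo "import"            # state has no entry: import the existing object
  fi
}

# Hypothetical usage:
#   terraform state list | decide_import aws_s3_bucket.logs
```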

Slow Plans

Large configurations take minutes to plan because Terraform refreshes every resource via API calls.

Target specific resources during development:

terraform plan -target=module.ecs
terraform plan -target=aws_instance.web

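Do not use -target in CI or production applies. It limits the run to the targeted resources and their dependencies, so everything else is left unreconciled and the real infrastructure can fall out of step with the configuration.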

Increase parallelism (default is 10), keeping in mind that more concurrent requests make API rate limits more likely:

terraform plan -parallelism=30

Split large configs into smaller root modules. If networking, compute, and databases are independent blast radii, they should be separate state files.
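
The warning about -target can be enforced rather than remembered. One possible approach, assuming CI calls Terraform through a wrapper script, is to reject the flag before Terraform ever runs. The function name reject_target_flag is hypothetical.

```shell
#!/bin/sh
# Return non-zero if any argument is a -target flag, in either the
# single-dash or double-dash spelling, with or without "=value".
reject_target_flag() {
  for arg in "$@"; do
    case "$arg" in
      -target|-target=*|--target|--target=*) return 1 ;;
    esac
  done
  return 0
}

# Hypothetical usage inside a CI wrapper:
#   reject_target_flag "$@" || { echo "-target is not allowed in CI" >&2; exit 1; }
#   terraform "$@"
```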

Version Constraint Conflicts

Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 4.67.0 does not match
configured version constraint ~> 5.0

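The .terraform.lock.hcl file records exact provider versions and their checksums. If you change version constraints in your config, re-run init with -upgrade so Terraform selects versions that satisfy the new constraints and rewrites the lock file. Deleting the file first is not required and discards the recorded checksums for every other provider:

terraform init -upgrade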

Then commit the new lock file. In a team, coordinate lock file updates to avoid merge conflicts.
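
When reviewing a lock file change, it helps to see exactly which version is pinned. A small sketch, assuming the usual lock file layout in which each provider block contains a version = "x.y.z" line; the function name locked_version is hypothetical.

```shell
#!/bin/sh
# Print the locked version of one provider. The lock file is read from stdin;
# the argument is the fully qualified provider address.
locked_version() {
  awk -v want="$1" '
    # Remember which provider block we are inside
    $1 == "provider" { gsub(/"/, "", $2); current = $2 }
    # Print the version line of the requested provider, then stop
    $1 == "version" && current == want { gsub(/"/, "", $3); print $3; exit }
  '
}

# Hypothetical usage:
#   locked_version registry.terraform.io/hashicorp/aws < .terraform.lock.hcl
```

Comparing the output before and after terraform init -upgrade confirms that the upgrade moved the provider you intended and nothing else.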

Debugging with TF_LOG

When error messages are not enough, enable debug logging:

# Levels: TRACE, DEBUG, INFO, WARN, ERROR
TF_LOG=DEBUG terraform plan

# Log to file instead of stderr
TF_LOG=TRACE TF_LOG_PATH=terraform.log terraform plan

# Provider-specific logging
TF_LOG_PROVIDER=TRACE terraform plan

TRACE is extremely verbose but shows the exact API calls Terraform makes. Useful for diagnosing “why did the provider send this request?” problems.

Handling API Rate Limits

With large configurations, Terraform can hit provider API rate limits:

Error: error reading S3 Bucket: SlowDown: Please reduce your request rate

The AWS provider retries automatically. Reduce parallelism as a blunt fix: terraform apply -parallelism=5.
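
If a run still fails after the provider's own retries, a wrapper can retry the whole command with growing pauses. This is a generic exponential backoff sketch, not something Terraform provides; the function name retry_with_backoff and its argument order are assumptions.

```shell
#!/bin/sh
# Run a command, retrying on failure and doubling the delay each time.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    status=$?                       # exit status of the failed command
    if [ "$attempt" -ge "$max_attempts" ]; then
      return "$status"              # give up and surface the last failure
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Hypothetical usage:
#   retry_with_backoff 3 30 terraform apply -input=false -auto-approve -parallelism=5
```

The wrapper retries on any failure, not only rate limiting, so it suits unattended runs where a repeated apply is known to be safe.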

Recovering from Partial Applies

If terraform apply fails midway, the state file accurately reflects what was created. Run terraform plan to see remaining work and terraform apply again – Terraform picks up where it left off.

If the partial state is badly broken (rare, usually provider bugs), restore from backup:

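# Keep a copy of the current (broken) state before touching anything
terraform state pull > broken.tfstate
# Find the previous version in S3 versioning or your backup
aws s3api list-object-versions --bucket myorg-tfstate --prefix prod/terraform.tfstate
aws s3api get-object --bucket myorg-tfstate --key prod/terraform.tfstate \
  --version-id "previous-version-id" restored.tfstate
# The older file has a lower serial than the remote state, so a plain push is
# rejected by Terraform's safety check; -force overrides it
terraform state push -force restored.tfstate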
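
Before pushing a restored file, it is worth comparing serial numbers, since terraform state push refuses a state whose serial is lower than the remote one unless -force is given. The sketch below assumes the pretty-printed JSON layout produced by terraform state pull, with the top-level "serial" field on its own line near the top; the function names state_serial and needs_force_push are hypothetical.

```shell
#!/bin/sh
# Print the first "serial" value found in a state file read from stdin.
# Assumption: the top-level serial appears before any nested field of that name.
state_serial() {
  sed -n 's/^[[:space:]]*"serial":[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Succeed when the candidate file ($1) has a lower serial than the current
# file ($2), which is the case where a plain push is refused.
needs_force_push() {
  candidate=$(state_serial < "$1")
  current=$(state_serial < "$2")
  [ -n "$candidate" ] && [ -n "$current" ] && [ "$candidate" -lt "$current" ]
}

# Hypothetical usage:
#   needs_force_push restored.tfstate broken.tfstate && echo "push needs -force"
```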