Stuck State Lock#
A CI job was cancelled, a laptop lost network, or a process crashed mid-apply. Terraform refuses to run:
Error acquiring the state lock
Lock Info:
ID: f8e7d6c5-b4a3-2109-8765-43210fedcba9
Operation: OperationTypeApply
Who: deploy@ci-runner
Created: 2026-02-20 09:15:22 +0000 UTC
Verify the lock holder is truly dead. Check CI job status, then:
terraform force-unlock f8e7d6c5-b4a3-2109-8765-43210fedcba9
If the lock was from a crashed apply, the state may be partially updated. Run terraform plan immediately after unlocking to see the current situation.
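Copying the lock ID by hand is error-prone, so it can be pulled out of the captured error text instead. The following is a minimal sketch, not part of Terraform: the function name extract_lock_id is hypothetical, and it assumes the error output contains an "ID:" line with a 36-character UUID, as in the message above.

```shell
# Hypothetical helper: print the lock ID found in Terraform's lock error text.
# Reads the error text on stdin and assumes a line of the form "ID: <uuid>".
# Prints nothing if no such line is present.
extract_lock_id() {
  sed -n 's/^[[:space:]]*ID:[[:space:]]*\([0-9a-fA-F-]\{36\}\).*$/\1/p' | head -n 1
}

# Example (run only after confirming the lock holder is dead):
#   id=$(terraform plan 2>&1 | extract_lock_id)
#   [ -n "$id" ] && terraform force-unlock "$id"
```

The helper only extracts the ID. Whether it is safe to unlock still has to be decided by checking the CI job or process named in the Who field.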
Provider Authentication Failures#
Error: error configuring S3 Backend: no valid credential sources found
Check in order: (1) environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, or AWS_PROFILE), (2) shared credentials file (~/.aws/credentials), (3) instance role or OIDC token. Common traps:
- Running in CI without OIDC configured or credentials injected
- AWS_PROFILE set to a profile that does not exist on the CI runner
- MFA-protected profiles that cannot work non-interactively
- Expired SSO session: run aws sso login --profile your-profile
For Azure: check ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID. For GCP: check GOOGLE_CREDENTIALS or GOOGLE_APPLICATION_CREDENTIALS path.
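The AWS check order above can be sketched as a small shell function. This is an illustration under assumptions, not how the provider itself resolves credentials: the function name aws_credential_source and its output labels are hypothetical, and it only inspects the environment and the shared credentials path without validating anything.

```shell
# Hypothetical helper: report the first AWS credential source that appears to
# be configured, following the check order described above. It does not verify
# that the credentials are valid or unexpired.
aws_credential_source() {
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env-keys"
  elif [ -n "${AWS_PROFILE:-}" ]; then
    echo "profile:${AWS_PROFILE}"
  elif [ -f "${AWS_SHARED_CREDENTIALS_FILE:-${HOME:-}/.aws/credentials}" ]; then
    echo "shared-file"
  else
    # Nothing found locally: only an instance role or OIDC token can work.
    echo "role-or-oidc"
  fi
}
```

Running aws sts get-caller-identity afterwards confirms whether the detected source actually authenticates.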
Dependency Cycles#
Error: Cycle: aws_security_group.a, aws_security_group.b
Two resources reference each other. Security groups are the classic case. Fix by splitting into separate resources and rules:
resource "aws_security_group" "a" {
  name   = "sg-a"
  vpc_id = aws_vpc.main.id
}
resource "aws_security_group" "b" {
  name   = "sg-b"
  vpc_id = aws_vpc.main.id
}
resource "aws_security_group_rule" "a_from_b" {
  type                     = "ingress"
  security_group_id        = aws_security_group.a.id
  source_security_group_id = aws_security_group.b.id
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
}
resource "aws_security_group_rule" "b_from_a" {
  type                     = "ingress"
  security_group_id        = aws_security_group.b.id
  source_security_group_id = aws_security_group.a.id
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
}
Separate rule resources break the cycle because Terraform can create both groups first, then both rules.
Plan Shows Unexpected Changes#
When terraform plan shows changes you did not make, investigate the cause:
State drift. Someone modified the resource outside Terraform. Run terraform plan -refresh-only to see what drifted, then decide whether to update your config or let Terraform revert the change.
Provider upgrade changed defaults. A provider update may interpret attributes differently. Pin provider versions and review changelogs before upgrading.
Lifecycle blocks prevent unwanted changes:
resource "aws_instance" "app" {
  ami           = data.aws_ami.ubuntu.id
  instance_type = "t3.medium"
  lifecycle {
    ignore_changes = [ami] # AMI updates handled by a separate process
  }
}
resource "aws_db_instance" "main" {
  # ...
  lifecycle {
    prevent_destroy = true # Block accidental deletion
  }
}
ignore_changes skips specific attributes during planning. prevent_destroy errors if a plan would destroy the resource.
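Unexpected changes are easier to investigate when they are noticed early, for example by a scheduled job that runs a plan and inspects its exit status. With the -detailed-exitcode flag, terraform plan exits with 0 when there are no changes, 1 on error, and 2 when changes are present. The sketch below maps that status to a label; the function name plan_status and the labels are hypothetical.

```shell
# Hypothetical helper: translate the exit status of
# "terraform plan -detailed-exitcode" into a short label.
#   0 = no changes, 1 = error, 2 = changes present
plan_status() {
  case "$1" in
    0) echo "clean" ;;
    1) echo "error" ;;
    2) echo "changes" ;;
    *) echo "unknown" ;;
  esac
}

# Example:
#   terraform plan -detailed-exitcode >/dev/null 2>&1
#   plan_status "$?"
```

A "changes" result on a branch nobody touched is the cue to look for drift or a provider upgrade, as described above.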
“Resource Already Exists” on Apply#
Error: error creating S3 Bucket (my-bucket): BucketAlreadyOwnedByYou
The resource exists in AWS but not in Terraform state. Two options:
Import it (Terraform starts managing it):
terraform import aws_s3_bucket.logs my-bucket
Remove the conflict (if it is a naming collision from a previous partial apply):
# If state has a stale reference
terraform state rm aws_s3_bucket.logs
# Then plan/apply again
Slow Plans#
Large configurations take minutes to plan because Terraform refreshes every resource via API calls.
Target specific resources during development:
terraform plan -target=module.ecs
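terraform plan -target=aws_instance.web
Do not use -target in CI or production applies. It plans only the targeted resources and whatever they depend on, so the rest of the configuration is skipped and real infrastructure can fall out of step with your config.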
Increase parallelism (default is 10):
terraform apply -parallelism=30
Split large configs into smaller root modules. If networking, compute, and databases are independent blast radii, they should be separate state files.
Version Constraint Conflicts#
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 4.67.0 does not match
configured version constraint ~> 5.0
The .terraform.lock.hcl file records exact provider versions. If you update version constraints in your config, delete the lock file and re-init:
rm .terraform.lock.hcl
terraform init -upgrade
Then commit the new lock file. In a team, coordinate lock file updates to avoid merge conflicts.
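Before and after re-initializing, it can help to see which version the lock file actually records. The following is a minimal sketch that reads it with awk; the function name locked_provider_version is hypothetical, and it assumes the usual lock file layout of one provider block per provider with a version = "x.y.z" line inside it.

```shell
# Hypothetical helper: print the version recorded for a provider in a lock file.
# Usage: locked_provider_version hashicorp/aws [path/to/.terraform.lock.hcl]
# Assumes blocks of the form:
#   provider "registry.terraform.io/hashicorp/aws" {
#     version = "4.67.0"
#   }
locked_provider_version() {
  awk -v name="$1" '
    /^provider "/ { in_block = (index($0, "/" name "\"") > 0) }
    in_block && $1 == "version" { gsub(/"/, "", $3); print $3; exit }
  ' "${2:-.terraform.lock.hcl}"
}
```

Comparing the output before and after terraform init -upgrade shows whether the new constraint actually moved the provider.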
Debugging with TF_LOG#
When error messages are not enough, enable debug logging:
# Levels: TRACE, DEBUG, INFO, WARN, ERROR
TF_LOG=DEBUG terraform plan
# Log to file instead of stderr
TF_LOG=TRACE TF_LOG_PATH=terraform.log terraform plan
# Provider-specific logging
TF_LOG_PROVIDER=TRACE terraform plan
TRACE is extremely verbose but shows the exact API calls Terraform makes. Useful for diagnosing “why did the provider send this request?” problems.
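Because a TRACE log can run to many thousands of lines, a quick per-level count helps decide where to start reading. The sketch below is an illustration only: the function name log_level_counts is hypothetical, and it assumes each log line carries its level in square brackets, such as [DEBUG] or [ERROR].

```shell
# Hypothetical helper: count lines per log level in a file written via
# TF_LOG_PATH. Assumes the level appears in square brackets, e.g. "[DEBUG]".
log_level_counts() {
  for level in TRACE DEBUG INFO WARN ERROR; do
    # grep -F treats the brackets literally; -c prints the matching line count.
    printf '%s %s\n' "$level" "$(grep -F -c "[$level]" "$1")"
  done
}

# Example:
#   TF_LOG=TRACE TF_LOG_PATH=terraform.log terraform plan
#   log_level_counts terraform.log
```

If the ERROR or WARN count is non-zero, searching the file for those markers is usually a faster starting point than reading it from the top.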
Handling API Rate Limits#
With large configurations, Terraform can hit provider API rate limits:
Error: error reading S3 Bucket: SlowDown: Please reduce your request rate
The AWS provider retries automatically. Reduce parallelism as a blunt fix: terraform apply -parallelism=5.
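The blunt fix can be automated by stepping parallelism down after each failed attempt. The following is a rough sketch under stated assumptions: the names next_parallelism, apply_with_backoff, and BACKOFF_SECONDS are hypothetical, the wrapper retries on any failure (it does not inspect the error for throttling), and it appends the flag at the end of the command, so it must not be used with a saved plan file argument.

```shell
# Hypothetical helper: halve a parallelism value, never going below 1.
next_parallelism() {
  half=$(( $1 / 2 ))
  [ "$half" -lt 1 ] && half=1
  echo "$half"
}

# Hypothetical wrapper: run a command with -parallelism=N appended, halving N
# and waiting after each failure. Gives up once N=1 has also failed.
# Usage: apply_with_backoff 10 terraform apply -auto-approve
apply_with_backoff() {
  p=$1
  shift
  while :; do
    "$@" "-parallelism=$p" && return 0
    [ "$p" -eq 1 ] && return 1
    p=$(next_parallelism "$p")
    sleep "${BACKOFF_SECONDS:-30}"
  done
}
```

Starting from 10, the attempts use 10, 5, 2, and 1. Since any error triggers a retry, it is worth reading the output of the first failure before relying on the wrapper.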
Recovering from Partial Applies#
If terraform apply fails midway, the state file accurately reflects what was created. Run terraform plan to see remaining work and terraform apply again – Terraform picks up where it left off.
If the partial state is badly broken (rare, usually provider bugs), restore from backup:
terraform state pull > broken.tfstate
# Restore from S3 versioning or your backup
aws s3api list-object-versions --bucket myorg-tfstate --prefix prod/terraform.tfstate
aws s3api get-object --bucket myorg-tfstate --key prod/terraform.tfstate \
--version-id "previous-version-id" restored.tfstate
terraform state push restored.tfstate
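One catch with restoring an older version: terraform state push refuses the upload when the destination state has a higher serial than the file being pushed, which is typically the case here. Comparing the two serials first makes that visible. The sketch below is a hedged illustration: the function names state_serial and serial_allows_push are hypothetical, and it assumes the pretty-printed JSON that terraform state pull emits, with the top-level "serial" field appearing before any resource data.

```shell
# Hypothetical helper: print the top-level "serial" of a state file.
# Assumes pretty-printed JSON with a line such as:   "serial": 42,
state_serial() {
  sed -n 's/^[[:space:]]*"serial":[[:space:]]*\([0-9][0-9]*\),*$/\1/p' "$1" | head -n 1
}

# Returns 0 when the serial of candidate file $1 is at least that of current
# file $2, i.e. the serial check alone would not block the push.
serial_allows_push() {
  [ "$(state_serial "$1")" -ge "$(state_serial "$2")" ]
}

# Example:
#   serial_allows_push restored.tfstate broken.tfstate || echo "push will be refused"
```

When the check fails, terraform state push accepts a -force flag to override it. Use it only after confirming that restored.tfstate is the version you want, and keep broken.tfstate until the next plan looks sane.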