Integrating Infrastructure as Code with CI/CD#
Running Terraform locally works for one person. It breaks down when multiple people (or agents) modify infrastructure concurrently, when changes need review before applying, and when environments (dev/staging/prod) need synchronized promotion. CI/CD pipelines solve this by making the plan-review-apply cycle automated, auditable, and safe.
This article covers the patterns for integrating Terraform into CI/CD — from the basic plan-on-PR flow to multi-directory monorepos with dependency ordering and environment promotion.
The Core Pattern: Plan on PR, Apply on Merge#

    Developer creates PR
            ↓
    CI runs terraform plan → posts plan output as PR comment
            ↓
    Reviewer reads plan, approves PR
            ↓
    PR merges to main
            ↓
    CI runs terraform apply with the exact plan reviewed

This is the foundation. Every other pattern builds on it.
Why This Pattern Is Non-Negotiable#
- Plan visibility: The reviewer sees exactly what will change before it changes
- Auditability: Every infrastructure change is tied to a PR with discussion, approval, and plan output
- Safety: `apply` runs the saved plan, not a re-computed one that might differ
- Concurrency control: The state lock prevents two applies from running simultaneously
- Rollback trail: Every change is a git commit that can be reverted
The GitHub Actions Implementation#

    name: Terraform
    on:
      pull_request:
        paths: ["infrastructure/**"]
      push:
        branches: [main]
        paths: ["infrastructure/**"]
    permissions:
      id-token: write
      contents: read
      pull-requests: write
    env:
      TF_IN_AUTOMATION: true
      TF_INPUT: false
    jobs:
      plan:
        if: github.event_name == 'pull_request'
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: 1.7.0
          - uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ vars.TERRAFORM_PLAN_ROLE_ARN }}
              aws-region: us-east-1
          - name: Init
            working-directory: infrastructure
            run: terraform init -backend-config=backend.hcl
          - name: Plan
            working-directory: infrastructure
            id: plan
            # pipefail: without it the step reports tee's exit status,
            # and a failed plan would look like a success
            run: |
              set -o pipefail
              terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
            continue-on-error: true
          - name: Comment Plan on PR
            uses: actions/github-script@v7
            with:
              script: |
                const fs = require('fs');
                const plan = fs.readFileSync('infrastructure/plan.txt', 'utf8');
                // Stay under GitHub's comment size limit
                const truncated = plan.length > 60000
                  ? plan.substring(0, 60000) + '\n\n... truncated ...'
                  : plan;
                await github.rest.issues.createComment({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  issue_number: context.issue.number,
                  body: `### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
                });
          - name: Fail on Plan Error
            if: steps.plan.outcome == 'failure'
            run: exit 1
      apply:
        if: github.event_name == 'push' && github.ref == 'refs/heads/main'
        runs-on: ubuntu-latest
        environment: production # requires manual approval in GitHub
        steps:
          - uses: actions/checkout@v4
          - uses: hashicorp/setup-terraform@v3
            with:
              terraform_version: 1.7.0
          - uses: aws-actions/configure-aws-credentials@v4
            with:
              role-to-assume: ${{ vars.TERRAFORM_APPLY_ROLE_ARN }}
              aws-region: us-east-1
          - name: Init
            working-directory: infrastructure
            run: terraform init -backend-config=backend.hcl
          - name: Plan
            working-directory: infrastructure
            run: terraform plan -no-color -out=tfplan
          - name: Apply
            working-directory: infrastructure
            run: terraform apply -no-color tfplan

Key details:
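- `TF_IN_AUTOMATION=true` tells Terraform it is running in automation, so its output omits suggestions for commands to run next
- `TF_INPUT=false` disables interactive prompts; Terraform fails with an error on a missing variable instead of waiting for input
- Separate IAM roles for plan (read-only) and apply (write) follow the principle of least privilege
- `environment: production` in the apply job enables GitHub’s environment protection rules (manual approval)
- The apply job re-runs `terraform plan` on `main`, so what it applies can differ from the plan reviewed on the PR if state or code changed in between; applying the exact reviewed plan requires carrying the PR's plan file forward, for example as a build artifact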
Multi-Directory Monorepo#
When infrastructure is decomposed into separate root modules (networking, database, compute), CI/CD must detect which directories changed and run plan/apply only for those.
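A minimal sketch of the detection logic as a standalone shell function, which can be tried locally before it is wired into a workflow. The function name `changed_dirs` and the single-segment default prefix `infrastructure` are assumptions for illustration. It only counts files that sit inside a sub-directory of the prefix, so a file placed directly under `infrastructure/` is not mistaken for a root module.

```shell
# Reduce a list of changed file paths (one per line on stdin) to the unique
# root-module directories under a prefix, e.g. infrastructure/networking.
# Assumption: the prefix is a single path segment.
changed_dirs() {
  prefix="${1:-infrastructure}"
  # Keep only paths of the form <prefix>/<module>/..., then cut to <prefix>/<module>
  grep "^${prefix}/[^/]*/" | cut -d'/' -f1-2 | sort -u
}
```

In a pipeline this might be fed from `git diff --name-only origin/main...HEAD | changed_dirs` and then converted to a JSON array with `jq` for use in a job matrix.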
Directory Detection#

    jobs:
      detect-changes:
        runs-on: ubuntu-latest
        outputs:
          directories: ${{ steps.detect.outputs.directories }}
        steps:
          - uses: actions/checkout@v4
            with:
              fetch-depth: 0
          - id: detect
            run: |
              # Match only files inside a sub-directory, so a file placed directly
              # under infrastructure/ is not mistaken for a root module
              DIRS=$(git diff --name-only origin/main...HEAD \
                | grep '^infrastructure/[^/]*/' \
                | cut -d'/' -f1-2 \
                | sort -u \
                | jq -R -s -c 'split("\n") | map(select(length > 0))')
              echo "directories=$DIRS" >> "$GITHUB_OUTPUT"
      plan:
        needs: detect-changes
        if: needs.detect-changes.outputs.directories != '[]'
        strategy:
          matrix:
            directory: ${{ fromJson(needs.detect-changes.outputs.directories) }}
          fail-fast: false
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          # Terraform setup and cloud credentials as in the single-directory workflow
          - name: Plan
            working-directory: ${{ matrix.directory }}
            run: |
              terraform init
              terraform plan -no-color -out=tfplan

Dependency Ordering for Apply#
Plan can run in parallel for all changed directories. Apply must respect dependencies:
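
    infrastructure/
    ├── networking/    # Layer 1: no dependencies
    ├── database/      # Layer 2: depends on networking
    ├── compute/       # Layer 2: depends on networking
    └── application/   # Layer 3: depends on database + compute

The jobs below are abbreviated: `runs-on`, checkout, `terraform init`, and retrieval of the saved plan file are omitted.

    jobs:
      apply-layer-1:
        needs: detect-changes
        if: contains(needs.detect-changes.outputs.directories, 'infrastructure/networking')
        steps:
          - name: Apply Networking
            working-directory: infrastructure/networking
            run: terraform apply tfplan
      apply-layer-2:
        needs: apply-layer-1
        # Run even when layer 1 was skipped, but not when it failed
        if: ${{ !cancelled() && !contains(needs.*.result, 'failure') }}
        strategy:
          matrix:
            directory: [infrastructure/database, infrastructure/compute]
        steps:
          - name: Apply
            working-directory: ${{ matrix.directory }}
            run: terraform apply tfplan
      apply-layer-3:
        needs: [detect-changes, apply-layer-1, apply-layer-2]
        if: ${{ !cancelled() && !contains(needs.*.result, 'failure') && contains(needs.detect-changes.outputs.directories, 'infrastructure/application') }}
        steps:
          - name: Apply Application
            working-directory: infrastructure/application
            run: terraform apply tfplan

Layer 1 (networking) applies first. Layer 2 (database, compute) applies in parallel after Layer 1. Layer 3 (application) applies after Layer 2. By default GitHub Actions skips a job when any job it needs was skipped, so without the explicit `if` conditions a PR that leaves networking untouched would skip every later layer. A job can also read `needs.detect-changes.outputs` only if `detect-changes` is listed directly in its `needs`.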
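The layer ordering can also be expressed outside the workflow file, for instance in a script that a single apply job runs. The sketch below hard-codes the layer map from the directory tree above; the function names `layer_of` and `apply_order` are hypothetical, and a real setup would likely keep the map in a config file.

```shell
# Hypothetical layer map mirroring the example directory tree.
layer_of() {
  case "$1" in
    infrastructure/networking) echo 1 ;;
    infrastructure/database|infrastructure/compute) echo 2 ;;
    infrastructure/application) echo 3 ;;
    *) echo 9 ;;  # unknown directories sort last
  esac
}

# Read changed directories (one per line on stdin) and print them in apply
# order: lowest layer first, alphabetical within a layer.
apply_order() {
  while IFS= read -r dir; do
    if [ -n "$dir" ]; then
      printf '%s %s\n' "$(layer_of "$dir")" "$dir"
    fi
  done | sort -k1,1n -k2 | cut -d' ' -f2-
}
```

Directories in the same layer come out adjacent, so a caller can apply them one after another or group them for parallel runs.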
Drift Detection#
Infrastructure drift — changes made outside of Terraform — should be detected proactively, not discovered during the next apply.
Scheduled Drift Detection#

    name: Drift Detection
    on:
      schedule:
        - cron: '0 6 * * 1-5' # weekdays at 6 AM UTC
    jobs:
      check-drift:
        runs-on: ubuntu-latest
        strategy:
          matrix:
            directory:
              - infrastructure/networking
              - infrastructure/database
              - infrastructure/compute
              - infrastructure/application
        steps:
          - uses: actions/checkout@v4
          - name: Init
            working-directory: ${{ matrix.directory }}
            run: terraform init
          - name: Detect Drift
            id: drift
            working-directory: ${{ matrix.directory }}
            run: |
              terraform plan -detailed-exitcode -no-color 2>&1 | tee drift.txt
              # $? would be tee's status; PIPESTATUS[0] is terraform's
              echo "exit_code=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
            continue-on-error: true
          - name: Alert on Drift
            if: steps.drift.outputs.exit_code == '2'
            run: |
              echo "::warning::Drift detected in ${{ matrix.directory }}"
              # Send Slack notification, create GitHub issue, etc.

`terraform plan -detailed-exitcode` returns:
- 0: No changes (no drift)
- 1: Error
- 2: Changes detected (drift)
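As a small illustration of handling these three outcomes in a script, the sketch below captures the exit status directly rather than through a pipe, so the value is Terraform's own and not that of a downstream command. The function names and the `drift.txt` output file are assumptions for this example.

```shell
# Map the exit status of `terraform plan -detailed-exitcode` to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;   # no changes, no drift
    2) echo "drift" ;;   # changes detected
    *) echo "error" ;;   # 1 or anything unexpected
  esac
}

# Run the plan in the current directory, save its output to drift.txt,
# and print the label for its exit status.
check_drift() {
  status=0
  terraform plan -detailed-exitcode -no-color -input=false > drift.txt 2>&1 || status=$?
  classify_plan_exit "$status"
}
```

A caller could then branch on the label, for example alerting only on `drift` and failing the job on `error`.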
What to Do When Drift Is Detected#
- Investigate: What changed? Check cloud audit logs (CloudTrail, Azure Activity Log, GCP Audit Log)
- Classify: Was the change intentional (manual hotfix, auto-scaling) or accidental (console click)?
- Decide:
  - If intentional: update Terraform code to match reality (`terraform apply -refresh-only`, then adjust code)
  - If accidental: apply Terraform to revert to the desired state
  - If auto-managed: add `ignore_changes` for that attribute
Environment Promotion#
Moving infrastructure changes safely from dev → staging → production.
The Promotion Pattern#

                        dev/               staging/           prod/
    PR with change ──→  plan + apply
                        OK? ────────────→  plan + apply
                                           OK? ────────────→  plan + apply

Implementation: Staged Applies#

    jobs:
      apply-dev:
        environment: dev
        steps:
          - working-directory: infrastructure/envs/dev
            run: terraform init && terraform plan -out=tfplan && terraform apply tfplan
      apply-staging:
        needs: apply-dev
        environment: staging # may require manual approval
        steps:
          - working-directory: infrastructure/envs/staging
            run: terraform init && terraform plan -out=tfplan && terraform apply tfplan
      apply-prod:
        needs: apply-staging
        environment: production # always requires manual approval
        steps:
          - working-directory: infrastructure/envs/prod
            run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

Key: Each environment re-plans instead of reusing the dev plan file. The code is the same, but the state and variables differ. The plan for production might show different changes than dev if the environments have diverged.
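The same staged flow can be sketched as a script, which makes the stop-on-first-failure behavior explicit. This is an illustration only: `apply_env` and `promote` are hypothetical names, and the manual approval gates that GitHub environments provide have no equivalent here.

```shell
# Init, plan, and apply one environment directory; any failing step
# stops the sequence for that environment.
apply_env() {
  (
    cd "$1" || exit 1
    terraform init -input=false &&
      terraform plan -input=false -out=tfplan &&
      terraform apply -input=false tfplan
  )
}

# Promote through the given directories in order, stopping at the first failure
# so later environments are never touched after an earlier one breaks.
promote() {
  for env_dir in "$@"; do
    echo "applying $env_dir"
    apply_env "$env_dir" || { echo "stopped at $env_dir" >&2; return 1; }
  done
}
```

It might be invoked as `promote infrastructure/envs/dev infrastructure/envs/staging infrastructure/envs/prod`. Each environment gets its own fresh plan, matching the re-plan behavior described above.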
Emergency Rollback#
When an apply causes problems, you need to revert quickly.
Git Revert Pattern (Safest)#

    # 1. Revert the merge commit
    git revert -m 1 HEAD
    # 2. Push the revert (triggers CI)
    git push
    # 3. CI runs plan (showing the revert changes) and apply

This is the safest rollback because it goes through the full plan-review-apply cycle. The plan shows exactly what will be reverted. The `-m 1` flag applies only when HEAD is a true merge commit; after a squash or rebase merge, plain `git revert HEAD` is the equivalent.
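Whether `-m 1` is needed depends on how the PR was merged, since `git revert -m` is rejected for a commit that has only one parent. The helper below is a hypothetical sketch that picks the right form from the commit's parent count; it takes the output of `git rev-list --parents -n 1 <commit>` as its argument and only prints the command, leaving execution to the operator.

```shell
# Given one line of `git rev-list --parents -n 1 <commit>` output
# (the commit hash followed by its parent hashes), print the matching
# `git revert` command.
revert_command() {
  set -- $1                 # split into words: commit first, then parents
  commit="$1"
  parents=$(( $# - 1 ))
  if [ "$parents" -ge 2 ]; then
    echo "git revert -m 1 $commit"   # merge commit: keep the first parent's line
  else
    echo "git revert $commit"        # squash or rebase merge: ordinary commit
  fi
}
```

Usage would look like `revert_command "$(git rev-list --parents -n 1 HEAD)"`.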
Manual Targeted Revert (Faster, Riskier)#

    # 1. Check out the previous state of one directory
    git checkout HEAD~1 -- infrastructure/compute/
    # 2. Plan and apply locally (bypasses CI)
    cd infrastructure/compute
    terraform plan -out=tfplan
    terraform apply tfplan
    # 3. Commit the revert
    git add . && git commit -m "Revert compute changes"

This is faster but bypasses CI review. Use only in genuine emergencies.
What Cannot Be Rolled Back#
Some changes are irreversible even with a git revert:
- Database deletions (data is gone unless there is a backup)
- Encryption key rotations (old key is disabled)
- DNS propagation (reverting the record does not immediately undo global DNS cache)
- S3 bucket name changes (old name is released, may be claimed by someone else)
For these, the “rollback” is a forward fix: create a new resource, restore from backup, or wait for propagation.
Platform Comparison#
| Feature | GitHub Actions | Atlantis | Spacelift | Terraform Cloud |
|---|---|---|---|---|
| Hosting | GitHub-hosted or self-hosted | Self-hosted | SaaS | SaaS |
| Plan on PR | Via workflow | Native (`atlantis plan`) | Native | Native |
| Apply on merge | Via workflow | Via PR comment (`atlantis apply`) | Native | Native |
| State management | You manage (S3/Azure Blob/GCS) | You manage | Built-in | Built-in |
| Drift detection | Custom scheduled job | Not built-in | Native | Native |
| Cost estimation | Via Infracost integration | Via Infracost integration | Native | Via integration |
| Policy as code | Via OPA/Conftest steps | Via OPA/Conftest | Native (OPA) | Sentinel |
| Multi-directory | Matrix strategy | Native (per-directory) | Native (stacks) | Workspaces |
| Dependency ordering | Manual job dependencies | Custom workflows | Stack dependencies | Run triggers |
| Price | Free for public repos, usage-based for private | Free (self-host cost) | From $0/mo (community) | From $0/mo (free tier) |
For small teams: GitHub Actions + manual S3 backend. Simple, free, sufficient.
For medium teams: Atlantis (if you want self-hosted control) or Spacelift (if you want managed).
For large teams: Spacelift or Terraform Cloud with full policy enforcement, drift detection, and stack dependencies.
The Complete Pipeline Checklist#
A production-ready IaC pipeline includes:
- Format check (`terraform fmt -check`): every commit
- Validate (`terraform validate`): every commit
- Lint (`tflint`): every PR
- Security scan (`checkov`): every PR
- Plan with output posted to PR: every PR
- Cost estimate (`infracost`): every PR
- Policy check (`conftest`): every PR
- Manual approval gate: before production apply
- Apply from saved plan: on merge to main
- Drift detection: scheduled (daily or weekly)
- State backup: automated
- Rollback procedure: documented and tested
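The first four checklist items can be bundled into one pre-merge script. The sketch below is one possible arrangement, assuming `terraform`, `tflint`, and `checkov` are installed and the working directory is an initialized root module; the function name `run_checks` is hypothetical and the tool flags are kept to a minimum. It runs every check and reports all failures together, so a contributor sees the full list in one pass.

```shell
# Run each static check in order and collect the names of those that fail.
run_checks() {
  failed=""
  terraform fmt -check -recursive || failed="$failed fmt"
  terraform validate              || failed="$failed validate"
  tflint                          || failed="$failed tflint"
  checkov -d . --quiet            || failed="$failed checkov"
  if [ -n "$failed" ]; then
    echo "failed checks:$failed" >&2
    return 1
  fi
  echo "all checks passed"
}
```

Because `terraform validate` needs an initialized directory, a CI job would typically run `terraform init -backend=false` before calling this.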