Integrating Infrastructure as Code with CI/CD#

Running Terraform locally works for one person. It breaks down when multiple people (or agents) modify infrastructure concurrently, when changes need review before applying, and when environments (dev/staging/prod) need synchronized promotion. CI/CD pipelines solve this by making the plan-review-apply cycle automated, auditable, and safe.

This article covers the patterns for integrating Terraform into CI/CD — from the basic plan-on-PR flow to multi-directory monorepos with dependency ordering and environment promotion.

The Core Pattern: Plan on PR, Apply on Merge#

Developer creates PR
        ↓
CI runs terraform plan → posts plan output as PR comment
        ↓
Reviewer reads plan, approves PR
        ↓
PR merges to main
        ↓
CI re-plans on main and runs terraform apply on that saved plan

This is the foundation. Every other pattern builds on it.

Why This Pattern Is Non-Negotiable#

Plan visibility: The reviewer sees exactly what will change before it changes
Auditability: Every infrastructure change is tied to a PR with discussion, approval, and plan output
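Safety: apply runs a saved plan file, so it executes exactly the actions that plan listed rather than whatever a fresh implicit plan would compute
Concurrency control: The backend's state lock prevents two applies from running against the same state simultaneously (assuming the backend supports locking)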
Rollback trail: Every change is a git commit that can be reverted

The GitHub Actions Implementation#

name: Terraform
on:
  pull_request:
    paths: ["infrastructure/**"]
  push:
    branches: [main]
    paths: ["infrastructure/**"]

permissions:
  id-token: write
  contents: read
  pull-requests: write

env:
  TF_IN_AUTOMATION: true
  TF_INPUT: false

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.TERRAFORM_PLAN_ROLE_ARN }}
          aws-region: us-east-1

      - name: Init
        working-directory: infrastructure
        run: terraform init -backend-config=backend.hcl

      - name: Plan
        working-directory: infrastructure
        id: plan
        # An explicit shell enables pipefail, so a failed plan is not masked by tee
        shell: bash
        run: terraform plan -no-color -out=tfplan 2>&1 | tee plan.txt
        continue-on-error: true

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('infrastructure/plan.txt', 'utf8');
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + '\n\n... truncated ...'
              : plan;
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: `### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

      - name: Fail on Plan Error
        if: steps.plan.outcome == 'failure'
        run: exit 1

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production  # pauses for approval if the environment has required reviewers
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.TERRAFORM_APPLY_ROLE_ARN }}
          aws-region: us-east-1

      - name: Init
        working-directory: infrastructure
        run: terraform init -backend-config=backend.hcl

      - name: Plan
        working-directory: infrastructure
        run: terraform plan -no-color -out=tfplan

      - name: Apply
        working-directory: infrastructure
        run: terraform apply -no-color tfplan

Key details:

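TF_IN_AUTOMATION=true tells Terraform it is running non-interactively, so it omits the "run this command next" hints from its output; it does not disable prompts
TF_INPUT=false disables interactive prompts, so a missing variable fails the run immediately instead of hanging on input
Separate IAM roles for plan (read-only) and apply (write): principle of least privilege
environment: production in the apply job subjects it to that environment's protection rules; it pauses for manual approval only if required reviewers are configured
The apply job plans again on main rather than reusing the PR's tfplan, so what is applied can differ from what was reviewed if state or code changed in between; to apply the reviewed plan itself, persist tfplan as an artifact and let Terraform reject it if it has gone stale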
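
One caveat with piping plan output through tee: when a run step does not set shell, GitHub Actions invokes bash with -e but without pipefail, so the pipeline reports tee's exit status and a failed plan can look successful. The sketch below shows one way to keep the command's own status while still saving the output. The helper name run_and_capture is hypothetical, not part of Terraform or GitHub Actions.

```shell
#!/usr/bin/env bash
# Run a command, mirror its combined stdout/stderr to a log file, and return
# the command's own exit status instead of tee's.
run_and_capture() {
  local log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] holds the status of the first command in the last pipeline.
  return "${PIPESTATUS[0]}"
}

# Hypothetical usage in the plan step:
#   run_and_capture plan.txt terraform plan -no-color -out=tfplan
```

Under set -e, call it as `run_and_capture ... || status=$?` so a non-zero status can be inspected instead of ending the script.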

Multi-Directory Monorepo#

When infrastructure is decomposed into separate root modules (networking, database, compute), CI/CD must detect which directories changed and run plan/apply only for those.

Directory Detection#

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      directories: ${{ steps.detect.outputs.directories }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

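      - id: detect
        run: |
          # PR runs diff against the base branch; push runs diff against the previous tip of main
          BASE="${{ github.event.pull_request.base.sha || github.event.before }}"
          # Keep only files inside a root module (infrastructure/<dir>/...)
          DIRS=$(git diff --name-only "$BASE"...HEAD \
            | grep -E '^infrastructure/[^/]+/' \
            | cut -d'/' -f1-2 \
            | sort -u \
            | jq -R -s -c 'split("\n") | map(select(length > 0))')
          echo "directories=$DIRS" >> "$GITHUB_OUTPUT"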

  plan:
    needs: detect-changes
    if: needs.detect-changes.outputs.directories != '[]'
    strategy:
      matrix:
        directory: ${{ fromJson(needs.detect-changes.outputs.directories) }}
      fail-fast: false
    runs-on: ubuntu-latest
    steps:
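      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Credential setup omitted; configure it as in the single-directory workflow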
      - name: Plan
        working-directory: ${{ matrix.directory }}
        run: |
          terraform init
          terraform plan -no-color -out=tfplan

Dependency Ordering for Apply#

Plan can run in parallel for all changed directories. Apply must respect dependencies:

infrastructure/
├── networking/    # Layer 1: no dependencies
├── database/      # Layer 2: depends on networking
├── compute/       # Layer 2: depends on networking
└── application/   # Layer 3: depends on database + compute

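jobs:
  # detect-changes is the job from the previous section.
  # Checkout, Terraform setup, and credential steps are omitted for brevity.
  # Each job starts on a fresh runner with no tfplan file, so it plans before applying.
  apply-layer-1:
    needs: detect-changes
    if: contains(needs.detect-changes.outputs.directories, 'infrastructure/networking')
    runs-on: ubuntu-latest
    steps:
      - name: Apply Networking
        working-directory: infrastructure/networking
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

  apply-layer-2:
    needs: [detect-changes, apply-layer-1]
    # Run even when layer 1 was skipped (networking unchanged), but not after a failure
    if: ${{ !cancelled() && !contains(needs.*.result, 'failure') }}
    strategy:
      matrix:
        directory: [infrastructure/database, infrastructure/compute]
    runs-on: ubuntu-latest
    steps:
      - name: Apply
        if: contains(needs.detect-changes.outputs.directories, matrix.directory)
        working-directory: ${{ matrix.directory }}
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

  apply-layer-3:
    needs: [detect-changes, apply-layer-1, apply-layer-2]
    if: ${{ !cancelled() && !contains(needs.*.result, 'failure') && contains(needs.detect-changes.outputs.directories, 'infrastructure/application') }}
    runs-on: ubuntu-latest
    steps:
      - name: Apply Application
        working-directory: infrastructure/application
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan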

Layer 1 (networking) applies first. Layer 2 (database, compute) applies in parallel after Layer 1. Layer 3 (application) applies after Layer 2.
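
The ordering rule can also be expressed as a small helper, which is handy for a single-job pipeline or for checking the layering locally. This is a sketch under the assumption that the layer table is maintained by hand; LAYERS and apply_order are hypothetical names.

```shell
#!/usr/bin/env bash
# One entry per layer; directories within an entry do not depend on each other.
LAYERS=(
  "infrastructure/networking"
  "infrastructure/database infrastructure/compute"
  "infrastructure/application"
)

# Print one line per layer listing the changed directories in that layer,
# in dependency order. Layers without a changed directory are omitted.
apply_order() {
  local changed=" $* " layer dir line
  for layer in "${LAYERS[@]}"; do
    line=""
    for dir in $layer; do
      if [[ "$changed" == *" $dir "* ]]; then
        line+="${line:+ }$dir"
      fi
    done
    if [[ -n "$line" ]]; then
      printf '%s\n' "$line"
    fi
  done
}

# Changed directories can be passed in any order.
apply_order infrastructure/application infrastructure/networking
```

Each output line is a group that can be applied in parallel; the next line should start only after the previous one succeeds.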

Drift Detection#

Infrastructure drift — changes made outside of Terraform — should be detected proactively, not discovered during the next apply.

Scheduled Drift Detection#

name: Drift Detection
on:
  schedule:
    - cron: '0 6 * * 1-5'  # weekdays at 6 AM UTC

jobs:
  check-drift:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        directory:
          - infrastructure/networking
          - infrastructure/database
          - infrastructure/compute
          - infrastructure/application
    steps:
      - uses: actions/checkout@v4

      - name: Init
        working-directory: ${{ matrix.directory }}
        run: terraform init

      - name: Detect Drift
        id: drift
        working-directory: ${{ matrix.directory }}
        run: |
          # $? after a pipeline is tee's status; PIPESTATUS[0] is Terraform's
          terraform plan -detailed-exitcode -no-color 2>&1 | tee drift.txt
          echo "exit_code=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
        continue-on-error: true

      - name: Alert on Drift
        if: steps.drift.outputs.exit_code == '2'
        run: |
          echo "::warning::Drift detected in ${{ matrix.directory }}"
          # Send Slack notification, create GitHub issue, etc.

terraform plan -detailed-exitcode returns:

0: No changes (no drift)
1: Error
2: Changes detected (drift)
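
A minimal sketch of turning those exit codes into labels a later step can branch on. The function name classify_plan_exit and the label strings are hypothetical; any status other than 0 or 2 is treated as an error.

```shell
#!/usr/bin/env bash
# Map the exit status of `terraform plan -detailed-exitcode` to a label.
classify_plan_exit() {
  case "$1" in
    0) echo "no-drift" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Hypothetical usage in the drift job. PIPESTATUS[0] is read immediately after
# the pipeline because $? would report tee's status, not Terraform's:
#   terraform plan -detailed-exitcode -no-color 2>&1 | tee drift.txt
#   status="${PIPESTATUS[0]}"
#   echo "result=$(classify_plan_exit "$status")" >> "$GITHUB_OUTPUT"
```

Separating "error" from "drift" matters for alerting: an expired credential or a backend outage should not be reported as drift.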

What to Do When Drift Is Detected#

Investigate: What changed? Check cloud audit logs (CloudTrail, Azure Activity Log, GCP Audit Log)
Classify: Was the change intentional (manual hotfix, auto-scaling) or accidental (console click)?
Decide:
- If intentional: update Terraform code to match reality (terraform apply -refresh-only then adjust code)
- If accidental: apply Terraform to revert to the desired state
- If auto-managed: add ignore_changes for that attribute

Environment Promotion#

Moving infrastructure changes safely from dev → staging → production.

The Promotion Pattern#

                 dev/                    staging/               prod/
                  │                        │                      │
PR with change ──→│                        │                      │
                  ├── plan + apply ──→     │                      │
                  │                  OK?   │                      │
                  │                   │    ├── plan + apply ──→   │
                  │                   │    │                OK?   │
                  │                   │    │                 │    ├── plan + apply
                  │                   │    │                 │    │

Implementation: Staged Applies#

jobs:
  apply-dev:
    environment: dev
    steps:
      - working-directory: infrastructure/envs/dev
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

  apply-staging:
    needs: apply-dev
    environment: staging  # may require manual approval
    steps:
      - working-directory: infrastructure/envs/staging
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

  apply-prod:
    needs: apply-staging
    environment: production  # always requires manual approval
    steps:
      - working-directory: infrastructure/envs/prod
        run: terraform init && terraform plan -out=tfplan && terraform apply tfplan

Key: each environment runs its own plan instead of reusing the dev plan file. The code is the same, but the state and variables differ, and a plan file is only valid against the state it was computed from. The plan for production might show different changes than dev if the environments have diverged.
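
Outside GitHub Actions, the same stop-on-first-failure promotion can be sketched as a loop. This assumes the infrastructure/envs/<env> layout from the jobs above and leaves out the manual approval gates; apply_env and promote are hypothetical names.

```shell
#!/usr/bin/env bash
# Plan and apply one environment. -chdir makes Terraform run as if started in
# that directory, so tfplan is written and read there.
apply_env() {
  local dir="infrastructure/envs/$1"
  terraform -chdir="$dir" init -input=false &&
    terraform -chdir="$dir" plan -input=false -out=tfplan &&
    terraform -chdir="$dir" apply -input=false tfplan
}

# Call the function named in $1 for each remaining argument, in order,
# and stop at the first environment that fails.
promote() {
  local apply_fn="$1" env
  shift
  for env in "$@"; do
    echo "applying $env"
    if ! "$apply_fn" "$env"; then
      echo "stopped: $env failed" >&2
      return 1
    fi
  done
}

# Usage: promote apply_env dev staging prod
```

Passing the apply function as an argument keeps the ordering logic testable with a stub, without touching real infrastructure.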

Emergency Rollback#

When an apply causes problems, you need to revert quickly.

Git Revert Pattern (Safest)#

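# 1. Revert the merge commit (-m 1 treats main as the mainline;
#    drop -m if the PR was squash-merged, since HEAD is then an ordinary commit)
git revert -m 1 HEAD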

# 2. Push the revert (triggers CI)
git push

# 3. CI runs plan (showing the revert changes) and apply

This is the safest rollback because it goes through the full plan-review-apply cycle. The plan shows exactly what will be reverted.
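
Whether -m is needed depends on how the PR was merged: git revert -m only works on a commit with more than one parent. A small sketch that picks the right form from the output of git rev-list --parents, which prints the commit followed by its parents; the function name revert_command is hypothetical.

```shell
#!/usr/bin/env bash
# Given one line of `git rev-list --parents -n 1 <commit>` output
# ("<commit> <parent> [<parent>...]"), print the matching revert command.
revert_command() {
  local -a fields
  read -r -a fields <<< "$1"
  local commit="${fields[0]}"
  local parents=$(( ${#fields[@]} - 1 ))
  if (( parents > 1 )); then
    # Merge commit: revert relative to the first parent (the main branch side).
    echo "git revert -m 1 $commit"
  else
    # Squash or rebase merge: an ordinary single-parent commit.
    echo "git revert $commit"
  fi
}

# Usage: revert_command "$(git rev-list --parents -n 1 HEAD)"
```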

Manual Targeted Revert (Faster, Riskier)#

# 1. Check out the previous state of one directory
git checkout HEAD~1 -- infrastructure/compute/

# 2. Plan and apply locally (bypasses CI)
cd infrastructure/compute
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# 3. Commit the revert
git add . && git commit -m "Revert compute changes"

This is faster but bypasses CI review. Use only in genuine emergencies.

What Cannot Be Rolled Back#

Some changes are irreversible even with a git revert:

Database deletions (data is gone unless there is a backup)
Encryption key rotations (old key is disabled)
DNS changes (reverting the record does not flush resolver caches; stale answers persist until their TTL expires)
S3 bucket name changes (old name is released, may be claimed by someone else)

For these, the “rollback” is a forward fix: create a new resource, restore from backup, or wait for propagation.

Platform Comparison#

Feature	GitHub Actions	Atlantis	Spacelift	Terraform Cloud
Hosting	GitHub-hosted or self-hosted	Self-hosted	SaaS	SaaS
Plan on PR	Via workflow	Native (`atlantis plan`)	Native	Native
Apply on merge	Via workflow	Via PR comment (`atlantis apply`)	Native	Native
State management	You manage (S3/Azure Blob/GCS)	You manage	Built-in	Built-in
Drift detection	Custom scheduled job	Not built-in	Native	Native
Cost estimation	Via Infracost integration	Via Infracost integration	Via Infracost integration	Native
Policy as code	Via OPA/Conftest steps	Via OPA/Conftest	Native (OPA)	Sentinel
Multi-directory	Matrix strategy	Native (per-directory)	Native (stacks)	Workspaces
Dependency ordering	Manual job dependencies	Custom workflows	Stack dependencies	Run triggers
Price	Free for public repos, usage-based for private	Free (self-host cost)	From $0/mo (community)	From $0/mo (free tier)

For small teams: GitHub Actions with a self-managed S3 backend. Simple, cheap, and sufficient.

For medium teams: Atlantis (if you want self-hosted control) or Spacelift (if you want managed).

For large teams: Spacelift or Terraform Cloud with full policy enforcement, drift detection, and stack dependencies.

The Complete Pipeline Checklist#

A production-ready IaC pipeline includes:

Format check (terraform fmt -check) — every commit
Validate (terraform validate) — every commit
Lint (tflint) — every PR
Security scan (checkov) — every PR
Plan with output posted to PR — every PR
Cost estimate (infracost) — every PR
Policy check (conftest) — every PR
Manual approval gate — before production apply
Apply from saved plan — on merge to main
Drift detection — scheduled (daily or weekly)
State backup — automated
Rollback procedure — documented and tested
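
The per-commit and per-PR checks at the top of the list can be driven by one small runner that executes every check and reports all failures together, rather than stopping at the first. This is a sketch; run_checks is a hypothetical name, and the commands in the usage comment assume the tools are installed and the directory is initialized (terraform validate needs terraform init first, for example with -backend=false).

```shell
#!/usr/bin/env bash
# Run every check even if an earlier one fails, then list the failures.
# Each argument is a command line; its own output is sent to stderr so that
# stdout carries only the summary.
run_checks() {
  local failed=() check
  for check in "$@"; do
    if ! bash -c "$check" >&2; then
      failed+=("$check")
    fi
  done
  if (( ${#failed[@]} > 0 )); then
    printf 'FAILED: %s\n' "${failed[@]}"
    return 1
  fi
  echo "all checks passed"
}

# Hypothetical PR invocation:
#   run_checks "terraform fmt -check -recursive" "terraform validate" \
#              "tflint" "checkov -d ."
```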