Building Machine Images with Packer#

Machine images (AMIs, Azure Managed Images, GCP Images) are the foundation of immutable infrastructure. Instead of provisioning a base OS and configuring it at boot, you build a pre-configured image and launch instances from it. Packer automates this process: it launches a temporary instance, runs provisioners to configure it, creates an image from the result, and destroys the temporary instance.

This operational sequence walks through building, testing, and managing machine images with Packer from template creation through CI/CD integration.

Phase 1 – Template Structure#

Step 1: Initialize the Project#

Create the project directory. Templates throughout this guide use HCL2, Packer's current configuration language, which replaced the legacy JSON format:

mkdir -p packer/
cd packer/

A Packer template consists of three primary blocks: source (defines the builder), build (defines provisioners and post-processors), and variable (defines inputs).
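
A minimal sketch of how these fit together (names illustrative; the packer block on top declares the plugins that packer init downloads in Step 4):

# skeleton.pkr.hcl
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.0.0"
    }
  }
}

variable "image_version" {
  type = string
}

source "amazon-ebs" "example" {
  # where and how the temporary build instance runs
}

build {
  sources = ["source.amazon-ebs.example"]

  provisioner "shell" {
    inline = ["echo building version ${var.image_version}"]
  }
}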

Step 2: Define Variables#

# variables.pkr.hcl
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}

variable "image_name" {
  type    = string
  default = "app-base"
}

variable "image_version" {
  type = string
}

variable "ssh_username" {
  type    = string
  default = "ubuntu"
}

variable "base_ami_filter" {
  type    = string
  default = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
}

Variables without defaults are required at build time. Pass them via command line (-var), variable files (-var-file), or environment variables (PKR_VAR_image_version).
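
For example, three equivalent ways to supply image_version (value illustrative):

# On the command line
packer build -var "image_version=1.2.3" .

# From a variable file
echo 'image_version = "1.2.3"' > release.pkrvars.hcl
packer build -var-file=release.pkrvars.hcl .

# From the environment
PKR_VAR_image_version=1.2.3 packer build .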

Step 3: Define Sources (Builders)#

Sources define where and how the temporary build instance is created.

AWS AMI:

# aws.pkr.hcl
source "amazon-ebs" "base" {
  region        = var.aws_region
  instance_type = var.instance_type
  ami_name      = "${var.image_name}-${var.image_version}-{{timestamp}}"

  source_ami_filter {
    filters = {
      name                = var.base_ami_filter
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["099720109477"]  # Canonical
  }

  ssh_username = var.ssh_username

  tags = {
    Name        = "${var.image_name}-${var.image_version}"
    Version     = var.image_version
    BuildTime   = "{{timestamp}}"
    BaseAMI     = "{{ .SourceAMI }}"
    ManagedBy   = "packer"
  }

  # Encrypt the resulting AMI. Note: an AMI encrypted with the default
  # EBS key cannot be shared; sharing requires a customer-managed KMS
  # key that is itself shared with the target accounts.
  encrypt_boot = true

  # Share with other accounts
  ami_users = ["111111111111", "222222222222"]
}

Azure (the azure_* credential variables would need their own variable blocks, omitted here for brevity):

# azure.pkr.hcl
source "azure-arm" "base" {
  subscription_id = var.azure_subscription_id
  client_id       = var.azure_client_id
  client_secret   = var.azure_client_secret
  tenant_id       = var.azure_tenant_id

  managed_image_name                = "${var.image_name}-${var.image_version}"
  managed_image_resource_group_name = "packer-images"

  os_type         = "Linux"
  image_publisher = "Canonical"
  image_offer     = "0001-com-ubuntu-server-jammy"
  image_sku       = "22_04-lts"

  location = "eastus"
  vm_size  = "Standard_D2s_v3"

  azure_tags = {
    version   = var.image_version
    managedBy = "packer"
  }
}

GCP (gcp_project_id likewise needs its own variable block):

# gcp.pkr.hcl
source "googlecompute" "base" {
  project_id   = var.gcp_project_id
  zone         = "us-central1-a"
  machine_type = "e2-medium"

  source_image_family = "ubuntu-2204-lts"
  image_name          = "${var.image_name}-${var.image_version}-{{timestamp}}"
  image_family        = var.image_name
  image_description   = "Base image version ${var.image_version}"

  ssh_username = var.ssh_username

  image_labels = {
    version    = replace(var.image_version, ".", "-")
    managed_by = "packer"
  }
}

Docker (for testing or container image building):

# docker.pkr.hcl
source "docker" "base" {
  image  = "ubuntu:22.04"
  commit = true
  changes = [
    "ENTRYPOINT [\"/usr/sbin/sshd\", \"-D\"]",
    "EXPOSE 22"
  ]
}

Step 4: Verification#

Validate the template syntax:

packer init .        # Download required plugins
packer validate .    # Check syntax and configuration
packer fmt .         # Format HCL files consistently

Phase 2 – Provisioning#

Step 5: Define the Build Block#

The build block connects sources to provisioners. Provisioners run in order and configure the instance.

# build.pkr.hcl
build {
  sources = [
    "source.amazon-ebs.base",
    "source.azure-arm.base",
    "source.googlecompute.base",
  ]

  # Wait for cloud-init to finish
  provisioner "shell" {
    inline = [
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done"
    ]
  }

  # System updates
  provisioner "shell" {
    inline = [
      "sudo apt-get update",
      "sudo apt-get upgrade -y",
      "sudo apt-get install -y curl wget jq unzip htop"
    ]
  }

  # Copy configuration files
  provisioner "file" {
    source      = "files/sshd_config"
    destination = "/tmp/sshd_config"
  }

  provisioner "shell" {
    inline = [
      "sudo mv /tmp/sshd_config /etc/ssh/sshd_config",
      "sudo chown root:root /etc/ssh/sshd_config",
      "sudo chmod 644 /etc/ssh/sshd_config"
    ]
  }

  # Run Ansible for complex configuration
  provisioner "ansible" {
    playbook_file = "ansible/configure.yml"
    extra_arguments = [
      "--extra-vars", "image_version=${var.image_version}"
    ]
  }

  # Clean up before creating the image
  provisioner "shell" {
    inline = [
      "sudo apt-get clean",
      "sudo rm -rf /var/lib/apt/lists/*",
      "sudo rm -rf /tmp/*",
      "sudo rm -rf /var/tmp/*",
      "sudo rm -f /root/.bash_history",
      "sudo rm -f /home/${var.ssh_username}/.bash_history",
      "sudo truncate -s 0 /var/log/*.log",
      "sudo sync"
    ]
  }
}

Step 6: Provisioner Types#

Shell provisioner runs commands directly. Best for simple tasks like package installation and file cleanup. Use inline for short command lists, and script or scripts to run script files from disk:

provisioner "shell" {
  scripts = [
    "scripts/01-base-packages.sh",
    "scripts/02-security-hardening.sh",
    "scripts/03-monitoring-agent.sh"
  ]
  environment_vars = [
    "DEBIAN_FRONTEND=noninteractive"
  ]
}

Ansible provisioner runs an Ansible playbook against the build instance. Best for complex configuration that benefits from Ansible’s idempotency, templates, and role ecosystem:

provisioner "ansible" {
  playbook_file = "ansible/site.yml"
  galaxy_file   = "ansible/requirements.yml"
  roles_path    = "ansible/roles"
  extra_arguments = [
    "--extra-vars", "env=production image_version=${var.image_version}",
    "--tags", "base,security,monitoring"
  ]
}

File provisioner copies files or directories to the build instance. Transfers default to upload; set direction = "download" to copy files from the build instance back to the local machine instead.
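
A sketch of both directions (paths illustrative):

# Upload a directory (a trailing slash on source copies its contents)
provisioner "file" {
  source      = "files/app-config/"
  destination = "/tmp/app-config"
}

# Download a file from the build instance to the local machine
provisioner "file" {
  source      = "/tmp/build-report.txt"
  destination = "build-report.txt"
  direction   = "download"
}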

Step 7: Cloud-Specific Post-Build Steps#

For Azure images, the instance must be generalized before capture:

build {
  sources = ["source.azure-arm.base"]

  # ... provisioners ...

  provisioner "shell" {
    execute_command = "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'"
    inline = [
      "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
    ]
    skip_clean = true
  }
}

Step 8: Build the Image#

# Build for all sources
packer build -var "image_version=1.0.0" .

# Build for a specific source only
packer build -var "image_version=1.0.0" -only="amazon-ebs.base" .

# Debug mode (pauses before each step and saves the SSH key locally)
packer build -var "image_version=1.0.0" -debug .

The -debug flag is invaluable for troubleshooting. Packer pauses before each step and writes the temporary SSH private key to the current directory, so you can connect to the build instance and inspect its state between steps.
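
The related -on-error flag controls what happens to the temporary instance when a build fails:

# Exit without cleanup, leaving the failed instance running for inspection
packer build -var "image_version=1.0.0" -on-error=abort .

# Prompt interactively: clean up, abort, or retry the failed step
packer build -var "image_version=1.0.0" -on-error=ask .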

Phase 3 – Post-Processors#

Step 9: Add Post-Processors#

Post-processors run after the image is created. Common uses include generating manifests, compressing artifacts, and pushing to registries.

build {
  sources = ["source.amazon-ebs.base"]

  # ... provisioners ...

  post-processor "manifest" {
    output     = "build-manifest.json"
    strip_path = true
  }

  # For the Docker source only: tag and push (skipped for the AMI build)
  post-processors {
    post-processor "docker-tag" {
      only       = ["docker.base"]
      repository = "myregistry.example.com/app-base"
      tags       = [var.image_version, "latest"]
    }
    post-processor "docker-push" {
      only           = ["docker.base"]
      login          = true
      login_server   = "myregistry.example.com"
      login_username = var.registry_username
      login_password = var.registry_password
    }
  }
}

The manifest post-processor outputs a JSON file with the built image IDs, timestamps, and builder details. This file is consumed by downstream processes (Terraform, deployment pipelines) to reference the correct image.
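
An abbreviated manifest looks roughly like this (IDs and timestamps illustrative):

{
  "builds": [
    {
      "name": "base",
      "builder_type": "amazon-ebs",
      "build_time": 1717000000,
      "artifact_id": "us-east-1:ami-0abc1234def567890",
      "packer_run_uuid": "5d9b8f3a-..."
    }
  ],
  "last_run_uuid": "5d9b8f3a-..."
}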

Step 10: Verification#

After the build completes, verify the manifest output contains the expected image IDs:

jq '.builds[] | {name: .name, artifact_id: .artifact_id}' build-manifest.json

Phase 4 – Image Testing#

Step 11: Launch a Test Instance#

Before promoting an image to production, verify it works by launching an instance and running tests against it.

# test/main.tf
variable "ami_id" {
  type = string
}

resource "aws_instance" "test" {
  ami           = var.ami_id
  instance_type = "t3.small"
  key_name      = "test-key"

  tags = {
    Name = "packer-image-test"
  }
}

output "test_ip" {
  value = aws_instance.test.public_ip
}

# Extract AMI from manifest and launch test instance
AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
cd test/
terraform apply -var "ami_id=$AMI_ID" -auto-approve
TEST_IP=$(terraform output -raw test_ip)
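
Rather than sleeping a fixed interval before testing, a small retry loop waits only as long as needed (key path and username follow the earlier steps):

# Poll until SSH answers, up to ~5 minutes
for i in $(seq 1 60); do
  ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
      -i test-key.pem ubuntu@"$TEST_IP" true 2>/dev/null && break
  sleep 5
done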

Step 12: Run Verification Tests#

Use InSpec, Serverspec, or Goss to verify the image configuration:

# test/image_spec.rb (InSpec)
describe package('nginx') do
  it { should be_installed }
end

describe service('nginx') do
  it { should be_enabled }
  it { should be_running }
end

describe port(80) do
  it { should be_listening }
end

describe file('/etc/ssh/sshd_config') do
  its('content') { should match(/PermitRootLogin no/) }
  its('content') { should match(/PasswordAuthentication no/) }
end

describe command('openssl version') do
  its('stdout') { should match(/OpenSSL 3/) }
end

describe user('deploy') do
  it { should exist }
  its('groups') { should include 'sudo' }
end

inspec exec test/image_spec.rb -t ssh://ubuntu@$TEST_IP -i test-key.pem

Step 13: Clean Up Test Infrastructure#

cd test/
terraform destroy -var "ami_id=$AMI_ID" -auto-approve

Phase 5 – CI/CD Integration#

Step 14: Pipeline Definition#

# .github/workflows/packer-build.yml
name: Build Machine Image
on:
  push:
    branches: [main]
    paths: ['packer/**']
  workflow_dispatch:
    inputs:
      image_version:
        description: 'Image version'
        required: true

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Packer
        uses: hashicorp/setup-packer@main
      - name: Initialize Packer
        run: packer init packer/
      - name: Validate template
        run: packer validate -var "image_version=0.0.0" packer/
      - name: Check formatting
        run: packer fmt -check packer/

  build:
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - uses: actions/checkout@v4
      - name: Setup Packer
        uses: hashicorp/setup-packer@main
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Build image
        run: |
          VERSION=${{ github.event.inputs.image_version || github.sha }}
          packer build -var "image_version=$VERSION" -color=false packer/
      - name: Upload manifest
        uses: actions/upload-artifact@v4
        with:
          name: build-manifest
          path: build-manifest.json  # manifest is written to the packer working directory (repo root)

  test:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: actions/checkout@v4
      - name: Download manifest
        uses: actions/download-artifact@v4
        with:
          name: build-manifest
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Launch test instance and run tests
        run: |
          AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
          cd packer/test/
          terraform init
          terraform apply -var "ami_id=$AMI_ID" -auto-approve
          TEST_IP=$(terraform output -raw test_ip)
          sleep 60  # Crude boot wait; an SSH retry loop (see Phase 4) is more reliable
          inspec exec image_spec.rb -t ssh://ubuntu@$TEST_IP -i /tmp/test-key.pem
      - name: Cleanup test infrastructure
        if: always()
        run: |
          AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
          cd packer/test/
          terraform destroy -var "ami_id=$AMI_ID" -auto-approve

Step 15: Verification#

Trigger the pipeline and verify that each stage completes: validation catches template errors, the build produces an image and manifest, tests launch an instance and pass, and cleanup destroys the test infrastructure regardless of test outcome.

Phase 6 – Image Lifecycle Management#

Step 16: Image Retention Policy#

Images accumulate over time and incur storage costs. Define a retention policy:

#!/bin/bash
# scripts/cleanup-old-amis.sh
# Keep the KEEP_COUNT most recent images, delete the rest
set -euo pipefail

IMAGE_NAME="app-base"
KEEP_COUNT=5

AMI_IDS=$(aws ec2 describe-images \
  --owners self \
  --filters "Name=tag:Name,Values=${IMAGE_NAME}-*" "Name=tag:ManagedBy,Values=packer" \
  --query "sort_by(Images, &CreationDate)[:-${KEEP_COUNT}].ImageId" \
  --output text)

for AMI_ID in $AMI_IDS; do
  echo "Deregistering $AMI_ID"
  SNAP_IDS=$(aws ec2 describe-images --image-ids "$AMI_ID" \
    --query 'Images[0].BlockDeviceMappings[*].Ebs.SnapshotId' --output text)
  aws ec2 deregister-image --image-id "$AMI_ID"
  for SNAP_ID in $SNAP_IDS; do
    echo "Deleting snapshot $SNAP_ID"
    aws ec2 delete-snapshot --snapshot-id "$SNAP_ID"
  done
done

Step 17: Image Promotion#

Use a promotion model where images progress through stages:

  1. Build: Image is created and tagged status=testing.
  2. Test: Automated tests pass, image is tagged status=staging.
  3. Staging: Deployed to staging environment, soak for 24-48 hours.
  4. Production: Promoted to status=production; Terraform references this tag to find the current production image (see the data source and tag-update sketch below).

# In Terraform, reference the latest production image
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"]

  filter {
    name   = "tag:Name"
    values = ["app-base-*"]
  }

  filter {
    name   = "tag:status"
    values = ["production"]
  }
}
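
Promotion itself is just a tag update on the existing image; a minimal sketch using the AWS CLI (AMI ID illustrative):

# Promote a tested image by overwriting its status tag in place
aws ec2 create-tags \
  --resources ami-0abc1234def567890 \
  --tags Key=status,Value=production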

Step 18: Rebuild Schedule#

Even if your application has not changed, rebuild images regularly (weekly or bi-weekly) to incorporate OS security patches. A stale image that has not been rebuilt in 90 days likely has unpatched vulnerabilities.

Schedule automated rebuilds in CI:

on:
  schedule:
    - cron: '0 4 * * 1'   # Every Monday at 4 AM UTC

Common Gotchas#

Not waiting for cloud-init. Cloud providers run cloud-init on instance launch. If Packer starts provisioning before cloud-init finishes, package installations fail because apt/yum is locked. Always wait for /var/lib/cloud/instance/boot-finished as the first provisioner step.
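
Where the cloud-init CLI is available (it is on Ubuntu 22.04), an equivalent wait is:

provisioner "shell" {
  # Blocks until cloud-init finishes; exits non-zero if it failed
  inline = ["cloud-init status --wait"]
}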

Forgetting to clean up. Temporary files, package caches, shell history, and SSH keys left in the image waste space and can leak information. Always include a cleanup provisioner as the last step before image creation.

Not deleting snapshots when deregistering AMIs. Deregistering an AMI in AWS does not delete the underlying EBS snapshots. They continue to incur storage charges. Always delete associated snapshots when removing old images.

Building images without version tags. Images without version metadata are impossible to track. Always tag images with a version, build timestamp, and source commit hash. The manifest post-processor captures this information automatically.

Testing only the build, not the image. A successful packer build means the provisioners ran without errors. It does not mean the resulting image actually works. Launch an instance from the image and verify that services start, ports are open, and configurations are correct.