Building Machine Images with Packer#
Machine images (AMIs, Azure Managed Images, GCP Images) are the foundation of immutable infrastructure. Instead of provisioning a base OS and configuring it at boot, you build a pre-configured image and launch instances from it. Packer automates this process: it launches a temporary instance, runs provisioners to configure it, creates an image from the result, and destroys the temporary instance.
This operational sequence walks through building, testing, and managing machine images with Packer from template creation through CI/CD integration.
Phase 1 – Template Structure#
Step 1: Initialize the Project#
Create the project directory and a base template using HCL2 (Packer’s current configuration language, replacing the legacy JSON format):
mkdir -p packer/
cd packer/
A Packer template consists of three primary blocks: source (defines the builder), build (defines provisioners and post-processors), and variable (defines inputs).
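A minimal skeleton shows how the blocks fit together. This is a sketch: the required_plugins stanza is standard HCL2 boilerplate, and the names and version constraint here are illustrative.
# skeleton.pkr.hcl -- minimal template structure
packer {
  required_plugins {
    amazon = {
      source  = "github.com/hashicorp/amazon"
      version = ">= 1.0.0"
    }
  }
}

variable "image_version" {
  type = string
}

source "amazon-ebs" "example" {
  # builder settings: region, instance type, base AMI, ...
}

build {
  sources = ["source.amazon-ebs.example"]
  # provisioners and post-processors go here
}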
Step 2: Define Variables#
# variables.pkr.hcl
variable "aws_region" {
type = string
default = "us-east-1"
}
variable "instance_type" {
type = string
default = "t3.medium"
}
variable "image_name" {
type = string
default = "app-base"
}
variable "image_version" {
type = string
}
variable "ssh_username" {
type = string
default = "ubuntu"
}
variable "base_ami_filter" {
type = string
default = "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"
}
Variables without defaults are required at build time. Pass them via command line (-var), variable files (-var-file), or environment variables (PKR_VAR_image_version).
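For example, each of the following supplies the required image_version (the prod.pkrvars.hcl file name is illustrative):
# Command-line flag
packer build -var "image_version=1.0.0" .

# Variable file, e.g. prod.pkrvars.hcl containing: image_version = "1.0.0"
packer build -var-file=prod.pkrvars.hcl .

# Environment variable (PKR_VAR_<name>)
PKR_VAR_image_version=1.0.0 packer build .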
Step 3: Define Sources (Builders)#
Sources define where and how the temporary build instance is created.
AWS AMI:
# aws.pkr.hcl
source "amazon-ebs" "base" {
region = var.aws_region
instance_type = var.instance_type
ami_name = "${var.image_name}-${var.image_version}-{{timestamp}}"
source_ami_filter {
filters = {
name = var.base_ami_filter
root-device-type = "ebs"
virtualization-type = "hvm"
}
most_recent = true
owners = ["099720109477"] # Canonical
}
ssh_username = var.ssh_username
tags = {
Name = "${var.image_name}-${var.image_version}"
Version = var.image_version
BuildTime = "{{timestamp}}"
BaseAMI = "{{ .SourceAMI }}"
ManagedBy = "packer"
}
# Encrypt the resulting AMI
encrypt_boot = true
# Share with other accounts
ami_users = ["111111111111", "222222222222"]
}
Azure:
# azure.pkr.hcl
source "azure-arm" "base" {
subscription_id = var.azure_subscription_id
client_id = var.azure_client_id
client_secret = var.azure_client_secret
tenant_id = var.azure_tenant_id
managed_image_name = "${var.image_name}-${var.image_version}"
managed_image_resource_group_name = "packer-images"
os_type = "Linux"
image_publisher = "Canonical"
image_offer = "0001-com-ubuntu-server-jammy"
image_sku = "22_04-lts"
location = "eastus"
vm_size = "Standard_D2s_v3"
azure_tags = {
version = var.image_version
managedBy = "packer"
}
}
GCP:
# gcp.pkr.hcl
source "googlecompute" "base" {
project_id = var.gcp_project_id
zone = "us-central1-a"
machine_type = "e2-medium"
source_image_family = "ubuntu-2204-lts"
image_name = "${var.image_name}-${var.image_version}-{{timestamp}}"
image_family = var.image_name
image_description = "Base image version ${var.image_version}"
ssh_username = var.ssh_username
image_labels = {
version = replace(var.image_version, ".", "-")
managed_by = "packer"
}
}
Docker (for testing or container image building):
# docker.pkr.hcl
source "docker" "base" {
image = "ubuntu:22.04"
commit = true
changes = [
"ENTRYPOINT [\"/usr/sbin/sshd\", \"-D\"]",
"EXPOSE 22"
]
}
Step 4: Verification#
Validate the template syntax:
packer init . # Download required plugins
packer validate . # Check syntax and configuration
packer fmt . # Format HCL files consistently
Phase 2 – Provisioning#
Step 5: Define the Build Block#
The build block connects sources to provisioners. Provisioners run in order and configure the instance.
# build.pkr.hcl
build {
sources = [
"source.amazon-ebs.base",
"source.azure-arm.base",
"source.googlecompute.base",
]
# Wait for cloud-init to finish
provisioner "shell" {
inline = [
"while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 2; done"
]
}
# System updates
provisioner "shell" {
inline = [
"sudo apt-get update",
"sudo apt-get upgrade -y",
"sudo apt-get install -y curl wget jq unzip htop"
]
}
# Copy configuration files
provisioner "file" {
source = "files/sshd_config"
destination = "/tmp/sshd_config"
}
provisioner "shell" {
inline = [
"sudo mv /tmp/sshd_config /etc/ssh/sshd_config",
"sudo chown root:root /etc/ssh/sshd_config",
"sudo chmod 644 /etc/ssh/sshd_config"
]
}
# Run Ansible for complex configuration
provisioner "ansible" {
playbook_file = "ansible/configure.yml"
extra_arguments = [
"--extra-vars", "image_version=${var.image_version}"
]
}
# Clean up before creating the image
provisioner "shell" {
inline = [
"sudo apt-get clean",
"sudo rm -rf /var/lib/apt/lists/*",
"sudo rm -rf /tmp/*",
"sudo rm -rf /var/tmp/*",
"sudo rm -f /root/.bash_history",
"sudo rm -f /home/${var.ssh_username}/.bash_history",
"sudo truncate -s 0 /var/log/*.log",
"sudo sync"
]
}
}
Step 6: Provisioner Types#
Shell provisioner runs commands directly. Best for simple tasks like package installation and file cleanup. Use inline for short command lists and script or scripts for standalone script files:
provisioner "shell" {
scripts = [
"scripts/01-base-packages.sh",
"scripts/02-security-hardening.sh",
"scripts/03-monitoring-agent.sh"
]
environment_vars = [
"DEBIAN_FRONTEND=noninteractive"
]
}
Ansible provisioner runs an Ansible playbook against the build instance. Best for complex configuration that benefits from Ansible’s idempotency, templates, and role ecosystem:
provisioner "ansible" {
playbook_file = "ansible/site.yml"
galaxy_file = "ansible/requirements.yml"
roles_path = "ansible/roles"
extra_arguments = [
"--extra-vars", "env=production image_version=${var.image_version}",
"--tags", "base,security,monitoring"
]
}
File provisioner copies files or directories to the build instance. It uploads only – for downloads, use a shell provisioner with curl or wget.
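A short sketch of both directions (paths and URL are illustrative). Note that a trailing slash on source copies the directory's contents rather than the directory itself:
# Upload: trailing slash copies the contents of files/app-config/
provisioner "file" {
  source      = "files/app-config/"
  destination = "/tmp/app-config"
}

# Download: the file provisioner cannot pull, so fetch from inside the instance
provisioner "shell" {
  inline = [
    "curl -fsSL -o /tmp/agent.tar.gz https://example.com/agent.tar.gz"
  ]
}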
Step 7: Cloud-Specific Post-Build Steps#
For Azure images, the instance must be generalized before capture:
build {
sources = ["source.azure-arm.base"]
# ... provisioners ...
provisioner "shell" {
execute_command = "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'"
inline = [
"/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
]
skip_clean = true
}
}
Step 8: Build the Image#
# Build for all sources
packer build -var "image_version=1.0.0" .
# Build for a specific source only
packer build -var "image_version=1.0.0" -only="amazon-ebs.base" .
# Debug mode (pauses on failure for SSH access)
packer build -var "image_version=1.0.0" -debug .
The -debug flag is invaluable for troubleshooting. When a provisioner fails, Packer pauses and prints SSH connection details so you can SSH into the instance and investigate.
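A useful companion to -debug is the -on-error flag, which controls what happens to the temporary instance when a build step fails:
# Ask interactively whether to clean up, abort, or retry after a failure
packer build -var "image_version=1.0.0" -on-error=ask .

# Abort without cleanup, leaving the instance running for manual inspection
packer build -var "image_version=1.0.0" -on-error=abort .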
Phase 3 – Post-Processors#
Step 9: Add Post-Processors#
Post-processors run after the image is created. Common uses include generating manifests, compressing artifacts, and pushing to registries.
build {
sources = ["source.amazon-ebs.base"]
# ... provisioners ...
post-processor "manifest" {
output = "build-manifest.json"
strip_path = true
}
# For Docker builds: tag and push
post-processors {
post-processor "docker-tag" {
repository = "myregistry.example.com/app-base"
tags = [var.image_version, "latest"]
}
post-processor "docker-push" {
login = true
login_server = "myregistry.example.com"
login_username = var.registry_username
login_password = var.registry_password
}
}
}
The manifest post-processor outputs a JSON file with the built image IDs, timestamps, and builder details. This file is consumed by downstream processes (Terraform, deployment pipelines) to reference the correct image.
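An abbreviated example of the manifest for an amazon-ebs build (values are illustrative). Note that artifact_id takes the form region:ami-id, which is why the test steps below split on the colon:
{
  "builds": [
    {
      "name": "base",
      "builder_type": "amazon-ebs",
      "build_time": 1700000000,
      "artifact_id": "us-east-1:ami-0abc1234def567890",
      "packer_run_uuid": "0fbd3bbe-0000-0000-0000-000000000000"
    }
  ],
  "last_run_uuid": "0fbd3bbe-0000-0000-0000-000000000000"
}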
Step 10: Verification#
After the build completes, verify the manifest output contains the expected image IDs:
cat build-manifest.json | jq '.builds[] | {name: .name, artifact_id: .artifact_id}'
Phase 4 – Image Testing#
Step 11: Launch a Test Instance#
Before promoting an image to production, verify it works by launching an instance and running tests against it.
# test/main.tf
variable "ami_id" {
type = string
}
resource "aws_instance" "test" {
ami = var.ami_id
instance_type = "t3.small"
key_name = "test-key"
tags = {
Name = "packer-image-test"
}
}
output "test_ip" {
value = aws_instance.test.public_ip
}
# Extract AMI from manifest and launch test instance
AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
cd test/
terraform apply -var "ami_id=$AMI_ID" -auto-approve
TEST_IP=$(terraform output -raw test_ip)
Step 12: Run Verification Tests#
Use InSpec, Serverspec, or Goss to verify the image configuration:
# test/image_spec.rb (InSpec)
describe package('nginx') do
it { should be_installed }
end
describe service('nginx') do
it { should be_enabled }
it { should be_running }
end
describe port(80) do
it { should be_listening }
end
describe file('/etc/ssh/sshd_config') do
its('content') { should match(/PermitRootLogin no/) }
its('content') { should match(/PasswordAuthentication no/) }
end
describe command('openssl version') do
its('stdout') { should match(/OpenSSL 3/) }
end
describe user('deploy') do
it { should exist }
its('groups') { should include 'sudo' }
end
inspec exec test/image_spec.rb -t ssh://ubuntu@$TEST_IP -i test-key.pem
Step 13: Clean Up Test Infrastructure#
cd test/
terraform destroy -var "ami_id=$AMI_ID" -auto-approve
Phase 5 – CI/CD Integration#
Step 14: Pipeline Definition#
# .github/workflows/packer-build.yml
name: Build Machine Image
on:
push:
branches: [main]
paths: ['packer/**']
workflow_dispatch:
inputs:
image_version:
description: 'Image version'
required: true
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Packer
uses: hashicorp/setup-packer@main
- name: Initialize Packer
run: packer init packer/
- name: Validate template
run: packer validate -var "image_version=0.0.0" packer/
- name: Check formatting
run: packer fmt -check packer/
build:
runs-on: ubuntu-latest
needs: validate
steps:
- uses: actions/checkout@v4
- name: Setup Packer
uses: hashicorp/setup-packer@main
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: us-east-1
- name: Build image
run: |
VERSION=${{ github.event.inputs.image_version || github.sha }}
packer build -var "image_version=$VERSION" -color=false packer/
- name: Upload manifest
uses: actions/upload-artifact@v4
with:
name: build-manifest
path: packer/build-manifest.json
test:
runs-on: ubuntu-latest
needs: build
steps:
- uses: actions/checkout@v4
- name: Download manifest
uses: actions/download-artifact@v4
with:
name: build-manifest
- name: Configure AWS credentials
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
aws-region: us-east-1
- name: Launch test instance and run tests
run: |
AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
cd packer/test/
terraform init
terraform apply -var "ami_id=$AMI_ID" -auto-approve
TEST_IP=$(terraform output -raw test_ip)
sleep 60 # Wait for instance to boot
inspec exec image_spec.rb -t ssh://ubuntu@$TEST_IP -i /tmp/test-key.pem
- name: Cleanup test infrastructure
if: always()
run: |
AMI_ID=$(jq -r '.builds[-1].artifact_id' build-manifest.json | cut -d: -f2)
cd packer/test/
terraform destroy -var "ami_id=$AMI_ID" -auto-approve
Step 15: Verification#
Trigger the pipeline and verify that each stage completes: validation catches template errors, the build produces an image and manifest, tests launch an instance and pass, and cleanup destroys the test infrastructure regardless of test outcome.
Phase 6 – Image Lifecycle Management#
Step 16: Image Retention Policy#
Images accumulate over time and incur storage costs. Define a retention policy:
#!/bin/bash
# scripts/cleanup-old-amis.sh
# Keep the 5 most recent images per image family, delete the rest
IMAGE_NAME="app-base"
KEEP_COUNT=5
AMI_IDS=$(aws ec2 describe-images \
--owners self \
--filters "Name=tag:Name,Values=${IMAGE_NAME}-*" "Name=tag:ManagedBy,Values=packer" \
--query "sort_by(Images, &CreationDate)[:-${KEEP_COUNT}].ImageId" \
--output text)
for AMI_ID in $AMI_IDS; do
echo "Deregistering $AMI_ID"
SNAP_IDS=$(aws ec2 describe-images --image-ids "$AMI_ID" \
--query 'Images[0].BlockDeviceMappings[*].Ebs.SnapshotId' --output text)
aws ec2 deregister-image --image-id "$AMI_ID"
for SNAP_ID in $SNAP_IDS; do
echo "Deleting snapshot $SNAP_ID"
aws ec2 delete-snapshot --snapshot-id "$SNAP_ID"
done
done
Step 17: Image Promotion#
Use a promotion model where images progress through stages:
- Build: Image is created and tagged status=testing.
- Test: Automated tests pass, image is tagged status=staging.
- Staging: Deployed to staging environment, soak for 24-48 hours.
- Production: Promoted to status=production; Terraform references this tag to find the current production image (see the promotion sketch and data source below).
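A minimal sketch of a promotion step under this tagging convention; the script name and arguments are illustrative. aws ec2 create-tags overwrites the existing value of the status tag:
#!/bin/bash
# scripts/promote-ami.sh -- promote an image by rewriting its status tag
# Usage: promote-ami.sh ami-0abc1234def567890 production
AMI_ID="$1"
NEW_STATUS="$2"

# create-tags replaces the current value of an existing tag key
aws ec2 create-tags \
  --resources "$AMI_ID" \
  --tags "Key=status,Value=${NEW_STATUS}"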
# In Terraform, reference the latest production image
data "aws_ami" "app" {
most_recent = true
owners = ["self"]
filter {
name = "tag:Name"
values = ["app-base-*"]
}
filter {
name = "tag:status"
values = ["production"]
}
}
Step 18: Rebuild Schedule#
Even if your application has not changed, rebuild images regularly (weekly or bi-weekly) to incorporate OS security patches. A stale image that has not been rebuilt in 90 days likely has unpatched vulnerabilities.
Schedule automated rebuilds in CI:
on:
schedule:
- cron: '0 4 * * 1' # Every Monday at 4 AM UTC
Common Gotchas#
Not waiting for cloud-init. Cloud providers run cloud-init on instance launch. If Packer starts provisioning before cloud-init finishes, package installations fail because apt/yum is locked. Always wait for /var/lib/cloud/instance/boot-finished as the first provisioner step.
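On images with a reasonably recent cloud-init, cloud-init status --wait expresses the same wait more directly than polling for the marker file. A sketch; the || true tolerates non-zero exits from degraded-but-finished boots:
provisioner "shell" {
  inline = [
    # Blocks until cloud-init reports it has finished
    "cloud-init status --wait || true"
  ]
}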
Forgetting to clean up. Temporary files, package caches, shell history, and SSH keys left in the image waste space and can leak information. Always include a cleanup provisioner as the last step before image creation.
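One commonly missed leftover is the build user's authorized_keys file. A sketch of an extra cleanup provisioner, assuming the ubuntu build user from the variables above; on the major clouds, cloud-init re-injects the instance's own key pair at the next launch:
provisioner "shell" {
  inline = [
    # Remove keys injected during the build; cloud-init restores the
    # launched instance's key pair from metadata on first boot
    "sudo rm -f /home/${var.ssh_username}/.ssh/authorized_keys",
    "sudo rm -f /root/.ssh/authorized_keys"
  ]
}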
Not deleting snapshots when deregistering AMIs. Deregistering an AMI in AWS does not delete the underlying EBS snapshots. They continue to incur storage charges. Always delete associated snapshots when removing old images.
Building images without version tags. Images without version metadata are impossible to track. Always tag images with a version, build timestamp, and source commit hash. The manifest post-processor captures this information automatically.
Testing only the build, not the image. A successful packer build means the provisioners ran without errors. It does not mean the resulting image actually works. Launch an instance from the image and verify that services start, ports are open, and configurations are correct.