Debugging GitHub Actions

When a GitHub Actions workflow fails or does not behave as expected, the problem usually falls into one of a few predictable categories. This guide covers each one, with diagnostic steps and fixes.

Workflow Not Triggering

The most common GitHub Actions “bug” is a workflow that never runs.

Check the event and branch filter. A push trigger with branches: [main] will not fire for pushes to feature/xyz. For pull_request, the branches filter is matched against the PR’s base (target) branch, not its head branch:

# This triggers when a PR targets main, not when you push to main
on:
  pull_request:
    branches: [main]

Check path filters. If you have paths: ['src/**'] and only changed a README, the workflow is skipped by design. No run is created for a workflow skipped this way, so nothing appears in the Actions tab; if the workflow backs a required check, that check stays pending on the PR.

Fork restrictions. Workflows triggered by pull_request from a fork run with a read-only GITHUB_TOKEN and cannot access repository secrets. Additionally, by default a first-time contributor’s workflows wait for maintainer approval before they run at all. Check the pull request and the Actions tab for “Approve and run” buttons.

Workflow file location. For schedule and workflow_dispatch triggers, the workflow file must exist on the default branch. If you add a new workflow on a feature branch, workflow_dispatch will not appear in the UI until that branch is merged into the default branch.

Use the GitHub CLI to check recent runs:

gh run list --workflow=ci.yml --limit=10
gh run view 12345678 --log

Step Failures

Exit codes. Any non-zero exit code fails a step. A common trap is a pipeline: without pipefail, only the last command’s exit code counts:

# Without pipefail this succeeds even if curl fails, because tee is the last command
- run: curl --fail https://api.example.com/health | tee health.json

Fix with set -eo pipefail. GitHub Actions applies -eo pipefail only when a step sets shell: bash explicitly; the implicit default on Linux and macOS runners is bash -e without pipefail, so being explicit is safer:

- run: |
    set -eo pipefail
    curl --fail https://api.example.com/health | tee health.json
  shell: bash
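To see the difference pipefail makes, the following sketch can be run in any local bash shell. It uses false as a stand-in for a failing curl and cat as a downstream command that always succeeds; pipeline_status is a made-up helper name for this illustration only.

```shell
#!/usr/bin/env bash
# `false` stands in for a failing curl; `cat` is a downstream command that
# succeeds no matter what it reads.

# pipeline_status MODE prints the exit status of `false | cat` with
# pipefail switched "on" or "off". (Hypothetical helper for this demo.)
pipeline_status() {
  local mode="$1"
  (
    if [ "$mode" = "on" ]; then set -o pipefail; else set +o pipefail; fi
    status=0
    false | cat || status=$?
    echo "$status"
  )
}

echo "without pipefail: $(pipeline_status off)"   # → without pipefail: 0
echo "with pipefail:    $(pipeline_status on)"    # → with pipefail:    1
```

The subshell keeps the pipefail setting from leaking into the rest of the script, which is also a reasonable habit when experimenting with shell options inside a longer run block.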

continue-on-error lets a step fail without failing the job:

- name: Optional lint check
  run: golangci-lint run
  continue-on-error: true

Use this for non-blocking checks. The step shows as failed in the UI, but the job continues.

Timeout control. A step can hang (waiting for input, a stalled network call) until the job-level timeout finally cancels it. Set an explicit timeout on the step:

- name: Integration tests
  run: make integration-test
  timeout-minutes: 15

The default job timeout is 360 minutes (6 hours). timeout-minutes is also accepted at the job level, so set a lower limit for jobs that should complete quickly.
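When only one command inside a longer script is prone to hanging, a per-command limit gives a clearer failure than waiting for the step or job timeout. A minimal sketch, assuming GNU coreutils timeout is on the PATH (it is not on stock macOS); run_with_limit is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Wrap a single command in its own time limit using GNU coreutils `timeout`.
# `timeout` exits with status 124 when it had to kill the command.

# run_with_limit DURATION COMMAND [ARGS...]
run_with_limit() {
  local limit="$1"
  shift
  local status=0
  timeout "$limit" "$@" || status=$?
  if [ "$status" -eq 124 ]; then
    # ::error:: is the workflow command that surfaces an error annotation.
    echo "::error::command timed out after ${limit}: $*"
  fi
  return "$status"
}

# Example (not executed here): give a health probe 10 seconds instead of
# letting it consume the whole step budget.
# run_with_limit 10s curl --fail --silent https://api.example.com/health
```

The helper passes through the wrapped command's own exit status, so a normal failure and a timeout remain distinguishable in the log.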

Secret Not Available

Symptom: ${{ secrets.MY_SECRET }} resolves to an empty string.

Check the scope. Secrets can be scoped to the organization, the repository, or an environment. Environment secrets require the job to declare environment: <name>:

jobs:
  deploy:
    environment: production    # Required to access production secrets
    runs-on: ubuntu-latest
    steps:
      - run: echo "${{ secrets.PROD_API_KEY }}"

Without the environment key, only repository-level and organization-level secrets are available.
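Because a missing secret resolves to an empty string rather than an error, a guard at the top of the script makes the failure obvious. The sketch below assumes the secret has been mapped into an environment variable through the step's env: block; require_env is a hypothetical helper, not part of GitHub Actions.

```shell
#!/usr/bin/env bash
# Fail fast when an expected secret arrived as an empty string.
# Assumes the workflow maps the secret into the environment, e.g.
#   env:
#     PROD_API_KEY: ${{ secrets.PROD_API_KEY }}

# require_env NAME... returns non-zero and names every variable that is
# unset or empty. It never prints a variable's value.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "::error::required variable ${name} is empty or unset"
      missing=1
    fi
  done
  return "$missing"
}

# Typical use at the top of a run: script (not executed here):
# require_env PROD_API_KEY || exit 1
```

Reporting only the variable name keeps the check safe to leave in place permanently, since nothing secret can reach the log.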

GITHUB_TOKEN permissions. The default GITHUB_TOKEN has limited permissions. If a step fails with a 403 from the GitHub API, you likely need to expand permissions:

permissions:
  contents: read
  packages: write
  pull-requests: write
  issues: write

Set permissions at the workflow level or job level. Once a permissions block is present, any scope you do not list is set to none, so start restrictive and add as needed.

Fork PRs cannot access secrets. This is a security feature. If your CI needs secrets for tests (database credentials, API keys), those tests will fail on fork PRs. Options: mock external services, run secret-dependent tests only on the base repo’s branches, or use pull_request_target (with extreme caution, as it runs in the context of the base branch with access to secrets).

Cache Misses

Symptom: build is slow because the cache never hits.

Check the cache key. The cache action requires an exact key match for a cache hit. restore-keys provide fallback prefix matching:

- uses: actions/cache@v4
  with:
    path: node_modules
    key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

If package-lock.json changes, the exact key misses but the restore-keys prefix matches a previous cache. This restores a stale cache, and npm install only downloads the diff.
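The lookup order described above can be modelled in a few lines of shell. This is a simplified sketch, not the cache action's implementation: lookup_cache is a hypothetical function, the keys are made up, and the list of existing keys is assumed to be ordered newest first (the real action restores the most recently created cache among partial matches).

```shell
#!/usr/bin/env bash
# Simplified model of cache lookup: an exact key match is a hit; otherwise
# the first existing key that starts with the restore prefix is restored.

# lookup_cache KEY RESTORE_PREFIX EXISTING_KEY...
lookup_cache() {
  local key="$1" prefix="$2" candidate
  shift 2
  for candidate in "$@"; do
    if [ "$candidate" = "$key" ]; then
      echo "exact hit: $candidate"
      return 0
    fi
  done
  for candidate in "$@"; do
    case "$candidate" in
      "$prefix"*) echo "partial restore: $candidate"; return 0 ;;
    esac
  done
  echo "miss"
  return 1
}

# Made-up keys; the suffixes stand in for hashFiles() output.
lookup_cache "Linux-node-abc123" "Linux-node-" "Linux-node-abc123" "Linux-node-9f8e7d" || true
lookup_cache "Linux-node-000000" "Linux-node-" "Linux-node-abc123" "Linux-node-9f8e7d" || true
lookup_cache "Linux-node-000000" "Linux-node-" "macOS-node-abc123" || true
```

The third call shows why runner.os belongs in the key: a cache saved on a different operating system never matches the prefix, so it cannot be restored by accident.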

Cache eviction. GitHub evicts caches not accessed in 7 days. Infrequently run workflows (nightly builds) may always miss. Caches are also scoped to branches: a cache created on the default branch (usually main) is accessible to feature branches, but not vice versa.

Cache size limit. Individual caches are limited to 10 GB. Total cache storage per repository is 10 GB by default. Once the total is exceeded, GitHub evicts caches in order of last access, least recently used first, so caching too much pushes out caches you still need.

Inspect caches via the API:

gh api repos/{owner}/{repo}/actions/caches --jq '.actions_caches[] | "\(.key) \(.size_in_bytes) \(.last_accessed_at)"'

Runner Out of Disk

GitHub-hosted runners have about 14 GB of free disk space. Large builds (Docker images, monorepos with many dependencies) can exhaust this.

Diagnose:

- name: Check disk space
  run: df -h
  if: always()
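Reading df -h by eye works for a one-off check; a threshold check can flag the problem before a large build step starts. A minimal sketch: check_free_gb is a hypothetical helper, and the 5 GB minimum is an arbitrary example value rather than a GitHub recommendation.

```shell
#!/usr/bin/env bash
# Report free space on a filesystem and signal when it is below a minimum.

# check_free_gb PATH MIN_GB
# Returns 0 when at least MIN_GB gigabytes are free, non-zero otherwise.
check_free_gb() {
  local path="$1" min_gb="$2" avail_kb avail_gb
  # POSIX output format (-P) in 1K blocks (-k): line 2, column 4 is the
  # available space.
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  avail_gb=$(( avail_kb / 1024 / 1024 ))
  echo "free on ${path}: ${avail_gb} GB (minimum ${min_gb} GB)"
  [ "$avail_gb" -ge "$min_gb" ]
}

check_free_gb / 5 || echo "::warning::low disk space, consider cleaning up"
```

Integer division rounds down, so the check errs on the side of warning slightly early.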

Free space by removing preinstalled software:

- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet
    sudo rm -rf /opt/ghc
    sudo rm -rf /usr/local/share/boost
    sudo docker system prune -af
    df -h

Prune artifacts and intermediate files between steps:

- name: Build Docker image
  run: docker build -t myapp .

- name: Clean build context
  run: rm -rf node_modules dist .next

For consistently large builds, use self-hosted runners with larger disks or split the build across multiple jobs.

Slow Builds

Enable caching for everything. Dependencies, build artifacts, Docker layers:

- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }}

Parallelize with matrix strategies and job splitting. Run unit tests, integration tests, and linting as separate jobs instead of sequential steps:

jobs:
  unit-test:
    runs-on: ubuntu-latest
    steps: [...]
  integration-test:
    runs-on: ubuntu-latest
    steps: [...]
  lint:
    runs-on: ubuntu-latest
    steps: [...]

Skip CI for documentation changes. Add [skip ci] or [ci skip] to the commit message, or use path filtering:

on:
  push:
    paths-ignore:
      - 'docs/**'
      - '**.md'    # '*.md' would only match Markdown files at the repository root
      - 'LICENSE'

Debugging with ACTIONS_STEP_DEBUG

Enable verbose logging for all steps by setting a repository secret or variable named ACTIONS_STEP_DEBUG to true. This outputs detailed action internals, including input resolution, path setup, and cache operations.

You can also re-run a specific failed job with debug logging enabled from the GitHub UI: click “Re-run jobs” and check “Enable debug logging.” This avoids permanently enabling verbose logs.
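Your own scripts can take part in debug logging too. Lines that start with ::debug:: are shown only when step debug logging is on, and the runner sets RUNNER_DEBUG=1 in that mode. The sketch below uses two hypothetical helpers, debug_log and debug_dump; CONFIG_PATH is a made-up variable used purely for illustration.

```shell
#!/usr/bin/env bash
# Emit extra diagnostics from a script only when debug logging is enabled.

# debug_log MESSAGE... writes a ::debug:: workflow command. The runner hides
# these lines unless step debug logging is on.
debug_log() {
  echo "::debug::$*"
}

# debug_dump prints heavier diagnostics, but only when the runner reports
# that debug logging is enabled (RUNNER_DEBUG=1).
debug_dump() {
  if [ "${RUNNER_DEBUG:-0}" = "1" ]; then
    echo "working directory: $(pwd)"
    echo "bash version: ${BASH_VERSION}"
  fi
}

debug_log "resolved config path: ${CONFIG_PATH:-unset}"
debug_dump
```

Run locally, the ::debug:: line is printed as plain text, which makes the helpers easy to try outside a workflow.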

Local Testing with act

The act tool runs GitHub Actions workflows locally using Docker:

# Install
brew install act

# Run the default push event
act

# Run a specific job
act -j test

# Run with a specific event
act pull_request

# Pass secrets
act -s MY_SECRET=value

# Use a specific runner image
act -P ubuntu-latest=catthehacker/ubuntu:act-latest

act does not perfectly replicate GitHub-hosted runners. Service containers, OIDC, and some other GitHub-specific features may be missing or behave differently, depending on the act version. But for validating workflow syntax, step ordering, and basic logic, it catches mistakes before you push and wait for remote execution.

Reading Workflow Run Logs

From the GitHub UI: click into a workflow run, expand a failed job, click on the failed step. The log shows stdout and stderr from the step.

From the CLI:

# List recent runs
gh run list --workflow=ci.yml

# View logs for a specific run
gh run view 12345678 --log

# View logs for failed steps only
gh run view 12345678 --log-failed

# Download the full log archive as a zip (gh run download fetches artifacts, not logs)
gh api repos/{owner}/{repo}/actions/runs/12345678/logs > logs.zip

The --log-failed flag is the most useful – it shows only the logs from failed steps, cutting through the noise of a long workflow run.

When reading logs, look for the exit code first (Process completed with exit code 1), then scroll up to find the actual error. GitHub Actions logs are verbose with setup output that obscures the real failure.
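The same search can be scripted against a saved log. A minimal sketch, assuming the log was written to a file first (for example with gh run view 12345678 --log-failed > run.log); show_failure_context is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the failure marker from a saved log together with the lines that
# lead up to it, which is usually where the real error sits.

# show_failure_context FILE [LINES_BEFORE]
# LINES_BEFORE defaults to 20. Output lines are prefixed with their line
# number in the log.
show_failure_context() {
  local file="$1" before="${2:-20}"
  grep -n -B "$before" 'Process completed with exit code [1-9]' "$file"
}

# Typical use (not executed here):
# show_failure_context run.log 40
```

grep returns non-zero when the marker is absent, which is itself a useful hint: the job may have been cancelled or timed out rather than failed on an exit code.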