# Self-Hosted CI Runners at Scale
GitHub-hosted and GitLab SaaS runners work until they do not. You hit their limits when you need:

- Private network access to deploy to internal infrastructure
- Specific hardware such as GPUs or ARM64 machines
- Compliance with rules that prohibit running code on shared infrastructure
- Cost control when you are burning thousands of dollars per month on hosted runner minutes
Self-hosted runners solve these problems but introduce operational complexity: you now own runner provisioning, scaling, security, image updates, and cost management.
## GitHub Actions: actions-runner-controller (ARC)
The actions-runner-controller (ARC) is the official way to run GitHub Actions runners on Kubernetes. It deploys runners as pods that register with GitHub, pick up jobs, and terminate when finished.
### Installation
Install the controller, then a runner scale set, using Helm:
```bash
# Install the controller
helm install arc \
  --namespace arc-systems \
  --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

# Install a runner scale set that registers with your organization
helm install arc-runner-set \
  --namespace arc-runners \
  --create-namespace \
  --set githubConfigUrl="https://github.com/myorg" \
  --set githubConfigSecret.github_token="ghp_xxxxxxxxxxxx" \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set
```

For production, use a GitHub App instead of a PAT. Create a GitHub App with the required permissions (Organization: Self-hosted runners: Read & Write), install it on your organization, and reference the app credentials:
```yaml
# values.yaml for arc-runner-set
githubConfigUrl: "https://github.com/myorg"
githubConfigSecret:
  github_app_id: "12345"
  github_app_installation_id: "67890"
  github_app_private_key: |
    -----BEGIN RSA PRIVATE KEY-----
    ...
    -----END RSA PRIVATE KEY-----
maxRunners: 20
minRunners: 1
```

### Autoscaling
ARC scales runners based on job events from GitHub. When a workflow requests a runner with matching labels, ARC creates a pod. When the job finishes, the pod terminates. This is pull-based scaling driven by actual demand.
Configure scaling bounds in your runner scale set:
```yaml
# values.yaml
maxRunners: 50
minRunners: 2
containerMode:
  type: "dind" # Docker-in-Docker for container builds
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        resources:
          requests:
            cpu: "2"
            memory: "4Gi"
          limits:
            cpu: "4"
            memory: "8Gi"
```

Setting `minRunners: 2` keeps warm runners available so jobs do not wait for pod startup (typically 15-30 seconds). Setting `maxRunners: 50` prevents runaway scaling from consuming your entire cluster during a burst of workflow triggers.
### Custom Runner Images
The default runner image is minimal. Production runners need build tools, language runtimes, and cached dependencies:
```dockerfile
FROM ghcr.io/actions/actions-runner:latest

# Build essentials
RUN sudo apt-get update && sudo apt-get install -y \
    build-essential \
    curl \
    git \
    jq \
    unzip \
    && sudo rm -rf /var/lib/apt/lists/*

# Go
RUN curl -fsSL https://go.dev/dl/go1.23.0.linux-amd64.tar.gz | sudo tar -C /usr/local -xzf -
ENV PATH="/usr/local/go/bin:${PATH}"

# Node.js
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | sudo bash - \
    && sudo apt-get install -y nodejs

# kubectl (sudo is required: the base image runs as the non-root 'runner' user)
RUN sudo curl -fsSL https://dl.k8s.io/release/v1.30.0/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl \
    && sudo chmod +x /usr/local/bin/kubectl
```

Build and push this image in a separate CI pipeline. Pin the ARC runner scale set to your custom image. Update the image weekly to pick up security patches, and trigger a rebuild whenever your dependency requirements change.
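A minimal rebuild pipeline, sketched as a GitHub Actions workflow (the repository layout, image name, and schedule here are assumptions, not part of ARC itself):

```yaml
# .github/workflows/runner-image.yml -- illustrative weekly rebuild
name: build-runner-image
on:
  schedule:
    - cron: "0 4 * * 1" # weekly, to pick up security patches
  push:
    paths:
      - "runner-image/Dockerfile" # rebuild when the image definition changes

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: runner-image
          push: true
          tags: ghcr.io/myorg/actions-runner:latest
```

Add a vulnerability scan step (Trivy, Grype, or similar) before the push if your compliance posture requires it.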
## GitLab Runner on Kubernetes
GitLab Runner uses the Kubernetes executor to create a pod for each CI job. Install it with Helm:
```bash
helm repo add gitlab https://charts.gitlab.io
helm install gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab-runners \
  --create-namespace \
  --set gitlabUrl=https://gitlab.com \
  --set runnerToken="glrt-xxxxxxxxxxxx" \
  --set runners.executor=kubernetes
```

### Kubernetes Executor Configuration
The Kubernetes executor creates a pod with two containers: a helper container that handles git clone, artifact upload/download, and cache operations, and a build container running the image specified in .gitlab-ci.yml:
```yaml
# values.yaml
runners:
  executor: kubernetes
  config: |
    [[runners]]
      [runners.kubernetes]
        namespace = "gitlab-runners"
        image = "ubuntu:22.04"
        privileged = false
        cpu_request = "1"
        cpu_limit = "4"
        memory_request = "2Gi"
        memory_limit = "8Gi"
        service_cpu_request = "500m"
        service_memory_request = "512Mi"
        poll_timeout = 600
        [runners.kubernetes.pod_security_context]
          run_as_non_root = true
          run_as_user = 1000
        [runners.kubernetes.affinity]
          [runners.kubernetes.affinity.node_affinity]
            [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
                [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                  key = "node-role"
                  operator = "In"
                  values = ["ci"]
```

### GitLab Runner Autoscaling
GitLab Runner does not autoscale the runner manager itself – you run a fixed number of runner manager pods, and each manager creates build pods on demand up to a concurrency limit:
```yaml
concurrent: 30 # Max parallel jobs across all runners
runners:
  config: |
    [[runners]]
      limit = 20
      request_concurrency = 10
```

Use the Kubernetes Horizontal Pod Autoscaler on the runner manager deployment if you need to scale the managers themselves, though in practice a single manager with `limit = 50` handles most workloads.
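If you do need more than one manager, a CPU-based HorizontalPodAutoscaler on the manager deployment is a simple starting point (the deployment name and thresholds here are assumptions; match them to your Helm release):

```yaml
# hpa.yaml -- sketch; scales the runner manager, not the build pods
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: gitlab-runner
  namespace: gitlab-runners
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: gitlab-runner # deployment created by the Helm release
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```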
## Ephemeral vs. Persistent Runners
Ephemeral runners start fresh for every job and terminate when the job completes. ARC runs ephemeral runners by default. Benefits: no state leakage between jobs, no disk space accumulation, clean security posture. Drawback: no warm caches, which adds 30-120 seconds of dependency download time per job.
Persistent runners survive across multiple jobs. They retain filesystem state including dependency caches, build caches, and Docker layer caches. Benefits: faster builds due to warm caches. Drawbacks: state leakage between jobs creates reproducibility issues, disk fills up over time, security risk from one job leaving artifacts (or malware) for the next.
Recommendation: Use ephemeral runners for all workflows. Compensate for the cache penalty with external caching (actions/cache, GitLab cache, or a shared PVC mounted read-only). The security and reproducibility benefits outweigh the speed cost. If cache restore time is truly unacceptable, use persistent runners only for trusted internal repositories and implement automatic runner recycling (terminate and recreate every N jobs or every 24 hours).
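For example, restoring dependencies from an external cache in a workflow step costs seconds rather than a full re-download (the path and key below are illustrative for a Go project):

```yaml
# Restore the Go module cache instead of relying on persistent runner state
- uses: actions/cache@v4
  with:
    path: ~/go/pkg/mod # illustrative; match your toolchain's cache location
    key: gomod-${{ hashFiles('**/go.sum') }}
    restore-keys: |
      gomod-
```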
## Security Isolation
Self-hosted runners execute arbitrary code from your repositories. For organizations with multiple teams or open-source projects, isolation is critical.
Never use self-hosted runners for public repositories. Anyone who opens a pull request can run code on your infrastructure. This is a remote code execution vector with no practical mitigation.
Namespace isolation: Run runners in a dedicated Kubernetes namespace with strict NetworkPolicy:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: runner-isolation
  namespace: arc-runners
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  egress:
    - to: [] # Any destination, but only HTTP(S) (runners need internet)
      ports:
        - port: 443
          protocol: TCP
        - port: 80
          protocol: TCP
    - to: [] # DNS, required once egress is otherwise denied
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
  ingress: [] # Deny all ingress
```

Pod security: Run build containers as non-root, drop all capabilities, and use read-only root filesystems where possible. For Docker-in-Docker builds, use rootless DinD or Kaniko to avoid privileged containers.
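A sketch of that hardening in an ARC pod template (the user ID matches the `runner` user in the default image; verify it for your custom image):

```yaml
# values.yaml fragment -- non-root, no capabilities, no privilege escalation
template:
  spec:
    securityContext:
      runAsNonRoot: true
      runAsUser: 1001 # 'runner' user in ghcr.io/actions/actions-runner
    containers:
      - name: runner
        image: ghcr.io/myorg/actions-runner:latest
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
```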
Secret isolation: Use runner groups (GitHub) or tags (GitLab) to restrict which repositories can use which runners. Production deployment runners should only be available to specific repositories, not the entire organization.
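With ARC, you assign a scale set to a runner group in the Helm values; the group itself, and its repository allowlist, are configured in the GitHub organization settings (the group name below is an example):

```yaml
# values.yaml fragment -- place these runners in a restricted runner group
githubConfigUrl: "https://github.com/myorg"
runnerGroup: "prod-deploy" # only repositories admitted to this group can target it
```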
## Cost Modeling
Compare self-hosted runner costs against hosted runner pricing to determine your break-even point.
GitHub Actions hosted pricing: $0.008/min for Linux, $0.016/min for Windows, $0.08/min for macOS.
Self-hosted cost components:
- Compute: node cost for runner pods (EC2, GKE nodes, on-prem hardware)
- Storage: PVCs for caches, container image storage
- Networking: egress for dependency downloads, artifact transfers
- Operations: engineering time for maintenance, upgrades, troubleshooting
Break-even calculation example:
```text
Hosted cost:
  50,000 minutes/month * $0.008/min = $400/month

Self-hosted cost:
  3x m5.xlarge spot instances ($0.07/hr) = 3 * $0.07 * 730 = $153/month
  EBS storage (100 GB): $10/month
  Data transfer: ~$20/month
  Engineering time (4 hrs/month * $100/hr): $400/month
  Total: $583/month
```

At low volumes, hosted runners are cheaper because you pay zero for engineering time. Self-hosted becomes cost-effective above roughly 100,000 minutes per month, or immediately if you need capabilities that hosted runners cannot provide (private network, specific hardware, compliance). The engineering time component dominates for small teams: do not self-host runners just to save money unless your volume justifies the operational overhead.
## Operational Best Practices
Monitor runner utilization. Track queue time (how long jobs wait for a runner), runner CPU/memory utilization, and job duration. High queue times mean you need more runners or faster job execution. Low utilization means you are over-provisioned.
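Translate those signals into alerts. A sketch of a Prometheus rule for queue time, assuming your controller exports a queue-latency histogram (`ci_job_queue_seconds` below is a placeholder name, not a real ARC or GitLab metric):

```yaml
# ci-runner-alerts.yaml -- sketch; substitute your controller's actual metric
groups:
  - name: ci-runners
    rules:
      - alert: RunnerQueueTimeHigh
        # p95 time jobs spend waiting for a runner, over the last 10 minutes
        expr: histogram_quantile(0.95, sum(rate(ci_job_queue_seconds_bucket[10m])) by (le)) > 120
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "p95 CI queue time exceeds 2 minutes; add runners or shorten jobs"
```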
Automate image updates. Build runner images in CI, scan them for vulnerabilities, and roll them out automatically. Stale runner images accumulate CVEs and missing tools.
Set resource limits. Without CPU and memory limits, a single runaway build can starve other jobs. Set requests and limits on every runner pod.
Label runners by capability. Use labels like `linux-arm64`, `gpu`, `deploy-prod` to match jobs to appropriate runners, as in the sketch below. Do not create a single pool of runners that must have every tool installed.
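With classic self-hosted runners, jobs select capabilities through label arrays; note that ARC scale sets instead match `runs-on` against the scale set's installation name. A sketch (label and tag names are examples):

```yaml
# GitHub Actions: pin a job to GPU-capable self-hosted runners via labels
jobs:
  train:
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4

# GitLab equivalent (.gitlab-ci.yml): select runners by tag
# train:
#   tags: [gpu]
#   script:
#     - make train
```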