The Problem Self-Service Solves#

Developers need infrastructure: databases, caches, message queues, storage buckets, DNS records. In most organizations, getting these means filing a ticket, waiting days for someone to provision, and receiving credentials in a Slack DM. This bottleneck incentivizes workarounds — manual console provisioning, skipped security configs, everything crammed into shared databases.

Self-service infrastructure lets developers provision what they need directly, within guardrails the platform team defines. Choose a resource from a catalog, fill in parameters, and the system provisions it and returns connection details. No tickets, no waiting.

The challenge is providing the right level of abstraction with the right governance. Too little and developers need to understand networking and IAM. Too much and they cannot customize or debug.

Pattern 1: Terraform Modules as Service Catalog#

The most common starting point. The platform team writes opinionated Terraform modules that encapsulate infrastructure patterns. Developers consume these modules with minimal configuration.

A platform-provided module for an RDS instance:

module "api_database" {
  source  = "git::https://github.com/myorg/terraform-modules.git//rds-postgres?ref=v2.3.0"

  name        = "payment-api"
  environment = "staging"
  team        = "payments"

  # The module handles: VPC placement, security groups, subnet selection,
  # parameter groups, backup windows, monitoring, IAM roles, and
  # secret storage in AWS Secrets Manager.
}

output "connection_string" {
  value     = module.api_database.connection_string
  sensitive = true
}

The developer specifies the name, environment, and team; the module handles everything else according to organizational standards. The source pins the module to a version (ref=v2.3.0), so module updates cannot break existing infrastructure until a team explicitly bumps the ref.
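
To make this work, the module exposes only a narrow input surface. A sketch of what its variables.tf might look like, assuming the three inputs shown above (the validation rules are illustrative, not from an actual module):

variable "name" {
  type        = string
  description = "Logical service name, used for instance and secret naming."
}

variable "environment" {
  type        = string
  description = "Deployment environment; drives sizing and backup defaults."

  validation {
    condition     = contains(["development", "staging", "production"], var.environment)
    error_message = "environment must be development, staging, or production."
  }
}

variable "team" {
  type        = string
  description = "Owning team, applied as a cost-allocation tag."
}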

Provisioning workflow. Developers add the module, open a PR, and a CI pipeline runs terraform plan. On merge, the pipeline runs terraform apply. Atlantis automates this flow: commenting atlantis plan on the PR triggers a plan, and commenting atlantis apply runs the apply.
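
A minimal repo-level atlantis.yaml sketch for this workflow (the project name and directory are illustrative):

version: 3
projects:
  - name: payment-api-db
    dir: infrastructure/payment-api-db
    autoplan:
      when_modified: ["*.tf"]
    apply_requirements: [approved, mergeable]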

Governance. OPA or Sentinel policies validate the plan before apply. With OPA, the pipeline exports the plan as JSON (terraform show -json plan.out) and evaluates Rego rules against it:

package main

# Deny RDS instances without encryption
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_db_instance"
  not resource.change.after.storage_encrypted
  msg := sprintf("RDS instance %v must have encryption enabled", [resource.address])
}

# Deny instances larger than allowed for non-production
deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_db_instance"
  resource.change.after.tags.environment != "production"
  not allowed_dev_instance_class[resource.change.after.instance_class]
  msg := sprintf("RDS instance %v uses instance class %v which is not allowed in non-production", [resource.address, resource.change.after.instance_class])
}

allowed_dev_instance_class := {"db.t3.micro", "db.t3.small", "db.t3.medium"}
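
In CI, the policy gate might be wired up with conftest (the policies/ directory name is illustrative):

terraform plan -out=plan.out
terraform show -json plan.out > plan.json
conftest test plan.json --policy policies/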

When this pattern works well: teams already use Terraform, infrastructure is in one or two cloud providers, and the module abstraction fits existing workflows.

When this pattern struggles: developers who do not know Terraform still need help writing configuration. The PR-based workflow adds latency. State management across many teams adds operational overhead.
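
The state overhead is usually contained by isolating state per team or per stack. A sketch using an S3 backend (the bucket name and key layout are illustrative):

terraform {
  backend "s3" {
    bucket         = "myorg-terraform-state"
    key            = "payments/payment-api-db/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}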

Pattern 2: Crossplane Compositions#

Crossplane extends Kubernetes with custom resources that provision cloud infrastructure. Developers create Kubernetes manifests describing what they need, and Crossplane controllers reconcile those into real cloud resources.

First, a CompositeResourceDefinition (XRD) defines the abstraction: what developers can request and which parameters they can set:

apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresqlinstances.database.myorg.io
spec:
  group: database.myorg.io
  names:
    kind: XPostgreSQLInstance
    plural: xpostgresqlinstances
  claimNames:
    kind: PostgreSQLInstance
    plural: postgresqlinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    size:
                      type: string
                      enum: ["small", "medium", "large"]
                    version:
                      type: string
                      default: "15"
                  required:
                    - size

A Composition then maps the abstract claim to concrete cloud resources:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgresql-aws
spec:
  compositeTypeRef:
    apiVersion: database.myorg.io/v1alpha1
    kind: XPostgreSQLInstance
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "15"
            storageEncrypted: true
            autoMinorVersionUpgrade: true
            backupRetentionPeriod: 7
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.small
                medium: db.r6g.large
                large: db.r6g.xlarge
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.version
          toFieldPath: spec.forProvider.engineVersion
    - name: secret
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec:
          forProvider:
            manifest:
              apiVersion: v1
              kind: Secret
              metadata:
                name: ""       # patched from the claim name
                namespace: ""  # patched from the claim namespace
      patches:
        - type: FromCompositeFieldPath
          fromFieldPath: spec.claimRef.name
          toFieldPath: spec.forProvider.manifest.metadata.name
        - type: FromCompositeFieldPath
          fromFieldPath: spec.claimRef.namespace
          toFieldPath: spec.forProvider.manifest.metadata.namespace

A developer requests a database with a simple claim:

apiVersion: database.myorg.io/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: payment-db
  namespace: payments
spec:
  parameters:
    size: medium
    version: "15"

This is pure Kubernetes. The developer uses kubectl apply, the resource appears in kubectl get postgresqlinstances, and the connection secret lands in their namespace. No Terraform knowledge required.
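
A sketch of how an application might consume that secret, assuming the composition names it after the claim (the image and secret keys are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: payments
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: api
          image: myorg/payment-api:1.0.0
          envFrom:
            - secretRef:
                name: payment-db  # connection details written by the composition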

When this pattern works well: Kubernetes-native organizations where teams are comfortable with kubectl. Crossplane’s reconciliation loop automatically corrects drift — if someone deletes a security group manually, Crossplane recreates it.

When this pattern struggles: steep learning curve for the platform team. Debugging failed compositions requires understanding three layers (claim, composite, managed resource). Teams not using Kubernetes gain little from this approach.
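
When a claim does get stuck, debugging typically walks those layers in order (resource names are illustrative):

kubectl describe postgresqlinstance payment-db -n payments   # the claim
kubectl get xpostgresqlinstances                             # the composite behind it
kubectl describe instances.rds.aws.upbound.io                # the managed resource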

Pattern 3: Backstage Templates with Backend Automation#

Backstage scaffolder templates provide a UI-driven request flow that triggers backend automation. The developer fills a form, and the template executes actions — writing Terraform files, opening a PR, triggering a pipeline. The developer never touches HCL or YAML directly.

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: request-database
  title: Request a PostgreSQL Database
spec:
  parameters:
    - title: Database Configuration
      properties:
        serviceName:
          title: Service Name
          type: string
        environment:
          title: Environment
          type: string
          enum: ["development", "staging", "production"]
        size:
          title: Size
          type: string
          enum: ["small", "medium", "large"]
          enumNames: ["Small (2 vCPU, 4GB)", "Medium (4 vCPU, 16GB)", "Large (8 vCPU, 32GB)"]
  steps:
    - id: create-terraform
      action: fetch:template
      input:
        url: ./terraform-skeleton
        targetPath: infrastructure/${{ parameters.serviceName }}-db
        values:
          serviceName: ${{ parameters.serviceName }}
          environment: ${{ parameters.environment }}
          size: ${{ parameters.size }}
    - id: open-pr
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=myorg&repo=infrastructure
        branchName: provision/${{ parameters.serviceName }}-db
        title: "Provision PostgreSQL for ${{ parameters.serviceName }}"
        description: "Automated request via Backstage"

When this pattern works well: you want the lowest barrier to entry and already have backend provisioning (Terraform, Crossplane). Backstage templates are the front door.

When this pattern struggles: it is a UI, not a provisioning system. If the Terraform apply fails, the developer gets a failed PR with no clear resolution path.

Pattern 4: GitOps-Based Provisioning#

In a full GitOps model, infrastructure is provisioned by committing manifests to a Git repository. ArgoCD or Flux watches the repository and reconciles changes, creating a declarative, audit-trailed provisioning flow.

Developer commits claim YAML
  -> Git repository (source of truth)
    -> ArgoCD detects change
      -> Applies to Kubernetes cluster
        -> Crossplane provisions cloud resource
          -> Connection secret appears in namespace
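
For the Crossplane path, the ArgoCD side can be a single Application that syncs a directory of claims. A minimal sketch (the repository URL and path are illustrative):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure-claims
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/infrastructure.git
    targetRevision: main
    path: claims
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true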

For Terraform-based GitOps, the pattern uses a controller such as tf-controller, the Terraform controller for Flux:

apiVersion: infra.contrib.fluxcd.io/v1alpha2
kind: Terraform
metadata:
  name: payment-db
  namespace: flux-system
spec:
  path: ./infrastructure/payment-db
  sourceRef:
    kind: GitRepository
    name: infrastructure
  interval: 1h
  approvePlan: auto   # leave unset to require manual plan approval in production
  writeOutputsToSecret:
    name: payment-db-credentials
    namespace: payments

When this pattern works well: the organization already uses GitOps for applications and wants infrastructure to follow the same model. Every change is a Git commit with a reviewable audit trail.

When this pattern struggles: slower feedback loop than direct CLI. Debugging requires checking the commit, sync status, controller logs, and cloud API responses.

Choosing the Right Pattern#

| Factor | Terraform Modules | Crossplane | Backstage Templates | GitOps |
| --- | --- | --- | --- | --- |
| Developer skill required | Terraform knowledge | kubectl/YAML | None (form-based) | Git basics |
| Platform team skill | Terraform, CI/CD | Kubernetes, Crossplane | TypeScript, backends | GitOps controllers |
| Drift correction | Manual or detect-only | Automatic reconciliation | Depends on backend | Depends on backend |
| Governance | OPA/Sentinel on plan | Kubernetes admission | Baked into template | PR review + policy |
| Time to first value | Weeks (if Terraform exists) | Months | Weeks (wraps existing) | Months |
| Best for | Cloud-centric, Terraform shops | K8s-native organizations | Any (UI layer) | GitOps-committed orgs |

These patterns combine well: Crossplane or Terraform for provisioning, Backstage for the UI, GitOps for delivery. Start with the pattern closest to your existing workflow and add layers as the platform matures.

Governance and Approval Workflows#

Regardless of which provisioning pattern you choose, governance controls are needed. The challenge is applying the right level of control without recreating the ticket queue that self-service was supposed to eliminate.

Structure approvals in three tiers:

- Auto-approve low-risk actions: non-production resources, small instance sizes, pre-approved resource types under a cost ceiling.
- Lightweight review for medium risk: production databases, public-facing resources, IAM changes. A single PR review is enough.
- Full review for high risk: cross-account access, regulated environments, high-cost resources. Require platform team and security sign-off, and keep this tier under 10% of requests.

Codify the tier logic in OPA so humans do not decide which tier applies:

package approvals

auto_approved(resource) {
  resource.environment != "production"
  resource.estimated_monthly_cost <= 500
  allowed_resource_type[resource.type]
}

# Example pre-approved types; adjust to your own catalog.
allowed_resource_type := {"aws_db_instance", "aws_elasticache_cluster", "aws_s3_bucket"}

Target 70-80% of requests auto-approved and provisioned in minutes. The rest should complete within hours. If any request takes longer than a day, developers will find workarounds and you lose the governance self-service was supposed to enforce.