The Cost of Not Having Self-Service

A developer needs a PostgreSQL database. They file a ticket. It sits in a backlog for two days. A DBA provisions it and sends the credentials via Slack DM. Elapsed time: 3 days. Actual work: 5 minutes of configuration. Multiply that across every database, cache, queue, and namespace, and manual provisioning becomes a major drag on delivery velocity. Self-service lets developers provision pre-approved resources directly, within guardrails the platform team defines.

Infrastructure Request Automation

The core pattern: developer declares what they want, automation provisions it, credentials are delivered programmatically. Three approaches dominate:

GitOps-driven: Developer opens a PR adding a resource definition. CI validates against policies. On merge, ArgoCD syncs and Crossplane provisions the infrastructure.

Backstage scaffolder: Developer fills a form, scaffolder generates the resource definition and commits to GitOps. Same provisioning backend, UI-guided frontend.

API-driven: Developer calls a platform API directly (over REST or through a CLI wrapper). Works well for programmatic consumers like CI pipelines.

All three converge on declarative resource definitions reconciled by a controller.
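As a rough illustration of the GitOps path, the sync step could be expressed as an Argo CD Application that watches the infrastructure repository. This is a minimal sketch, not a reference setup: the Application name, the repository URL (borrowed from the scaffolder example in this article), the `infrastructure` path, and the sync settings are all assumptions.

```yaml
# Hypothetical Argo CD Application that applies merged Claims to the cluster.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure-claims        # assumed name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/infrastructure
    targetRevision: main
    path: infrastructure             # assumed root folder for Claims
    directory:
      recurse: true                  # Claims live in nested per-resource folders
      # Non-Kubernetes files in this tree (for example Backstage
      # catalog-info.yaml) would need to be excluded or stored elsewhere.
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      selfHeal: true                 # revert manual drift back to what is in Git
      prune: false                   # assumption: do not delete infrastructure
                                     # automatically when a file disappears
```

Leaving `prune` off is a deliberately conservative choice for stateful resources: removing a Claim from Git then requires an explicit deletion rather than happening as a side effect of a merge.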

Backstage Scaffolder for Self-Service

The Backstage scaffolder turns self-service requests into multi-step workflows. A scaffolder template for provisioning a Redis cache:

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: redis-cache
  title: Provision Redis Cache
  description: Self-service Redis cache with automatic credential injection
spec:
  owner: platform-team
  type: resource
  parameters:
    - title: Cache Configuration
      required: [name, owner, environment, size]
      properties:
        name:
          type: string
          pattern: '^[a-z][a-z0-9-]{2,24}$'
          description: Cache instance name
        owner:
          type: string
          ui:field: OwnerPicker
        environment:
          type: string
          enum: [development, staging, production]
        size:
          type: string
          enum: [small, medium, large]
          enumNames: ['Small (1GB)', 'Medium (4GB)', 'Large (16GB)']
  steps:
    - id: generate
      name: Generate Crossplane Claim
      action: fetch:template
      input:
        url: ./skeleton
        targetPath: infrastructure/redis/${{ parameters.name }}
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          environment: ${{ parameters.environment }}
          size: ${{ parameters.size }}
    - id: pr
      name: Create Pull Request
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=myorg&repo=infrastructure
        branchName: provision-redis-${{ parameters.name }}
        title: 'Provision Redis cache: ${{ parameters.name }}'
        description: |
          Self-service Redis provisioning for ${{ parameters.owner }}.
          Size: ${{ parameters.size }}, Environment: ${{ parameters.environment }}
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        catalogInfoUrl: https://github.com/myorg/infrastructure/blob/main/infrastructure/redis/${{ parameters.name }}/catalog-info.yaml

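The skeleton directory contains the Crossplane Claim template and a catalog-info.yaml for the resource. The PR is auto-merged by CI if policy checks pass (see Approval-Free Workflows below). One caveat: the register step points at the main branch, where catalog-info.yaml exists only after the PR merges, so registration typically has to happen after merge or be left to catalog discovery.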
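The original does not show the skeleton files, so the following is only a guess at what they might look like. The file names, the annotation key, and the `-credentials` suffix are assumptions. The `${{ values.* }}` placeholders correspond to the `values` passed to `fetch:template` in the template above.

```yaml
# skeleton/claim.yaml (hypothetical file name)
apiVersion: cache.platform.example.com/v1alpha1
kind: RedisInstance
metadata:
  name: ${{ values.name }}
  # Namespace is omitted because the form above does not collect one;
  # assumed to be supplied by an extra parameter or by the GitOps tooling.
  annotations:
    platform.example.com/owner: ${{ values.owner }}   # assumed annotation key
spec:
  parameters:
    size: ${{ values.size }}
  compositionSelector:
    matchLabels:
      environment: ${{ values.environment }}
  writeConnectionSecretToRef:
    name: ${{ values.name }}-credentials               # assumed naming scheme
---
# skeleton/catalog-info.yaml
apiVersion: backstage.io/v1alpha1
kind: Resource
metadata:
  name: ${{ values.name }}
  description: Redis cache (${{ values.size }}, ${{ values.environment }})
spec:
  type: cache                                          # assumed resource type
  owner: ${{ values.owner }}
```

The owner is stored as an annotation rather than a label because the OwnerPicker value is an entity reference, which contains characters that are not valid in label values.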

Crossplane Claims for Resource Provisioning

Crossplane separates the developer-facing API (Claim) from the infrastructure-specific implementation (Composition). Developers interact only with Claims:

apiVersion: cache.platform.example.com/v1alpha1
kind: RedisInstance
metadata:
  name: session-cache
  namespace: team-identity
spec:
  parameters:
    size: medium
    version: "7"
    highAvailability: true
  compositionSelector:
    matchLabels:
      provider: aws
      environment: production
  writeConnectionSecretToRef:
    name: session-cache-credentials

The platform team maintains Compositions that map these claims to provider-specific resources:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: redis-aws-production
  labels:
    provider: aws
    environment: production
spec:
  compositeTypeRef:
    apiVersion: cache.platform.example.com/v1alpha1
    kind: XRedisInstance
  resources:
    - name: elasticache
      base:
        apiVersion: elasticache.aws.upbound.io/v1beta1
        kind: ReplicationGroup
        spec:
          forProvider:
            automaticFailoverEnabled: true
            engine: redis
            engineVersion: "7.0"
            nodeType: cache.r7g.large
            numCacheClusters: 3
            atRestEncryptionEnabled: true
            transitEncryptionEnabled: true

Developers never see the Composition. They set only size, version, and highAvailability. The platform team controls instance types, encryption, and networking inside the Composition.
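The Composition above hard-codes its values. One way the Claim parameters could be mapped onto provider fields is with patches, assuming the same patch-and-transform style as that Composition. The instance types in the map are illustrative placeholders, not sizing recommendations, and are not taken from the original.

```yaml
# Fragment of spec.resources in the Composition (sketch only).
resources:
  - name: elasticache
    base:
      apiVersion: elasticache.aws.upbound.io/v1beta1
      kind: ReplicationGroup
      spec:
        forProvider:
          engine: redis
          atRestEncryptionEnabled: true      # fixed by the platform team
          transitEncryptionEnabled: true     # fixed by the platform team
    patches:
      # Translate the abstract size into a concrete node type.
      - type: FromCompositeFieldPath
        fromFieldPath: spec.parameters.size
        toFieldPath: spec.forProvider.nodeType
        transforms:
          - type: map
            map:
              small: cache.t4g.small         # assumed mapping
              medium: cache.t4g.medium       # assumed mapping
              large: cache.r7g.large         # assumed mapping
      # Pass the high-availability flag through unchanged.
      - type: FromCompositeFieldPath
        fromFieldPath: spec.parameters.highAvailability
        toFieldPath: spec.forProvider.automaticFailoverEnabled
```

Because the map lists only the approved sizes, changing what "medium" means is a one-line change in the Composition and never touches a developer's Claim.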

Self-Service Databases, Queues, and Caches

A complete self-service resource catalog:

| Resource | Claim API | Backend | Credential Delivery |
|---|---|---|---|
| PostgreSQL | PostgreSQLInstance | RDS via Crossplane | K8s Secret via ExternalSecrets |
| Redis | RedisInstance | ElastiCache via Crossplane | K8s Secret via ExternalSecrets |
| RabbitMQ | MessageQueue | CloudAMQP or RabbitMQ Operator | K8s Secret directly |
| S3 Bucket | ObjectStore | S3 via Crossplane | IRSA (IAM Roles for Service Accounts) |
| Kafka Topic | EventStream | MSK via Crossplane or Strimzi | K8s Secret + ACLs |

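Every resource type follows the same pattern: the developer creates a Claim, the Composition provisions the infrastructure, and access is delivered to the workload's namespace. In most cases that means a Kubernetes Secret; for S3 it means an IAM role bound to the service account via IRSA.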
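On the consuming side, a workload could pick up the delivered Secret without anyone copying credentials by hand. The sketch below assumes the `session-cache-credentials` Secret from the Claim example in this article. The Deployment name, image, and limits are hypothetical, and the keys inside the Secret depend on what the Composition publishes.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: identity-api                   # hypothetical workload
  namespace: team-identity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: identity-api
  template:
    metadata:
      labels:
        app: identity-api
    spec:
      containers:
        - name: app
          image: registry.example.com/identity-api:1.0.0   # placeholder image
          envFrom:
            # Each key in the Secret becomes an environment variable.
            # Key names are whatever the Composition writes; check them
            # before relying on a specific variable name.
            - secretRef:
                name: session-cache-credentials
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
```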

Guardrails Without Gates

Guardrails enforce standards without blocking developers behind approval queues. The distinction: a gate requires a human to say yes. A guardrail automatically rejects non-compliant requests and tells the developer why, so they can fix and re-submit immediately.

Policy-as-code with OPA/Gatekeeper or Kyverno:

# Kyverno policy: enforce resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Deployment, StatefulSet]
      validate:
        message: "CPU and memory limits are required"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"

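Size-based guardrails: The schema of the Claim API, defined by the platform team in the Crossplane CompositeResourceDefinition (XRD), restricts parameters to approved values. A size: xlarge request is rejected when the Claim is submitted, ideally with an actionable message such as “Maximum allowed size is large. Contact platform-team for exceptions.”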
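One place such a size limit can live is the CompositeResourceDefinition (XRD) that defines the Claim API. The sketch below is an assumption about how the `RedisInstance` API used in this article might be declared; the original does not show it. With an `enum` in the schema, the API server rejects unsupported values with its own generic validation error. A custom message like the one quoted above would need an additional mechanism, such as a policy engine.

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xredisinstances.cache.platform.example.com
spec:
  group: cache.platform.example.com
  names:
    kind: XRedisInstance
    plural: xredisinstances
  claimNames:                 # the namespaced API developers actually use
    kind: RedisInstance
    plural: redisinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [parameters]
              properties:
                parameters:
                  type: object
                  required: [size]
                  properties:
                    size:
                      type: string
                      enum: [small, medium, large]   # xlarge is not accepted
                    version:
                      type: string
                    highAvailability:
                      type: boolean
                      default: false                 # assumed default
```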

Cost guardrails: Tag resources with team identifiers. Set per-team budgets. Alert when spend approaches the threshold. This gives visibility and accountability without blocking anyone.
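Tagging can be automated inside the Composition so developers never have to remember it. The fragment below is a sketch that assumes each team owns its namespace, so the Claim's namespace can stand in for the team identifier. The `team` tag key is an assumption.

```yaml
# Additional patch on the elasticache resource in the Composition (sketch).
patches:
  # Copy the namespace the Claim was created in onto the cloud resource
  # as a cost-allocation tag.
  - type: FromCompositeFieldPath
    fromFieldPath: spec.claimRef.namespace
    toFieldPath: spec.forProvider.tags.team    # assumed tag key
```

Budgets and alerts can then be keyed on that tag in whatever cost tooling the organization already uses.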

Approval-Free Workflows

The goal is to eliminate human approvals for standard operations. Here is what makes this safe:

  1. Pre-approved resource definitions: The platform team pre-validates every option in the Claim API. If size: medium maps to a specific, vetted instance type, no approval is needed because the platform team already approved the configuration.

  2. Policy enforcement in CI: PRs to the infrastructure repository are validated by OPA/Conftest before merge. Passing policy checks replaces human review for standard requests.

  3. Auto-merge for policy-passing PRs: GitHub Actions can auto-merge PRs that pass all policy checks and were generated by the scaffolder:

- name: Auto-merge if policy passes
  if: github.actor == 'backstage-bot' && steps.policy.outcome == 'success'
  run: gh pr merge --auto --squash "${{ github.event.pull_request.number }}"
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  4. Exception path for non-standard requests: Anything outside the pre-approved parameters (custom instance types, cross-account networking, compliance-sensitive resources) routes to a human review queue. This is the only path that requires approval.
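For context, the policy check and auto-merge steps could sit together in a single workflow, roughly as follows. This is a sketch under several assumptions: the workflow file name, the `policy/` directory holding the Conftest rules, running Conftest from its public container image, and the `backstage-bot` actor name from the snippet above. It also assumes auto-merge is enabled on the repository.

```yaml
# .github/workflows/infrastructure-policy.yaml (hypothetical file name)
name: infrastructure-policy
on:
  pull_request:
    paths:
      - 'infrastructure/**'
permissions:
  contents: write          # needed to merge
  pull-requests: write
jobs:
  policy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - id: policy
        name: Validate claims with Conftest
        # Runs the Rego policies in policy/ (assumed location) against
        # every manifest under infrastructure/.
        run: |
          docker run --rm -v "$PWD:/project" openpolicyagent/conftest \
            test infrastructure/ --policy policy/
      - name: Auto-merge if policy passes
        if: github.actor == 'backstage-bot' && steps.policy.outcome == 'success'
        run: gh pr merge --auto --squash "${{ github.event.pull_request.number }}"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

PRs opened by anyone other than the bot still get the policy check but fall through to normal review, which matches the exception path in item 4.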

The result, when the catalog covers the common cases: 90%+ of infrastructure requests are provisioned in minutes with zero human involvement. The remaining non-standard requests get human review, which is where the platform team’s expertise is actually needed. If more than 20% of requests need manual review, your self-service catalog is missing common use cases.