The Cost of Not Having Self-Service#
A developer needs a PostgreSQL database. They file a ticket. It sits in a backlog for two days. A DBA provisions it, sends credentials via Slack DM. Elapsed time: 3 days. Actual need: 5 minutes of configuration. Multiply across every database, cache, queue, and namespace, and manual provisioning becomes the single largest drag on velocity. Self-service lets developers provision pre-approved resources directly, within guardrails the platform team defines.
Infrastructure Request Automation#
The core pattern: developer declares what they want, automation provisions it, credentials are delivered programmatically. Three approaches dominate:
GitOps-driven: Developer opens a PR adding a resource definition. CI validates against policies. On merge, ArgoCD syncs and Crossplane provisions the infrastructure.
Backstage scaffolder: Developer fills a form, scaffolder generates the resource definition and commits to GitOps. Same provisioning backend, UI-guided frontend.
API-driven: Developer calls a platform API (REST or CLI). Works well for programmatic consumers like CI pipelines.
All three converge on declarative resource definitions reconciled by a controller.
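For concreteness, here is a sketch of the kind of definition a developer might commit in the GitOps-driven flow. The PostgreSQLInstance kind and the database.platform.example.com group are assumptions that mirror the Redis Claim API shown later in this section:

# Illustrative claim added in a developer's PR; everything provider-specific stays
# behind the Claim API and is filled in by the platform team's Composition.
apiVersion: database.platform.example.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: orders-db
  namespace: team-orders
spec:
  parameters:
    size: small
    version: "16"
  writeConnectionSecretToRef:
    name: orders-db-credentials

On merge, ArgoCD syncs the file, Crossplane reconciles it into running infrastructure, and the credentials land in the referenced Secret.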
Backstage Scaffolder for Self-Service#
The Backstage scaffolder turns self-service requests into multi-step workflows. A scaffolder template for provisioning a Redis cache:
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: redis-cache
  title: Provision Redis Cache
  description: Self-service Redis cache with automatic credential injection
spec:
  owner: platform-team
  type: resource
  parameters:
    - title: Cache Configuration
      required: [name, owner, environment, size]
      properties:
        name:
          type: string
          pattern: '^[a-z][a-z0-9-]{2,24}$'
          description: Cache instance name
        owner:
          type: string
          ui:field: OwnerPicker
        environment:
          type: string
          enum: [development, staging, production]
        size:
          type: string
          enum: [small, medium, large]
          enumNames: ['Small (1GB)', 'Medium (4GB)', 'Large (16GB)']
  steps:
    - id: generate
      name: Generate Crossplane Claim
      action: fetch:template
      input:
        url: ./skeleton
        targetPath: infrastructure/redis/${{ parameters.name }}
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          environment: ${{ parameters.environment }}
          size: ${{ parameters.size }}
    - id: pr
      name: Create Pull Request
      action: publish:github:pull-request
      input:
        repoUrl: github.com?owner=myorg&repo=infrastructure
        branchName: provision-redis-${{ parameters.name }}
        title: 'Provision Redis cache: ${{ parameters.name }}'
        description: |
          Self-service Redis provisioning for ${{ parameters.owner }}.
          Size: ${{ parameters.size }}, Environment: ${{ parameters.environment }}
    - id: register
      name: Register in Catalog
      action: catalog:register
      input:
        catalogInfoUrl: https://github.com/myorg/infrastructure/blob/main/infrastructure/redis/${{ parameters.name }}/catalog-info.yaml

The skeleton directory contains the Crossplane Claim template and a catalog-info.yaml for the resource. The PR is auto-approved by CI if policy checks pass (more on this below).
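What the skeleton contains is up to the platform team. A minimal sketch of the Claim file it might hold, assuming fetch:template renders it with the values block above (the file name and annotation key are illustrative):

# skeleton/claim.yaml (illustrative): rendered by fetch:template using the values above
apiVersion: cache.platform.example.com/v1alpha1
kind: RedisInstance
metadata:
  name: ${{ values.name }}
  annotations:
    platform.example.com/owner: ${{ values.owner }}
spec:
  parameters:
    size: ${{ values.size }}
  compositionSelector:
    matchLabels:
      environment: ${{ values.environment }}
  writeConnectionSecretToRef:
    name: ${{ values.name }}-credentials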
Crossplane Claims for Resource Provisioning#
Crossplane separates the developer-facing API (Claim) from the infrastructure-specific implementation (Composition). Developers interact only with Claims:
apiVersion: cache.platform.example.com/v1alpha1
kind: RedisInstance
metadata:
  name: session-cache
  namespace: team-identity
spec:
  parameters:
    size: medium
    version: "7"
    highAvailability: true
  compositionSelector:
    matchLabels:
      provider: aws
      environment: production
  writeConnectionSecretToRef:
    name: session-cache-credentials

The platform team maintains Compositions that map these claims to provider-specific resources:
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: redis-aws-production
  labels:
    provider: aws
    environment: production
spec:
  compositeTypeRef:
    apiVersion: cache.platform.example.com/v1alpha1
    kind: XRedisInstance
  resources:
    - name: elasticache
      base:
        apiVersion: elasticache.aws.upbound.io/v1beta1
        kind: ReplicationGroup
        spec:
          forProvider:
            automaticFailoverEnabled: true
            engine: redis
            engineVersion: "7.0"
            nodeType: cache.r7g.large
            numCacheClusters: 3
            atRestEncryptionEnabled: true
            transitEncryptionEnabled: true

Developers never see the Composition. They interact with size, version, and highAvailability. The platform team controls instance types, encryption, and networking inside the Composition.
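How the size parameter becomes a concrete instance type is up to the Composition; one common approach is a patch with a map transform. A sketch that would sit alongside base in the elasticache entry above (the node types are assumptions):

# Illustrative fragment: maps the claim's size parameter onto a vetted instance type.
patches:
  - type: FromCompositeFieldPath
    fromFieldPath: spec.parameters.size
    toFieldPath: spec.forProvider.nodeType
    transforms:
      - type: map
        map:
          small: cache.t4g.small
          medium: cache.m7g.large
          large: cache.r7g.large

Keeping the mapping in the Composition lets the platform team swap vetted instance types without touching a single claim.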
Self-Service Databases, Queues, and Caches#
A complete self-service resource catalog:
| Resource | Claim API | Backend | Credential Delivery |
|---|---|---|---|
| PostgreSQL | PostgreSQLInstance | RDS via Crossplane | K8s Secret via ExternalSecrets |
| Redis | RedisInstance | ElastiCache via Crossplane | K8s Secret via ExternalSecrets |
| RabbitMQ | MessageQueue | CloudAMQP or RabbitMQ Operator | K8s Secret directly |
| S3 Bucket | ObjectStore | S3 via Crossplane | IRSA (IAM Roles for Service Accounts) |
| Kafka Topic | EventStream | MSK via Crossplane or Strimzi | K8s Secret + ACLs |
Every resource type follows the same pattern: developer creates a Claim, the Composition provisions infrastructure, credentials are injected into the namespace as a Kubernetes Secret.
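On the consuming side, a sketch of how a workload might pick up those credentials, assuming the session-cache-credentials Secret from the Redis claim earlier (the image name and the environment-variable keys depend on the connection details your Composition publishes):

# Illustrative Deployment snippet: consume the claim's connection secret as env vars.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: identity-api
  namespace: team-identity
spec:
  replicas: 2
  selector:
    matchLabels:
      app: identity-api
  template:
    metadata:
      labels:
        app: identity-api
    spec:
      containers:
        - name: identity-api
          image: ghcr.io/myorg/identity-api:1.4.2  # illustrative image
          envFrom:
            - secretRef:
                name: session-cache-credentials  # written by writeConnectionSecretToRef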
Guardrails Without Gates#
Guardrails enforce standards without blocking developers behind approval queues. The distinction: a gate requires a human to say yes. A guardrail automatically rejects non-compliant requests and tells the developer why, so they can fix and re-submit immediately.
Policy-as-code with OPA/Gatekeeper or Kyverno:
# Kyverno policy: enforce resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds: [Deployment, StatefulSet]
      validate:
        message: "CPU and memory limits are required"
        pattern:
          spec:
            template:
              spec:
                containers:
                  - resources:
                      limits:
                        memory: "?*"
                        cpu: "?*"

Size-based guardrails: The Claim API only accepts pre-approved parameter values, so a size: xlarge request is rejected at the Claim level: “Maximum allowed size is large. Contact platform-team for exceptions.”
Cost guardrails: Tag resources with team identifiers. Set per-team budgets. Alert when spend approaches the threshold — visibility and accountability without blocking.
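One way to get the team identifier onto every provisioned resource is a Composition patch that propagates the claim-namespace label Crossplane sets on the composite resource; a sketch (the tag key is an assumption):

# Illustrative fragment for the elasticache entry above: propagate the claiming
# team's namespace into an AWS cost-allocation tag.
patches:
  - type: FromCompositeFieldPath
    fromFieldPath: metadata.labels[crossplane.io/claim-namespace]
    toFieldPath: spec.forProvider.tags[team]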
Approval-Free Workflows#
The goal is to eliminate human approvals for standard operations. Here is what makes this safe:
- Pre-approved resource definitions: The platform team pre-validates every option in the Claim API. If size: medium maps to a specific, vetted instance type, no approval is needed because the platform team already approved the configuration.
- Policy enforcement in CI: PRs to the infrastructure repository are validated by OPA/Conftest before merge (a sketch of such a check follows this list). Passing policy checks replaces human review for standard requests.
- Auto-merge for policy-passing PRs: GitHub Actions can auto-merge PRs that pass all policy checks and were generated by the scaffolder:
- name: Auto-merge if policy passes
  if: github.actor == 'backstage-bot' && steps.policy.outcome == 'success'
  run: gh pr merge --auto --squash "${{ github.event.pull_request.number }}"
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- Exception path for non-standard requests: Anything outside the pre-approved parameters (custom instance types, cross-account networking, compliance-sensitive resources) routes to a human review queue. This is the only path that requires approval.
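The policy check those conditions depend on might look like the following steps, sketched under the assumption that Conftest is installed on the runner and the Rego policies live under policy/ in the same repository; the step id matches the steps.policy.outcome condition in the auto-merge snippet:

# Illustrative steps from the same CI job as the auto-merge step above.
- uses: actions/checkout@v4
- name: Validate generated claims against OPA policies
  id: policy
  run: conftest test infrastructure/ --policy policy/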
The result: 90%+ of infrastructure requests provisioned in minutes with zero human involvement. The remaining non-standard requests get human review — where the platform team’s expertise is actually needed. If manual review exceeds 20%, your self-service catalog is missing common use cases.