Service Catalog Management and Design

Why a Service Catalog Exists

A service catalog answers: “What do we have, who owns it, and what state is it in?” Without one, this information lives in tribal knowledge and stale wiki pages. When an incident hits at 3 AM, the on-call engineer needs to know who owns the failing service, what it depends on, and where to find the runbook. The catalog provides this in seconds.

The catalog is also the foundation for other platform capabilities. Golden paths register outputs in it. Scorecards evaluate catalog entities. Self-service workflows provision resources linked to catalog entries.

Backstage Catalog Model

Backstage organizes everything into entities. Each entity has a kind, and most kinds also carry a spec.type that refines it. The core entity kinds:

Component: A piece of software — service, library, website. Each has a type, owner, lifecycle (experimental, production, deprecated), and links to source code, CI/CD, docs, and API definitions.

API: An interface exposed by a component. Specs can reference OpenAPI, AsyncAPI, gRPC protobuf, or GraphQL schemas. APIs are first-class entities because they represent contracts between teams.

Resource: Infrastructure a component depends on — databases, caches, queues, S3 buckets.

System: A logical grouping of components and resources providing a business capability. The “orders system” includes the orders API, worker, database, and event stream.

Domain: Top-level business grouping (“Commerce,” “Payments,” “Identity”). Domains contain systems.

Group and User: Teams and individuals that own entities, typically synced from your identity provider.

The hierarchy flows: Domain > System > Component/API/Resource, mapping to domain-driven design concepts.

The catalog-info.yaml File

Every entity is defined by a YAML descriptor, conventionally placed at the repository root:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  description: Handles order creation, updates, and fulfillment tracking
  annotations:
    github.com/project-slug: myorg/order-service
    backstage.io/techdocs-ref: dir:.
    argocd/app-name: order-service
    pagerduty.com/service-id: P1234ABC
  tags:
    - go
    - grpc
    - postgresql
  links:
    - url: https://grafana.internal/d/order-service
      title: Grafana Dashboard
      icon: dashboard
spec:
  type: service
  lifecycle: production
  owner: team-commerce
  system: orders
  providesApis:
    - order-api
  consumesApis:
    - inventory-api
    - payment-api
  dependsOn:
    - resource:orders-db
    - resource:orders-cache

Annotations drive plugin behavior: github.com/project-slug shows PRs and CI status, argocd/app-name shows deployment status, pagerduty.com/service-id shows on-call info.
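Entries such as resource:orders-db in dependsOn are entity references of the form [kind:][namespace/]name, where omitted parts fall back to defaults. The following is a minimal Python sketch of that parsing, not Backstage's own implementation; the function name, the EntityRef type, and the choice to lowercase the kind are assumptions made for illustration.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str


def parse_entity_ref(ref: str, default_kind: str,
                     default_namespace: str = "default") -> EntityRef:
    """Split '[kind:][namespace/]name' into its parts, filling in defaults."""
    kind, sep, rest = ref.partition(":")
    if not sep:
        # No explicit kind: the whole string is '[namespace/]name'.
        kind, rest = default_kind, ref
    namespace, sep, name = rest.partition("/")
    if not sep:
        # No explicit namespace.
        namespace, name = default_namespace, rest
    if not kind or not namespace or not name:
        raise ValueError(f"malformed entity ref: {ref!r}")
    # Lowercasing the kind is a choice of this sketch so that
    # 'Resource:orders-db' and 'resource:orders-db' compare equal.
    return EntityRef(kind.lower(), namespace, name)


if __name__ == "__main__":
    print(parse_entity_ref("resource:orders-db", default_kind="component"))
    print(parse_entity_ref("order-api", default_kind="api"))
```

The default kind depends on the field being read: a bare name under providesApis would be treated as an API, while a bare name under owner would be treated as a Group.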

An API entity alongside the component:

apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: order-api
  description: REST API for order management
spec:
  type: openapi
  lifecycle: production
  owner: team-commerce
  system: orders
  definition:
    $text: ./api/openapi.yaml

The $text substitution pulls the OpenAPI spec from a file in the same repository. Backstage renders it as interactive API documentation.

Auto-Discovery

Manually registering every repository is impractical at scale. Backstage supports several discovery mechanisms:

GitHub discovery scans organizations for catalog-info.yaml files:

# app-config.yaml
catalog:
  providers:
    github:
      myorg:
        organization: myorg
        catalogPath: /catalog-info.yaml
        filters:
          repository: '.*'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }

This scans every repository in myorg every 30 minutes and registers entities found in catalog-info.yaml. GitLab discovery works similarly. Kubernetes discovery can auto-register workloads but produces lower-quality entries since K8s manifests lack semantic metadata.

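Recommended approach: require catalog-info.yaml in every repository via golden paths, then use discovery to auto-register. Repositories without one never appear in the catalog, so a governance dashboard can flag them as “unregistered” by comparing the organization's repository list against the registered entities.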
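One way to find unregistered repositories is to compare the organization's repository list with the github.com/project-slug annotations of catalog entities. The sketch below assumes both lists have already been fetched (from the GitHub API and the catalog API respectively) and are passed in as plain Python data; find_unregistered is a hypothetical helper name.

```python
def find_unregistered(repos: list[str], entities: list[dict]) -> list[str]:
    """Return 'org/repo' slugs that no catalog entity points at."""
    registered = set()
    for entity in entities:
        annotations = (entity.get("metadata") or {}).get("annotations") or {}
        slug = annotations.get("github.com/project-slug")
        if slug:
            # GitHub slugs are case-insensitive, so compare in lowercase.
            registered.add(slug.lower())
    return sorted(repo for repo in repos if repo.lower() not in registered)


if __name__ == "__main__":
    repos = ["myorg/order-service", "myorg/legacy-billing"]
    entities = [
        {"metadata": {"name": "order-service",
                      "annotations": {"github.com/project-slug": "myorg/order-service"}}},
    ]
    print(find_unregistered(repos, entities))
```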

Scorecards and Maturity Tracking

Scorecards evaluate catalog entities against a set of standards and show a maturity score. They answer “how production-ready is this service?” with concrete checks rather than opinions.

Example scorecard criteria for a production service:

| Check | Criteria | Points |
|---|---|---|
| Has owner | `spec.owner` is set and maps to an active team | 10 |
| Has description | `metadata.description` is non-empty and > 20 characters | 5 |
| TechDocs exist | `backstage.io/techdocs-ref` annotation present, docs build successfully | 10 |
| CI pipeline passes | Last GitHub Actions run on main is green | 10 |
| Has on-call | PagerDuty service linked and has an escalation policy | 15 |
| API spec defined | `spec.providesApis` is non-empty with valid API entities | 10 |
| Runs on Kubernetes | ArgoCD annotation present, app is synced and healthy | 10 |
| Has resource limits | Kubernetes Deployment has CPU and memory limits set | 10 |
| Dependency tracking | `spec.dependsOn` lists all consumed resources | 10 |
| Recent deploy | Last deployment was within 30 days | 10 |

Tools like Spotify’s Soundcheck, Cortex, and OpsLevel implement scorecards. You can also build custom scorecards with a cron job evaluating checks against the catalog API.
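A custom scorecard can be as simple as a list of named predicates run against each entity. The sketch below covers only the checks from the table that can be decided from the descriptor alone (45 of the 100 points); checks that need GitHub, PagerDuty, ArgoCD, or Kubernetes data would call those systems and are left out. The CHECKS list and evaluate function are illustrative names, and the predicates are simplified (for example, the owner check does not verify that the team is active).

```python
from typing import Callable

Entity = dict
Check = tuple[str, int, Callable[[Entity], bool]]


def _metadata(entity: Entity) -> dict:
    return entity.get("metadata") or {}


def _spec(entity: Entity) -> dict:
    return entity.get("spec") or {}


# (check name, points, predicate) for descriptor-only checks.
CHECKS: list[Check] = [
    ("Has owner", 10, lambda e: bool(_spec(e).get("owner"))),
    ("Has description", 5,
     lambda e: len(_metadata(e).get("description") or "") > 20),
    ("TechDocs exist", 10,
     lambda e: "backstage.io/techdocs-ref" in (_metadata(e).get("annotations") or {})),
    ("API spec defined", 10, lambda e: bool(_spec(e).get("providesApis"))),
    ("Dependency tracking", 10, lambda e: bool(_spec(e).get("dependsOn"))),
]


def evaluate(entity: Entity) -> tuple[int, list[str]]:
    """Return the points earned and the names of failed checks."""
    score, failed = 0, []
    for name, points, passes in CHECKS:
        if passes(entity):
            score += points
        else:
            failed.append(name)
    return score, failed


if __name__ == "__main__":
    order_service = {
        "metadata": {
            "name": "order-service",
            "description": "Handles order creation, updates, and fulfillment tracking",
            "annotations": {"backstage.io/techdocs-ref": "dir:."},
        },
        "spec": {"owner": "team-commerce", "providesApis": ["order-api"],
                 "dependsOn": ["resource:orders-db"]},
    }
    print(evaluate(order_service))
```

Returning the failed check names alongside the score gives teams a concrete to-do list rather than just a number.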

Maturity levels derived from scores:

Bronze (0-40 points): Basic registration only. Missing critical production readiness criteria.
Silver (41-70 points): Has ownership, documentation, and CI. Missing some operational maturity.
Gold (71-90 points): Production-ready. Has on-call, monitoring, and API definitions.
Platinum (91-100 points): Fully mature. All checks pass.
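The band boundaries above translate directly into a lookup. This is a small sketch; maturity_level is a hypothetical helper name, and it assumes integer scores on the 0 to 100 scale from the example table.

```python
def maturity_level(score: int) -> str:
    """Map a 0-100 scorecard total to its maturity band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 91:
        return "Platinum"
    if score >= 71:
        return "Gold"
    if score >= 41:
        return "Silver"
    return "Bronze"


if __name__ == "__main__":
    for score in (40, 41, 70, 71, 90, 91):
        print(score, maturity_level(score))
```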

Display maturity levels prominently in the catalog UI. Making the data visible tends to motivate teams to improve their scores without a mandate.

Ownership Enforcement

Ownership is the single most important catalog field. Without clear ownership, incidents escalate slowly, tech debt accumulates invisibly, and services become orphans.

Enforcement strategies:

Block unowned entities: A CI check on catalog-info.yaml changes rejects any entity where spec.owner is empty or refers to a non-existent group. This is a hard gate.

Orphan detection: Weekly report listing entities whose owning team has been dissolved or has zero members. These need re-assignment.

Ownership transfer: When a team is reorganized, run a script identifying all entities owned by the old team and open PRs to transfer ownership. This is not optional.
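The first two strategies can be sketched in a few lines of Python. The code assumes entities are already parsed into dicts and that the set of existing groups and their member counts come from your identity provider or the catalog; owner_group, check_owner, and find_orphans are hypothetical helper names, not Backstage APIs.

```python
from typing import Optional


def owner_group(entity: dict) -> Optional[str]:
    """Return the owning group's name, or None if unset or not a Group."""
    owner = ((entity.get("spec") or {}).get("owner") or "").strip()
    if not owner:
        return None
    kind, sep, rest = owner.partition(":")
    if not sep:
        # A bare owner such as 'team-commerce' is treated as a group.
        kind, rest = "group", owner
    if kind.lower() != "group":
        return None
    # Drop an optional 'namespace/' prefix.
    return rest.rpartition("/")[2] or None


def check_owner(entity: dict, known_groups: set[str]) -> Optional[str]:
    """CI gate: return an error message, or None if ownership is valid."""
    name = (entity.get("metadata") or {}).get("name", "<unnamed>")
    group = owner_group(entity)
    if group is None:
        return f"{name}: spec.owner must be set to a Group"
    if group not in known_groups:
        return f"{name}: owner group '{group}' does not exist"
    return None


def find_orphans(entities: list[dict], member_counts: dict[str, int]) -> list[str]:
    """Weekly report: entities whose owning group is missing or empty."""
    orphans = []
    for entity in entities:
        group = owner_group(entity)
        if group is None or member_counts.get(group, 0) == 0:
            orphans.append(entity["metadata"]["name"])
    return sorted(orphans)


if __name__ == "__main__":
    svc = {"metadata": {"name": "order-service"}, "spec": {"owner": "team-commerce"}}
    print(check_owner(svc, {"team-commerce"}))
    print(find_orphans([svc], {"team-commerce": 0}))
```

In CI, a non-None result from check_owner would fail the build; the orphan report can reuse the same owner parsing so both checks agree on what counts as a valid owner.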

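Ownership validation in Backstage: the built-in kind validation already rejects Component, API, Resource, and System entities that have no spec.owner, and catalog.rules restricts which kinds may be ingested. Checking that the owner resolves to an existing Group requires a custom catalog processor registered in the backend. The processors block below is illustrative only; owner-validator is a hypothetical custom processor, not a Backstage built-in:

catalog:
  rules:
    # Built-in setting: entity kinds the catalog is allowed to ingest
    - allow: [Component, API, Resource, System]
  # Hypothetical configuration read by a custom processor (not part of Backstage core)
  processors:
    - type: owner-validator
      config:
        requireOwner: true
        allowedOwnerKinds: [Group]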

Tech Debt Visibility

The catalog surfaces tech debt where engineers already look. Concrete approaches:

Deprecation tracking: Set spec.lifecycle: deprecated on components that should be migrated away from. Track how many services still consume deprecated APIs.

Dependency age: Annotate components with framework/runtime versions. “47 services on Go 1.20 (EOL), 12 on Go 1.22 (current)” makes tech debt visible at the portfolio level.

Security findings: Integrate Snyk, Trivy, or Dependabot results into catalog entity pages alongside deployment status and on-call info.

Custom tech debt tags: Let teams tag entities with labels (needs-migration, legacy-auth, manual-deploy). Aggregate into a dashboard by team and severity to create organizational pressure without mandating timelines.
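The deprecation and dependency-age reports reduce to simple aggregations over catalog entities. The sketch below assumes entities are already loaded as dicts and that runtime versions are recorded under a hypothetical annotation key, example.com/go-version; substitute whatever key your organization standardizes on.

```python
from collections import Counter

# Hypothetical annotation key; not a Backstage or plugin standard.
RUNTIME_ANNOTATION = "example.com/go-version"


def runtime_versions(entities: list[dict]) -> Counter:
    """Count components per annotated runtime version."""
    counts: Counter = Counter()
    for entity in entities:
        if entity.get("kind") != "Component":
            continue
        annotations = (entity.get("metadata") or {}).get("annotations") or {}
        counts[annotations.get(RUNTIME_ANNOTATION, "unknown")] += 1
    return counts


def deprecated_api_consumers(entities: list[dict]) -> dict[str, list[str]]:
    """Map each deprecated API to the entities that still consume it."""
    consumers: dict[str, list[str]] = {
        e["metadata"]["name"]: []
        for e in entities
        if e.get("kind") == "API"
        and (e.get("spec") or {}).get("lifecycle") == "deprecated"
    }
    for entity in entities:
        for ref in (entity.get("spec") or {}).get("consumesApis") or []:
            # Reduce 'api:default/payment-api' to 'payment-api'.
            api_name = ref.rpartition(":")[2].rpartition("/")[2]
            if api_name in consumers:
                consumers[api_name].append(entity["metadata"]["name"])
    return {api: sorted(names) for api, names in consumers.items()}


if __name__ == "__main__":
    catalog = [
        {"kind": "API", "metadata": {"name": "payment-api"},
         "spec": {"lifecycle": "deprecated"}},
        {"kind": "Component",
         "metadata": {"name": "order-service",
                      "annotations": {RUNTIME_ANNOTATION: "1.22"}},
         "spec": {"consumesApis": ["payment-api"]}},
    ]
    print(runtime_versions(catalog))
    print(deprecated_api_consumers(catalog))
```

Matching APIs by name alone ignores namespaces, which is adequate when everything lives in the default namespace but would need refining otherwise.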