## The Measurement Problem
Measuring developer experience wrong is worse than not measuring at all. Lines of code, commit counts, and story points per sprint all create perverse incentives — developers game what gets measured. Good metrics measure outcomes (how fast does code reach production?) and perceptions (do developers feel productive?) without punishing individuals.
The goal is to identify systemic friction in tools, processes, and the platform. Never to evaluate individual developers.
## DORA Metrics
The DORA (DevOps Research and Assessment) metrics are the most widely adopted engineering performance indicators, emerging from research into what separates high-performing organizations from the rest.
Deployment Frequency. How often you deploy to production. High performers deploy multiple times per day. This reflects batch size — smaller deployments carry less risk. Count production deployment events from your CI/CD system, excluding config-only deploys.
Lead Time for Changes. Time from code commit to running in production. Captures review wait time, CI build time, staging validation, and deployment automation. For each deployment, find the oldest new commit and measure the gap.
Mean Time to Recover (MTTR). Time from incident detection to service restoration — not root cause resolution. Measure from alert firing to SLO recovery using your incident management system.
Change Failure Rate. Percentage of deployments causing degradation — rollbacks, hotfixes, or incidents. Divide failed deployments by total deployments. Tag incidents with the causal deployment.
## Collecting DORA in Practice
The key is emitting structured deployment events. Every deployment should produce a record:
```json
{
  "service": "orders-api",
  "environment": "production",
  "deployed_at": "2026-02-22T14:30:00Z",
  "first_commit_at": "2026-02-21T09:15:00Z",
  "deployer": "ci-pipeline",
  "commit_sha": "abc123",
  "caused_incident": false,
  "rolled_back": false
}
```

Emit this from your CI/CD pipeline. GitHub Actions example:
```yaml
- name: Record deployment
  # Requires actions/checkout with fetch-depth: 0 so tags and full history are available
  run: |
    # Oldest commit since the last tag approximates the first new commit in this release
    FIRST_COMMIT=$(git log --format=%aI $(git describe --tags --abbrev=0)..HEAD | tail -1)
    curl -X POST https://metrics.internal/deployments \
      -H "Content-Type: application/json" \
      -d '{
        "service": "${{ github.event.repository.name }}",
        "environment": "production",
        "deployed_at": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'",
        "first_commit_at": "'$FIRST_COMMIT'",
        "commit_sha": "${{ github.sha }}"
      }'
```

Store in a time-series database (Prometheus, InfluxDB) or a simple PostgreSQL table. Compute metrics with straightforward queries: deployment frequency is deploys per day, lead time is the average gap between first commit and deploy timestamp, and change failure rate is failed deploys over total deploys.
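If the events land in PostgreSQL, those queries are short. A minimal sketch, assuming the records above are stored in a table named `deployments` (the table name is an assumption; the columns mirror the JSON record):

```sql
-- Deployment frequency: production deploys per day over the last 30 days
SELECT count(*) / 30.0 AS deploys_per_day
FROM deployments
WHERE environment = 'production'
  AND deployed_at > now() - interval '30 days';

-- Lead time for changes: average gap between first commit and deployment
SELECT avg(deployed_at - first_commit_at) AS avg_lead_time
FROM deployments
WHERE environment = 'production'
  AND deployed_at > now() - interval '30 days';

-- Change failure rate: deploys that caused an incident or were rolled back, over total deploys
SELECT avg(CASE WHEN caused_incident OR rolled_back THEN 1.0 ELSE 0.0 END) AS change_failure_rate
FROM deployments
WHERE environment = 'production'
  AND deployed_at > now() - interval '30 days';
```

The 30-day window is arbitrary; use whatever reporting period smooths out release-cadence noise.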
## The SPACE Framework
DORA captures CI/CD pipeline efficiency but misses the broader developer experience. SPACE, developed by researchers at GitHub, Microsoft, and the University of Victoria, provides five dimensions.
Satisfaction and well-being. How developers feel about their work, tools, and team. Measured through surveys. Most strongly correlated with retention. Most often skipped.
Performance. Outcomes of development work — reliability, quality, customer impact. DORA metrics live here. Code review quality, defect density, and SLO adherence also contribute.
Activity. Observable actions: commits, PRs, reviews, deployments. Easy to collect, easy to misuse. Never compare individuals. Useful for spotting trends — a sudden drop in PR throughput across a team suggests a systemic issue.
Communication and collaboration. How effectively developers share knowledge and coordinate. Measured through review turnaround time, documentation freshness, and survey questions about knowledge sharing.
Efficiency and flow. How often developers achieve uninterrupted focus time. Measured through calendar analysis, context switch frequency, and self-reported flow state.
Use at least three SPACE dimensions with at least one metric from each. A team with high activity but low satisfaction is burning out. A team with high satisfaction but low performance is comfortable but not delivering.
## Developer Satisfaction Surveys
Quantitative metrics miss subjective experience. Surveys capture frustration with specific tools and friction that developers have normalized.
Run surveys quarterly — monthly causes fatigue, annually lets problems fester. Keep them under 15 questions. Use a consistent 1-5 scale with one or two open-ended questions:
- “I can get my code to production without unnecessary delays” (Efficiency)
- “I have the tools I need to do my job effectively” (Satisfaction)
- “I receive helpful feedback on code reviews within one business day” (Collaboration)
- “I understand how to deploy and operate my services” (Cognitive load)
- Open-ended: “What is the single biggest thing slowing you down right now?”
Track scores over time, not absolute values. Segment by team, tenure, and role — aggregated averages hide real friction. A 3.5 satisfaction score that was 4.2 last quarter is a signal, not a passing grade.
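A sketch of that segmentation, assuming responses are stored in a hypothetical `survey_responses` table with `team`, `question`, `score`, and `submitted_at` columns (all names are assumptions):

```sql
-- Average score per team, question, and quarter, so quarter-over-quarter drops stand out
SELECT
  team,
  date_trunc('quarter', submitted_at) AS quarter,
  question,
  round(avg(score), 2) AS avg_score,
  count(*) AS responses
FROM survey_responses
GROUP BY team, date_trunc('quarter', submitted_at), question
ORDER BY team, question, quarter;
```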
## Time-to-First-Deploy
The single best measure of platform effectiveness for new hires. Define it as calendar time from a developer’s first day to their first code change reaching production.
Track by recording the first production deployment attributed to each new developer. High-performing organizations achieve 1-3 days. If yours is measured in weeks, the gap is usually one of: environment setup, access provisioning, documentation quality, or build system complexity.
Break it down further:
| Phase | Target | Common Blocker |
|---|---|---|
| Laptop setup | < 2 hours | Manual tool installation |
| Repo access | < 1 hour | Approval workflows |
| First local build | < 30 min | Missing dependencies, outdated docs |
| First PR merged | < 1 day | Long review queues |
| First production deploy | < 3 days | Complex deployment process |
Automate measurement by correlating HR start dates with first commit and first deployment timestamps from your CI/CD system.
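One way to sketch that correlation, assuming HR start dates are synced into a hypothetical `developer_start_dates` table and each deployment event is enriched with a `commit_author` field (neither exists in the event record shown earlier; both are assumptions):

```sql
-- Calendar days from first day to first production deploy, per new hire
SELECT
  s.developer,
  s.start_date,
  min(d.deployed_at)::date - s.start_date AS days_to_first_deploy
FROM developer_start_dates s
JOIN deployments d
  ON d.commit_author = s.developer
 AND d.environment = 'production'
 AND d.deployed_at >= s.start_date
GROUP BY s.developer, s.start_date;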
## Cognitive Load Measurement
Cognitive load is the mental effort required to complete a task. Three types matter:
Intrinsic load is complexity inherent to the problem — you cannot reduce this. Extraneous load is complexity from poor tooling and unclear processes — the platform should eliminate this. Germane load is effort building useful mental models — this is productive learning.
Measure through:
- Task timing: How long do common tasks take? Time a “create a new service” workflow end to end.
- Context switches per task: How many tools does one workflow touch? Count the distinct systems a developer interacts with to go from code to production (see the sketch after this list).
- Survey questions: “How easy is it to understand our deployment pipeline?” (1-5 scale)
- Tool inventory audit: List every tool a developer must use in a typical week. More than 10 distinct tools is a red flag.
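For the context-switch count, a sketch assuming tool interactions are already logged to a hypothetical `tool_events` table keyed by a change identifier (table and column names are assumptions):

```sql
-- Distinct systems a developer touches to take one change from code to production
SELECT
  change_id,
  count(DISTINCT tool) AS systems_touched
FROM tool_events
GROUP BY change_id
ORDER BY systems_touched DESC;
```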
Reduce extraneous load by consolidating tools, standardizing processes, and writing documentation that explains why, not just how.
## Platform Adoption Rates
When the platform team ships a new capability, adoption rate tells you whether it solves a real problem.
Track these metrics monthly (a query sketch follows the list):
- Template usage: Services created from scaffolder templates vs manually.
- Golden pipeline adoption: Services using the standard CI/CD pipeline vs custom.
- Self-service utilization: Infrastructure requests through the portal vs tickets.
- Documentation engagement: TechDocs page views, search hit rates, time on page.
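If the service catalog is queryable, the ratios are one query away. A sketch, assuming a hypothetical `services` table with boolean flags describing how each service was created and deployed:

```sql
-- Share of services created from templates and running the standard pipeline
SELECT
  avg(CASE WHEN created_from_template THEN 1.0 ELSE 0.0 END) AS template_adoption,
  avg(CASE WHEN uses_golden_pipeline THEN 1.0 ELSE 0.0 END) AS golden_pipeline_adoption
FROM services;
```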
Low adoption does not necessarily mean the tool is bad — it might mean discovery is poor or onboarding friction is too high. Investigate before iterating. Talk to the teams that are not adopting. Their reasons are your roadmap.
## Feedback Loops
Metrics without feedback loops are dashboards nobody looks at. Build three loops:
Weekly platform review. The platform team reviews adoption metrics, new survey comments, and top Slack questions. Takes 30 minutes. Output: one or two specific improvements to prioritize.
Monthly engineering review. Share DORA trends and adoption metrics with engineering leadership. Not to judge teams — to identify systemic patterns. If lead time is increasing across all teams, the platform is not keeping up with complexity.
Quarterly developer experience report. Combine survey results, DORA metrics, adoption rates, and time-to-first-deploy into a single report. Trend over time. Highlight what improved, what regressed, and what the platform team is doing about it.
The most important feedback mechanism is informal: platform engineers should spend one day per month pairing with application developers. No amount of metrics replaces watching someone struggle with your platform.
## Measurement Tooling
Sleuth. SaaS tool focused on DORA metrics and deploy tracking. Integrates with GitHub, GitLab, CI/CD systems, and incident tools. Automatically correlates deployments with incidents. Strengths: fast setup, good change failure rate tracking. Limitation: focused on DORA, not broader developer experience.
LinearB. Covers engineering metrics more broadly — PR cycle time, review time, planning accuracy, investment allocation (what percentage of work is features vs bugs vs tech debt). Integrates with GitHub, Jira, GitLab. Strengths: work allocation visibility, team-level dashboards. Limitation: can feel surveillance-oriented if not positioned carefully.
DX. Founded by Abi Noda, DX focuses on the developer experience side: survey-based measurement using research-backed questions aligned with the SPACE framework. Provides benchmarks against industry peers. Strengths: scientifically validated survey instrument, actionable recommendations. Limitation: survey-based metrics require sustained organizational commitment.
Backstage DORA plugin. Open-source option that embeds DORA dashboards directly in your developer portal. Requires feeding deployment and incident data into Backstage. Strengths: lives where developers already are. Limitation: requires Backstage and data pipeline setup.
DIY with Prometheus + Grafana. Emit deployment events as Prometheus metrics, build Grafana dashboards. Fully customizable, no vendor dependency, but requires building and maintaining the collection pipeline yourself.
## Practical Measurement Plan
Start with what you can measure today and expand:
| Phase | Metrics | Source | Effort |
|---|---|---|---|
| Month 1 | Deployment frequency, lead time | CI/CD events | Low |
| Month 2 | Change failure rate, MTTR | Incident + deploy correlation | Medium |
| Month 3 | Developer satisfaction survey | Quarterly survey launch | Low |
| Month 4 | Time-to-first-deploy | CI/CD + HR data | Medium |
| Month 6 | Adoption rates, cognitive load | Platform telemetry + audit | Medium |
Begin with DORA — the data is usually already available in your CI/CD system. Add a quarterly survey. Layer in adoption and cognitive load metrics as the platform matures and you need more targeted signals about where to invest next.