The Action Item Problem
Post-mortem reviews produce action items. Teams agree on what needs to change. Then weeks pass, priorities shift, and items quietly decay into a backlog nobody checks. The next incident hits the same root cause, and the post-mortem produces the same action items again.
Recurring incidents often follow the same pattern: the root cause was identified in a previous post-mortem, and the corresponding action item was never completed. Action item tracking is the mechanism by which incidents make systems more reliable instead of just more documented.
Categorization: Prevent, Detect, Mitigate
Every action item falls into one of three categories based on where in the incident lifecycle it operates.
Prevent
Actions that stop this class of incident from happening. These address root causes: add input validation, implement circuit breakers, fix race conditions, add connection pool limits. Prevention actions are highest effort but highest value – they eliminate entire categories of incidents.
Detect
Actions that improve detection speed and accuracy. Examples: add an alert for connection pool utilization above 80%, create a synthetic check for the payment flow, add structured logging for auth failures. Detection actions have moderate effort and high value – faster detection directly reduces incident duration.
Mitigate
Actions that reduce impact or speed recovery. Examples: write a failover runbook, implement auto-scaling triggers, pre-stage rollback procedures, add feature flags. Mitigation actions are usually lowest effort and provide immediate value for the next occurrence.
Every significant post-mortem should produce at least one action in each category. If the review only generated prevention items, ask: “If this happens again before the fix is deployed, how do we detect it faster? How do we mitigate it faster?”
Prioritization
Score each action item on two dimensions:
Impact (how much risk reduction):
| Impact | Meaning |
|---|---|
| 4 - Critical | Eliminates root cause or prevents SEV-1 recurrence |
| 3 - High | Significantly reduces likelihood or severity |
| 2 - Medium | Meaningful but incremental improvement |
| 1 - Low | Marginal risk reduction |
Effort (how much work):
| Effort | Meaning |
|---|---|
| 1 - Low | Less than 1 day, single team |
| 2 - Medium | 1-5 days, single team |
| 3 - High | 1-3 weeks, cross-team coordination |
| 4 - Very High | Multiple weeks, significant engineering effort |
Priority = Impact / Effort:
- P0 (Do Now): ratio >= 3.0, or any Impact=4 item
- P1 (This Week): ratio >= 1.5
- P2 (This Month): ratio >= 0.75
- P3 (Backlog): ratio < 0.75
Overrides: Any SEV-1 action item with Impact >= 3 is automatically P0. Any detection improvement that would have halved incident duration is at least P1. If the same item appears in two post-mortems, bump it up one priority level.
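The scoring rules and overrides can be expressed as a small function. This is a minimal sketch: the parameter names `sev`, `detection_halves_duration`, and `repeat` are hypothetical inputs for the three overrides, and the order in which the overrides are applied is an assumption, not something the rules above specify.

```python
# Sketch of the impact/effort scoring rules. Override parameter names are hypothetical.
LEVELS = ["P0", "P1", "P2", "P3"]


def base_priority(impact: int, effort: int) -> str:
    """Map 1-4 impact and effort scores to a priority via Impact / Effort."""
    if impact == 4:
        return "P0"  # any Impact=4 item is P0
    ratio = impact / effort
    if ratio >= 3.0:
        return "P0"
    if ratio >= 1.5:
        return "P1"
    if ratio >= 0.75:
        return "P2"
    return "P3"


def priority(impact: int, effort: int, sev: int = 3,
             detection_halves_duration: bool = False,
             repeat: bool = False) -> str:
    """Apply the overrides on top of the base ratio (override order is assumed)."""
    level = LEVELS.index(base_priority(impact, effort))
    if sev == 1 and impact >= 3:
        level = 0  # SEV-1 item with Impact >= 3: automatically P0
    if detection_halves_duration:
        level = min(level, 1)  # detection fix that would halve duration: at least P1
    if repeat:
        level = max(level - 1, 0)  # appeared in two post-mortems: bump one level
    return LEVELS[level]
```

With default arguments (a non-SEV-1 `sev`), `priority` reproduces the Priority column of the payment example below; passing `sev=1` applies the SEV-1 override as well.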
Example
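Post-mortem: Payment service outage (SEV-1, 2 hours). The Priority column applies the Impact / Effort ratio rules only; because this outage was a SEV-1, the override above would also raise the runbook and the refactor (both Impact 3) to P0.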
| Action | Category | Impact | Effort | Priority |
|---|---|---|---|---|
| Add circuit breaker to payment service | Prevent | 4 | 2 | P0 |
| Alert on payment error rate > 0.5% | Detect | 3 | 1 | P0 |
| Write payment failover runbook | Mitigate | 3 | 2 | P1 |
| Structured logging for retries | Detect | 2 | 1 | P1 |
| Refactor error handling | Prevent | 3 | 4 | P2 |

Ownership Assignment
Every action item needs a single owner. “The team” does not own action items – individuals do.
Rules:
- Assign to the team that owns the affected system.
- Assign to an individual within that team, not a team name.
- Assign at the post-mortem meeting. Do not leave with unassigned items.
- Respect capacity. Do not assign 10 items to one person.
```yaml
action_item:
  id: "PM-2026-0051-03"
  post_mortem: "PM-2026-0051"
  title: "Add circuit breaker to payment service"
  category: "prevent"
  priority: "P0"
  owner: "jane.doe@company.com"
  team: "payments"
  due_date: "2026-03-01"
  status: "in_progress"
  tracking_ticket: "PAY-1234"
```
The tracking ticket rule: Every action item must have a corresponding ticket in the owning team's project tracker. The post-mortem links to the ticket. The ticket links back to the post-mortem. This bidirectional linking ensures the item is visible in the team's normal workflow and does not live only in a document nobody revisits.
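A tracker integration could reject items that break the ownership rules before they are saved. The sketch below mirrors the YAML fields as a Python dataclass. Treating an `@` in `owner` as evidence of an individual owner, the `"done"` status value, and the limit of three open items per owner are illustrative assumptions, not part of the rules above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

MAX_OPEN_ITEMS_PER_OWNER = 3  # assumed capacity limit; tune per team


@dataclass
class ActionItem:
    id: str
    post_mortem: str
    title: str
    category: str
    priority: str
    owner: str
    team: str
    due_date: date
    status: str
    tracking_ticket: str


def ownership_problems(item: ActionItem) -> list:
    """Return the ownership-rule violations for one item (empty list means OK)."""
    problems = []
    if "@" not in item.owner:  # heuristic: an email address means an individual
        problems.append("owner must be an individual, not a team name")
    if item.category not in {"prevent", "detect", "mitigate"}:
        problems.append(f"unknown category {item.category!r}")
    if not item.tracking_ticket:
        problems.append("missing tracking ticket")
    return problems


def overloaded_owners(items: list) -> list:
    """Owners holding more open items than the assumed capacity limit."""
    counts = Counter(i.owner for i in items if i.status != "done")
    return sorted(o for o, n in counts.items() if n > MAX_OPEN_ITEMS_PER_OWNER)
```

Running `ownership_problems` before the post-mortem meeting ends is one way to catch team-owned or ticketless items while everyone is still in the room.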
Follow-Up Cadence
Without regular follow-up, action items decay.
| Priority | Check-In Frequency | Escalation Trigger |
|---|---|---|
| P0 | Daily standup mention | Not started after 2 days |
| P1 | Weekly check-in | No progress after 1 week |
| P2 | Bi-weekly check-in | No progress after 2 weeks |
| P3 | Monthly review | No progress after 1 month |
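The escalation triggers translate into a per-priority day count. This sketch assumes each item records a `last_progress` date, reads "1 month" as 30 days, and treats the P0 "not started" trigger as "no progress"; all three are assumptions.

```python
from datetime import date

# Escalation triggers from the table, in days without progress.
# Assumptions: "1 month" = 30 days; P0 "not started" is treated as "no progress".
ESCALATE_AFTER_DAYS = {"P0": 2, "P1": 7, "P2": 14, "P3": 30}


def needs_escalation(priority: str, last_progress: date, today: date) -> bool:
    """True once an item has gone its priority's full window without progress."""
    return (today - last_progress).days >= ESCALATE_AFTER_DAYS[priority]
```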
Weekly Reliability Review
A standing 30-minute meeting to review all open action items:
- New items from this week’s incidents (5 min)
- P0 status updates from each owner (10 min)
- P1 status updates (5 min)
- Overdue items: blockers, reassignment (5 min)
- Completion rate metrics (5 min)
Automated Reminders
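```python
from datetime import date

# Assumed days-overdue thresholds, loosely based on the follow-up cadence table.
ESCALATION_DAYS = {"P0": 2, "P1": 7, "P2": 14, "P3": 30}

def check_overdue_items():
    # get_open_action_items() and send_slack_dm() are assumed tracker/Slack helpers.
    today = date.today()
    for item in get_open_action_items():
        if item.due_date < today:
            days_overdue = (today - item.due_date).days
            send_slack_dm(item.owner,
                          f"Action item {item.id} is {days_overdue} days overdue: "
                          f"{item.title} | Priority: {item.priority}")
            if days_overdue > ESCALATION_DAYS[item.priority]:
                send_slack_dm(item.team_lead,
                              f"Escalation: {item.id} is {days_overdue} days overdue.")
```
Measuring Completion Rates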
Completion rate: Items completed on time / total items due. Target 85%+. Below 70% indicates systemic problems.
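The completion-rate bands can be checked mechanically. In this sketch the band names are hypothetical labels, and returning 1.0 when nothing is due is an assumed convention.

```python
def completion_rate(completed_on_time: int, total_due: int) -> float:
    """Items completed on time divided by total items due."""
    if total_due == 0:
        return 1.0  # assumed convention: nothing due counts as on track
    return completed_on_time / total_due


def completion_health(rate: float) -> str:
    """Classify a completion rate against the 85% target and 70% floor."""
    if rate >= 0.85:
        return "on target"
    if rate >= 0.70:
        return "below target"
    return "systemic problem"
```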
Time to completion by priority:
| Priority | Target | Acceptable |
|---|---|---|
| P0 | 7 days | 14 days |
| P1 | 14 days | 30 days |
| P2 | 30 days | 60 days |
| P3 | 60 days | 90 days |
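To compare actual completion times against this table, one option is to aggregate completed items per priority and grade the result against the Target and Acceptable columns. Using the median as the aggregate is an assumption of this sketch; the table does not prescribe one.

```python
from statistics import median

# (target, acceptable) days to completion per priority, from the table above.
COMPLETION_DAYS = {"P0": (7, 14), "P1": (14, 30), "P2": (30, 60), "P3": (60, 90)}


def completion_grade(priority: str, days: float) -> str:
    """Grade a completion time against the target and acceptable columns."""
    target, acceptable = COMPLETION_DAYS[priority]
    if days <= target:
        return "target"
    if days <= acceptable:
        return "acceptable"
    return "too slow"


def grade_by_priority(records) -> dict:
    """records: iterable of (priority, days_to_complete) pairs for completed items."""
    by_priority = {}
    for priority, days in records:
        by_priority.setdefault(priority, []).append(days)
    # The median is an assumed aggregate; the table does not prescribe one.
    return {p: completion_grade(p, median(d)) for p, d in by_priority.items()}
```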
Recurrence rate: The percentage of incidents whose root cause was already identified in a previous post-mortem that still has an open action item. Above 10% means the organization writes post-mortems but does not learn from them.
Decay rate: The share of open items that are more than 30 days past due without progress.
Track these in a dashboard. Trends matter more than absolute numbers.
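Recurrence and decay can be computed from tracker exports. This sketch assumes each incident record carries a boolean `root_cause_has_open_item` flag (set during the post-mortem) and each item carries `due_date` and `last_progress` fields. It also reads "without progress" as "no progress recorded since the due date" and reports decay as a share of open items. These field names and interpretations are hypothetical.

```python
from datetime import date


def recurrence_rate(incidents: list) -> float:
    """Share of incidents whose root cause already had an open action item.

    Assumes each incident dict has a boolean 'root_cause_has_open_item' field.
    """
    if not incidents:
        return 0.0
    hits = sum(1 for i in incidents if i["root_cause_has_open_item"])
    return hits / len(incidents)


def decayed_items(open_items: list, today: date, days: int = 30) -> list:
    """Items more than `days` past due with no progress recorded since the due date."""
    return [
        i for i in open_items
        if (today - i["due_date"]).days > days
        and (i["last_progress"] is None or i["last_progress"] <= i["due_date"])
    ]


def decay_rate(open_items: list, today: date, days: int = 30) -> float:
    """Decayed items as a share of all open items (an assumed denominator)."""
    if not open_items:
        return 0.0
    return len(decayed_items(open_items, today, days)) / len(open_items)
```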
Preventing Action Item Decay
Set realistic due dates. Over-ambitious dates normalize overdue status.
Limit items per post-mortem. Aim for 3-7 focused items. If the review produces more, prioritize ruthlessly and explicitly defer the rest.
Close items that will not be done. If an item has been deprioritized repeatedly, close it with a documented decision: “Accepted risk: choosing not to implement X because Y.”
Quarterly cleanup. Review all items older than 90 days. For each: complete it, reprioritize it, or close it with documentation.
Celebrate completions. Acknowledge significant completions to reinforce that the work matters.
Track recurrences visibly. When an incident recurs because an action item was not completed, note it in the post-mortem.
Make it a leadership metric. If leadership reviews completion rates alongside velocity and uptime, teams allocate time for it.
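The quarterly cleanup and the rule about closing items with a documented decision can both be enforced in tooling. In this minimal sketch, the `created`, `status`, and `closure_reason` fields and the `"done"` and `"closed_accepted_risk"` status values are hypothetical names.

```python
from datetime import date

# Assumed status values that count as closed.
CLOSED_STATUSES = {"done", "closed_accepted_risk"}


def cleanup_candidates(items: list, today: date, max_age_days: int = 90) -> list:
    """Open items older than `max_age_days`: complete, reprioritize, or close each one."""
    return [
        i for i in items
        if i["status"] not in CLOSED_STATUSES
        and (today - i["created"]).days > max_age_days
    ]


def close_without_completion(item: dict, reason: str) -> dict:
    """Close an item as accepted risk, refusing to do so without a documented reason."""
    if not reason.strip():
        raise ValueError("closing an item without completion requires a documented reason")
    return {**item, "status": "closed_accepted_risk", "closure_reason": reason}
```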
Step-by-Step Tracking Process
- Post-mortem concludes. Facilitator has a list of proposed action items.
- Categorize each as prevent, detect, or mitigate. Verify coverage across all three.
- Score impact and effort. Calculate priority using the matrix.
- Assign an individual owner. Confirm acceptance.
- Set a due date based on priority targets.
- Create a tracking ticket in the owning team’s tracker. Link bidirectionally to the post-mortem.
- Add to reliability review based on priority and cadence.
- Follow up on cadence. Check status. Surface blockers.
- On completion, update the ticket, post-mortem document, and dashboard.
- If overdue, escalate per triggers. Reassign if needed. Adjust due date only with documented justification.
- Quarterly review. Audit all open items. Close zombies. Report metrics to leadership.
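The due-date steps above can be sketched as two helpers: one derives a default due date from the Target column of the time-to-completion table, and the other refuses to move a due date without a written justification and keeps a history of changes. The `due_date_changes` field and its record shape are hypothetical.

```python
from datetime import date, timedelta

# Target days to completion per priority (Target column of the time-to-completion table).
TARGET_DAYS = {"P0": 7, "P1": 14, "P2": 30, "P3": 60}


def default_due_date(priority: str, assigned_on: date) -> date:
    """Due date = assignment date + the priority's target completion time."""
    return assigned_on + timedelta(days=TARGET_DAYS[priority])


def change_due_date(item: dict, new_due: date, justification: str) -> dict:
    """Move a due date only with a documented justification; keep the history."""
    if not justification.strip():
        raise ValueError("due date changes require a documented justification")
    history = item.get("due_date_changes", []) + [
        {"from": item["due_date"], "to": new_due, "why": justification}
    ]
    return {**item, "due_date": new_due, "due_date_changes": history}
```

Returning a new dict rather than mutating the item keeps the original record intact, which makes it easy to diff before writing back to the tracker.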
Agent Operational Notes
- Automate reminders. An agent checking status daily and sending reminders on cadence is reliable and consistent.
- Maintain bidirectional links. Always include post-mortem references in tickets and ticket links in post-mortems.
- Surface recurrence data. When a new incident occurs, check for related open action items and flag them immediately.
- Never close items without documentation. Require a reason for every closure without completion.
- Report metrics consistently. Generate weekly/monthly completion metrics automatically to build organizational habit.
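To surface recurrence data, an agent needs some way to match a new incident to open action items. This sketch assumes incidents and items are tagged with `systems` and `root_cause_tags` lists; that tagging scheme, the `"done"` status value, and the flag message format are assumptions rather than a prescribed method.

```python
def related_open_items(incident: dict, items: list) -> list:
    """Open items sharing a system or root-cause tag with a new incident.

    The 'systems' / 'root_cause_tags' tagging scheme is an assumption.
    """
    systems = set(incident.get("systems", []))
    tags = set(incident.get("root_cause_tags", []))
    return [
        i for i in items
        if i["status"] != "done"
        and (systems & set(i.get("systems", [])) or tags & set(i.get("root_cause_tags", [])))
    ]


def recurrence_flags(incident: dict, items: list) -> list:
    """One human-readable flag per related open item, e.g. for the incident channel."""
    return [
        f"{incident['id']} may be a recurrence: {i['id']} ({i['title']}) "
        f"is still open, owner {i['owner']}"
        for i in related_open_items(incident, items)
    ]
```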