Learn from the outage, not just survive it
A post-mortem is only useful if it’s honest and actionable. The two things that break post-mortems are blame — which makes engineers hide what happened — and action items that nobody owns. This builder bakes in a blameless framing and forces every follow-up to have an owner and a date.
How it works
You enter the incident title, severity (SEV1–SEV4), and the detected and resolved timestamps. The tool calculates time-to-resolution automatically and shows it in hours and minutes. You then capture the impact, an ordered timeline of events, the root cause, contributing factors, and action items with owners and due dates. The builder assembles a complete post-mortem with a blameless-framing note, a summary section, the calculated duration, the timeline, separate root-cause and contributing-factors sections, a went-well/went-poorly reflection, the numbered action items, and a lessons-learned section.
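The time-to-resolution step is simple arithmetic on the two timestamps. The Python sketch below shows one way it could be done; it is not the builder's actual code. The function name `time_to_resolution` and the ISO 8601 input format are assumptions made for illustration.

```python
from datetime import datetime


def time_to_resolution(detected: str, resolved: str) -> str:
    """Return the gap between two ISO 8601 timestamps as 'Xh Ym'.

    Hypothetical helper: assumes both timestamps carry a UTC offset
    (or both omit it), so the subtraction is well defined.
    """
    start = datetime.fromisoformat(detected)
    end = datetime.fromisoformat(resolved)
    if end < start:
        raise ValueError("resolved timestamp is earlier than detected timestamp")
    # Work in whole minutes, then split into hours and leftover minutes.
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}h {minutes}m"


print(time_to_resolution("2026-06-12T09:15:00+00:00", "2026-06-12T11:50:00+00:00"))
# prints "2h 35m"
```

Because both inputs carry an explicit offset, an incident that crosses midnight or is recorded in two different timezones still yields the correct duration.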
Tips and example
Distinguish the root cause from the contributing factors: the root cause might be “a config change disabled connection pooling”, while contributing factors are “no alert on pool exhaustion” and “no staging soak test”. Fixing only the root cause leaves in place the gaps that let it become an outage, and those gaps will bite again.
- Keep all timeline entries in a single timezone to avoid confusion during review.
- Write action items as “Add alert on connection pool — @lee — due 2026-06-20” so ownership and timing are unambiguous.
- Run the review blamelessly — the goal is a more resilient system, not a culprit.
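The action-item format from the tips above can be enforced in code, so that an item cannot be rendered without an owner and a due date. This is a minimal sketch only: the `ActionItem` class and `render_numbered` helper are hypothetical names, not part of the builder.

```python
from dataclasses import dataclass
from datetime import date

# Dash separator matching the example format in the tips above.
SEP = " \u2014 "


@dataclass(frozen=True)
class ActionItem:
    """Hypothetical record for one follow-up; every field is required."""
    description: str
    owner: str
    due: date

    def __post_init__(self):
        # Reject items that would be unowned or undated.
        if not self.description.strip():
            raise ValueError("action item needs a description")
        if not self.owner.strip().lstrip("@"):
            raise ValueError("action item needs an owner")
        if not isinstance(self.due, date):
            raise ValueError("action item needs a due date")

    def render(self) -> str:
        # Normalise the owner to a single leading '@'.
        handle = "@" + self.owner.strip().lstrip("@")
        return SEP.join([self.description.strip(), handle, f"due {self.due.isoformat()}"])


def render_numbered(items):
    """Render action items as a numbered list, one per line."""
    return "\n".join(f"{i}. {item.render()}" for i, item in enumerate(items, start=1))


item = ActionItem("Add alert on connection pool", "lee", date(2026, 6, 20))
print(render_numbered([item]))
```

Validating in the constructor means a missing owner or date fails when the item is entered, rather than surfacing later during the review.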