What is the cannot-reproduce rate and how do I measure it?

The cannot-reproduce rate is the share of closed bug tickets resolved as 'cannot reproduce', 'works for me', or 'needs more info' rather than fixed or confirmed-not-a-bug. Compute it as that count divided by all closed bugs over a fixed window (a sprint, a month). Track it per team and per report source, because the rate is really a measure of intake quality: a triager who closes a ticket as non-reproducible is telling you the report lacked the state needed to recreate the failure. The 'Works for me!' study of Firefox and Eclipse found these reports run about 17% of all bugs and stay open roughly three months longer, so the rate is a direct proxy for wasted triage cycles. Baseline it before you change anything, then watch it fall as capture and standardization land.

Why can't developers reproduce a bug even when the reporter included steps?

Because steps record actions, not state, and the same actions pass or fail depending on the surrounding conditions. Bettenburg et al. (FSE 2008) found that steps to reproduce, stack traces, and test cases are the artifacts developers value most but reporters supply least reliably, which is the structural cause of the gap. The ICSME 2020 data-fusion study of 576 non-reproducible reports catalogued 11 separate factors, including missing information, environment differences, and ambiguous expected behavior. A click sequence that triggers a 500 on a stale feature flag, a specific API response, a narrow viewport, or a particular account will look like 'works for me' on the developer's machine. The fix is not better prose. It is attaching the console output, the exact network request, and a replay of what the user actually did, so the precondition travels with the report.

How does a session replay carry the 'missing precondition' that steps leave out?

rrweb-style replay records a full DOM snapshot plus every incremental mutation, input event, scroll, viewport resize, console line, and network call, then reconstructs the session without a video file. That means the conditions prose steps silently assume are captured as data. The viewport width that broke the layout is in the resize events. The empty cart that caused the blank checkout page is in the captured POST response. The console error that fired two seconds before the click is on the timeline. A developer or an AI agent does not re-derive the triggering state from a written description; they read it back from the recording. That is the difference between a recipe you can re-run and a description you have to reconstruct, and it is precisely the state the ICSME 2020 study found missing in non-reproducible reports.

What environment and viewport fields should a standardized bug report capture automatically?

Capture the fields that change behavior and that reporters reliably forget: browser name and version, operating system, exact viewport dimensions (width by height in CSS pixels, plus device pixel ratio), the full URL including query and hash, the logged-in account or role, the active feature flags, and the app build or release SHA. Mozilla's Bug Writing Guidelines stress minimizing steps and including any special setup, and these fields are the setup. The point of automating them is that a human typing a report omits whatever feels obvious in the moment, and 'obvious' is usually the precondition. When the extension stamps these onto every report, the triager stops asking 'what browser were you on' and the back-and-forth that inflates the cannot-reproduce rate disappears.

Can an AI coding agent reproduce a bug from a captured session, and what does it need?

An agent can act on a captured bug far more reliably than on prose, but only when the capture includes state. Steps alone force the agent to guess at the triggering conditions, which is the same failure mode that traps human triagers. When the report bundles the replay, the console, the network request, and the environment fields, reproduction becomes a deterministic read rather than a hunt. BugMojo exposes exactly that bundle to agents like Claude Code and Cursor over an MCP server, so the agent can pull the failing request and the viewport state alongside the steps, form a hypothesis, and write a failing test before patching. This is the wedge no plain issue tracker offers: machine-readable bug context, not just a text field an agent has to interpret.

Does standardizing the report and capturing the session slow reporters down?

It does the opposite, because the heavy fields are captured automatically rather than typed. The reporter clicks to record, and the extension attaches the replay, console, network, and environment data client-side; PII is redacted in the browser before anything leaves the page. The human only writes the short narrative — what they expected versus what happened at the failing step, which Mozilla's guidelines identify as the irreducible core. Bettenburg's finding is that the most valuable artifacts are the ones reporters find hardest to produce by hand, so the playbook's move is to stop asking humans to produce them by hand. Standardization here means a fixed shape plus automatic capture, not a longer form. The reporter's effort goes down while the report's reproducibility goes up.

Playbook

A Playbook for Killing Cannot-Reproduce Bug Tickets

Cannot-reproduce tickets are an intake-quality problem, not a developer problem. Here is a four-step playbook that captures the session, logs the environment, standardizes the report, and measures the rate down.

ManviJun 5, 20267 min read

Playbooks

Isometric lime line-art pipeline lifting a CANNOT REPRODUCE ticket through session-replay, environment, report, and rate-gauge checkpoints on a dark canvas

Every backlog has them: tickets closed with a shrug and the words cannot reproduce. The reporter swears it happened. The developer ran the steps and got nothing. Both are telling the truth, and that is the whole problem. A bug report written as prose records what the user did, but a failure depends on the state the user was in — the viewport, the account, the feature flag, the exact API response. When that state is missing, the same click sequence passes on the developer's machine and the ticket dies as noise.

This is not a motivation problem you fix with a sternly worded template. It is a structural mismatch in what reports carry versus what reproduction needs. Below is a playbook that treats the cannot-reproduce ticket as a measurable defect in your intake pipeline and drives its rate toward zero with four concrete moves.

How do you kill cannot-reproduce bug tickets?

Kill cannot-reproduce tickets by carrying state, not prose. Capture a session replay, auto-log the environment and viewport, standardize every report to a fixed shape, and measure the cannot-reproduce rate as a share of closed bugs. Each move attaches the precondition that written steps silently omit, turning reproduction from a guess into a deterministic read.

The evidence that this is structural rather than careless is decades old. Bettenburg and colleagues surveyed developers and reporters across Apache, Eclipse, and Mozilla and analyzed 466 responses. Their core finding: steps to reproduce, stack traces, and test cases are simultaneously the items developers rate most useful and the items reporters find hardest to provide. The most valuable signal is the scarcest signal. That gap is where cannot-reproduce tickets are born.

The cost is quantified too. The Works for me! empirical study of Firefox and Eclipse found that non-reproducible reports make up roughly 17% of all bug reports and stay active about three months longer than reproducible ones. They are not a rounding error. They are one ticket in six, sitting open far longer, consuming triage cycles that produce nothing.

The four-step playbook

Step 1 — Capture the session replay

The single highest-leverage move is to stop asking humans to transcribe what happened and record it instead. An rrweb-style recorder captures a full DOM snapshot plus every incremental mutation, mouse and scroll input, viewport and window resize, console output, and network activity — and reconstructs the session without a video file. That last detail matters: it is structured data, not pixels, so it is searchable, lightweight, and replayable at the DOM level.

This is the mechanism behind the claim that a replay carries the missing precondition. The viewport width that broke the layout lives in the resize events. The empty cart that produced a blank checkout page lives in the captured POST response. The console error that fired two seconds before the click is sitting on the timeline. The developer does not re-derive the triggering state from a paragraph — they read it back from the recording.

Step 2 — Auto-log environment and viewport

Replays carry interaction state; an explicit environment block carries the platform facts a triager would otherwise have to extract by hand. Stamp these onto every report automatically. The point of automating them is precisely that a human omits whatever feels obvious in the moment, and obvious is usually the precondition.

report.environment.jsonjson

{
  "browser":        "Chrome 126.0.6478.127",
  "os":             "Windows 11 (10.0.26200)",
  "viewport":       { "width": 1280, "height": 720, "dpr": 2 },
  "url":            "https://app.example.com/checkout?cart=empty#step-2",
  "account":        { "id": "usr_4192", "role": "trial" },
  "featureFlags":   ["new_checkout_v3", "address_autofill_off"],
  "buildSha":       "a05c0a0",
  "capturedAt":     "2026-06-05T14:22:09Z"
}

Viewport dimensions belong in CSS pixels with device pixel ratio, because a layout bug that only appears at 1280×720 on a 2× display is invisible to a developer sitting at 1920×1080. The build SHA tells you whether the reporter even saw the code you are debugging. None of this is typed by a human.

Step 3 — Standardize the report to a fixed shape

Mozilla's Bug Writing Guidelines are blunt: steps to reproduce are the most important part of any bug report, and if the steps are unclear it might not even be possible to know whether the bug has been fixed. The guidelines stress minimizing steps and stating expected versus actual behavior. Standardization is how you enforce that without nagging.

Crucially, standardizing does not mean a longer form. The heavy artifacts — replay, console, network, environment — are captured automatically. The human writes only the irreducible narrative core. Copy this template into your issue tracker and let automation fill the bracketed slots:

BUG_TEMPLATE.mdmarkdown

## Summary
<one sentence: what broke, where>

## Expected vs Actual
- Expected: <what should have happened at the failing step>
- Actual:   <what happened instead>

## Steps (human-written, minimal)
1. 
2. 
3. 

## Attached automatically — do not type by hand
- [ ] Session replay (rrweb)
- [ ] Console output
- [ ] Failing network request (method, URL, status, body)
- [ ] Environment block (browser, OS, viewport, flags, build SHA)
- [ ] PII redacted client-side before upload

Step 4 — Measure the cannot-reproduce rate

What you do not measure, you cannot drive down. Define the KPI precisely: the count of bugs closed as cannot reproduce, works for me, or needs more info, divided by all bugs closed in a fixed window. Track it per team and per report source. A triager closing a ticket as non-reproducible is signalling a defect in intake quality, so segmenting by source tells you which capture path is leaking state.

Baseline the rate before you change anything. Then watch it fall as Steps 1 through 3 land. The number is a direct proxy for wasted triage cycles, and because the Works for me! data shows these tickets stay open about three months longer, a falling rate also shortens your tail of stale bugs.

Map each root cause to the artifact that kills it

The data-fusion follow-up study (ICSME 2020) examined 576 non-reproducible reports from Firefox and Eclipse and identified 11 distinct factors that push a report to non-reproducibility, including missing information, environment differences, and ambiguous expected behavior. The reason a captured bundle works is that you can trace each cause to a specific artifact that eliminates it.

Non-reproducible bug tickets, by the numbers

Reports that are non-reproducible (%)

Distinct non-repro factors (ICSME 2020)

Bettenburg survey responses analyzed

466

Non-repro reports studied (ICSME 2020)

576

Missing information is killed by the console and network capture. Environment differences are killed by the environment block. Viewport-specific failures are killed by the recorded resize events. Ambiguous expected behavior is killed by the standardized expected-versus-actual field. The bundle is not one fix — it is four overlapping fixes aimed at the four most common causes.

Plain tracker vs. capture-first reporting

Feature	Plain issue tracker	Capture-first (BugMojo)
Reproduction state (viewport, account, flags)	Typed by hand, usually omitted	Auto-captured environment block
Session replay (DOM-level)	Not available	rrweb snapshot + incremental events
Console + failing network request	Pasted manually, if at all	Attached automatically
MCP / AI-agent-readable bug context	Plain text the agent must interpret	Machine-readable bundle over MCP
Setup friction / install footprint	Already in your stack, nothing to add	Requires a browser extension
Works with no extension installed	Yes — pure web form	No — capture needs the extension

The comparison is deliberately two-sided. A plain tracker wins on footprint: it is already in your stack and needs nothing installed, and it works for a reporter who will not add an extension. Capture-first reporting trades that friction for state. The row that decides it for AI workflows is the MCP one — an agent reading a structured bundle behaves differently from an agent parsing a paragraph.

Why the AI-agent row is the wedge

An AI coding agent hits the exact failure mode that traps human triagers: given only prose steps, it must guess the triggering conditions. Bundle the replay, the console, the failing request, and the environment, and reproduction becomes a deterministic read. BugMojo exposes that bundle to agents like Claude Code and Cursor over an MCP server, so an agent can pull the failing request and the viewport state alongside the steps, form a hypothesis, and write a failing test before it patches anything. A plain text field cannot offer that, because there is nothing structured to read.

For the structural detail on why written steps degrade, see the companion glossary entry on reproduction steps — it covers the same reporter/developer information mismatch from the artifact side rather than the measurement side.

Turn cannot-reproduce tickets into deterministic reads

See how BugMojo captures the precondition

Frequently asked questions

Sources

Bettenburg et al. — What Makes a Good Bug Report? (ACM SIGSOFT FSE 2008) — ACM SIGSOFT FSE-16 (author-hosted PDF) (2008)
Works for me! cannot reproduce — a large-scale empirical study of non-reproducible bugs — Mozilla Foundation Research Library (2022)
Why are Some Bugs Non-Reproducible? An Empirical Investigation using Data Fusion (ICSME 2020) — arXiv / IEEE ICSME (2021)
Bug Writing Guidelines — Steps to reproduce are the most important part of any bug report — Mozilla / Bugzilla (2025)
rrweb — record and replay the web (DOM snapshot + incremental mutations, input, viewport, console, network) — rrweb (open source) (2025)

Get bug-tracking insights, weekly.

Engineering deep-dives, QA playbooks, and honest tool comparisons. No spam — unsubscribe in one click.