What is the difference between a regression and a normal bug?

Every regression is a bug, but not every bug is a regression. A regression is specifically a defect in functionality that used to work and stopped working because of a recent change: a code edit, a dependency upgrade, a config tweak, or an environment shift. A normal bug can be brand-new behavior that never worked. The distinction matters because a regression has a known-good baseline, so you can often bisect commits to find the exact change that introduced it, while a net-new bug has no prior working version to compare against. ISTQB defines a regression as a degradation in the quality of a component or system due to a change.

What causes software regressions?

Regressions are caused by change, and there are four common sources. The first is a direct code edit, where a fix or feature touches shared logic and breaks an adjacent path. The second is a dependency or library upgrade that alters behavior your code silently relied on. The third is a configuration, environment, or infrastructure change (a flag, an API version, a data-shape change) that the code never accounted for. The fourth is an incorrectly resolved merge conflict, where two correct branches combine into a broken state. The common thread is that the failing code is not necessarily the code that changed: a regression frequently surfaces far from the edit that caused it, which is what makes it hard to trace from a stack trace alone.

How do you detect and prevent software regressions?

Detection and prevention rest on having a known-good baseline you can re-check after every change. The standard tools are an automated regression test suite run in CI on every commit, a smoke or sanity pass over critical paths before release, and production monitoring that watches error rates and key flows after each deploy. Prevention adds smaller batch sizes (less surface area per change), code review focused on blast radius, and feature flags so a suspected regression can be turned off without a rollback. DORA's change fail rate metric exists precisely to quantify how often deployments introduce production regressions, so teams can track whether their safeguards are working over time.

Why are AI coding assistants increasing regression risk?

AI assistants generate more code faster, and they encourage larger change batches, both of which raise regression risk if testing does not keep pace. Google's 2024 DORA research found that a 25% increase in AI adoption was associated with an estimated 7.2% decrease in delivery stability. GitClear's 2025 analysis of 211 million changed lines found code churn (lines reverted or revised within two weeks of being written) rising and copy/pasted code climbing from 8.3% to 12.3%, both leading indicators of regressions. The fix is not to abandon AI but to feed the agent better context: when an AI agent can read the failing session, console, and network state behind a regression, it can locate the breaking change instead of guessing.

Is a stack trace enough to debug a regression?

Usually not. A stack trace tells you where code failed and the call path that got there, but a regression's root cause is the change that altered behavior, which is often nowhere near the line that threw. To debug a regression you also need the state that triggered it (the props, the API response, the user action) and ideally the diff between the working and broken versions. That is why "works on my machine" regressions persist even when the trace looks clean: the trace alone does not carry the state needed to reproduce them. Pairing the trace with a session replay, the console output, and the network request that fed the bad data turns a location into a reproduction, which is what actually lets you bisect to the offending change and fix it.

Glossary

What Is a Software Regression? Causes, Detection & Prevention

A software regression is something that used to work and broke after a change. Here is what causes regressions, how to detect them against a known-good baseline, and why AI-generated code is making them more common.

Hrishikesh Baidya · Jun 5, 2026 · 6 min read

Definition

A software regression is a defect where functionality that previously worked stops working after a change — a code edit, dependency upgrade, configuration shift, or merge. It is degradation caused by change, not new untested behavior. Because a known-good baseline exists, the offending change can usually be found by bisecting.

The canonical definition comes from the testing-standards body. ISTQB defines a regression as 'a degradation in the quality of a component or system due to a change.' Read that sentence carefully: the trigger is a change, and the failure is degradation of something that already existed. That is the whole distinction. A net-new bug is behavior that never worked; a regression is behavior that worked yesterday and does not work today, which means there is a prior version to compare against.

That baseline is not a footnote — it is the single most useful property of a regression. It is why git bisect exists. If the feature passed on commit A and fails on commit Z, the breaking change is somewhere between them, and you can binary-search the range to find it. A brand-new bug gives you no such anchor. So the first question to ask of any defect is not 'where did it throw' but 'did this ever work' — the answer routes you to two completely different debugging strategies.
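The binary search described above can be sketched in a few lines. This is a toy model of the idea behind git bisect, not git's implementation: the commit list, the `passes` check, and the function name `first_bad_commit` are all hypothetical, and the sketch assumes that once the regression lands, every later commit also fails.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_good() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the breaking change is also bad.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid  # breaking change is after mid
        else:
            hi = mid  # breaking change is at or before mid
    return commits[hi]


# Hypothetical history: the regression landed in commit "e".
history = ["a", "b", "c", "d", "e", "f", "g", "h"]
checked = []


def passes(commit: str) -> bool:
    checked.append(commit)  # record which commits had to be tested
    return commit < "e"


culprit = first_bad_commit(history, passes)
print(culprit, checked)
```

With eight commits the search tests three of them rather than walking the whole range, which is why a known-good baseline is worth so much. In a real repository the `is_good` check would be your test command run at each checked-out commit.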

Why it matters

Regressions matter because they are the failure mode that scales with how fast you ship. The production-facing version even has a metric: DORA tracks change fail rate — the share of deployments that cause a failure in production requiring a hotfix, rollback, or patch — as one of its four core delivery measures. That makes the regression a benchmarkable number, not a vibe. If your change fail rate is climbing, your safeguards are losing the race against your deploy frequency.
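As a rough illustration of that metric, the share can be computed from a deployment log. The `Deployment` record and its `needed_remediation` field are assumptions made for this sketch, not a DORA-defined schema; how you decide that a deployment "caused a failure" is up to your own tracking.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Deployment:
    deploy_id: str
    needed_remediation: bool  # True if it required a hotfix, rollback, or patch


def change_fail_rate(deployments: Sequence[Deployment]) -> float:
    """Share of deployments in the window that needed remediation."""
    if not deployments:
        raise ValueError("no deployments in the window")
    failed = sum(1 for d in deployments if d.needed_remediation)
    return failed / len(deployments)


# Hypothetical window: four deployments, one of which was rolled back.
window = [
    Deployment("d1", False),
    Deployment("d2", True),
    Deployment("d3", False),
    Deployment("d4", False),
]
rate = change_fail_rate(window)
print(f"{rate:.0%}")
```

Tracking this number per week or per release is what lets you see whether it climbs as deploy frequency goes up.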

And the substrate they surface from is enormous. In its 2022 report, CISQ put the cost of poor software quality in the US at $2.41 trillion, including roughly $1.52 trillion of accumulated technical debt. Technical debt is precisely the brittle, under-tested code from which regressions repeatedly emerge whenever someone touches it.

The harder problem is that detection is a needle-in-a-haystack search: in Google's 'Taming Google-Scale Continuous Testing' study (2017), only 1.23% of test executions actually caught a real breakage, while about 84% of observed pass-to-fail transitions were flaky tests rather than genuine regressions. Most of the failures your suite reports are noise, which is why a stable baseline and reliable tests matter more than raw test count.
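One common way to separate the two cases is to re-run a newly failing test at the same commit before treating it as a regression. The sketch below assumes that heuristic; it is not the method used in the Google study, and `classify_failure`, `run_test`, and the rerun count are hypothetical names and values chosen for illustration.

```python
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Classify a test that just went from pass to fail.

    run_test re-executes the test at the SAME commit and returns True on pass.
    Any pass on an unchanged commit means the failure was not deterministic.
    """
    for _ in range(reruns):
        if run_test():
            return "flaky"
    return "likely regression"


# Deterministic stand-ins for real test runs (assumptions for the example).
def always_fails() -> bool:
    return False


outcomes = iter([False, True, False])


def sometimes_passes() -> bool:
    return next(outcomes)


print(classify_failure(always_fails))
print(classify_failure(sometimes_passes))
```

A test that keeps failing on reruns is only "likely" a regression: a test can also fail consistently because of a broken environment, so the label is a triage hint rather than a verdict.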

[Figure: an isometric build pipeline with version blocks v1, v2, and v3; a probe traces a crack in v3 back along the known-good baseline to the commit that introduced the break.] A regression has a baseline: v2 passed, v3 fails, so the breaking change lives in the diff between them. The probe traces back to the offending commit, and that backward search is what a net-new bug can never give you.

How this shows up in a real BugMojo bug report

Here is the honest limit of a stack trace on a regression, and where BugMojo fits. A trace tells you where code failed and the path that got there. But a regression's root cause is the change that altered behavior, and that change is frequently nowhere near the line that threw — a config flag, an upgraded dependency, a different API response shape. The failing code is not necessarily the code that changed. So a clean trace points you at a symptom while the cause sits three files and one deploy away.

In a BugMojo report the trace does not arrive alone. The browser extension captures the failure with its surrounding state — an rrweb session replay, the console output, and the network request that fed the bad data — so the frame at Pricing.tsx:142 sits next to the exact GET /api/plan response whose new tier field your code never handled. That is the difference between 'something broke near line 142' and 'the plan endpoint started returning a shape this branch did not account for.' The state is what tells you a change caused it.
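To make the "new field your code never handled" case concrete, here is a small sketch that compares a known-good response with the failing one and reports what changed in its shape. The payloads and the `shape_diff` helper are hypothetical, written for this example only; they are not BugMojo's format or the real /api/plan response.

```python
from typing import Any, Dict, List


def shape_diff(baseline: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report top-level keys that were added, removed, or changed type."""
    base_keys, cur_keys = set(baseline), set(current)
    return {
        "added": sorted(cur_keys - base_keys),
        "removed": sorted(base_keys - cur_keys),
        "type_changed": sorted(
            k for k in base_keys & cur_keys
            if type(baseline[k]) is not type(current[k])
        ),
    }


# Hypothetical payloads: the failing response carries a new "tier" field.
known_good = {"plan": "pro", "price_cents": 2900}
failing = {"plan": "pro", "price_cents": 2900, "tier": "annual"}

diff = shape_diff(known_good, failing)
print(diff)
```

A non-empty "added" list next to the frame that threw is the kind of evidence that points at an upstream change rather than at the line in the trace.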

Then BugMojo hands that whole bundle to an AI agent (Claude Code, Cursor) over an MCP server. The agent reads the replay, the console, and the network response together with your repository, so it can correlate the failing state with the diff that introduced it instead of guessing from a trace. That is the uncontested wedge: production error monitors attach a trace with breadcrumbs, but none of them ship an MCP layer that lets an agent read the captured session behind the regression.

Feature	BugMojo	Prod error monitor (Sentry/BugSnag)
Stack trace attached to the report	✓	✓
rrweb session replay of the regression	✓	-
Console + network captured with the failure	✓	Breadcrumbs
Captured bug bundle handed to an AI agent over MCP	✓	-
Aggregate uncaught exceptions across a production fleet	-	✓
Release health and change-fail-rate trends at scale	-	✓

Two-sided: BugMojo bundles the regression's state and hands it to an agent over MCP, but it is not a production error-monitoring tool.

Hand your AI agent the regression, not just the trace

BugMojo captures the failing session with its rrweb replay, console, and network — then hands the whole bundle to Claude Code or Cursor over MCP, so your agent reads the state behind the regression and can find the change that broke it.

Frequently asked questions

Sources

Regression — "A degradation in the quality of a component or system due to a change" (ISTQB Glossary) — ISTQB (International Software Testing Qualifications Board) (2025)
Announcing the 2024 DORA report — AI adoption vs. delivery stability and throughput — Google Cloud / DORA (2024-10)
Accelerate State of DevOps Report 2024 — change fail rate as a core delivery metric — DORA (DevOps Research and Assessment) (2024)
AI Copilot Code Quality: 2025 Research — code churn and copy/paste across 211M changed lines — GitClear (2025-02)
The Cost of Poor Software Quality in the US: A 2022 Report — $2.41T total, $1.52T technical debt — CISQ (Consortium for Information & Software Quality) (2022-12)
Taming Google-Scale Continuous Testing — 1.23% of test runs catch a real breakage; ~84% of pass-to-fail transitions are flaky — Google Research / IEEE ICSE-SEIP (2017)
Introducing the Model Context Protocol — open standard for connecting AI agents to tools and data — Anthropic (2024-11)
