Review and debug

How to review code, hunt bugs, and verify fixes without creating noise or whack-a-mole loops.

Source: gstack review/SKILL.md, investigate/SKILL.md, canary/SKILL.md, CLAUDE.md.

Investigate's Iron Law

"NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST. Fixing symptoms creates whack-a-mole debugging. Every fix that doesn't address root cause makes the next bug harder to find."

Four phases, in order:

Investigate — read the data flow before guessing
Analyze — what's the actual cause vs the symptom
Hypothesize — name the hypothesis explicitly
Implement — only after the first three

The 3-strike rule

3 failed hypotheses → STOP.

"This may be an architectural issue rather than a simple bug."

Recurring-bug heuristic

"Recurring bugs in the same files are an architectural smell, not a coincidence."

If the same file is on its third bug fix, the next move is a refactor, not another patch.

Hard rules for fixes

Never apply a fix you can't verify
Never say "this should fix it"
Regression test must fail without the fix and pass with it
>5 files touched → blast-radius gate (slow down, get a second look)

Red flags to slow down

"Quick fix for now" — there is no "for now"
Proposing a fix before tracing data flow — you're guessing
Each fix reveals a new problem elsewhere — wrong layer, not wrong code

Confidence calibration in reviews

Every finding rated 1-10:

Below 7 → caveat or suppress
1-2 → only report if severity is P0
8+ → call out clearly

Stop reporting noise as if it equals bugs.

Fix-first, not read-only

Reviews auto-fix:

Dead code
N+1 queries
Stale comments

Reviews ask on:

Architecture changes
Tradeoffs

Reviews don't report below confidence 7 unless P0.

Search-before-recommending

Before suggesting a pattern, verify it's still current best practice. Don't recommend yesterday's wisdom.

Order of operations in review

Critical-pass categories first:

SQL / data safety
Race conditions
LLM trust boundary
Shell injection
Enum completeness — the one category where within-diff-only review is insufficient. Grep siblings, read consumers.

Then informational findings.

Pre-existing blame protocol

"'Pre-existing' without receipts is a lazy claim. Prove it or don't say it."

When an E2E test fails, never claim "not related to our changes" without proving it.

Required: run the same test on main.

If it passes on main but fails on branch → it IS your change
If it fails on main → file an issue, link it

Long-running tasks

"Never switch to blocking mode and give up when the poll times out. Never say 'I'll be notified when it completes' and stop checking — keep the loop going until the task finishes or the user tells you to stop."

Alert on changes, not absolutes

From canary, but it's the universal observability rule:

A page with 3 baseline errors is fine if it still has 3.

"Don't cry wolf — alert only on patterns persisting across 2+ consecutive checks."

"Baseline is king. Without it, canary is just a health check."

Performance:

2x baseline = regression
1.5x baseline = variance

Read-only by default. Speed > analysis.

Slop-scan philosophy

"We are NOT trying to pass as human code. We are AI-coded and proud of it. The goal is code quality."

Distinguish "linter gaming" from "genuine quality":

Catch patterns where empty catches mask data loss
Ignore patterns where catch {} is the correct fire-and-forget choice

"Don't chase the number. Fix patterns that represent actual code quality problems."

When `catch {}` is correct

Contradicts every linter default — but:

"If a fire-and-forget operation can fail for ANY reason and you don't care, catch {} is the correct pattern."

The example: cleanup paths. "A cleanup path that throws on EPERM means the rest of cleanup doesn't run. That's worse."

Don't tighten best-effort cleanup paths.

Escalation

"Bad work is worse than no work. You will not be penalized for escalating. If you have attempted a task 3 times without success, STOP and escalate."

E2E test hygiene

"NEVER copy a full SKILL.md file into an E2E test fixture. Extract only the section the test actually needs."

"Never pkill running eval processes and restart — you lose results and waste money. One clean run beats three killed-and-restarted runs."

Cost transparency

When evaluating a debugging investigation, attach the dollar cost:

"Total cost of the investigation: $7 across three eval runs."

Decisions tied to dollar amounts are more honest than decisions described in vague terms.

review-and-debug

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

claude-api

skill-creator

oh-my-issues

claude-mem

Recibe nuevas skills de Desenvolvimento todos los lunes

Review and debug

Investigate's Iron Law

The 3-strike rule

Recurring-bug heuristic

Hard rules for fixes

Red flags to slow down

Confidence calibration in reviews

Fix-first, not read-only

Search-before-recommending

Order of operations in review

Pre-existing blame protocol

Long-running tasks

Alert on changes, not absolutes

Slop-scan philosophy

When `catch {}` is correct

Escalation

E2E test hygiene

Cost transparency

Comentarios · Sin comentarios

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

claude-api

skill-creator

oh-my-issues

claude-mem

Recibe nuevas skills de Desenvolvimento todos los lunes

Review and debug

Investigate's Iron Law

The 3-strike rule

Recurring-bug heuristic

Hard rules for fixes

Red flags to slow down

Confidence calibration in reviews

Fix-first, not read-only

Search-before-recommending

Order of operations in review

Pre-existing blame protocol

Long-running tasks

Alert on changes, not absolutes

Slop-scan philosophy

When catch {} is correct

Escalation

E2E test hygiene

Cost transparency

Comentarios · Sin comentarios

When `catch {}` is correct