A bug is a falsified assumption. Find the assumption, falsify it deliberately, observe the divergence, narrow until one line owns the lie. No speculation, no shotgun edits, no "fix and rerun" guessing.
When to Apply / NOT
Apply: test fails and cause unclear; production stack trace; intermittent / flaky behavior; wrong output without crash; regression after known commit window; heisenbug.
NOT apply: performance regression with correct outputs; security defect; symptom obvious from one-line read; architectural confusion.
Anti-patterns
- Shotgun debugging: editing several files hoping one fixes it.
- Print-and-rerun: adding logs without a target observation.
- Premature fix: patching symptom before isolating root cause.
- Ignoring the trace: stack frames are evidence.
- Changing two variables at once: defeats falsification.
- Deleting the failing test: capturing the bug is the asset.
Hypothesis Loop (language-neutral)
- Observe — Reproduce the failure deterministically.
- Trace — Read the failure artifact (stack, log, core dump).
- Hypothesize — One falsifiable claim. Rank hypotheses by likelihood.
- Instrument — Insert minimum probe (breakpoint, structured log, assertion).
- Run — Execute the minimal repro.
- Confirm or refute — If refuted, demote and pick next hypothesis.
- Narrow — Binary-search the suspect range. Use git bisect for regressions.
- Confirm root cause — Inverse test: removing/altering the cause must restore correctness.
- Hand off — Forward to TDD: minimal repro becomes permanent failing test.
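The Narrow step can be sketched as a plain binary search over an ordered suspect range. This is a minimal illustration, not how git bisect is implemented: the names first_bad, history, and repro_fails are hypothetical, and the sketch assumes the first revision is known good, the last is known bad, and every revision after the first bad one is also bad.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def first_bad(revisions: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is good, revisions[-1] is bad, and that the
    good-to-bad transition happens exactly once.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return revisions[hi]


# Hypothetical history in which the defect first appears in "r6".
history: List[str] = [f"r{i}" for i in range(10)]
probes: List[str] = []


def repro_fails(rev: str) -> bool:
    probes.append(rev)  # record every probe so the search is auditable
    return int(rev[1:]) >= 6


culprit = first_bad(history, repro_fails)
```

Each probe tests one revision and changes nothing else, so every run either confirms or refutes a single claim about where the regression sits.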
Stack-Trace Reading
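- Top frame is innermost: the failure point. Python tracebacks print in the opposite order (most recent call last), so read those from the bottom.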
- Cause vs context: An exception's caused by chain encodes why; the stack encodes where.
- Async traces: virtual stacks drop frames between awaits — capture causal context.
- Symbol fidelity: Stripped binaries lose frame names. Build with debug info.
- Inlined / optimized frames: <inlined> markers signal that the source-line-to-instruction map is approximate.
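The cause-versus-context distinction can be shown in Python, where the chain is stored on the exception's __cause__ attribute and the location lives in its traceback. This is a small sketch under assumptions: load_port, cause_chain, and innermost_frame are hypothetical helpers written for illustration, and other languages expose the same information through different APIs.

```python
import traceback
from typing import List


def load_port(raw: str) -> int:
    try:
        return int(raw)
    except ValueError as exc:
        # "raise ... from" records the original error as __cause__.
        raise RuntimeError("invalid port setting") from exc


def cause_chain(exc: BaseException) -> List[str]:
    """Collect exception type names from the outermost error to the root cause."""
    names = []
    current = exc
    while current is not None:
        names.append(type(current).__name__)
        current = current.__cause__
    return names


def innermost_frame(exc: BaseException) -> str:
    """Return the name of the function in which exc was raised."""
    frames = traceback.extract_tb(exc.__traceback__)
    return frames[-1].name  # extract_tb lists frames oldest first


try:
    load_port("80a")
except RuntimeError as err:
    chain = cause_chain(err)      # why: RuntimeError caused by ValueError
    where = innermost_frame(err)  # where: the function that raised
```

The chain answers why the failure happened (a ValueError underneath the RuntimeError), while the last traceback frame answers where it surfaced.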
Parallel Tooling
| Family | Live debugger | Postmortem / record | Remote attach |
|---|---|---|---|
| Systems (C/C++/Rust) | gdb, lldb, rust-gdb, rust-lldb | coredumpctl + gdb core, rr record/replay | gdb -p <pid> / lldb -p <pid> |
| Python | pdb, ipdb, pdbpp, breakpoint() | faulthandler, py-spy dump, traceback module | debugpy --listen |
| Go | dlv debug, dlv test, dlv attach <pid> | runtime/pprof, GOTRACEBACK=crash | dlv connect <addr> |
| Java/Kotlin | IntelliJ debugger, jdb | hs_err logs, JFR, heap dump (jmap) | JDWP -agentlib:jdwp=... |
| JavaScript/TypeScript | node --inspect, Chrome DevTools | --report-uncaught-exception reports | --inspect=0.0.0.0:9229 |
| OCaml | ocamldebug, Printexc.record_backtrace true | core file + ocaml-gdb, memtrace | ocamldebug -s <socket> |
Use procs (not ps) for PID. Use bat -P -p -n (not cat) for trace files. Use git grep -n -C 3 'pattern' (not grep) for callsites.
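As one concrete instance of the Python row, the standard-library faulthandler module can dump the stack of every thread on demand. The sketch below is illustrative only: dump_all_stacks is a hypothetical helper name, and a real setup would more likely call faulthandler.enable() at startup so that the dump happens automatically on a fatal signal.

```python
import faulthandler
import tempfile


def dump_all_stacks() -> str:
    """Write the current stack of every thread to a temp file and return the text."""
    with tempfile.TemporaryFile(mode="w+") as out:
        # faulthandler writes directly to the file descriptor,
        # so it needs a real file rather than an in-memory buffer.
        faulthandler.dump_traceback(file=out, all_threads=True)
        out.seek(0)
        return out.read()


report = dump_all_stacks()
```

Note that faulthandler prints the most recent call first, the reverse of the order used by the traceback module.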
Constitutional Rules
- Reproduce before fixing.
- One hypothesis at a time.
- Evidence over inference.
- Capture the bug as a test (hand to TDD).
- Confirm with inverse.
- Bisect for regressions.
- No silent edits.
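The rules "Capture the bug as a test" and "Confirm with inverse" can be sketched together. Everything here is hypothetical and chosen only for illustration: a pagination helper whose suspected cause is floor division dropping the partial page, a one-line repro, and an inverse check that the repro fails with the cause present and passes with it removed.

```python
from typing import Callable


def page_count_buggy(items: int, per_page: int) -> int:
    # Suspected cause: floor division drops the final partial page.
    return items // per_page


def page_count_fixed(items: int, per_page: int) -> int:
    # Same function with only the suspected cause altered (ceiling division).
    return (items + per_page - 1) // per_page


def repro(page_count: Callable[[int, int], int]) -> bool:
    """Minimal repro: 11 items at 5 per page must need 3 pages."""
    return page_count(11, 5) == 3


# Inverse test: the repro must fail with the cause and pass without it.
fails_with_cause = not repro(page_count_buggy)
passes_without_cause = repro(page_count_fixed)
root_cause_confirmed = fails_with_cause and passes_without_cause
```

Once the root cause is confirmed, the repro is what gets handed to TDD as the permanent failing test.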
Reasoning approach
Before hypothesizing a fix, reason through the failure, keeping trace notes as SHORT-form KEYWORDS: observe the symptoms, trace the execution path, break down where actual behavior diverges from expected, critically review each candidate cause, and validate each hypothesis against the evidence. The root cause is the smallest explanation that accounts for all observed symptoms. For numeric calculation (timing math, bound arithmetic, off-by-N analysis), invoke fend per the baseline rule; never self-calculate. Causal reasoning and trace interpretation are done in-head; they are not arithmetic.
Pre-flight Check
- Before writing a plan for a bug fix that touches multiple files
- Whenever you notice that the previous attempt to fix a bug failed