Systematic Debugging
Overview
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
When to Use
Use for ANY technical issue:
- Test failures
- Bugs in production
- Unexpected behavior
- Performance problems
- Build failures
- Integration issues
Use this ESPECIALLY when:
- Under time pressure (emergencies make guessing tempting)
- "Just one quick fix" seems obvious
- You've already tried multiple fixes
- Previous fix didn't work
- You don't fully understand the issue
Don't skip when:
- Issue seems simple (simple bugs have root causes too)
- You're in a hurry (rushing guarantees rework)
- Manager wants it fixed NOW (systematic is faster than thrashing)
The Four Phases
Move through the phases in order. Phase 1 is mandatory; if it produces a clear root cause and a confident fix, you may go directly to Phase 4 (Implementation). Otherwise complete Phase 2 (Pattern Analysis) and Phase 3 (Hypotheses) in order before Phase 4. Between Phase 1 and Phase 2 — when you've decided pattern analysis is needed — run the research phase (default-on, see below) so pattern analysis has external context to build on.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
-
Read Error Messages Carefully
- Don't skip past errors or warnings
- They often contain the exact solution
- Read stack traces completely
- Note line numbers, file paths, error codes
-
Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- Does it happen every time?
- If not reproducible → gather more data, don't guess
-
Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences
-
Gather Evidence in Multi-Component Systems
WHEN system has multiple components (CI → build → signing, API → service → database):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary: - Log what data enters component - Log what data exits component - Verify environment/config propagation - Check state at each layer Run once to gather evidence showing WHERE it breaks THEN analyze evidence to identify failing component THEN investigate that specific componentExample (multi-layer system):
# Layer 1: Workflow echo "=== Secrets available in workflow: ===" echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}" # Layer 2: Build script echo "=== Env vars in build script: ===" env | grep IDENTITY || echo "IDENTITY not in environment" # Layer 3: Signing script echo "=== Keychain state: ===" security list-keychains security find-identity -v # Layer 4: Actual signing codesign --sign "$IDENTITY" --verbose=4 "$APP"This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)
-
Trace Data Flow
WHEN error is deep in call stack:
See
root-cause-tracing.mdin this directory for the complete backward tracing technique.Quick version:
- Where does bad value originate?
- What called this with bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom
Research phase
After Phase 1 (Root Cause Investigation) and before Phase 2 (Pattern Analysis), gather outside context. Phase 1 produced internal evidence (error messages, repro steps, recent diffs); the research phase adds external information that pattern analysis can build on.
Default-on. Skip only with explicit, justified statement per the skip protocol below.
This phase only fires when entering Phase 2. If Phase 1 yielded the root cause directly and you're going Phase 1 → Phase 4 (Implementation), you don't reach the research phase. The skip protocol governs the case where you've decided pattern analysis is needed but want to skip the research that would inform it.
1. Default research kinds (all four)
| Kind | Purpose | Tool |
|---|---|---|
| Web | Search the literal error string and the framework/library's open GitHub issues for prior reports | WebSearch + WebFetch (via subagent) |
| Codebase prior-bugs | git log --grep for related historical fixes; spots regressions and prior work in the same area | Bash + Grep (via subagent) |
| Authoritative | Fetch current live docs/spec for the API or library involved; catches "am I using this wrong" cases | WebFetch (via subagent) |
| User-context | Check MEMORY.md for related debugging history | Read (inline, no subagent) |
2. Dispatch
Three subagents in parallel (web via general-purpose, codebase prior-bugs via Explore, authoritative via general-purpose) plus the inline memory check running concurrently in the main thread. The subagent prompts each carry the bug context from Phase 1 (error message, repro, current diff) so they don't have to rediscover it.
3. Findings location
Findings land in .superpowers/debug-log-<slug>.md where <slug> is YYYY-MM-DD-<short-bug-description>. Examples:
.superpowers/debug-log-2026-05-05-test-failure-auth-handler.md
.superpowers/debug-log-2026-05-12-build-fails-on-arm64.md
.superpowers/debug-log-2026-06-03-flaky-redis-pubsub.md
The skill creates the file on first invocation if it doesn't exist; subsequent invocations on the same bug append. Each entry includes:
- Date/time of the research run
- Which research kinds fired (or were skipped, with the locked skip-justification line)
- Findings: 3-5 bullets per kind that fired, including load-bearing links/refs
- "Considered but ruled out" notes so future-you knows what was checked
Add .superpowers/ to your project's .gitignore so debug logs don't get accidentally committed. The upstream superpowers plugin uses this convention; superjawn follows the same pattern.
4. Skip protocol
If skipping, write one line to .superpowers/debug-log-<slug>.md: Skipped research because <reason>. <Verifiable pointer if applicable>.
Valid reasons:
- Trivial scope (typo, comment edit, single-line config)
- Fresh prior research — same topic in current session OR within last 7 days with verifiable spec/plan pointer. If the pointer doesn't resolve, the skip is invalid. (Beyond 7 days, repeat the research even if you remember the prior findings — the landscape drifts.)
- User explicit — must quote the phrase that authorized the skip.
- Repeat of identical task — must include a pointer to the prior successful run.
Invalid reasons: "I think I know", "seems straightforward", "moving fast", "user wants this done quickly", "already familiar with this codebase". If those are tempting, do the research.
Phase 2: Pattern Analysis
Find the pattern before fixing:
-
Find Working Examples
- Locate similar working code in same codebase
- What works that's similar to what's broken?
-
Compare Against References
- If implementing pattern, read reference implementation COMPLETELY
- Don't skim - read every line
- Understand the pattern fully before applying
-
Identify Differences
- What's different between working and broken?
- List every difference