Systematic Debugging

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

Violating the letter of this process is violating the spirit of debugging.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

When to Use

Use for ANY technical issue:

Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues

Use this ESPECIALLY when:

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
Previous fix didn't work
You don't fully understand the issue

Don't skip when:

Issue seems simple (simple bugs have root causes too)
You're in a hurry (rushing guarantees rework)
Manager wants it fixed NOW (systematic is faster than thrashing)

The Four Phases

Move through the phases in order. Phase 1 is mandatory; if it produces a clear root cause and a confident fix, you may go directly to Phase 4 (Implementation). Otherwise complete Phase 2 (Pattern Analysis) and Phase 3 (Hypotheses) in order before Phase 4. Between Phase 1 and Phase 2 — when you've decided pattern analysis is needed — run the research phase (default-on, see below) so pattern analysis has external context to build on.

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

Read Error Messages Carefully
- Don't skip past errors or warnings
- They often contain the exact solution
- Read stack traces completely
- Note line numbers, file paths, error codes
Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- Does it happen every time?
- If not reproducible → gather more data, don't guess
Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences

Gather Evidence in Multi-Component Systems

WHEN system has multiple components (CI → build → signing, API → service → database):

BEFORE proposing fixes, add diagnostic instrumentation:

For EACH component boundary:
  - Log what data enters component
  - Log what data exits component
  - Verify environment/config propagation
  - Check state at each layer

Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component

Example (multi-layer system):

# Layer 1: Workflow
echo "=== Secrets available in workflow: ==="
echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"

# Layer 2: Build script
echo "=== Env vars in build script: ==="
env | grep IDENTITY || echo "IDENTITY not in environment"

# Layer 3: Signing script
echo "=== Keychain state: ==="
security list-keychains
security find-identity -v

# Layer 4: Actual signing
codesign --sign "$IDENTITY" --verbose=4 "$APP"

This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)

Trace Data Flow

WHEN error is deep in call stack:

See root-cause-tracing.md in this directory for the complete backward tracing technique.

Quick version:
- Where does bad value originate?
- What called this with bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom

Research phase

After Phase 1 (Root Cause Investigation) and before Phase 2 (Pattern Analysis), gather outside context. Phase 1 produced internal evidence (error messages, repro steps, recent diffs); the research phase adds external information that pattern analysis can build on.

Default-on. Skip only with explicit, justified statement per the skip protocol below.

This phase only fires when entering Phase 2. If Phase 1 yielded the root cause directly and you're going Phase 1 → Phase 4 (Implementation), you don't reach the research phase. The skip protocol governs the case where you've decided pattern analysis is needed but want to skip the research that would inform it.

1. Default research kinds (all four)

Kind	Purpose	Tool
Web	Search the literal error string and the framework/library's open GitHub issues for prior reports	WebSearch + WebFetch (via subagent)
Codebase prior-bugs	`git log --grep` for related historical fixes; spots regressions and prior work in the same area	Bash + Grep (via subagent)
Authoritative	Fetch current live docs/spec for the API or library involved; catches "am I using this wrong" cases	WebFetch (via subagent)
User-context	Check `MEMORY.md` for related debugging history	Read (inline, no subagent)

2. Dispatch

Three subagents in parallel (web via general-purpose, codebase prior-bugs via Explore, authoritative via general-purpose) plus the inline memory check running concurrently in the main thread. The subagent prompts each carry the bug context from Phase 1 (error message, repro, current diff) so they don't have to rediscover it.

3. Findings location

Findings land in .superpowers/debug-log-<slug>.md where <slug> is YYYY-MM-DD-<short-bug-description>. Examples:

.superpowers/debug-log-2026-05-05-test-failure-auth-handler.md
.superpowers/debug-log-2026-05-12-build-fails-on-arm64.md
.superpowers/debug-log-2026-06-03-flaky-redis-pubsub.md

The skill creates the file on first invocation if it doesn't exist; subsequent invocations on the same bug append. Each entry includes:

Date/time of the research run
Which research kinds fired (or were skipped, with the locked skip-justification line)
Findings: 3-5 bullets per kind that fired, including load-bearing links/refs
"Considered but ruled out" notes so future-you knows what was checked

Add .superpowers/ to your project's .gitignore so debug logs don't get accidentally committed. The upstream superpowers plugin uses this convention; superjawn follows the same pattern.

4. Skip protocol

If skipping, write one line to .superpowers/debug-log-<slug>.md: Skipped research because <reason>. <Verifiable pointer if applicable>.

Valid reasons:

Trivial scope (typo, comment edit, single-line config)
Fresh prior research — same topic in current session OR within last 7 days with verifiable spec/plan pointer. If the pointer doesn't resolve, the skip is invalid. (Beyond 7 days, repeat the research even if you remember the prior findings — the landscape drifts.)
User explicit — must quote the phrase that authorized the skip.
Repeat of identical task — must include a pointer to the prior successful run.

Invalid reasons: "I think I know", "seems straightforward", "moving fast", "user wants this done quickly", "already familiar with this codebase". If those are tempting, do the research.

Phase 2: Pattern Analysis

Find the pattern before fixing:

Find Working Examples
- Locate similar working code in same codebase
- What works that's similar to what's broken?
Compare Against References
- If implementing pattern, read reference implementation COMPLETELY
- Don't skim - read every line
- Understand the pattern fully before applying
Identify Differences
- What's different between working and broken?
- List every difference

systematic-debugging

How to add

Drop this on your repo README

Related skills

claude-api

skill-creator

claude-mem

oh-my-issues

Get new Desenvolvimento skills every Monday