Bug Hunt — Proactive Bug Discovery
Systematically hunts for bugs before they reach users. An assessor analyzes the codebase to identify high-risk hotspots by cross-referencing code complexity, test coverage gaps, and structural risk factors. Focused hunters then deep-dive into each hotspot, writing reproducing tests to validate or invalidate suspected bugs.
This is deliberately thorough. Each suspected bug gets a reproducing test — no speculative reports. The goal is confirmed findings with evidence, not a noisy list of maybes.
Advisory only. The skill produces findings and proposes tickets; it does not implement fixes. Finding a bug and fixing it are different enough kinds of work that mixing them in one workflow degrades both: investigation pressure shouldn't bias the hunters toward bugs they could easily fix, and remediation calls for fresh reasoning that hunters deep in an investigation are not positioned to provide. Tickets capture findings durably across that boundary and compose with /implement and /implement-project for remediation. The reproducing tests serve as acceptance criteria: the fix is done when the test passes.
Workflow Overview
┌──────────────────────────────────────────────────────┐
│                  BUG HUNT WORKFLOW                   │
├──────────────────────────────────────────────────────┤
│  1. Determine scope                                  │
│  2. Spawn assessor (risk analysis)                   │
│     └─ Output: ranked hotspot list + coverage map    │
│  3. For each hotspot:                                │
│     └─ Spawn hunter (investigation + repro tests)    │
│     └─ Prior findings passed to subsequent hunters   │
│  4. Synthesize findings                              │
│  5. Present consolidated findings to user            │
│  6. Cut tickets + commit reproducing tests           │
│     (proposed structure; operator-approved)          │
│  7. (If tickets declined) commit reproducing tests   │
│     standalone for the coverage benefit              │
└──────────────────────────────────────────────────────┘
Workflow Details
1. Determine Scope
Default: Production code only. Excluded by default:
- Test code (test files, test fixtures, test helpers)
- Dev-only dependencies and tooling
- Generated code, vendored code
Inform the user of these exclusions.
Ask the user:
- "What is the scope of the hunt?" (entire codebase, specific module, specific area of concern)
- "Are there areas you're particularly worried about?" (recent changes, complex features, etc.)
- "Anything to skip beyond the defaults?"
User concerns influence prioritization but don't replace systematic analysis.
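As a rough illustration of the default exclusions, a path-based scope filter might look like the sketch below. The glob patterns and the `in_scope` helper are assumptions made for this example; projects mark test, vendored, and generated code in different ways, and dev-only dependencies usually have to be read from the package manifest rather than from paths.

```python
from fnmatch import fnmatch
from typing import Sequence

# Hypothetical default exclusion patterns; adjust to the project's conventions.
DEFAULT_EXCLUDES = [
    "tests/*", "*/tests/*", "test_*.py", "*/test_*.py", "*_test.py",  # test code
    "vendor/*", "*/vendor/*", "node_modules/*",                       # vendored code
    "*_pb2.py", "generated/*", "*/generated/*",                       # generated code
]

def in_scope(path: str, extra_excludes: Sequence[str] = ()) -> bool:
    """Return True if the path counts as production code for the hunt."""
    patterns = DEFAULT_EXCLUDES + list(extra_excludes)
    return not any(fnmatch(path, pattern) for pattern in patterns)

files = [
    "src/billing/invoice.py",
    "tests/test_invoice.py",
    "vendor/lib/util.py",
    "src/api/schema_pb2.py",
]
print([f for f in files if in_scope(f)])
```

User-supplied skips from the third question map onto `extra_excludes`, so the defaults stay visible and the additions stay explicit.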
2. Risk Assessment
Spawn a swe-bug-assessor agent:
You are the risk assessor for a proactive bug hunt. Your analysis will guide
focused investigators who will deep-dive into the hotspots you identify.
Scope: [entire codebase | user-specified scope]
User concerns: [any areas mentioned, or "none specified"]
Exclusions: [test code, vendored code, generated code, plus any user additions]
Perform your full methodology:
1. Map the codebase — language, framework, structure, entry points
2. Coverage analysis — use instrumented coverage if available, fall back to
manual inspection
3. Complexity analysis — identify functions with high cognitive complexity
4. Structural risk analysis — error handling gaps, input validation gaps,
shared mutable state, resource management issues, concurrency risks,
edge case blindness, consistency gaps
5. Git enrichment (optional) — churn hotspots, recent large changes
6. Cross-reference signals and produce a ranked hotspot list
Focus on hotspots where MULTIPLE signals converge — complex AND untested AND
structurally risky. Single-signal hotspots are lower priority.
Output your full assessment in your standard format.
When the assessor reports back: Review the hotspot list. This drives the investigation phase.
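When reviewing the list, the "multiple signals converge" rule can be pictured as a simple ordering by signal count. This is only a sketch: the `Hotspot` type, the signal names, and counting signals with equal weight are assumptions for illustration, not the assessor's actual scoring method.

```python
from dataclasses import dataclass, field

@dataclass
class Hotspot:
    target: str
    signals: set = field(default_factory=set)  # e.g. {"complex", "untested"}

def rank_hotspots(hotspots):
    """Order hotspots so those with more converging signals come first."""
    # Ties are broken by target name to keep the ordering deterministic.
    return sorted(hotspots, key=lambda h: (-len(h.signals), h.target))

# Hypothetical candidates, invented for the example.
candidates = [
    Hotspot("parser.parse_header", {"complex"}),
    Hotspot("cache.evict", {"complex", "untested", "shared_mutable_state"}),
    Hotspot("auth.refresh_token", {"untested", "error_handling_gap"}),
]
for hotspot in rank_hotspots(candidates):
    print(len(hotspot.signals), hotspot.target)
```

Single-signal entries such as `parser.parse_header` still appear in the list; they simply sort last.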
3. Focused Investigation — Hunters
For each hotspot in the assessor's list (ALL priorities), spawn a dedicated swe-bug-hunter agent:
You are a focused bug hunter investigating a specific hotspot.
## YOUR HOTSPOT
Target: [from assessor's report]
Files: [from assessor's report]
Risk signals: [from assessor's report]
Hypothesis: [from assessor's report]
Investigation approach: [from assessor's report]
## PRIOR FINDINGS (if any)
[Findings from previous hunters — confirmed bugs, patterns observed]
## YOUR MISSION
Deep-dive into this hotspot. Systematically probe for bugs. For each
suspected issue, write a reproducing test that encodes the correct expected
behavior.
- If the test FAILS: bug confirmed. Keep the test. Document the finding.
- If the test PASSES: hypothesis invalidated. Evaluate whether the test
improves coverage:
- Covers a previously untested path → keep it
- Redundant with existing tests → delete it
Every confirmed finding must have a reproducing test. No speculative reports.
Note any patterns that might apply to other hotspots.
Run hunters sequentially, not in parallel. Each new hunter receives the confirmed bugs and observed patterns accumulated from all previous hunters. This enables cross-hotspot pattern detection: if hunter 2 finds that error handling is broken in module A, hunter 5 (investigating module B, which shares error-handling utilities) gets that context.
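The sequential hand-off can be sketched as a plain loop that accumulates context. The report shape (a dict with `findings` and `patterns` keys) and the `spawn_hunter` callable are assumptions for this example; `fake_hunter` is a stand-in for spawning a real swe-bug-hunter agent.

```python
def run_hunt(hotspots, spawn_hunter):
    """Run hunters one at a time, feeding each the findings gathered so far."""
    prior_findings = []
    reports = []
    for hotspot in hotspots:
        # Pass a copy so a hunter cannot mutate the shared history.
        report = spawn_hunter(hotspot, list(prior_findings))
        reports.append(report)
        prior_findings.extend(report["findings"])
        prior_findings.extend(report["patterns"])
    return reports

def fake_hunter(hotspot, prior):
    """Stand-in for a hunter agent; returns a canned report."""
    return {
        "hotspot": hotspot,
        "seen": len(prior),  # how much prior context this hunter received
        "findings": [f"bug in {hotspot}"],
        "patterns": [],
    }

for report in run_hunt(["module_a", "module_b", "module_c"], fake_hunter):
    print(report["hotspot"], "received", report["seen"], "prior findings")
```

Running the hunters in parallel would leave every `prior` list empty, which is the trade-off the sequential ordering accepts in exchange for slower wall-clock time.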
4. Synthesize Findings
After all hunters have reported, synthesize:
Cross-cutting analysis:
- Do confirmed bugs share a common root cause or pattern?
- Are there systemic issues (e.g., a utility function used across 10 modules is buggy, but only one module was a hotspot)?
- Do coverage improvements from invalidated hypotheses reveal areas worth further investigation?
Pattern escalation:
- If multiple hunters report the same pattern (e.g., "error handling is inconsistent"), note this as a systemic issue even if individual instances are low severity
- Systemic patterns may warrant additional investigation or a follow-up /refactor
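Pattern escalation amounts to grouping pattern observations by how many hunters reported them. The sketch below assumes each hunter report is a dict with `hotspot` and `patterns` keys and that identical patterns are recorded with identical wording; in practice, deciding that two differently phrased observations are the same pattern takes judgment.

```python
from collections import defaultdict

def systemic_patterns(reports, min_hunters=2):
    """Return patterns reported by at least min_hunters distinct hotspots."""
    seen_by = defaultdict(set)
    for report in reports:
        for pattern in report["patterns"]:
            seen_by[pattern].add(report["hotspot"])
    return {
        pattern: sorted(hotspots)
        for pattern, hotspots in seen_by.items()
        if len(hotspots) >= min_hunters
    }

# Hypothetical hunter reports, invented for the example.
example_reports = [
    {"hotspot": "module_a", "patterns": ["inconsistent error handling"]},
    {"hotspot": "module_b", "patterns": ["inconsistent error handling", "unchecked input"]},
    {"hotspot": "module_c", "patterns": []},
]
print(systemic_patterns(example_reports))
```

Severity is deliberately absent from the grouping: a pattern is escalated because it recurs, even when each individual instance is minor.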
5. Present Consolidated Findings
Compile all findings into a single report:
## Bug Hunt Summary
Scope: [what was analyzed]
Assessment: [N hotspots identified across X files]
Hotspots investigated: [N]
Confirmed bugs: N (X critical, Y high, Z medium, W low)
Coverage improvements: N tests added
Systemic patterns: N
## CONFIRMED BUGS
### CRITICAL
- **[file:line — description]**
- Bug: [concrete description]
- Root cause: [why it exists]
- Impact: [what happens in practice]
- Reproducing test: [test file:test name]
- Fix guidance: [what needs to change]
### HIGH
[same format]
### MEDIUM
[same format]
### LOW
[same format]
## SYSTEMIC PATTERNS
[Cross-cutting issues observed across multiple hotspots]
- [pattern] — observed in [locations] — suggests [recommendation]
## COVERAGE IMPROVEMENTS
[Tests added that didn't find bugs but improved coverage]
- [test name] in [file] — covers [what]
## SUSPECTED BUT UNCONFIRMED
[Issues suspected but not validated with tests — lower confidence]
- [description] — couldn't test because [reason]
## AREAS NOT INVESTIGATED
[Hotspots deprioritized or areas outside scope that may warrant future attention]
Present to user interactively. Walk through CRITICAL findings first. For each, explain the bug, the impact, and show the reproducing test. Let the user ask questions before moving on.
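The summary counts and the critical-first walkthrough order can both be derived from the list of confirmed findings. This is a sketch under the assumption that each finding is a dict with `severity` and `title` keys; the field names and sample findings are hypothetical.

```python
from collections import Counter

SEVERITIES = ["critical", "high", "medium", "low"]

def summary_line(findings):
    """Build the 'Confirmed bugs' line of the report summary."""
    counts = Counter(f["severity"] for f in findings)
    breakdown = ", ".join(f"{counts[s]} {s}" for s in SEVERITIES)
    return f"Confirmed bugs: {len(findings)} ({breakdown})"

def presentation_order(findings):
    """Sort findings so CRITICAL items are walked through first."""
    return sorted(findings, key=lambda f: SEVERITIES.index(f["severity"]))

# Hypothetical findings, invented for the example.
findings = [
    {"severity": "medium", "title": "stale cache entry after rename"},
    {"severity": "critical", "title": "token refresh skips expiry check"},
    {"severity": "medium", "title": "partial batch dropped"},
]
print(summary_line(findings))
for finding in presentation_order(findings):
    print(finding["severity"].upper(), finding["title"])
```

Python's sort is stable, so findings of equal severity keep the order in which the hunters reported them.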
6. Cut Tickets
After presenting findings, propose a ticket structure based on the hunt's shape. Each hunt produces a different mix — concentrated CRITICALs in one module, a single systemic pattern across many modules, mostly coverage-improvements with few confirmed bugs — and the right ticket granularity depends on that shape. Rather than prescribe a fixed mapping, examine the findings and propose a structure that fits.