Diagnose unknown failures: broken local setup, environment mismatch, tool misbehavior, hook problems, CI vs local divergence, permission errors, runtime anomalies. Gather signals broadly, eliminate hypotheses systematically, report confirmed root cause + recommended next skill. No fixes — diagnosis only.
NOT for: known Python test failures with traceback (use /develop:debug); .claude/ config quality sweep (use /foundry:audit).
- $ARGUMENTS: required — symptom, question, or failing command, e.g.:
  - "hooks not firing on Save"
  - "codex:codex-rescue agent exits 127 on this machine"
  - "/calibrate times out every run"
  - "CI fails but passes locally"
  - "uv run pytest can't find conftest.py"
- --fast: optional flag — skip Step 4 adversarial Codex review; use when speed matters more than thoroughness or Codex is unavailable.
If $ARGUMENTS empty or too vague, use AskUserQuestion: "What exactly is failing or behaving unexpectedly? Include the command and any error output you can share."
</inputs>

<workflow>

Task hygiene:
# audit-skip: resilience-replication
_FS=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/foundry}/bin/resolve_shared_path.py" foundry skills/_shared 2>/dev/null || echo "plugins/foundry/skills/_shared") # timeout: 5000
Read $_FS/task-hygiene.md — follow task hygiene protocol.
Task tracking: TaskCreate tasks for Gather, Hypothesise, Probe, Report; mark in_progress/completed as you go.
Step 1: Parse symptom and scope
From $ARGUMENTS extract:
- What: specific failure or anomaly
- Where: local / CI / both; which tool or command; which skill or hook if applicable
- When: started recently (after change) or always broken; intermittent or consistent
Unsupported flag check — after all supported flags are extracted (`--fast`), scan $ARGUMENTS for remaining `--<token>` tokens. If any are found: print `! Unknown flag(s): --<token>. Supported: --fast.` then invoke `AskUserQuestion` — (a) Abort (stop, re-invoke with correct flags) · (b) Continue ignoring (skip unknown flags, proceed). On Abort: stop.
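The unsupported flag check can be sketched in shell as follows. This is a minimal illustration, assuming the arguments arrive as one whitespace-separated string; `find_unknown_flags` is a hypothetical helper name, not part of the skill.

```shell
# Hypothetical helper: print every --flag in the argument string that is
# not in the supported list. The ( ... ) body runs in a subshell so that
# `set -f` (disable globbing while word-splitting) does not leak out.
find_unknown_flags() (
  set -f
  supported="--fast"
  unknown=""
  for token in $1; do
    case "$token" in
      --*)
        case " $supported " in
          *" $token "*) ;;                                 # supported flag
          *) unknown="${unknown:+$unknown }$token" ;;      # collect unknown
        esac
        ;;
    esac
  done
  printf '%s\n' "$unknown"
)

unknown=$(find_unknown_flags "CI fails but passes locally --fast --verbose")
if [ -n "$unknown" ]; then
  echo "! Unknown flag(s): $unknown. Supported: --fast."
fi
```

The sketch treats any token starting with `--` as a flag, so a symptom string that quotes a flag of the failing command would also be reported; the Abort/Continue question covers that case.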
Step 2: Gather signals
Collect evidence in parallel — do NOT form hypotheses yet.
Tool versions and PATH:
which python && python --version # timeout: 5000
which uv 2>/dev/null && uv --version 2>/dev/null || echo "uv: not found" # timeout: 5000
node --version 2>/dev/null || echo "node: not found" # timeout: 5000
jq -e 'to_entries[] | select(.key | contains("codex")) | .value[].installPath' ~/.claude/plugins/installed_plugins.json 2>/dev/null | grep -q . && { jq -e '.enabledPlugins["codex@openai-codex"] == false' ~/.claude/settings.json >/dev/null 2>&1 && echo "codex (openai-codex): installed but disabled in settings.json" || echo "codex (openai-codex): enabled"; } || echo "codex (openai-codex): not found" # timeout: 5000
env | grep -E 'PATH|VIRTUAL_ENV|UV_|CLAUDE|HOME|SHELL|NODE' | grep -v -E '(_TOKEN|_KEY|_SECRET|_PASSWORD|_PASS)=' | sort # timeout: 5000
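To see what the environment filter above keeps and drops, the same two `grep` stages can be run on sample input. This is only a sketch: `filter_env` is a hypothetical wrapper and the variable values are made up for illustration.

```shell
# Hypothetical wrapper around the same pipeline, reading from stdin
# instead of `env` so it can be tried on sample data.
filter_env() {
  grep -E 'PATH|VIRTUAL_ENV|UV_|CLAUDE|HOME|SHELL|NODE' \
    | grep -v -E '(_TOKEN|_KEY|_SECRET|_PASSWORD|_PASS)=' \
    | sort
}

# Sample (invented) environment lines:
#   PATH and UV_CACHE_DIR pass both stages,
#   CLAUDE_API_KEY matches the first stage but is dropped by the second,
#   EDITOR never matches the first stage.
printf '%s\n' 'PATH=/usr/bin' 'CLAUDE_API_KEY=abc123' \
  'UV_CACHE_DIR=/tmp/uv' 'EDITOR=vim' | filter_env
```

The exclusion is name-based: a secret stored under a variable whose name does not end in one of the listed suffixes would still be printed, so review the output before sharing it.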
Recent changes:
git log --oneline -10 # timeout: 3000
git diff HEAD~3..HEAD --stat # timeout: 3000
Config state (when symptom involves Claude Code, hooks, or skills):
Use Read to check .claude/settings.json — look for hook registrations, allow entries relevant to failing command, and enabledMcpjsonServers. For ~/.claude/settings.json (outside allowed Read paths), use Bash:
jq . ~/.claude/settings.json # timeout: 5000
Logs (when symptom involves skill run, background agent, or hook):
Use Grep with pattern ERROR|WARN|failed|not found|exit across .claude/logs/, /tmp/, or relevant .reports/<skill>/ run dirs. Read last 50 lines of any relevant log file.
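Where the Grep tool is not convenient, a roughly equivalent shell scan might look like the sketch below. `scan_logs` is a hypothetical helper; the pattern is the one given above, and the 50-line cap here applies to matches rather than to the tail of each file.

```shell
# Hypothetical helper: recursively search a directory for the diagnostic
# pattern and keep at most the last 50 matching lines.
scan_logs() {
  dir="$1"
  [ -d "$dir" ] || { echo "no such directory: $dir"; return 0; }
  # -r recurse, -n show line numbers, -E extended regex.
  # `|| true` keeps a no-match result from being treated as a failure.
  { grep -rnE 'ERROR|WARN|failed|not found|exit' "$dir" 2>/dev/null || true; } \
    | tail -n 50
}

scan_logs .claude/logs
```

Because `exit` and `failed` are broad terms, expect noise; the output is a starting point for reading the relevant log, not a verdict.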
Capture all output before Step 3.
After gathering evidence, capture top signals as working notes for Step 4 spawn prompts:
- SYMPTOM_DESCRIPTION — verbatim from $ARGUMENTS
- KEY_SIGNALS — write 3–5 bullet-point sentences summarizing the most diagnostic signals found above (tool versions, missing binaries, config anomalies, recent changes)
Step 3: Rank hypotheses
List candidate root causes ranked by probability, drawing only from gathered evidence:
| Rank | Hypothesis | Supporting evidence | Ruling-out test |
|---|---|---|---|
| 1 | … | … | … |
| 2 | … | … | … |
| 3 | … | … | … |
Capture the ranked hypothesis table as HYPOTHESIS_TABLE (inline text) for use in Step 4 spawn prompts.
Common categories:
- Environment mismatch — tool version differs; wrong virtualenv active; PATH missing entry
- Missing dependency — binary not on PATH; package not installed; module import fails
- Config / permission error — settings.json allow entry missing; hook path wrong; settings.local.json override
- State pollution — stale lock file, leftover tmp artifact, or cached state conflicts with current run
- Recent change regression — git commit or config edit introduced issue (check git log)
- Sync drift — project .claude/ and ~/.claude/ diverged; compare manually or /foundry:audit setup
- External service — network unavailable, API rate-limited, or remote tool unreachable
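As an illustration of what a "Ruling-out test" can look like for the first two categories, here is a small sketch. Both helper names are hypothetical, and the checks are deliberately narrow: one asks whether a binary resolves on PATH, the other whether `python` resolves inside the active virtualenv.

```shell
# Hypothetical helper for "Missing dependency": is the binary on PATH?
has_binary() {
  command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper for "Environment mismatch": does `python` resolve
# to a path inside $VIRTUAL_ENV? Returns non-zero when no venv is active.
python_in_venv() {
  [ -n "${VIRTUAL_ENV:-}" ] || return 1
  case "$(command -v python 2>/dev/null)" in
    "$VIRTUAL_ENV"/*) return 0 ;;
    *) return 1 ;;
  esac
}

if has_binary uv; then echo "uv: on PATH"; else echo "uv: not on PATH"; fi
if python_in_venv; then
  echo "python: resolves inside VIRTUAL_ENV"
else
  echo "python: not inside an active virtualenv"
fi
```

A passing check only rules a hypothesis out for the shell it ran in; CI or a hook may run with a different PATH, which is itself a signal worth recording.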
Step 4: Auxiliary review (optional)
Skip entirely when --fast passed, or top hypothesis has strong direct evidence. (foundry:challenger is always available as part of foundry plugin, so the skip condition simplifies to: skip when --fast passed.) Skip → proceed to Step 5.
When --fast: mark Step 4 task as deleted (not completed — it was skipped).
Otherwise, set up adversarial review:
INVESTIGATE_RUN=".temp/investigate/$(date -u +%Y-%m-%dT%H-%M-%SZ)"
mkdir -p "$INVESTIGATE_RUN" # timeout: 5000
CODEX_OUT="$INVESTIGATE_RUN/codex-review.md"
# Resolve literal paths for spawn prompts — bash variables don't expand inside Agent() text blocks
INVESTIGATE_RUN_RESOLVED="$INVESTIGATE_RUN"
CODEX_OUT_RESOLVED="$CODEX_OUT"
echo "Run dir: $INVESTIGATE_RUN_RESOLVED" # timeout: 3000
Re-check Codex availability at point of use (bash variables don't persist across tool calls):
CODEX_AVAILABLE=false
# (1) plugin installed
jq -e 'to_entries[] | select(.key | contains("codex")) | .value[].installPath' ~/.claude/plugins/installed_plugins.json 2>/dev/null | grep -q . && CODEX_AVAILABLE=true # timeout: 5000
# (2) honor user opt-out — explicit `false` in enabledPlugins disables codex even when installed
if [ "$CODEX_AVAILABLE" = "true" ] && jq -e '.enabledPlugins["codex@openai-codex"] == false' ~/.claude/settings.json >/dev/null 2>&1; then
CODEX_AVAILABLE=false
printf " codex plugin installed but disabled in ~/.claude/settings.json — skipping codex review\n"
fi
If [ "$CODEX_AVAILABLE" = "true" ]: substitute $CODEX_OUT_RESOLVED (resolved above) for the literal output path in the spawn prompt — do not use $CODEX_OUT variable reference inside Agent() text:
Agent(subagent_type="codex:codex-rescue", prompt="Adversarial review of hypothesis quality — symptom: ${SYMPTOM_DESCRIPTION}, signals: ${KEY_SIGNALS}, hypothesis table: ${HYPOTHESIS_TABLE}. Challenge the top hypothesis, identify blindspots, and surface alternative root causes. Read-only. Write full findings to <CODEX_OUT_RESOLVED> using the Write tool. Return ONLY: {\"status\":\"done\",\"file\":\"<path>\",\"findings\":N,\"confidence\":0.N}")
Replace <CODEX_OUT_RESOLVED> with the value of $CODEX_OUT_RESOLVED printed in the bash block above before constructing the prompt.
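The substitution step can be pictured as plain text replacement. The sketch below is illustrative only: `fill_placeholder` is a hypothetical helper, and the timestamped path is an example value in the shape produced by the setup block, not a real run directory.

```shell
# Hypothetical helper: replace the <CODEX_OUT_RESOLVED> placeholder in a
# prompt template with a literal path. Assumes the path contains no
# `|`, `&`, or backslash, which holds for the run-dir format used here.
fill_placeholder() {
  template="$1"
  path="$2"
  printf '%s\n' "$template" | sed "s|<CODEX_OUT_RESOLVED>|$path|g"
}

fill_placeholder \
  'Write full findings to <CODEX_OUT_RESOLVED> using the Write tool.' \
  '.temp/investigate/2025-01-01T00-00-00Z/codex-review.md'
```

The point of the exercise is that the spawned agent receives a concrete path in its prompt text rather than a shell variable name it cannot expand.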
Else (Codex unavailable): spawn foundry:challenger — substitute $INVESTIGATE_RUN_RESOLVED (resolved above) for the literal output path:
Agent(subagent_type="foundry:challenger", prompt="Adversarial review of hypothesis quality for this investigation — symptom: ${SYMPTOM_DESCRIPTION}, signals: ${KEY_SIGNALS}, hypothes