Test-first refactoring. Audit coverage, add characterization tests if missing, apply changes with safety net.
NOT for:
- bug fixes (use /develop:fix)
- new features (use /develop:feature)
- .claude/config changes (use /foundry:manage; requires foundry plugin)
- non-Python projects (JS/TS/Go/Rust): toolchain assumes pytest; use language-native toolchain instead
- mixed refactor+feature tasks — run /develop:refactor first, then /develop:feature; do not attempt both in single skill run
Constants:
- MAX_INNER_CYCLES: 5 (change-test cycles per outer session; Step 4 safety break)
Agent Resolution
_PATHS=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/develop}/bin/dev_shared_resolve.py" --foundry 2>/dev/null) # timeout: 5000
_DEV_SHARED=$(echo "$_PATHS" | head -1)
_FOUNDRY_SHARED=$(echo "$_PATHS" | tail -1)
Read $_DEV_SHARED/agent-resolution.md. Contains: foundry check + fallback table. If foundry not installed: use table to substitute each foundry:X with general-purpose. Agents this skill uses: foundry:sw-engineer, foundry:qa-specialist, foundry:linting-expert, foundry:challenger.
Read $_DEV_SHARED/task-hygiene.md.
Anti-Rationalizations
| Temptation | Reality |
|---|---|
| "The code is simple enough — I can skip characterization tests" | No safety net = no proof behavior unchanged. Characterization tests are the only proof. |
| "I'll fix this adjacent bug while I'm in here" | Scope creep conflates history. Adjacent bugs go in Follow-up, not this session. |
| "The tests are too brittle — I'll refactor them as well" | Refactoring tests + prod code simultaneously makes regressions unattributable. Fix tests first, separate pass. |
| "I know the codebase — no need for coverage audit" | Untested edge cases = most common refactoring breakage. Audit finds what you don't know you don't know. |
| "This is a small change — Step 4's max-5 cycles are overkill" | Simple changes = simple test loops. Guard costs nothing when unneeded; prevents runaway sessions when it is. |
Project Detection
Read $_DEV_SHARED/runner-detection.md — sets $TEST_CMD (full suite) and $PYTEST_CMD (pytest flags). Run at skill start.
Optional --plan <path>: if $ARGUMENTS ends with --plan <path>, read plan file first. Extract Affected files, Risks, Suggested approach — use to inform Step 1 scope analysis. Skip redundant codebase exploration for already-classified files. Store plan path as PLAN_FILE.
Read $_DEV_SHARED/preflight-helpers.md — execute --plan path extraction; sets $PLAN_FILE.
Checkpoint init: run DEV_DIR=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/develop}/bin/dev_run_dir.py" 2>/dev/null) # timeout: 5000 to create .developments/<TS>/ and capture path. Write checkpoint.md inside $DEV_DIR. After each major step (1, 2, 3, 4, 5), append step: N — completed to $DEV_DIR/checkpoint.md. On skill start, check for existing .developments/*/checkpoint.md — offer resume from last completed step if found.
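The resume check described above can be sketched roughly as follows. `last_completed_step` is a hypothetical helper, not part of the skill, and the sketch assumes each checkpoint entry sits on its own line, starts with "step: N", and contains "completed".

```shell
# Hypothetical helper (not part of the skill): prints the highest step number
# recorded as completed in a checkpoint file, or nothing if none is found.
last_completed_step() {
  # Assumes one entry per line, starting with "step: N" and containing "completed".
  sed -n 's/^step: \([0-9][0-9]*\) .*completed.*$/\1/p' "$1" 2>/dev/null | sort -n | tail -1
}

# Scan earlier runs and report where each one stopped.
for cp in .developments/*/checkpoint.md; do
  [ -f "$cp" ] || continue   # glob matched nothing
  echo "$cp: last completed step $(last_completed_step "$cp")"
done
```

An empty result means no step was recorded, so the run would start from Step 1 rather than offering a resume.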
Flag parsing
Parse flags into actual shell variables (not prose) so downstream blocks see correct values. Persist to temp files for cross-block access (bash state lost between Bash() calls):
# timeout: 5000
CHALLENGE_ENABLED=true
SEMBLE_ENABLED=false
TEAM_MODE=false
ACCEPT_NO_PLAN=false
[[ " $ARGUMENTS " == *" --no-challenge "* ]] && CHALLENGE_ENABLED=false
[[ " $ARGUMENTS " == *" --semble "* ]] && SEMBLE_ENABLED=true
[[ " $ARGUMENTS " == *" --team "* ]] && TEAM_MODE=true
[[ " $ARGUMENTS " == *" --accept-no-plan "* ]] && ACCEPT_NO_PLAN=true
echo "$CHALLENGE_ENABLED" > "${TMPDIR:-/tmp}/dev-challenge-enabled"
echo "$SEMBLE_ENABLED" > "${TMPDIR:-/tmp}/dev-semble-enabled"
echo "$TEAM_MODE" > "${TMPDIR:-/tmp}/dev-team-mode"
echo "$ACCEPT_NO_PLAN" > "${TMPDIR:-/tmp}/dev-accept-no-plan"
Downstream blocks read back, e.g. TEAM_MODE=$(cat "${TMPDIR:-/tmp}/dev-team-mode" 2>/dev/null || echo false).
CODEMAP_ENABLED raw flag — scan $ARGUMENTS: --no-codemap → off; --codemap (without preceding --no-) → strict; else → auto. Substitute value below.
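One way to express that rule as shell instead of prose is sketched here. `codemap_raw_flag` and `CODEMAP_RAW` are hypothetical names introduced for illustration; padding the string with spaces matches whole tokens, the same trick the flag-parsing block uses.

```shell
# Hypothetical helper: maps the argument string to off | strict | auto.
codemap_raw_flag() {
  case " $1 " in
    *" --no-codemap "*) echo off ;;     # explicit opt-out wins, even if --codemap is also present
    *" --codemap "*)    echo strict ;;  # explicit opt-in
    *)                  echo auto ;;    # neither flag given
  esac
}

CODEMAP_RAW=$(codemap_raw_flag "${ARGUMENTS:-}")
```

Because the pattern requires a space before `--codemap`, the token `--no-codemap` never matches the strict branch.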
Unsupported flag check: after all supported flags are extracted, scan $ARGUMENTS for remaining --<token> tokens. If found: print ! Unknown flag(s): `--<token>`. Supported: `--plan`, `--team`, `--no-challenge`, `--codemap`, `--no-codemap`, `--accept-no-plan`, `--semble`. Then invoke AskUserQuestion: (a) Abort (stop, re-invoke with correct flags) · (b) Continue ignoring (skip unknown flags, proceed). On Abort: stop.
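A minimal sketch of the token scan, under the assumption that `--plan` is the only flag that consumes a following value. `unknown_flags` and `UNKNOWN` are hypothetical names; the AskUserQuestion step itself is left to the agent and is not modelled here.

```shell
# Hypothetical helper: prints each unsupported --flag found in the argument string.
# The body runs in a subshell so set -f and the loop variables do not leak.
unknown_flags() (
  set -f              # disable globbing so tokens like *.py are not expanded
  skip_next=false
  for tok in $1; do   # intentional word splitting on whitespace
    if [ "$skip_next" = true ]; then skip_next=false; continue; fi
    case "$tok" in
      --plan) skip_next=true ;;   # --plan consumes the following <path> token
      --team|--no-challenge|--codemap|--no-codemap|--accept-no-plan|--semble) ;;
      --*) echo "$tok" ;;         # anything else starting with -- is unknown
    esac
  done
)

UNKNOWN=$(unknown_flags "${ARGUMENTS:-}")
if [ -n "$UNKNOWN" ]; then
  echo "! Unknown flag(s): $UNKNOWN"
fi
```

Positional tokens such as the target path fall through the `case` untouched, so only `--`-prefixed leftovers are reported.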
Codemap auto-detection — run after flag parsing, substituting raw flag (off/strict/auto) from rule above:
CODEMAP_ENABLED=$("${CLAUDE_PLUGIN_ROOT:-plugins/develop}/bin/codemap-resolve" "<off|strict|auto>") || exit 1 # timeout: 5000
echo "$CODEMAP_ENABLED" > "${TMPDIR:-/tmp}/dev-codemap-enabled"
Preflight — if CODEMAP_ENABLED=true:
Read $_DEV_SHARED/preflight-helpers.md — execute codemap + semble preflight if respective flags set.
Step 1: Scope and understand
Read target code, build mental model before touching anything.
If <target> is directory: use Glob tool (pattern **/*.py, path <target>) to enumerate Python files.
# Measure current state
find <target> -name '*.py' -exec wc -l {} + 2>/dev/null | tail -1
If CODEMAP_ENABLED=true or SEMBLE_ENABLED=true: read $_DEV_SHARED/codemap-context.md and follow enabled sections (codemap block if CODEMAP_ENABLED, semble companion if SEMBLE_ENABLED). Skip if both false.
Multi-file / API-change scope — extended codemap scan (only when CODEMAP_ENABLED=true): if target is directory, spans multiple files, or goal mentions renaming/restructuring public API (i.e., refactoring NOT limited to internals of single function or class with unchanged public interface):
# Derive project name and affected modules
PROJ=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)") # timeout: 3000
# Affected modules from <target> path: strip src/ prefix, drop .py, slash→dot
REFACTOR_FILES=$(find <target> -name '*.py' -type f 2>/dev/null)
AFFECTED_MODULES=$(echo "$REFACTOR_FILES" | sed 's|^\./||;s|^src/||;s|\.py$||;s|/|.|g' | grep . || echo "")
if command -v scan-query >/dev/null 2>&1 && [ -f ".cache/scan/${PROJ}.json" ] && [ -n "$AFFECTED_MODULES" ]; then
    # Reusability: who calls each affected module outside the refactoring scope
    while IFS= read -r mod; do
        scan-query rdeps "$mod" 2>/dev/null
    done <<< "$AFFECTED_MODULES"
    # Tightest coupling pairs: determines refactor sequence and what must change together
    scan-query coupled --top 10
fi
Include ## Scope & Reusability (codemap) block in foundry:sw-engineer spawn prompt. If rdeps returns callers outside refactoring scope: flag explicitly — those callers must update or refactoring silently breaks public contract. If CODEMAP_ENABLED=false and scope is multi-file: skip silently.
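To make the path-to-module derivation in the scan block concrete, the same sed expression can be tried on sample paths. `path_to_module` is a hypothetical wrapper added only for illustration, and the sample paths are invented.

```shell
# Same sed expression as the scan block, wrapped in a hypothetical helper.
path_to_module() {
  printf '%s\n' "$1" | sed 's|^\./||;s|^src/||;s|\.py$||;s|/|.|g'
}

path_to_module "./src/pkg/core/io.py"   # prints pkg.core.io
path_to_module "pkg/__init__.py"        # prints pkg.__init__
```

Note that package initializers come out as `pkg.__init__`, not `pkg`; whether scan-query expects one form or the other is not stated here, so treat that as something to verify.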
Spawn foundry:sw-engineer agent to analyze code and identify:
- Public API surface (functions, classes, methods external code calls)
- Internal complexity hotspots (cyclomatic complexity, deep nesting, long functions)
- Code smells relevant to stated goal
- Dependencies and coupling between modules
- Complexity smell: directory or cross-module scope — flag it; consider team mode
Goal classification gate: after sw-engineer analysis completes, scan goal text for mixed signals — if goal contains both refactor keywords (rename, extract, restructure, decouple, consolidate) AND feature keywords (add, implement, new, support), invoke AskUserQuestion: "Goal mixes refactoring and feature work — split into two runs." · (a) Abort — run refactor first, then feature · (b) Continue as refactor-only — treat feature additions as out of scope.
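The keyword scan behind this gate could be approximated as below. This is a coarse sketch, not the skill's actual logic: `classify_goal` is a hypothetical helper, it matches the listed keywords only as whole words, and it will miss inflected forms such as "renaming" or "adding" that a reader of the goal text would catch.

```shell
# Hypothetical helper: prints refactor | feature | mixed | none for a goal string.
# The body runs in a subshell so its variables stay local.
classify_goal() (
  # Lowercase, then turn every non-letter into a space so keywords match as whole words.
  goal=" $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z' ' ') "
  refactor=false
  feature=false
  case "$goal" in
    *" rename "*|*" extract "*|*" restructure "*|*" decouple "*|*" consolidate "*) refactor=true ;;
  esac
  case "$goal" in
    *" add "*|*" implement "*|*" new "*|*" support "*) feature=true ;;
  esac
  if [ "$refactor" = true ] && [ "$feature" = true ]; then echo mixed
  elif [ "$refactor" = true ]; then echo refactor
  elif [ "$feature" = true ]; then echo feature
  else echo none
  fi
)

classify_goal "Extract parser and add caching support"   # prints mixed
```

Only a `mixed` result would trigger the AskUserQuestion prompt; whole-word matching keeps "address" or "renewal" from counting as "add" or "new".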
**Scope