Ablation study runner — after /research:run finds improvements, fortify identifies which components contributed, generates ablation variants (remove one component at a time), runs each in isolated git worktrees (main repo never modified), ranks component importance, optionally generates reviewer Q&A calibrated to venue.

NOT for: initial optimization loop (use /research:run); methodology validation (use /research:judge); paper-vs-code consistency (use /research:verify); hypothesis generation (use research:scientist directly). Fortify runs ablation studies on completed runs only.

</objective> <constants>

MAX_ABLATION_CANDIDATES:  8 (ceiling — scientist produces 3–8; --max-ablations caps further)
METRIC_TIMEOUT_MS:        360000 (6 min — same as run SKILL.md)
GUARD_TIMEOUT_MS:         360000
GIT_OP_TIMEOUT_MS:        15000
SANITY_DIVERGENCE_PCT:    2.0 (full-variant vs best_metric mismatch threshold)
IMPORTANCE_CLASS_CRITICAL: 50.0 (% of full metric lost)
IMPORTANCE_CLASS_SIGNIFICANT: 10.0
FORTIFY_DIR_BASE:         .experiments
STATE_DIR_BASE:           .experiments/state
METRIC_CMD_DEFAULT:       "python -m pytest -x --tb=no -q"
GUARD_CMD_DEFAULT:        "git diff --stat HEAD"

</constants>
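
To make the three percentage thresholds above concrete, here is a minimal sketch of how they might be applied. It assumes a higher-is-better metric, takes "percent lost" to mean (full - ablated) / |full| x 100, treats the thresholds as inclusive, and uses a hypothetical third label `MINOR` for anything below SIGNIFICANT. The function names are illustrative, and none of these choices is confirmed by the constants block itself.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the MINOR label are hypothetical.
IMPORTANCE_CLASS_CRITICAL=50.0
IMPORTANCE_CLASS_SIGNIFICANT=10.0
SANITY_DIVERGENCE_PCT=2.0

# Percent of the full-variant metric lost when one component is removed.
# Assumes higher is better and a non-zero full metric.
pct_lost() {  # usage: pct_lost <full_metric> <ablated_metric>
  awk -v full="$1" -v abl="$2" 'BEGIN {
    d = (full + 0 < 0) ? -full : full
    printf "%.2f\n", (full - abl) / d * 100
  }'
}

# Map a percent-lost value onto an importance class.
classify_importance() {  # usage: classify_importance <pct_lost>
  awk -v p="$1" -v crit="$IMPORTANCE_CLASS_CRITICAL" \
      -v sig="$IMPORTANCE_CLASS_SIGNIFICANT" 'BEGIN {
    if (p + 0 >= crit + 0) print "CRITICAL"
    else if (p + 0 >= sig + 0) print "SIGNIFICANT"
    else print "MINOR"
  }'
}

# Succeeds when the re-measured full variant is within
# SANITY_DIVERGENCE_PCT percent of the recorded best_metric.
sanity_ok() {  # usage: sanity_ok <best_metric> <full_variant_metric>
  awk -v best="$1" -v full="$2" -v tol="$SANITY_DIVERGENCE_PCT" 'BEGIN {
    d = (best + 0 < 0) ? -best : best
    diff = (full - best) / d * 100
    if (diff < 0) diff = -diff
    if (diff <= tol + 0) exit 0
    exit 1
  }'
}
```

For a lower-is-better metric the sign of the loss would need to be flipped before classification.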

Environment overrides — set these before invoking the skill to override per-variant defaults:

METRIC_CMD — command run inside each variant worktree to measure the ablation metric (default: python -m pytest -x --tb=no -q)
GUARD_CMD — command run inside each variant worktree to detect regressions (default: git diff --stat HEAD)
STATE_DIR_BASE — base directory for source-run state lookups (default: .developments/fortify-state)
FORTIFY_DIR_BASE — base directory for per-run fortify artifacts (default: .developments/fortify)

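The defaults in the constants block above are YAML-only: bash blocks read environment variables (with a ${VAR:-default} fallback) and never source values directly from the YAML. Where the two disagree, as they do for STATE_DIR_BASE and FORTIFY_DIR_BASE, the fallback written in the bash block is the value that takes effect.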
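
As a small illustration of the fallback behaviour just described, the sketch below resolves METRIC_CMD with and without an override. The helper name `resolve_metric_cmd` and the override value `make bench` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the metric command a bash block would end up using.
resolve_metric_cmd() {
  printf '%s\n' "${METRIC_CMD:-python -m pytest -x --tb=no -q}"
}

# Unset variable: the inline fallback applies.
( unset METRIC_CMD; resolve_metric_cmd )

# Variable set before invocation: the override wins.
( METRIC_CMD="make bench"; resolve_metric_cmd )
```

Because the expansion uses `:-` rather than `-`, an empty METRIC_CMD also falls back to the default.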

CRITICAL: Worktree-based isolation

Do NOT use git checkout -b <branch> for ablations — dirties main working tree, corrupts concurrent tool calls. Each ablation gets own git worktree under $FORTIFY_DIR/worktrees/<variant>, created from best_commit. Main working tree NEVER modified. Cleanup: git worktree remove --force per variant; git worktree prune on interrupt.
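
The isolation rule above could be implemented along these lines. This is a sketch, not the skill's actual code: the helper names are hypothetical, and it assumes $FORTIFY_DIR and the best_commit value have already been resolved by earlier steps.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Create a detached worktree for one ablation variant, starting from best_commit.
create_variant_worktree() {  # usage: create_variant_worktree <variant> <best_commit>
  local variant="$1" commit="$2"
  local path="$FORTIFY_DIR/worktrees/$variant"
  mkdir -p "$FORTIFY_DIR/worktrees"
  # --detach avoids creating a branch; the main working tree is not touched.
  git worktree add --detach "$path" "$commit" >/dev/null 2>&1 || return 1
  printf '%s\n' "$path"
}

# Remove one variant's worktree, even if it holds uncommitted ablation edits.
remove_variant_worktree() {  # usage: remove_variant_worktree <variant>
  git worktree remove --force "$FORTIFY_DIR/worktrees/$1"
}

# On interrupt, drop stale worktree metadata left behind by an aborted run.
trap 'git worktree prune; exit 130' INT TERM
```

A caller would run the metric and guard commands with the returned path as the working directory, then call `remove_variant_worktree` once the variant has been measured.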

Fortify Mode (Steps F1–F8)

Triggered by fortify or fortify <run-id|program.md>.

Task tracking: create tasks for F1, F2, F3, F4, F5, F6, F7, F8 at start — before any tool calls.

Step F1: Locate source run, parse flags, and validate judge approval

Extract flags: --venue <VENUE>, --max-ablations <N>, --skip-run.

Unsupported flag check: after all supported flags are extracted, scan $ARGUMENTS for remaining --<token> tokens. If any are found, print `! Unknown flag(s): --<token>. Supported: --venue, --max-ablations, --skip-run.` then invoke `AskUserQuestion`: (a) Abort (stop, re-invoke with correct flags) · (b) Continue ignoring (skip unknown flags, proceed). On Abort: stop.
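
The unknown-flag scan might look like the sketch below. The helper name `find_unknown_flags` is hypothetical, and the sample arguments (including the `--dry-run` flag and the venue name) are made up for illustration. Values that follow --venue and --max-ablations are simply skipped because they do not start with `--`.

```shell
#!/usr/bin/env bash
# Sketch only: find_unknown_flags is a hypothetical helper name.
# Prints every --<token> in the argument string that is not a supported flag.
find_unknown_flags() {  # usage: find_unknown_flags "$ARGUMENTS"
  local -a toks
  local tok
  # $ARGUMENTS is one flat string, so split it on whitespace.
  read -r -a toks <<< "$1"
  for tok in "${toks[@]}"; do
    case "$tok" in
      --venue|--max-ablations|--skip-run) ;;   # supported
      --*) printf '%s\n' "$tok" ;;             # unknown flag
      *) ;;                                    # run-id, program.md, or a flag value
    esac
  done
}

unknown=$(find_unknown_flags "run-42 --venue NeurIPS --dry-run --skip-run")
if [ -n "$unknown" ]; then
  echo "! Unknown flag(s): $unknown. Supported: --venue, --max-ablations, --skip-run."
fi
```

The Abort / Continue choice is left to `AskUserQuestion`; the sketch covers only detection and the message.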

Input resolution (priority order):

Explicit <run-id> argument → read $STATE_DIR_BASE/<run-id>/state.json
Explicit <program.md> argument → scan $STATE_DIR_BASE/*/state.json for matching program_file, pick latest with status: completed or status: goal-achieved
No argument → scan $STATE_DIR_BASE/, pick latest with status: completed or status: goal-achieved

None found → stop:

fortify: No completed run found. Run /research:run first.
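
For orientation, the priority order above can be approximated with a small jq-based scan. This is a sketch under stated assumptions and not the plugin's find_run_id.py: the function name `latest_completed_run` is hypothetical, "latest" is assumed to mean the most recently modified state.json, and run directory names are assumed to contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: latest_completed_run is hypothetical. Requires jq.
latest_completed_run() {  # usage: latest_completed_run <state_dir_base> [program_file]
  local base="$1" program="${2:-}" f status prog
  # ls -t lists the newest state.json first.
  for f in $(ls -t "$base"/*/state.json 2>/dev/null); do
    status=$(jq -r '.status // ""' "$f" 2>/dev/null)
    prog=$(jq -r '.program_file // ""' "$f" 2>/dev/null)
    # Only finished runs qualify.
    case "$status" in
      completed|goal-achieved) ;;
      *) continue ;;
    esac
    # When a program file was given, it must match the run's program_file.
    if [ -n "$program" ] && [ "$prog" != "$program" ]; then
      continue
    fi
    basename "$(dirname "$f")"
    return 0
  done
  return 1
}
```

A non-zero return corresponds to the "None found" case, where the caller prints the stop message and exits.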

Initialize base directory bash variables (constants YAML is not auto-exported; environment overrides honored):

STATE_DIR_BASE="${STATE_DIR_BASE:-.developments/fortify-state}"
FORTIFY_DIR_BASE="${FORTIFY_DIR_BASE:-.developments/fortify}"
METRIC_CMD="${METRIC_CMD:-python -m pytest -x --tb=no -q}"
GUARD_CMD="${GUARD_CMD:-git diff --stat HEAD}"

Assign $RUN_ID from input resolution above (must be set before guard block uses it):

# Resolve RUN_ID from the first token of $ARGUMENTS when it is not a flag; otherwise auto-detect the latest completed run
_ARG1=$(echo "$ARGUMENTS" | awk '{print $1}')
if [ -n "$_ARG1" ] && [ "${_ARG1#-}" = "$_ARG1" ] && [ ! -f "$_ARG1" ] && [ -d "$STATE_DIR_BASE/$_ARG1" ]; then
  RUN_ID="$_ARG1"
elif [ -n "$_ARG1" ] && [ -f "$_ARG1" ]; then
  # program.md path — find latest completed run matching program_file  <!-- loads: find_run_id.py -->
  RUN_ID=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/research}/bin/find_run_id.py" "$STATE_DIR_BASE" --match-program "$_ARG1" 2>/dev/null)
else
  RUN_ID=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/research}/bin/find_run_id.py" "$STATE_DIR_BASE" 2>/dev/null)
fi
[ -z "$RUN_ID" ] && { echo "fortify: No completed run found. Run /research:run first."; exit 1; }

Guard: judge approval required. Judge skill writes verdict to .reports/research/judge-<branch>-<date>.md — scan for APPROVED verdict line:

JUDGE_VERDICT_FILE=$(ls -t .reports/research/judge-*.md 2>/dev/null | head -1)  # timeout: 5000
if [ -z "$JUDGE_VERDICT_FILE" ]; then
  echo "fortify: BLOCKED — no judge verdict found in .reports/research/."
  echo "Ablation studies require an approved baseline. Run: /research:judge <program.md>"
  exit 1
fi
# Preserve multi-word verdicts (e.g. "NEEDS REVISION") — strip trailing whitespace only, not internal spaces
JUDGE_VERDICT=$(grep -i '^[*]*[Vv]erdict[*]*:' "$JUDGE_VERDICT_FILE" | head -1 | sed 's/\*\*//g' | sed -E 's/.*[Vv]erdict[: ]+//' | sed 's/[[:space:]]*$//')

# Program cross-match: confirm verdict was issued for the current experiment's program, not a different one
PROGRAM_FILE=$(grep -iE '^[*]*(Program(_file)?|Program file)[*]*:' "$JUDGE_VERDICT_FILE" | head -1 | sed 's/\*\*//g' | sed -E 's/.*:[[:space:]]*//' | sed 's/[[:space:]]*$//')
# Use explicit run-specific state.json path (resolved during F1 input resolution) — never CWD-relative
STATE_PROGRAM=$(jq -r '.program_file // ""' "$STATE_DIR_BASE/$RUN_ID/state.json" 2>/dev/null)
if [ -n "$STATE_PROGRAM" ] && [ -n "$PROGRAM_FILE" ] && [ "$PROGRAM_FILE" != "$STATE_PROGRAM" ]; then
    printf "! BLOCKED — judge verdict references program '%s' but current experiment is for '%s'\n" "$PROGRAM_FILE" "$STATE_PROGRAM"
    printf "Run: /research:judge %s\n" "$STATE_PROGRAM"
    exit 1
fi
# Confirm program file still exists on disk
if [ -n "$PROGRAM_FILE" ] && [ ! -f "$PROGRAM_FILE" ]; then
    printf "! BLOCKED — program file %s referenced by judge verdict not found on disk\n" "$PROGRAM_FILE"
    exit 1
fi

Verify JUDGE_VERDICT == "APPROVED". The program cross-match above guarantees the verdict was issued for the current experiment — fortify cannot ablate against a different program's verdict. Apply explicit bash gate — prose alone never halts execution:

JUDGE_VERDICT="${JUDGE_VERDICT:-REJECTED}"
if [ "$JUDGE_VERDICT" != "APPROVED" ]; then
    echo "fortify: BLOCKED — no APPROVED judge verdict found for this program."
    echo "Judge verdict: $JUDGE_VERDICT — stopping before implementation (F1–F7)."
    echo "Ablation studies require an approved baseline. Run: /research:judge <program.md>"
    exit 1
fi
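
To see what the verdict extraction and gate above do with a multi-word verdict, the sketch below wraps the same grep/sed pipeline in a function and runs it on a throwaway file. The function name `extract_verdict` is hypothetical, and the sample file contents are an assumed format, not real judge output.

```shell
#!/usr/bin/env bash
# Sketch only: extract_verdict wraps the grep/sed pipeline from the guard above.
extract_verdict() {  # usage: extract_verdict <verdict_file>
  grep -i '^[*]*[Vv]erdict[*]*:' "$1" | head -1 \
    | sed 's/\*\*//g' \
    | sed -E 's/.*[Vv]erdict[: ]+//' \
    | sed 's/[[:space:]]*$//'
}

# Assumed sample content with bold markers and trailing whitespace.
sample=$(mktemp)
printf '%s\n' '# Judge report' '**Verdict**: NEEDS REVISION   ' > "$sample"

verdict=$(extract_verdict "$sample")
verdict="${verdict:-REJECTED}"   # a missing verdict line is treated as REJECTED
[ "$verdict" = "APPROVED" ] || echo "blocked: verdict is '$verdict'"
rm -f "$sample"
```

Internal spaces survive the pipeline, so the gate compares the whole verdict string rather than its first word.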

Note: do NOT infer from methodology.md alone — methodology_rating: sound is one input to verdict, not verdict itself. Only ## Verdict line in judge output file is authoritative.

Read from state.json: goal, best_metric, best_commit, config (including metric_cmd, guard_cmd, compute), program_file.

Also read baseline_commit — iteration 0 commit from experiments.jsonl (first line, status: "baseline", field "commit").
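
The two reads above could be done with jq roughly as follows. This is a sketch: the function names are hypothetical, the location of experiments.jsonl is left to the caller because the text does not state it, and only a subset of the state.json fields is shown.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical. Requires jq.

# Load selected fields from a run's state.json into shell variables.
load_run_state() {  # usage: load_run_state <state.json>
  GOAL=$(jq -r '.goal // empty' "$1")
  BEST_METRIC=$(jq -r '.best_metric // empty' "$1")
  BEST_COMMIT=$(jq -r '.best_commit // empty' "$1")
  # Fail if the run never recorded a best commit.
  [ -n "$BEST_COMMIT" ]
}

# Print the iteration-0 commit, taken from the first line of experiments.jsonl.
read_baseline_commit() {  # usage: read_baseline_commit <experiments.jsonl>
  # Only the first line counts, and only if it is the baseline record.
  head -n 1 "$1" | jq -r 'select(.status == "baseline") | .commit // empty'
}
```

An empty result from `read_baseline_commit` means the first record was not a baseline entry, which a caller would likely want to treat as an error.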

Pre-compute run directory (each in separate Bash call):

BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-'); BRANCH="${BRANCH:-main}"  # timeout: 3000 (fallback applied after the pipeline, since tr always exits 0)
TS=$(date -u +%Y-%m-%dT%H-%M-%SZ)                                           # timeout: 3000
FORTIFY_DIR="$FORTIFY_DIR_BASE/fortify-$TS"                                  # timeout: 5000
STATE_DI
