Sustained metric-improvement loop — reads program.md, iterates specialist ideation agents, commits atomically, auto-rolls back on regression. For long-running automated improvement campaigns.
NOT for: methodology validation before run (use /research:judge); hypothesis generation (use research:scientist agent); one-off feature work (use /develop:feature).
Campaign mode only:
MAX_ITERATIONS: 20 (hard cap — program.md values above 20 are silently clamped with a warning; 20 is the effective maximum)
MAX_CODEX_RUNS: 10 (cost ceiling for --codex Phase 2c — disable Codex once exceeded)
STUCK_THRESHOLD: 5 consecutive discards → escalation
GUARD_REWORK_MAX: 2 attempts before revert
VERIFY_TIMEOUT_SEC: 120 (local), 300 (--colab)
COLAB_KNOWN_HW: H100, L4, T4, A100
SUMMARY_INTERVAL: 10 iterations
DIMINISHING_RETURNS_WINDOW: 5 iterations < 0.5% each → warn user and suggest stopping
STATE_DIR: .experiments/state/<run-id>/ (timestamped dir per run — see .claude/rules/artifact-lifecycle.md)
CLAUDE_SKILL_DIR: ""
SENTINEL_SLUG_FORMULA: |
REPO_SLUG=$(git rev-parse --show-toplevel | xargs basename | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | tr -s '-' | sed 's/-$//')
BRANCH_SLUG=$(git branch --show-current | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | tr -s '-' | sed 's/-$//')
# Sentinel path: ${TMPDIR:-/tmp}/claude-commit-auth-${REPO_SLUG}-${BRANCH_SLUG}
# Bash state is lost between tool calls — re-derive at each use site; this formula is the only authorized form.
<!-- Note: STATE_DIR (.experiments/state/) holds per-iteration artifacts (diary, experiments.jsonl).
Hypothesis pipeline outputs (hypotheses.jsonl, checkpoint.json, journal.md) go to .experiments/<run-id>/ (RUN_DIR).
These are two separate directories by design — see protocol.md for layout. -->
Agent strategy mapping (agent_strategy in config → ideation agent to spawn):
agent_strategy | Specialist agent | When to use |
|---|---|---|
auto | heuristic | Default — infer from metric_cmd keywords |
perf | foundry:perf-optimizer | latency, throughput, memory, GPU utilization |
code | foundry:sw-engineer | coverage, complexity, lines, coupling |
ml | research:scientist | accuracy, loss, F1, AUC, BLEU |
arch | foundry:solution-architect | coupling, cohesion, modularity metrics |
Auto-inference keyword heuristics (when agent_strategy: auto or omitted; checked against ## Goal text AND metric command):
Precedence order (first match wins; ML keywords take priority over test-framework keywords). ML-specific compound terms (not bare tokens) required to prevent over-triggering on eval/train/val as common words:
- contains
accuracy,loss(when paired withtrain_loss/val_loss/eval_loss),f1_score,auc_roc,auroc,train_step,val_acc,eval_loss,epoch,gradient,tensor, OR explicit--scientistflag →ml→research:scientist - contains
time,latency,bench,throughput,memory→perf→foundry:perf-optimizer - contains
pytest,coverage,complexity→code→foundry:sw-engineer - no keyword match →
perf(default fallback)
Bare tokens eval, train, val (without compound suffix) do NOT trigger ml routing — too common in non-ML contexts (test eval scripts, training-environment configs, validator command names).
Stuck escalation sequence (at STUCK_THRESHOLD consecutive discards):
-
Switch to different agent type. Rotation by current strategy:
Current strategy Next strategy Escalation agent codemlresearch:scientistmlperffoundry:perf-optimizerperfcodefoundry:sw-engineerarchcodefoundry:sw-engineer(fallbackfoundry:solution-architectif sw-engineer unavailable)autoinfer from resolved strategy follow rotation row for whichever concrete strategy autoheuristics resolved to at Step R3 (e.g.auto→ resolvedml→ nextperf→foundry:perf-optimizer) -
Spawn 2 agents parallel with competing strategies; each writes full analysis to
.experiments/state/<run-id>/stuck-escalation-<i>-<agent-type>.md, returns ONLY compact JSON envelope. Use this spawn prompt verbatim (substitute<run-id>,<i>, and strategy):Stuck-escalation handoff — iteration <i> after STUCK_THRESHOLD consecutive discards. Read `.experiments/state/<run-id>/state.json` for goal, best_metric, baseline, config. Read `.experiments/state/<run-id>/experiments.jsonl` for full iteration history. Read `.experiments/state/<run-id>/diary.md` for qualitative context (what was tried, why reverted). Read `.experiments/state/<run-id>/context-<i>.md` for current iteration's context block. Continue from the last completed iteration (do NOT restart from iteration 0). Write your full analysis and proposed change to `.experiments/state/<run-id>/stuck-escalation-<i>-<your-strategy>.md`. Write a resume point to `.experiments/state/<run-id>/resume.json`: {iteration: <i>, strategy: "<your-strategy>", proposed_change: "<one-line description>"}. Return ONLY: {"strategy":"<your-strategy>","description":"...","files_modified":[...],"confidence":0.N,"file":".experiments/state/<run-id>/stuck-escalation-<i>-<your-strategy>.md"}Consolidation: pick whichever returns delta ≥ 0.1% AND guard pass; if both qualify, pick higher delta.
-
Stop, report progress, surface to user — no blind looping
Agent Resolution
_RESEARCH_SHARED=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/research}/bin/resolve_shared.py" 2>/dev/null) # timeout: 5000
CLAUDE_SKILL_DIR resolution — constants block provides default plugins/research/skills/run (source-tree path). Resolve to installed path before use:
CLAUDE_SKILL_DIR=$(ls -td ~/.claude/plugins/cache/borda-ai-rig/research/*/skills/run 2>/dev/null | head -1)
[ -z "$CLAUDE_SKILL_DIR" ] && CLAUDE_SKILL_DIR="$(git rev-parse --show-toplevel 2>/dev/null)/plugins/research/skills/run"
Read $_RESEARCH_SHARED/agent-resolution.md. Contains: foundry check + fallback table. If foundry not installed: use table to substitute each foundry:X with general-purpose. Agents this skill uses: foundry:sw-engineer, foundry:linting-expert, foundry:perf-optimizer, foundry:solution-architect, research:scientist.
Default Mode (Steps R1–R7)
Triggered by run <goal|file.md>.
Task tracking: create tasks R0–R7 at start. If no --researcher/--architect, mark R0 skipped. If --codex active, create task R5b: Codex co-pilot (iter ?/max) status pending.
Step R0: Hypothesis pre-phase (--researcher / --architect)
If no --researcher/--architect, skip to R1.
Read ${CLAUDE_SKILL_DIR}/modes/hypothesis-pipeline.md
Per-iteration hypothesis selection (when --researcher/--architect set, inside R5 loop): pop next from RESEARCH_QUEUE. Append to Phase 2 prompt: "Focus this iteration on testing this hypothesis: <hypothesis text>."
Per-iteration journal hook (inside R5, after Phase 7): if --journal active, append entry to <RUN_DIR>/journal.md after EVERY iteration — regardless of outcome. Entry format: protocol.md (companion file, same skill dir). Journals record kept and reverted iterations so ideation agent learns failed approaches.
Per-iteration checkpoint write (after Phase 7): if --researcher/--architect active, append one line to <RUN_DIR>/checkpoint.json per schema in protocol.md (companion file, same skill dir): {iteration, hypothesis_id, metric_before, metric_after, status: "passed"|"rolled_back"}.
Step R1: Load / build config
--resume flag detection: if --resume in args,