Sustained metric-improvement loop — reads program.md, iterates specialist ideation agents, commits atomically, auto-rolls back on regression. For long-running automated improvement campaigns.

NOT for: methodology validation before run (use /research:judge); hypothesis generation (use research:scientist agent); one-off feature work (use /develop:feature).

</objective> <constants>

Campaign mode only:

MAX_ITERATIONS:             20 (hard cap — program.md values above 20 are silently clamped with a warning; 20 is the effective maximum)
MAX_CODEX_RUNS:             10 (cost ceiling for --codex Phase 2c — disable Codex once exceeded)
STUCK_THRESHOLD:            5 consecutive discards → escalation
GUARD_REWORK_MAX:           2 attempts before revert
VERIFY_TIMEOUT_SEC:         120 (local), 300 (--colab)
COLAB_KNOWN_HW:             H100, L4, T4, A100
SUMMARY_INTERVAL:           10 iterations
DIMINISHING_RETURNS_WINDOW: 5 iterations < 0.5% each → warn user and suggest stopping
STATE_DIR:                  .experiments/state/<run-id>/  (timestamped dir per run — see .claude/rules/artifact-lifecycle.md)
CLAUDE_SKILL_DIR:           ""
SENTINEL_SLUG_FORMULA: |
  REPO_SLUG=$(git rev-parse --show-toplevel | xargs basename | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | tr -s '-' | sed 's/-$//')
  BRANCH_SLUG=$(git branch --show-current | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | tr -s '-' | sed 's/-$//')
  # Sentinel path: ${TMPDIR:-/tmp}/claude-commit-auth-${REPO_SLUG}-${BRANCH_SLUG}
  # Bash state is lost between tool calls — re-derive at each use site; this formula is the only authorized form.

Agent strategy mapping (agent_strategy in config → ideation agent to spawn):

`agent_strategy`	Specialist agent	When to use
`auto`	heuristic	Default — infer from metric_cmd keywords
`perf`	`foundry:perf-optimizer`	latency, throughput, memory, GPU utilization
`code`	`foundry:sw-engineer`	coverage, complexity, lines, coupling
`ml`	`research:scientist`	accuracy, loss, F1, AUC, BLEU
`arch`	`foundry:solution-architect`	coupling, cohesion, modularity metrics

Auto-inference keyword heuristics (when agent_strategy: auto or omitted; checked against ## Goal text AND metric command):

Precedence order (first match wins; ML keywords take priority over test-framework keywords). ML-specific compound terms (not bare tokens) required to prevent over-triggering on eval/train/val as common words:

contains accuracy, loss (when paired with train_loss/val_loss/eval_loss), f1_score, auc_roc, auroc, train_step, val_acc, eval_loss, epoch, gradient, tensor, OR explicit --scientist flag → ml → research:scientist
contains time, latency, bench, throughput, memory → perf → foundry:perf-optimizer
contains pytest, coverage, complexity → code → foundry:sw-engineer
no keyword match → perf (default fallback)

Bare tokens eval, train, val (without compound suffix) do NOT trigger ml routing — too common in non-ML contexts (test eval scripts, training-environment configs, validator command names).

Stuck escalation sequence (at STUCK_THRESHOLD consecutive discards):

Switch to different agent type. Rotation by current strategy:

Current strategy	Next strategy	Escalation agent
`code`	`ml`	`research:scientist`
`ml`	`perf`	`foundry:perf-optimizer`
`perf`	`code`	`foundry:sw-engineer`
`arch`	`code`	`foundry:sw-engineer` (fallback `foundry:solution-architect` if sw-engineer unavailable)
`auto`	infer from resolved strategy	follow rotation row for whichever concrete strategy `auto` heuristics resolved to at Step R3 (e.g. `auto` → resolved `ml` → next `perf` → `foundry:perf-optimizer`)

Spawn 2 agents parallel with competing strategies; each writes full analysis to .experiments/state/<run-id>/stuck-escalation-<i>-<agent-type>.md, returns ONLY compact JSON envelope. Use this spawn prompt verbatim (substitute <run-id>, <i>, and strategy):

Stuck-escalation handoff — iteration <i> after STUCK_THRESHOLD consecutive discards.
Read `.experiments/state/<run-id>/state.json` for goal, best_metric, baseline, config.
Read `.experiments/state/<run-id>/experiments.jsonl` for full iteration history.
Read `.experiments/state/<run-id>/diary.md` for qualitative context (what was tried, why reverted).
Read `.experiments/state/<run-id>/context-<i>.md` for current iteration's context block.
Continue from the last completed iteration (do NOT restart from iteration 0).
Write your full analysis and proposed change to `.experiments/state/<run-id>/stuck-escalation-<i>-<your-strategy>.md`.
Write a resume point to `.experiments/state/<run-id>/resume.json`: {iteration: <i>, strategy: "<your-strategy>", proposed_change: "<one-line description>"}.
Return ONLY: {"strategy":"<your-strategy>","description":"...","files_modified":[...],"confidence":0.N,"file":".experiments/state/<run-id>/stuck-escalation-<i>-<your-strategy>.md"}

Consolidation: pick whichever returns delta ≥ 0.1% AND guard pass; if both qualify, pick higher delta.

Stop, report progress, surface to user — no blind looping

</constants> <workflow>

Agent Resolution

_RESEARCH_SHARED=$(python "${CLAUDE_PLUGIN_ROOT:-plugins/research}/bin/resolve_shared.py" 2>/dev/null)  # timeout: 5000

CLAUDE_SKILL_DIR resolution — constants block provides default plugins/research/skills/run (source-tree path). Resolve to installed path before use:

CLAUDE_SKILL_DIR=$(ls -td ~/.claude/plugins/cache/borda-ai-rig/research/*/skills/run 2>/dev/null | head -1)
[ -z "$CLAUDE_SKILL_DIR" ] && CLAUDE_SKILL_DIR="$(git rev-parse --show-toplevel 2>/dev/null)/plugins/research/skills/run"

Read $_RESEARCH_SHARED/agent-resolution.md. Contains: foundry check + fallback table. If foundry not installed: use table to substitute each foundry:X with general-purpose. Agents this skill uses: foundry:sw-engineer, foundry:linting-expert, foundry:perf-optimizer, foundry:solution-architect, research:scientist.

Default Mode (Steps R1–R7)

Triggered by run <goal|file.md>.

Task tracking: create tasks R0–R7 at start. If no --researcher/--architect, mark R0 skipped. If --codex active, create task R5b: Codex co-pilot (iter ?/max) status pending.

Step R0: Hypothesis pre-phase (`--researcher` / `--architect`)

If no --researcher/--architect, skip to R1.

Read ${CLAUDE_SKILL_DIR}/modes/hypothesis-pipeline.md

Per-iteration hypothesis selection (when --researcher/--architect set, inside R5 loop): pop next from RESEARCH_QUEUE. Append to Phase 2 prompt: "Focus this iteration on testing this hypothesis: <hypothesis text>."

Per-iteration journal hook (inside R5, after Phase 7): if --journal active, append entry to <RUN_DIR>/journal.md after EVERY iteration — regardless of outcome. Entry format: protocol.md (companion file, same skill dir). Journals record kept and reverted iterations so ideation agent learns failed approaches.

Per-iteration checkpoint write (after Phase 7): if --researcher/--architect active, append one line to <RUN_DIR>/checkpoint.json per schema in protocol.md (companion file, same skill dir): {iteration, hypothesis_id, metric_before, metric_after, status: "passed"|"rolled_back"}.

Step R1: Load / build config

--resume flag detection: if --resume in args,

run

How to add

Drop this on your repo README

Related skills

understand-dashboard

understand-chat

understand-domain

dev-browser

Get new Pesquisa e Web skills every Monday

Agent Resolution

Default Mode (Steps R1–R7)

Step R0: Hypothesis pre-phase (`--researcher` / `--architect`)

Step R1: Load / build config

Comments · No comments

How to add

Drop this on your repo README

Related skills

understand-dashboard

understand-chat

understand-domain

dev-browser

Get new Pesquisa e Web skills every Monday

Agent Resolution

Default Mode (Steps R1–R7)

Step R0: Hypothesis pre-phase (--researcher / --architect)

Step R1: Load / build config

Comments · No comments

Step R0: Hypothesis pre-phase (`--researcher` / `--architect`)