/evolve — Goal-Driven Compounding Loop
Cross-vendor analog: Anthropic Managed Agents Outcomes (May 2026). Both close the loop "agent runs → grader scores against a rubric → agent retries"; AgentOps does it locally against any model.
Measure what's wrong. Fix the worst thing. Measure again. Compound.
V2 command surface: keep the name evolve. Use `ao evolve` for the
terminal-native loop. It is the top-level operator entrypoint for
`ao rpi loop --supervisor`, preserving the old `/evolve` concept while reusing
the v2 RPI loop engine.
Operator cadence: post-mortem finished work, analyze the current repo state,
select or create the next highest-value work item, let /rpi handle research,
planning, pre-mortem, implementation, and validation, then harvest follow-ups
and repeat until a kill switch, max-cycle cap, regression breaker, or real
dormancy stops the run.
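The cadence above can be sketched as a plain measure, fix, re-measure loop. This is a minimal Python sketch and not the `ao rpi loop --supervisor` implementation. `measure`, `select_work`, `run_rpi`, and `kill_switch` are hypothetical callables. The regression breaker is modeled, as an assumption, as "fitness after a cycle is lower than before it".

```python
from typing import Callable, Optional, Tuple


def run_evolve(
    measure: Callable[[], float],              # hypothetical fitness probe; higher is better
    select_work: Callable[[], Optional[str]],  # None means every layer came up empty (dormancy)
    run_rpi: Callable[[str], None],            # one complete /rpi run for the chosen item
    kill_switch: Callable[[], bool],           # operator stop marker
    max_cycles: Optional[int] = None,          # None means unlimited, like the --max-cycles default
) -> Tuple[int, str]:
    """Return (completed_cycles, stop_reason)."""
    cycles = 0
    baseline = measure()  # measure what's wrong before the first fix
    while True:
        if kill_switch():
            return cycles, "kill_switch"
        if max_cycles is not None and cycles >= max_cycles:
            return cycles, "max_cycles"
        item = select_work()
        if item is None:
            return cycles, "dormant"
        run_rpi(item)  # fix the worst thing
        cycles += 1
        score = measure()  # measure again
        if score < baseline:
            return cycles, "regression"  # regression breaker (assumed definition)
        baseline = score  # compound: the next cycle must not fall below this
```

The stop checks run before each work selection, so a kill switch set mid-run takes effect at the next cycle boundary and never interrupts an in-flight `/rpi` run.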
Always-on autonomous loop over /rpi. Work selection order:
- Harvested `.agents/rpi/next-work.jsonl` work (freshest concrete follow-up)
- Open ready beads work (`bd ready`)
- Failing goals and directive gaps (`ao goals measure`)
- Testing improvements (missing/thin coverage, missing regression tests)
- Validation tightening and bug-hunt passes (gates, audits, bug sweeps)
- Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining
- Concrete feature suggestions derived from repo purpose when no sharper work exists
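The selection order above behaves like a priority ladder: the first rung that yields work wins, and lower rungs are never consulted. A minimal Python sketch, assuming each rung is exposed as a callable that returns candidate items. The rung keys in `LADDER` are hypothetical labels, not names used by `ao`.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical rung keys, in the priority order of the ladder above.
LADDER = (
    "next-work",   # harvested .agents/rpi/next-work.jsonl
    "beads",       # bd ready
    "goals",       # failing goals / directive gaps from ao goals measure
    "testing",     # missing or thin coverage, missing regression tests
    "validation",  # gate tightening, audits, bug sweeps
    "mining",      # complexity, TODO/FIXME, drift, dead code, stale docs/research
    "features",    # repo-purpose feature suggestions
)

Source = Callable[[], List[str]]


def select_work(sources: Dict[str, Source]) -> Optional[Tuple[str, str]]:
    """Return (rung, item) from the highest non-empty rung, or None."""
    for rung in LADDER:
        fetch = sources.get(rung)
        if fetch is None:
            continue  # rung not wired up in this repo
        items = fetch()  # lower rungs are never queried once a higher rung has work
        if items:
            return rung, items[0]
    return None
```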
Work generators that feed the selection ladder (auto-invoked; skip with `--no-lifecycle`):
- `Skill(skill="test", args="coverage")` → files with <40% coverage become queue items (Step 3.4)
- `Skill(skill="refactor", args="--sweep all --dry-run")` → functions with cyclomatic complexity (CC) > 20 become queue items (Step 3.6)
- `Skill(skill="deps", args="audit")` → deps with CVSS >= 7.0 or 2+ major versions behind become queue items (Step 3.5)
- `Skill(skill="perf", args="profile --quick")` → perf findings become queue items when hot paths are detected (Step 3.5)
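The generator thresholds reduce to simple predicates over findings. A minimal Python sketch. The `Finding` record and its field names are hypothetical stand-ins for whatever `/test`, `/refactor`, `/deps`, and `/perf` actually report. Only the cutoffs (<40% coverage, CC > 20, CVSS >= 7.0 or 2+ majors behind, hot path) come from the list above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Finding:
    kind: str                            # hypothetical: "coverage", "complexity", "deps", or "perf"
    target: str                          # file, function, package, or code path
    coverage_pct: Optional[float] = None
    cyclomatic: Optional[int] = None
    cvss: Optional[float] = None
    majors_behind: Optional[int] = None
    hot_path: bool = False


def qualifies(f: Finding) -> bool:
    """Apply the per-generator cutoff that turns a finding into a queue item."""
    if f.kind == "coverage":
        return f.coverage_pct is not None and f.coverage_pct < 40.0
    if f.kind == "complexity":
        return f.cyclomatic is not None and f.cyclomatic > 20
    if f.kind == "deps":
        return (f.cvss or 0.0) >= 7.0 or (f.majors_behind or 0) >= 2
    if f.kind == "perf":
        return f.hot_path
    return False


def to_queue_items(findings: List[Finding]) -> List[Tuple[str, str]]:
    return [(f.kind, f.target) for f in findings if qualifies(f)]
```

The boundaries are strict where the text is strict: a file at exactly 40% coverage or a function at exactly CC 20 does not qualify.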
Dormancy is the last resort. Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.
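The dormancy rule can be sketched as an idle-streak counter that only trips after several fully empty sweeps. A minimal Python sketch. `DORMANCY_PASSES = 3` is a hypothetical value, since the text only says "multiple". The local `idle_streak` is illustrative and is not claimed to match the session-state field of the same name.

```python
from typing import Callable, List, Optional

Layer = Callable[[], List[str]]

# Hypothetical threshold: the text only says "multiple consecutive passes".
DORMANCY_PASSES = 3


def find_work(queue_layers: List[Layer], generator_layers: List[Layer],
              passes: int = DORMANCY_PASSES) -> Optional[str]:
    """Return the first available item, or None only after `passes` empty sweeps."""
    idle_streak = 0
    while idle_streak < passes:
        # Queues first; an empty queue means "run the generators", not "stop".
        for layer in queue_layers + generator_layers:
            items = layer()
            if items:
                return items[0]
        idle_streak += 1
    return None  # real dormancy
```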
```
/evolve # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5 # Cap at 5 cycles
/evolve --dry-run # Show what would be worked on, don't execute
/evolve --beads-only # Skip goals measurement, work beads backlog only
/evolve --quality # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10 # Quality mode with cycle cap
/evolve --compile # Mine → Defrag warmup before first cycle
/evolve --compile --max-cycles=5 # Warm knowledge base then run 5 cycles
/evolve --test-first # Default strict-quality /rpi execution path
/evolve --no-test-first # Explicit opt-out from test-first mode
```
Delineation vs /dream
| Lane | Runs | Mutates code? | Mutates corpus? | Outer loop? | Budget |
|---|---|---|---|---|---|
| /dream | nightly, private local | No | Yes (heavy) | Yes (convergence) | wall-clock + plateau |
| /evolve | daytime, operator-driven | Yes (via /rpi) | Yes (light) | Yes | cycle cap |
Dream owns the knowledge compounding layer; /evolve owns the code compounding layer. Both share the fitness-measurement substrate (`corpus.Compute` / `ao goals measure`). Run Dream overnight, then start each day with /evolve against the freshly compounded corpus and a clean fitness baseline.
Flags
| Flag | Default | Description |
|---|---|---|
| `--max-cycles=N` | unlimited | Stop after N completed cycles |
| `--dry-run` | off | Show planned cycle actions without executing |
| `--beads-only` | off | Skip goal measurement and run backlog-only selection |
| `--skip-baseline` | off | Skip first-run baseline snapshot |
| `--quality` | off | Prioritize harvested post-mortem findings |
| `--compile` | off | Run `ao mine` + `ao defrag` warmup before cycle 1 |
| `--test-first` | on | Pass strict-quality defaults through to /rpi |
| `--no-test-first` | off | Explicitly disable test-first passthrough to /rpi |
| `--no-lifecycle` | off | Skip lifecycle work generators in Steps 3.4-3.6 (/test, /deps, /perf, /refactor). Falls back to manual scanning. |
| `--mode=burst\|loop` | burst | `loop` runs as an operator loop in which STOP is refused. See loop-mode.md. |
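The flag table can be read as a parser spec. A minimal Python `argparse` sketch of those defaults. It is illustrative only: the real `ao`/skill parser is not shown here, and the `evolve` program name is an assumption. `argparse.BooleanOptionalAction` (Python 3.9+) generates the paired `--test-first` / `--no-test-first` switches with test-first on by default, which matches the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="evolve")  # hypothetical program name
    p.add_argument("--max-cycles", type=int, default=None)  # None = unlimited
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--beads-only", action="store_true")
    p.add_argument("--skip-baseline", action="store_true")
    p.add_argument("--quality", action="store_true")
    p.add_argument("--compile", action="store_true")
    # One definition yields both --test-first and --no-test-first; default on.
    p.add_argument("--test-first", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--no-lifecycle", action="store_true")
    p.add_argument("--mode", choices=["burst", "loop"], default="burst")
    return p
```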
Execution Steps
YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.
FULLY AUTONOMOUS. Read references/autonomous-execution.md. Every /rpi uses --auto. Do NOT ask the user anything. Each cycle = complete 3-phase /rpi run.
For broad AgentOps 3.0 domain evolution across skills, CLI, hooks, docs, tests, beads, and knowledge, first read references/domain-evolution-bootstrap.md. It supplies the BDD/DDD/Hexagonal/TDD/XP control surface and the clean-room skill-factory guardrails.
Step 0: Setup
```bash
mkdir -p .agents/evolve
ao corpus inject --query "autonomous improvement cycle" --limit 5 2>/dev/null || true
bash scripts/evolve-update-session-state.sh 2>/dev/null || true # refresh derived idle_streak + mode_repeat_streak
```
`ao corpus inject` routes through the typed BC1 `CorpusReaderPort`
(`cli/cmd/ao/corpus_reader_adapter.go`, cycle 112 `productionCorpusReader`),
emitting one ranked `ports.CorpusItem` JSON record per line from
`.agents/learnings/` by default. This closes soc-y5vh.1: Step 0 prior-knowledge
retrieval now runs on the typed port rather than an untyped `ao lookup`
shell-out.
Apply retrieved knowledge: if learnings are returned, check each one for applicability to the current improvement cycle. For each applicable learning, cite it by filename and record the citation: `ao metrics cite "<path>" --type applied 2>/dev/null || true`
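Because `ao corpus inject` emits one JSON record per line, applying retrieved knowledge is a small JSONL filter. A minimal Python sketch that turns applicable records into `ao metrics cite` argv lists without running them. The `path` field name is an assumption; check `ports.CorpusItem` for the real one. The `is_applicable` judgment is left to the caller.

```python
import json
from typing import Any, Callable, Dict, List


def plan_citations(jsonl: str, is_applicable: Callable[[Dict[str, Any]], bool]) -> List[List[str]]:
    """Build one `ao metrics cite` argv per applicable learning (returned, not run)."""
    commands: List[List[str]] = []
    for line in jsonl.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        item = json.loads(line)
        path = item.get("path")  # assumed field name
        if path and is_applicable(item):
            commands.append(["ao", "metrics", "cite", path, "--type", "applied"])
    return commands
```

Each argv could then be passed to `subprocess.run(cmd, check=False)`, which mirrors the `|| true` in the shell form by not failing the cycle on a citation error.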
Prior-failure injection (mandatory): read the last 3 entries of `.agents/evolve/cycle-history.jsonl`. For any entry whose `gate` contains `FAIL|FAILED|BLOCKED`, extract failure-surface keywords (`registry|bats|markdown|supergate|canary|coverage|toolchain`) and search `.agents/learnings/` for matching learnings. Print the top matches before work selection. Without this read path, the loop accumulates write-only ledgers and re-derives lessons every cycle. See `references/convergence-mechanics.md` for the full recipe.
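The prior-failure read path can be sketched in a few lines. This is a minimal Python sketch, not the recipe in `references/convergence-mechanics.md`. Three things are assumptions: each history line is a JSON object with a `gate` field, keywords are scanned across the whole serialized entry, and learnings are `*.md` files ranked by keyword hit count.

```python
import json
import re
from pathlib import Path
from typing import List, Set

FAIL_GATE = re.compile(r"FAIL|FAILED|BLOCKED")
SURFACES = re.compile(r"registry|bats|markdown|supergate|canary|coverage|toolchain")


def failure_keywords(history: Path, last_n: int = 3) -> Set[str]:
    """Failure-surface keywords from the last `last_n` cycles whose gate failed or blocked."""
    if not history.exists():
        return set()
    lines = [line for line in history.read_text().splitlines() if line.strip()]
    keywords: Set[str] = set()
    for line in lines[-last_n:]:  # only the most recent entries are parsed
        entry = json.loads(line)
        if FAIL_GATE.search(str(entry.get("gate", ""))):
            # Assumption: scan the whole serialized entry for surface keywords.
            keywords.update(SURFACES.findall(json.dumps(entry).lower()))
    return keywords


def matching_learnings(learnings_dir: Path, keywords: Set[str]) -> List[Path]:
    """Learnings ranked by how many keywords they mention (most first)."""
    scored = []
    for path in sorted(learnings_dir.glob("*.md")):  # assumed: learnings are Markdown files
        text = path.read_text(errors="ignore").lower()
        hits = sum(1 for kw in keywords if kw in text)
        if hits:
            scored.append((hits, path))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps filename order on ties
    return [path for _, path in scored]
```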
Before cycle recovery, load the repo execution profile contract when it exists. The repo execution profile is the source for repo policy; the user prompt should mostly supply mission/objective, not restate startup reads, validation bundle, tracker wrapper rules, or definition_of_done.
- Locate `docs/contracts/repo-execution-profile.md` and `docs/contracts/repo-execution-profile.schema.json`.
- Read the ordered `startup_reads` and bootstrap from those repo paths before selecting work.
- Cache repo `validation_commands`, `tracker_commands`, and `definition_of_done` into session state.
- If the repo execution profile is present but missing required fields, stop or downgrade with an explicit warning before cycle 1. Do not silently invent repo policy.
- Read operating-doctrine ADRs (`docs/adr/` or `docs/decisions/`) when present. They carry intent the loop re-reads each cycle: only operator markers stop the loop; the bead queue is a hypothesis to re-confirm against the goal, not a spec; file a bead when a candidate is architecture disguised as bounded work.
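The "stop or downgrade, never invent" rule for the execution profile can be sketched as a validating cache step. A minimal Python sketch, assuming the profile has already been parsed into a dict. Treating `startup_reads`, `validation_commands`, `tracker_commands`, and `definition_of_done` as the required set is a hypothetical choice; the real schema file defines what is required. `ProfileError` is a hypothetical name.

```python
import warnings
from typing import Any, Dict, Optional

# Hypothetical required set; the schema file is the real authority.
REQUIRED = ("startup_reads", "validation_commands", "tracker_commands", "definition_of_done")


class ProfileError(Exception):
    """Raised when strict mode refuses to start with an incomplete profile."""


def cache_profile(profile: Optional[Dict[str, Any]], strict: bool = True) -> Dict[str, Any]:
    """Return the fields to cache into session state; never fills in missing policy."""
    if profile is None:
        return {}  # no profile in this repo: nothing to cache
    missing = [key for key in REQUIRED if key not in profile]
    if missing:
        msg = "repo execution profile missing required fields: " + ", ".join(missing)
        if strict:
            raise ProfileError(msg)  # stop before cycle 1
        warnings.warn(msg)  # downgrade, with an explicit warning
    return {key: profile[key] for key in REQUIRED if key in profile}
```

In downgrade mode the cache holds only what the repo actually declared, so later steps can tell "absent" apart from "empty".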
Then load the repo-local autodev program contract when it exists. The execution profile remains the repo bootstrap and landing-policy layer; PROGRAM.md or AUTODEV.md is the repo-local execution layer for the current improvement loop.
- Locate `PROGRAM.md` and `AUTODEV.md`. `PROGRAM.md` takes precedence.
- Read the resolved program before cycle recovery and cache
`program_path`, `mutable_scope`, `immutable_scope`, `validation_commands`, `decision_policy`, an