/evolve — Goal-Driven Compounding Loop

Cross-vendor analog: Anthropic Managed Agents Outcomes (May 2026). Both close the loop "agent runs → grader scores against a rubric → agent retries"; AgentOps does it locally against any model.

Measure what's wrong. Fix the worst thing. Measure again. Compound.

V2 command surface: keep the name evolve. Use ao evolve for the terminal-native loop. It is the top-level operator entrypoint for ao rpi loop --supervisor, preserving the old /evolve concept while reusing the v2 RPI loop engine.

Operator cadence: post-mortem finished work; analyze the current repo state; select or create the next highest-value work item; let /rpi handle research, planning, pre-mortem, implementation, and validation; then harvest follow-ups. Repeat until a kill switch, max-cycle cap, regression breaker, or real dormancy stops the run.
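
The four stop conditions named in the cadence can be sketched as a single check evaluated between cycles. This is a minimal illustration, not the loop's actual implementation: the function name, the boolean inputs, the returned labels, and the precedence order are all assumptions.

```python
from typing import Optional


def stop_reason(
    cycles_done: int,
    max_cycles: Optional[int],
    kill_switch: bool,
    regression_tripped: bool,
    dormant: bool,
) -> Optional[str]:
    """Return why the loop should stop, or None to run another cycle.

    The precedence (kill switch first, dormancy last) is an assumption
    made for this sketch; the source only lists the four conditions.
    """
    if kill_switch:
        return "kill-switch"
    if regression_tripped:
        return "regression-breaker"
    # max_cycles=None models the "unlimited" default of --max-cycles.
    if max_cycles is not None and cycles_done >= max_cycles:
        return "max-cycles"
    if dormant:
        return "dormant"
    return None
```

With `max_cycles=None` the cap never fires, so only a kill switch, a tripped breaker, or dormancy ends the run.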

Always-on autonomous loop over /rpi. Work selection order:

1. Harvested .agents/rpi/next-work.jsonl work (freshest concrete follow-up)
2. Open ready beads work (bd ready)
3. Failing goals and directive gaps (ao goals measure)
4. Testing improvements (missing/thin coverage, missing regression tests)
5. Validation tightening and bug-hunt passes (gates, audits, bug sweeps)
6. Complexity / TODO / FIXME / drift / dead code / stale docs / stale research mining
7. Concrete feature suggestions derived from repo purpose when no sharper work exists
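
The ladder above amounts to "take the first rung that yields work". A minimal sketch, assuming each rung is wrapped in a callable that returns candidate items; `select_work`, the rung names, and the sample items are hypothetical and stand in for reading next-work.jsonl, running `bd ready`, running `ao goals measure`, and so on.

```python
from typing import Callable, List, Optional, Tuple

# A rung is (name, source); the source returns candidate work items.
Rung = Tuple[str, Callable[[], List[str]]]


def select_work(ladder: List[Rung]) -> Optional[Tuple[str, str]]:
    """Return (rung_name, item) from the first rung that yields work."""
    for rung_name, source in ladder:
        items = source()
        if items:
            return rung_name, items[0]
    # Every rung was empty: the caller should run the generator layers
    # rather than stop.
    return None


# Hypothetical rungs, listed in the same order as the ladder.
example_ladder: List[Rung] = [
    ("harvested", lambda: []),
    ("beads", lambda: ["example ready bead"]),
    ("goals", lambda: ["example failing goal"]),
]
```

Here `select_work(example_ladder)` picks the beads item, because the harvested rung is empty and later rungs are never consulted once an earlier one has work.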

Work generators that feed the selection ladder (auto-invoked, skip with --no-lifecycle):

- Skill(skill="test", args="coverage") → files with <40% coverage become queue items (Step 3.4)
- Skill(skill="refactor", args="--sweep all --dry-run") → functions with CC > 20 become queue items (Step 3.6)
- Skill(skill="deps", args="audit") → deps with CVSS >= 7.0 or 2+ major versions behind become queue items (Step 3.5)
- Skill(skill="perf", args="profile --quick") → perf findings become queue items when hot paths are detected (Step 3.5)
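
The numeric thresholds in the generator list can be expressed as simple filters. The sketch below assumes each skill's findings have already been parsed into plain dictionaries; the input shapes and the keys `name`, `cvss`, and `majors_behind` are hypothetical, since the skills' real output formats are not shown here. The perf generator is omitted because the text gives no numeric threshold for it.

```python
from typing import Dict, List

# Thresholds quoted from the generator list above.
COVERAGE_FLOOR_PCT = 40.0   # files below this coverage are queued
COMPLEXITY_CEILING = 20     # functions with CC above this are queued
CVSS_FLOOR = 7.0            # deps at or above this severity are queued
MAJOR_VERSIONS_BEHIND = 2   # deps at least this many majors behind are queued


def thin_coverage(coverage_pct_by_file: Dict[str, float]) -> List[str]:
    """Files whose coverage is strictly below the floor."""
    return sorted(
        path for path, pct in coverage_pct_by_file.items()
        if pct < COVERAGE_FLOOR_PCT
    )


def complex_functions(cc_by_function: Dict[str, int]) -> List[str]:
    """Functions whose cyclomatic complexity is strictly above the ceiling."""
    return sorted(
        name for name, cc in cc_by_function.items()
        if cc > COMPLEXITY_CEILING
    )


def risky_deps(deps: List[dict]) -> List[str]:
    """Dependencies that trip either the CVSS or the version-lag rule."""
    return sorted(
        dep["name"] for dep in deps
        if dep.get("cvss", 0.0) >= CVSS_FLOOR
        or dep.get("majors_behind", 0) >= MAJOR_VERSIONS_BEHIND
    )
```

Note the boundary behavior implied by the wording: exactly 40% coverage and exactly CC 20 are not queued, while exactly CVSS 7.0 is.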

Dormancy is last resort. Empty current queues mean "run the generator layers", not "stop". Only go dormant after the queue layers and generator layers come up empty across multiple consecutive passes.
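
One way to picture the dormancy rule is a streak counter that resets whenever any layer produces work. This is a sketch under assumptions: the function names are hypothetical, and `required_empty_passes` is left as a caller-supplied value because the text says only "multiple consecutive passes".

```python
def next_idle_streak(idle_streak: int, queue_items: int, generator_items: int) -> int:
    """Extend the streak only when both the queue and generator layers are empty."""
    if queue_items > 0 or generator_items > 0:
        return 0
    return idle_streak + 1


def should_go_dormant(idle_streak: int, required_empty_passes: int) -> bool:
    """Dormancy needs a run of consecutive empty passes, never a single one."""
    return idle_streak >= required_empty_passes
```

An empty queue with a non-empty generator layer resets the streak to zero, which matches "run the generator layers" rather than "stop".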

/evolve                      # Run until kill switch, max-cycles, or real dormancy
/evolve --max-cycles=5       # Cap at 5 cycles
/evolve --dry-run            # Show what would be worked on, don't execute
/evolve --beads-only         # Skip goals measurement, work beads backlog only
/evolve --quality            # Quality-first mode: prioritize post-mortem findings
/evolve --quality --max-cycles=10  # Quality mode with cycle cap
/evolve --compile            # Mine → Defrag warmup before first cycle
/evolve --compile --max-cycles=5 # Warm knowledge base then run 5 cycles
/evolve --test-first         # Default strict-quality /rpi execution path
/evolve --no-test-first      # Explicit opt-out from test-first mode

Delineation vs /dream

| Lane | Runs | Mutates code? | Mutates corpus? | Outer loop? | Budget |
|---|---|---|---|---|---|
| `/dream` | nightly, private local | No | Yes (heavy) | Yes (convergence) | wall-clock + plateau |
| `/evolve` | daytime, operator-driven | Yes (via `/rpi`) | Yes (light) | Yes | cycle cap |

Dream owns the knowledge compounding layer; /evolve owns the code compounding layer. Both share a fitness-measurement substrate via corpus.Compute / ao goals measure. Run Dream overnight, then start each day with /evolve against the freshly compounded corpus and a clean fitness baseline.

Flags

| Flag | Default | Description |
|---|---|---|
| `--max-cycles=N` | unlimited | Stop after `N` completed cycles |
| `--dry-run` | off | Show planned cycle actions without executing |
| `--beads-only` | off | Skip goal measurement and run backlog-only selection |
| `--skip-baseline` | off | Skip first-run baseline snapshot |
| `--quality` | off | Prioritize harvested post-mortem findings |
| `--compile` | off | Run `ao mine` + `ao defrag` warmup before cycle 1 |
| `--test-first` | on | Pass strict-quality defaults through to `/rpi` |
| `--no-test-first` | off | Explicitly disable test-first passthrough to `/rpi` |
| `--no-lifecycle` | off | Skip lifecycle work generators in Steps 3.4-3.6 (/test, /deps, /perf, /refactor). Falls back to manual scanning. |
| `--mode=burst\|loop` | burst | Operator-loop; STOP refused. loop-mode.md. |
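
The defaults in the flag table can be modeled with a small argument parser. /evolve is a skill invocation rather than a Python program, so this is only an illustration of how the flags and defaults relate; `build_parser` is a hypothetical name, and treating `--test-first` / `--no-test-first` as one on/off pair is an assumption drawn from their descriptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the /evolve flag table; defaults mirror the Default column."""
    parser = argparse.ArgumentParser(prog="evolve")
    # None models the "unlimited" default.
    parser.add_argument("--max-cycles", type=int, default=None)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--beads-only", action="store_true")
    parser.add_argument("--skip-baseline", action="store_true")
    parser.add_argument("--quality", action="store_true")
    parser.add_argument("--compile", action="store_true")
    # BooleanOptionalAction (Python 3.9+) registers both --test-first
    # and --no-test-first against a single destination.
    parser.add_argument(
        "--test-first", action=argparse.BooleanOptionalAction, default=True
    )
    parser.add_argument("--no-lifecycle", action="store_true")
    parser.add_argument("--mode", choices=["burst", "loop"], default="burst")
    return parser
```

For example, `build_parser().parse_args(["--quality", "--max-cycles=10"])` corresponds to the quality-mode invocation with a cycle cap.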

Execution Steps

YOU MUST EXECUTE THIS WORKFLOW. Do not just describe it.

FULLY AUTONOMOUS. Read references/autonomous-execution.md. Every /rpi uses --auto. Do NOT ask the user anything. Each cycle = complete 3-phase /rpi run.

For broad AgentOps 3.0 domain evolution across skills, CLI, hooks, docs, tests, beads, and knowledge, first read references/domain-evolution-bootstrap.md. It supplies the BDD/DDD/Hexagonal/TDD/XP control surface and the clean-room skill-factory guardrails.

Step 0: Setup

mkdir -p .agents/evolve
ao corpus inject --query "autonomous improvement cycle" --limit 5 2>/dev/null || true
bash scripts/evolve-update-session-state.sh 2>/dev/null || true  # refresh derived idle_streak + mode_repeat_streak

ao corpus inject routes through the typed BC1 CorpusReaderPort (cli/cmd/ao/corpus_reader_adapter.go, cycle 112 productionCorpusReader), emitting one ranked ports.CorpusItem JSON record per line from .agents/learnings/ by default. This closes soc-y5vh.1 — Step 0 prior-knowledge retrieval is now load-bearing on the typed port, not an untyped ao lookup shell-out.

Apply retrieved knowledge: If learnings are returned, check each for applicability to the current improvement cycle. For applicable learnings, cite by filename and record: ao metrics cite "<path>" --type applied 2>/dev/null || true
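
The retrieve-then-cite flow can be sketched as two small helpers: one that parses the one-record-per-line output, and one that builds the `ao metrics cite` command for each applicable learning. The `path` key is a hypothetical field name, since the actual fields of ports.CorpusItem are not shown here, and the applicability check is supplied by the caller.

```python
import json
from typing import Callable, List


def parse_items(stdout: str) -> List[dict]:
    """Parse one JSON record per line, skipping blank or malformed lines."""
    items = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate noise, mirroring the `|| true` in Step 0
        if isinstance(record, dict):
            items.append(record)
    return items


def cite_commands(
    items: List[dict], is_applicable: Callable[[dict], bool]
) -> List[List[str]]:
    """Build one `ao metrics cite <path> --type applied` argv per applicable item."""
    return [
        # "path" is an assumed key; adjust to the real record shape.
        ["ao", "metrics", "cite", item["path"], "--type", "applied"]
        for item in items
        if "path" in item and is_applicable(item)
    ]
```

Returning argument lists rather than shell strings keeps filenames with spaces or quotes safe if the commands are later run with `subprocess.run`.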

Prior-failure injection (mandatory): read the last 3 entries of .agents/evolve/cycle-history.jsonl. For any with gate containing FAIL|FAILED|BLOCKED, extract failure-surface keywords (registry|bats|markdown|supergate|canary|coverage|toolchain) and search .agents/learnings/ for matching learnings. Print the top matches before work selection. Without this read path, the loop accumulates write-only ledgers and re-derives lessons each cycle. See references/convergence-mechanics.md for the full recipe.
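
The read path described above can be sketched in two steps: collect keywords from recent failed cycles, then rank learnings by how many of those keywords they mention. This is an illustrative sketch, not the recipe from references/convergence-mechanics.md. Only the `gate` field and the two regular expressions come from the text; the function names, scanning the whole serialized entry for keywords, and ranking by keyword count are assumptions.

```python
import json
import re
from pathlib import Path
from typing import List

FAIL_GATE = re.compile(r"FAIL|FAILED|BLOCKED")
SURFACE = re.compile(r"registry|bats|markdown|supergate|canary|coverage|toolchain")


def failure_keywords(history_path: Path, last_n: int = 3) -> List[str]:
    """Keywords found in the last_n history entries whose gate records a failure."""
    if not history_path.exists():
        return []
    lines = [ln for ln in history_path.read_text().splitlines() if ln.strip()]
    found = set()
    for line in lines[-last_n:]:
        entry = json.loads(line)
        if FAIL_GATE.search(str(entry.get("gate", ""))):
            # Assumption: scan the whole serialized entry for surface keywords.
            found.update(SURFACE.findall(line))
    return sorted(found)


def matching_learnings(learnings_dir: Path, keywords: List[str], top: int = 5) -> List[str]:
    """Rank learning files by how many of the keywords they mention."""
    scored = []
    for path in sorted(learnings_dir.glob("*")):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore").lower()
        hits = sum(1 for keyword in keywords if keyword in text)
        if hits:
            scored.append((hits, path.name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top]]
```

A typical call would be `matching_learnings(Path(".agents/learnings"), failure_keywords(Path(".agents/evolve/cycle-history.jsonl")))`, with the result printed before work selection.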

Before cycle recovery, load the repo execution profile contract when it exists. The repo execution profile is the source for repo policy; the user prompt should mostly supply mission/objective, not restate startup reads, validation bundle, tracker wrapper rules, or definition_of_done.

1. Locate docs/contracts/repo-execution-profile.md and docs/contracts/repo-execution-profile.schema.json.
2. Read the ordered startup_reads and bootstrap from those repo paths before selecting work.
3. Cache repo validation_commands, tracker_commands, and definition_of_done into session state.
4. If the repo execution profile is present but missing required fields, stop or downgrade with an explicit warning before cycle 1. Do not silently invent repo policy.
5. Read operating-doctrine ADRs (docs/adr/ or docs/decisions/) when present. They record intent the loop re-reads each cycle: only operator markers stop the loop; the bead queue is a hypothesis re-confirmed against the goal, not a spec; file a bead when a candidate is architecture disguised as bounded work.
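
The "do not silently invent repo policy" rule can be sketched as a check that refuses to proceed when fields are missing. This assumes the profile has already been parsed into a dictionary. The field list below is an assumption taken from the names mentioned in the steps above; the authoritative required set lives in repo-execution-profile.schema.json, which is not shown here. Only the "stop" branch is modeled, not the downgrade path.

```python
from typing import Dict, List

# Assumed required fields; the real list is defined by the schema file.
ASSUMED_REQUIRED_FIELDS = (
    "startup_reads",
    "validation_commands",
    "tracker_commands",
    "definition_of_done",
)


def missing_fields(profile: Dict[str, object]) -> List[str]:
    """Names of assumed-required fields that are absent or empty."""
    return [name for name in ASSUMED_REQUIRED_FIELDS if not profile.get(name)]


def cache_policy(profile: Dict[str, object]) -> Dict[str, object]:
    """Return the policy to cache in session state, or stop with an explicit error."""
    missing = missing_fields(profile)
    if missing:
        raise ValueError(
            "repo execution profile is missing required fields: " + ", ".join(missing)
        )
    return {name: profile[name] for name in ASSUMED_REQUIRED_FIELDS}
```

Raising instead of filling in defaults keeps the failure visible before cycle 1, which is the behavior the step asks for.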

Then load the repo-local autodev program contract when it exists. The execution profile remains the repo bootstrap and landing-policy layer; PROGRAM.md or AUTODEV.md is the repo-local execution layer for the current improvement loop.

Locate PROGRAM.md and AUTODEV.md. PROGRAM.md takes precedence.
Read the resolved program before cycle recovery and cache program_path, mutable_scope, immutable_scope, validation_commands, decision_policy, an
