Auto Review Loop: Autonomous Research Improvement
🔒 Do not wrap this skill in /loop, /schedule, or CronCreate. It already loops internally (review → fix → re-review) and the reviewer carries round-to-round memory in one threadId (codex-reply). An external timer re-enters from the top each tick with a fresh threadId and the reviewer memory reset, firing the verdict on wall-clock time instead of on artifact change: zero new signal, full token cost. If you want to schedule something, schedule the external wait that precedes it (experiments done → then run this once). See shared-references/external-cadence.md.
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10 AND verdict ∈ {"ready", "almost"}; both must hold. This matches the operative Phase-E STOP CONDITION exactly; the verdict vocabulary is {"ready", "almost", "not ready"} (a high score with a "not ready" verdict does NOT stop the loop). Earlier wording here used OR and a stale verdict set ("accept"/"sufficient"/"ready for submission"); that was an internal inconsistency, and the AND form is authoritative.
- REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log) (fall back to ./AUTO_REVIEW.md for legacy projects)
- REVIEWER_MODEL = gpt-5.5: Default model for the Codex backend. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o). Manual backend uses whatever model the user chooses.
- REVIEWER_BACKEND = codex: Defaults to Codex MCP (xhigh). Override with — reviewer: oracle-pro for Oracle MCP, or — reviewer: manual for Manual Review MCP. If manual-review MCP is unavailable, stop and print the install command; do not fall back to Codex. See shared-references/reviewer-routing.md.
- OUTPUT_DIR = review-stage/: All review-stage outputs go here. Create the directory if it doesn't exist.
- HUMAN_CHECKPOINT = false: When true, pause after each round's review (Phase B) and present the score + weaknesses to the user. Wait for user input before proceeding to Phase C. The user can: approve the suggested fixes, provide custom modification instructions, skip specific fixes, or stop the loop early. When false (default), the loop runs fully autonomously.
- COMPACT = false: When true, (1) read EXPERIMENT_LOG.md and findings.md instead of parsing full logs on session recovery, (2) append key findings to findings.md after each round.
- REVIEWER_DIFFICULTY = medium: Controls how adversarial the reviewer is. Three levels:
  - medium (default): Current behavior. MCP-based review; the executor controls what context the reviewer sees.
  - hard: Adds Reviewer Memory (the reviewer tracks its own suspicions across rounds) + Debate Protocol (the executor can rebut, the reviewer rules).
  - nightmare: Everything in hard + Codex exec reviewer reads the repo directly via codex exec (the executor cannot filter what the reviewer sees) + Adversarial Verification (the reviewer independently checks if code matches claims).
- RENDER_HTML = true: When true (default), auto-render review-stage/AUTO_REVIEW.md to HTML on loop termination via /render-html. Uses --no-review (the loop itself IS the cross-model review; the HTML is a structural conversion). Set false to skip, or pass — render html: false.
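The POSITIVE_THRESHOLD rule above can be sketched as a small predicate. This is a minimal illustration, not the skill's actual implementation: the function names `is_positive` and `should_stop` are hypothetical, and rejecting verdicts outside the documented vocabulary is an assumption.

```python
# Hypothetical helper names; only the constants' values come from this document.
MAX_ROUNDS = 4
SCORE_THRESHOLD = 6.0
VERDICT_VOCABULARY = {"ready", "almost", "not ready"}
POSITIVE_VERDICTS = {"ready", "almost"}


def is_positive(score: float, verdict: str) -> bool:
    """Return True only when BOTH the score and the verdict conditions hold."""
    normalized = verdict.strip().lower()
    if normalized not in VERDICT_VOCABULARY:
        # Assumption: an unknown verdict is treated as an error, not as a stop.
        raise ValueError(f"unknown verdict: {verdict!r}")
    return score >= SCORE_THRESHOLD and normalized in POSITIVE_VERDICTS


def should_stop(round_number: int, score: float, verdict: str,
                max_rounds: int = MAX_ROUNDS) -> bool:
    """Stop on a positive assessment or once the round budget is used up."""
    return is_positive(score, verdict) or round_number >= max_rounds
```

Under this sketch, `is_positive(9.0, "not ready")` is false, which mirrors the note that a high score alone does not end the loop.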
⚠️ Nightmare + Manual incompatibility: If REVIEWER_BACKEND = manual and REVIEWER_DIFFICULTY = nightmare, STOP with: "difficulty: nightmare requires Codex CLI / codex exec and is not compatible with --reviewer: manual. Use difficulty: hard, or switch reviewer to codex."
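A minimal sketch of this guard, assuming the settings arrive as plain strings. The function name `check_reviewer_config` and the use of `ValueError` to signal the stop are illustrative choices; the allowed values and the message text are taken from this document.

```python
# Allowed values as listed in this document.
BACKENDS = {"codex", "oracle-pro", "manual"}
DIFFICULTIES = {"medium", "hard", "nightmare"}

INCOMPATIBLE_MESSAGE = (
    "difficulty: nightmare requires Codex CLI / codex exec and is not "
    "compatible with --reviewer: manual. Use difficulty: hard, or switch "
    "reviewer to codex."
)


def check_reviewer_config(backend: str, difficulty: str) -> None:
    """Raise ValueError before the loop starts if the settings cannot work."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown reviewer backend: {backend!r}")
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    if backend == "manual" and difficulty == "nightmare":
        raise ValueError(INCOMPATIBLE_MESSAGE)
```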
💡 Override:
/auto-review-loop "topic" — compact: true, human checkpoint: true, difficulty: hard
Reviewer Calling Convention
When calling the reviewer, branch on REVIEWER_BACKEND:
If REVIEWER_BACKEND = codex:
Use mcp__codex__codex for new review threads.
Use mcp__codex__codex-reply for follow-up rounds (reuse threadId).
If REVIEWER_BACKEND = manual:
Use mcp__manual_review__review for new review threads with:
prompt: [exact same prompt that would go to Codex]
config: {"model_reasoning_effort": "xhigh"}
Save the returned threadId.
Use mcp__manual_review__review_reply for follow-up rounds with:
threadId: [saved manual-review threadId]
prompt: [follow-up prompt]
config: {"model_reasoning_effort": "xhigh"}
Prompt fidelity: the manual prompt must be exactly the same text that Codex would receive. Review tracing applies equally to both backends.
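The branching above can be sketched as a function that picks the tool name and arguments for one reviewer call. The tool names come from this section; the function name `build_reviewer_call`, the returned dictionary shape, and the argument set for the Codex follow-up call are assumptions made for illustration. The oracle-pro backend is deliberately left out because its tool names are not given here.

```python
from typing import Optional

XHIGH = {"model_reasoning_effort": "xhigh"}


def build_reviewer_call(backend: str, prompt: str,
                        thread_id: Optional[str] = None) -> dict:
    """Describe one reviewer call: a new thread, or a follow-up if thread_id is set."""
    if backend == "codex":
        if thread_id is None:
            return {"tool": "mcp__codex__codex",
                    "arguments": {"prompt": prompt, "config": dict(XHIGH)}}
        # Assumption: the follow-up call needs only the thread and the prompt.
        return {"tool": "mcp__codex__codex-reply",
                "arguments": {"threadId": thread_id, "prompt": prompt}}
    if backend == "manual":
        if thread_id is None:
            return {"tool": "mcp__manual_review__review",
                    "arguments": {"prompt": prompt, "config": dict(XHIGH)}}
        return {"tool": "mcp__manual_review__review_reply",
                "arguments": {"threadId": thread_id, "prompt": prompt,
                              "config": dict(XHIGH)}}
    # Other backends are routed per shared-references/reviewer-routing.md.
    raise ValueError(f"backend not covered by this sketch: {backend!r}")
```

Because the same `prompt` string is passed on both branches, the prompt-fidelity requirement holds by construction in this sketch.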
State Persistence (Compact Recovery)
Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to review-stage/REVIEW_STATE.json after each round:
{
  "round": 2,
  "threadId": "019cd392-...",
  "status": "in_progress",
  "difficulty": "medium",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": ["screen_name_1"],
  "timestamp": "2026-03-13T21:00:00"
}
Write this file at the end of every Phase E (after documenting the round). Overwrite each time — only the latest state matters.
On completion (positive assessment or max rounds), set "status": "completed" so future invocations don't accidentally resume a finished loop.
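One way to write this file is sketched below. The field names and the path follow the example above; the function name `save_state` and the write-to-temp-then-rename step are assumptions added so that an interrupted write does not leave a half-written state file.

```python
import json
from datetime import datetime
from pathlib import Path

STATE_PATH = Path("review-stage/REVIEW_STATE.json")


def save_state(round_number: int, thread_id: str, status: str, difficulty: str,
               last_score: float, last_verdict: str, pending_experiments: list,
               path: Path = STATE_PATH) -> dict:
    """Overwrite the state file with the latest round's state and return it."""
    state = {
        "round": round_number,
        "threadId": thread_id,
        "status": status,  # "in_progress" or "completed"
        "difficulty": difficulty,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: write to a sibling temp file, then rename over the target.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
    return state
```

Calling it with `status="completed"` after the final round covers the completion case described above.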
Output Protocols
Follow these shared protocols for all output files:
- Output Versioning Protocol — write timestamped file first, then copy to fixed name
- Output Manifest Protocol — log every output to MANIFEST.md
- Output Language Protocol — respect the project's language setting
Workflow
Initialization
- Check for review-stage/REVIEW_STATE.json (fall back to the legacy path ./REVIEW_STATE.json if not found):
  - If neither path exists: fresh start (normal case, identical to behavior before this feature existed)
  - If it exists AND status is "completed": fresh start (previous loop finished normally)
  - If it exists AND status is "in_progress" AND timestamp is older than 24 hours: fresh start (stale state from a killed/abandoned run; delete the file and start over)
  - If it exists AND status is "in_progress" AND timestamp is within 24 hours: resume
    - Read the state file to recover round, threadId, last_score, pending_experiments
    - Read review-stage/AUTO_REVIEW.md to restore full context of prior rounds (fall back to ./AUTO_REVIEW.md)
    - If pending_experiments is non-empty, check if they have completed (e.g., check screen sessions)
    - Resume from the next round (round = saved round + 1)
    - Log: "Recovered from context compaction. Resuming at Round N."
- Read project narrative documents, memory files, and any prior review documents. When COMPACT = true and compact files exist: read findings.md + EXPERIMENT_LOG.md instead of the full review-stage/AUTO_REVIEW.md and raw logs to save context window.
- Read recent experiment results (check output directories, logs)
- Identify current weaknesses and open TODOs from prior reviews
- Initialize round counter = 1 (unless recovered from state file)
- Create/update review-stage/AUTO_REVIEW.md with header and timestamp
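The fresh-start versus resume decision in the first initialization step can be sketched as follows. The function names `load_state` and `decide_start` are hypothetical, and two details are assumptions: a state exactly 24 hours old counts as "within 24 hours", and an unrecognized status falls back to a fresh start. Deleting a stale file is left to the caller.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Tuple

# Primary path first, then the legacy fallback.
STATE_PATHS = (Path("review-stage/REVIEW_STATE.json"), Path("REVIEW_STATE.json"))
STALE_AFTER = timedelta(hours=24)


def load_state(paths=STATE_PATHS) -> Optional[dict]:
    """Return the first state file found, or None if neither path exists."""
    for path in paths:
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
    return None


def decide_start(state: Optional[dict], now: datetime) -> Tuple[str, int]:
    """Return ("fresh", 1) or ("resume", next_round)."""
    if state is None or state.get("status") == "completed":
        return ("fresh", 1)
    if state.get("status") == "in_progress":
        age = now - datetime.fromisoformat(state["timestamp"])
        if age > STALE_AFTER:
            return ("fresh", 1)  # stale: the caller should delete the file
        return ("resume", state["round"] + 1)
    return ("fresh", 1)  # assumption: unknown status is treated as no state
```

A typical call would be `decide_start(load_state(), datetime.now())`, with the returned round number used to seed the round counter.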
Loop (repeat up to MAX_ROUNDS)
Phase A: Review
Route by REVIEWER_DIFFICULTY:
Medium (default) — MCP Review
Send comprehensive context to the external reviewer using the selected backend.
For codex backend:
mcp__codex__codex:
config: {"model_reasoning_effort": "xhigh"}
prompt: |
[Round N/MAX_ROUNDS of autonomous review loop]
[Full research context: claims, methods, results, known weaknesses]
[Changes since last round