Auto Review Loop: Autonomous Research Improvement

🔒 Do not wrap this skill in /loop, /schedule, or CronCreate. It already loops internally (review → fix → re-review) and the reviewer carries round-to-round memory in one threadId (codex-reply). An external timer re-enters from the top each tick — fresh threadId, reviewer memory reset — firing the verdict on wall-clock time instead of on artifact change: zero new signal, full token cost. If you want to schedule something, schedule the external wait that precedes it (experiments done → then run this once). See shared-references/external-cadence.md.

Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.

Context: $ARGUMENTS

Constants

MAX_ROUNDS = 4
POSITIVE_THRESHOLD: score >= 6/10 AND verdict ∈ {"ready", "almost"}. Both conditions must hold. This matches the operative Phase E STOP CONDITION exactly; the verdict vocabulary is {"ready", "almost", "not ready"}, so a high score with a "not ready" verdict does NOT stop the loop. Earlier wording here used "or" and a stale verdict set ("accept"/"sufficient"/"ready for submission"); that was an internal inconsistency, and the AND form is authoritative.
REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log; fall back to ./AUTO_REVIEW.md for legacy projects)
REVIEWER_MODEL = gpt-5.5 — Default model for the Codex backend. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o). Manual backend uses whatever model the user chooses.
REVIEWER_BACKEND = codex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for Oracle MCP, or — reviewer: manual for Manual Review MCP. If manual-review MCP is unavailable, stop and print the install command; do not fall back to Codex. See shared-references/reviewer-routing.md.
OUTPUT_DIR = review-stage/ — All review-stage outputs go here. Create the directory if it doesn't exist.
HUMAN_CHECKPOINT = false — When true, pause after each round's review (Phase B) and present the score + weaknesses to the user. Wait for user input before proceeding to Phase C. The user can: approve the suggested fixes, provide custom modification instructions, skip specific fixes, or stop the loop early. When false (default), the loop runs fully autonomously.
COMPACT = false — When true, (1) read EXPERIMENT_LOG.md and findings.md instead of parsing full logs on session recovery, (2) append key findings to findings.md after each round.
REVIEWER_DIFFICULTY = medium — Controls how adversarial the reviewer is. Three levels:
- medium (default): Current behavior — MCP-based review, the executor controls what context the reviewer sees.
- hard: Adds Reviewer Memory (the reviewer tracks its own suspicions across rounds) + Debate Protocol (the executor can rebut, the reviewer rules).
- nightmare: Everything in hard, plus (1) the reviewer reads the repo directly via codex exec, so the executor cannot filter what the reviewer sees, and (2) Adversarial Verification, where the reviewer independently checks whether the code matches the claims.
RENDER_HTML = true — When true (default), auto-render review-stage/AUTO_REVIEW.md to HTML on loop termination via /render-html. Uses --no-review (the loop itself IS the cross-model review; the HTML is a structural conversion). Set false to skip, or pass — render html: false.
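
As a minimal sketch of the POSITIVE_THRESHOLD rule above, the stop check can be written as a single predicate. The function name `is_positive` and the whitespace/case normalization of the verdict are assumptions made for illustration; the threshold and verdict sets come from the constants above.

```python
# Values taken from the Constants section; names are illustrative.
POSITIVE_VERDICTS = {"ready", "almost"}
SCORE_THRESHOLD = 6.0  # score is out of 10


def is_positive(score: float, verdict: str) -> bool:
    """Return True only when BOTH the score and the verdict conditions hold."""
    # Assumption: verdicts are compared case-insensitively after trimming.
    normalized = verdict.strip().lower()
    # "not ready" (or any unrecognized verdict) never stops the loop,
    # no matter how high the score is.
    return score >= SCORE_THRESHOLD and normalized in POSITIVE_VERDICTS


print(is_positive(6.0, "almost"))     # True
print(is_positive(9.0, "not ready"))  # False
print(is_positive(5.5, "ready"))      # False
```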

⚠️ Nightmare + Manual incompatibility: If REVIEWER_BACKEND = manual and REVIEWER_DIFFICULTY = nightmare, STOP with: "difficulty: nightmare requires Codex CLI / codex exec and is not compatible with --reviewer: manual. Use difficulty: hard, or switch reviewer to codex."
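
The incompatibility rule above could be enforced before the loop starts. The following is a sketch only: `validate_settings` is a hypothetical helper, and raising `ValueError` is an assumed way of stopping; the allowed values and the error text are copied from this document.

```python
# Allowed values as listed in the Constants section.
BACKENDS = {"codex", "oracle-pro", "manual"}
DIFFICULTIES = {"medium", "hard", "nightmare"}

NIGHTMARE_MANUAL_ERROR = (
    "difficulty: nightmare requires Codex CLI / codex exec and is not "
    "compatible with --reviewer: manual. Use difficulty: hard, or switch "
    "reviewer to codex."
)


def validate_settings(backend: str = "codex", difficulty: str = "medium") -> None:
    """Raise ValueError if the backend/difficulty combination is not usable."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown reviewer backend: {backend!r}")
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    # nightmare needs codex exec, which the manual backend cannot provide.
    if backend == "manual" and difficulty == "nightmare":
        raise ValueError(NIGHTMARE_MANUAL_ERROR)
```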

💡 Override: /auto-review-loop "topic" — compact: true, human checkpoint: true, difficulty: hard

Reviewer Calling Convention

When calling the reviewer, branch on REVIEWER_BACKEND:

If REVIEWER_BACKEND = codex: Use mcp__codex__codex for new review threads. Use mcp__codex__codex-reply for follow-up rounds (reuse threadId).

If REVIEWER_BACKEND = manual: Use mcp__manual_review__review for new review threads with:
  prompt: [exact same prompt that would go to Codex]
  config: {"model_reasoning_effort": "xhigh"}
Save the returned threadId. Use mcp__manual_review__review_reply for follow-up rounds with:
  threadId: [saved manual-review threadId]
  prompt: [follow-up prompt]
  config: {"model_reasoning_effort": "xhigh"}

Prompt fidelity: the manual prompt must be exactly the same text that Codex would receive. Review tracing applies equally to both backends.
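
The branching above can be summarized as a small dispatch table. This is an illustrative sketch, not the skill's implementation: `build_reviewer_call` is a hypothetical helper that only assembles the tool name and arguments, and passing `config` on `mcp__codex__codex-reply` is an assumption that mirrors the manual backend. The oracle-pro backend is left out because its calls are described in shared-references/reviewer-routing.md, not here.

```python
from typing import Optional

XHIGH = {"model_reasoning_effort": "xhigh"}

# (tool for a new thread, tool for follow-up rounds), per backend.
TOOLS = {
    "codex": ("mcp__codex__codex", "mcp__codex__codex-reply"),
    "manual": ("mcp__manual_review__review", "mcp__manual_review__review_reply"),
}


def build_reviewer_call(backend: str, prompt: str,
                        thread_id: Optional[str] = None) -> dict:
    """Pick the tool and arguments for one reviewer call."""
    if backend not in TOOLS:
        raise ValueError(f"backend {backend!r} is not covered by this sketch")
    new_tool, reply_tool = TOOLS[backend]
    if thread_id is None:
        # First round: open a new review thread.
        return {"tool": new_tool,
                "arguments": {"prompt": prompt, "config": dict(XHIGH)}}
    # Later rounds: reuse the saved threadId so the reviewer keeps its memory.
    return {"tool": reply_tool,
            "arguments": {"threadId": thread_id, "prompt": prompt,
                          "config": dict(XHIGH)}}
```

Because the prompt is passed through unchanged for both backends, the prompt-fidelity requirement holds by construction in this sketch.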

State Persistence (Compact Recovery)

Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to review-stage/REVIEW_STATE.json after each round:

{
  "round": 2,
  "threadId": "019cd392-...",
  "status": "in_progress",
  "difficulty": "medium",
  "last_score": 5.0,
  "last_verdict": "not ready",
  "pending_experiments": ["screen_name_1"],
  "timestamp": "2026-03-13T21:00:00"
}

Write this file at the end of every Phase E (after documenting the round). Overwrite each time — only the latest state matters.

On completion (positive assessment or max rounds), set "status": "completed" so future invocations don't accidentally resume a finished loop.
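
One way to persist this state is sketched below. The `save_state` helper and its parameter names are hypothetical, and writing through a temporary file before replacing the target is an added precaution, not something the skill prescribes. The field names and the timestamp shape follow the JSON example above.

```python
import json
import os
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def save_state(path: Path, *, round_no: int, thread_id: str, status: str,
               difficulty: str, last_score: float, last_verdict: str,
               pending_experiments: List[str],
               now: Optional[datetime] = None) -> dict:
    """Overwrite the state file with the latest round's state."""
    state = {
        "round": round_no,
        "threadId": thread_id,
        "status": status,  # "in_progress" or "completed"
        "difficulty": difficulty,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": pending_experiments,
        "timestamp": (now or datetime.now()).isoformat(timespec="seconds"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file, then swap it in, so an interrupted
    # write does not leave a half-written state file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)
    return state
```

Calling it with `status="completed"` on the final round covers the completion case described above.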

Output Protocols

Follow these shared protocols for all output files:

Output Versioning Protocol — write timestamped file first, then copy to fixed name

Output Manifest Protocol — log every output to MANIFEST.md

Output Language Protocol — respect the project's language setting
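
The first two protocols can be illustrated with a short sketch. Everything specific here is an assumption, since the shared protocol documents are not reproduced in this file: the helper name `write_versioned`, the `_YYYYMMDD_HHMMSS` suffix, the manifest line format, and placing MANIFEST.md inside the output directory are all hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_versioned(output_dir: Path, fixed_name: str, content: str,
                    now: Optional[datetime] = None) -> Path:
    """Write a timestamped file first, then copy it to the fixed name."""
    now = now or datetime.now()
    output_dir.mkdir(parents=True, exist_ok=True)
    fixed = output_dir / fixed_name
    # Assumed naming scheme: <stem>_<YYYYMMDD_HHMMSS><suffix>
    stamp = now.strftime("%Y%m%d_%H%M%S")
    versioned = output_dir / f"{fixed.stem}_{stamp}{fixed.suffix}"
    versioned.write_text(content, encoding="utf-8")
    shutil.copyfile(versioned, fixed)
    # Assumed manifest format: one bullet per output.
    with (output_dir / "MANIFEST.md").open("a", encoding="utf-8") as manifest:
        manifest.write(
            f"- {now.isoformat(timespec='seconds')} {versioned.name} -> {fixed.name}\n"
        )
    return versioned
```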

Workflow

Initialization

Check for review-stage/REVIEW_STATE.json (fall back to ./REVIEW_STATE.json if not found — legacy path):
- If neither path exists: fresh start (normal case, identical to behavior before this feature existed)
- If it exists AND status is "completed": fresh start (previous loop finished normally)
- If it exists AND status is "in_progress" AND timestamp is older than 24 hours: fresh start (stale state from a killed/abandoned run — delete the file and start over)
- If it exists AND status is "in_progress" AND timestamp is within 24 hours: resume
  - Read the state file to recover round, threadId, last_score, pending_experiments
  - Read review-stage/AUTO_REVIEW.md to restore full context of prior rounds (fall back to ./AUTO_REVIEW.md)
  - If pending_experiments is non-empty, check if they have completed (e.g., check screen sessions)
  - Resume from the next round (round = saved round + 1)
  - Log: "Recovered from context compaction. Resuming at Round N."
Read project narrative documents, memory files, and any prior review documents. When COMPACT = true and compact files exist, read findings.md + EXPERIMENT_LOG.md instead of the full review-stage/AUTO_REVIEW.md and raw logs to save context window.
Read recent experiment results (check output directories, logs)
Identify current weaknesses and open TODOs from prior reviews
Initialize round counter = 1 (unless recovered from state file)
Create/update review-stage/AUTO_REVIEW.md with header and timestamp
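
The state-file check at the start of Initialization reduces to a small decision function. This is a sketch under stated assumptions: `resume_decision` and its return labels are hypothetical, the state is passed in as an already-parsed dict (or None when neither path exists), and an unrecognized status is treated as a fresh start.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(hours=24)


def resume_decision(state: Optional[dict], now: datetime) -> str:
    """Return "fresh", "fresh-delete-stale", or "resume"."""
    if state is None:
        return "fresh"  # no state file at either path
    status = state.get("status")
    if status == "completed":
        return "fresh"  # previous loop finished normally
    if status == "in_progress":
        age = now - datetime.fromisoformat(state["timestamp"])
        if age > STALE_AFTER:
            return "fresh-delete-stale"  # killed or abandoned run
        return "resume"
    return "fresh"  # assumption: unknown status means start over


def next_round(state: dict) -> int:
    """On resume, continue from the round after the saved one."""
    return state["round"] + 1


saved = {"round": 2, "status": "in_progress", "timestamp": "2026-03-13T21:00:00"}
print(resume_decision(saved, datetime(2026, 3, 14, 9, 0, 0)))   # resume
print(resume_decision(saved, datetime(2026, 3, 15, 9, 0, 0)))   # fresh-delete-stale
print(next_round(saved))                                        # 3
```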

Loop (repeat up to MAX_ROUNDS)

Phase A: Review

Route by REVIEWER_DIFFICULTY:

Medium (default) — MCP Review

Send comprehensive context to the external reviewer using the selected backend.

For codex backend:

mcp__codex__codex:
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    [Round N/MAX_ROUNDS of autonomous review loop]

    [Full research context: claims, methods, results, known weaknesses]
    [Changes since last round
