Auto Review Loop (MiniMax Version): Autonomous Research Improvement
🔒 Do not wrap this skill in
/loop,/schedule, orCronCreate. Like/auto-review-loop, it already loops internally (review → fix → re-review), feeding each round's prior-round summary into the next review prompt (the backend is a stateless per-round API call, not a shared thread). An external timer re-enters from the top each tick, dropping that accumulated context and firing the verdict on wall-clock time instead of on artifact change — zero new signal, full token cost. Schedule the external wait that precedes it, not the verdict. Seeshared-references/external-cadence.md.
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10 AND verdict ∈ {"ready", "almost"} — both must hold, matching the operative STOP CONDITION below. Verdict vocabulary is {"ready", "almost", "not ready"}. (Earlier wording used
orand a stale verdict set; theANDform is authoritative.) - REVIEW_DOC:
review-stage/AUTO_REVIEW.md(cumulative log) (fall back to./AUTO_REVIEW.mdfor legacy projects) - REVIEWER_MODEL =
MiniMax-M2.7— Model used via MiniMax API
API Configuration
This skill uses MiniMax API for external review. Two methods are supported:
Method 1: MCP Tool (Primary)
If mcp__minimax-chat__minimax_chat is available, use it:
mcp__minimax-chat__minimax_chat:
prompt: |
[Review prompt content]
model: "MiniMax-M2.7"
system: "You are a senior machine learning researcher..."
Method 2: curl (Fallback)
If MCP is not available, use curl directly:
curl -s "https://api.minimax.io/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MINIMAX_API_KEY" \
-d '{
"model": "MiniMax-M2.7",
"messages": [
{"role": "system", "content": "You are a senior ML researcher..."},
{"role": "user", "content": "[Review prompt]"}
],
"max_tokens": 4096
}'
API Key: Read from ~/.claude/settings.json under env.MINIMAX_API_KEY, or from environment variable.
Why MiniMax instead of Codex MCP? Codex CLI uses OpenAI's Responses API (/v1/responses) which is not supported by third-party providers. See: https://github.com/openai/codex/discussions/7782
State Persistence (Compact Recovery)
Long-running loops may hit the context window limit, triggering automatic compaction. To survive this, persist state to review-stage/REVIEW_STATE.json after each round:
{
"round": 2,
"status": "in_progress",
"last_score": 5.0,
"last_verdict": "not ready",
"pending_experiments": ["screen_name_1"],
"timestamp": "2026-03-13T21:00:00"
}
Write this file at the end of every Phase E (after documenting the round). Overwrite each time — only the latest state matters.
On completion (positive assessment or max rounds), set "status": "completed" so future invocations don't accidentally resume a finished loop.
Workflow
Initialization
- Check for
review-stage/REVIEW_STATE.json(fall back to./REVIEW_STATE.jsonif not found — legacy path):- If neither path exists: fresh start (normal case)
- If it exists AND
statusis"completed": fresh start (previous loop finished normally) - If it exists AND
statusis"in_progress"ANDtimestampis older than 24 hours: fresh start (stale state from a killed/abandoned run — delete the file and start over) - If it exists AND
statusis"in_progress"ANDtimestampis within 24 hours: resume- Read the state file to recover
round,last_score,pending_experiments - Read
review-stage/AUTO_REVIEW.mdto restore full context of prior rounds (fall back to./AUTO_REVIEW.md) - If
pending_experimentsis non-empty, check if they have completed (e.g., check screen sessions) - Resume from the next round (round = saved round + 1)
- Log: "Recovered from context compaction. Resuming at Round N."
- Read the state file to recover
- Read project narrative documents, memory files, and any prior review documents
- Read recent experiment results (check output directories, logs)
- Identify current weaknesses and open TODOs from prior reviews
- Initialize round counter = 1 (unless recovered from state file)
- Create/update
review-stage/AUTO_REVIEW.mdwith header and timestamp
Loop (repeat up to MAX_ROUNDS)
Phase A: Review
Send comprehensive context to the external reviewer.
Check MCP availability first, then use appropriate method:
If MCP available (Primary):
Use mcp__minimax-chat__minimax_chat tool with:
- system: "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
- prompt: [Full review prompt with context]
- model: "MiniMax-M2.7"
If MCP NOT available (Fallback):
curl -s "https://api.minimax.io/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $MINIMAX_API_KEY" \
-d '{
"model": "MiniMax-M2.7",
"messages": [
{
"role": "system",
"content": "You are a senior machine learning researcher serving as a reviewer for top-tier conferences like NeurIPS, ICML, and ICLR. Provide rigorous, constructive feedback."
},
{
"role": "user",
"content": "[Round N/MAX_ROUNDS of autonomous review loop]\n\n[Full research context: claims, methods, results, known weaknesses]\n[Changes since last round, if any]\n[For round 2+: Summary of previous review feedback and what was addressed]\n\nPlease act as a senior ML reviewer (NeurIPS/ICML level).\n\n1. Score this work 1-10 for a top venue\n2. List remaining critical weaknesses (ranked by severity)\n3. For each weakness, specify the MINIMUM fix (experiment, analysis, or reframing)\n4. State clearly: is this READY for submission? Yes/No/Almost\n\nBe brutally honest. If the work is ready, say so clearly."
}
],
"max_tokens": 4096
}'
Note: Each round is a standalone API call. For round 2+, include the summary of previous reviews and changes in the prompt itself.
Phase B: Parse Assessment
CRITICAL: Save the FULL raw response from the external reviewer verbatim (store in a variable for Phase E). Do NOT discard or summarize — the raw text is the primary record.
Then extract structured fields:
- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)
STOP CONDITION: If score >= 6 AND verdict ∈ {"ready", "almost"} (exact match — "not ready" does NOT qualify) → stop loop, document final state.
Phase C: Implement Fixes (if not stopping)
For each action item (highest priority first):
- Code changes: Write/modify experiment scripts, model code, analysis scripts
- Run experiments: Deploy to GPU server via SSH + screen/tmux
- Analysis: Run evaluation, collect results, update figures/tables
- Documentation: Update project notes and review document
Prioritization rules:
- Skip fixes requiring excessive compute (flag for manual follow-up)
- Skip fixes requiring external data/models not available
- Prefer reframing/analysis over new experiments when both address the concern
- Always implement metric additions (cheap, high impact)
Phase D: Wait for Results
If experiments were launched:
- Monitor remote sessions for completion
- Collect results from output files and logs
Phase E: Document Round
Append to review-stage/AUTO_REVIEW.md:
## Round N (timestamp)
### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]
### Reviewer Raw Response
<details>
<summary>Click to expand full rev