Override for Codex users who want Claude Code, not a second Codex agent, to act as the reviewer. Install this package after
skills/skills-codex/*.
Auto Paper Improvement Loop: Review → Fix → Recompile
Autonomously improve the paper at: $ARGUMENTS
Context
This skill is designed to run after Workflow 3 (/paper-plan → /paper-figure → /paper-write → /paper-compile). It takes a compiled paper and iteratively improves it through external LLM review.
Unlike /auto-review-loop (which iterates on research — running experiments, collecting data, rewriting narrative), this skill iterates on paper writing quality — fixing theoretical inconsistencies, softening overclaims, adding missing content, and improving presentation.
Constants
- MAX_ROUNDS = 2: Two rounds of review→fix→recompile. Empirically, Round 1 catches structural issues (4→6/10), Round 2 catches remaining presentation issues (6→7/10). Diminishing returns beyond 2 rounds for writing-only improvements.
- REVIEWER_MODEL = claude-review: Claude reviewer invoked through the local claude-review MCP bridge. Set CLAUDE_REVIEW_MODEL if you need a specific Claude model override.
- REVIEW_LOG = PAPER_IMPROVEMENT_LOG.md: Cumulative log of all rounds, stored in the paper directory.
- HUMAN_CHECKPOINT = false: When true, pause after each round's review and present score + weaknesses to the user. The user can approve fixes, provide custom modification instructions, skip specific fixes, or stop early. When false (default), runs fully autonomously.
💡 Override:
/auto-paper-improvement-loop "paper/" — human checkpoint: true
Inputs
- Compiled paper: paper/main.pdf + LaTeX source files
- All section .tex files: concatenated for the review prompt
State Persistence (Compact Recovery)
If the context window fills up mid-loop, Codex auto-compacts. To recover, this skill writes PAPER_IMPROVEMENT_STATE.json after each round:
{
  "current_round": 1,
  "thread_id": "019ce736-...",
  "last_score": 6,
  "status": "in_progress",
  "timestamp": "2026-03-13T21:00:00"
}
On startup: if PAPER_IMPROVEMENT_STATE.json exists with "status": "in_progress" AND timestamp is within 24 hours, read it + PAPER_IMPROVEMENT_LOG.md to recover context, then resume from the next round. Otherwise (file absent, "status": "completed", or older than 24 hours), start fresh.
After each round: overwrite the state file. On completion: set "status": "completed".
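The write and recovery rules above can be sketched in shell. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the function names write_state and should_resume are hypothetical, and the 24-hour window is approximated with the state file's modification time (via find -mmin) instead of parsing the timestamp field. That shortcut is only reasonable because the file is overwritten after every round.

```shell
# Hypothetical helpers; the names are illustrative and not part of the skill.
STATE_FILE="${STATE_FILE:-PAPER_IMPROVEMENT_STATE.json}"

# Overwrite the state file after a round.
# Args: round thread_id score status
write_state() {
  printf '{\n  "current_round": %s,\n  "thread_id": "%s",\n  "last_score": %s,\n  "status": "%s",\n  "timestamp": "%s"\n}\n' \
    "$1" "$2" "$3" "$4" "$(date +%Y-%m-%dT%H:%M:%S)" > "$STATE_FILE"
}

# Succeed (exit 0) only if the state file exists, is marked in_progress,
# and was modified within the last 24 hours (1440 minutes).
# Assumption: file mtime stands in for the "timestamp" field.
should_resume() {
  [ -f "$STATE_FILE" ] || return 1
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"in_progress"' "$STATE_FILE" || return 1
  [ -n "$(find "$STATE_FILE" -mmin -1440)" ]
}
```

A caller could then branch with "if should_resume; then ... fi" to choose between resuming from the next round and starting fresh.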
Workflow
Step 0: Preserve Original
cp paper/main.pdf paper/main_round0_original.pdf
Step 1: Collect Paper Text
Concatenate all section files into a single text block for the review prompt:
# Collect all sections in order (the glob expands alphabetically)
for f in paper/sections/*.tex; do
  echo "% === $(basename "$f") ==="
  cat "$f"
done > /tmp/paper_full_text.txt
Step 2: Round 1 Review
Send the full paper text to Claude review:
mcp__claude-review__review_start:
  prompt: |
    You are reviewing a [VENUE] paper. Please provide a detailed, structured review.
    ## Full Paper Text:
    [paste concatenated sections]
    ## Review Instructions
    Please act as a senior ML reviewer ([VENUE] level). Provide:
    1. **Overall Score** (1-10, where 6 = weak accept, 7 = accept)
    2. **Summary** (2-3 sentences)
    3. **Strengths** (bullet list, ranked)
    4. **Weaknesses** (bullet list, ranked: CRITICAL > MAJOR > MINOR)
    5. **For each CRITICAL/MAJOR weakness**: A specific, actionable fix
    6. **Missing References** (if any)
    7. **Verdict**: Ready for submission? Yes / Almost / No
    Focus on: theoretical rigor, claims vs evidence alignment, writing clarity,
    self-containedness, notation consistency.
After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the response field of the completed status payload as the reviewer output, and save the completed threadId for Round 2.
Step 2b: Human Checkpoint (if enabled)
Skip if HUMAN_CHECKPOINT = false.
Present the review results and wait for user input:
📋 Round 1 review complete.
Score: X/10 — [verdict]
Key weaknesses (by severity):
1. [CRITICAL] ...
2. [MAJOR] ...
3. [MINOR] ...
Reply "go" to implement all fixes, give custom instructions, "skip 2" to skip specific fixes, or "stop" to end.
Parse user response same as /auto-review-loop: approve / custom instructions / skip / stop.
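A minimal sketch of how the four reply types could be told apart, assuming the reply arrives as a single line of text. The function name classify_reply and its output labels are hypothetical; the actual parsing rules are those of /auto-review-loop, which this document does not spell out.

```shell
# Hypothetical helper: classify a checkpoint reply as one of
# approve / stop / skip <numbers> / custom <text>.
classify_reply() {
  # Trim surrounding whitespace, then lowercase a copy for keyword matching.
  trimmed="$(printf '%s' "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
  lower="$(printf '%s' "$trimmed" | tr '[:upper:]' '[:lower:]')"
  case "$lower" in
    go)      echo "approve" ;;
    stop)    echo "stop" ;;
    skip\ *) echo "skip ${lower#skip }" ;;
    # Anything else is passed through unchanged as custom instructions.
    *)       echo "custom $trimmed" ;;
  esac
}
```

Matching on a lowercased copy keeps the keywords case-insensitive while custom instructions keep their original capitalization.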
Step 3: Implement Round 1 Fixes
Parse the review and implement fixes by severity:
Priority order:
- CRITICAL fixes (assumption mismatches, internal contradictions)
- MAJOR fixes (overclaims, missing content, notation issues)
- MINOR fixes (if time permits)
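One way to put parsed weaknesses into this priority order, assuming each weakness sits on its own line and starts with a [CRITICAL], [MAJOR], or [MINOR] tag. The function name sort_by_severity is hypothetical, and real reviewer output may need more robust parsing than this sketch provides.

```shell
# Hypothetical helper: read tagged weaknesses on stdin and print them
# ordered CRITICAL, then MAJOR, then MINOR. Untagged lines are dropped.
sort_by_severity() {
  awk '
    /^\[CRITICAL\]/ { print "1\t" $0; next }
    /^\[MAJOR\]/    { print "2\t" $0; next }
    /^\[MINOR\]/    { print "3\t" $0; next }
  ' | sort -s -k1,1n | cut -f2-
}
```

The stable sort (-s) keeps the reviewer's own ranking within each severity level.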
Common fix patterns:
| Issue | Fix Pattern |
|---|---|
| Assumption-model mismatch | Rewrite assumption to match the model, add formal proposition bridging the gap |
| Overclaims | Soften language: "validate" → "demonstrate practical relevance", "comparable" → "qualitatively competitive" |
| Missing metrics | Add quantitative table with honest parameter counts and caveats |
| Theorem not self-contained | Add "Interpretation" paragraph listing all dependencies |
| Notation confusion | Rename conflicting symbols globally, add Notation paragraph |
| Missing references | Add to references.bib, cite in appropriate locations |
| Theory-practice gap | Explicitly frame theory as idealized; add synthetic validation subsection |
Step 4: Recompile Round 1
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round1.pdf
Verify: 0 undefined references, 0 undefined citations.
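This check can be scripted against the LaTeX log. A minimal sketch, assuming the usual warning wording ("Warning: Reference ..." and "Warning: Citation ..." for undefined labels and keys) and a log path passed as an argument; the function name check_undefined is hypothetical.

```shell
# Hypothetical helper: count undefined-reference and undefined-citation
# warnings in a LaTeX log. Returns non-zero if any are found or the log is missing.
check_undefined() {
  log="$1"
  [ -f "$log" ] || { echo "log not found: $log" >&2; return 2; }
  # grep -c prints 0 and exits 1 when nothing matches, so its status is ignored.
  # Only the start of each warning is matched, since long log lines may wrap.
  refs=$(grep -c 'Warning: Reference ' "$log" || true)
  cites=$(grep -c 'Warning: Citation ' "$log" || true)
  echo "undefined references: $refs, undefined citations: $cites"
  [ "$refs" -eq 0 ] && [ "$cites" -eq 0 ]
}
```

For example, running check_undefined paper/main.log after Step 4 would print both counts and fail if either is non-zero.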
Step 5: Round 2 Review
Use mcp__claude-review__review_reply_start with the saved completed threadId:
mcp__claude-review__review_reply_start:
  threadId: [saved from Round 1]
  prompt: |
    [Round 2 update]
    Since your last review, we have implemented:
    1. [Fix 1]: [description]
    2. [Fix 2]: [description]
    ...
    Please re-score and re-assess. Same format:
    Score, Summary, Strengths, Weaknesses, Actionable fixes, Verdict.
After this start call, immediately save the returned jobId and poll mcp__claude-review__review_status with a bounded waitSeconds until done=true. Treat the completed status payload's response as the reviewer output, and save the completed threadId for any follow-up round.
Step 5b: Human Checkpoint (if enabled)
Skip if HUMAN_CHECKPOINT = false. Same as Step 2b — present Round 2 review, wait for user input.
Step 6: Implement Round 2 Fixes
Same process as Step 3. Typical Round 2 fixes:
- Add controlled synthetic experiments validating theory
- Further soften any remaining overclaims
- Formalize informal arguments (e.g., truncation → formal proposition)
- Strengthen limitations section
Step 7: Recompile Round 2
cd paper && latexmk -C && latexmk -pdf -interaction=nonstopmode -halt-on-error main.tex
cp main.pdf main_round2.pdf
Step 8: Format Check
After the final recompilation, run a format compliance check:
# 1. Page count vs venue limit
PAGES=$(pdfinfo paper/main.pdf | awk '/^Pages:/ {print $2}')
echo "Pages: $PAGES (limit: 9 main body for ICLR/NeurIPS)"
# 2. Overfull hbox warnings (content exceeding margins)
# grep -c already prints 0 (with exit status 1) when nothing matches,
# so "|| echo 0" would emit a second 0; default to 0 only if the log is missing.
OVERFULL=$(grep -c "Overfull" paper/main.log 2>/dev/null || true); OVERFULL=${OVERFULL:-0}
echo "Overfull hbox warnings: $OVERFULL"
grep "Overfull" paper/main.log 2>/dev/null | head -10
# 3. Underfull hbox warnings (loose spacing)
UNDERFULL=$(grep -c "Underfull" paper/main.log 2>/dev/null || true); UNDERFULL=${UNDERFULL:-0}
echo "Underfull hbox warnings: $UNDERFULL"
# 4. Bad boxes summary
grep -c "badness" paper/main.log 2>/dev/null || echo "0 ba