Paper Pre-Submission Review (Lite, Cross-Model Adversarial)
This is the cross-model sibling of paper-review-lite, itself the in-session, Claude-Code-native counterpart to presubmit (our port of the reviewer2 adversarial peer-review pipeline). The heritage carries over wholesale. Sub-agents adopt a Critical-Reviewer posture, every finding is grounded in a verbatim quote, and a verification cascade filters hallucinations before they reach the final report.
The new mechanic is cross-model adversarial verification. Two reviewers — Claude (the orchestrator) and Codex (GPT-5.4, called through codex:codex-rescue) — independently apply the same paper-review-lite specification to the same paper. Each then plays Blue Team to the other's Red Team. Two different model families have different blind spots, so:
- Mutual catches (both teams flag the issue, both cross-checks confirm) are high-confidence.
- Asymmetric catches (one team flags, the other's cross-checker confirms against the paper) survive at standard confidence and often surface real but easy-to-miss problems.
- Asymmetric refutations (one team flags, the other's cross-checker refutes against the paper) are dropped by default. The orchestrator can override by re-reading the manuscript directly, but the burden is on the override.
- Quote-failed findings (the cross-checker cannot find the cited verbatim span) are dropped as hallucinations.
This is the heavier sibling: roughly 22 model calls in total (9 Claude Red Team, 9 Codex Red Team, 4 cross-model Blue Team), plus orientation and synthesis by the orchestrator. Reach for it before submission when you want maximum adversarial pressure and a second model family's blind spots. For the heaviest standalone deliverable, with Red Team personas (Breaker, Butcher, Shredder, Collector, Void), math audits, code-replication checks, resumable runs, and cost tracking, use presubmit.
Instructions
1. Orientation (orchestrator, before any review agent launches)
Run the orientation step from paper-review-lite § 1 verbatim. Read the paper yourself to determine source format (LaTeX, Pandoc, Word), SI location, figure paths, replication archive, bibliography format, and design family. Use this to write specific review prompts that name actual file paths and section names. Generic prompts produce shallow reviews from both model families.
For experimental manuscripts, also invoke methods-reporting in audit mode and fold its 45-item checklist into Agents 6 and 7 on both teams. For conjoint, list-experiment, topic-modeling, LLM-classification, or VLM-OCR manuscripts, invoke the matching sibling skill and fold its checklist into Agent 9 on both teams.
2. Orchestration contract
Create this scratch layout in the paper's working directory.
.review-tmp/
├── claude/
│   ├── agent-1-content.md
│   ├── agent-2-numbers.md
│   ├── agent-3-references.md
│   ├── agent-4-dois.md
│   ├── agent-5-writing.md
│   ├── agent-6-consort.md
│   ├── agent-7-prereg.md
│   ├── agent-8-figures.md
│   └── agent-9-archive.md
├── codex/
│   ├── agent-1-content.md    (same dimensions, Codex output)
│   ├── ...
│   └── agent-9-archive.md
└── cross-check/
    ├── claude-checks-codex-content.md      (covers Codex agents 1, 2, 6, 7)
    ├── claude-checks-codex-technical.md    (covers Codex agents 3, 4, 5, 8, 9)
    ├── codex-checks-claude-content.md
    └── codex-checks-claude-technical.md
Both teams write independently to their own subdirectory. Cross-checkers read the other team's subdirectory. Use absolute paths when spawning Codex sub-agents. The codex:codex-rescue runtime inherits the parent working directory, but explicit paths remove ambiguity when the paper lives outside the current CWD.
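As a rough illustration of the layout and the absolute-path advice, the sketch below creates the three subdirectories and returns absolute output paths for each team. The function name create_scratch_layout and the DIMENSIONS list are hypothetical helpers, not part of paper-review-lite; the slugs are copied from the tree above.

```python
from pathlib import Path

# Dimension slugs copied from the scratch layout; the list itself is illustrative.
DIMENSIONS = ["content", "numbers", "references", "dois", "writing",
              "consort", "prereg", "figures", "archive"]


def create_scratch_layout(paper_dir):
    """Create .review-tmp/{claude,codex,cross-check} and return absolute agent output paths."""
    # resolve() makes the root absolute, so paths stay valid outside the current CWD.
    root = Path(paper_dir).resolve() / ".review-tmp"
    for sub in ("claude", "codex", "cross-check"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return {
        team: [root / team / f"agent-{n}-{slug}.md"
               for n, slug in enumerate(DIMENSIONS, start=1)]
        for team in ("claude", "codex")
    }
```

Passing the returned paths into the Codex prompts, rather than relative ones, is one way to remove the working-directory ambiguity described above.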
3. Phase 2 — Dual independent Red Team (18 parallel calls)
Launch all 18 review calls in a single message.
- 9 Claude sub-agents via the Agent tool (default subagent_type). Use the agent prompts from paper-review-lite § 2 (Agents 1–9) verbatim, but redirect output to .review-tmp/claude/agent-N-*.md instead of the original .review-tmp/agent-N-*.md.
- 9 Codex sub-agents via the Agent tool with subagent_type: codex:codex-rescue. Use the Codex Phase 2 template below, one call per dimension. Output to .review-tmp/codex/agent-N-*.md.
Both teams apply the same dimension definitions, the same Critical-Reviewer posture, and the same severity rubric ([CRITICAL], [RECOMMENDED], [MINOR]) from paper-review-lite § 2. The point of running two model families on one specification is to compare independent applications of one standard, not to give them different jobs. Neither team sees the other's findings during Phase 2.
Agents 6 (CONSORT) and 7 (pre-registration) are required for experimental manuscripts and marked NA for non-experimental ones on both teams.
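The 18-call plan can be sketched as a simple table of team, agent number, and output path. This is only an illustration: plan_red_team_calls and the dictionary keys are hypothetical, and representing the NA rule as a status field is an assumption about how "marked NA" would be recorded.

```python
# Dimension slugs follow the scratch layout; names here are illustrative only.
DIMENSIONS = ["content", "numbers", "references", "dois", "writing",
              "consort", "prereg", "figures", "archive"]
EXPERIMENTAL_ONLY = {6, 7}  # CONSORT and pre-registration agents


def plan_red_team_calls(experimental):
    """Return one entry per Phase 2 call: 9 dimensions x 2 teams = 18 entries."""
    calls = []
    for team, subagent_type in (("claude", None), ("codex", "codex:codex-rescue")):
        for n, slug in enumerate(DIMENSIONS, start=1):
            skip = n in EXPERIMENTAL_ONLY and not experimental
            calls.append({
                "team": team,
                "agent": n,
                "subagent_type": subagent_type,  # None stands for the default type
                "output": f".review-tmp/{team}/agent-{n}-{slug}.md",
                "status": "NA" if skip else "run",  # assumption: NA is tracked per call
            })
    return calls
```

Building the whole plan first makes it easy to confirm that both teams receive identical dimensions before anything is launched.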
4. Phase 3 — Cross-model adversarial verification (4 parallel calls)
Each model verifies the other's findings. Cross-checkers do not add new findings; that was Phase 2's job. They verify, refute, or downgrade.
Claude cross-checks Codex (2 sub-agents via Agent).
- Sub-agent A reads .review-tmp/codex/agent-{1,2,6,7}-*.md and the manuscript. Writes .review-tmp/cross-check/claude-checks-codex-content.md.
- Sub-agent B reads .review-tmp/codex/agent-{3,4,5,8,9}-*.md plus the manuscript, bibliography, and archive. Writes .review-tmp/cross-check/claude-checks-codex-technical.md.
For each Codex [CRITICAL] or [RECOMMENDED] finding, the cross-checker verifies the cited verbatim quote appears at the cited location, verifies the issue against the actual paper, flags any Codex finding that Claude also flagged independently in .review-tmp/claude/ (mutual catch — note for synthesis), and steel-mans the paper. When the paper anticipates or partially addresses the concern, note it for severity downgrade.
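The first step of that protocol, checking that the cited quote really appears in the paper, could be approximated mechanically as below. This is a minimal sketch: quote_found and normalize are hypothetical names, and collapsing whitespace before matching is an assumption (line wrapping in LaTeX or Word exports would otherwise defeat an exact match). It checks presence anywhere in the text, not the cited location.

```python
import re


def normalize(text):
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def quote_found(quote, manuscript_text):
    """Return True if the quoted span occurs in the manuscript after normalization."""
    q = normalize(quote)
    # An empty quote can never ground a finding, so treat it as not found.
    return bool(q) and q in normalize(manuscript_text)
```

A finding whose quote fails this kind of check would fall into the quote-failed category and be dropped.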
Codex cross-checks Claude (2 sub-agents via Agent with subagent_type: codex:codex-rescue).
- Sub-agent C reads .review-tmp/claude/agent-{1,2,6,7}-*.md. Writes .review-tmp/cross-check/codex-checks-claude-content.md.
- Sub-agent D reads .review-tmp/claude/agent-{3,4,5,8,9}-*.md. Writes .review-tmp/cross-check/codex-checks-claude-technical.md.
Use the Codex Phase 3 template below. Same verification protocol.
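The four cross-check assignments follow a regular pattern (checker, target team, content or technical group), which the sketch below expands into concrete input and output paths. The function cross_check_plan and its return shape are hypothetical; only the agent groupings and file names come from the text above.

```python
from pathlib import Path

CONTENT_AGENTS = (1, 2, 6, 7)
TECHNICAL_AGENTS = (3, 4, 5, 8, 9)


def cross_check_plan(review_root):
    """Map each of the 4 cross-checkers to the other team's files and its own output."""
    root = Path(review_root)
    plan = []
    for checker, target in (("claude", "codex"), ("codex", "claude")):
        for group, agents in (("content", CONTENT_AGENTS),
                              ("technical", TECHNICAL_AGENTS)):
            # Each checker reads only the other team's subdirectory.
            inputs = sorted(p for n in agents
                            for p in (root / target).glob(f"agent-{n}-*.md"))
            plan.append({
                "checker": checker,
                "inputs": inputs,
                "output": root / "cross-check" / f"{checker}-checks-{target}-{group}.md",
            })
    return plan
```

Expanding the globs up front also reveals a missing Phase 2 output before a cross-checker is spawned against an incomplete set.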
5. Phase 4 — Adjudication and synthesis (orchestrator, direct)
Build the consolidated Pre-Submit Report by adjudicating across teams.
- Mutual. Both teams flagged it; both cross-checkers confirmed. Mark [CRITICAL ✓✓] or [RECOMMENDED ✓✓]. Highest-confidence findings.
- Asymmetric, cross-confirmed. One team flagged; the other team's cross-checker confirmed against the paper. Retain at original severity. Mark [CRITICAL ✓] or [RECOMMENDED ✓]. These often surface real but easy-to-miss problems.
- Asymmetric, cross-refuted. One team flagged; the other team's cross-checker refuted against the paper. Default action is to drop. The orchestrator can override by re-reading the manuscript directly, in which case retain at one tier below original severity with the note "single-team finding, cross-refuted, retained after orchestrator re-read at file:line".
- Quote-failed. The cross-checker could not find the cited verbatim span. Drop as hallucination on the finder's side.
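The four adjudication outcomes above can be read as a small decision function. The sketch below is one possible encoding, not the orchestrator's actual logic: adjudicate, the verdict strings, and the "review" fallback for mixed verdicts on a mutual finding (a case the rules above do not cover) are all assumptions.

```python
# One tier below original severity, used only for orchestrator overrides.
DOWNGRADE = {"CRITICAL": "RECOMMENDED", "RECOMMENDED": "MINOR"}


def adjudicate(severity, verdicts, override=False):
    """Classify a finding from its cross-check verdicts.

    verdicts holds one entry per team that flagged the finding, each being
    "confirmed", "refuted", or "quote_failed" (hypothetical labels).
    """
    if len(verdicts) == 2 and set(verdicts) == {"confirmed"}:
        return "keep", f"[{severity} ✓✓]"       # mutual catch
    if len(verdicts) == 1:
        verdict = verdicts[0]
        if verdict == "quote_failed":
            return "drop", "quote-failed"        # treated as hallucination
        if verdict == "confirmed":
            return "keep", f"[{severity} ✓]"     # asymmetric, cross-confirmed
        if verdict == "refuted":
            if override:                         # orchestrator re-read the manuscript
                return "keep", f"[{DOWNGRADE[severity]}]"
            return "drop", "cross-refuted"
    # Assumption: anything else goes back to the orchestrator for a manual decision.
    return "review", "not covered by the adjudication rules"
```

For example, adjudicate("CRITICAL", ["refuted"], override=True) keeps the finding at [RECOMMENDED], matching the one-tier downgrade for overridden refutations.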
Apply the synthesis rules from paper-review-lite § 4 in order. Deduplicate within-team first (multiple agents on the same team flagging the same underlying issue). Then deduplicate across-team (a mutual catch is one entry, not two). Then demote self-conceded critiques (any finding whose own description includes language conceding the point). Then write a single-line Recommendation at the top of the report. Then write the Editor's Note (3–6 paragraph prose memo). Then the issue lists. Then the journal-readiness chec