Kill Argument Exercise: Adversarial Attack-Defense Review
Stress-test the headline claims of a paper against the strongest possible rejection argument: $ARGUMENTS
Why This Exists
Standard score-based reviews (/research-review, /auto-paper-improvement-loop) tend to produce balanced weakness lists. Each weakness gets ~equal attention, ranked CRITICAL > MAJOR > MINOR. Empirically, this misses one specific failure mode: the single most damaging argument a reviewer would write in a rejection paragraph — the one sentence that, if a senior area chair reads it, kills the paper.
A balanced reviewer might list "scope-overclaim risk" as MAJOR alongside 3-5 other MAJORs, never quite committing. An adversarial reviewer must commit: their entire job is to convince the area chair to reject in 200 words.
This skill runs that adversarial pass deliberately, then forces a second fresh reviewer to defend point-by-point, classify each rejection as already-fixed / partially-fixed / still-unresolved, and surface what's actually load-bearing.
Empirical motivation: in a real submission run, after several rounds of standard improvement (score 7-8/10), the kill-argument exercise surfaced framing weaknesses that no prior review caught (e.g., a setting being mostly conditional rather than truly general, or a baseline being irrelevant to real systems). Author rebuttal forced explicit scope qualifications in abstract and discussion that weren't visible from the score-based reviews alone.
How This Differs From Other Review Skills
| Skill | What it asks the reviewer | Output |
|---|---|---|
| Standard peer review | "Score this paper, list weaknesses by severity" | balanced weakness list |
| /research-review | "Deep technical review of methods + claims" | structured deep critique |
| /proof-checker | "Is this theorem actually proved?" | per-step proof obligation audit |
| /paper-claim-audit | "Does the paper report numbers truthfully?" | per-claim evidence verification |
| /citation-audit | "Are citations real and used in correct context?" | per-entry KEEP/FIX/REPLACE/REMOVE |
| /kill-argument | "Write the single strongest rejection paragraph; then defend it." | attack memo + per-point defense + unresolved surfaced |
This skill is complementary, not a replacement. Run after standard reviews when you want to know what the worst-case reviewer paragraph would look like, before camera-ready or rebuttal preparation.
When To Use
- After 1-2 rounds of `/auto-paper-improvement-loop` settled at a stable score, but before submission. Surfaces what additional fixes would close the headline-attack gap.
- During rebuttal preparation, to predict reviewer-2's strongest objection so you can prepare the response in advance.
- For theory papers with a high-level title that may oversimplify the actual theorem (the most common reject-attack pattern).
- For papers where a reviewer might attack scope, assumption-vs-claim mismatch, missing proof obligations, or evidence-vs-headline gaps.
This skill is most valuable for theory papers with ≥5 theorem-class environments (so the headline depends on real proof obligations). For empirical papers without theorems, use /research-review instead.
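The ≥5 threshold can be checked roughly before invoking the skill. The following is a minimal sketch, not part of the skill itself: `count_theorem_envs` is a hypothetical helper, and it assumes theorem-class environments are named `theorem`, `lemma`, `proposition`, or `corollary` (the paper's own `\newtheorem` names may differ). It is a plain text match, so commented-out environments are counted too.

```shell
# Hypothetical helper: count \begin{...} lines for theorem-class environments
# in every .tex file under the given directory (default: current directory).
# ASSUMPTION: environment names are theorem/lemma/proposition/corollary,
# optionally starred; adjust the list to match the paper's preamble.
count_theorem_envs() {
  find "${1:-.}" -name '*.tex' -not -path '*/.git/*' \
    -exec grep -ohE '\\begin[{](theorem|lemma|proposition|corollary)[*]?[}]' {} + 2>/dev/null \
    | wc -l | tr -d ' '
}
```

A possible use is `[ "$(count_theorem_envs "$PAPER_DIR")" -ge 5 ]` as a quick gate; a low count is only a hint that /research-review may be the better fit.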
Constants
- REVIEWER_MODEL = `gpt-5.5` (default; specify `gpt-5.4` if you want to fall back to the legacy default). Reviewer reasoning effort = `xhigh`.
- CONTEXT_POLICY = `fresh` (REVIEWER_BIAS_GUARD). Each thread is a fresh `spawn_agent` call. Never use `send_input`. No prior review summary, fix list, or executor explanation enters either prompt.
- ATTACK_LENGTH = approximately 200 words (do not exceed 250). Single coherent argument, not a list.
- DEFENSE_DECOMPOSITION = 3-7 atomic rejection points extracted from the attack memo. Each gets its own classification.
- CLASSIFICATION = `answered_by_current_text` / `partially_answered` / `still_unresolved`. (Names chosen so the adjudicator does not assume "fixed" implies prior history of patching; they read the paper as a fresh reviewer would.)
- OUTPUT = `KILL_ARGUMENT.md` (human-readable) + `KILL_ARGUMENT.json` (machine-readable) in the paper directory.
- RENDER_HTML = `true`. When `true` (default), auto-render `KILL_ARGUMENT.md` to HTML after writing the report via `/render-html "<paper-dir>/KILL_ARGUMENT.md" --json "<paper-dir>/KILL_ARGUMENT.json"`. Uses full review gate (audit-class artifact). Set `false` to skip, or pass `— render html: false`. Non-blocking: failures don't invalidate the kill-argument verdict.
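The numeric and label constraints above lend themselves to mechanical checks on the two threads' outputs. The following is a minimal sketch under stated assumptions: the helper names (`attack_within_limit`, `is_valid_classification`, `valid_point_count`) and the variable `ATTACK_MAX_WORDS` are hypothetical and not defined by the skill; only the limits and label names come from the constants.

```shell
# Hard cap taken from ATTACK_LENGTH ("do not exceed 250").
ATTACK_MAX_WORDS=250

# Hypothetical helper: succeed if the memo file ($1) is within the word cap.
attack_within_limit() {
  words=$(wc -w < "$1" | tr -d ' ')
  [ "$words" -le "$ATTACK_MAX_WORDS" ]
}

# Hypothetical helper: succeed only for one of the three CLASSIFICATION labels.
is_valid_classification() {
  case "$1" in
    answered_by_current_text|partially_answered|still_unresolved) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed if the number of rejection points ($1)
# falls in the 3-7 range required by DEFENSE_DECOMPOSITION.
valid_point_count() {
  [ "$1" -ge 3 ] && [ "$1" -le 7 ]
}
```

Rejecting any label outside the fixed set (for example `fixed`) keeps the report aligned with the fresh-reviewer framing that the CLASSIFICATION names are meant to enforce.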
Workflow
Step 1: Discover paper files
Locate the paper directory and inventory the source.
PAPER_DIR="$ARGUMENTS" # e.g., paper-overleaf/ or paper/
cd "$PAPER_DIR" || exit 1 # stop early if the paper directory does not exist
# Find the LaTeX entry point
ENTRY=$(grep -lE '^\\documentclass' *.tex 2>/dev/null | head -1)
echo "Entry: $ENTRY"
# Find all source files codex should read
find . -name "*.tex" -not -path "./.git/*" 2>/dev/null
find . -name "*.bib" -not -path "./.git/*" 2>/dev/null
find figures/ -name "*.pdf" -o -name "*.png" 2>/dev/null
ls -la *.pdf 2>/dev/null # compiled PDF
If a compiled PDF is missing, the skill should still run on .tex source alone, but the prompt should mention this so the reviewer doesn't waste cycles trying to extract from a non-existent PDF.
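One way to handle the missing-PDF case is to generate the prompt's PDF line from what is actually on disk. The sketch below is illustrative only: `pdf_prompt_line` is a hypothetical helper, and the wording of the fallback notice is an assumption rather than text prescribed by the skill.

```shell
# Hypothetical helper: print the "Compiled PDF" line for the reviewer prompt.
# Uses the first *.pdf found in the paper directory ($1, default: .);
# otherwise prints an explicit notice so the reviewer works from .tex only.
pdf_prompt_line() {
  for f in "${1:-.}"/*.pdf; do
    if [ -f "$f" ]; then
      printf '%s\n' "- Compiled PDF: $f"
      return 0
    fi
  done
  # ASSUMPTION: this fallback wording is illustrative.
  printf '%s\n' "- Compiled PDF: not available; review the .tex source only"
}
```

The `[ -f "$f" ]` test covers the case where the glob matches nothing and the shell leaves the pattern unexpanded.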
Step 2: Attack memo (Thread 1, fresh codex)
Invoke spawn_agent (NOT send_input) with the following prompt structure. Use absolute or paper-directory-relative paths inside the prompt; do not rely on a cwd parameter.
spawn_agent:
  model: gpt-5.5
  reasoning_effort: xhigh
  message: |
    You are simulating a hostile NeurIPS / ICLR / ICML reviewer for a paper.
    This is a kill-argument adversarial check; your task is NOT to give a
    balanced review but to construct the **single strongest argument for
    rejecting this paper**.

    ## Files to read
    - LaTeX entry: <ENTRY>
    - All section files under sections/ or wherever they live
    - Macro files (math_commands.tex, etc.)
    - Compiled PDF: <main.pdf> (if available)

    Read the source carefully. Do not consult any prior reviews, fix lists,
    or summaries; this must be a fresh, zero-context adversarial pass.

    ## Your task
    Construct the single best argument to reject this paper in approximately
    200 words. Your goal is to write the worst-case rejection memo a senior
    NeurIPS area chair would produce after reading the paper.

    Focus on these axes (pick the most damaging combination, do not list all):
    1. Theorem validity: are central theorems actually proved as stated?
    2. Assumption-vs-claim mismatch: does the body silently retreat to a
       narrower object than the title/abstract advertise?
    3. Missing proof obligations: is a fundamental lemma invoked but not
       proved (e.g., concentration, generic position, prefactor envelope)
       that the headline depends on?
    4. Limit-order ambiguity: are limits in K/n/d/eps composed in a way the
       paper does not commit to?
    5. Claim-vs-evidence gap: is the empirical/numerical evidence too narrow
       to support the breadth of the stated theorem or take-away?
    6. Scope overclaim: does the title or abstract sell a result substantially
       broader than what the body proves?

    ## Constraints
    - Approximately 200 words total (do NOT exceed 250).
    - Single argument, not a list: pick the most damaging line of attack
      and develop it.
    - Cite specific file:line locations or equation numbers when accusing.
    - Tone: dispassionate but uncompromising. Do NOT hedge. Do NOT acknowledge
      mitigations the paper might have made elsewhere. This is the rejection
      paragraph; the defense gets the next pass.
    - Do NOT reference prior review rounds, fix lists, or any context outside
      the current paper files.

    Output: just the rejection memo, nothing else.
Save the