Proof Checker: Rigorous Mathematical Verification & Fixing
🔒 Do not wrap this skill in
/loop,/schedule, orCronCreate. It is verdict-bearing — it judges proof validity across rounds, threading the reviewer's memory from Phase 1 → Phase 3 viacodex-replyso the reviewer can check whether a fix actually closed the gap it flagged. An external timer re-enters from the top each tick, starting a fresh thread and losing that memory. Schedule the external wait that precedes it, not the verdict. Seeshared-references/external-cadence.md.
Systematically verify a mathematical proof via cross-model adversarial review, fix identified gaps, re-review until convergence, and generate a detailed audit report with proof-obligation accounting.
Context: $ARGUMENTS
Constants
- MAX_REVIEW_ROUNDS = 3
- REVIEWER_MODEL =
gpt-5.5— Default model for the Codex backend, reasoning effort alwaysxhigh. Manual backend uses whatever model the user chooses, but it must be a non-Claude model — the executor is Claude, so routing the proof review into any Claude product makes Claude judge Claude and voids the cross-model invariant (seeshared-references/reviewer-routing.md). - REVIEWER_BACKEND =
codex— Default: Codex MCP (xhigh). Override with— reviewer: oracle-profor Oracle MCP, or— reviewer: manualfor Manual Review MCP. If manual-review MCP is unavailable, stop and print the install command; do not fall back to Codex. Seeshared-references/reviewer-routing.md.
Reviewer Calling Convention
When calling the reviewer, branch on REVIEWER_BACKEND:
If REVIEWER_BACKEND = codex:
Use mcp__codex__codex for new review threads.
Use mcp__codex__codex-reply for follow-up rounds (reuse threadId).
If REVIEWER_BACKEND = manual:
Use mcp__manual_review__review for new review threads with:
prompt: [exact same prompt that would go to Codex]
config: {"model_reasoning_effort": "xhigh"}
Save the returned threadId.
Use mcp__manual_review__review_reply for follow-up rounds with:
threadId: [saved manual-review threadId]
prompt: [follow-up prompt]
config: {"model_reasoning_effort": "xhigh"}
Prompt fidelity: the manual prompt must be exactly the same text that Codex would receive. Review tracing applies equally to both backends.
- AUDIT_DOC:
PROOF_AUDIT.mdat the paper directory root, alongsidemain.tex(cumulative log; when invoked via/paper-writing, this ispaper/PROOF_AUDIT.md) - REPORT_TEX:
proof_audit_report.tex(formal before/after PDF) - STATE_FILE:
PROOF_CHECK_STATE.json(for recovery) - SKELETON_DOC:
PROOF_SKELETON.md(micro-claim inventory) - RENDER_HTML = true — When
true(default), auto-renderPROOF_AUDIT.mdto HTML at workflow end via/render-html. Uses full Codex review gate (audit-class artifact — math-heavy content; render-fidelity check protects against MathJax breakage and matches the skill's cross-model audit invariant). Setfalseto skip, or pass— render html: false.
Acceptance Gate (objective, replaces subjective scoring)
The proof passes when ALL of the following hold:
- Zero open FATAL or CRITICAL issues
- Every theorem/lemma has: (i) explicit hypotheses, (ii) proof with all interchanges justified, (iii) every application discharges hypotheses in the ledger
- All big-O/Θ/o statements have declared parameter dependence and uniformity scope
- Counterexample pass executed on all key lemmas (log candidates even if none found)
Issue Taxonomy (20 categories, 4 groups)
Group A: Logic & Proof Structure
| Category | Description | Example |
|---|---|---|
| UNJUSTIFIED_ASSERTION | Claim stated without proof or reference | "The Hessian splits into Gram blocks" |
| UNPROVEN_SUBCLAIM | "Clearly" / "it follows" hides a nontrivial lemma | "By symmetry, the cross-terms vanish" without checking |
| QUANTIFIER_ERROR | Wrong order ∀/∃, missing "for sufficiently small κ" | "For all π, there exists ε" vs "there exists ε for all π" |
| IMPLICATION_REVERSAL | Uses (A⇒B) as (B⇒A), or claims equivalence with only one direction | |
| CASE_INCOMPLETE | Misses boundary/degenerate cases | Singular covariance, zero weight, non-unique argmin |
| CIRCULAR_DEPENDENCY | Lemma uses theorem that depends on it | |
| LOGICAL_GAP | A step is not justified by what precedes it | B=Θ(1) → β_K=0 without analyzing W |
Group B: Analysis & Measure Theory
| Category | Description | Example |
|---|---|---|
| ILLEGAL_INTERCHANGE | Swaps limit/expectation/derivative/integral without DCT/MCT/Fubini | Differentiating under E without domination |
| NONUNIFORM_CONVERGENCE | Pointwise convergence used as uniform | sup and limit swapped |
| MISSING_DOMINATION | DCT cited but no dominating function given | |
| INTEGRABILITY_GAP | Uses E | X |
| REGULARITY_GAP | Differentiability/Lipschitz/convexity used but not established | |
| STOCHASTIC_MODE_CONFUSION | Mixes a.s./in prob./in L²/in expectation |
Group C: Model & Parameter Tracking
| Category | Description | Example |
|---|---|---|
| MISSING_DERIVATION | A quantity is used but never derived from the model | Risk functional with undefined B, W |
| HIDDEN_ASSUMPTION | Proof silently uses a condition not in the theorem | Gaussianity assumed but not stated |
| INSUFFICIENT_ASSUMPTION | Hypotheses too weak for proof (counterexample exists) | Moment conditions admitting 2-point distributions |
| DIMENSION_TRACKING | Parameter dependence (d, n, K, ...) not explicit | d enters only through κ |
| NORMALIZATION_MISMATCH | Coordinate/scaling conventions inconsistent | Rescaled vs raw coordinates |
| CONSTANT_DEPENDENCE_HIDDEN | "C" depends on d,n,K but treated as universal |
Group D: Scope & Claims
| Category | Description | Example |
|---|---|---|
| SCOPE_OVERCLAIM | Conclusion stated more broadly than proof supports | "β_K=0" with only generic overlap |
| REFERENCE_MISMATCH | Cited theorem's hypotheses not verified at point of use |
Two-Axis Severity System
Axis A — Proof Status (what is wrong)
| Status | Meaning |
|---|---|
| INVALID | Statement false as written (counterexample exists or contradiction) |
| UNJUSTIFIED | Could be true, but current proof does not establish it |
| UNDERSTATED | True only after strengthening assumptions |
| OVERSTATED | True only after weakening conclusion / adding qualifiers |
| UNCLEAR | Ambiguous notation / definition drift (not wrong per se) |
Axis B — Impact (how much breaks)
| Impact | Meaning |
|---|---|
| GLOBAL | Breaks main theorem or core dependency chain |
| LOCAL | Affects a side result but not the main theorem |
| COSMETIC | Exposition only |
Severity Labels (derived)
| Label | Definition |
|---|---|
| FATAL | INVALID + GLOBAL |
| CRITICAL | (INVALID + LOCAL) or (UNJUSTIFIED + GLOBAL) |
| MAJOR | (UNJUSTIFIED + LOCAL) or (UNDERSTATED/OVERSTATED + GLOBAL) |
| MINOR | Clarity / notation / dimension bookkeeping that doesn't change claims |
Side-Condition Checklists for Common Theorems
When the proof invokes any of the following, require explicit verification of ALL listed conditions:
| Theorem | Required Conditions |
|---|---|
| DCT (Dominated Convergence) | Pointwise a.e. convergence + integrable dominating function |
| MCT (Monotone Convergence) | Monotone increasing + non-negative |
| Fubini/Tonelli | Product measurability + integrability (Fubini) or non-negative (Tonelli) |
| Leibniz integral rule | Continuity of integrand + dominating function for derivative |
| Implicit Function Theorem | Continuous differentiability + non-singular Jacobian |