Citation Audit
🔒 Do not wrap this skill in /loop, /schedule, or CronCreate. It is verdict-bearing — it judges bibliographic correctness. Re-running that verdict on a timer adds no new signal (it changes only when the bibliography changes). Schedule the external wait that precedes it — bibliography finalized → then audit once. See shared-references/external-cadence.md.
Verify every \cite{...} in a paper against three independent layers:
- Existence — the cited paper actually exists at the claimed arXiv ID / DOI / venue.
- Metadata correctness — author names, year, venue, and title match canonical sources (DBLP, arXiv, ACL Anthology, Nature, OpenReview, etc.).
- Context appropriateness — the cited paper actually supports the claim it is being used to support in the manuscript.
This skill is the fourth layer of ARIS's evidence-and-claim assurance, complementing experiment-audit (code), result-to-claim (science verdict), and paper-claim-audit (numerical claims). Together they form a bottom-up integrity stack from raw evaluation code to manuscript bibliography.
When to Use This Skill
Run before submission. The right gating point is:
- After paper-write has produced the LaTeX draft and bib file
- After paper-claim-audit has verified numerical claims
- Before final paper-compile for submission
Do not run this on a half-written draft — most of the work is in cross-checking each \cite against context, which is wasted on placeholder text.
What This Skill Catches
The dangerous citation problems are not wildly fake citations — those are easy to spot. The dangerous ones are:
- Wrong-context citations: real paper, but the cited claim is not what that paper actually establishes (e.g., citing Self-Refine to support "self-feedback produces correlated errors" — Self-Refine actually argues the opposite).
- Author hallucinations: anonymous-author placeholders that slipped through, missing co-authors, wrong order.
- Title drift: arXiv v1 vs v3 with different titles silently merged.
- Venue confusion: arXiv preprint cited but the official venue is now CVPR/ICML/NeurIPS — using the wrong record.
- Year mismatch: arXiv 2023 preprint with 2024 conference acceptance, year reported inconsistently.
- Phantom DOIs: DOI looks real but does not resolve.
- Self-citation drift: your own prior work cited with year off by one.
Constants
- REVIEWER_MODEL = gpt-5.5 — Used via Codex MCP. Default for cross-model review with web access.
- CONTEXT_POLICY = fresh — Each audit run uses a new reviewer thread (REVIEWER_BIAS_GUARD). Never codex-reply.
- WEB_SEARCH = required — The reviewer must perform real web/DBLP/arXiv lookups, not pattern-match from memory.
- OUTPUT = CITATION_AUDIT.md — Human-readable per-entry verdict report.
- STATE = CITATION_AUDIT.json — Machine-readable verdict ledger consumable by downstream tools.
- SOFT_ONLY = false — When true (set via — soft-only / — soft_only flag), the audit runs all three layers normally but forbids any .bib file mutation. Findings that would otherwise mutate the bib (FIX / REPLACE / REMOVE) are translated into per-occurrence sentence-rewrite proposals against the citing *.tex files. Used by /resubmit-pipeline Phase 1 to honor the user's hard "freeze the bib" constraint.
- RENDER_HTML = true — When true (default), auto-render CITATION_AUDIT.md to HTML after writing the report. Uses full Codex review gate (audit-class artifact — render-fidelity check matches the skill's cross-model audit invariant). Set false to skip, or pass — render html: false.
Workflow
Step 1: Discover bib file and section files
Locate:
- references.bib (or paper.bib / similar) under the paper directory
- All *.tex files containing \cite{...} calls (typically sec/ or sections/)
If multiple bib files exist, audit each separately.
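The discovery step can be sketched as below. This is a minimal illustration, not the skill's actual implementation: the function name discover_inputs is hypothetical, and the regex also accepts common natbib-style variants such as \citep and \citet, which is an assumption beyond the plain \cite{...} form named above.

```python
import re
from pathlib import Path

# Assumption: treat \cite, \citep, \citet, etc. (with optional [..] arguments)
# as citation calls. The skill text itself only names \cite{...}.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{")


def discover_inputs(paper_dir):
    """Return (bib_files, tex_files_with_cites) found under paper_dir."""
    root = Path(paper_dir)
    bib_files = sorted(root.rglob("*.bib"))
    tex_files = sorted(
        path
        for path in root.rglob("*.tex")
        # Keep only files that contain at least one citation call.
        if CITE_RE.search(path.read_text(encoding="utf-8", errors="replace"))
    )
    return bib_files, tex_files
```

Each path in bib_files would then be audited separately, as the step requires.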
Step 2: Extract all (cite-key, context) pairs
For each \cite{key1,key2,...} invocation in the paper:
- Record the cite key
- Record the file + line number
- Record the surrounding sentence (≥ 1 full sentence around the cite, for context check)
Output a flat list of (key, file, line, surrounding_sentence) tuples.
Also build the inverse: for each bib entry, the list of all places it is cited.
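A rough sketch of the tuple extraction and the inverse index follows. The helper names extract_contexts and invert are hypothetical, and the sentence splitter is deliberately naive (it cuts at ". " and at the next "."), so abbreviations such as "e.g." or decimal numbers near a cite would need a more careful splitter in practice.

```python
import re
from collections import defaultdict

# Assumption: natbib-style variants (\citep, \citet, optional [..] args) count as cites.
CITE_RE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")


def extract_contexts(filename, text):
    """Return a flat list of (key, file, line, surrounding_sentence) tuples."""
    results = []
    for match in CITE_RE.finditer(text):
        line = text.count("\n", 0, match.start()) + 1
        # Naive sentence start: just after the previous ". " or blank line.
        start = max(text.rfind(". ", 0, match.start()),
                    text.rfind("\n\n", 0, match.start()))
        start = 0 if start < 0 else start + 2
        # Naive sentence end: the first "." after the cite.
        end = text.find(".", match.end())
        end = len(text) if end < 0 else end + 1
        sentence = " ".join(text[start:end].split())
        # One tuple per key in a multi-key \cite{key1,key2,...}.
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                results.append((key, filename, line, sentence))
    return results


def invert(contexts):
    """Map each cite key to the list of (file, line, sentence) places it is used."""
    by_key = defaultdict(list)
    for key, file, line, sentence in contexts:
        by_key[key].append((file, line, sentence))
    return dict(by_key)
```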
Define two protocol sets used throughout the rest of the workflow: cited_keys is the set of unique cite keys appearing in any \cite{...} invocation across the audited *.tex files (de-duplicated), and bib_keys is the set of keys parsed from the audited bib file(s). cited_keys drives Step 3 (audit only cited entries); bib_keys \ cited_keys is the uncited residual surfaced by the --uncited opt-in.
If the user passed --uncited, also compute the set difference bib_keys \ cited_keys here and stash it for use in Step 5 and the JSON aggregation; see "Uncited Entry Detection (opt-in)" below for the protocol. The set-diff is a string operation only and does not consume reviewer budget.
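The two sets and their difference can be illustrated as follows. This is only a sketch: parse_bib_keys and uncited are hypothetical names, and the regex-based key parsing is a simplification that a dedicated BibTeX parser would handle more robustly.

```python
import re

# Matches "@type{key," and captures the entry type and the key.
BIB_KEY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# BibTeX directives that are not bibliography entries.
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def parse_bib_keys(bib_text):
    """Return the set of entry keys found in a .bib file's text (bib_keys)."""
    return {
        key
        for kind, key in BIB_KEY_RE.findall(bib_text)
        if kind.lower() not in NON_ENTRY_TYPES
    }


def uncited(bib_keys, cited_keys):
    """Return bib_keys \\ cited_keys as a sorted list (pure set arithmetic)."""
    return sorted(set(bib_keys) - set(cited_keys))
```

Because the difference is computed on key strings alone, it needs no reviewer call, which matches the budget note above.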
Save the extracted contexts to paper/.aris/citation-audit/contexts.txt so the reviewer can read it directly. Use the paper-dir-relative path .aris/citation-audit/contexts.txt when recording the file in audited_input_hashes; do not stage under /tmp or other transient locations that the verifier cannot rehash later.
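One way to stage the contexts file and record its hash is sketched below. The tab-separated line layout, the use of SHA-256, and the shape of the returned mapping are all assumptions made for illustration; the source fixes only the file location and the requirement to record the paper-dir-relative path.

```python
import hashlib
from pathlib import Path

# Paper-dir-relative path, as required for audited_input_hashes.
CONTEXTS_REL = ".aris/citation-audit/contexts.txt"


def save_contexts(paper_dir, contexts):
    """Write contexts under paper_dir and return {relative_path: sha256_hex}."""
    out = Path(paper_dir) / CONTEXTS_REL
    out.parent.mkdir(parents=True, exist_ok=True)
    # Assumed layout: one tab-separated line per (key, file, line, sentence) tuple.
    lines = [
        f"{key}\t{file}:{line}\t{sentence}"
        for key, file, line, sentence in contexts
    ]
    out.write_text("\n".join(lines) + "\n", encoding="utf-8")
    # Hash the bytes on disk so a later verifier can rehash the same file.
    digest = hashlib.sha256(out.read_bytes()).hexdigest()
    return {CONTEXTS_REL: digest}
```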
Step 3: Send each entry to fresh cross-model reviewer
For each cited bib entry — i.e., each key in cited_keys with at least one extracted citation context — invoke mcp__codex__codex (NOT codex-reply — fresh thread per entry, or batch with explicit per-entry isolation). Do not send entries in bib_keys \ cited_keys to the reviewer; those are detect-only and surface only when --uncited is explicitly enabled (see "Uncited Entry Detection" below).
mcp__codex__codex:
  model: gpt-5.5
  config: {"model_reasoning_effort": "xhigh"}
  sandbox: read-only
  prompt: |
    You are auditing a bibliographic entry. Use web/DBLP/arXiv search.

    ## Bib entry
    @article{key2024example,
      author = {...}, title = {...}, journal = {...}, year = {...}, ...
    }

    ## Where this entry is cited in the paper
    [paste extracted contexts]

    For this entry, verify:
    1. EXISTENCE: does this paper exist at the claimed arXiv ID / DOI / venue?
       Output: YES / NO / UNCERTAIN, with the verifying URL.
    2. METADATA: are author names, year, venue, title correct?
       For each, output: correct / wrong: should be ... / typo: ...
    3. CONTEXT: for each use, does the cited paper actually support the surrounding claim?
       Output per-use: SUPPORTS / WEAK / WRONG, with one-sentence reasoning.

    VERDICT: KEEP / FIX / REPLACE / REMOVE
    - KEEP: entry is clean, all uses are appropriate
    - FIX: metadata needs correction; uses are appropriate
    - REPLACE: cite is wrong-context, find a different paper that actually supports the claim
    - REMOVE: entry is hallucinated or unsupportable

    Be honest. If you cannot verify online, say UNCERTAIN; do not guess.
Save the response to .aris/traces/citation-audit/<date>_runNN/<key>.md per the review-tracing protocol.
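Assembling the per-entry prompt text might look like the sketch below. build_prompt and its argument layout are hypothetical; the instructions argument stands for the verification and verdict text from the prompt above, passed in verbatim. The sketch refuses entries without any extracted context, since only cited entries are sent to the reviewer.

```python
PROMPT_TEMPLATE = """You are auditing a bibliographic entry. Use web/DBLP/arXiv search.

## Bib entry
{bib_entry}

## Where this entry is cited in the paper
{uses}

{instructions}
"""


def build_prompt(bib_entry, uses, instructions):
    """Build one reviewer prompt for a single cited bib entry.

    uses is a list of (file, line, surrounding_sentence) tuples for this key.
    """
    if not uses:
        # Uncited entries are detect-only and must not reach the reviewer.
        raise ValueError("entry has no citation contexts; do not send it for review")
    use_lines = [f"- {file}:{line}: {sentence}" for file, line, sentence in uses]
    return PROMPT_TEMPLATE.format(
        bib_entry=bib_entry.strip(),
        uses="\n".join(use_lines),
        instructions=instructions.strip(),
    )
```

Building each prompt from a single entry's data keeps entries isolated from one another, in line with the fresh-thread requirement.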
Step 4: Aggregate verdicts
Build CITATION_AUDIT.json following the schema defined in "Submission
Artifact Emission" below (single authoritative schema for this file).
Per-entry ledger data goes under details.per_entry, not under a
top-level entries field. The top-level verdict is a single overall
value (PASS / WARN / FAIL / NOT_APPLICABLE / BLOCKED / ERROR) derived
from per-entry verdicts per the decision table in "Submission Artifact
Emission"; the top-level summary is a one-line human-readable string.
Concretely, details carries the per-entry ledger:
"details": {
"total_entries": 29,
"counts": { "KEEP": 11, "FIX": 14, "REPLACE": 3, "REMOVE": 1 },
"per_entry