Research Wiki: Persistent Research Knowledge Base
Subcommand: $ARGUMENTS
Overview
The research wiki is a persistent, per-project knowledge base that accumulates structured knowledge across the entire ARIS research lifecycle. Unlike one-off literature surveys that are used and forgotten, the wiki compounds — every paper read, idea tested, experiment run, and review received makes the wiki smarter.
Inspired by Karpathy's LLM Wiki pattern: compile knowledge once, keep it current, don't re-derive on every query.
Core Concepts
Four Entity Types
| Entity | Directory | Node ID format | What it represents |
|---|---|---|---|
| Paper | papers/ | paper:<slug> | A published or preprint research paper |
| Idea | ideas/ | idea:<id> | A research idea (proposed, tested, or failed) |
| Experiment | experiments/ | exp:<id> | A concrete experiment run with results |
| Claim | claims/ | claim:<id> | A testable scientific claim with evidence status |
Typed Relationships (graph/edges.jsonl)
| Edge type | From → To | Meaning |
|---|---|---|
extends | paper → paper | Builds on prior work |
contradicts | paper → paper | Disagrees with results/claims |
addresses_gap | paper|idea → gap | Targets a known field gap |
inspired_by | idea → paper | Idea sourced from this paper |
tested_by | idea|claim → exp | Tested in this experiment |
supports | exp → claim|idea | Experiment confirms claim |
invalidates | exp → claim|idea | Experiment disproves claim |
supersedes | paper → paper | Newer work replaces older |
Edges are stored in graph/edges.jsonl only. The ## Connections section on each page is auto-generated from the graph — never hand-edit it.
Wiki Directory Structure
research-wiki/
index.md # categorical index (auto-generated)
log.md # append-only timeline
gap_map.md # field gaps with stable IDs (G1, G2, ...)
query_pack.md # compressed summary for /idea-creator (auto-generated, max 8000 chars)
papers/
<slug>.md # one page per paper
ideas/
<idea_id>.md # one page per idea
experiments/
<exp_id>.md # one page per experiment
claims/
<claim_id>.md # one page per testable claim
graph/
edges.jsonl # materialized current relationship graph
Subcommands
/research-wiki init
Initialize the wiki for the current project:
- Create
research-wiki/directory structure - Create empty
index.md,log.md,gap_map.md - Create empty
graph/edges.jsonl - Log: "Wiki initialized"
/research-wiki ingest "<paper title>" — arxiv: <id>
Add a paper to the wiki. This subcommand is thin wrapping around the
canonical helper python3 "$ARIS_REPO/tools/research_wiki.py" ingest_paper …, which
is the single implementation of paper ingest in ARIS (per
shared-references/integration-contract.md
— one helper, no copies). The helper does all of:
- Fetch metadata — queries the arXiv Atom API when
--arxiv-idis given - Generate slug —
<first_author_last_name><year>_<keyword> - Check dedup — skip an existing page unless
--update-on-exist - Create page —
papers/<slug>.mdwith the schema below - Rebuild
index.mdandquery_pack.md - Append
log.md
Edge extraction (step 5/8 in the old manual flow) is not in
ingest_paper; do it as a follow-up with add_edge per relationship
identified:
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills-codex.txt 2>/dev/null)}"
WIKI_SCRIPT=""
[ -n "$ARIS_REPO" ] && [ -f "$ARIS_REPO/tools/research_wiki.py" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"
[ -z "$WIKI_SCRIPT" ] && [ -f tools/research_wiki.py ] && WIKI_SCRIPT="tools/research_wiki.py"
[ -z "$WIKI_SCRIPT" ] && [ -f ~/.codex/skills/research-wiki/research_wiki.py ] && WIKI_SCRIPT="$HOME/.codex/skills/research-wiki/research_wiki.py"
# arXiv-known paper
[ -n "$WIKI_SCRIPT" ] && python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ \
--arxiv-id 2501.12345 --thesis "One-line claim from abstract."
# Venue paper with no arXiv mirror
[ -n "$WIKI_SCRIPT" ] && python3 "$WIKI_SCRIPT" ingest_paper research-wiki/ \
--title "Attention Is All You Need" \
--authors "Ashish Vaswani, Noam Shazeer, …" --year 2017 --venue "NeurIPS"
# Manual edge after ingest
[ -n "$WIKI_SCRIPT" ] && python3 "$WIKI_SCRIPT" add_edge research-wiki/ \
--from "paper:vaswani2017_attention_all_you" \
--to "paper:chen2025_factorized_gap" \
--type "extends" --evidence "Section 3.2: adapts the encoder block …"
Other skills (/research-lit, /arxiv, /alphaxiv, /deepxiv,
/semantic-scholar, /exa-search) call the same helper directly in
their own last step — they don't re-route through /research-wiki ingest as a subcommand, so they don't need an LLM roundtrip.
/research-wiki sync — arxiv-ids <id1>,<id2>,...
Batch backfill: ingest one or more arXiv IDs that were read earlier
without being ingested (e.g., because research-wiki/ was set up after
the reading happened, or a hook didn't fire).
# Explicit list
[ -n "$WIKI_SCRIPT" ] && python3 "$WIKI_SCRIPT" sync research-wiki/ \
--arxiv-ids 2310.06770,1706.03762
# From a file (one id per line, # comments ok)
[ -n "$WIKI_SCRIPT" ] && python3 "$WIKI_SCRIPT" sync research-wiki/ --from-file ids.txt
Dedup is handled per-id; already-ingested papers are skipped silently.
This is the recommended manual repair step (see integration
contract §5 Backfill). sync does not scan session traces — callers
declare the ids explicitly.
Paper page schema (exactly what ingest_paper emits — do not
handwrite alternative fields; lint will flag drift):
---
type: paper
node_id: paper:<slug>
title: "<full title>"
authors: ["First A. Author", "Second B. Author"]
year: 2025
venue: "arXiv"
external_ids:
arxiv: "2501.12345"
doi: null
s2: null
tags: ["tag1", "tag2"]
added: 2026-04-07T10:12:00Z
---
# <full title>
## One-line thesis
[Single sentence capturing the paper's core contribution]
## Problem / Gap
## Method
## Key Results
## Assumptions
## Limitations / Failure Modes
## Reusable Ingredients
[Techniques, datasets, or insights that could be repurposed]
## Open Questions
## Claims
[Reference claim pages: claim:C1, claim:C2, etc.]
## Connections
[AUTO-GENERATED from graph/edges.jsonl — do not edit manually]
## Relevance to This Project
[Why this paper matters for our specific research direction]
Additionally, when the paper was ingested via --arxiv-id and the arXiv
API returned an abstract, the helper appends an ## Abstract (original)
section after Relevance to This Project containing the raw abstract
text as a blockquote. Manual ingests (no --arxiv-id) do not include
this section.
/research-wiki query "<topic>"
Generate query_pack.md — a compressed, context-window-friendly summary:
Fixed budget (max 8000 chars / ~2000 tokens):
| Section | Budget | Content |
|---|---|---|
| Project direction | 300 chars | From AGENTS.md or RESEARCH_BRIEF.md |
| Top 5 gaps | 1200 chars | From gap_map.md, ranked by: unresolved + linked ideas + failed experiments |
| Paper clusters | 1600 chars | 3-5 clusters by tag overlap, 2-3 sentences each |
| Failed ideas | 1400 chars | Always included — highest anti-repetition value |
| Top papers | 1800 chars | 8-12 pages ranked by: linked gaps, linked ideas, centrality, relevance flag |
| Active chains | 900 chars | limitation → opportunity relationship chains |
| Open unknowns | 500 chars | Unresolved questions across the wiki |
Pruning priority (when over budget): low-ranked papers > cluster detail >