LLM Wiki
A skill for building and maintaining an LLM-curated knowledge base inside a project, following the pattern Andrej Karpathy described in his April 2026 gist. The wiki is a directory of markdown files that the LLM owns and maintains; the user curates sources and asks questions, and the LLM does the bookkeeping.
The pattern in one paragraph
Conventional RAG re-derives knowledge from raw chunks on every query; nothing accumulates. The LLM Wiki pattern flips this: when a new source arrives, the LLM compiles it once into a persistent, structured wiki by extracting concepts, writing entity pages, updating cross-references, and flagging contradictions. Subsequent queries read the pre-synthesized wiki rather than the raw sources, so knowledge compounds. The user is in charge of sourcing and asking good questions; the LLM handles the summarizing, linking, and consistency work whose tedium usually leads people to abandon wikis.
When to use this skill
The trigger surface is broad. Whenever the user is accumulating textual material over time (research papers, articles, transcripts, meeting notes, book chapters, customer calls, code repos, journal entries) and would benefit from having that material organized rather than dumped into a chat each session, this skill applies. It covers both ingesting a single source ("ingest this paper") and steady-state operations against an existing wiki ("what does my wiki say about diffusion models", "lint the wiki", "what's missing").
If the project does not yet have a wiki, run the bootstrap step first (see "Initializing a new wiki" below). Otherwise, locate the existing wiki and read its SCHEMA.md before doing anything else — the schema encodes the conventions for that specific wiki and may override defaults documented here.
Architecture: three layers, three operations
The wiki has three layers and three operations. Internalize this vocabulary because the rest of the skill assumes it.
The three layers are:
Raw sources. The user's curated source material: articles, papers, PDFs, transcripts. This layer is immutable; the LLM reads it but never modifies it.
The wiki. A directory of LLM-generated markdown pages: entity pages, concept pages, comparisons, summaries. The LLM owns this layer entirely.
The schema. A SCHEMA.md file at the wiki root that documents the conventions for this particular wiki (page types, naming rules, tag taxonomy, ingest workflow customizations). It is co-evolved with the user.
The three operations are:
Ingest. A new source arrives; the LLM reads it, writes a summary page, updates relevant entity and concept pages, and appends to the log.
Query. The user asks a question; the LLM navigates the wiki via the index, reads the relevant pages, and synthesizes an answer, often filing the answer back as a new page so the exploration compounds.
Lint. A periodic health check; the LLM scans for contradictions, stale claims, orphan pages, missing concepts, and broken links.
For the canonical write-up of these operations, read references/architecture.md. For the step-by-step procedures, read references/ingest-workflow.md, references/query-workflow.md, and references/lint-workflow.md as needed.
Graph layer (compiled, optional)
Pages can carry typed metadata under a graph: key in their frontmatter. A bundled extractor compiles every page into four files under wiki/graph/: nodes.jsonl, edges.jsonl, graph.sqlite, and graph.graphml. Markdown is canonical; the graph is a regenerable index. Pages without a graph: key still appear as nodes (derived from their type/kind) and contribute low-confidence mentions edges from body wikilinks. Typed semantic edges (e.g. founded, proposed, depends_on) require an explicit source and an evidence quote; never emit one inferred from training data.
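To make the "mentions edges from body wikilinks" idea concrete, here is a minimal sketch of how such edges could be derived. It is not the bundled extractor's code: the function name `mentions_edges` and the edge field names (`source`, `predicate`, `target`, `confidence`) are assumptions made for illustration, and the real field names are whatever wiki/graph/ontology.yaml prescribes.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#section]]; captures only the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def mentions_edges(page_id: str, body: str) -> list[dict]:
    """Return one low-confidence 'mentions' edge per distinct wikilink target.

    The edge shape is hypothetical; only the 'mentions' predicate and the
    low-confidence status come from the text above.
    """
    targets = []
    for match in WIKILINK.finditer(body):
        target = match.group(1).strip()
        # Skip self-links and duplicates, keeping first-seen order.
        if target and target != page_id and target not in targets:
            targets.append(target)
    return [
        {"source": page_id, "predicate": "mentions", "target": t, "confidence": "low"}
        for t in targets
    ]
```

Nothing here produces typed semantic edges: those need an explicit source and evidence quote, which a regex over the body cannot supply.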
The conventions for the graph layer (predicate vocabulary, node id format, required fields) live in wiki/graph/ontology.yaml. The full reference is references/graph-workflow.md. Run the bundled scripts after substantive ingests:
python scripts/wiki_graph_lint.py wiki/ # check ontology + evidence + alias collisions
python scripts/wiki_graph_extract.py wiki/ # rebuild nodes.jsonl, edges.jsonl, graph.sqlite, graph.graphml
python scripts/wiki_graph_query.py wiki/ neighbors --node product:konvy # query example: neighbors of one node
If wiki/graph/ontology.yaml does not exist, the wiki is pre-graph and you should treat the graph step as a no-op — don't fabricate it.
Default project layout
Unless the user's SCHEMA.md says otherwise, the wiki lives in the project at this layout:
<project-root>/
├── wiki/
│ ├── SCHEMA.md ← conventions, the "config file" — read this FIRST
│ ├── index.md ← entry point: catalog of all pages with one-line summaries
│ ├── log.md ← append-only chronological log of ingests/queries/lints
│ ├── indexes/ ← (appears once index.md shards) per-category indexes
│ ├── entities/ ← pages about specific things (people, products, papers, places)
│ ├── concepts/ ← pages about ideas, methods, frameworks
│ ├── sources/ ← per-source summary pages (one per ingested source)
│ └── synthesis/ ← cross-cutting analyses, comparisons, query results filed back
├── raw/ ← the user's source material (PDFs, .md clippings, images)
│ └── assets/ ← downloaded images referenced by raw clippings
└── ...
This layout is a default, not a requirement. If the project already has a wiki under a different name (e.g. kb/, notes/, vault/), use that. If the user has placed sources outside raw/, follow their convention.
The scalability discipline
The single biggest failure mode of the LLM Wiki pattern is the wiki itself becoming a context bottleneck. Naive implementations break around a few hundred pages: the LLM either reads too many pages per query or starts hallucinating because it skipped the relevant ones. This skill's design is shaped almost entirely by avoiding that failure. The principles below are non-negotiable; ignoring them is what makes the pattern collapse at scale.
Atomic pages. Every wiki page is about one concept and stays small — soft cap 400 lines or roughly 2,000 words, hard cap 800 lines. When a page outgrows this, split it: extract sub-concepts into their own pages and have the parent link to them. A page that takes up 30% of the context window on its own is a design smell.
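The caps above are easy to check mechanically. The following is a minimal sketch of such a check, assuming a hypothetical helper named `page_size_status`; the three status strings are also invented for illustration.

```python
SOFT_CAP_LINES = 400   # soft cap from the text above
SOFT_CAP_WORDS = 2000  # "roughly 2,000 words"
HARD_CAP_LINES = 800   # hard cap from the text above


def page_size_status(text: str) -> str:
    """Classify a page as 'ok', 'consider-split', or 'split-now' by size."""
    line_count = len(text.splitlines())
    word_count = len(text.split())
    if line_count > HARD_CAP_LINES:
        return "split-now"
    if line_count > SOFT_CAP_LINES or word_count > SOFT_CAP_WORDS:
        return "consider-split"
    return "ok"
```

A lint pass could run a check like this over every page and list the ones that need splitting.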
Index-first navigation. Never grep or glob the wiki blindly when answering a query. Always read index.md (or the relevant sharded index under indexes/) first to identify candidate pages, then drill into only those. The index is engineered to be cheap to read — one line per page, no bodies — and it is the cache that makes the whole pattern scalable.
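As a rough illustration of index-first lookup, the sketch below scans index lines and returns candidate page names without opening any page body. The index line format (`- [[page-name]]: one-line summary`), the function name `candidate_pages`, and the naive term-count scoring are all assumptions; the actual index format is whatever SCHEMA.md defines.

```python
import re

# Assumed entry shape: "- [[page-name]]: one-line summary" (hypothetical format).
ENTRY = re.compile(r"^\s*[-*]\s*\[\[([^\]|]+)(?:\|[^\]]*)?\]\]\s*[:\-]?\s*(.*)$")


def candidate_pages(index_text: str, query_terms: list[str], limit: int = 5) -> list[str]:
    """Rank index entries by how many query terms appear in their name or summary."""
    terms = [t.lower() for t in query_terms]
    scored = []
    for line in index_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # headings and blank lines are ignored
        name, summary = match.group(1).strip(), match.group(2)
        haystack = f"{name} {summary}".lower()
        score = sum(1 for t in terms if t in haystack)
        if score:
            scored.append((score, name))
    # Stable sort: ties keep their index order.
    scored.sort(key=lambda pair: -pair[0])
    return [name for _, name in scored[:limit]]
```

Only the pages this returns would then be read in full, which keeps the per-query cost bounded by `limit` rather than by the size of the wiki.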
Sharded indexes. When index.md itself exceeds ~300 lines or the wiki passes ~150 pages, shard it: move category-specific entries into indexes/<category>.md files (e.g. indexes/entities.md, indexes/concepts.md, indexes/sources.md, or finer domain shards), and turn the top-level index.md into a directory of those shards. Reading the index then becomes a two-step lookup, but each step is bounded.
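The sharding rule can be sketched as a threshold check plus a regrouping step. The helper names `needs_sharding` and `shard_entries`, the `(category, index_line)` input shape, and the format of the top-level directory lines are assumptions for illustration.

```python
from collections import defaultdict

MAX_INDEX_LINES = 300  # approximate threshold from the text above
MAX_PAGES = 150        # approximate threshold from the text above


def needs_sharding(index_line_count: int, page_count: int) -> bool:
    """True once either threshold is exceeded."""
    return index_line_count > MAX_INDEX_LINES or page_count > MAX_PAGES


def shard_entries(entries: list[tuple[str, str]]) -> tuple[dict[str, list[str]], list[str]]:
    """Group (category, index_line) pairs into per-category shard files.

    Returns the shard contents keyed by path, plus the lines for the new
    top-level index.md (a directory of shards; line format is hypothetical).
    """
    shards = defaultdict(list)
    for category, line in entries:
        shards[f"indexes/{category}.md"].append(line)
    top_level = [
        f"- {path} ({len(lines)} entries)" for path, lines in sorted(shards.items())
    ]
    return dict(shards), top_level
```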
YAML frontmatter on every page. Every wiki page begins with frontmatter that includes at minimum type, tags, sources, and updated. The bundled wiki_search.py script can filter on these without reading page bodies. See references/page-conventions.md.
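A minimal sketch of a frontmatter check follows, using only the standard library. The sample page, its field values, and the helper names are placeholders invented for illustration; the parser only recognizes top-level `key:` lines, so a real YAML parser would be needed for anything beyond this presence check. It is not how wiki_search.py works, which this document does not describe.

```python
REQUIRED_FIELDS = ("type", "tags", "sources", "updated")

# Hypothetical page; every value below is a placeholder.
SAMPLE_PAGE = """---
type: concept
tags: [generative-models]
sources: [sources/example-paper]
updated: 2026-01-01
---
Diffusion models
"""


def frontmatter_keys(text: str) -> set[str]:
    """Collect top-level keys from a leading '---' delimited block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()  # no frontmatter at all
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys  # closing delimiter reached
        # Top-level keys start in column 0; nested items and comments are skipped.
        if line and not line[0].isspace() and not line.startswith(("#", "-")) and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return set()  # block never closed, so treat it as missing


def missing_fields(text: str) -> list[str]:
    """List the required fields a page's frontmatter lacks."""
    present = frontmatter_keys(text)
    return [field for field in REQUIRED_FIELDS if field not in present]
```

Because the check stops at the closing delimiter, it never reads the page body, which matches the intent of filtering on frontmatter alone.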
Surgical edits, not rewrites. When updating a page (e.g. adding a new cross-reference because a freshly ingested source mentions an existing entity), use str_replace to touch only the relevant section. Rewriting whole pages is slow, expensive in tokens, and risks losing prior nuance.
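The key property of a surgical edit is that it refuses to act when the target snippet is ambiguous or absent. The sketch below is a stand-in for whatever str_replace tool is available, not that tool's implementation; the name `str_replace_once` is hypothetical.

```python
def str_replace_once(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only if `old` occurs exactly once in `text`."""
    count = text.count(old)
    if count != 1:
        # Zero matches means the page changed; several means the edit is ambiguous.
        raise ValueError(f"expected exactly one match for the old snippet, found {count}")
    return text.replace(old, new, 1)
```

For example, adding a cross-reference touches only the relevant list and leaves the rest of the page byte-for-byte unchanged:

```python
page = "Related\n- [[rag]]\n"
updated = str_replace_once(page, "- [[rag]]\n", "- [[rag]]\n- [[diffusion-models]]\n")
```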
Backlink discovery via grep. To find every page that references a given entity, run grep -rl "\[\[entity-name\]\]" wiki/ rather than reading pages to look for mentions.
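The grep command matches only the exact form [[entity-name]]. If a wiki also uses aliased or section links such as [[entity-name|label]] or [[entity-name#section]] (an assumption; check SCHEMA.md for the link syntax in use), a small script can cover those forms too. The sketch below uses a hypothetical `backlinks` helper and still avoids loading pages into the LLM's context, since it returns only file paths.

```python
import re
from pathlib import Path


def link_pattern(name: str) -> re.Pattern:
    """Match [[name]], [[name|alias]] and [[name#section]], but not [[name-other]]."""
    return re.compile(r"\[\[" + re.escape(name) + r"(?:[#|][^\]]*)?\]\]")


def backlinks(wiki_root: str, name: str) -> list[str]:
    """Return wiki-relative paths of markdown files that link to `name`."""
    pattern = link_pattern(name)
    hits = []
    for path in sorted(Path(wiki_root).rglob("*.md")):
        if pattern.search(path.read_text(encoding="utf-8")):
            hits.append(path.relative_to(wiki_root).as_posix())
    return hits
```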