Documentation Architecture Review
Evaluate whether documentation is organized so that readers can find what they need, understand where they are, and navigate efficiently. The output is an architecture assessment with specific restructuring recommendations — not new content.
When to Use
- When restructuring or reorganizing documentation
- When adding a new section or doc type to an existing set
- When users report "I know it's documented somewhere but can't find it"
- When the doc set has grown organically and needs rationalization
- After doc-completeness-audit identifies gaps — before filling them, ensure the structure can accommodate new content
- Periodic review of navigation and discoverability
Quick Reference
| Resource | Purpose | Load when |
|---|---|---|
| references/personas.md | Six concrete reader personas with eval signals | Always (Phase 0) |
| scripts/link_graph.py | Mechanical link-graph analyzer (orphans, reciprocity, broken links, hubs) | Always (Phase 1) |
| references/ia-heuristics.md | Doc-type-aware IA evaluation heuristics | Always (Phase 2) |
Workflow Overview
Phase 0: Personas → Establish the doc set's primary 1-3 personas
Phase 1: Map → Build the current doc structure map (incl. link graph)
Phase 2: Evaluate → Score against IA heuristics, parameterized by personas + doc type
Phase 3: Model → Compare structure to user mental models per persona
Phase 4: Report → Produce the architecture review with per-persona findings
Phase 0: Establish Personas
A "good" architecture is good for someone specific. Without personas, the heuristics apply a default standard that systematically misjudges docs serving non-default audiences. For example, a flat reference doc can be scored as "poorly hierarchical" simply because it doesn't follow a Quick Start → advanced progression.
Step 0a — Identify the doc set's audiences
Read the doc set's entry pages (README, index.md, landing pages) and
the highest-traffic top-level docs. Identify which 1–3 personas from
references/personas.md are the primary readers. Common patterns:
| Doc set shape | Likely personas |
|---|---|
| Library / SDK with public API | API Looker-Up + Onboarding User |
| End-user product | Onboarding User + Operator |
| Internal infrastructure | Operator + Incident Responder + Architect Debugger |
| OSS project | Onboarding User + Contributor |
| Operations-heavy system | Operator + Incident Responder |
Step 0b — Draft persona profiles
For each identified persona, copy the profile from
references/personas.md verbatim. Don't paraphrase — the explicit
profile is what calibrates downstream sub-agents. If a persona almost
fits but a dimension differs, define a custom persona using the same
five-field structure.
Step 0c — Note conflicts
If the doc set serves more than one persona with conflicting needs (e.g., Onboarding User wants narrative, API Looker-Up wants terseness), note this explicitly. The synthesis report will surface where current structure favors one persona at the cost of another.
Output: A persona block (1–3 personas + any conflict notes) that feeds every downstream sub-agent prompt.
Phase 1: Map the Current Structure
Build a complete picture of the documentation architecture.
Step 1a: Physical Structure
Generate the file tree of all documentation:
find docs/ site/ -name '*.md' -o -name '*.html' | sort
Record:
- Directory hierarchy and nesting depth
- File count per directory
- Naming conventions (kebab-case, snake_case, mixed)
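The three items above can be tallied mechanically once the file list exists. The following is a minimal sketch, assuming the `find` output has been read into a list of POSIX-style paths; the function names `classify_name` and `summarize_tree` and the "single-word" bucket are illustrative and not part of the skill.

```python
import re
from collections import Counter
from pathlib import PurePosixPath


def classify_name(stem: str) -> str:
    """Classify a file stem by naming convention (illustrative buckets)."""
    if re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)+", stem):
        return "kebab-case"
    if re.fullmatch(r"[a-z0-9]+(_[a-z0-9]+)+", stem):
        return "snake_case"
    if re.fullmatch(r"[a-z0-9]+", stem):
        return "single-word"  # no separator, so it fits either convention
    return "mixed"


def summarize_tree(paths: list[str]) -> dict:
    """Summarize nesting depth, per-directory file counts, and naming styles."""
    parsed = [PurePosixPath(p) for p in paths]
    return {
        # Depth = number of directories above the file.
        "max_depth": max((len(p.parts) - 1 for p in parsed), default=0),
        "files_per_dir": dict(Counter(str(p.parent) for p in parsed)),
        "naming": dict(Counter(classify_name(p.stem) for p in parsed)),
    }
```

A directory whose `naming` tally is split across several buckets is a candidate for the "mixed" finding in the structure map.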
Step 1b: Navigation Structure
Identify every way a reader can navigate:
| Navigation type | Where to find it |
|---|---|
| Sidebar / table of contents | _config.yml nav, front matter nav_order/parent, SUMMARY.md |
| Landing pages | index.md files — read each one for link lists |
| In-page cross-references | [text](link) and {% link %} references between pages |
| Breadcrumbs | Theme configuration or layout templates |
| Search | Search configuration, indexed content |
| Previous/Next links | Auto-generated or manual nav_order sequencing |
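For the in-page cross-reference row, link targets can be pulled out of page text with two regular expressions. This is a rough sketch, not the approach used by scripts/link_graph.py; the function name `internal_links` is hypothetical, and the patterns assume plain inline Markdown links and Jekyll-style `{% link %}` tags (image links and reference-style links are not treated specially).

```python
import re

# Inline Markdown link: capture the target up to whitespace or ")".
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
# Jekyll link tag: {% link path/to/page.md %}
JEKYLL_LINK = re.compile(r"\{%\s*link\s+([^\s%]+)\s*%\}")


def internal_links(markdown: str) -> list[str]:
    """Return internal link targets found in a page, with #fragments removed."""
    targets = MD_LINK.findall(markdown) + JEKYLL_LINK.findall(markdown)
    internal = []
    for target in targets:
        # Skip external URLs and same-page anchors.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            continue
        # A Markdown link wrapping a Jekyll tag yields "{%" here; the tag's
        # real target is already captured by JEKYLL_LINK.
        if target.startswith("{%"):
            continue
        internal.append(target.split("#", 1)[0])
    return internal
```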
Step 1c: Entry Points
Identify how readers arrive:
- Direct — typing a URL or bookmarking
- Search — site search or external search engine
- Navigation — sidebar, breadcrumb, landing page links
- Cross-reference — link from another doc page
- External — README, GitHub, blog post, error message linking to docs
Map which pages are reachable from each entry point. Pages unreachable from common entry points are effectively invisible.
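Reachability from a set of entry points is a graph traversal. The sketch below assumes the links have already been collected into a dictionary mapping each page to the pages it links to; `unreachable_pages` is an illustrative name, and the input shape is an assumption rather than a format the skill defines.

```python
from collections import deque


def unreachable_pages(links: dict[str, list[str]], entry_points: list[str]) -> set[str]:
    """Return pages that cannot be reached by following links from any entry point."""
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    # Start only from entry points that actually exist in the graph.
    seen = {p for p in entry_points if p in all_pages}
    queue = deque(seen)
    while queue:  # breadth-first traversal over outbound links
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return all_pages - seen
```

Running it once per entry-point type (for example, sidebar targets only, then README links only) shows which pages depend on a single route to be discovered.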
Step 1d: Link Graph (mechanical)
Run the bundled link graph analyzer to extract deterministic facts about inter-doc linking:
python3 skills/doc-architecture-review/scripts/link_graph.py --scope all --json > graph.json
# Or human-readable:
python3 skills/doc-architecture-review/scripts/link_graph.py --scope all
The script produces:
- Orphans — pages with no inbound links (excluding entry points like index.md and README.md). Direct input to Heuristic 1 (Findability).
- Dead-ends — pages with no outbound links. Content silos.
- Reciprocity ratio — fraction of edges that have a back-link. Direct input to Heuristic 4 (Cross-Linking Quality).
- Hubs — pages with high in-degree. Natural reference targets.
- Broken links — internal links that don't resolve. Direct input to Heuristic 4.
These are mechanical facts, not judgments. The judgment-heavy parts of Heuristics 1 and 4 (are links contextual? do navigation labels use user language?) are evaluated by sonnet sub-agents in Phase 2.
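To make the five facts concrete, here is one way they could be derived from a page set and an edge list. This is an illustration of the definitions only: it is not the implementation or output format of scripts/link_graph.py, and `link_metrics`, its return keys, and the `hub_min_in` threshold are all assumptions.

```python
from collections import Counter

ENTRY_POINTS = {"index.md", "README.md"}  # entry-point file names named in the list above


def link_metrics(pages: set[str], edges: set[tuple[str, str]], hub_min_in: int = 3) -> dict:
    """Compute orphans, dead-ends, reciprocity, hubs, and broken links.

    `edges` holds (source, target) pairs. `hub_min_in` is an illustrative
    threshold, not a value taken from link_graph.py.
    """
    valid = {(s, t) for s, t in edges if t in pages}
    in_deg = Counter(t for _, t in valid)
    out_deg = Counter(s for s, _ in valid)
    # An edge is reciprocal when the reverse edge also exists.
    reciprocal = sum(1 for s, t in valid if (t, s) in valid)
    return {
        "orphans": sorted(
            p for p in pages
            if in_deg[p] == 0 and p.rsplit("/", 1)[-1] not in ENTRY_POINTS
        ),
        # Pages whose only outbound links are broken count as dead-ends here.
        "dead_ends": sorted(p for p in pages if out_deg[p] == 0),
        "reciprocity": reciprocal / len(valid) if valid else 0.0,
        "hubs": sorted(p for p in pages if in_deg[p] >= hub_min_in),
        "broken_links": sorted(edges - valid),
    }
```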
Output: A structure map with physical hierarchy, navigation paths, entry points, and the link graph JSON.
Phase 2: Evaluate Against IA Heuristics
Assess the structure against seven heuristics. Load references/ia-heuristics.md
for detailed scoring criteria.
Mechanical vs judgment split
For a doc set of any meaningful size, the orchestrator can't read every page to score every heuristic — that strains the context window and produces patchy evaluation. Phase 2 splits work:
- Mechanical part — driven by the Phase 1d link graph JSON. Orphan counts, reciprocity ratio, broken-link counts, hub identification: these are facts, not judgments. The orchestrator reads the JSON and assigns scores deterministically.
- Judgment part — dispatched to general-purpose + sonnet sub-agents organized by heuristic. Each agent receives a focused slice of the doc set and returns specific findings with citations.
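"Deterministic" here means the same counts always produce the same score. A minimal sketch of that idea follows, using the orphan ratio as the input. The cutoffs are placeholders invented for illustration; the actual criteria are in references/ia-heuristics.md and may be structured differently. The function names are likewise hypothetical.

```python
# Placeholder cutoffs for illustration only; the real scoring criteria live
# in references/ia-heuristics.md.
ORPHAN_CUTOFFS = (0.02, 0.05, 0.10, 0.20)


def score_lower_is_better(ratio: float, cutoffs: tuple[float, ...]) -> int:
    """Map a ratio where lower is better to a 1-5 score.

    Start at 5 and drop one point for each cutoff the ratio exceeds.
    """
    score = 5
    for cutoff in sorted(cutoffs):
        if ratio > cutoff:
            score -= 1
    return score


def orphan_score(orphan_count: int, page_count: int) -> int:
    """Score the orphan share of a doc set on the 1-5 scale."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return score_lower_is_better(orphan_count / page_count, ORPHAN_CUTOFFS)
```

Keeping the cutoffs in one named constant makes the mechanical scores reproducible across review runs and easy to audit against the heuristics file.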
Sonnet sub-agent dispatch
Three judgment-heavy heuristics warrant dedicated agents. Each agent's
prompt inlines the persona block from Phase 0 and the relevant
doc-type criteria from references/ia-heuristics.md. The agent scores
per-persona, not against a generic default.
Agent 1 — Findability narrative review (Heuristic 1):
subagent_type: "general-purpose"
model: "sonnet"
description: "Findability narrative review"
Prompt template:
Read landing pages, navigation configs (_config.yml, front-matter
nav_order/parent), and the orphans list from the link graph JSON.
Personas (from Phase 0):
<INLINE PERSONA BLOCK — full profile per persona, not summary>
Doc-type criteria for Heuristic 1:
<INLINE Heuristic 1 section from references/ia-heuristics.md>
For each persona, score Findability 1-5 and identify specific failures:
- Are navigation labels in this persona's language?
- Are entry points appropriate for how this persona arrives?
- Are orphans concentrated in a doc type that fails this persona's task?
Output per-persona scores plus findings. When personas conflict (e.g.,
nav labels in one's language fail another), surface the conflict
explicitly rather than averaging.
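Filling the two inline placeholders can be done with plain string formatting. The sketch below uses an abbreviated version of the template above; `FINDABILITY_TEMPLATE` and `build_prompt` are illustrative names, and how the orchestrator actually assembles prompts is not specified in this document.

```python
# Abbreviated template; the full prompt text is given above.
FINDABILITY_TEMPLATE = """\
Read landing pages, navigation configs, and the orphans list below.

Personas (from Phase 0):
{personas}

Doc-type criteria for Heuristic 1:
{criteria}

Orphans (from the link graph JSON):
{orphans}
"""


def build_prompt(persona_profiles: list[str], criteria: str, orphans: list[str]) -> str:
    """Inline full persona profiles and heuristic criteria into the agent prompt."""
    if not 1 <= len(persona_profiles) <= 3:
        raise ValueError("Phase 0 expects 1-3 primary personas")
    return FINDABILITY_TEMPLATE.format(
        # Profiles are inserted verbatim, one per paragraph, never summarized.
        personas="\n\n".join(persona_profiles),
        criteria=criteria.strip(),
        orphans="\n".join(f"- {p}" for p in orphans) or "- (none)",
    )
```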
Agent 2 — Cross-linking quality review (Heuristic 4):
subagent_type: "general-purpose"
model: "sonnet"
description: "Cross-link qua