Scholar Deep Research
End-to-end academic research workflow that turns a question into a cited, structured report. Built for depth: multi-source federation, transparent ranking, citation chasing, and a mandatory self-critique pass before the report ships.
When to use
Explicit triggers: "literature review", "research report", "state of the art", "survey the field", "what's known about X", "deep research on Y", "systematic review", "scoping review", "compare papers on Z".
Proactive triggers (use without being asked):
- User asks a factual question whose honest answer is "it depends on the literature"
- User frames a research plan and needs the background section
- User is drafting a paper intro/related-work and hasn't yet scoped prior work
- User proposes a method and asks whether it's novel
Do not use when: a single known paper answers the question, the user wants a tutorial (not a survey), or they're debugging code.
Guiding principles
- Scripts over vibes. Every search, dedupe, rank, and export step runs through a script in scripts/. The same input should produce the same output. Do not improvise ranking or counting by eye.
- Sources are federated, not singular. OpenAlex is the primary backbone (free, 240M+ works, no key). arXiv (CS/ML/physics preprints), Crossref (DOI metadata), PubMed (biomedical), DBLP (CS conferences/journals), bioRxiv (life-sci preprints via Europe PMC), and Exa (open-web, requires EXA_API_KEY) fill gaps. Semantic Scholar is also script-driven: build_citation_graph.py --source s2|both is the spine path for Phase 4, with better CS / arXiv / cross-disciplinary coverage than OpenAlex; the two graphs disagree more than you'd expect. The asta MCP tools (mcp__asta__*) and Brave Search are skin: used opportunistically for relevance ranking or non-academic context, never on the critical path. If MCP times out, research continues.
- State is persistent. Everything goes through research_state.json: queries run, papers seen, decisions made, phase progress. Research becomes resumable and auditable.
- Citations are anchors, not decorations. Every non-trivial claim in the draft carries [^id], where id matches a paper in state. Unanchored claims are treated as hallucinations and fail the gate.
- Saturation, not exhaustion, is the stop signal. A phase ends when a new round of search adds <20% novel papers AND no new paper has >100 citations.
- Self-critique is a phase, not a checkbox. Phase 6 reads the draft with adversarial intent. Its output goes into the report appendix.
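The citation-anchor principle lends itself to a mechanical check. The following is a minimal sketch, assuming the draft uses [^id] markers and that the paper ids from state are available as a set. The function name check_anchors and the return shape are hypothetical, not the skill's actual gate. It only flags anchors that resolve to no known paper; deciding which sentences count as claims that need an anchor is left out.

```python
import re

# Matches footnote-style anchors such as [^p1].
ANCHOR_RE = re.compile(r"\[\^([^\]\s]+)\]")


def check_anchors(draft: str, known_ids: set[str]) -> dict[str, list[str]]:
    """Report anchors that match no known paper, and known papers never cited."""
    cited = ANCHOR_RE.findall(draft)
    unknown = sorted({c for c in cited if c not in known_ids})
    uncited = sorted(known_ids - set(cited))
    return {"unknown_anchors": unknown, "uncited_papers": uncited}


# Hypothetical draft text and ids, for illustration only.
draft = "Retrieval improves factuality [^p1]. Gains shrink at scale [^p9]."
report = check_anchors(draft, known_ids={"p1", "p2"})
```

Here p9 is reported as an unknown anchor and p2 as a paper in state that the draft never cites.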
The 8-phase workflow (Phase 0..7)
Phase 0: Scope → decompose question, pick archetype, init state
Phase 1: Discovery → multi-source search, dedupe
Phase 2: Triage → rank, select top-N for deep read
Phase 3: Deep read → extract evidence per paper
Phase 4: Chasing → citation graph (forward + backward)
Phase 5: Synthesis → cluster by theme, map tensions
Phase 6: Self-critique → adversarial review, gap finding
Phase 7: Report → render archetype template, export bibliography
Each phase writes to research_state.json before advancing. If the user pauses or a session crashes, the next run reads the state and picks up from the last completed phase.
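Resuming can be pictured as finding the first phase that state does not mark complete. This is a sketch only: the "completed_phases" key is a hypothetical field chosen for illustration, and the real research_state.json layout may differ.

```python
import json
import tempfile
from pathlib import Path

PHASES = ["scope", "discovery", "triage", "deep_read",
          "chasing", "synthesis", "self_critique", "report"]


def next_phase(state_path: Path) -> int:
    """Return the index of the first phase not yet marked complete.

    Assumes a hypothetical "completed_phases" list of phase indices.
    """
    if not state_path.exists():
        return 0  # no state yet, start at Phase 0
    state = json.loads(state_path.read_text(encoding="utf-8"))
    done = set(state.get("completed_phases", []))
    for idx in range(len(PHASES)):
        if idx not in done:
            return idx
    return len(PHASES)  # every phase finished


# Demo against a throwaway state file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "research_state.json"
    path.write_text(json.dumps({"completed_phases": [0, 1, 2]}), encoding="utf-8")
    resume_at = next_phase(path)
```

With phases 0 to 2 recorded, the run would pick up at Phase 3 (deep read).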
Phase 0 — Scope
Before searching anything, decompose the question.
- Restate the question in one sentence. Surface ambiguities.
- PICO-style decomposition (or equivalent for non-biomedical fields):
  - Population / Problem: what system, species, setting, or phenomenon?
  - Intervention / Independent var: what method, factor, or manipulation?
  - Comparison: against what baseline or alternative?
  - Outcome: what is being measured or claimed?
- Pick an archetype that matches user intent (see references/report_templates.md):
  - literature_review: what is known about X (default)
  - systematic_review: rigorous PRISMA-lite, comparison of many studies on one narrow question
  - scoping_review: what has been studied and how (breadth over depth)
  - comparative_analysis: X vs Y, head-to-head
  - grant_background: narrative background + gap for a proposal
- Draft keyword clusters — 3-5 Boolean clusters covering synonyms, acronyms, and variant spellings. Include a "negative" cluster (terms to exclude).
- Initialize state:
python scripts/research_state.py --state research_state.json init \
  --question "<restated question>" \
  --archetype literature_review
(--state is top-level and applies to every subcommand; init itself takes --question, --archetype, and optional --force.)
When in doubt about archetype, ask the user. The choice shapes everything downstream.
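The keyword-cluster step in the list above can be sketched as a small helper that ORs the synonyms in one cluster and appends the negative cluster. The helper name render_cluster is hypothetical, and whether a given source's --query accepts this operator syntax is an assumption; adjust per source.

```python
def render_cluster(terms: list[str], negative: tuple[str, ...] = ()) -> str:
    """OR the terms of one cluster together and exclude the negative terms."""

    def quote(term: str) -> str:
        # Quote multi-word phrases so they are matched as a unit.
        return f'"{term}"' if " " in term else term

    query = "(" + " OR ".join(quote(t) for t in terms) + ")"
    if negative:
        query += " NOT (" + " OR ".join(quote(t) for t in negative) + ")"
    return query


# Example terms are illustrative only.
query = render_cluster(
    ["RAG", "retrieval-augmented generation"],
    negative=("image retrieval",),
)
```

Each rendered string would be passed as one <cluster> value in Phase 1, one search per cluster.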
Phase 1 — Discovery
Run searches across all available sources, in parallel where the source can take it. OpenAlex is primary; the others fill gaps.
Where parallelism actually pays off. The right place to fan out is Phase 3 (one agent per paper to read PDFs concurrently — see references/agent_prompts/phase3_deep_read.md). At Phase 1 the bottleneck is the upstream API, not local compute, and parallel fan-out across the same source mostly buys 429s and sticky cooldowns. The skill's bias should be: parallel between different sources, serial within one source. Concretely:
- Parallel-friendly: OpenAlex (polite-pool, very tolerant), Crossref (polite-pool), Exa (paid quota), bioRxiv (Europe PMC).
- Self-serialised (file-locked, automatic): arXiv (≥3s/req), PubMed (≥0.34s/req without NCBI_API_KEY, ≥0.10s with), DBLP (1s buffer to avoid SSL EOF flakes).
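The "parallel between different sources, serial within one source" bias can be sketched with one worker per source. This is an illustration under assumptions: run_source is a placeholder that only records what would be run, where a real driver would invoke the matching search script, and the plan dictionary is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_source(source: str, queries: list[str]) -> list[str]:
    """Run one source's queries one after another (serial within a source)."""
    results = []
    for query in queries:
        # Placeholder: a real driver would call the source's search script here.
        results.append(f"{source}:{query}")
    return results


def fan_out(plan: dict[str, list[str]]) -> dict[str, list[str]]:
    """Give each source its own worker (parallel between sources)."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = {src: pool.submit(run_source, src, qs) for src, qs in plan.items()}
        return {src: fut.result() for src, fut in futures.items()}


plan = {"openalex": ["cluster 1", "cluster 2"], "arxiv": ["cluster 1"]}
results = fan_out(plan)
```

Because each source's queries stay on a single worker, no source sees more than one in-flight request from this driver.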
The serialised sources use a per-source file lock under ${SCHOLAR_CACHE_DIR:-.scholar_cache}/rate/<source>.lock, so even N parallel search_arxiv.py invocations from the same agent queue automatically and sleep the right gap. No agent-side coordination is required. Parallel calls to those sources will not error, but they will not finish any faster either.
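For readers curious how a per-source lock can enforce a minimum gap across processes, here is a minimal sketch. It is not the skill's implementation: wait_turn is a hypothetical name, storing the last-request timestamp inside the lock file is an assumption, and fcntl limits it to Unix-like systems.

```python
import fcntl
import os
import tempfile
import time
from pathlib import Path


def wait_turn(source: str, min_gap: float, cache_dir: str = "") -> None:
    """Block until at least min_gap seconds have passed since the last call for source."""
    base = cache_dir or os.environ.get("SCHOLAR_CACHE_DIR") or ".scholar_cache"
    lock_path = Path(base) / "rate" / f"{source}.lock"
    lock_path.parent.mkdir(parents=True, exist_ok=True)
    # Open read/write, creating the file if needed, without truncating it.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # concurrent callers queue here
        try:
            raw = fh.read().strip()
            last = float(raw) if raw else 0.0
            remaining = min_gap - (time.time() - last)
            if remaining > 0:
                time.sleep(remaining)
            # Record this request's time for the next caller.
            fh.seek(0)
            fh.truncate()
            fh.write(repr(time.time()))
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


# Demo: two back-to-back calls are spaced by the requested gap.
with tempfile.TemporaryDirectory() as tmp:
    start = time.time()
    wait_turn("arxiv", 0.2, cache_dir=tmp)
    wait_turn("arxiv", 0.2, cache_dir=tmp)
    elapsed = time.time() - start
```

Holding the lock while sleeping is what makes parallel callers queue: the second process cannot even read the timestamp until the first has finished its wait and recorded its own.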
# Primary (no API key, always available)
python scripts/search_openalex.py --query "<cluster 1>" --limit 50 --state research_state.json
python scripts/search_openalex.py --query "<cluster 2>" --limit 50 --state research_state.json
# Domain-specific (use when relevant)
python scripts/search_arxiv.py --query "<cluster>" --limit 50 --state research_state.json # CS/ML/physics preprints
python scripts/search_dblp.py --query "<cluster>" --limit 50 --state research_state.json # CS gold-standard bibliography (no abstracts)
python scripts/search_pubmed.py --query "<cluster>" --limit 50 --state research_state.json # biomedical (PubMed)
python scripts/search_biorxiv.py --query "<cluster>" --limit 50 --state research_state.json # life-sci preprints (bioRxiv + medRxiv via Europe PMC)
python scripts/search_crossref.py --query "<cluster>" --limit 50 --state research_state.json # DOI-backed metadata
# Open-web coverage (optional, requires EXA_API_KEY) — finds material the
# scholarly APIs miss: lab sites, institutional PDFs, conference mirrors,
# preprints parked outside arXiv, NGO/government reports.
python scripts/search_exa.py --query "<cluster>" --limit 50 --state research_state.json
# Dedupe across sources (DOI-first, title-similarity fallback)
python scripts/dedupe_papers.py --state research_state.json
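"DOI-first, title-similarity fallback" can be sketched as follows. This illustrates the idea and is not the logic of dedupe_papers.py: the record shape, the 0.92 threshold, and the use of difflib are all assumptions, and the sketch keeps the first record seen rather than merging metadata.

```python
import re
from difflib import SequenceMatcher


def norm_title(title: str) -> str:
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedupe(papers: list[dict], threshold: float = 0.92) -> list[dict]:
    kept: list[dict] = []
    seen_dois: set[str] = set()
    for paper in papers:
        doi = (paper.get("doi") or "").lower().strip()
        if doi and doi in seen_dois:
            continue  # exact DOI match
        title = norm_title(paper.get("title", ""))
        is_dup = any(
            # Compare titles only when at least one record lacks a DOI;
            # two different DOIs are treated as two different papers.
            not (doi and (k.get("doi") or ""))
            and SequenceMatcher(None, title, norm_title(k.get("title", ""))).ratio() >= threshold
            for k in kept
        )
        if is_dup:
            continue
        if doi:
            seen_dois.add(doi)
        kept.append(paper)
    return kept


# Illustrative records with made-up DOIs.
papers = [
    {"doi": "10.1000/abc", "title": "Attention Is All You Need"},
    {"doi": "10.1000/ABC", "title": "Attention is all you need."},
    {"doi": None, "title": "Attention Is All You Need"},
    {"doi": "10.1000/xyz", "title": "A Different Paper"},
]
unique = dedupe(papers)
```

The second record is dropped on DOI (case-insensitive), the third on title similarity because it has no DOI, and the fourth survives.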
MCP enrichment (optional, run if available): call mcp__asta__search_papers_by_relevance and mcp__asta__snippet_search and feed results via scripts/research_state.py ingest. If the MCP call errors or times out, do not retry — move on.
Iterate. Read the state file. Are there keyword gaps? Are there authors appearing 3+ times whose other work you haven't pulled? Run another round. Stop when every source is saturated, not just the last one queried:
python scripts/research_state.py --state research_state.json saturation
# Returns { "per_source": {...}, "overall_saturated": true/false, ... }
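The stop rule from the guiding principles (<20% novel papers AND no new paper with >100 citations) can be written out per source. This is a sketch under stated assumptions: the novel ratio is taken as new papers divided by papers returned in the round, and the input shape and "citations" field are hypothetical. Only the per_source and overall_saturated output keys come from the command's documented return value.

```python
def source_saturated(returned: int, new_papers: list[dict],
                     max_novel_ratio: float = 0.20, citation_cap: int = 100) -> bool:
    """Apply the stop rule to one source's latest round."""
    if returned == 0:
        return True  # nothing came back, so nothing novel
    novel_ratio = len(new_papers) / returned
    has_big_new = any(p.get("citations", 0) > citation_cap for p in new_papers)
    return novel_ratio < max_novel_ratio and not has_big_new


def overall_saturated(rounds: dict[str, dict]) -> dict:
    """Saturation holds overall only when it holds for every source."""
    per_source = {
        src: source_saturated(r["returned"], r["new"]) for src, r in rounds.items()
    }
    return {"per_source": per_source, "overall_saturated": all(per_source.values())}


# Illustrative round data, not real results.
rounds = {
    "openalex": {"returned": 50, "new": [{"citations": 12}] * 5},  # 10% novel
    "arxiv": {"returned": 50, "new": [{"citations": 340}] * 2},    # 4% novel, highly cited
}
status = overall_saturated(rounds)
```

In this example OpenAlex is saturated, but arXiv is not because its new papers exceed the citation cap, so another round is warranted.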