Deep Research Team (Lead Orchestrator)
Conduct thorough, iterative research by coordinating a persistent team of researcher agents across multiple rounds. This architecture enables mid-investigation steering, targeted follow-up based on emerging findings, and cross-agent verification.
Architecture Overview
Round 1: Investigation        Round 2: Follow-up            Synthesis
┌──────────┐                  ┌──────────┐                  ┌────────┐
│Researcher│ sends findings   │Researcher│ sends findings   │        │
│    A     ├────────┬────────>│    A     ├────────┬────────>│        │
└──────────┘        │         └──────────┘        │         │        │
                    v dispatches                  v         │        │
┌──────────┐    ┌────────┐    ┌──────────┐    ┌────────┐    │  Lead  │
│Researcher├───>│  Lead  ├───>│Researcher├───>│  Lead  ├───>│synthe- │
│    B     │    │triages │    │    B     │    │triages │    │ sizes  │
└──────────┘    └────────┘    └──────────┘    └────────┘    │        │
                    ^                             ^         │        │
┌──────────┐        │         ┌──────────┐        │         │        │
│Researcher├────────┴────────>│Researcher├────────┴────────>│        │
│    C     │ sends findings   │    C     │ sends findings   │        │
└──────────┘                  └──────────┘                  └────────┘
Key principles:
- No peer-to-peer researcher communication. All coordination goes through the lead. This preserves the independence that accounts for 87% of multi-agent gains (Choi et al.) and avoids sycophancy failures (Wynn et al.). Researchers never see each other's findings.
- Multi-round iteration. The lead triages Round 1 findings and creates targeted Round 2 tasks for gaps, conflicts, and promising leads.
- Cross-agent verification (Comprehensive scope). The lead asks Researcher A to verify Researcher B's high-impact single-source claim. The verifier only sees the claim and its source, not the original researcher's full analysis.
- Dynamic task evolution. The shared task list starts with pre-planned angles but grows organically as follow-up tasks emerge from findings. The lead dispatches follow-up tasks directly to specific researchers via SendMessage.
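The isolation rule for cross-agent verification can be sketched as a small payload builder. This is an illustrative sketch only: `Finding`, `build_verification_task`, and the dictionary keys are hypothetical names, not part of the skill or of SendMessage, whose actual message format is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record of one researcher's claim (illustration only)."""
    researcher: str   # letter of the researcher who reported it, e.g. "B"
    claim: str        # the high-impact, single-source claim
    source_url: str   # the one source backing the claim
    analysis: str     # the original researcher's full reasoning


def build_verification_task(finding: Finding, verifier: str) -> dict:
    """Build a task that exposes only the claim and its source to the verifier."""
    if verifier == finding.researcher:
        raise ValueError("a researcher must not verify their own claim")
    return {
        "assignee": verifier,
        "claim": finding.claim,
        "source_url": finding.source_url,
        # Deliberately omitted: finding.analysis and finding.researcher,
        # so the verifier judges the claim independently.
    }
```

The lead would pass the returned fields to the verifier in whatever form its messaging tool expects; the point of the sketch is only that the original analysis never enters the payload.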
When to Use
Use this skill for:
- Complex questions that benefit from multiple research angles
- Topics where initial findings will reveal what to investigate next
- Research requiring cross-verification of contested claims
- Any question needing synthesis across 5+ sources
Do NOT use for:
- Simple factual lookups (use regular web search)
- Questions answerable with 1-2 searches
- Debugging or code questions
Effort Calibration
| Scope | Researchers | Rounds | Verification | Model |
|---|---|---|---|---|
| Focused | 2 | 1-2 | None | sonnet |
| Broad | 3 | 2-3 | None | sonnet |
| Comprehensive | 4 | 3-4 | Cross-agent | opus |
Round counts are heuristics, not targets. Stop early when you hit citation convergence -- additional rounds that don't surface new substantive findings waste tokens and context. A Broad run that converges in 2 rounds is a success, not a shortcut. After each round's triage, ask: "Would another round change the report's conclusions?" If not, proceed to synthesis.
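One rough way to approximate citation convergence is to count how many sources in the latest round were not already cited. The sketch below assumes sources are tracked as sets of URLs; the function name and the `min_new` threshold are hypothetical, and the check is only a proxy for the judgment question above, not a replacement for it.

```python
def citation_convergence(seen: set[str], latest: set[str], min_new: int = 1) -> bool:
    """Return True when the latest round cited fewer than `min_new` unseen sources.

    `seen` holds sources cited in earlier rounds; `latest` holds sources cited
    in the round that was just triaged. Both are assumed to be sets of URLs.
    """
    new_sources = latest - seen
    return len(new_sources) < min_new


# Usage: Round 2 only re-cited known sources, so the proxy suggests stopping.
round_1 = {"https://example.com/a", "https://example.com/b"}
round_2 = {"https://example.com/a"}
converged = citation_convergence(round_1, round_2)
```

A round can add no new sources and still change a conclusion (for example by resolving a conflict), so the lead should still ask whether another round would change the report.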
Default scope is determined by question type (see references/question-types.md).
Present the recommended scope to the user and allow override.
Model selection:
- Lead: inherits user's session model (no override)
- Researchers: sonnet for Focused/Broad, opus for Comprehensive (Sonnet validated as viable override for Comprehensive when cost matters)
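The calibration table and the user-override rule can be written down as a lookup. This is a minimal sketch: `ScopeDefaults`, `SCOPES`, and `resolve_scope` are hypothetical names, and the values are copied from the table above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScopeDefaults:
    researchers: int
    rounds: Tuple[int, int]   # heuristic (low, high) range, not a target
    verification: str
    researcher_model: str     # the lead always inherits the session model


SCOPES = {
    "focused": ScopeDefaults(2, (1, 2), "none", "sonnet"),
    "broad": ScopeDefaults(3, (2, 3), "none", "sonnet"),
    "comprehensive": ScopeDefaults(4, (3, 4), "cross-agent", "opus"),
}


def resolve_scope(default_scope: str, user_override: Optional[str] = None) -> ScopeDefaults:
    """Pick the user's override if given, else the question type's default scope."""
    key = (user_override or default_scope).strip().lower()
    if key not in SCOPES:
        raise ValueError(f"unknown scope: {key!r}")
    return SCOPES[key]
```

For example, `resolve_scope("broad", "Comprehensive")` returns the Comprehensive defaults, reflecting that the recommended scope is only a starting point the user may override.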
Output Directory
Research artifacts persist to disk for resumability and backup.
Directory resolution -- run this command FIRST, before creating anything. The output is
your {output_dir}. Only the fallback branch creates a directory; the others reuse what exists.
if [ -d "$(pwd)/deep-research" ]; then
  echo "$(pwd)/deep-research"
elif [ -n "$CLAUDE_DEEP_RESEARCH_DIR" ]; then
  eval echo "$CLAUDE_DEEP_RESEARCH_DIR"
else
  mkdir -p "$(pwd)/deep-research"
  echo "$(pwd)/deep-research"
fi
After resolving {output_dir}, create only the topic subdirectory in Phase 3.
Each session creates a subdirectory: {output_dir}/{topic-slug}/
Contents:
- state.md -- triage checkpoint, cross-references, follow-up plan (written in Phase 4)
- researcher-{letter}-findings.md -- backup of each researcher's findings
- report.md -- final synthesized report (written in Phase 6)
Calibrated Confidence Language
Use Kent-style verbal probability expressions in all confidence assessments:
| Term | Range | Use When |
|---|---|---|
| Almost certain | 93-99% | Multiple high-quality sources, no dissent |
| Highly likely | 80-92% | Strong evidence, minor caveats |
| Likely | 63-79% | Good evidence, some gaps |
| Roughly even | 40-62% | Conflicting evidence, genuinely uncertain |
| Unlikely | 20-39% | Limited or weak evidence |
Always pair the verbal term with the probability range in the final report.
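Pairing a term with its range can be made mechanical. The sketch below copies the bands from the table above; `CONFIDENCE_BANDS` and `confidence_label` are hypothetical names. Because the table defines no band below 20% or above 99%, the function raises for those values rather than inventing a term.

```python
# Bands copied from the table above: (low %, high %, verbal term).
CONFIDENCE_BANDS = [
    (93, 99, "Almost certain"),
    (80, 92, "Highly likely"),
    (63, 79, "Likely"),
    (40, 62, "Roughly even"),
    (20, 39, "Unlikely"),
]


def confidence_label(percent: int) -> str:
    """Return the verbal term paired with its range, e.g. 'Likely (63-79%)'."""
    for low, high, term in CONFIDENCE_BANDS:
        if low <= percent <= high:
            return f"{term} ({low}-{high}%)"
    raise ValueError(f"{percent}% falls outside the bands defined in the table")
```

For instance, `confidence_label(85)` yields `Highly likely (80-92%)`, which is the form the final report should use.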
Process
Phase 0: Classify Question Type
Silently classify the user's question before any interaction.
- Read references/question-types.md for the full taxonomy.
- Assign a primary type: Factual, Scientific/Health, Consumer, Technical, Opinion/Sentiment, Contested, or Emerging/Frontier.
- For compound questions, decompose into sub-questions and classify each.
- Note the default scope from the type-to-scope mapping.
Resume check: Before starting, list the subdirectories in {output_dir} and scan for any
that look related to the current question (similar topic, overlapping keywords). If you find
a plausible match, read its state.md and offer to resume: present what was completed, what
remains, and ask the user whether to resume or start fresh. If resuming, create a new team
and tasks for only the remaining work.
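The "looks related" scan is a judgment call, but a crude keyword-overlap prefilter illustrates the idea. Everything in this sketch is an assumption: the stopword list, the `min_overlap` threshold, and the names `keywords` and `resume_candidates` are hypothetical, and matching on subdirectory names alone will miss sessions whose slugs use different wording.

```python
import re

# Small illustrative stopword list; a real run would rely on judgment instead.
STOPWORDS = {"a", "an", "and", "are", "do", "does", "for", "how",
             "in", "is", "of", "the", "to", "what", "why"}


def keywords(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def resume_candidates(question: str, slugs: list[str], min_overlap: int = 2) -> list[str]:
    """Return session slugs sharing at least `min_overlap` keywords with the question."""
    wanted = keywords(question)
    scored = [(len(wanted & keywords(slug)), slug) for slug in slugs]
    hits = [(score, slug) for score, slug in scored if score >= min_overlap]
    # Highest overlap first; ties broken alphabetically for stable output.
    return [slug for score, slug in sorted(hits, key=lambda pair: (-pair[0], pair[1]))]
```

The slug list would come from the subdirectory names under {output_dir}. Any candidate still needs its state.md read before offering to resume.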
Topic slug: When creating a new session, generate a slug (lowercase, hyphenated, max 40
chars) for the subdirectory name: {output_dir}/{slug}/.
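A minimal sketch of the slug rule (lowercase, hyphenated, max 40 chars) follows. The function name `topic_slug` is hypothetical, and treating every non-ASCII-alphanumeric run as a single hyphen is an assumption, so topics written entirely in non-Latin scripts would produce an empty slug and need separate handling.

```python
import re


def topic_slug(topic: str, max_len: int = 40) -> str:
    """Lowercase the topic, hyphenate it, and cap the result at `max_len` chars."""
    # Collapse every run of characters outside a-z / 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    if len(slug) > max_len:
        # Truncate, then drop a hyphen left dangling at the cut point.
        slug = slug[:max_len].rstrip("-")
    return slug
```

For example, `topic_slug("How does Nix handle dependencies?")` gives `how-does-nix-handle-dependencies`, which would become the subdirectory name under {output_dir}.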
Classification is internal -- do not present it to the user.
Phase 1: Clarify and Plan
Step 1: Make sure you understand the question. Before planning anything, ask yourself:
do I understand what the user is asking and why well enough to design research angles that
will actually be useful to them? If not, use AskUserQuestion to fill the gaps. This isn't
just about ambiguous wording -- a perfectly clear question can still lack enough context to
research well ("How does Nix handle dependencies?" means very different research depending on
whether you're evaluating Nix, debugging an issue, or writing docs). If the question and its
context are clear, skip this step.
Step 2: Scope and decompose. Determine the appropriate scope (Phase 2 has the details) and decompose the question into independent research angles. Default angle counts by scope:
- Focused: 2 angles
- Broad: 3 angles
- Comprehensive: 4 angles
These are defaults, not caps. If the decomposition reveals one more genuinely independent facet than the default, add it (e.g., 3 angles for a Focused run). If the question has fewer real facets, use fewer. Beyond ±1 from the default, re-scope rather than stretching -- the scope was probably wrong. Each angle must be independent and substantial enough to warrant a dedicated researcher; "I can think of another angle" isn't sufficient.
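The ±1 rule can be expressed as a simple check. This is a sketch under the defaults listed above; `DEFAULT_ANGLES` and `check_angle_count` are hypothetical names, and the check says nothing about whether each angle is genuinely independent, which remains a judgment call.

```python
# Default angle counts copied from the list above.
DEFAULT_ANGLES = {"focused": 2, "broad": 3, "comprehensive": 4}


def check_angle_count(scope: str, angles: int) -> str:
    """Return 'ok' if the angle count is within ±1 of the default, else 're-scope'."""
    if angles < 1:
        raise ValueError("at least one research angle is required")
    delta = angles - DEFAULT_ANGLES[scope.lower()]
    return "ok" if abs(delta) <= 1 else "re-scope"
```

So a Focused run with 3 angles passes, while a Focused run with 4 angles signals that the scope itself should be revisited.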
For compound questions, map sub-questions to angles. Multiple sub-questions can share an angle if closely related; a