Deep Brainstorming
Research-hardened brainstorming that catches biases agents naturally introduce. Collapses what would otherwise be 3-4 sessions of iterative discovery into one disciplined session.
REQUIRED: Run a standard brainstorming pass first (e.g., superpowers:brainstorming or equivalent). This skill augments brainstorming output — it does not replace it.
When to Use
digraph when {
"Starting brainstorming?" [shape=diamond];
"Multiple valid tech choices?" [shape=diamond];
"Wrong choice costly to reverse?" [shape=diamond];
"Use regular brainstorming" [shape=box];
"Use deep brainstorming" [shape=box, style=bold];
"Starting brainstorming?" -> "Multiple valid tech choices?";
"Multiple valid tech choices?" -> "Use regular brainstorming" [label="no"];
"Multiple valid tech choices?" -> "Wrong choice costly to reverse?";
"Wrong choice costly to reverse?" -> "Use regular brainstorming" [label="no — easy to swap"];
"Wrong choice costly to reverse?" -> "Use deep brainstorming" [label="yes"];
}
Intensity Selector
Pick intensity based on stakes and time budget. Default to Standard unless stakes clearly call for Quick or Thorough.
| Level | Research Rounds | Bias Audit | Review Phases | When |
|---|---|---|---|---|
| Quick | 1 round, 2-3 agents | Spot-check top recommendation | Verify-fix only | Low-stakes, easily reversible |
| Standard | 2 rounds, 3-4 agents each | Full 9-type audit after each round | Verify-fix + adversarial | Default for most architecture decisions |
| Thorough | 3-4 rounds, 4-6 agents each | Full audit + retroactive sweep | All 3 phases + independent re-verification | High-stakes, expensive to reverse, novel domain |
Process Overview
| Phase | What | Why |
|---|---|---|
| 1. Sanitize vision | Strip tool/vendor names from requirements | Prevents anchoring bias in research |
| 2. Research rounds | Parallel agents with clean prompts, progressive debiasing | Catches marketing/popularity bias |
| 3. Bias audit | Check each round's output against 9 bias types | Agents over-represent popular tools |
| 4. Independent verify | Check key claims via web/docs yourself | Catches hallucinated benchmarks |
| 5. Claim provenance | Record source, quote, verification method for every cited number | Prevents stat drift across sessions |
| 6. Assemble spec | Progressive file capture, one section at a time | Survives compaction |
| 7. Three-phase review | Verify-fix, adversarial, security/ops | Each type catches different classes of bugs |
| 8. Quality gate | "Is this objectively best?" challenge on final spec | Catches premature satisfaction |
Phase 1: Sanitize the Vision
Extract the client brief into a clean requirements document. This is the anchor for all research.
Strip: All tool names, frameworks, architecture patterns, vendor references.
Keep: What it does, who uses it, hard constraints (language, platform, compliance).
Save as: <project-docs-dir>/<project>-vision.md
Scoping discipline: Only include requirements the client actually stated. If a requirement wasn't in the brief, it doesn't belong in the vision. Agents inject phantom requirements from training data (compliance frameworks, accessibility standards, monitoring stacks). Challenge every requirement: "Did the client ask for this, or did an agent add it?"
Non-Requirements section: Explicitly list what the client did NOT ask for in the vision doc. This is the primary defense against phantom requirements — research agents that recommend solutions to Non-Requirements get flagged immediately.
Phase 2: Research with Clean Prompts
Prompt rules:
- Zero vendor/tool names — only functional requirements + vision doc
- "What's the best way to achieve Y?" not "Is X good for Y?"
- Include: "Verify non-library claims via web search. Do not rely on training data."
- Attach the sanitized vision document to every prompt
Agent discipline (mandatory for ALL researcher agents):
- Synthesize first: Every researcher prompt MUST include: "SYNTHESIZE FIRST — write your findings report before doing more searches. You can always search more after writing what you know." Without this, agents spend 80%+ of their token budget on searches and never produce output. (Evidence: 60% researcher agent failure rate observed from token exhaustion on web-search-heavy tasks.)
- Max 5 questions per agent. Split larger research across parallel agents. One agent with 10 questions will exhaust tokens on the first 5 and never synthesize. Each agent gets 3-5 focused questions.
- Structured doc tools for library docs. Prefer structured documentation sources (e.g., Context7 MCP, official API docs) over generic web search for library API verification. Structured docs return focused content at lower token cost than web search chains.
- Relaunch failed agents. When a researcher fails to synthesize (returns partial output like "Let me search for more..."), relaunch with a shorter prompt. Doing the research inline burns 5-10x more main context.
Round structure:
- Split research by domain (e.g., backend, frontend, data pipeline, infrastructure, NLP)
- One agent per domain per round, all running in parallel
- Round 1: broad discovery — "What are the top approaches for [functional need]?"
- Round 2: verification with fresh prompts that do NOT reference Round 1 findings. Ask the same functional questions differently. Compare convergence.
- Round 3+ (Thorough only): targeted investigation of divergences between rounds
Consensus guard: If all agents in a round converge on the same tool, that's a red flag — not validation. Force dissent: "What's the strongest alternative to [consensus pick] and under what conditions would it win?"
Prompt review cycle: Before launching each round, present prompts to the user for review and revision. Agents internalize biased framing invisibly — the user catches phrasing that anchors research. Specific things to check: did any functional description smuggle in a tool name? Did "agentic orchestration" imply a specific architecture? Soften loaded terms to neutral descriptions.
Full reset option (Thorough): For the final research round, discard all prior findings: "All previous research counts as zero. Start from the requirements only." This prevents anchoring to earlier rounds' conclusions and surfaces genuinely different approaches.
Convergence table: After each round, produce a cross-round comparison table showing every decision layer with what each round recommended. Items that converge across rounds get high confidence. Items that diverge get "benchmark to decide" status. This table is the primary decision tool — not any single round's output.
For detailed prompt templates, see references/research-prompts.md.
Phase 3: Bias Audit
Check after EVERY research round. Audit the full output, not just the recommendations.
| Bias | Signal | Detection |
|---|---|---|
| Marketing | Most-blogged tool recommended as best | Check: is the recommendation backed by benchmarks or by blog post count? |
| Popularity | Most-downloaded/starred treated as winner | Check: do downloads measure quality or awareness? |
| License | Defaulting to OSS or to commercial without comparison | Check: was the full landscape evaluated, or just one license type? |
| Information landscape | Domain keywords trigger unrelated associations | Check: is the recommendation actually relevant to THIS use case? |
| Training data | Stale versions, deprecated tools, renamed SDKs | Check: verify version numbers and project status against official sources |
| Vendor benchmark | Performance numbers sourced from the tool's own maker | Check: find independent verification or note as unverified |
| Hallucinated evidence | Specific benchmark numbers or repo URLs that don't exist |