Council Review
Run any question, plan, or code through 5 independent advisors who use distinct reasoning methods, collaborate to refine answers, peer-review each other anonymously, and synthesize a verdict you can trust.
This skill implements the Diverse Multi-Agent Debate (DMAD) pattern. It is collaborative, not adversarial: agents seek truth through diversity of reasoning, not by arguing opposing positions.
Why This Works (Research Backing)
- Method diversity beats single-method debate. DMAD (ICLR 2025) shows that agents using distinct reasoning methods reliably outperform homogeneous councils — diverse medium-capacity models can beat GPT-4 on GSM-8K (91% vs 82%) when each agent applies a different reasoning approach.
- Collaborative debate beats adversarial debate. M3MADBench (2026) shows that across all modalities, collaborative DMAD outperforms adversarial Div-MAD "by a substantial margin." Adversarial paradigms introduce divergent noise; for open questions, plans, and decisions, collaborative deliberation is the right tool.
- Anonymous peer review prevents provider bias. This finding is consistent across the literature: reviewers defer to role names when they are visible, so responses must be anonymized and shuffled before peer review.
- Confidence calibration breaks the martingale ceiling. Vanilla MAD often underperforms simple majority vote; confidence-modulated updates ("Demystifying MAD" 2026) systematically drift the council toward correct answers.
- Adaptive stopping cuts cost. KS-statistic convergence detection (S2 MAD via llmcouncil) reports up to 94.5% cost reduction on convergent questions.
For stress-testing a known artifact (PR, draft, spec), use the separate /adversarial-review skill instead — single-critic adversarial probing is the right tool there.
When to Use
The council is for questions where being wrong is expensive.
Good for: Architecture decisions, implementation plans, PR reviews, product decisions, migration strategies, API design, naming, pricing, scope decisions
Bad for: Factual lookups, writing tasks, simple yes/no, anything with one obvious right answer
Use a different tool: Single-critic stress test of an existing artifact → /adversarial-review
Flags
| Flag | Effect |
|---|---|
| `--quick` | Lite mode: 3 advisors + chairman, no peer review (4 calls instead of 11) |
| `--adaptive` | KS-statistic adaptive stopping. Run multi-round debate; halt when response distributions converge below epsilon for two consecutive rounds. Up to 94.5% cost cut on convergent questions. |
| `--confidence` | Confidence-modulated synthesis. Each advisor rates own confidence (1–10) and rates each peer's confidence. Chairman synthesis is confidence-weighted, not majority-vote. Surfaces low-confidence consensus as a yellow flag. |
| `--measure-diversity` | After advisors respond, score reasoning-footprint overlap across the responses. Report when the council agreed despite different reasoning methods; that's a signal the consensus may be theatrical. |
| `--adversarial` | DEPRECATED. 2 advocates FOR + 2 skeptics AGAINST + 1 neutral. Retained for backward compatibility but contradicts M3MADBench evidence. Prefer /adversarial-review for single-critic stress tests. |
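As a rough illustration of what confidence-weighted synthesis (the `--confidence` flag) could look like, the sketch below weights each advisor's position by the average of its self-rating and its mean peer rating. The `Vote` type, the averaging rule, and the threshold of 5.0 are all assumptions made for this example; the skill itself does not prescribe a formula.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Vote:
    advisor: str            # advisor name, e.g. "The Contrarian"
    position: str           # hypothetical short label for the recommendation
    self_confidence: int    # 1-10, the advisor's own rating
    peer_confidence: float  # mean of the 1-10 ratings given by peers


def _weight(vote):
    # Assumption: a vote's weight is the mean of self and peer confidence.
    return (vote.self_confidence + vote.peer_confidence) / 2


def weighted_verdict(votes, low_confidence_threshold=5.0):
    """Return (winning position, its share of total weight, yellow_flag)."""
    totals = defaultdict(float)
    for vote in votes:
        totals[vote.position] += _weight(vote)
    winner = max(totals, key=totals.get)
    share = totals[winner] / sum(totals.values())
    # Yellow flag: the winning side agrees, but with low average confidence.
    supporters = [v for v in votes if v.position == winner]
    mean_confidence = sum(_weight(v) for v in supporters) / len(supporters)
    return winner, share, mean_confidence < low_confidence_threshold
```

With three advisors backing one position at confidence 3 and two backing another at confidence 9, a plain majority vote picks the first position while this weighting picks the second, which is the behavior the flag description is after.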
Flags compose: /council-review --adaptive --confidence "Should we adopt GraphQL?" runs convergence-stopped, confidence-weighted deliberation.
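The adaptive stopping rule can be sketched as follows, assuming each round's responses have been reduced to one numeric score per advisor (for example, a 1-10 support rating). That reduction, the function names, and the epsilon of 0.2 are assumptions for illustration, not the behavior of any particular library.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    points = sorted(set(a) | set(b))

    def ecdf(sample, x):
        # Fraction of the sample that is less than or equal to x.
        return sum(1 for s in sample if s <= x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in points)


def should_stop(rounds, epsilon=0.2, patience=2):
    """Decide whether the debate has converged.

    rounds: list of per-round score lists, oldest first.
    Stops when the KS statistic between consecutive rounds has stayed
    below epsilon for `patience` consecutive comparisons.
    """
    if len(rounds) < patience + 1:
        return False  # not enough rounds to make `patience` comparisons
    recent = rounds[-(patience + 1):]
    return all(ks_statistic(x, y) < epsilon for x, y in zip(recent, recent[1:]))
```

Calling `should_stop` after each round lets the loop exit early on questions where the advisors settle quickly, which is where the cost savings would come from.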
The Five Advisors
| # | Advisor | Angle | Reasoning Method | Catches |
|---|---|---|---|---|
| 1 | The Contrarian | What will fail? | Inversion — assume it shipped and failed, trace backward to the cause | "Sounds great but..." gaps you skip when excited |
| 2 | First Principles Thinker | What are we actually solving? | Decomposition — break into atomic claims, challenge each one | "You're optimizing the wrong variable" |
| 3 | The Expansionist | What upside are we missing? | Analogy — what adjacent domain solved this differently? | "You're thinking too small" |
| 4 | The Outsider | Zero context, fresh eyes only | Naive questioning — explain like you just joined; flag anything that requires insider knowledge to make sense | Curse of knowledge blind spots |
| 5 | The Executor | What do you do Monday morning? | Dependency graphing — what blocks what? What's the critical path? | Brilliant plans with no actionable first step |
Natural tensions: Contrarian vs Expansionist (downside vs upside), First Principles vs Executor (rethink vs ship it), Outsider keeps everyone honest.
The five reasoning methods are not interchangeable angles — each is a different cognitive operation. This is the DMAD lever: same model, different reasoning.
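For readers who want the roster in machine-readable form, the table above could be captured as data along these lines. The `Advisor` type and the `methods_are_distinct` helper are hypothetical names introduced for this sketch; the text in each field is taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advisor:
    name: str
    angle: str
    method: str        # short name of the reasoning method
    instruction: str   # how the advisor applies that method


ADVISORS = (
    Advisor("The Contrarian", "What will fail?", "Inversion",
            "Assume it shipped and failed, trace backward to the cause"),
    Advisor("First Principles Thinker", "What are we actually solving?", "Decomposition",
            "Break into atomic claims, challenge each one"),
    Advisor("The Expansionist", "What upside are we missing?", "Analogy",
            "What adjacent domain solved this differently?"),
    Advisor("The Outsider", "Zero context, fresh eyes only", "Naive questioning",
            "Explain like you just joined; flag anything that requires insider knowledge"),
    Advisor("The Executor", "What do you do Monday morning?", "Dependency graphing",
            "What blocks what? What's the critical path?"),
)


def methods_are_distinct(advisors):
    """True when no two advisors share a reasoning method (the DMAD lever)."""
    methods = [a.method for a in advisors]
    return len(set(methods)) == len(methods)
```

Keeping the method as its own field makes it easy to check that a council never ends up with two advisors running the same cognitive operation.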
Execution Flow
Step 0: Pre-flight
<pre_flight>
Parse flags from $ARGUMENTS:
- If `--quick` is present: use Lite Mode (see below)
- If `--adaptive` is present: enable KS-statistic adaptive stopping (Step 3.5)
- If `--confidence` is present: enable confidence-modulated synthesis (Steps 2 and 4)
- If `--measure-diversity` is present: enable diversity verification (Step 2.5)
- If `--adversarial` is present: use Adversarial Mode (deprecated, see below)
- Remove flags from the input before classifying
Scope validation: Before convening the council, assess whether the input actually warrants it. If the question is purely factual, has one obvious right answer, or has no meaningful tradeoff, say so directly: "This doesn't need a council — [direct answer]. Use /council-review for decisions with genuine stakes and tradeoffs." Do not spawn agents for trivial questions.
Classify the remaining input:
- PR: Numeric value, or URL containing `/pull/`. Fetch the PR description via `gh pr view` and the diff via `gh pr diff`.
- File path: String ending in a file extension or pointing to an existing file. Read the file contents.
- Plan/Decision/Question: Everything else. Use as-is.
For PRs and files, read the actual content and include it in the framed question. Don't just pass a URL — advisors need the substance.
</pre_flight>
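The pre-flight parsing and classification could be sketched like this. The function names are hypothetical, and the heuristics (whitespace tokenizing, treating a space-free string with an extension as a file path) are simplifying assumptions rather than the skill's actual logic.

```python
import os
import re

KNOWN_FLAGS = {"--quick", "--adaptive", "--confidence",
               "--measure-diversity", "--adversarial"}


def parse_flags(arguments):
    """Split raw arguments into (set of recognized flags, remaining input)."""
    tokens = arguments.split()
    flags = {t for t in tokens if t in KNOWN_FLAGS}
    remaining = " ".join(t for t in tokens if t not in KNOWN_FLAGS)
    # Drop surrounding double quotes left over from the command line.
    return flags, remaining.strip('"')


def classify(text):
    """Return 'pr', 'file', or 'question' for the flag-free input."""
    text = text.strip()
    is_url = re.match(r"https?://", text) is not None
    if text.isdigit() or (is_url and "/pull/" in text):
        return "pr"
    # Assumption: a path with no spaces that ends in an extension is a file.
    looks_like_path = " " not in text and re.search(r"\.\w+$", text) is not None
    if os.path.exists(text) or looks_like_path:
        return "file"
    return "question"
```

Checking for a PR before a file path matters here, since a pull request URL could otherwise be mistaken for a path.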
Step 1: Gather Context and Frame
Auto-context gathering — before framing, read these project files (skip any that don't exist):
- `README.md`: what the project does
- `CLAUDE.md` or `AGENTS.md`: conventions, architecture, patterns
- Recent git log (`git log --oneline -10`): what's been happening
- Any files the user referenced or that relate to the topic
- PR diff and description if reviewing a PR
Reframe the raw input as a clear, neutral prompt:
QUESTION:
[Core decision, plan, or code being reviewed]
CONTEXT:
[Key context from project files: what the project does, constraints, recent changes, stakes]
WHAT'S AT STAKE:
[Why this matters — cost of getting it wrong]
Don't add your own opinion. Don't steer toward an answer. If too vague, ask ONE clarifying question before proceeding.
Step 2: Convene the Council (5 agents in parallel)
Launch all 5 advisors simultaneously using the Agent tool. Each advisor runs in parallel. Use a lightweight model (haiku) for advisors — they're doing focused analysis, not complex reasoning.
CRITICAL: Launch all 5 in a single message with 5 Agent tool calls. Sequential execution lets earlier responses bleed into later ones and defeats the purpose.
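The same isolation property can be illustrated outside the Agent tool with a small `asyncio` sketch. Here `ask_advisor` is a hypothetical stand-in for one advisor call; the point is only that every call is started before any result is read, so no advisor can see another's output.

```python
import asyncio


async def ask_advisor(name, framed_question):
    """Hypothetical stand-in for a single advisor call."""
    await asyncio.sleep(0.01)  # simulates model latency
    return f"[{name}] analysis of: {framed_question}"


async def convene(names, framed_question):
    # gather schedules all calls together and returns results in input order.
    responses = await asyncio.gather(
        *(ask_advisor(name, framed_question) for name in names)
    )
    return dict(zip(names, responses))


def run_council(names, framed_question):
    """Synchronous entry point for the sketch."""
    return asyncio.run(convene(names, framed_question))
```

Every advisor receives the identical framed question and nothing else, which is the property sequential execution would break.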
Each advisor gets this prompt:
You are [ADVISOR NAME] on an LLM Council reviewing a decision.
Your angle: [ADVISOR ANGLE]
Your reasoning method: [ADVISOR REASONING METHOD — see table above]
A user has brought this to the council:
---
[framed question from Step 1]
---
Apply your assigned reasoning method rigorously. Don't just state opinions — show your work using your method.
Rules:
- 150-300 words. No preamble. Straight into your analysis.
- Name specific risks, opportunities, or issues — not vague concerns.
- If reviewing code: cite specific files, functions, or patterns.
- If reviewing a plan: point to specific steps, gaps, or sequencing issue