
council-review

Development

Run any question, plan, PR, or code through a Diverse Multi-Agent Debate (DMAD) council of 5 AI advisors with distinct reasoning methods. Advisors collaborate, peer-review each other anonymously, and a chairman synthesizes a verdict. Empirically outperforms adversarial debate (M3MADBench 2026, DMAD ICLR 2025). Use when: 'council this', 'run the council', 'council review', 'pressure-test this', 'st

5 stars
Author: ngmeyer · License: MIT

Council Review

Run any question, plan, or code through 5 independent advisors who use distinct reasoning methods, collaborate to refine answers, and peer-review each other anonymously. A chairman then synthesizes a verdict you can trust.

This skill implements the Diverse Multi-Agent Debate (DMAD) pattern. It is collaborative, not adversarial: agents seek truth through diversity of reasoning, not by arguing opposing positions.

Why This Works (Research Backing)

  • Method diversity beats single-method debate. DMAD (ICLR 2025) shows that agents using distinct reasoning methods reliably outperform homogeneous councils — diverse medium-capacity models can beat GPT-4 on GSM-8K (91% vs 82%) when each agent applies a different reasoning approach.
  • Collaborative debate beats adversarial debate. M3MADBench (2026) shows that across all modalities, collaborative DMAD outperforms adversarial Div-MAD "by a substantial margin." Adversarial paradigms introduce divergent noise; for open questions, plans, and decisions, collaborative deliberation is the right tool.
  • Anonymous peer review prevents provider bias. This holds across the literature: reviewers defer to role names when they are visible, so responses must be shuffled and stripped of names before peer review.
  • Confidence calibration breaks the martingale ceiling. Vanilla MAD often underperforms simple majority vote; confidence-modulated updates ("Demystifying MAD" 2026) systematically drift the council toward correct answers.
  • Adaptive stopping cuts cost. KS-statistic convergence detection (S2 MAD via llmcouncil) reports up to 94.5% cost reduction on convergent questions.
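The anonymization point in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the function name `anonymize_responses`, the "Response A" label scheme, and the `seed` parameter are all assumptions introduced here.

```python
import random
import string


def anonymize_responses(responses, seed=None):
    """Shuffle advisor responses and relabel them with neutral letters.

    `responses` maps advisor name -> response text (hypothetical shape).
    Returns (labeled, key): `labeled` maps "Response A", "Response B", ...
    to the text shown to reviewers; `key` maps each label back to the
    advisor name and should only be used after peer review is complete.
    """
    rng = random.Random(seed)
    names = list(responses)
    rng.shuffle(names)  # reviewers must not be able to infer roles from order

    labeled = {}
    key = {}
    for letter, name in zip(string.ascii_uppercase, names):
        label = f"Response {letter}"
        labeled[label] = responses[name]
        key[label] = name
    return labeled, key
```

Reviewers would receive only `labeled`; keeping `key` separate is what lets the chairman re-attach names afterwards.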

For stress-testing a known artifact (PR, draft, spec), use the separate /adversarial-review skill instead — single-critic adversarial probing is the right tool there.

When to Use

The council is for questions where being wrong is expensive.

Good for: Architecture decisions, implementation plans, PR reviews, product decisions, migration strategies, API design, naming, pricing, scope decisions

Bad for: Factual lookups, writing tasks, simple yes/no, anything with one obvious right answer

Use a different tool: Single-critic stress test of an existing artifact → /adversarial-review

Flags

| Flag | Effect |
| --- | --- |
| --quick | Lite mode: 3 advisors + chairman, no peer review (4 calls instead of 11) |
| --adaptive | KS-statistic adaptive stopping. Run multi-round debate; halt when response distributions converge below epsilon for two consecutive rounds. Up to 94.5% cost cut on convergent questions. |
| --confidence | Confidence-modulated synthesis. Each advisor rates own confidence (1–10) and rates each peer's confidence. Chairman synthesis is confidence-weighted, not majority-vote. Surfaces low-confidence consensus as a yellow flag. |
| --measure-diversity | After advisors respond, score reasoning-footprint overlap across the responses. Report when the council agreed despite different reasoning methods; that's a signal the consensus may be theatrical. |
| --adversarial | DEPRECATED. 2 advocates FOR + 2 skeptics AGAINST + 1 neutral. Retained for backward compatibility but contradicts M3MADBench evidence. Prefer /adversarial-review for single-critic stress tests. |
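The --adaptive stopping rule can be illustrated with a small sketch. The source does not say how a "response distribution" is measured, so this example assumes each round is reduced to a list of numeric scores (for instance, each advisor's 1–10 rating). The names `ks_statistic` and `should_stop` and the default `epsilon=0.1` are hypothetical choices, not values taken from the skill.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def should_stop(rounds, epsilon=0.1, patience=2):
    """Return True once the last `patience` round-to-round KS values are all below epsilon.

    `rounds` is a list of per-round score lists (assumed representation).
    epsilon=0.1 is an illustrative default, not a value from the source.
    """
    if len(rounds) < patience + 1:
        return False  # not enough rounds to compare yet
    recent = rounds[-(patience + 1):]
    return all(
        ks_statistic(prev, cur) < epsilon
        for prev, cur in zip(recent, recent[1:])
    )
```

With `patience=2`, the debate needs at least three rounds before it can halt, which matches the "two consecutive rounds" wording in the table.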

Flags compose: /council-review --adaptive --confidence "Should we adopt GraphQL?" runs convergence-stopped, confidence-weighted deliberation.
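As a rough sketch of how composed flags might be separated from the question text, the following assumes the raw argument string is split on whitespace and that one pair of surrounding quotes is stripped from what remains. The function name `parse_arguments` is hypothetical; the flag names are the ones listed in the table above.

```python
KNOWN_FLAGS = (
    "--quick",
    "--adaptive",
    "--confidence",
    "--measure-diversity",
    "--adversarial",
)


def parse_arguments(arguments):
    """Split a raw argument string into (enabled_flags, remaining_input)."""
    enabled = set()
    rest = []
    for token in arguments.split():
        if token in KNOWN_FLAGS:
            enabled.add(token)
        else:
            rest.append(token)  # anything that is not a known flag is input

    remaining = " ".join(rest)
    # Strip one pair of matching quotes wrapped around the whole question.
    if len(remaining) >= 2 and remaining[0] == remaining[-1] and remaining[0] in "\"'":
        remaining = remaining[1:-1]
    return enabled, remaining
```

Whitespace splitting is used instead of shell-style parsing so that apostrophes inside a natural-language question do not raise a quoting error.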

The Five Advisors

| # | Advisor | Angle | Reasoning Method | Catches |
| --- | --- | --- | --- | --- |
| 1 | The Contrarian | What will fail? | Inversion: assume it shipped and failed, trace backward to the cause | "Sounds great but..." gaps you skip when excited |
| 2 | First Principles Thinker | What are we actually solving? | Decomposition: break into atomic claims, challenge each one | "You're optimizing the wrong variable" |
| 3 | The Expansionist | What upside are we missing? | Analogy: what adjacent domain solved this differently? | "You're thinking too small" |
| 4 | The Outsider | Zero context, fresh eyes only | Naive questioning: explain like you just joined; flag anything that requires insider knowledge to make sense | Curse of knowledge blind spots |
| 5 | The Executor | What do you do Monday morning? | Dependency graphing: what blocks what? What's the critical path? | Brilliant plans with no actionable first step |

Natural tensions: Contrarian vs Expansionist (downside vs upside), First Principles vs Executor (rethink vs ship it), Outsider keeps everyone honest.

The five reasoning methods are not interchangeable angles — each is a different cognitive operation. This is the DMAD lever: same model, different reasoning.


Execution Flow

Step 0: Pre-flight

<pre_flight>

Parse flags from $ARGUMENTS:

  • If --quick is present: use Lite Mode (see below)
  • If --adaptive is present: enable KS-statistic adaptive stopping (Step 3.5)
  • If --confidence is present: enable confidence-modulated synthesis (Steps 2 and 4)
  • If --measure-diversity is present: enable diversity verification (Step 2.5)
  • If --adversarial is present: use Adversarial Mode (deprecated, see below)
  • Remove flags from the input before classifying

Scope validation: Before convening the council, assess whether the input actually warrants it. If the question is purely factual, has one obvious right answer, or has no meaningful tradeoff, say so directly: "This doesn't need a council — [direct answer]. Use /council-review for decisions with genuine stakes and tradeoffs." Do not spawn agents for trivial questions.

Classify the remaining input:

  1. PR: a numeric value, or a URL containing /pull/. Fetch the PR description via gh pr view and the diff via gh pr diff.
  2. File path: a string ending in a file extension or pointing to an existing file. Read the file contents.
  3. Plan/Decision/Question: everything else. Use as-is.

For PRs and files, read the actual content and include it in the framed question. Don't just pass a URL — advisors need the substance.

</pre_flight>
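The three-way classification in the pre-flight step could look roughly like this. It is a heuristic sketch under stated assumptions: the function name `classify_input`, the returned labels, and the regular expression for "ends in a file extension" are all illustrative choices, and an extension-like suffix on a non-file string would be misclassified.

```python
import os
import re

# Assumed pattern: a single whitespace-free token ending in ".<ext>".
_FILE_LIKE = re.compile(r"^\S+\.[A-Za-z0-9]+$")


def classify_input(text):
    """Return "pr", "file", or "plan" for the flag-stripped input."""
    value = text.strip()

    # PR: a bare number, or a URL containing /pull/.
    if value.isdigit() or ("://" in value and "/pull/" in value):
        return "pr"

    # File path: an existing path, or something that ends in an extension.
    if os.path.exists(value) or _FILE_LIKE.match(value):
        return "file"

    # Everything else is treated as a plan, decision, or question.
    return "plan"
```

The PR check runs first so that a pull-request URL is never mistaken for a file path.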

Step 1: Gather Context and Frame

Auto-context gathering — before framing, read these project files (skip any that don't exist):

  • README.md — what the project does
  • CLAUDE.md or AGENTS.md — conventions, architecture, patterns
  • Recent git log (git log --oneline -10) — what's been happening
  • Any files the user referenced or that relate to the topic
  • PR diff and description if reviewing a PR
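The auto-context step above might be sketched as follows. The function name `gather_context` and the `max_chars` truncation limit are assumptions made for illustration; the file names and the `git log --oneline -10` command come from the list above. Missing files and a missing git repository are skipped silently, as the step requires.

```python
import subprocess
from pathlib import Path

CONTEXT_FILES = ("README.md", "CLAUDE.md", "AGENTS.md")


def gather_context(root=".", max_chars=4000):
    """Collect project context, skipping anything that does not exist.

    max_chars is a hypothetical cap to keep the framed prompt small.
    """
    root_path = Path(root)
    context = {}

    for name in CONTEXT_FILES:
        path = root_path / name
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            context[name] = text[:max_chars]

    try:
        log = subprocess.run(
            ["git", "log", "--oneline", "-10"],
            cwd=root_path,
            capture_output=True,
            text=True,
            check=True,
        )
        context["git log"] = log.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        pass  # git is not installed, or root is not a git repository

    return context
```

The returned dictionary would feed the CONTEXT part of the framed question; user-referenced files and PR content still need to be added separately.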

Reframe the raw input as a clear, neutral prompt:

QUESTION:
[Core decision, plan, or code being reviewed]

CONTEXT:
[Key context from project files: what the project does, constraints, recent changes, stakes]

WHAT'S AT STAKE:
[Why this matters — cost of getting it wrong]

Don't add your own opinion. Don't steer toward an answer. If too vague, ask ONE clarifying question before proceeding.

Step 2: Convene the Council (5 agents in parallel)

Launch all 5 advisors simultaneously using the Agent tool. Each advisor runs in parallel. Use a lightweight model (haiku) for advisors — they're doing focused analysis, not complex reasoning.

CRITICAL: Launch all 5 in a single message with 5 Agent tool calls. Sequential execution lets earlier responses bleed into later ones and defeats the purpose.
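In the skill itself the fan-out happens through Agent tool calls, not Python. As an analogy only, the same "start everything before reading any result" idea can be sketched with asyncio. Here `ask_advisor` is a hypothetical stub standing in for a real advisor call, and `convene` is an illustrative name.

```python
import asyncio

ADVISORS = [
    "The Contrarian",
    "First Principles Thinker",
    "The Expansionist",
    "The Outsider",
    "The Executor",
]


async def ask_advisor(name, framed_question):
    """Hypothetical stand-in for one advisor call; replace with a real model call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[{name}] analysis of: {framed_question}"


async def convene(framed_question):
    """Create every advisor call up front, then await them together.

    Each call receives only the framed question, so no advisor can see
    another advisor's response.
    """
    calls = [ask_advisor(name, framed_question) for name in ADVISORS]
    responses = await asyncio.gather(*calls)  # results keep ADVISORS order
    return dict(zip(ADVISORS, responses))
```

Usage would be `asyncio.run(convene(framed_question))`. A sequential loop that fed each response into the next call is exactly what this step warns against.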

Each advisor gets this prompt:

You are [ADVISOR NAME] on an LLM Council reviewing a decision.

Your angle: [ADVISOR ANGLE]
Your reasoning method: [ADVISOR REASONING METHOD — see table above]

A user has brought this to the council:
---
[framed question from Step 1]
---

Apply your assigned reasoning method rigorously. Don't just state opinions — show your work using your method.

Rules:
- 150-300 words. No preamble. Straight into your analysis.
- Name specific risks, opportunities, or issues — not vague concerns.
- If reviewing code: cite specific files, functions, or patterns.
- If reviewing a plan: point to specific steps, gaps, or sequencing issue

How to add

/plugin marketplace add ngmeyer/council-review

The exact command may vary by repository. Check the README on GitHub.
