Agent Review Panel v3.3.0
A multi-agent adversarial review system based on nine research foundations: ChatEval (ICLR 2024), AutoGen, Du et al. (ICML 2024), MachineSoM (ACL 2024), DebateLLM, DMAD (ICLR 2025), "Talk Isn't Always Cheap" (ICML 2025), CONSENSAGENT (ACL 2025), Trust or Escalate (ICLR 2025 Oral).
When NOT to Use This Skill
Do NOT trigger for these requests — they need single-agent handling or other skills:
- Single code review ("review this function for bugs")
- Quick sanity checks ("just a quick look before I push")
- Bug fixes ("fix the type errors", "fix the failing test")
- Peer review without multi-perspective signal ("peer review this doc")
- Code explanation ("what does this code do?")
- Deployment tasks ("deploy to staging")
- Addressing existing feedback ("address the PR comments")
- Skill improvement ("make this skill better") → use schliff
- Writing tests, READMEs, or documentation
- Asking for a single opinion ("what do you think?", "is this any good?")
The key signal is multiple independent perspectives — if the user wants one opinion, don't launch a panel.
Input
This skill takes as input one or more of: file paths to review, inline code/text in the conversation, a git diff or PR reference, or a plan/design document. It expects the user to specify (or let it auto-detect) what to review.
Dependencies
This skill depends on the Agent tool to launch parallel subagent reviewers and
requires bash for context gathering (grep, file reads). All agents MUST use
model: "opus". This includes VoltAgent specialist agents launched via
subagent_type — always pass model: "opus" explicitly alongside
subagent_type to override the agent's default model. Omitting it causes
the launched agent to fall through to its own frontmatter-declared model
(which may be sonnet or haiku), introducing cross-run reasoning variance.
Knowledge mining reads from memory paths if they exist; if not available,
it degrades gracefully — no hard dependency.
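The model-pinning rule above can be sketched as a small helper that never lets a launch go out without an explicit model. This is only an illustration: the dictionary shape, the `build_agent_launch` name, and the `prompt` key are assumptions, not the Agent tool's actual interface. Only `model` and `subagent_type` come from the text above.

```python
REQUIRED_MODEL = "opus"  # every panel agent must run on this model


def build_agent_launch(prompt, subagent_type=None):
    """Build launch parameters with the model always set explicitly.

    Hypothetical helper: the parameter dictionary is an assumed shape,
    not the real Agent tool schema.
    """
    params = {"prompt": prompt, "model": REQUIRED_MODEL}
    if subagent_type is not None:
        # Specialist agents still get the explicit model so their own
        # frontmatter-declared default is never used.
        params["subagent_type"] = subagent_type
    return params
```

Routing every launch through one helper like this makes it hard to omit `model` by accident when `subagent_type` is present.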
HTML report CDN dependencies (Phase 15.3 output file only): The generated
review_panel_report.html loads Tailwind CSS, Chart.js, and — new in v2.15 —
Prism.js from CDN for syntax highlighting in the Code Evidence sections of
expandable issue cards. If the CDNs are unreachable, the HTML degrades
gracefully: layout and text remain readable, charts show a placeholder, code
blocks render as unstyled monospace.
Optional enhancement: When VoltAgent specialist agents are installed, the panel can use them instead of generic persona-prompted agents for stronger domain-specific reviews. See "VoltAgent Integration" section below.
This skill is scoped to multi-perspective adversarial review. For skill improvement requests, use schliff instead. For post-review plan updates, use plan-review-integrator. Supported versions: Claude Code v1.0+.
Examples
Example 1: Code review panel
Input: "Do a review panel on src/auth/middleware.ts — I want multiple perspectives before merging"
Output: Classifies as pure code → selects Correctness Hawk + Architecture Critic + Security Auditor + Devil's Advocate → gathers context → 4 parallel reviewers → 2 debate rounds → completeness audit → claim verification → supreme judge → writes review_panel_report.md
Example 2: Mixed content with deep research
Input: "Deep review of our migration plan (it includes SQL and Terraform)"
Output: Classifies as mixed → adds Code Quality Auditor + Data Quality Auditor (SQL signal) + Reliability/SRE (infra signal) → runs web research for best practices → full panel → report with epistemic labels
Process Overview
Phase 1: Setup → Identify work, pick personas, define criteria
Phase 2: Data Flow Trace → Trace critical path(s), document schemas [code only] (v2.14)
Phase 3: Independent Review → All reviewers evaluate in parallel (no cross-talk)
Phase 4: Private Reflection → Each reviewer re-reads source, rates own confidence
Phase 5: Debate (rounds 1–3) → Reviewers engage with each other + find new issues
Phase 6: Round Summarization → Distill resolved/unresolved points between rounds
Phase 7: Blind Final → Each reviewer gives final score independently
Phase 8: Completeness Audit → Dedicated agent scans for what the panel missed
Phase 9: Verify Commands → Run up to 5 reviewer verification commands (advisory)
Phase 10: Claim Verification → Verify all line-number citations against source
Phase 11: Severity Verification → Read actual code for every P0/P1, downgrade if overstated + web-verify external domain claims (v2.16.3)
Phase 12: Verification Tier Assign → Confidence draft (12a) + judge-advised refinement (12b)
Phase 13: Targeted Verification → Persona-matched agents dispatched per dispute point
Phase 14: Supreme Judge → Opus arbitrates everything including verification round
Phase 14.5: Post-Judge Verification → Re-verify judge-introduced P0/P1 against ground truth (v3.2.0)
Phase 15: Output Generation → (parent) Three output files (all sequential: 15.1 → 15.2 → 15.3)
Phase 15.1: Primary Markdown Report → Structured markdown summary (review_panel_report.md)
Phase 15.2: Process History → Full director's-cut log (review_panel_process.md)
Phase 15.3: HTML Report → Interactive dashboard (review_panel_report.html)
[Multi-Run mode (--runs N > 1): repeat Phases 2–15 with rotated personas, then:]
Phase 16: Merge → Deduplicate, score stability, produce merged report (v2.14)
Phase 1: Setup
Identify the Work
The user provides: file paths, inline content, git diff/PR, or a plan/design doc. Collect full content, then run Context Gathering (below).
Classify content type (matters for persona selection):
- Pure code — only code files
- Pure plan/design — architecture docs, proposals, RFCs
- Mixed — plans with code snippets, SQL, or config
- Documentation — READMEs, guides, API docs
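One possible way to approximate this classification is by file suffix and name. The suffix sets, the name hints, and the `prose_has_code_snippets` flag below are all illustrative assumptions; the skill does not specify how classification is implemented.

```python
from pathlib import Path

# Assumed, non-exhaustive suffix and name hints for illustration only.
CODE_SUFFIXES = {".py", ".ts", ".js", ".go", ".rs", ".java", ".sql", ".tf"}
PROSE_SUFFIXES = {".md", ".rst", ".txt"}
DOC_HINTS = ("readme", "guide", "api")


def classify_content(paths, prose_has_code_snippets=False):
    """Return one of: pure_code, pure_plan, mixed, documentation."""
    suffixes = [Path(p).suffix.lower() for p in paths]
    has_code = any(s in CODE_SUFFIXES for s in suffixes)
    has_prose = any(s in PROSE_SUFFIXES for s in suffixes)
    if has_code and not has_prose:
        return "pure_code"
    if has_code or prose_has_code_snippets:
        # Prose plus code files, or prose with embedded snippets.
        return "mixed"
    names = [Path(p).stem.lower() for p in paths]
    if names and all(any(h in n for h in DOC_HINTS) for n in names):
        return "documentation"
    return "pure_plan"
```

For example, `classify_content(["src/auth/middleware.ts"])` yields `"pure_code"`, while a markdown plan flagged as containing SQL snippets yields `"mixed"`.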
Review Mode Detection (v2.8)
Auto-detect review mode from content type. No user toggle.
| Content Type | Review Mode | Behavior |
|---|---|---|
| Pure code | Precise | Every finding MUST cite a specific file, line number, or code snippet. Findings without concrete evidence are demoted to [UNVERIFIED]. |
| Pure plan/design | Exhaustive | Broader risk identification allowed. Findings may reference design sections or architectural patterns without line-number evidence. |
| Mixed | Precise for code, Exhaustive for prose | Reviewers label each finding with its mode. Code findings without line citations are demoted. |
| Documentation | Exhaustive | Same as plan/design. |
The detected mode is injected into Phase 3 reviewer prompts and the judge prompt. Report header states the detected mode.
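The table above can be read as a lookup plus a demotion rule. The sketch below assumes a finding is a dictionary with hypothetical `kind` ("code" or "prose") and `evidence` fields; the real finding format is not defined in this section.

```python
# Content type -> review mode, following the table above.
REVIEW_MODES = {
    "pure_code": "precise",
    "pure_plan": "exhaustive",
    "mixed": "mixed",
    "documentation": "exhaustive",
}


def label_finding(finding, content_type):
    """Attach the effective mode and demote unevidenced precise findings."""
    mode = REVIEW_MODES[content_type]
    if mode == "mixed":
        # Mixed content: precise for code findings, exhaustive for prose.
        effective = "precise" if finding.get("kind") == "code" else "exhaustive"
    else:
        effective = mode
    labeled = dict(finding, mode=effective)
    if effective == "precise" and not finding.get("evidence"):
        labeled["label"] = "[UNVERIFIED]"
    return labeled
```

The function returns a copy, so the reviewer's original finding stays unchanged and the demotion is visible only in the labeled result.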
Detect Content Signals
Scan work for technology-specific signals (case-insensitive, 3+ keyword threshold).
See references/signals-and-checklists.md for the full detection table and domain
checklists. Signal detection only fires when auto-selecting personas.
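A minimal sketch of threshold-based signal detection follows. The keyword table here is a made-up placeholder, since the real one lives in references/signals-and-checklists.md, and the sketch assumes the threshold counts distinct keywords rather than total occurrences.

```python
import re

# Placeholder table for illustration; not the skill's actual keyword lists.
SIGNAL_KEYWORDS = {
    "sql": ["select", "join", "merge", "create table", "where"],
    "infra": ["terraform", "resource", "provider", "module", "kubernetes"],
}
THRESHOLD = 3  # a signal fires at 3 or more distinct keywords


def detect_signals(text, table=SIGNAL_KEYWORDS, threshold=THRESHOLD):
    """Return {signal: matched_keywords} for signals meeting the threshold."""
    lowered = text.lower()  # case-insensitive matching
    detected = {}
    for signal, keywords in table.items():
        hits = [
            k for k in keywords
            if re.search(r"\b" + re.escape(k) + r"\b", lowered)
        ]
        if len(hits) >= threshold:
            detected[signal] = hits
    return detected
```

Whole-word matching keeps a keyword such as `join` from firing inside unrelated identifiers, at the cost of missing inflected forms.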
Context Gathering
Run these steps before launching reviewers whenever the review targets file paths. Skipping them is the #1 cause of incorrect [CRITICAL] recommendations.
- Sibling Directory Scan: From the reviewed files' parent directory, scan for docs/, README*, CLAUDE.md, config.py, package.json, etc. Read the first 50 lines of each. If files are nested, scan both the immediate parent and the project root.
- Reference Tracing: Scan for imports, config references, cross-file references in comments, SQL table references, and file path strings.
- Safety Mechanism Discovery: Grep the reviewed code and its imports for:
_valid, _flag, _guard, _check, _mask, <= target_date, BETWEEN, fillna, COALESCE, try/except, DELETE FROM, MERGE, WRITE_TRUNCATE, upsert, idempoten, --dry-run, duplicate, `assertion