Agent Review Panel v3.3.0

A multi-agent adversarial review system based on nine research foundations: ChatEval (ICLR 2024), AutoGen, Du et al. (ICML 2024), MachineSoM (ACL 2024), DebateLLM, DMAD (ICLR 2025), "Talk Isn't Always Cheap" (ICML 2025), CONSENSAGENT (ACL 2025), and Trust or Escalate (ICLR 2025 Oral).

When NOT to Use This Skill

Do NOT trigger for these requests — they need single-agent handling or other skills:

Single code review ("review this function for bugs")
Quick sanity checks ("just a quick look before I push")
Bug fixes ("fix the type errors", "fix the failing test")
Peer review without multi-perspective signal ("peer review this doc")
Code explanation ("what does this code do?")
Deployment tasks ("deploy to staging")
Addressing existing feedback ("address the PR comments")
Skill improvement ("make this skill better") → use schliff
Writing tests, READMEs, or documentation
Asking for a single opinion ("what do you think?", "is this any good?")

The key signal is multiple independent perspectives — if the user wants one opinion, don't launch a panel.

Input

This skill takes as input one or more of: file paths to review, inline code/text in the conversation, a git diff or PR reference, or a plan/design document. It expects the user to specify (or let it auto-detect) what to review.

Dependencies

This skill depends on the Agent tool to launch parallel subagent reviewers and requires bash for context gathering (grep, file reads). All agents MUST use model: "opus". This includes VoltAgent specialist agents launched via subagent_type — always pass model: "opus" explicitly alongside subagent_type to override the agent's default model. Omitting it causes the launched agent to fall through to its own frontmatter-declared model (which may be sonnet or haiku), introducing cross-run reasoning variance. Knowledge mining reads from memory paths if they exist; if not available, it degrades gracefully — no hard dependency.
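The model-pinning rule can be sketched as a small helper that builds launch parameters. The Agent tool's real invocation format is not shown in this document, so `build_agent_params`, the `prompt` key, and the dictionary shape are hypothetical; only the `model` and `subagent_type` field names and the "opus" value come from the text above.

```python
from typing import Dict, Optional

REQUIRED_MODEL = "opus"  # this skill mandates opus for every agent


def build_agent_params(prompt: str, subagent_type: Optional[str] = None) -> Dict[str, str]:
    """Build launch parameters (hypothetical shape) that always pin the model."""
    params = {"prompt": prompt, "model": REQUIRED_MODEL}
    if subagent_type is not None:
        # Specialist agents also get an explicit model, so they cannot fall
        # through to the model declared in their own frontmatter.
        params["subagent_type"] = subagent_type
    return params
```

Routing every launch through one helper like this makes it hard to omit the model by accident when adding a new specialist.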

HTML report CDN dependencies (Phase 15.3 output file only): The generated review_panel_report.html loads Tailwind CSS, Chart.js, and — new in v2.15 — Prism.js from CDN for syntax highlighting in the Code Evidence sections of expandable issue cards. If the CDNs are unreachable, the HTML degrades gracefully: layout and text remain readable, charts show a placeholder, code blocks render as unstyled monospace.

Optional enhancement: When VoltAgent specialist agents are installed, the panel can use them instead of generic persona-prompted agents for stronger domain-specific reviews. See "VoltAgent Integration" section below.

This skill is scoped to multi-perspective adversarial review. For skill improvement requests, use schliff instead. For post-review plan updates, use plan-review-integrator. Supported versions: Claude Code v1.0+.

Examples

Example 1: Code review panel
Input: "Do a review panel on src/auth/middleware.ts — I want multiple perspectives before merging"
Output: Classifies as pure code → selects Correctness Hawk + Architecture Critic + Security Auditor + Devil's Advocate → gathers context → 4 parallel reviewers → 2 debate rounds → completeness audit → claim verification → supreme judge → writes review_panel_report.md

Example 2: Mixed content with deep research
Input: "Deep review of our migration plan — it includes SQL and Terraform"
Output: Classifies as mixed → adds Code Quality Auditor + Data Quality Auditor (SQL signal) + Reliability/SRE (infra signal) → runs web research for best practices → full panel → report with epistemic labels

Process Overview

Phase 1:    Setup                     → Identify work, pick personas, define criteria
Phase 2:    Data Flow Trace           → Trace critical path(s), document schemas [code only] (v2.14)
Phase 3:    Independent Review        → All reviewers evaluate in parallel (no cross-talk)
Phase 4:    Private Reflection        → Each reviewer re-reads source, rates own confidence
Phase 5:    Debate (rounds 1–3)       → Reviewers engage with each other + find new issues
Phase 6:    Round Summarization       → Distill resolved/unresolved points between rounds
Phase 7:    Blind Final               → Each reviewer gives final score independently
Phase 8:    Completeness Audit        → Dedicated agent scans for what the panel missed
Phase 9:    Verify Commands           → Run up to 5 reviewer verification commands (advisory)
Phase 10:   Claim Verification        → Verify all line-number citations against source
Phase 11:   Severity Verification     → Read actual code for every P0/P1, downgrade if overstated + web-verify external domain claims (v2.16.3)
Phase 12:   Verification Tier Assign  → Confidence draft (12a) + judge-advised refinement (12b)
Phase 13:   Targeted Verification     → Persona-matched agents dispatched per dispute point
Phase 14:   Supreme Judge             → Opus arbitrates everything including verification round
Phase 14.5: Post-Judge Verification   → Re-verify judge-introduced P0/P1 against ground truth (v3.2.0)
Phase 15:   Output Generation         → (parent) Three output files (all sequential: 15.1 → 15.2 → 15.3)
  Phase 15.1: Primary Markdown Report → Structured markdown summary (review_panel_report.md)
  Phase 15.2: Process History         → Full director's-cut log (review_panel_process.md)
  Phase 15.3: HTML Report             → Interactive dashboard (review_panel_report.html)

[Multi-Run mode (--runs N > 1): repeat Phases 2–15 with rotated personas, then:]
Phase 16:   Merge                     → Deduplicate, score stability, produce merged report (v2.14)

Phase 1: Setup

Identify the Work

The user provides: file paths, inline content, git diff/PR, or a plan/design doc. Collect full content, then run Context Gathering (below).

Classify content type (matters for persona selection):

Pure code — only code files
Pure plan/design — architecture docs, proposals, RFCs
Mixed — plans with code snippets, SQL, or config
Documentation — READMEs, guides, API docs

Review Mode Detection (v2.8)

Auto-detect review mode from content type. No user toggle.

Content Type	Review Mode	Behavior
Pure code	Precise	Every finding MUST cite a specific file, line number, or code snippet. Findings without concrete evidence are demoted to [UNVERIFIED].
Pure plan/design	Exhaustive	Broader risk identification allowed. Findings may reference design sections or architectural patterns without line-number evidence.
Mixed	Precise for code, Exhaustive for prose	Reviewers label each finding with its mode. Code findings without line citations are demoted.
Documentation	Exhaustive	Same as plan/design.

The detected mode is injected into Phase 3 reviewer prompts and the judge prompt. Report header states the detected mode.
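The mode table can be read as a lookup plus one demotion rule. The sketch below is an illustration, not the skill's actual implementation: the `Finding` class, its fields, and the function names are hypothetical, and it assumes that a demoted code finding in Mixed content receives the same [UNVERIFIED] label as in Pure code.

```python
from dataclasses import dataclass
from typing import Optional

# Content type -> review mode, taken from the table above.
REVIEW_MODES = {
    "pure code": "precise",
    "pure plan/design": "exhaustive",
    "documentation": "exhaustive",
}


@dataclass
class Finding:  # hypothetical structure for illustration
    text: str
    is_code: bool                   # does the finding target code or prose?
    evidence: Optional[str] = None  # file, line number, or snippet citation
    label: str = ""


def mode_for(content_type: str, finding: Finding) -> str:
    """Mixed content is decided per finding; other types use the table."""
    if content_type == "mixed":
        return "precise" if finding.is_code else "exhaustive"
    return REVIEW_MODES[content_type]


def apply_mode(content_type: str, finding: Finding) -> Finding:
    """Demote findings that lack concrete evidence under Precise mode."""
    if mode_for(content_type, finding) == "precise" and not finding.evidence:
        finding.label = "[UNVERIFIED]"
    return finding
```

For example, `apply_mode("mixed", Finding("possible race", is_code=True))` comes back labeled [UNVERIFIED], while the same finding with `evidence="auth.py:42"` keeps an empty label.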

Detect Content Signals

Scan the work for technology-specific signals (case-insensitive, 3+ keyword threshold). See references/signals-and-checklists.md for the full detection table and domain checklists. Signal detection fires only when personas are auto-selected.
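A minimal sketch of threshold-based signal detection follows. The keyword lists here are placeholders invented for illustration; the real table lives in references/signals-and-checklists.md. The sketch also assumes the 3+ threshold counts distinct keywords rather than total occurrences, which the text above does not specify.

```python
import re
from typing import Dict, List

# Hypothetical keyword table, for illustration only.
SIGNAL_KEYWORDS: Dict[str, List[str]] = {
    "sql": ["select", "join", "merge", "coalesce", "delete from"],
    "infra": ["terraform", "provider", "resource", "module", "tfstate"],
}
THRESHOLD = 3  # "3+ keyword threshold" from the text above


def detect_signals(text: str,
                   keywords: Dict[str, List[str]] = SIGNAL_KEYWORDS,
                   threshold: int = THRESHOLD) -> List[str]:
    """Return the signals whose distinct keyword hits reach the threshold."""
    lowered = text.lower()  # case-insensitive matching
    detected = []
    for signal, words in keywords.items():
        hits = sum(
            1 for word in words
            if re.search(r"\b" + re.escape(word) + r"\b", lowered)
        )
        if hits >= threshold:
            detected.append(signal)
    return detected
```

Word-boundary matching keeps a keyword such as "join" from firing inside an unrelated identifier like "joiner", which helps avoid spurious persona additions.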

Context Gathering

For file-path reviews, run these steps before launching reviewers. Skipping them is the #1 cause of incorrect [CRITICAL] recommendations.

Sibling Directory Scan — From reviewed files' parent, scan for docs/, README*, CLAUDE.md, config.py, package.json, etc. Read first 50 lines of each. If files are nested, scan both immediate parent and project root.
Reference Tracing — Scan for imports, config references, cross-file references in comments, SQL table references, file path strings.
Safety Mechanism Discovery — Grep reviewed code + imports for: _valid, _flag, _guard, _check, _mask, <= target_date, BETWEEN, fillna, COALESCE, try/except, DELETE FROM, MERGE, WRITE_TRUNCATE, upsert, idempoten, --dry-run, duplicate, `assertion
