ql-deep-review — whole-feature post-implementation review
Purpose
quantum-loop's built-in two-stage review gate (ql-review: spec-compliance → code-quality) operates on ONE story at a time inside ONE worktree. It does not detect:
- Cross-story divergence (e.g., story A uses `'google'` as a secret key while story B uses `'google-api-key'` for the same constant).
- Post-merge regressions (a test that was green in isolation breaks after integration).
- Drift from original user intent (paraphrase chain from intent → design → PRD → plan → code).
- Low-signal comments that look like findings but lack evidence (CRA actionability is 0.9-19.2% per Chowdhury 2604.03196; human baseline is ~60%).
ql-deep-review closes these gaps with a whole-feature review that runs AFTER all stories in a wave / feature pass the per-story gate.
When to use
- After `ql-execute` emits `COMPLETE` for a wave and before merging the feature branch to master.
- After cherry-picking or merging a foreign branch whose conflict resolution changed semantics.
- Manually, when suspicion of cross-story drift is high (e.g., follow-on work after a long autonomous run).
What it does NOT do
- Does not replace the per-story two-stage gate. Run `ql-review` per story, then this.
- Does not auto-fix findings. Produces a structured report; the user or orchestrator drives action.
- Does not block merge autonomously. Emits a verdict + confidence; user decides.
Risk scoring (0-100)
Risk factors and weights (inspired by soliton's risk-adaptive dispatch):
| Factor | Weight | Measurement |
|---|---|---|
| Blast radius | 25 | count of files touched in wave × (max transitive callers of any touched symbol ÷ 100) |
| Change complexity | 15 | difftastic or cloc diff line count; tree-sitter function edit count |
| Sensitive paths | 20 | glob match: `auth/`, `payment/`, `*.env*`, `*secret*`, `*password*`, `*token*` |
| File size / scope | 10 | total LOC touched / number of files |
| AI-authored signal | 10 | git commit trailer Co-Authored-By: Claude, uniform-style heuristic |
| Test coverage gap | 10 | production files touched without corresponding test edits |
| Intent-drift signal | 10 | ql-intent-check CRITICAL findings count (optional input) |
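A minimal sketch of how the table could combine into a single score, assuming each factor has already been normalized to [0, 1]. The table gives weights and raw measurements but no normalization rule, so that step is left to the caller. The factor keys (`blast_radius` and so on) are illustrative names, not identifiers defined by quantum-loop.

```python
# Weights copied from the risk-factor table; they sum to 100.
# The key names are illustrative, not defined by quantum-loop.
WEIGHTS = {
    "blast_radius": 25,
    "change_complexity": 15,
    "sensitive_paths": 20,
    "file_size_scope": 10,
    "ai_authored": 10,
    "test_coverage_gap": 10,
    "intent_drift": 10,
}


def clamp01(x: float) -> float:
    """Clamp a signal into [0, 1] so no factor can exceed its own weight."""
    return max(0.0, min(1.0, x))


def risk_score(signals: dict) -> int:
    """Return a 0-100 risk score as a weighted sum of normalized signals.

    ASSUMPTION: each signal is already normalized to [0, 1]; the skill does not
    say how raw measurements (diff line counts, caller counts) are scaled.
    Factors absent from `signals` (e.g. the optional intent-drift input) count as 0.
    """
    unknown = set(signals) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    total = sum(w * clamp01(signals.get(name, 0.0)) for name, w in WEIGHTS.items())
    return round(total)
```

For example, `risk_score({"blast_radius": 0.4, "sensitive_paths": 1.0, "test_coverage_gap": 0.5})` gives 25 × 0.4 + 20 × 1.0 + 10 × 0.5 = 35, which lands in the MEDIUM band.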
Score → dispatch tier:
- 0-30 LOW: 2 reviewers (code-reviewer, synthesizer). Target turnaround 2-3 min.
- 31-60 MEDIUM: 4 reviewers (+security-reviewer, test-engineer). 4-6 min.
- 61-80 HIGH: 6 reviewers (+critic, architect). 6-10 min.
- 81-100 CRITICAL: 7 reviewers (+cross-provider critic using codex or gemini via `/ask`). 10-15 min plus manual inspection.
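The score bands translate directly into a lookup. This Python sketch treats each band's upper bound as inclusive, which matches the non-overlapping ranges listed above; the `Tier` record and the `tier_for` name are hypothetical.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    # Hypothetical record type; the skill names tiers but defines no data structure.
    name: str
    reviewers: int
    minutes: str  # target turnaround as stated in the skill


# (inclusive upper bound, tier) pairs mirroring the score bands.
BANDS = [
    (30, Tier("LOW", 2, "2-3")),
    (60, Tier("MEDIUM", 4, "4-6")),
    (80, Tier("HIGH", 6, "6-10")),
    (100, Tier("CRITICAL", 7, "10-15 + manual inspection")),
]


def tier_for(score: int) -> Tier:
    """Map a 0-100 risk score onto its dispatch tier."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score must be within 0-100, got {score}")
    for upper, tier in BANDS:
        if score <= upper:
            return tier
    raise AssertionError("unreachable: the bands cover 0-100")
```

With a risk score of 47, `tier_for(47)` returns the MEDIUM tier (4 reviewers, 4-6 min).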
Reviewer agents (dispatched per tier)
All agents are invoked via the Agent tool in parallel. Each receives:
- `BASE_SHA..HEAD_SHA`: whole-feature diff scope
- `PRD_PATH`: path to the feature PRD
- `STORY_LIST`: JSON list of stories executed, with their IDs and status
- `INTENT_SNAPSHOT`: verbatim user intent (from `quantum.json.userIntent` if present)
- `CHANGED_FILES`: file-list manifest
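To illustrate the input contract, a hypothetical helper could assemble the five inputs into one payload. The function name and the story shape (`id` and `status` keys) are assumptions; in practice `CHANGED_FILES` might come from `git diff --name-only BASE_SHA..HEAD_SHA`.

```python
import json


def build_reviewer_inputs(base_sha, head_sha, prd_path, stories, quantum, changed_files):
    """Assemble the five inputs every dispatched reviewer receives.

    ASSUMPTIONS (hypothetical shapes, not defined by the skill):
    - `stories` is a list of dicts with "id" and "status" keys.
    - `quantum` is the parsed contents of quantum.json.
    - `changed_files` came from e.g. `git diff --name-only BASE..HEAD`.
    """
    return {
        "BASE_SHA..HEAD_SHA": f"{base_sha}..{head_sha}",
        "PRD_PATH": prd_path,
        "STORY_LIST": json.dumps([{"id": s["id"], "status": s["status"]} for s in stories]),
        # Kept verbatim: no trimming or paraphrasing of the user's words.
        "INTENT_SNAPSHOT": quantum.get("userIntent"),
        # Deduplicated and sorted so every reviewer sees the same manifest.
        "CHANGED_FILES": sorted(set(changed_files)),
    }
```

Keeping the intent string untouched matters here: any normalization would itself be a step in the paraphrase chain the skill is trying to detect.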
Tier-core reviewers (always dispatched)
- `oh-my-claudecode:code-reviewer`: severity-rated findings (CRITICAL / HIGH / MEDIUM / LOW) with line-level evidence.
- `soliton:synthesizer`: risk-adaptive PR-style review; contributes a reviewer-side risk score and a READY_TO_MERGE / NEEDS_REWORK / BLOCKED verdict.
Tier-MEDIUM additions
- `oh-my-claudecode:security-reviewer`: OWASP Top 10 + secret exposure + input validation; hard-dispatched when the sensitive-paths factor > 0.
- `oh-my-claudecode:test-engineer`: test-quality audit (AC-to-test mapping, over-mock detection (Hora & Robbes 2026), missing edge cases).
Tier-HIGH additions
- `oh-my-claudecode:critic`: multi-perspective adversarial critique; self-audit + Realist Check.
- `oh-my-claudecode:architect`: architectural review covering SOLID, layering, and cross-cutting concerns.
Tier-CRITICAL additions
- Cross-provider critic: invoked via `omc ask codex --agent-prompt critic` (Codex reviews Claude's output) or `omc ask gemini`. A different provider brings different failure modes, which raises the catch rate.
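The tiers are cumulative: each one adds reviewers to the roster of the tier below it. A sketch of that selection, including the hard-dispatch rule for `security-reviewer`, might look like this. Reviewer names use the short form seen in the output artifact's `reviewers_dispatched`; `cross-provider-critic` is a made-up label for the codex/gemini critic.

```python
# Reviewers added at each tier, using short agent names.
# "cross-provider-critic" is a hypothetical label for the codex/gemini critic.
ADDED_AT = {
    "LOW": ["code-reviewer", "synthesizer"],
    "MEDIUM": ["security-reviewer", "test-engineer"],
    "HIGH": ["critic", "architect"],
    "CRITICAL": ["cross-provider-critic"],
}
TIER_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def reviewers_for(tier: str, sensitive_paths_score: float) -> list:
    """Return the cumulative roster for `tier`.

    security-reviewer is hard-dispatched whenever the sensitive-paths factor is
    above zero, even on a LOW-tier run.
    """
    if tier not in TIER_ORDER:
        raise ValueError(f"unknown tier: {tier}")
    roster = []
    for t in TIER_ORDER[: TIER_ORDER.index(tier) + 1]:
        roster.extend(ADDED_AT[t])
    if sensitive_paths_score > 0 and "security-reviewer" not in roster:
        roster.append("security-reviewer")
    return roster
```

`reviewers_for("MEDIUM", 0)` yields the four MEDIUM-tier reviewers, while `reviewers_for("LOW", 0.5)` yields three because a sensitive path was touched.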
Actionability filter (the Chowdhury 2604.03196 fix)
Every finding returned by a reviewer MUST include:
- `file` (string, path)
- `line` or `line_start` + `line_end` (integer)
- `evidence_type`: one of `code-reference` / `command-output` / `spec-citation` / `test-failure` / `diff-hunk`
- `severity`: `critical` / `high` / `medium` / `low` / `info`
- `confidence`: 0-100
Findings missing any required field are moved to a `suppressed[]` array with reason "no actionable evidence." Surface the suppressed count to the user; do not drop findings silently.
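A minimal sketch of the filter, assuming findings arrive as plain dicts keyed by the field names above. Malformed findings are copied into the suppressed list with the reason attached rather than discarded, so their count can be surfaced.

```python
EVIDENCE_TYPES = {"code-reference", "command-output", "spec-citation", "test-failure", "diff-hunk"}
SEVERITIES = {"critical", "high", "medium", "low", "info"}


def _is_int(v) -> bool:
    # bool is a subclass of int in Python; reject True/False as line numbers.
    return isinstance(v, int) and not isinstance(v, bool)


def is_actionable(f: dict) -> bool:
    """True if the finding (assumed to be a plain dict) has every required field."""
    if not isinstance(f.get("file"), str) or not f["file"]:
        return False
    single = _is_int(f.get("line"))
    ranged = (_is_int(f.get("line_start")) and _is_int(f.get("line_end"))
              and f["line_start"] <= f["line_end"])
    if not (single or ranged):
        return False
    if f.get("evidence_type") not in EVIDENCE_TYPES or f.get("severity") not in SEVERITIES:
        return False
    c = f.get("confidence")
    return isinstance(c, (int, float)) and not isinstance(c, bool) and 0 <= c <= 100


def actionability_filter(findings: list) -> tuple:
    """Split findings into (kept, suppressed); nothing is silently dropped."""
    kept, suppressed = [], []
    for f in findings:
        if is_actionable(f):
            kept.append(f)
        else:
            suppressed.append({**f, "reason": "no actionable evidence"})
    return kept, suppressed
```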
Synthesis
Dedup
Group findings by `(file, line_start, severity)` and merge identical claims from different reviewers by concatenating their `agents` arrays. Agreement between reviewers raises the merged finding's confidence (per MARS 2509.20502).
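One way to implement the merge is sketched below. The grouping key is the one stated above; the confidence boost (`AGREEMENT_BONUS`, +5 per extra agreeing reviewer, capped at 100) is a hypothetical choice, since the skill only says that agreement increases confidence. Agents are concatenated without repeats.

```python
# Hypothetical: +5 confidence per additional agreeing reviewer, capped at 100.
# The skill says agreement increases confidence but does not fix the amount.
AGREEMENT_BONUS = 5


def dedup(findings: list) -> list:
    """Merge findings sharing (file, line_start, severity) into one entry."""
    groups = {}
    for f in findings:
        start = f.get("line_start", f.get("line"))  # single-line findings use `line`
        key = (f["file"], start, f["severity"])
        if key not in groups:
            # Copy so the caller's finding dicts are not mutated.
            groups[key] = {**f, "agents": list(f.get("agents", []))}
            continue
        merged = groups[key]
        for agent in f.get("agents", []):
            if agent not in merged["agents"]:
                merged["agents"].append(agent)
        merged["confidence"] = max(merged["confidence"], f["confidence"])
    for merged in groups.values():
        extra_agents = max(0, len(merged["agents"]) - 1)
        merged["confidence"] = min(100, merged["confidence"] + AGREEMENT_BONUS * extra_agents)
    return list(groups.values())
```

Two `high` findings on `src/auth/session.ts:42` at confidence 80 and 83 merge into one finding attributed to both reviewers, at confidence 83 + 5 = 88.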
Conflict detection
Two findings on the same (file, line) with opposed verdicts (e.g., one says "introduce abstraction", another says "remove abstraction") are flagged in a conflicts[] block for user arbitration.
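Detecting "opposed verdicts" automatically needs some notion of what opposes what, which the skill leaves open. The sketch below assumes a small, hypothetical table of opposing verbs and compares the first word of each finding's `suggested_fix`. Anything it flags goes to `conflicts[]` for the user; nothing is resolved automatically.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical opposing-action pairs. The skill does not define how "opposed
# verdicts" are recognized; here we compare the leading verb of suggested_fix.
OPPOSING = {
    frozenset({"introduce", "remove"}),
    frozenset({"add", "remove"}),
    frozenset({"extract", "inline"}),
    frozenset({"split", "merge"}),
}


def _action(f: dict) -> str:
    """Lowercased first word of suggested_fix, or "" if there is none."""
    words = (f.get("suggested_fix") or "").split()
    return words[0].lower().strip(".,:;") if words else ""


def detect_conflicts(findings: list) -> list:
    """Return conflict records for opposed findings at the same (file, line)."""
    by_location = defaultdict(list)
    for f in findings:
        by_location[(f["file"], f.get("line_start", f.get("line")))].append(f)
    conflicts = []
    for (file, line), group in by_location.items():
        for a, b in combinations(group, 2):
            if frozenset({_action(a), _action(b)}) in OPPOSING:
                conflicts.append({"file": file, "line": line, "finding_ids": [a["id"], b["id"]]})
    return conflicts
```

A keyword table is crude; a real implementation could instead ask each reviewer to tag its recommendation explicitly, but either way the output is only a prompt for user arbitration.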
Hallucination check
For every finding that cites a file / symbol / API:
- Verify the file exists: `[ -f "$file" ]`.
- Verify the symbol is reachable (grep for its declaration).
- Verify that commands in `suggested_fix` match the project toolchain.
Findings that fail this check move to `suppressed[]` with reason "reviewer hallucinated target."
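The first two checks can be sketched in Python with `pathlib` and `re`. The `symbol` field and the declaration-keyword regex are assumptions (the finding schema does not define a symbol field), and the toolchain check is omitted because it depends on the project.

```python
import re
from pathlib import Path

# ASSUMPTION: a rough, language-agnostic set of declaration keywords.
DECL_KEYWORDS = r"(?:def|class|function|const|let|var|interface|type|enum|fn|func)"


def symbol_declared(repo: Path, symbol: str) -> bool:
    """Grep the working tree (skipping .git) for a declaration of `symbol`."""
    pattern = re.compile(rf"\b{DECL_KEYWORDS}\s+{re.escape(symbol)}\b")
    for path in repo.rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        if pattern.search(text):
            return True
    return False


def hallucination_check(findings: list, repo: Path) -> tuple:
    """Split findings into (kept, suppressed) by whether their targets exist."""
    kept, suppressed = [], []
    for f in findings:
        ok = (repo / f["file"]).is_file()  # equivalent of [ -f "$file" ]
        if ok and f.get("symbol"):  # hypothetical optional field
            ok = symbol_declared(repo, f["symbol"])
        if ok:
            kept.append(f)
        else:
            suppressed.append({**f, "reason": "reviewer hallucinated target"})
    return kept, suppressed
```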
Meta-review
The orchestrator (or this skill's own synthesis step) produces:
- Overall verdict: `APPROVE` / `APPROVE_WITH_COMMENTS` / `REQUEST_CHANGES` / `BLOCKS_MERGE`.
- Critical blockers (severity=critical with confidence ≥80).
- High-priority issues (severity=high with confidence ≥70).
- Kudos (explicitly captured positive signals — what was done well).
- Suppressed findings count (transparency about what was dropped).
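The thresholds above fix which findings count as blockers and high-priority issues, but not how the overall verdict is chosen. The sketch below uses one plausible, hypothetical rule: any blocker yields `BLOCKS_MERGE`, three or more high-priority issues yield `REQUEST_CHANGES`, and any other findings yield `APPROVE_WITH_COMMENTS`. The verdict stays advisory; the user still decides.

```python
# Hypothetical policy knob: the skill lists the four verdicts but not the rule
# that chooses between them.
REQUEST_CHANGES_MIN_HIGH = 3


def meta_review(findings: list, suppressed: list) -> dict:
    """Derive blockers, high-priority issues, and an advisory verdict."""
    blockers = [f["id"] for f in findings
                if f["severity"] == "critical" and f["confidence"] >= 80]
    high_priority = [f["id"] for f in findings
                     if f["severity"] == "high" and f["confidence"] >= 70]
    if blockers:
        verdict = "BLOCKS_MERGE"
    elif len(high_priority) >= REQUEST_CHANGES_MIN_HIGH:
        verdict = "REQUEST_CHANGES"
    elif findings:
        verdict = "APPROVE_WITH_COMMENTS"
    else:
        verdict = "APPROVE"
    return {
        "verdict": verdict,
        "blockers": blockers,
        "high_priority": high_priority,
        "suppressed_count": len(suppressed),  # reported, never hidden
    }
```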
Output format
Emits a single JSON artifact at `quantum.reviews[<feature-id>].deepReview`:
```json
{
  "feature_id": "<prd-id or feature-slug>",
  "base_sha": "<before-first-story-commit>",
  "head_sha": "<after-last-story-commit>",
  "files_changed": 12,
  "stories_included": ["US-001", "US-002", ...],
  "timestamp": "<ISO 8601>",
  "risk_score": 47,
  "tier": "MEDIUM",
  "reviewers_dispatched": ["code-reviewer", "synthesizer", "security-reviewer", "test-engineer"],
  "findings": [
    {
      "id": "F-001",
      "agents": ["code-reviewer", "synthesizer"],
      "severity": "high",
      "confidence": 88,
      "category": "correctness",
      "file": "src/auth/session.ts",
      "line_start": 42, "line_end": 48,
      "evidence_type": "code-reference",
      "description": "<what>",
      "suggested_fix": "<how>",
      "cites": ["PRD AC-3", "tests/auth.test.ts:100"]
    }
  ],
  "conflicts": [],
  "suppressed": [{"agent": "architect", "reason": "no line citation", "count": 2}],
  "kudos": ["Clean separation of concerns in the new token-refresh flow"],
  "verdict": "APPROVE_WITH_COMMENTS",
  "blockers": [],
  "high_priority": ["F-001", "F-003"]
}
```
Also emits a human-readable markdown summary to `docs/reviews/<feature-id>-deep-review.md`.
Anti-rationalization guards
| The agent says… | The truth is… |
|---|---|
| "We already did per-story review, this is redundant" | Per-story review is story-LOCAL. Cross-story + whole-feature review catches different defects. Both are required. |
| "Risk score is LOW, skip the deep review" | Run LOW-tier anyway (2 reviewers, 2-3 min). The cost is a rounding error on a multi-hour autonomous run. |
| "Reviewer didn't cite evidence but it's clearly right" | Without evidence the finding is an opinion. Suppress it. Low-signal findings lower the whole reviewer distribution per Chowdhury 2604.03196. |
| "Conflict between two reviewers means one is wrong — pick the stronger" | No. Log the conflict and let the user arbitrate. Silent pick is a different |