Peer Review Skill
You are assisting a medical researcher in writing peer reviews for scientific journals. The reviews should reflect a constructive, developmental tone and demonstrate expertise in both clinical methodology and study design.
When to Use
- Researcher received a review invitation from a journal
- Researcher wants help structuring a peer review
- Do NOT use for the user's own paper writing → use /write-paper
- Do NOT use for self-review of own manuscripts → use /self-review
Workflow
Phase 1: Setup
- Identify the manuscript: Get the manuscript ID and journal from the user or PDF filename.
- Detect journal: Map to known journal formatting rules or use generic format.
- Check if revision: Look for previous review files. If R1/R2, locate and read the prior review and author response.
- COI self-check: Confirm with the reviewer — "Do you have any competing interests with the authors or topic?" If yes, recommend declining or disclosing in Confidential Comments.
- Set up workspace: Create folder at {working_dir}/review/{manuscript_id}/.
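The workspace step can be sketched as below. This is a minimal illustration, not part of the skill itself: the function name make_review_workspace is hypothetical, and it assumes working_dir and manuscript_id are already known from the earlier setup steps.

```python
from pathlib import Path


def make_review_workspace(working_dir: str, manuscript_id: str) -> Path:
    """Create {working_dir}/review/{manuscript_id}/ and return its path."""
    workspace = Path(working_dir) / "review" / manuscript_id
    # parents=True builds the intermediate "review" folder if needed;
    # exist_ok=True makes the call safe to repeat for R1/R2 rounds.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Reusing the same folder across revision rounds keeps the prior review and author response next to the new draft.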
Phase 2: Manuscript Analysis
- Read the manuscript PDF thoroughly — Abstract, Methods, Results, Discussion, Tables, Figures.
- For revisions: Cross-reference previous review comments against the revised manuscript.
- Task formulation audit (forced 1st question, before the issue checklist):
- Capture verbatim the claimed task from the Abstract objective.
- Capture verbatim the measured task from Methods (inputs → outputs).
- Do the two match? Do all comparison arms operate on the same task, with the same inputs and the same information access?
- Does real clinical workflow actually follow this task formulation, or is the experimental setup an artificial reframing?
- If a mismatch exists, register it as the Major #1 candidate. Do not let a design-level framing flaw be downgraded into an adjacent measurement-level issue (e.g., selection bias, small sample) — those are downstream effects of the framing problem.
- High-yield triggers: AI/LLM evaluations (zero-shot, image-only, blind), human-vs-AI comparisons, model-vs-model comparisons, "X can replace Y" claims, bench-style tasks that do not match clinical workflow.
- Exempt: single-task validation with fixed inputs, replication/reproducibility studies, pure reporting/observational designs.
- Identify key issues using this systematic checklist:
- Task formulation (carry forward from the task formulation audit above if a candidate was found)
- Data splitting / leakage (patient-level vs image-level)
- Reference standard validity
- Validation strategy / confidence intervals / calibration
- Clinical comparator / incremental value
- Reproducibility (preprocessing, hyperparameters, segmentation)
- Protocol heterogeneity
- Intended use clarity
- Overclaiming relative to evidence level
- Sample size adequacy
- Statistical methodology appropriateness
- Reporting guideline check: Identify the applicable EQUATOR guideline. Flag MISSING items as candidate comments. If /check-reporting is available, delegate.
- Prioritize: Rank issues by impact on validity. Select top 3-5 for Major, 3-4 for Minor. If a task-formulation flaw exists, place it as Major #1; design-level concerns precede measurement-level concerns.
- Gate: Present findings to user — "Here are the key issues I found — do you agree with this prioritization?"
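The prioritization rule can be sketched as a sort in which a task-formulation flaw always comes first and everything else is ordered by impact on validity. The Issue class, the 1-5 validity_impact score, and the simple cut-offs are assumptions made for illustration; the skill itself leaves the ranking to reviewer judgment.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    validity_impact: int            # hypothetical 1-5 score, higher = more damaging
    is_task_formulation: bool = False


def prioritize(issues, max_major=5, max_minor=4):
    """Return (major, minor) lists; task-formulation flaws rank first."""
    ranked = sorted(
        issues,
        # False sorts before True, so task-formulation issues lead;
        # the negated score gives descending impact within each group.
        key=lambda i: (not i.is_task_formulation, -i.validity_impact),
    )
    major = ranked[:max_major]
    minor = ranked[max_major:max_major + max_minor]
    return major, minor
```

The split into Major and Minor by rank position is a simplification; in practice the ranked list is what gets presented at the gate for the user to confirm.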
Phase 2A: Systematic Review / Meta-Analysis Extension
Apply this 8-probe checklist only when manuscript type is "Systematic Review", "Meta-Analysis", or "Systematic Review and Meta-Analysis". These probes complement (do not replace) the generic Phase 2 issue checklist.
SR-MA reviews almost always justify the Tier 3 word budget (1000-1400 words). Default to Tier 3 when ≥3 of P1-P5 are triggered.
P1 — DTA 2×2 cell extraction integrity (spot-check):
- For SR-MA with diagnostic accuracy outcomes, select ≥2 outlier studies (k=1 subgroup studies, extreme sens/spec, single-study outliers driving subgroup p-values).
- For each, retrieve source paper sensitivity / specificity (PubMed abstract or full-text).
- Compare manuscript forest plot cells (TP/(TP+FN), TN/(TN+FP)) against source values.
- Common error: sens/spec swap at cell level. If a study has source sens=A% / spec=B% but manuscript forest reports sens=B% / spec=A%, this is a cell-assignment error.
- If found, register as MAJOR (#1 if it drives a reported subgroup p-value).
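The swap check in P1 can be sketched as a small comparison helper. The function names and the 0.5 percentage-point tolerance are assumptions for illustration; the source values still have to be read from the original paper by hand.

```python
def pct(numerator: int, denominator: int) -> float:
    """Convert a forest plot cell such as TP/(TP+FN) to a percentage."""
    return 100.0 * numerator / denominator


def classify_sens_spec(source_sens, source_spec, ms_sens, ms_spec, tol=0.5):
    """Compare manuscript values (percent) with the source paper's values.

    Returns "match", "swapped" (cell-assignment error), or "mismatch".
    tol is a hypothetical rounding tolerance in percentage points.
    """
    def close(a, b):
        return abs(a - b) <= tol

    if close(ms_sens, source_sens) and close(ms_spec, source_spec):
        return "match"
    if close(ms_sens, source_spec) and close(ms_spec, source_sens):
        return "swapped"
    return "mismatch"
```

A "mismatch" result is not necessarily an error (the manuscript may have used a different test set from the same paper), so only "swapped" maps directly onto the cell-assignment error described above.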
P2 — Cohort overlap probe:
- Identify clusters in included studies sharing: (a) institution name, (b) author surname + year proximity, (c) public ICU/EHR database (MIMIC-IV, eICU, MIMIC-III, KNHIS, UK Biobank, Optum, MarketScan, IBM).
- For each cluster, use PubMed efetch to retrieve the author affiliations and the database source named in the abstract's Methods.
- Flag pairs sharing same data source + overlapping enrollment period as "high-confidence overlap".
- Manuscript should acknowledge in Limitations + perform sensitivity analysis. If absent → MAJOR.
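The "high-confidence overlap" rule in P2 (same data source plus overlapping enrollment period) can be sketched as a pairwise check. The Study fields and year-level granularity are assumptions; the values would be filled in manually from the affiliations and abstracts retrieved above.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Study:
    label: str
    data_source: str    # e.g. a database name or institution, entered by hand
    enroll_start: int   # first enrollment year
    enroll_end: int     # last enrollment year


def high_confidence_overlaps(studies):
    """Return label pairs sharing a data source and an overlapping period."""
    pairs = []
    for a, b in combinations(studies, 2):
        same_source = a.data_source.strip().lower() == b.data_source.strip().lower()
        # Two closed intervals overlap when each starts before the other ends.
        periods_overlap = a.enroll_start <= b.enroll_end and b.enroll_start <= a.enroll_end
        if same_source and periods_overlap:
            pairs.append((a.label, b.label))
    return pairs
```

Exact string matching on the source name will miss variants such as differing database version labels, so the output is a lower bound on the pairs worth inspecting.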
P3 — Diagnostic subset N transparency (mixed DTA + prognostic MA):
- Compute bivariate pool denominator (TP+FP+TN+FN) from Table 2 or forest plot.
- Compare to total N reported in Abstract.
- If diagnostic subset is <50% of total without explicit "diagnostic subset N = X / Y" in Results → MAJOR transparency gap.
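The P3 arithmetic can be sketched as follows. The function name and the explicitly_reported flag are illustrative assumptions; the 50% threshold comes from the rule above.

```python
def diagnostic_subset_check(tp, fp, tn, fn, total_n, explicitly_reported):
    """Return (subset_n, fraction_of_total, flag_major).

    subset_n is the bivariate pool denominator TP+FP+TN+FN summed over the
    pooled studies; total_n is the total N stated in the Abstract.
    """
    subset_n = tp + fp + tn + fn
    fraction = subset_n / total_n
    # Flag only when the subset is under half of the total AND the Results
    # do not state "diagnostic subset N = X / Y" explicitly.
    flag_major = fraction < 0.5 and not explicitly_reported
    return subset_n, fraction, flag_major
```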
P4 — k=1 subgroup flag:
- Inspect subgroup analyses for strata with k=1 (single included study).
- If a reported subgroup p-value is driven by k=1 stratum → flag MAJOR.
- Recommend reframing as exploratory or removing from formal subgroup test.
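A rough screen for P4 is sketched below. Whether a p-value is truly "driven by" a k=1 stratum is a judgment call, so this heuristic (a significant subgroup test that contains any single-study stratum) only nominates candidates. The function name, input shape, and alpha=0.05 are assumptions.

```python
def k1_subgroup_flags(subgroups, p_value, alpha=0.05):
    """subgroups maps stratum name -> list of included study labels.

    Returns (k1_strata, candidate_major).
    """
    k1_strata = [name for name, studies in subgroups.items() if len(studies) == 1]
    # Heuristic only: a significant test containing a k=1 stratum is a
    # candidate; the reviewer still has to confirm what drives the result.
    candidate_major = bool(k1_strata) and p_value < alpha
    return k1_strata, candidate_major
```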
P5 — Supplementary completeness check:
- SR-MA supplementary must contain at minimum:
- PRISMA / PRISMA-DTA checklist with page refs
- Full-text exclusion list with reasons (per PRISMA 2020 item 16b)
- Per-study data extraction table
- Per-study × per-domain risk-of-bias table (QUADAS-2 / QUADAS-AI / PROBAST / PROBAST-AI)
- Full search strategy verbatim per database
- If supplementary contains only figure captions or is missing 3+ of these → MAJOR.
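The completeness rule in P5 can be sketched as a checklist count. The item keys are hypothetical labels for the five required components; deciding whether each is actually present still requires reading the supplementary file.

```python
REQUIRED_ITEMS = [
    "prisma_checklist",      # PRISMA / PRISMA-DTA checklist with page refs
    "exclusion_list",        # full-text exclusions with reasons
    "extraction_table",      # per-study data extraction
    "risk_of_bias_table",    # per-study x per-domain risk of bias
    "search_strategy",       # verbatim search strategy per database
]


def supplementary_gaps(present):
    """present: set of item keys found in the supplementary material.

    Returns (missing_items, flag_major); MAJOR when 3 or more are missing.
    """
    missing = [item for item in REQUIRED_ITEMS if item not in present]
    return missing, len(missing) >= 3
```

A supplementary file containing only figure captions corresponds to an empty set here, so all five items are missing and the flag is raised.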
P6 — PROSPERO ID format + live URL request:
- Standard PROSPERO format: CRD42 + 4-digit YYYY + 6-digit sequential = 13 chars total. Some pre-2020 IDs are 12 chars (5-digit sequential).
- IDs with >13 chars or non-numeric tail → FORMAT_ANOMALY (MAJOR).
- Always request authors provide live registration URL in cover letter for protocol cross-check.
P7 — Reference duplicate detection (extends /verify-refs):
- Run /verify-refs (PubMed + CrossRef). In addition to standard checks, detect duplicate PMID or DOI within reference list.
- Verbatim duplicates indicate LLM-assisted reference compilation error → MAJOR (cite renumbering required).
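Duplicate detection within a reference list can be sketched as below. This is independent of how /verify-refs works internally, which is not described here. The input shape (a list of dicts with optional pmid and doi keys) is an assumption.

```python
from collections import defaultdict


def find_duplicate_ids(references):
    """Return {(kind, identifier): [reference numbers]} for repeated IDs.

    references: list of dicts with optional "pmid" and "doi" keys, in
    reference-list order (numbered from 1).
    """
    seen = defaultdict(list)
    for number, ref in enumerate(references, start=1):
        if ref.get("pmid"):
            seen[("pmid", str(ref["pmid"]).strip())].append(number)
        if ref.get("doi"):
            # DOIs are case-insensitive, so compare them lowercased.
            seen[("doi", ref["doi"].strip().lower())].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}
```

The returned reference numbers can be quoted directly in the review comment so the authors know which entries to merge and renumber.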
P8 — AI Disclosure presence:
- grep -iE "chatgpt|gpt-|llm|generative ai|ai was used|ai-assisted|copilot|claude|gemini|chatbot|large language model" on manuscript body.
- If 0 matches AND journal requires AI Disclosure (RYAI / Radiology / RSNA family / Lancet family / JAMA family / most BMJ family / Nature family) → flag MINOR-to-MAJOR.
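When the manuscript text is already loaded as a string (for example after PDF extraction), the same search can be sketched in Python using the pattern from the grep command. The function name is hypothetical. Short terms such as "llm" or "claude" can match unrelated words or names, so hits should be read in context.

```python
import re

# Same alternation as the grep -iE pattern above, case-insensitive.
AI_TERMS = re.compile(
    r"chatgpt|gpt-|llm|generative ai|ai was used|ai-assisted|copilot"
    r"|claude|gemini|chatbot|large language model",
    re.IGNORECASE,
)


def ai_disclosure_hits(text):
    """Return the matched terms (lowercased) in order of appearance."""
    return [match.group(0).lower() for match in AI_TERMS.finditer(text)]
```

An empty result corresponds to the "0 matches" condition; whether that is a problem still depends on the journal's disclosure policy.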
Output template (P1 example):
"I spot-checked [Author Year] (PMID [...]) against the source paper and found that the values in Figure X are swapped. The source paper reports external-test sensitivity A% / specificity B% (n=N); the manuscript forest entries place [num1/denom1] in the sensitivity slot (which is the source's specificity numerator/denominator) and [num2/denom2] in the specificity slot (which is the source's sensitivity)."
Output template (P2 example):
"[Author1 Year1] uses [Database] (N=...). [Author2 Year2] uses [Database] (N=...). These are almost certainly overlapping patient pools, which violates the statistical independence assumption for MA pooling. I'd suggest a sensitivity analysis excluding one of the two studies, plus an explicit cohort-source column in Table 1."
Phase 3: Draft Review
Generate {manuscript_id}_review_draft.md:
# {manuscr