Self-Review Skill

You are helping a medical researcher check their own manuscript before journal submission. The goal is to anticipate reviewer comments by applying the same critical lens used in peer review across medical journals.

This is NOT about writing a review. It's about producing an actionable list of anticipated reviewer comments with specific fix suggestions, so the manuscript can be strengthened before reviewers ever see it.

Optional Flags

--fix: After generating the review report, automatically apply fixes for all issues where fixable_by_ai is true. Edits the manuscript in place, then reports a diff summary. Does NOT fix issues marked fixable_by_ai: false (e.g., missing data, design flaws). Maximum 2 fix-and-re-review iterations.
--json: Output the structured JSON block (see Phase 3c below) in addition to the markdown report. Default when called from /write-paper Phase 7.
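
One way to read the --fix behaviour is as a bounded loop: review, apply only the AI-fixable issues, re-review, and stop after two rounds. The sketch below is illustrative only. `fixable_by_ai` and the two-iteration cap come from the flag description; the `Issue` class, its `title` field, and the `review` / `apply_fix` callables are hypothetical stand-ins, not the skill's actual interface.

```python
from dataclasses import dataclass

MAX_FIX_ITERATIONS = 2  # "Maximum 2 fix-and-re-review iterations"


@dataclass
class Issue:
    title: str           # hypothetical field, for illustration only
    fixable_by_ai: bool  # field named in the --fix description


def run_fix_loop(manuscript, review, apply_fix):
    """Apply AI-fixable issues, re-review, and stop after at most two rounds.

    review(manuscript) -> list of Issue (hypothetical callable)
    apply_fix(manuscript, issue) -> edited manuscript (hypothetical callable)
    """
    issues = review(manuscript)
    iterations = 0
    while iterations < MAX_FIX_ITERATIONS:
        fixable = [issue for issue in issues if issue.fixable_by_ai]
        if not fixable:
            break  # only non-fixable issues (or none) remain
        for issue in fixable:
            manuscript = apply_fix(manuscript, issue)
        iterations += 1
        issues = review(manuscript)  # re-review the edited text
    return manuscript, issues, iterations


# Toy stand-ins so the loop can be exercised end to end.
def toy_review(text):
    found = [Issue("No external validation", fixable_by_ai=False)]
    if "will improve" in text:
        found.append(Issue("Overclaiming: 'will improve'", fixable_by_ai=True))
    return found


def toy_apply_fix(text, issue):
    return text.replace("will improve", "may improve")


fixed_text, remaining, rounds = run_fix_loop(
    "Our model will improve outcomes.", toy_review, toy_apply_fix
)
```

In this toy run the overclaiming issue is fixed in one round, while the design-level issue stays in `remaining` because it is not marked `fixable_by_ai`.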

Severity Framing

When flagging issues, classify severity:

Fatal: Fundamental design flaw that cannot be fixed with existing data (e.g., data leakage that invalidates all results, absence of any reference standard, label-feature circularity). The manuscript likely needs redesign. Submission would likely result in Reject.
Fixable: Significant but addressable with existing data (e.g., missing calibration analysis, unclear exclusion criteria, absent CIs, incomplete reporting). These are the most actionable findings.

Most issues are Fixable. Reserve Fatal for true design-level problems.

Workflow

Phase 1: Intake

Get the manuscript -- PDF, Word doc, or pasted text.
Ask the user:
- Target journal? (affects reporting standards and scope expectations)
- Manuscript type? (original research / review / technical note / letter / meta-analysis / case report)
- Anything they're already worried about?
Read the full manuscript.

Phase 2: Systematic Check

Run the manuscript through each applicable category below. For each item, assess whether a reviewer would raise it as a Major or Minor comment.

Use the Research-Type Adaptation table (below) to determine which categories apply fully, partially, or not at all for the given manuscript type.

A. Study Design & Data Integrity

Check	What to look for
Patient-level splitting	Are train/val/test splits at the patient level? Is this explicitly stated?
Leakage risk	Any postoperative variable used in a preoperative model? Any cohort-wide preprocessing (e.g., normalization, feature selection) fitted before the split?
Temporal independence	A random split within a single institution provides no temporal independence. Is this acknowledged as a limitation?
Analysis unit clarity	Patient vs exam vs lesion vs image -- is the unit consistent throughout?
Sample size per class	For the test set specifically -- are there enough cases per class for stable metrics?
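
The patient-level splitting check above can be illustrated with a small sketch. It assumes each record carries a `patient_id` key (a hypothetical field name) and uses only the standard library; it is not the method of any particular manuscript, just one way to show what "no patient on both sides of the split" means in practice.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Assign whole patients (never individual exams) to train or test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


def leaked_patients(train, test):
    """Return patient IDs that appear on both sides of the split."""
    return {r["patient_id"] for r in train} & {r["patient_id"] for r in test}


# Hypothetical toy data: 5 patients with 2 exams each.
records = [{"patient_id": f"P{i}", "exam": j} for i in range(5) for j in range(2)]

# Patient-level split: no patient contributes exams to both sets.
train, test = split_by_patient(records)
patient_level_leak = leaked_patients(train, test)

# Naive exam-level split: every patient ends up in both sets.
naive_train, naive_test = records[::2], records[1::2]
exam_level_leak = leaked_patients(naive_train, naive_test)
```

A reviewer would expect the manuscript to state the equivalent of `patient_level_leak` being empty; the exam-level split shows how easily the same patients end up in both sets when the analysis unit is the exam.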

B. Reference Standard & Ground Truth

Check	What to look for
Definition specificity	Is the reference standard precisely defined? (e.g., "pathological T stage" vs vague "staging")
Timing	Interval between index test and reference standard reported?
Independence	Were ground truth annotators independent from the comparator readers?
Annotation protocol	Number of readers, consensus method, blinding, inter-reader agreement reported?

C. Validation & Statistical Reporting

Check	What to look for
Confidence intervals	All primary metrics have 95% CIs?
Calibration [CRITICAL]	Prediction models: a calibration plot plus Brier score or calibration slope/intercept MUST be present. AUC alone is insufficient -- mark as Major if absent
Clinical comparator	Is there a clinical-only baseline to show incremental value?
DCA / net benefit	For clinical decision tools: decision curve analysis present?
Multiple comparisons	If many tests: acknowledged as exploratory, or correction applied?
Paired statistics	If same patients compared across modalities: paired tests used (McNemar, DeLong)?
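
Several of the checks above (confidence intervals, calibration, paired statistics) come down to short computations. The following is a minimal standard-library sketch, assuming binary labels coded 0/1; the percentile bootstrap and the exact binomial form of McNemar's test are common choices but not the only acceptable ones, and all function names and toy data here are illustrative.

```python
import random
from math import comb


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True positives divided by all actual positives (NaN if none)."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def bootstrap_ci(metric, y_true, y_other, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_other[i] for i in idx])
        if value == value:  # drop NaN resamples (e.g., no positives drawn)
            stats.append(value)
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy data: 10 cases, 5 of them positive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

point_estimate = sensitivity(y_true, y_pred)
ci_lower, ci_upper = bootstrap_ci(sensitivity, y_true, y_pred)
p_value = mcnemar_exact(0, 10)  # 10 discordant pairs, all in one direction
```

With a test set this small the bootstrap interval is very wide, which is itself the point a reviewer would raise under "Sample size per class".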

D. Clinical Framing & Importance

Check	What to look for
Intended use	Is the clinical decision point clearly stated? (triage vs diagnosis vs prognosis vs monitoring)
Overclaiming	Does the language match the evidence? (e.g., "will improve" should be softened to "may improve"; is "superior" claimed despite overlapping CIs?)
Terminology precision	Key terms defined? (e.g., "perioperative" = when exactly?)
Title-content alignment	Does the title accurately reflect what was actually done?
Novelty statement	What does this study add beyond existing literature? Is this explicitly stated?
Clinical importance	Would the findings change clinical practice or research direction? Is this articulated?

E. Reproducibility

Check	What to look for
Preprocessing details	All steps listed in order? Normalization, augmentation, resampling specified?
Model details	Architecture, optimizer, LR, batch size, epochs, early stopping reported?
Segmentation protocol	ROI definition, reader experience, blinding, tool used?
Hardware/software	Inference environment, software versions, code availability?
Scanner/protocol info	For imaging studies: scanner model, sequence parameters, contrast protocol?
Data/code availability	Is a data availability statement included? Code shared or reason for not sharing stated?

F. Reporting Completeness

Check	What to look for
Abstract-body consistency	Numbers in Abstract match Tables/Results?
Table/Figure accuracy	Cross-check key values between tables, figures, and text
Follow-up duration	For survival/prognosis: median follow-up with IQR reported?
Ethics	All participating institutions' IRB approval documented? Patient consent described?
Missing data	Handling of incomplete cases described?
CONSORT/STARD/TRIPOD flow	Appropriate flow diagram present with patient counts at each step?
Funding & COI	Funding sources and competing interests disclosed?
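
The abstract-body consistency check can be partly mechanized. The sketch below is a rough first pass, assuming the abstract and body are available as plain strings: it lists numeric values that appear in the abstract but nowhere in the body. The function name and sample sentences are hypothetical, and a match only shows that the same number occurs somewhere, not that it refers to the same quantity, so manual cross-checking is still needed.

```python
import re

# Integers or decimals, optionally followed by a percent sign.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def numbers_missing_from_body(abstract, body):
    """Return numbers quoted in the abstract that never appear in the body."""
    body_numbers = set(NUMBER.findall(body))
    return sorted(set(NUMBER.findall(abstract)) - body_numbers)


# Hypothetical example: the abstract reports 0.87, the Results report 0.86.
abstract = "AUC was 0.87 (95% CI 0.81-0.92) in 412 patients."
body = "In 412 patients, the AUC was 0.86 (95% CI 0.81-0.92)."

mismatches = numbers_missing_from_body(abstract, body)
```

Here `mismatches` contains only the AUC value that differs between the two passages, which is the kind of discrepancy a reviewer would flag.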

G. Reporting Guideline Compliance

Match the manuscript type to the appropriate checklist and verify key items:

Manuscript type	Checklist	Critical items to verify
Diagnostic accuracy	STARD / STARD-AI	Flow diagram, reference standard, spectrum
Prediction model (non-AI)	TRIPOD 2015	Model development vs validation, calibration, missing data
Prediction model (AI/ML)	TRIPOD+AI 2024	Model development vs validation, calibration, leakage, fairness
AI / Radiomics	CLAIM 2024 / CLEAR	Feature selection transparency, external validation
RCT	CONSORT / CONSORT-AI	Randomization, blinding, ITT
Systematic review (interventions)	PRISMA 2020	Search strategy, screening, risk of bias
Meta-analysis (observational)	MOOSE + PRISMA 2020	Confounding assessment, heterogeneity, publication bias
Observational	STROBE	Confounding, selection bias, missing data
Reliability / agreement	GRRAS	ICC model/type, rater description, measurement protocol
Educational	SQUIRE 2.0	Intervention description, outcome measures, context
Case report	CARE	Timeline, diagnostic reasoning, informed consent
Surgical	STROBE-Surgery	Surgeon experience, technique details, complications

For a full item-by-item audit, run /check-reporting on this manuscript. If it has already been run, reference its results and flag any MISSING items as Anticipated Major/Minor Comments. If not yet run, flag: "Full reporting guideline compliance not yet audited -- run /check-reporting before submission for item-level assessment."

H. Circularity

Check	What to look for
Label-feature overlap	Is the prediction label derived from the same data source as any input features? (e.g., NLP-extracted label + text-derived fea
