Sync Submission
You help keep the canonical manuscript and journal-specific submission packages
from drifting apart. The skill treats submission/{journal}/ as derived output
and records whether it is current, stale, or frozen.
When to Use
- Before submitting a journal package.
- After a journal portal or Word editor changed a submission manuscript.
- After rejection, before retargeting to another journal.
- Before /orchestrate --e2e marks a project as submission-ready.
Inputs
- Project root containing project.yaml, or a direct path to the canonical manuscript.
- Journal short name, e.g. chest, ryai, academic_radiology.
- Optional mode:
  - audit: compare an existing submission against the canonical source.
  - build: copy the canonical source into submission/{journal}/manuscript/ and write metadata.
  - freeze: mark a package as submitted/frozen.
Deterministic Script
python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" audit --project-root . --journal chest
python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" build --project-root . --journal chest
python "${CLAUDE_SKILL_DIR}/scripts/sync_submission.py" freeze --project-root . --journal chest --status submitted
For double-blind journals, sweep author identifiers across all upload artifacts:
python "${CLAUDE_SKILL_DIR}/scripts/blind_sweep.py" \
--registry _shared/authors/author_registry.yaml \
--files submission/{journal}/supplementary/*.md submission/{journal}/cover_letter.md \
--backup-dir .cache/blind_sweep_backup
The registry is a project-local YAML file that maps author identifiers (full names, native-script names, initials with and without periods, email addresses, ORCID iDs) to role labels (e.g., "Reviewer 1"). See scripts/author_registry_example.yaml for the schema. Never commit a populated registry to a public repository; keep it next to the manuscript.
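Conceptually, the sweep is a substitution pass driven by the registry. The sketch below is a hypothetical illustration, not blind_sweep.py itself. It assumes the registry has already been loaded into a plain dict of the form {label: {"identifiers": [...], "initials": [...]}}. The real schema is in scripts/author_registry_example.yaml and may differ. The sketch also expands each initials string into its dotted and undotted variants.

```python
import re


def initials_variants(initials: str) -> set:
    """Expand 'JQS' into 'JQS', 'J.Q.S.', and 'J. Q. S.'."""
    letters = [c for c in initials if c.isalpha()]
    return {"".join(letters), ".".join(letters) + ".", ". ".join(letters) + "."}


def build_rules(registry: dict) -> list:
    """Turn a hypothetical {label: {"identifiers": [...], "initials": [...]}} dict
    into (compiled regex, label) pairs."""
    rules = []
    for label, entry in registry.items():
        for ident in entry.get("identifiers", []):
            rules.append((ident, re.IGNORECASE, label))  # names, emails, ORCID iDs
        for initials in entry.get("initials", []):
            for variant in initials_variants(initials):
                rules.append((variant, 0, label))  # initials stay case-sensitive
    # Longest first, so a full name is replaced before any shorter fragment of it.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return [
        # (?<!\w) / (?!\w) keep 'JQS' from matching inside a longer word.
        (re.compile(r"(?<!\w)" + re.escape(text) + r"(?!\w)", flags), label)
        for text, flags, label in rules
    ]


def sweep(text: str, rules: list) -> tuple:
    """Replace every identifier with its role label; return (new_text, hit_count)."""
    hits = 0
    for pattern, label in rules:
        # A function replacement avoids backslash-escape handling in the label.
        text, count = pattern.subn(lambda _match: label, text)
        hits += count
    return text, hits
```

Replacing the longest identifiers first keeps a full name from being half-replaced by a shorter fragment. A non-zero hit count on a file that should already be blinded is the signal to stop before upload.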
Output Contract
| Artifact | Path | Purpose |
|---|---|---|
| Submission metadata | submission/{journal}/.journal_meta.json | Source hash, status, canonical path |
| Sync audit | qc/submission_sync_{journal}.json | Drift result consumed by orchestrator |
| Manifest update | artifact_manifest.json | Submission package registry |
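The current/stale/frozen states from the overview can be derived from the submission metadata. Here is a minimal sketch. It assumes .journal_meta.json stores fields named canonical_path, source_hash (taken here to be a SHA-256 hex digest), and status. Those names and the hash algorithm are assumptions, so check sync_submission.py for the real schema.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large manuscripts are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def submission_state(project_root: Path, journal: str) -> str:
    """Classify submission/{journal}/ as 'frozen', 'current', or 'stale'.

    Assumed (hypothetical) metadata fields: status, canonical_path, source_hash.
    """
    meta_path = project_root / "submission" / journal / ".journal_meta.json"
    meta = json.loads(meta_path.read_text(encoding="utf-8"))
    if meta.get("status") in {"submitted", "frozen"}:
        return "frozen"  # frozen packages are not compared against the source again
    canonical = project_root / meta["canonical_path"]
    return "current" if file_sha256(canonical) == meta["source_hash"] else "stale"
```

The sketch checks status before hashing. As a result, a frozen package keeps reporting frozen even after the canonical manuscript moves on, which matches the intent of freeze.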
Workflow
- Resolve the canonical manuscript from project.yaml or explicit input.
- Run the script in the requested mode.
- If audit reports DRIFT, do not retarget or freeze until the user either patches the canonical manuscript or records the difference as journal-only.
- If build succeeds, run /verify-refs before final submission.
Quality Gates
- Gate 1: block freezing when canonical manuscript is missing.
- Gate 2: block retargeting when the previous submission has unresolved drift.
- Gate 3: require a /verify-refs audit before marking a package submission-safe.
- Gate 4: docx audits must use a recursive walk (paragraphs + tables + nested-table cells); a flat document.paragraphs scan is insufficient.
- Gate 5: before freeze, confirm that portal free-text fields (cover letter, data availability, acknowledgements, abstract, author contributions) match the manuscript body.
- Gate 5b (Phase 4 cover-letter free-text drift): before freeze, run scripts/cover_letter_drift_check.py to verify that the cover letter's word-count, reference-count, and table/figure-count claims still match the manuscript. Cover letters routinely go stale across v_N → v_(N+1) branching, and no docx-level audit covers them. See "Phase 4 — Cover-letter free-text drift" below.
- Gate 6 (double-blind journals): before freeze, export the portal's blinded review PDF and grep for all author identifiers across the entire upload set: manuscript, supplementary files, cover letter, registry record PDFs (PROSPERO/ClinicalTrials), and portal Letter-field text. A clean manuscript blind does not imply a clean portal blind.
- Gate 7 (text-only docx rebuilds): never use pandoc --reference-doc=manuscript.docx for text-only response, cover, or supplementary docx files. The reference docx ships its embedded media (figure files) into the new docx, bloating its size 50–100×. Use plain pandoc input.md -o output.docx for text-only artifacts.
- Gate 8 (Phase 5 cross-document N consistency): before freeze, run scripts/cross_document_n_check.py over the manuscript bundle (abstract, body, PROSPERO record, cover letter, supplementary, INDEX, PRISMA flow caption). Any N category with more than one distinct integer value is a P0 drift. When a FINAL_POOL_LOCK.yaml is present, supply --pool-lock to make the locked counts the authoritative baseline. See "Phase 5 — Cross-document N consistency" below.
- Gate 9 (Phase 6 intra-manuscript scope drift): run scripts/scope_drift_check.py against the manuscript (and optionally the PROSPERO record). Numeric anchors (AUC, OR/HR/RR, sensitivity/specificity) that appear in Limitations or Discussion but are absent from both Methods and Results are a P0 SCOPE_DRIFT. A synthesis-method disagreement between PROSPERO and Methods is a P0 PROSPERO_DRIFT.
- Gate 10 (Phase 7 v_(N+1) docx regeneration): when building a new submission from a frozen prior version, run scripts/verify_package_integrity.py --assert-vN-docx-changed --vN-docx <prev>.docx --new-docx <next>.docx. An identical MD5 means the new docx is an unmodified seed copy, so block submission. This check is defense in depth: it is required even when the upstream pipeline appears to have regenerated the docx.
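Gate 4 exists because body text hides inside table cells, including tables nested inside those cells. One stdlib-only way to get a walk that cannot miss them is to read word/document.xml directly from the .docx zip and let ElementTree's iter() descend the whole tree. This is a sketch, not the skill's audit code. It covers only the main document part; headers, footers, footnotes, and comments live in other parts of the package.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraph_texts(path):
    """Yield the text of every w:p in word/document.xml, at any nesting depth.

    iter() is recursive, so paragraphs inside table cells, and inside tables
    nested in those cells, are visited too; a top-level-only scan skips them.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join every w:t beneath it.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text:
            yield text


def find_identifiers(path, identifiers):
    """Return the identifiers that occur anywhere in the document body."""
    body = "\n".join(docx_paragraph_texts(path))
    return sorted(i for i in identifiers if i in body)
```

Joining the w:t elements per paragraph matters. Word often splits a name across runs, so a search run by run would miss "Jane Smith" stored as "Jane " + "Smith".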
Phase 4 — Cover-letter free-text drift
Cover letters live outside the submission docx files, but the editor reads
them side by side with the manuscript. Their ## Article details block
(body word count, abstract word count, reference count, table/figure count)
is a sidecar single source of truth (SSOT) that routinely goes stale when a
manuscript branches from v_N to v_(N+1), for example after a word-limit
retarget, an abstract restructure, or a late batch of references.
scripts/cover_letter_drift_check.py measures the manuscript truth and
compares it to the cover letter's numeric claims:
python "${CLAUDE_SKILL_DIR}/scripts/cover_letter_drift_check.py" \
--manuscript manuscript.md \
--cover-letter cover_letter.md \
--refs refs.bib \
--out qc/cover_letter_drift.json
Body word counts are matched within a 5% tolerance (to allow "approximately N words" phrasing). Abstract word counts tolerate ±5 words. Reference, table, and figure counts must match exactly.
Output qc/cover_letter_drift.json:
{
"submission_safe": false,
"truth": {"body_words": 3036, "abstract_words": 319, "references": 12,
"tables": 3, "figures": 4},
"claims": {"body_words": 3790, "abstract_words": 250, "references": 12},
"drifts": [
{"field": "body_words", "truth": 3036, "cover_letter_claim": 3790,
"severity": "MAJOR",
"note": "|claim - truth| = 754 > tolerance 151"}
]
}
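The tolerance rules can be expressed as a small comparison function. This is a hedged sketch rather than the script's logic. It assumes the body-word tolerance is 5% of the measured count rounded down, which gives the 151 shown for 3,036 words above. It omits the severity field because the severity rules are not documented here.

```python
def tolerance(field: str, truth: int) -> int:
    """Allowed |claim - truth| per field (the 5% floor for body words is an assumption)."""
    if field == "body_words":
        return int(truth * 0.05)
    if field == "abstract_words":
        return 5
    return 0  # references, tables, figures: exact match


def compare(truth: dict, claims: dict) -> dict:
    """Return a drift report shaped like qc/cover_letter_drift.json, minus severity."""
    drifts = []
    for field, claim in claims.items():
        if field not in truth:
            continue  # nothing measured to compare against
        delta = abs(claim - truth[field])
        tol = tolerance(field, truth[field])
        if delta > tol:
            drifts.append({
                "field": field,
                "truth": truth[field],
                "cover_letter_claim": claim,
                "note": f"|claim - truth| = {delta} > tolerance {tol}",
            })
    return {"submission_safe": not drifts, "truth": truth, "claims": claims, "drifts": drifts}
```

With the example's truth and claims, the function flags body_words (754 > 151). Under the same rules, the abstract claim (250 against a measured 319) also falls outside ±5, so the function flags it as well.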
Drift resolution: regenerate the cover letter from the manuscript at v_(N+1) build time. The script never edits the cover letter; that is left to the manuscript build pipeline, so the cover letter stays a deliberately authored artifact.
Phase 5 — Cross-document N consistency
Cohort-size drift across documents is a frequent desk-reject pattern.
Manuscript abstracts, body prose, PROSPERO records, supplementary extraction
sheets, and PRISMA flow captions all repeat the same k included / k excluded
/ N patients totals, and reviewers read any disagreement between them as
either a data-integrity failure or a late-edit failure. Either reading
ends the round.
scripts/cross_document_n_check.py scans the submission package, extracts
every "N <noun>" claim, and groups the claims by category (patients, cases,
included, excluded, nodules, tumors, studies_total). A category with more
than one distinct integer value is a P0 drift.
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
--root . \
--out qc/cross_document_n.json
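The extract-and-group step can be sketched with regular expressions. The patterns below are hypothetical and cover only a subset of the script's categories. The pool_lock argument is also hypothetical: it is a plain {category: count} dict standing in for FINAL_POOL_LOCK.yaml, whose real schema is not shown here. When a lock is supplied, any value other than the locked count is flagged, mirroring the --pool-lock behavior described in Gate 8.

```python
import re
from collections import defaultdict

# Hypothetical "N <noun>" patterns for a subset of the categories.
PATTERNS = {
    "patients": r"\b(\d[\d,]*)\s+patients?\b",
    "cases": r"\b(\d[\d,]*)\s+cases?\b",
    "nodules": r"\b(\d[\d,]*)\s+nodules?\b",
    "tumors": r"\b(\d[\d,]*)\s+tumou?rs?\b",
    "included": r"\b(\d[\d,]*)\s+(?:studies|articles)\s+(?:were\s+)?included\b",
    "excluded": r"\b(\d[\d,]*)\s+(?:studies|articles)\s+(?:were\s+)?excluded\b",
}


def collect_counts(documents: dict) -> dict:
    """Map category -> {value: [document names]} across all documents."""
    found = defaultdict(lambda: defaultdict(list))
    for name, text in documents.items():
        for category, pattern in PATTERNS.items():
            for match in re.finditer(pattern, text, re.IGNORECASE):
                found[category][int(match.group(1).replace(",", ""))].append(name)
    return found


def n_drifts(documents: dict, pool_lock: dict = None) -> dict:
    """Return categories whose values disagree with each other or with the lock."""
    drifts = {}
    for category, by_value in collect_counts(documents).items():
        values = set(by_value)
        locked = (pool_lock or {}).get(category)
        if len(values) > 1 or (locked is not None and values != {locked}):
            drifts[category] = {v: sorted(set(docs)) for v, docs in sorted(by_value.items())}
    return drifts
```

Reporting which documents carry each value points the fix at the stale file rather than just flagging the category. In this sketch, a locked category that never appears in any document is not flagged.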
When the project has frozen a 2_Data/FINAL_POOL_LOCK.yaml from /meta-analysis
Phase 3f.5, pass it as the authoritative anchor:
python "${CLAUDE_SKILL_DIR}/scripts/cross_document_n_check.py" \
--root .