# Replication Package Scaffold
## Heritage and attribution
The structural conventions in this skill (single-entry-point principle, compact vs. build/analyze layouts, figure/table crosswalk, paper-consistency check, correction workflow, pre-release checklist) come from Yusaku Horiuchi's replication-package-guide. Horiuchi's repository README explicitly authorizes AI consumption: it is "designed to be read by humans and by coding agents such as Codex or Claude Code before they prepare, audit, or repair a replication package."
This skill is a modification, not a copy.
- Repackaged as procedural guidance for Claude Code (frontmatter, step-by-step instructions, quality checks).
- Folded in the FAIR principles (Findable, Accessible, Interoperable, Reusable; Wilkinson et al. 2016; GO FAIR) so the scaffolded package is platform-neutral.
- Dropped platform-specific upload mechanics. This skill builds and audits the local package. Uploading to Harvard Dataverse, OSF, Zenodo, a journal repository, or an institutional archive is left to the user and the platform's tools.
- Reorganized templates and checklists into a single self-contained skill.
Horiuchi's own caveat applies: "AI is useful for checking, reorganizing, documenting, and catching inconsistencies, but it should not be treated as a substitute for the author's judgment about which files, scripts, data sources, and results are actually part of the replication record." Use this skill as an assistant, not as a substitute for the author's judgment about what belongs in the public package.
If you publish a package built with this skill, cite Horiuchi's guide as the methodological source.
## Standard
A replication package is ready when a competent reader can download it, open the package root, run one documented command, and regenerate the published results without hidden manual steps.
Minimum standard:
- One public entry point (`master.R` by convention; `run_replication.R` is acceptable when that is the project convention).
- One authoritative `README.md`.
- Relative paths only.
- Public data inputs, or clear restricted-data instructions.
- Codebook or data dictionary for every analysis-ready dataset.
- Figure/table crosswalk in paper order.
- Logs that record inputs, sample sizes, warnings, and session information.
- Public scripts that are numbered or otherwise ordered.
- No personal files, caches, credentials, or obsolete exploratory scripts in the public path.
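Most of the minimum standard calls for judgment, but two items can be partly checked by machine. The R sketch below uses a hypothetical helper, `find_standard_violations()`. It lists files that usually should not ship: R history and workspace files, `.Renviron`, `.env`, `.DS_Store`, and `.Rproj.user/`. It also flags quoted strings in `.R` scripts that start with an absolute path. The file patterns are illustrative assumptions. The absolute-path test is a heuristic and will also flag harmless strings such as `"/"` used as a separator.

```r
# Hypothetical audit helper for two items of the minimum standard:
# files that should not be public, and absolute paths inside scripts.
find_standard_violations <- function(root) {
  files <- list.files(root, recursive = TRUE, all.files = TRUE)

  # Illustrative patterns only; extend them for your own project.
  junk_pattern <- paste0(
    "(^|/)(\\.Rhistory|\\.RData|\\.Renviron|\\.DS_Store|\\.env)$",
    "|(^|/)\\.Rproj\\.user/"
  )
  junk <- files[grepl(junk_pattern, files)]

  # Heuristic: a quote followed by "/", "~/", or a Windows drive such as "C:/".
  abs_pattern <- "[\"'](/|~/|[A-Za-z]:[/\\\\])"
  scripts <- files[grepl("\\.R$", files, ignore.case = TRUE)]
  abs_hits <- character(0)
  for (s in scripts) {
    lines <- readLines(file.path(root, s), warn = FALSE)
    hit <- grep(abs_pattern, lines)
    if (length(hit) > 0) abs_hits <- c(abs_hits, paste0(s, ":", hit))
  }

  list(junk_files = junk, absolute_paths = abs_hits)
}
```

Each entry in `absolute_paths` has the form `file:line`, so the result can be pasted directly into an audit report.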
## Instructions
### Step 1. Resolve the target directory
Use `$ARGUMENTS` if provided. Treat the argument as the path to the replication folder (relative or absolute). If the argument is empty, ask the user once for a path. If they decline, default to `./replication` relative to the current working directory.
Normalize the path. Confirm whether the directory exists and whether it is empty.
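Here is a minimal R sketch of this step. It assumes the argument arrives as a single string that is empty when nothing was supplied, and the function name `resolve_target()` is hypothetical. The sketch normalizes the parent directory and then re-attaches the last path component, because `normalizePath()` may not fully resolve a directory that does not exist yet.

```r
# Hypothetical sketch of Step 1: resolve, normalize, and inspect the target.
resolve_target <- function(arg = "") {
  raw <- if (length(arg) == 0 || !nzchar(trimws(arg[1]))) {
    "./replication"            # default when no path was supplied
  } else {
    trimws(arg[1])
  }
  raw <- path.expand(raw)      # expand a leading "~"

  # Normalize the parent (which usually exists) and re-attach the final
  # component, so a not-yet-created target still gets a canonical prefix.
  parent <- normalizePath(dirname(raw), winslash = "/", mustWork = FALSE)
  path <- file.path(parent, basename(raw))

  exists <- dir.exists(path)
  # Hidden files count as contents; "." and ".." do not.
  contents <- if (exists) list.files(path, all.files = TRUE, no.. = TRUE) else character(0)
  list(path = path, exists = exists, empty = length(contents) == 0)
}
```

A non-existent target is reported as `empty = TRUE`. This matches Step 3, which treats "empty" and "does not exist" the same way.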
### Step 2. Decide on structure
Ask the user one question: is data construction complex (restricted sources, scraping, API pulls, or expensive upstream work that produces analysis-ready data)?
- No → use compact.
- Yes → use build/analyze.
When in doubt, choose compact. Build/analyze is justified only when the build stage creates real complexity for users.
### Step 3. Decide between scaffold and audit
- If the target directory is empty or does not exist → scaffold mode. Create the directory if needed, then write the full skeleton.
- If the target directory contains files → audit mode. Read everything, compare against the pre-release checklist, report what is present, partial, or missing. Offer to fill in only the missing scaffolding (files that do not yet exist). Never overwrite an existing file without explicit user confirmation.
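The two rules in this step can be sketched in R as follows. `choose_mode()` and `write_if_absent()` are hypothetical helper names. Hidden files count as content, so a folder that holds only `.gitignore` is treated as non-empty and goes to audit mode. `write_if_absent()` enforces the no-overwrite rule by refusing to touch an existing file; asking the user for confirmation is left to the caller.

```r
# Hypothetical Step 3 helpers: choose the mode, and never overwrite silently.
choose_mode <- function(path) {
  has_files <- dir.exists(path) &&
    length(list.files(path, all.files = TRUE, no.. = TRUE)) > 0
  if (has_files) "audit" else "scaffold"
}

# Write `content` to `file` only if the file does not already exist.
# Returns TRUE (invisibly) if written, FALSE if the existing file was kept.
write_if_absent <- function(file, content) {
  if (file.exists(file)) {
    message("Skipping existing file: ", file)
    return(invisible(FALSE))
  }
  dir.create(dirname(file), recursive = TRUE, showWarnings = FALSE)
  writeLines(content, file)
  invisible(TRUE)
}
```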
### Step 4. Scaffold the tree
Compact structure (default):
```
<root>/
|-- README.md
|-- master.R
|-- LICENSE
|-- .gitignore
|-- data/
|-- code/
|-- docs/
|   `-- crosswalk.md
`-- outputs/
    |-- figures/
    |-- tables/
    `-- logs/
```
Build/analyze structure:
```
<root>/
|-- README.md
|-- master.R
|-- LICENSE
|-- .gitignore
|-- build/
|   |-- data/
|   |-- scripts/
|   `-- output/
`-- analyze/
    |-- data/
    |-- scripts/
    |-- figures/
    |-- tables/
    |-- docs/
    |   `-- crosswalk.md
    `-- logs/
```
Create the directories first, then write the template files in Step 5. Leave `data/`, `code/`, `scripts/`, `figures/`, `tables/`, and `logs/` empty (the user fills them with project content).
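The directory list for each layout can be written down once and created with `dir.create(recursive = TRUE)`. Below is an R sketch with hypothetical function names (`layout_dirs()`, `scaffold_dirs()`). The paths mirror the two trees above, and the files inside them are left to Step 5.

```r
# Hypothetical Step 4 helpers: directory lists for each layout, then creation.
layout_dirs <- function(layout = c("compact", "build/analyze")) {
  layout <- match.arg(layout)
  if (layout == "compact") {
    c("data", "code", "docs",
      "outputs/figures", "outputs/tables", "outputs/logs")
  } else {
    c("build/data", "build/scripts", "build/output",
      "analyze/data", "analyze/scripts", "analyze/figures",
      "analyze/tables", "analyze/docs", "analyze/logs")
  }
}

scaffold_dirs <- function(root, layout = "compact") {
  dirs <- file.path(root, layout_dirs(layout))
  # recursive = TRUE also creates `root` and any missing parents.
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}
```

Compact is the default argument, which matches the rule in Step 2.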
### Step 5. Write template files
Use the templates in the Templates section below. Fill in placeholder fields (`<paper title>`, `<author 1>`, etc.) with values the user provides. If a placeholder cannot be resolved from context, leave it as written and flag it in the final report so the user knows what to edit.
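Placeholder substitution and the list of unresolved fields for the final report can be handled in one pass. The sketch below assumes that placeholders are written as angle-bracketed text such as `<paper title>` and that values arrive as a named list. `fill_placeholders()` is a hypothetical name. Any bracketed text left after substitution is reported, which also catches descriptive placeholders such as `<what it does>`.

```r
# Hypothetical Step 5 helper: substitute known placeholders, report the rest.
fill_placeholders <- function(template, values) {
  for (key in names(values)) {
    # fixed = TRUE treats "<key>" literally, not as a regular expression.
    template <- gsub(paste0("<", key, ">"), values[[key]], template, fixed = TRUE)
  }
  # Any remaining "<...>" span is an unresolved placeholder.
  remaining <- unique(unlist(regmatches(template, gregexpr("<[^<>]+>", template))))
  list(text = template, unresolved = remaining)
}
```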
### Step 6. Report
After scaffolding, output a short report with:
- The directory tree created (or the audit diff for audit mode).
- A list of placeholder fields the user must fill in.
- The next three actions the user should take (typically: fill in README placeholders, drop data into `data/`, add scripts under `code/` or under `build/scripts/` and `analyze/scripts/`).
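One way to assemble the scaffold-mode report is sketched below in R with a hypothetical `format_report()`. It assumes the unresolved placeholders were collected while the templates were written. The three next actions are hard-coded from the list above.

```r
# Hypothetical Step 6 helper (scaffold mode): returns the report as lines.
# `unresolved` is a character vector of placeholders left in the templates.
format_report <- function(root, unresolved = character(0)) {
  # all.files = TRUE keeps dotfiles such as .gitignore in the listing.
  entries <- sort(list.files(root, recursive = TRUE, include.dirs = TRUE,
                             all.files = TRUE))
  c("Created:",
    paste0("  ", entries),
    "Placeholders to fill in:",
    if (length(unresolved) > 0) paste0("  ", unresolved) else "  (none)",
    "Next steps:",
    "  1. Fill in the README placeholders listed above.",
    "  2. Put input data in data/.",
    "  3. Add ordered scripts under code/ (or build/scripts/ and analyze/scripts/).")
}
```

For example, `writeLines(format_report("replication", c("<paper title>")))` prints the report to the console.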
## Templates
### `README.md`
````markdown
# <paper title>

**Authors.** <author 1>, <author 2>, ...

**Journal.** <journal name>, <year>. DOI: <article DOI>

**Data DOI.** <data archive DOI>

**Verified.** <YYYY-MM-DD>

## What this package reproduces

<one paragraph: which figures, tables, and in-text numbers this package generates, and from which data.>

## How to run

From a fresh R session in the package root:

```r
source("master.R")
```

`master.R` runs the full public path end-to-end and writes session information and per-script logs to `outputs/logs/` (compact) or `analyze/logs/` (build/analyze).

## Software requirements

- R <version>
- Required packages: <list>
- Operating system tested on: <list>
- Approximate runtime on the listed environment: <time>

A `session_info.log` is written by `master.R` on a successful run and records the exact package versions used.

## Folder structure

<paste the actual tree from `tree -L 2`, or list it manually>

## Data sources

- <dataset 1>: <source, license, public or restricted, citation>.
- <dataset 2>: ...

If any input is restricted, document how a reader with access can obtain it and which files in this package depend on it.

## File descriptions

- `master.R`: public entry point.
- `code/01_*.R`: <what it does>.
- `code/02_*.R`: <what it does>.
- `data/<file>.csv`: <one-line description; see `docs/codebook.md` for variables>.
- `docs/crosswalk.md`: paper-order map from figures and tables to scripts and outputs.
- `outputs/figures/`, `outputs/tables/`, `outputs/logs/`: generated by `master.R`.

## Figure and table crosswalk

See `docs/crosswalk.md`. Every figure and table in the paper and its appendix appears there, together with the script that generates it and the output path.

## Citation

<paper citation in journal style.>

## License

See `LICENSE`. <one sentence: data license, code license, any restrictions.>

## Attribution

This package follows the structural conventions in Yusaku Horiuchi's replication-package-guide and the FAIR principles (Wilkinson et al. 2016, doi:10.1038/sdata.2016.18).
````
### `master.R`
```r
# master.R — public entry point for <paper title> replication package.
# Running this script regenerates every figure, table, and reported number
# from the public input data.
# Reproducibility
set.seed(20260101) # change to the seed used in the paper
options(stringsAsFactors = FALSE)
# Capture the start time and prepare the log directory
.start_time <- Sys.time()
log_dir <- "outputs/logs" # change to "analyze/logs" if build/analyze
if (!dir.exists(log_dir)) dir.create(log_dir, recursive = TRUE)
# Run sc