Batch Cohort Analysis Skill

You are assisting a medical researcher in generating multiple analysis scripts from a single validated methodology template, each differing only in the exposure/outcome variable combination. This replicates the "80-person research team" pattern: one PI designs the methodology, and many researchers execute the same approach with different variable swaps.

When to Use

Researcher has a validated analysis template (e.g., from /replicate-study or /cross-national)
Wants to explore multiple exposure → outcome combinations on the same database
Goal: systematic variable-swap code generation + batch execution + result matrix

Inputs

Database path(s): CSV/SAS data files (KNHANES, NHANES, NHIS, or any cleaned cohort)
Methodology template: One of:
- Path to a validated R/Python analysis script (from /replicate-study or /cross-national)
- A paper type template name: nhis_cohort, cross_national, survey_weighted
- A source paper to extract methodology from (falls back to /replicate-study Phase 1)
Combination spec: A list of exposure/outcome pairs, provided as:
- Inline list: exposures: [depression, obesity, smoking]; outcomes: [diabetes, hypertension, CVD]
- CSV file with columns: exposure, outcome, (optional) subgroup_vars
- "all" keyword: generates all pairwise combinations from the lists
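As a rough illustration of the "all" keyword, the sketch below expands two lists into every pairwise combination. The function name `expand_combinations` and the rule of skipping a pair whose exposure and outcome are the same variable are assumptions made for this example, not part of the skill.

```python
from itertools import product


def expand_combinations(exposures, outcomes):
    """Return every (exposure, outcome) pair from the two lists.

    Assumption: a pair where exposure and outcome name the same
    variable is meaningless and is skipped.
    """
    return [(e, o) for e, o in product(exposures, outcomes) if e != o]


pairs = expand_combinations(
    ["depression", "obesity", "smoking"],
    ["diabetes", "hypertension", "CVD"],
)
for number, (exposure, outcome) in enumerate(pairs, start=1):
    print(f"{number:02d} {exposure} -> {outcome}")
```

Three exposures and three outcomes give nine combinations, which already exceeds the five-combination threshold for the Bonferroni column described under Critical Rules.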

Optional Inputs

Covariate set: Fixed covariate list for all analyses (default: use template's set)
Subgroup variables: Variables to stratify by (default: sex, age group)
Output format: code_only (just scripts) | execute (run + collect results) | full (code + results + summary)
Cross-national mode: If TRUE, generates paired scripts for both countries per combination

Workflow

Phase 1: Template Validation

Read the methodology template (R script or paper type reference).
Identify the slot variables — parts that change per combination:
- EXPOSURE_VAR: raw variable name in the database
- EXPOSURE_LABEL: human-readable label for tables/figures
- EXPOSURE_CODING: how to derive binary/categorical exposure
- OUTCOME_VAR: raw variable name
- OUTCOME_LABEL: human-readable label
- OUTCOME_CODING: how to derive binary outcome
Verify the template runs successfully on at least one combination before batch generation.
Output: template summary with identified slots → user approval.
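One possible way to produce the template summary is to scan the template text for slot markers and report which of the six slots are present. The `{{SLOT_NAME}}` marker syntax, the `check_slots` helper, and the example template are all assumptions for this sketch; a template that already uses double braces for other purposes would need a different marker.

```python
import re

# The six slots named in Phase 1.
REQUIRED_SLOTS = {
    "EXPOSURE_VAR", "EXPOSURE_LABEL", "EXPOSURE_CODING",
    "OUTCOME_VAR", "OUTCOME_LABEL", "OUTCOME_CODING",
}

# Assumed marker syntax: {{SLOT_NAME}} in capitals and underscores.
SLOT_PATTERN = re.compile(r"\{\{([A-Z_]+)\}\}")


def check_slots(template_text):
    """Report which slots the template contains, lacks, or adds."""
    found = set(SLOT_PATTERN.findall(template_text))
    return {
        "found": sorted(found),
        "missing": sorted(REQUIRED_SLOTS - found),
        "unknown": sorted(found - REQUIRED_SLOTS),
    }


# Hypothetical R template fragment, used only to exercise the check.
example_template = """
dat$exposure <- {{EXPOSURE_CODING}}  # {{EXPOSURE_LABEL}} from {{EXPOSURE_VAR}}
dat$outcome  <- {{OUTCOME_CODING}}   # {{OUTCOME_LABEL}} from {{OUTCOME_VAR}}
fit <- svyglm(outcome ~ exposure + {{COVARIATES}}, design = design,
              family = quasibinomial())
"""

report = check_slots(example_template)
print("missing:", report["missing"])
print("unknown:", report["unknown"])
```

A non-empty `missing` list would be a reason to stop before user approval; `unknown` entries (here the extra `COVARIATES` marker) are worth listing in the summary so the user can confirm they are intended.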

Phase 2: Variable Specification

For each exposure and outcome in the combination spec:

Look up the variable in the database:
- KNHANES: check variable name exists in the CSV header
- NHANES: check which table contains the variable (use codebook.csv if available)
- NHIS: check claims code or variable name
Define coding:
- Binary: threshold or category mapping (e.g., HE_glu >= 126 → diabetes = 1)
- Categorical: level definitions (e.g., smoking: current/former/never)
Check covariate overlap: If the exposure IS one of the standard covariates, remove it from the adjustment set for that analysis (no self-adjustment).
Output: combination matrix with all variable specifications.

| # | Exposure | Exposure Coding | Outcome | Outcome Coding | Covariates (adjusted) | Notes |
|---|----------|-----------------|---------|----------------|----------------------|-------|
| 1 | Depression (PHQ≥10) | BP_PHQ sum ≥10 | Diabetes | HE_glu≥126 OR HbA1c≥6.5 OR DE1_dg=1 | age,sex,edu,income,smoking,alcohol,obesity,CVD | none |
| 2 | Obesity (BMI≥25) | HE_obe ≥4 | Diabetes | same | age,sex,edu,income,smoking,alcohol,depression,CVD | obesity removed from covariates |
| ... | | | | | | |
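The two Phase 2 steps that are easiest to get wrong, outcome coding and covariate overlap, can be sketched as small functions. This is a minimal sketch: `FULL_COVARIATES` is inferred from the two example rows of the matrix, the helper names are hypothetical, and the `"HbA1c"` key is a placeholder for whatever the database calls that variable.

```python
# Assumed full covariate set, read off the two example rows of the matrix.
FULL_COVARIATES = [
    "age", "sex", "edu", "income", "smoking",
    "alcohol", "obesity", "depression", "CVD",
]


def adjust_covariates(exposure, covariates=FULL_COVARIATES):
    """Drop the exposure from the adjustment set; return (kept, removed)."""
    kept = [c for c in covariates if c != exposure]
    removed = [c for c in covariates if c == exposure]
    return kept, removed


def code_diabetes(row):
    """Binary diabetes outcome: glucose, HbA1c, or physician diagnosis.

    `row` is a dict for one participant. A missing value (None) never
    satisfies a criterion; if all three are missing the outcome is None.
    """
    glucose = row.get("HE_glu")
    hba1c = row.get("HbA1c")  # placeholder key, not a confirmed variable name
    diagnosed = row.get("DE1_dg")
    if glucose is None and hba1c is None and diagnosed is None:
        return None  # outcome cannot be determined
    return int(
        (glucose is not None and glucose >= 126)
        or (hba1c is not None and hba1c >= 6.5)
        or diagnosed == 1
    )


kept, removed = adjust_covariates("obesity")
print(",".join(kept), "| removed:", removed)
print(code_diabetes({"HE_glu": 101, "HbA1c": 5.6, "DE1_dg": 1}))
```

Returning the removed variables alongside the kept ones makes it straightforward to fill the Notes column of the matrix and to document every removal.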

Phase 3: Batch Code Generation

For each combination in the matrix:

Clone the template script.
Replace slot variables with the combination-specific values.
Adjust covariates: Remove exposure variable from covariate list if present.
Set output paths: Each combination gets its own results subdirectory.
Generate a master runner script (run_all.R or run_all.sh) that:
- Executes all N scripts sequentially (or in parallel via future/parallel)
- Captures errors per script without stopping the batch
- Logs execution time per analysis
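The clone, replace, adjust, and set-paths steps could look roughly like the following. It is a sketch under stated assumptions: the `{{NAME}}` marker syntax, the `COVARIATES` and `RESULTS_DIR` slots, the structure of each combination dict, and the one-line R template are all hypothetical.

```python
import re
import tempfile
from pathlib import Path

SLOT_PATTERN = re.compile(r"\{\{([A-Z_]+)\}\}")  # assumed marker syntax


def render_script(template_text, slots):
    """Replace every {{NAME}} marker; fail loudly if a slot has no value."""
    def substitute(match):
        name = match.group(1)
        if name not in slots:
            raise KeyError(f"no value supplied for slot {name}")
        return slots[name]
    return SLOT_PATTERN.sub(substitute, template_text)


def generate_batch(template_text, combinations, batch_dir):
    """Write one numbered script per combination and create its results folder."""
    batch_dir = Path(batch_dir)
    scripts_dir = batch_dir / "scripts"
    scripts_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, combo in enumerate(combinations, start=1):
        stem = f"{number:02d}_{combo['exposure']}_{combo['outcome']}"
        results_dir = batch_dir / "results" / stem
        results_dir.mkdir(parents=True, exist_ok=True)
        # No self-adjustment: the exposure never appears among the covariates.
        covariates = [c for c in combo["covariates"] if c != combo["exposure"]]
        slots = dict(combo["slots"])
        slots["COVARIATES"] = " + ".join(covariates)
        slots["RESULTS_DIR"] = results_dir.as_posix()
        script_path = scripts_dir / f"{stem}.R"
        script_path.write_text(render_script(template_text, slots), encoding="utf-8")
        written.append(script_path)
    return written


# Hypothetical one-line template and a single combination.
template = (
    "fit <- svyglm({{OUTCOME_VAR}} ~ {{EXPOSURE_VAR}} + {{COVARIATES}}, "
    "design = design, family = quasibinomial())\n"
)
combos = [{
    "exposure": "obesity",
    "outcome": "diabetes",
    "covariates": ["age", "sex", "obesity"],
    "slots": {"EXPOSURE_VAR": "obesity", "OUTCOME_VAR": "diabetes"},
}]

with tempfile.TemporaryDirectory() as tmp:
    paths = generate_batch(template, combos, tmp)
    first_name = paths[0].name
    first_text = paths[0].read_text(encoding="utf-8")

print(first_name)
print(first_text)
```

Raising on an unfilled slot is a deliberate choice here: a script that still contains a marker would otherwise fail later, inside the batch, with a less informative error.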

Phase 4: Batch Execution (if `execute` or `full` mode)

Run the master script.
Collect results from each combination's output directory.
Handle failures gracefully:
- Log which combinations failed and why
- Common failures: convergence issues, too few events, empty subgroups
- Suggest fixes for failed combinations
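A sequential runner that keeps going after a failure might be sketched as below. The `run_batch` helper and its log fields are assumptions; the demonstration uses two stand-in Python scripts launched with `sys.executable` so the example runs anywhere, whereas a real batch of R scripts would pass `["Rscript"]` as the command prefix.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def run_batch(script_paths, command_prefix):
    """Run each script in turn; record status, runtime, and error text.

    A failing script is logged and the loop moves on, so one bad
    combination never stops the batch.
    """
    log = []
    for path in script_paths:
        started = time.perf_counter()
        try:
            completed = subprocess.run(
                [*command_prefix, str(path)],
                capture_output=True,
                text=True,
            )
            ok = completed.returncode == 0
            error = completed.stderr.strip()
        except OSError as exc:  # for example, the interpreter is not installed
            ok, error = False, str(exc)
        log.append({
            "script": Path(path).name,
            "ok": ok,
            "seconds": round(time.perf_counter() - started, 2),
            "error": "" if ok else error,
        })
    return log


# Stand-in scripts: one succeeds, one exits with an error message.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "01_depression_diabetes.py"
    bad = Path(tmp) / "02_obesity_diabetes.py"
    good.write_text("print('done')\n", encoding="utf-8")
    bad.write_text("raise SystemExit('too few events')\n", encoding="utf-8")
    batch_log = run_batch([good, bad], [sys.executable])

failed = [entry for entry in batch_log if not entry["ok"]]
for entry in failed:
    print(entry["script"], "failed:", entry["error"])
```

The `failed` list holds what a `failed_runs.csv` file needs: the script name and the captured error message for each combination that did not complete.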

Phase 5: Summary Matrix

Aggregate all results into a single summary:

Main Results Matrix (summary_matrix.csv):

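| Exposure | Outcome | N | Events | Model 1 OR (95% CI) | Model 2 OR (95% CI) | Model 3 OR (95% CI) | p-value | Significant |
|----------|---------|---|--------|---------------------|---------------------|---------------------|---------|-------------|
| Depression | Diabetes | 5,811 | 487 | 2.14 (1.52–3.01) | 1.89 (1.33–2.69) | 1.36 (0.91–2.05) | 0.137 | No |
| Obesity | Diabetes | 5,811 | 487 | 3.45 (2.71–4.39) | 3.38 (2.65–4.32) | 3.12 (2.42–4.02) | <0.001 | Yes |
| ... | | | | | | | | |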

Subgroup Summary (subgroup_matrix.csv): Same format, stratified by subgroup variables.

Heatmap (optional): Visual matrix of effect sizes × significance, exposure on Y-axis, outcome on X-axis.
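The aggregation step can be sketched as a loop over each combination's `main_results.csv`. The column names in that file (`model`, `exposure`, `outcome`, `n`, `events`, `or`, `ci_low`, `ci_high`, `p_value`), the one-row-per-model layout, and the choice to take the p-value from Model 3 are all assumptions; the demonstration values are copied from the example row of the matrix above.

```python
import csv
import tempfile
from pathlib import Path


def format_or(estimate, lower, upper):
    """Format an odds ratio as 'OR (lower–upper)' with two decimals."""
    return f"{float(estimate):.2f} ({float(lower):.2f}–{float(upper):.2f})"


def build_summary(results_root, alpha=0.05):
    """Collect one summary row per combination folder under results_root."""
    summary = []
    for result_file in sorted(Path(results_root).glob("*/main_results.csv")):
        with result_file.open(newline="", encoding="utf-8") as handle:
            models = {row["model"]: row for row in csv.DictReader(handle)}
        final = models["3"]  # assumed: Model 3 is the fully adjusted model
        row = {
            "Exposure": final["exposure"],
            "Outcome": final["outcome"],
            "N": f"{int(final['n']):,}",
            "Events": final["events"],
        }
        for key in ("1", "2", "3"):
            m = models[key]
            row[f"Model {key} OR (95% CI)"] = format_or(m["or"], m["ci_low"], m["ci_high"])
        p_value = float(final["p_value"])
        row["p-value"] = "<0.001" if p_value < 0.001 else f"{p_value:.3f}"
        row["Significant"] = "Yes" if p_value < alpha else "No"
        summary.append(row)
    return summary


FIELDS = ["model", "exposure", "outcome", "n", "events", "or", "ci_low", "ci_high", "p_value"]
example_rows = [
    ["1", "Depression", "Diabetes", "5811", "487", "2.14", "1.52", "3.01", ""],
    ["2", "Depression", "Diabetes", "5811", "487", "1.89", "1.33", "2.69", ""],
    ["3", "Depression", "Diabetes", "5811", "487", "1.36", "0.91", "2.05", "0.137"],
]

with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "results" / "01_depression_diabetes"
    run_dir.mkdir(parents=True)
    with (run_dir / "main_results.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(FIELDS)
        writer.writerows(example_rows)
    summary = build_summary(Path(tmp) / "results")

print(summary[0])
```

Because each row is a plain dict keyed by the summary column names, writing `summary_matrix.csv` afterwards is a single `csv.DictWriter` call.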

Output Files

{working_dir}/batch_{timestamp}/
├── README.md                    — Batch run summary (N combinations, template used, date)
├── combination_matrix.csv       — All exposure/outcome specs with coding
├── template/
│   └── base_template.R          — The validated template (frozen copy)
├── scripts/
│   ├── 01_depression_diabetes.R
│   ├── 02_obesity_diabetes.R
│   ├── ...
│   └── run_all.R                — Master execution script
├── results/
│   ├── 01_depression_diabetes/
│   │   ├── table1.csv
│   │   ├── main_results.csv
│   │   └── subgroup_results.csv
│   ├── 02_obesity_diabetes/
│   │   └── ...
│   └── ...
├── summary/
│   ├── summary_matrix.csv       — Main results across all combinations
│   ├── subgroup_matrix.csv      — Subgroup results across all combinations
│   ├── failed_runs.csv          — Combinations that failed + error messages
│   └── heatmap.png              — Optional effect size × significance visual
└── logs/
    └── batch_execution.log      — Timing + error log

Critical Rules

Never modify the core methodology across combinations — only swap exposure/outcome/covariates.
Remove self-adjustment: If exposure = BMI, remove obesity from covariates. If exposure = education/income, remove the same variable from covariates. If outcome = MetS, consider removing obesity from covariates. Document all removals.
Weighted analysis mandatory for KNHANES/NHANES/NHIS — inherited from template.
Event count check: Before running, verify each outcome has ≥10 events per covariate (EPV rule). Flag underpowered combinations.
Multiple comparisons: When generating >5 combinations, include a Bonferroni-corrected significance column in the summary matrix. Add a note about exploratory vs confirmatory framing.
Reproducibility: Freeze the template version. Include a SHA256 hash of the data file in README.
No p-hacking framing: The summary matrix is for hypothesis generation, not confirmation. State this explicitly in README and any manuscript output.
Outcome definitions MUST include physician diagnosis. Lab-only definitions systematically overestimate exposure→outcome associations (validated: the Joo 2026 replication showed a US depression→DM wOR of 1.92 without vs 1.54 with physician diagnosis). Required definitions:
- Diabetes = FPG ≥126 mg/dL OR HbA1c ≥6.5% OR physician-diagnosed (KNHANES: DE1_dg=1, NHANES: DIQ010="Yes")
- Hypertension = SBP ≥140 mmHg OR DBP ≥90 mmHg OR physician-diagnosed (KNHANES: DI1_dg=1, NHANES: BPQ020="Yes")
Full covariate set is default: Alw
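Three of the rules above (event count check, multiple comparisons, and reproducibility) lend themselves to short helpers. The following is a minimal sketch with hypothetical function names; how many model terms count toward the events-per-variable denominator is left to the caller, since the rule only states "per covariate".

```python
import hashlib
import tempfile
from pathlib import Path


def is_underpowered(events, n_covariates, minimum_epv=10):
    """True when events per covariate fall below the EPV threshold."""
    if n_covariates < 1:
        raise ValueError("n_covariates must be at least 1")
    return events / n_covariates < minimum_epv


def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True when p_value is below alpha divided by the number of tests."""
    return p_value < alpha / n_tests


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a data file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# 487 events and 8 covariates clear the threshold; 50 events would not.
print(is_underpowered(487, 8), is_underpowered(50, 8))

# With 9 combinations, a p-value of 0.01 no longer counts as significant.
print(bonferroni_significant(0.01, 9), bonferroni_significant(0.001, 9))

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "cohort.csv"  # stand-in for the real data file
    data_file.write_bytes(b"id,exposure,outcome\n1,0,1\n")
    data_hash = sha256_of_file(data_file)
print(data_hash)
```

Recording `data_hash` in README ties every script in the batch to one exact version of the data file.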
