Calc-Sample-Size Skill

You are assisting a medical researcher with sample size and power calculations. Guide the user through test selection using the decision tree, generate reproducible code in R (primary) and Python (alternative), interpret effect sizes clinically, and produce IRB-ready justification text.

Reference Files

Formulas: ${CLAUDE_SKILL_DIR}/references/formulas.md -- mathematical formulas, R/Python functions, effect size conventions
Existing R template: See analyze-stats skill at references/templates/sample_size.R for the 7 original tests

Read formulas.md before generating calculation code.

Cross-Skill References

design-study calls calc-sample-size when a sample size justification is needed during study design.
calc-sample-size output feeds into write-protocol and write-paper (Methods section).
Detailed formulas and references are in ${CLAUDE_SKILL_DIR}/references/formulas.md.

Decision Tree

When the user requests a sample size calculation, walk them through this tree interactively. Ask one question at a time. Do not assume answers.

What is your primary outcome?
|
+-- Binary (yes/no, positive/negative)
|   |
|   +-- Paired data (same subjects, two methods)?
|   |   +-- YES --> [5] McNemar test
|   |   +-- NO  --> How many groups?
|   |       +-- 2 groups, superiority     --> [4] Two-proportion comparison (chi-square)
|   |       +-- 2 groups, non-inferiority --> [10] Non-inferiority / equivalence
|   |       +-- Multivariable model       --> [9] Logistic regression
|   |
+-- Continuous (measurement, score)
|   |
|   +-- How many groups?
|       +-- 2 groups  --> [6] Independent t-test
|       +-- 3+ groups --> [8] One-way ANOVA
|
+-- Time-to-event (survival, recurrence)
|   |
|   +-- Two groups, unadjusted      --> [7] Log-rank test
|   +-- Multivariable / adjusted HR  --> [7] Log-rank (Schoenfeld) + [11] Cox EPV
|
+-- Agreement (inter-rater, reproducibility)
|   |
|   +-- Continuous measurements --> [2] ICC
|   +-- Categorical ratings     --> [3] Kappa
|
+-- Diagnostic accuracy (Se, Sp, AUC precision)
    |
    +--> [1] Diagnostic accuracy (precision-based)
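
One way to make the tree checkable is to encode it as nested mappings. The sketch below is illustrative only: `DECISION_TREE` and `select_test` are hypothetical names that do not come from the skill, and the answer keys are paraphrased from the branches above.

```python
# Hypothetical encoding of the decision tree; leaves are the test labels
# used in this document, keys are paraphrased user answers.
DECISION_TREE = {
    "binary": {
        "paired": "[5] McNemar test",
        "2 groups, superiority": "[4] Two-proportion comparison (chi-square)",
        "2 groups, non-inferiority": "[10] Non-inferiority / equivalence",
        "multivariable model": "[9] Logistic regression",
    },
    "continuous": {
        "2 groups": "[6] Independent t-test",
        "3+ groups": "[8] One-way ANOVA",
    },
    "time-to-event": {
        "two groups, unadjusted": "[7] Log-rank test",
        "multivariable / adjusted HR": "[7] Log-rank (Schoenfeld) + [11] Cox EPV",
    },
    "agreement": {
        "continuous measurements": "[2] ICC",
        "categorical ratings": "[3] Kappa",
    },
    "diagnostic accuracy": "[1] Diagnostic accuracy (precision-based)",
}


def select_test(outcome, detail=None):
    """Return the test label for an outcome type and one follow-up answer."""
    node = DECISION_TREE[outcome]
    if isinstance(node, str):
        # Leaf reached directly: no follow-up question is needed.
        return node
    if detail is None:
        # Mirrors "Do not assume answers": refuse to guess the branch.
        raise ValueError(f"Follow-up answer needed; options: {sorted(node)}")
    return node[detail]


print(select_test("continuous", "3+ groups"))
```

Raising an error when the follow-up answer is missing keeps the lookup consistent with the instruction to ask one question at a time rather than assume a branch.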

Supported Tests

Test 1: Diagnostic Accuracy (Sensitivity/Specificity Precision)

When to use: Estimating required sample size for desired precision of sensitivity or specificity in a diagnostic accuracy study.

Required parameters (ask the user):

Parameter	Description	Default
`sensitivity_expected`	Expected sensitivity	0.85
`ci_half_width`	Desired half-width of 95% CI	0.05
`prevalence`	Disease prevalence in study population	0.30
`alpha`	Significance level (0.05 gives a 95% CI)	0.05
`attrition_rate`	Expected dropout/exclusion rate	0.15

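Effect size interpretation: The CI half-width sets the precision. A half-width of 0.05 means the 95% CI extends about 5 percentage points on either side of the estimated sensitivity (e.g., roughly 0.80 to 0.90 for an expected sensitivity of 0.85). Narrower CIs require larger samples. Because sensitivity is estimated only in diseased subjects, lower prevalence increases the total number that must be enrolled.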
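
The arithmetic behind a precision-based calculation can be sketched as follows. This is a minimal illustration using the normal-approximation (Wald) interval, not necessarily the method in formulas.md. The function name `n_for_sensitivity`, the stage-by-stage rounding, and the attrition adjustment `n / (1 - attrition_rate)` are assumptions.

```python
import math
from statistics import NormalDist


def n_for_sensitivity(sensitivity_expected=0.85, ci_half_width=0.05,
                      prevalence=0.30, alpha=0.05, attrition_rate=0.15):
    """Sample size for a target CI half-width around sensitivity (Wald approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for a 95% CI
    se = sensitivity_expected
    # Diseased subjects needed so that z * sqrt(se(1-se)/n) <= half-width.
    n_diseased = math.ceil(z ** 2 * se * (1 - se) / ci_half_width ** 2)
    # Only a fraction `prevalence` of enrolled subjects are diseased.
    n_total = math.ceil(n_diseased / prevalence)
    # Assumed attrition adjustment: divide by the expected retention.
    n_enrolled = math.ceil(n_total / (1 - attrition_rate))
    return {"n_diseased": n_diseased, "n_total": n_total, "n_enrolled": n_enrolled}


print(n_for_sensitivity())
```

With the table defaults this gives 196 diseased subjects, 654 evaluable subjects, and 770 to enroll. For specificity, the same reasoning would apply with `1 - prevalence` in place of `prevalence`.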

Test 2: ICC Agreement (Bonett 2002)

When to use: Inter-rater or intra-rater agreement for continuous measurements (e.g., tumor size, angle measurement).

Required parameters:

Parameter	Description	Default
`icc_expected`	Expected ICC value	0.75
`icc_null`	Null hypothesis ICC (lower bound)	0.50
`n_raters`	Number of raters	2
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.10

Effect size interpretation: ICC < 0.50 = poor, 0.50-0.75 = moderate, 0.75-0.90 = good, > 0.90 = excellent (Koo & Li, 2016).

Test 3: Kappa Agreement (Donner & Eliasziw 1992)

When to use: Inter-rater agreement for categorical ratings (e.g., BI-RADS category, lesion present/absent).

Required parameters:

Parameter	Description	Default
`kappa_expected`	Expected kappa value	0.70
`kappa_null`	Null hypothesis kappa	0.40
`po_expected`	Expected proportion of agreement	0.75
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.10

Effect size interpretation: Kappa < 0.00 = poor, 0.00-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.00 = almost perfect (Landis & Koch, 1977).

Test 4: Two-Proportion Comparison (Chi-Square)

When to use: Comparing proportions between two independent groups (e.g., AI detection rate vs. conventional detection rate).

Required parameters:

Parameter	Description	Default
`p1`	Proportion in group 1	--
`p2`	Proportion in group 2	--
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.15

Effect size interpretation: Cohen's h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2)); interpret its absolute value. Small = 0.20, medium = 0.50, large = 0.80.
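
As a worked illustration of Cohen's h, the sketch below computes h and a per-group sample size with the arcsine normal approximation, n = ((z_alpha/2 + z_power) / h)^2. The function names and the attrition adjustment are assumptions. A chi-square-based formula, such as the one formulas.md may use, can give slightly different numbers.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80,
                                attrition_rate=0.15):
    """Per-group n for a two-sided test of two proportions (arcsine approximation)."""
    h = abs(cohens_h(p1, p2))
    if h == 0:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_evaluable = math.ceil(((z_alpha + z_power) / h) ** 2)
    # Assumed attrition adjustment: divide by the expected retention.
    n_enrolled = math.ceil(n_evaluable / (1 - attrition_rate))
    return {"h": h, "n_evaluable": n_evaluable, "n_enrolled": n_enrolled}


# Illustrative inputs only: detection rates of 85% vs. 70%.
print(n_per_group_two_proportions(0.85, 0.70))
```

For these illustrative inputs h is about 0.36, between the small and medium benchmarks, which works out to 60 evaluable subjects per group.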

Test 5: McNemar Test (Paired Proportions)

When to use: Paired binary outcomes (e.g., two readers reading same cases, before/after on same patients).

Required parameters:

Parameter	Description	Default
`p01`	P(Method A negative, Method B positive)	--
`p10`	P(Method A positive, Method B negative)	--
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.10

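Effect size interpretation: The required sample size is driven by the discordant proportions: their asymmetry (the ratio p10/p01, or equivalently the difference p10 - p01) and their total (p10 + p01). Larger asymmetry in discordant pairs means fewer pairs are needed. Only discordant pairs contribute information to the test; concordant pairs do not.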
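
To show how the discordant proportions determine the number of pairs, here is a sketch using a commonly cited normal-approximation formula (often attributed to Connor, 1987). Whether formulas.md uses this exact formula is an assumption, as are the function name and the attrition adjustment.

```python
import math
from statistics import NormalDist


def mcnemar_pairs(p01, p10, alpha=0.05, power=0.80, attrition_rate=0.10):
    """Number of pairs for a two-sided McNemar test (normal approximation)."""
    p_disc = p01 + p10        # total discordant proportion
    diff = abs(p10 - p01)     # asymmetry between the discordant cells
    if diff == 0:
        raise ValueError("p01 and p10 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = (z_alpha * math.sqrt(p_disc)
         + z_power * math.sqrt(p_disc - diff ** 2)) ** 2 / diff ** 2
    n_pairs = math.ceil(n)
    # Assumed attrition adjustment: divide by the expected retention.
    n_enrolled = math.ceil(n_pairs / (1 - attrition_rate))
    return {"n_pairs": n_pairs, "n_enrolled": n_enrolled}


# Illustrative inputs only: 20% vs. 10% discordance.
print(mcnemar_pairs(p01=0.10, p10=0.20))
# Same total discordance (0.30) but stronger asymmetry needs fewer pairs.
print(mcnemar_pairs(p01=0.05, p10=0.25))
```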

Test 6: Independent t-Test

When to use: Comparing means between two independent groups (e.g., lesion size in malignant vs. benign).

Required parameters:

Parameter	Description	Default
`mean_diff`	Expected mean difference	--
`pooled_sd`	Pooled standard deviation (from literature/pilot)	--
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.15

Effect size interpretation: Cohen's d = mean_diff / pooled_sd. Small = 0.20, medium = 0.50, large = 0.80. In clinical terms, d = 0.50 means the groups differ by half a standard deviation.
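
A minimal sketch of the corresponding arithmetic, using the normal approximation n = 2 * (z_alpha/2 + z_power)^2 / d^2 per group. The function name and the attrition adjustment are assumptions. Exact t-distribution methods (for example R's `power.t.test`) typically return one or two more subjects per group.

```python
import math
from statistics import NormalDist


def n_per_group_ttest(mean_diff, pooled_sd, alpha=0.05, power=0.80,
                      attrition_rate=0.15):
    """Per-group n for a two-sided independent t-test (normal approximation)."""
    d = abs(mean_diff) / pooled_sd  # Cohen's d
    if d == 0:
        raise ValueError("mean_diff must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n_evaluable = math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)
    # Assumed attrition adjustment: divide by the expected retention.
    n_enrolled = math.ceil(n_evaluable / (1 - attrition_rate))
    return {"d": d, "n_evaluable": n_evaluable, "n_enrolled": n_enrolled}


# Illustrative inputs only: a 5-unit difference with SD 10, i.e. d = 0.50.
print(n_per_group_ttest(mean_diff=5, pooled_sd=10))
```

For d = 0.50 the approximation gives 63 evaluable subjects per group, or 75 per group after allowing for 15% attrition.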

Test 7: Survival / Log-Rank Test (Schoenfeld 1981)

When to use: Comparing survival or time-to-event between two groups (e.g., treatment vs. control, RFA vs. surgery).

Required parameters:

Parameter	Description	Default
`hr`	Expected hazard ratio	--
`median_ctrl`	Median survival in control arm (months)	--
`accrual_time`	Accrual period (months)	12
`follow_up`	Follow-up after accrual (months)	24
`drop_rate`	Annual dropout rate	0.05
`alpha`	Significance level	0.05
`power`	Desired power	0.80

Effect size interpretation: HR < 1 favors treatment. HR = 0.50 means treatment halves the hazard (strong effect). HR = 0.80 is a modest 20% reduction in hazard. The Schoenfeld formula calculates the required number of events; the number of subjects is then obtained by dividing by the expected probability of observing an event, allowing for dropout.
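
The two steps (events first, then subjects) can be sketched as below for 1:1 allocation. The events formula is Schoenfeld's, D = 4 * (z_alpha/2 + z_power)^2 / (ln HR)^2. The conversion to subjects is an assumption for illustration: exponential survival, uniform accrual, and dropout modeled as an independent exponential censoring hazard derived from the annual rate. formulas.md may handle event probability and dropout differently, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Required number of events for a two-sided log-rank test, 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_power) ** 2 / math.log(hr) ** 2)


def prob_observed_event(hazard, dropout_hazard, accrual_time, follow_up):
    """P(event observed) under exponential event/dropout times and uniform accrual."""
    total = hazard + dropout_hazard
    # Average probability of still being at risk at end of study, over entry times.
    at_risk = (math.exp(-total * follow_up)
               - math.exp(-total * (accrual_time + follow_up))) / (total * accrual_time)
    # Of those who leave the risk set, a share hazard/total leave by an event.
    return hazard / total * (1 - at_risk)


def logrank_sample_size(hr, median_ctrl, accrual_time=12, follow_up=24,
                        drop_rate=0.05, alpha=0.05, power=0.80):
    events = schoenfeld_events(hr, alpha, power)
    haz_ctrl = math.log(2) / median_ctrl        # monthly hazard, control arm
    haz_trt = hr * haz_ctrl                     # proportional hazards
    haz_drop = -math.log(1 - drop_rate) / 12    # annual rate -> monthly hazard
    p_event = 0.5 * (
        prob_observed_event(haz_ctrl, haz_drop, accrual_time, follow_up)
        + prob_observed_event(haz_trt, haz_drop, accrual_time, follow_up)
    )
    n_total = math.ceil(events / p_event)
    n_total += n_total % 2                      # round up to equal arm sizes
    return {"events": events, "p_event": p_event, "n_total": n_total}


# Illustrative inputs only: HR 0.50, control median 12 months, table defaults.
print(logrank_sample_size(hr=0.5, median_ctrl=12))
```

For HR = 0.50 the Schoenfeld step gives 66 events. Under the stated assumptions roughly two thirds of subjects are expected to have an observed event, so the total is about 100 subjects.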

Test 8: One-Way ANOVA (NEW)

When to use: Comparing means across 3 or more independent groups (e.g., comparing AI model performance across 3 architectures, comparing measurement accuracy across multiple readers).

Required parameters:

Parameter	Description	Default
`k`	Number of groups	--
`f`	Cohen's f effect size	--
`alpha`	Significance level	0.05
`power`	Desired power	0.80
`attrition_rate`	Expected dropout rate	0.15

Help user estimate Cohen's f:

If the user knows group means and pooled SD: f = sigma_means / pooled_SD
If the user knows et
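
For the first route above (group means and pooled SD), a minimal sketch of the calculation follows. It assumes equal group sizes, so that sigma_means is the population standard deviation of the group means. The function name and the example values are hypothetical.

```python
from statistics import pstdev


def cohens_f_from_means(group_means, pooled_sd):
    """Cohen's f = SD of group means / pooled within-group SD (equal group sizes)."""
    if len(group_means) < 3:
        raise ValueError("one-way ANOVA here expects 3 or more groups")
    # pstdev divides by k, matching sigma_means for equally sized groups.
    return pstdev(group_means) / pooled_sd


# Illustrative inputs only: three group means with a pooled SD of 4.
f = cohens_f_from_means([10, 12, 14], pooled_sd=4)
print(round(f, 3))
```

By Cohen's conventions, f = 0.10 is small, 0.25 is medium, and 0.40 is large, so these illustrative means correspond to a large effect.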
