Calc-Sample-Size Skill
You are assisting a medical researcher with sample size and power calculations. Guide the user through test selection using the decision tree, generate reproducible code in R (primary) and Python (alternative), interpret effect sizes clinically, and produce IRB-ready justification text.
Reference Files
- Formulas: `${CLAUDE_SKILL_DIR}/references/formulas.md` -- mathematical formulas, R/Python functions, effect size conventions
- Existing R template: See the `analyze-stats` skill at `references/templates/sample_size.R` for the 7 original tests
Read formulas.md before generating calculation code.
Cross-Skill References
- design-study calls calc-sample-size when a sample size justification is needed during study design.
- calc-sample-size output feeds into write-protocol and write-paper (Methods section).
- Detailed formulas and references are in `${CLAUDE_SKILL_DIR}/references/formulas.md`.
Decision Tree
When the user requests a sample size calculation, walk them through this tree interactively. Ask one question at a time. Do not assume answers.
What is your primary outcome?
|
+-- Binary (yes/no, positive/negative)
|   |
|   +-- Paired data (same subjects, two methods)?
|   |   +-- YES --> [5] McNemar test
|   |   +-- NO --> How many groups?
|   |       +-- 2 groups, superiority --> [4] Two-proportion comparison (chi-square)
|   |       +-- 2 groups, non-inferiority --> [10] Non-inferiority / equivalence
|   |       +-- Multivariable model --> [9] Logistic regression
|
+-- Continuous (measurement, score)
|   |
|   +-- How many groups?
|       +-- 2 groups --> [6] Independent t-test
|       +-- 3+ groups --> [8] One-way ANOVA
|
+-- Time-to-event (survival, recurrence)
|   |
|   +-- Two groups, unadjusted --> [7] Log-rank test
|   +-- Multivariable / adjusted HR --> [7] Log-rank (Schoenfeld) + [11] Cox EPV
|
+-- Agreement (inter-rater, reproducibility)
|   |
|   +-- Continuous measurements --> [2] ICC
|   +-- Categorical ratings --> [3] Kappa
|
+-- Diagnostic accuracy (Se, Sp, AUC precision)
    |
    +--> [1] Diagnostic accuracy (precision-based)
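The tree above can also be read as a simple lookup. The following is a minimal Python sketch of that reading, not part of the skill itself: the function name `select_tests` and its argument names and string values are hypothetical, chosen only to mirror the branch labels. It returns the bracketed test numbers from the tree.

```python
def select_tests(outcome, paired=False, design="superiority",
                 groups=2, adjusted=False, rating="continuous"):
    """Return the test number(s) the decision tree points to (hypothetical helper)."""
    if outcome == "binary":
        if paired:
            return [5]  # McNemar test
        by_design = {
            "superiority": [4],       # two-proportion comparison
            "non-inferiority": [10],  # non-inferiority / equivalence
            "multivariable": [9],     # logistic regression
        }
        if design not in by_design:
            raise ValueError(f"unknown design: {design!r}")
        return by_design[design]
    if outcome == "continuous":
        if groups < 2:
            raise ValueError("need at least 2 groups")
        return [6] if groups == 2 else [8]  # t-test vs. one-way ANOVA
    if outcome == "time-to-event":
        return [7, 11] if adjusted else [7]  # log-rank, plus Cox EPV if adjusted
    if outcome == "agreement":
        return [2] if rating == "continuous" else [3]  # ICC vs. kappa
    if outcome == "diagnostic accuracy":
        return [1]
    raise ValueError(f"unknown outcome type: {outcome!r}")


print(select_tests("binary", paired=True))           # [5]
print(select_tests("time-to-event", adjusted=True))  # [7, 11]
```

In an interactive session each argument corresponds to one question asked of the user, so none of the defaults should be assumed without asking.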
Supported Tests
Test 1: Diagnostic Accuracy (Sensitivity/Specificity Precision)
When to use: Estimating required sample size for desired precision of sensitivity or specificity in a diagnostic accuracy study.
Required parameters (ask the user):
| Parameter | Description | Default |
|---|---|---|
| `sensitivity_expected` | Expected sensitivity | 0.85 |
| `ci_half_width` | Desired half-width of 95% CI | 0.05 |
| `prevalence` | Disease prevalence in study population | 0.30 |
| `alpha` | Significance level | 0.05 |
| `attrition_rate` | Expected dropout/exclusion rate | 0.15 |
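Effect size interpretation: The CI half-width determines precision. A half-width of 0.05 means the 95% CI for sensitivity extends 5 percentage points on either side of the point estimate (e.g., 0.80 to 0.90 for an expected sensitivity of 0.85). Narrower CIs require larger samples. Because sensitivity is estimated only from diseased subjects, total enrollment is the required number of diseased subjects divided by prevalence.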
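As an illustration of the precision-based calculation, here is a minimal Python sketch using the Wald (normal-approximation) interval for a single proportion. It assumes the formulas in `formulas.md` take a similar form, which is not confirmed here. The function name is hypothetical, and dividing by `1 - attrition_rate` to inflate for attrition is also an assumption.

```python
import math
from statistics import NormalDist


def diagnostic_accuracy_n(sensitivity_expected=0.85, ci_half_width=0.05,
                          prevalence=0.30, alpha=0.05, attrition_rate=0.15):
    """Sample size for estimating sensitivity to a given CI half-width.

    Assumptions (not taken from the skill's reference files):
    Wald interval for a proportion; attrition inflation as n / (1 - rate).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se = sensitivity_expected
    # Diseased subjects needed for the target half-width
    n_diseased = math.ceil(z ** 2 * se * (1 - se) / ci_half_width ** 2)
    # Only a fraction `prevalence` of enrolled subjects are diseased
    n_total = math.ceil(n_diseased / prevalence)
    # Inflate for expected dropout/exclusion
    n_enrolled = math.ceil(n_total / (1 - attrition_rate))
    return n_diseased, n_total, n_enrolled


print(diagnostic_accuracy_n())  # (196, 654, 770) with the table defaults
```

For specificity, the same calculation applies with the expected specificity in place of sensitivity and `1 - prevalence` in place of `prevalence`.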
Test 2: ICC Agreement (Bonett 2002)
When to use: Inter-rater or intra-rater agreement for continuous measurements (e.g., tumor size, angle measurement).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `icc_expected` | Expected ICC value | 0.75 |
| `icc_null` | Null hypothesis ICC (lower bound) | 0.50 |
| `n_raters` | Number of raters | 2 |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.10 |
Effect size interpretation: ICC < 0.50 = poor, 0.50-0.75 = moderate, 0.75-0.90 = good, > 0.90 = excellent (Koo & Li, 2016).
Test 3: Kappa Agreement (Donner & Eliasziw 1992)
When to use: Inter-rater agreement for categorical ratings (e.g., BI-RADS category, lesion present/absent).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `kappa_expected` | Expected kappa value | 0.70 |
| `kappa_null` | Null hypothesis kappa | 0.40 |
| `po_expected` | Expected proportion of agreement | 0.75 |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.10 |
Effect size interpretation: Kappa < 0 = poor, 0.00-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, 0.81-1.00 = almost perfect (Landis & Koch, 1977).
Test 4: Two-Proportion Comparison (Chi-Square)
When to use: Comparing proportions between two independent groups (e.g., AI detection rate vs. conventional detection rate).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `p1` | Proportion in group 1 | -- |
| `p2` | Proportion in group 2 | -- |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.15 |
Effect size interpretation: Cohen's h = 2 * arcsin(sqrt(p1)) - 2 * arcsin(sqrt(p2)). Small = 0.20, medium = 0.50, large = 0.80.
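To make the Cohen's h definition concrete, the sketch below computes h and a per-group sample size from the arcsine normal approximation, n = 2(z_{1-α/2} + z_{power})² / h². This is one common approximation for a two-sided test with equal allocation and may differ slightly from the formula in `formulas.md`. The function names and the example proportions (0.80 vs. 0.60) are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def two_proportion_n(p1, p2, alpha=0.05, power=0.80):
    """Per-group n, two-sided test, equal allocation (arcsine approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    h = cohens_h(p1, p2)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / h ** 2)


h = cohens_h(0.80, 0.60)          # illustrative proportions, not from the source
n = two_proportion_n(0.80, 0.60)
print(round(h, 3), n)             # h is about 0.442, between small and medium
```

The sign of h only reflects which group is listed first; the sample size depends on its magnitude.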
Test 5: McNemar Test (Paired Proportions)
When to use: Paired binary outcomes (e.g., two readers reading same cases, before/after on same patients).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `p01` | P(Method A negative, Method B positive) | -- |
| `p10` | P(Method A positive, Method B negative) | -- |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.10 |
Effect size interpretation: The ratio p10/p01 (discordant ratio) drives the required sample size. Larger asymmetry in discordant pairs means fewer subjects needed. Only discordant pairs contribute information.
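The dependence on discordant pairs can be illustrated with a commonly used normal-approximation formula for the number of pairs: n = [z_{1-α/2}·sqrt(p_d) + z_{power}·sqrt(p_d − δ²)]² / δ², where p_d = p01 + p10 and δ = p10 − p01. This is a sketch under that assumption; the formula in `formulas.md` may differ. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def mcnemar_pairs(p01, p10, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided McNemar test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_disc = p01 + p10   # total proportion of discordant pairs
    delta = p10 - p01    # difference between the two discordant cells
    if delta == 0:
        raise ValueError("p01 and p10 must differ")
    numerator = (z_alpha * math.sqrt(p_disc)
                 + z_beta * math.sqrt(p_disc - delta ** 2)) ** 2
    return math.ceil(numerator / delta ** 2)


# Illustrative values only: 10% vs. 25% discordant in each direction
print(mcnemar_pairs(p01=0.10, p10=0.25))
# A smaller asymmetry with the same total discordance needs more pairs
print(mcnemar_pairs(p01=0.15, p10=0.20))
```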
Test 6: Independent t-Test
When to use: Comparing means between two independent groups (e.g., lesion size in malignant vs. benign).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `mean_diff` | Expected mean difference | -- |
| `pooled_sd` | Pooled standard deviation (from literature/pilot) | -- |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.15 |
Effect size interpretation: Cohen's d = mean_diff / pooled_sd. Small = 0.20, medium = 0.50, large = 0.80. In clinical terms, d = 0.50 means the groups differ by half a standard deviation.
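A minimal sketch of the corresponding calculation, using the normal approximation n = 2(z_{1-α/2} + z_{power})² / d² per group. An exact calculation based on the noncentral t distribution typically gives a slightly larger n. The function name, the example inputs, and the attrition inflation n / (1 − rate) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def t_test_n_per_group(mean_diff, pooled_sd, alpha=0.05, power=0.80,
                       attrition_rate=0.15):
    """Per-group n for a two-sided independent t-test (normal approximation)."""
    d = abs(mean_diff) / pooled_sd  # Cohen's d
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)
    # Assumed inflation rule: divide by the expected retention fraction
    n_enrolled = math.ceil(n / (1 - attrition_rate))
    return d, n, n_enrolled


# Illustrative inputs: 5-unit difference, SD of 10, so d = 0.5
print(t_test_n_per_group(mean_diff=5, pooled_sd=10))  # (0.5, 63, 75)
```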
Test 7: Survival / Log-Rank Test (Schoenfeld 1981)
When to use: Comparing survival or time-to-event between two groups (e.g., treatment vs. control, RFA vs. surgery).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `hr` | Expected hazard ratio | -- |
| `median_ctrl` | Median survival in control arm (months) | -- |
| `accrual_time` | Accrual period (months) | 12 |
| `follow_up` | Follow-up after accrual (months) | 24 |
| `drop_rate` | Annual dropout rate | 0.05 |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
Effect size interpretation: HR < 1 favors treatment. HR = 0.50 means treatment halves the hazard (strong effect). HR = 0.80 is a modest 20% reduction in hazard. The Schoenfeld formula gives the required number of events; the number of subjects is then obtained by dividing by the expected event probability and inflating for dropout.
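The events step can be sketched with the Schoenfeld formula for 1:1 allocation and a two-sided test: events = 4(z_{1-α/2} + z_{power})² / (ln HR)². The conversion to subjects below takes the overall event probability as a direct input (`event_prob`, a hypothetical parameter). Deriving that probability from `median_ctrl`, `accrual_time`, `follow_up`, and `drop_rate` is deliberately left out, since it depends on the survival and accrual model assumed.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr, alpha=0.05, power=0.80):
    """Required number of events, 1:1 allocation, two-sided log-rank test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)


def subjects_from_events(events, event_prob):
    """Total subjects given an assumed overall probability of observing an event."""
    return math.ceil(events / event_prob)


events = schoenfeld_events(hr=0.50)
print(events)                                         # 66
print(subjects_from_events(events, event_prob=0.5))   # 132; event_prob is illustrative
```

Because the formula depends on (ln HR)², effects closer to HR = 1 require many more events: HR = 0.80 needs several times as many as HR = 0.50.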
Test 8: One-Way ANOVA (NEW)
When to use: Comparing means across 3 or more independent groups (e.g., comparing AI model performance across 3 architectures, comparing measurement accuracy across multiple readers).
Required parameters:
| Parameter | Description | Default |
|---|---|---|
| `k` | Number of groups | -- |
| `f` | Cohen's f effect size | -- |
| `alpha` | Significance level | 0.05 |
| `power` | Desired power | 0.80 |
| `attrition_rate` | Expected dropout rate | 0.15 |
Help user estimate Cohen's f:
- If the user knows group means and pooled SD: f = sigma_means / pooled_SD
- If the user knows et