Statistical Analysis

Overview

Statistical analysis is the systematic process of selecting appropriate tests, verifying their assumptions, quantifying effect magnitudes, and reporting results. This guide covers test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.

Key Concepts

Frequentist vs Bayesian Framework

Aspect	Frequentist	Bayesian
Core output	p-value, confidence interval	Posterior distribution, credible interval
Interpretation	"How likely are data at least this extreme if H0 is true?"	"How likely is H1 given the data?"
Null support	Cannot support H0 (only fail to reject)	Can quantify evidence for H0 via Bayes Factor
Prior info	Not used	Incorporated via prior distributions
Sample size	Requires adequate power	No fixed minimum, but small samples make results more sensitive to the prior
Best for	Standard analyses, large samples	Small samples, prior info, complex models

Statistical vs Practical Significance

A statistically significant result (p < .05) may reflect an effect that is trivially small in practice. Always report:

Effect size: Magnitude of the effect (Cohen's d, eta-squared, r, R-squared)
Confidence interval: Precision of the estimate
Context: Clinical/practical relevance in the domain
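
To illustrate the gap between statistical and practical significance, here is a minimal Python sketch. The summary statistics are hypothetical (a 0.5-point mean difference, an SD of 10, and 10,000 participants per group), and the function name is illustrative. It uses a large-sample z approximation with a common SD, which is a simplifying assumption rather than an exact t-test.

```python
import math

def z_test_and_cohens_d(mean_diff, sd, n_per_group):
    """Two-sided z-test p-value and Cohen's d for two equal-sized groups.

    Assumes the same SD in both groups (illustrative simplification).
    """
    se = sd * math.sqrt(2 / n_per_group)         # standard error of the mean difference
    z = mean_diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail area
    d = mean_diff / sd                           # standardized effect size
    return p_value, d

# Hypothetical numbers, chosen only for illustration.
p_value, d = z_test_and_cohens_d(mean_diff=0.5, sd=10.0, n_per_group=10_000)
print(f"p = {p_value:.4f}, d = {d:.2f}")
```

With these inputs the p-value falls far below .05 even though d is only 0.05, well under the "small" benchmark of 0.20.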

Common Effect Sizes

Test	Effect Size	Small	Medium	Large
t-test	Cohen's d	0.20	0.50	0.80
t-test (small n)	Hedges' g	0.20	0.50	0.80
ANOVA	Partial eta-squared	0.01	0.06	0.14
ANOVA	omega-squared	0.01	0.06	0.14
Correlation	r	0.10	0.30	0.50
Regression	R-squared	0.02	0.13	0.26
Regression	f-squared	0.02	0.15	0.35
Chi-square	Cramer's V (when min(rows, columns) - 1 = 2)	0.07	0.21	0.35
Chi-square 2x2	phi coefficient	0.10	0.30	0.50

Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
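
As a rough sketch of how a few of these effect sizes can be computed, the Python below implements Cohen's d with a pooled standard deviation, Hedges' g using the common approximate small-sample correction, and the conversion from R-squared to f-squared. The function names and the sample data are illustrative assumptions, not part of any particular library.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def hedges_g(a, b):
    """Cohen's d shrunk by the approximate correction 1 - 3 / (4 * df - 1)."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))

def f_squared(r_squared):
    """Cohen's f-squared from R-squared."""
    return r_squared / (1 - r_squared)

# Hypothetical scores for two small groups.
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(round(cohens_d(treatment, control), 3))
print(round(hedges_g(treatment, control), 3))
print(round(f_squared(0.13), 3))
```

The f-squared conversion also shows why the R-squared and f-squared rows of the table line up: an R-squared of 0.13 corresponds to an f-squared of about 0.15.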

Assumptions Overview

Most parametric tests require:

Independence: Observations are independent of each other
Normality: Data (or residuals) are approximately normally distributed
Homogeneity of variance: Groups have similar variances (for group comparisons)
Linearity: Relationship between variables is linear (for regression)

When assumptions are violated:

Normality violated, n >= 30: Usually proceed -- parametric tests are reasonably robust with large samples, though heavy skew or outliers can still distort results
Normality violated, n < 30: Use a non-parametric alternative
Variance heterogeneity: Use Welch's correction (t-test) or Welch's ANOVA
Linearity violated: Add polynomial terms, transform variables, or use GAMs
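
For the variance-heterogeneity case, the following is a minimal dependency-free sketch of Welch's correction: the t statistic uses separate variance estimates, and the degrees of freedom come from the Welch-Satterthwaite approximation. The function name and data are hypothetical; in practice a statistics library would normally do this for you.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical groups with clearly different spreads.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, df = welch_t(group_a, group_b)
print(round(t_stat, 3), round(df, 2))
```

Note that the degrees of freedom are generally not an integer and are smaller than the n1 + n2 - 2 used by the pooled-variance t-test.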

Test-Specific Assumption Workflows

T-test assumptions: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
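
The independent-groups branch of this workflow might be sketched with SciPy as follows. The function name, the alpha used for the assumption checks, and the data are illustrative assumptions; the Q-Q plot inspection is a visual step and is not covered here.

```python
from scipy import stats

def compare_two_independent_groups(a, b, alpha=0.05):
    """Pick and run a two-group test based on Shapiro-Wilk and Levene checks.

    Returns (test_name, p_value). Using alpha as the cutoff for the
    assumption checks is an illustrative choice, not a fixed rule.
    """
    normal = all(stats.shapiro(g).pvalue > alpha for g in (a, b))
    if not normal:
        result = stats.mannwhitneyu(a, b, alternative="two-sided")
        return "Mann-Whitney U", result.pvalue
    equal_var = stats.levene(a, b).pvalue > alpha
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    name = "Student's t-test" if equal_var else "Welch's t-test"
    return name, result.pvalue

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.5, 5.7, 5.3, 5.4]
group_b = [6.1, 6.4, 5.9, 6.8, 6.3, 6.6, 6.0, 6.5, 6.2, 6.7]
test_name, p_value = compare_two_independent_groups(group_a, group_b)
print(test_name, p_value)
```

For paired data the same idea applies with `stats.ttest_rel` and `stats.wilcoxon`, checking normality on the pairwise differences instead of on each group.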

ANOVA assumptions: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
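
A possible sketch of the ANOVA workflow, again with SciPy, is shown below. The helper names and data are hypothetical. Two simplifications are assumed: unequal variances are only flagged (no Welch's ANOVA is run), and the sphericity helper expects Mauchly's p-value and epsilon to have been computed elsewhere.

```python
from scipy import stats

def one_way_comparison(*groups, alpha=0.05):
    """Independent-groups branch: Shapiro-Wilk per group, then ANOVA or Kruskal-Wallis.

    The returned flag reports whether Levene's test suggests unequal variances.
    """
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    unequal_var = stats.levene(*groups).pvalue <= alpha
    if normal:
        test, p = "One-way ANOVA", stats.f_oneway(*groups).pvalue
    else:
        test, p = "Kruskal-Wallis", stats.kruskal(*groups).pvalue
    return {"test": test, "p_value": p, "unequal_variances": unequal_var}

def sphericity_correction(mauchly_p, epsilon, alpha=0.05):
    """Choose a repeated-measures correction from Mauchly's p and epsilon."""
    if mauchly_p > alpha:
        return None   # sphericity not rejected, no correction needed
    # The guideline leaves epsilon == 0.75 unspecified; Huynh-Feldt is assumed here.
    return "Greenhouse-Geisser" if epsilon < 0.75 else "Huynh-Feldt"

# Hypothetical scores for three independent groups.
g1 = [4.1, 3.9, 4.5, 4.8, 4.2, 4.4, 4.0, 4.6]
g2 = [5.0, 5.3, 4.9, 5.6, 5.1, 5.4, 5.2, 4.8]
g3 = [6.2, 5.9, 6.5, 6.1, 6.4, 6.0, 6.3, 5.8]
result = one_way_comparison(g1, g2, g3)
print(result["test"], result["p_value"], result["unequal_variances"])
```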

Linear regression assumptions: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
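
Two of these diagnostics are simple enough to sketch directly in NumPy: the Durbin-Watson statistic from a residual vector, and the VIF of each predictor obtained by regressing it on the others. The function names and the simulated predictors are assumptions made for illustration; dedicated statistics packages provide tested versions of both.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def vif(X):
    """Variance inflation factor per column of X (X must not include an intercept)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        factors.append(1 / (1 - r2))
    return np.array(factors)

# Simulated predictors: x3 is almost a copy of x1, x2 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)
print(durbin_watson([1, -1, 1, -1]))
```

In this simulation the near-duplicate predictors x1 and x3 should show VIFs far above 10, while x2 stays close to 1. The alternating residuals give a Durbin-Watson value above 2.5, which points to negative autocorrelation.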

Logistic regression assumptions: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).
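
The sample-size rule of thumb can be checked with a one-line calculation. The sketch below assumes that "events" means the rarer of the two outcome classes, which is a common convention but not stated in the text; the function name and counts are hypothetical.

```python
def events_per_predictor(n_events, n_non_events, n_predictors):
    """Events per predictor, counting the rarer outcome class as the events."""
    return min(n_events, n_non_events) / n_predictors

# Hypothetical study: 60 events, 340 non-events, 5 candidate predictors.
epv = events_per_predictor(n_events=60, n_non_events=340, n_predictors=5)
print(epv)
```

Here the ratio is 12, which sits inside the 10-20 range. Adding a sixth predictor would bring it down to exactly 10.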

Specialized Test Categories

Beyond the main decision flowchart, several specialized test families address specific data types:

Survival / time-to-event analysis:

Log-rank test: Compares survival curves between groups (non-parametric)
Cox proportional hazards: Models time-to-event with covariates; assumes proportional hazards
Parametric survival models: Weibull, exponential, log-normal for known distributional forms
Use when outcome is time until an event (death, relapse, failure) with possible censoring
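
To make the role of censoring concrete, here is a small dependency-free sketch of the Kaplan-Meier estimator, the usual way the survival curves compared by a log-rank test are estimated. Kaplan-Meier is not named in the list above, so treat this as an illustrative addition; the function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimates at each distinct event time.

    times: follow-up time per subject; events: 1 if the event occurred,
    0 if the subject was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        while i < len(data) and data[i][0] == t:   # handle ties at time t
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # censored subjects leave the risk set without a drop
    return curve

# Hypothetical follow-up data: subjects at times 2 and 4 are censored.
curve = kaplan_meier(times=[1, 2, 3, 4, 5], events=[1, 0, 1, 0, 1])
print(curve)
```

Censored subjects reduce the number at risk for later event times but do not themselves lower the survival estimate.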

Count outcome models:

Poisson regression: For count data where mean approximately equals variance
Negative binomial regression: For overdispersed counts (variance > mean)
Zero-inflated models: For excess zeros beyond what Poisson/NB predicts
Use when outcome is a count (number of events, incidents, occurrences)
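
A quick first look at which count model might fit can be sketched as below: the variance-to-mean ratio hints at overdispersion, and comparing the observed share of zeros with the share a Poisson distribution of the same mean would predict hints at zero inflation. These are marginal checks that ignore covariates, so they are a screening aid under that assumption, not a substitute for model-based diagnostics. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by the mean (about 1 for Poisson-like data)."""
    return variance(counts) / mean(counts)

def excess_zero_share(counts):
    """Observed share of zeros minus the share a Poisson with the same mean predicts."""
    observed = counts.count(0) / len(counts)
    expected = math.exp(-mean(counts))
    return observed - expected

# Hypothetical event counts.
counts = [0, 1, 2, 3, 4]
print(dispersion_ratio(counts))
print(round(excess_zero_share(counts), 4))
```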

Agreement and reliability:

Cohen's kappa: Inter-rater agreement for categorical ratings (2 raters)
Fleiss' kappa / Krippendorff's alpha: Agreement for >2 raters
Intraclass correlation coefficient (ICC): Continuous ratings reliability
Cronbach's alpha: Internal consistency of multi-item scales
Bland-Altman analysis: Agreement between two measurement methods (continuous)
Use when assessing measurement reliability or inter-rater consistency
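
As an example of how one of these statistics works, Cohen's kappa compares observed agreement with the agreement expected by chance from each rater's marginal frequencies. The sketch below is a minimal unweighted version; the function name and ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters over the same items."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    counts1, counts2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of the two raters' marginal proportions per category.
    p_expected = sum(counts1[k] * counts2[k] for k in counts1) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical categorical ratings of four items.
kappa = cohens_kappa(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(kappa)
```

In this example the raters agree on 75% of items, but half of that agreement is expected by chance, so kappa is 0.5.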

Categorical data extensions:

McNemar's test: Paired binary outcomes (2x2)
Cochran's Q test: Paired binary outcomes (3+ conditions)
Cochran-Armitage trend test: Ordered categories in contingency tables
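
McNemar's test uses only the discordant pairs of the paired 2x2 table. A minimal sketch without continuity correction follows; the function name and counts are hypothetical, and with few discordant pairs an exact binomial version would be preferable.

```python
import math

def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) and its p-value on 1 df.

    b and c are the two discordant cell counts of the paired 2x2 table.
    """
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))   # upper tail of chi-square with 1 df
    return statistic, p_value

# Hypothetical discordant counts: 10 pairs changed one way, 2 the other.
statistic, p_value = mcnemar(b=10, c=2)
print(round(statistic, 3), round(p_value, 4))
```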

Decision Framework

Test Selection Flowchart

What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
|   |
|   +-- How many groups?
|   |   +-- 2 groups
|   |   |   +-- Independent -> Independent t-test (or Mann-Whitney U)
|   |   |   +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
|   |   +-- 3+ groups
|   |      +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
|   |      +-- Repeated -> Repeated-measures ANOVA (or Friedman)
|   |
|   +-- Multiple factors? -> Factorial ANOVA / Mixed ANOVA
|   +-- With covariates? -> ANCOVA
|
+-- Testing a RELATIONSHIP between variables?
|   |
|   +-- Both continuous?
|   |   +-- Normal -> Pearson correlation
|   |   +-- Non-normal or ordinal -> Spearman correlation
|   |
|   +-- Predicting continuous outcome?
|   |   +-- 1 predictor -> Simple linear regression
|   |   +-- Multiple predictors -> Multiple linear regression
|   |
|   +-- Predicting categorical outcome?
|   |   +-- Binary -> Logistic regression
|   |   +-- Ordinal -> Ordinal logistic regression
|   |
|   +-- Predicting count outcome?
|   |   +-- Equidispersed -> Poisson regression
|   |   +-- Overdispersed -> Negative binomial regression
|   |   +-- Excess zeros -> Zero-inflated Poisson/NB
|   |
|   +-- Time-to-event outcome?
|       +-- Compare survival curves -> Log-rank test
|       +-- With covariates -> Cox proportional hazards
|
+-- Testing ASSOCIATION between categorical variables?
|   +-- Expected cell count >= 5 -> Chi-square test
|   +-- Expected cell count < 5 -> Fisher's exact test
|   +-- Ordered categories -> Cochran-Armitage trend test
|   +-- Paired categories -> McNemar's test
|
+-- Assessing AGREEMENT / RELIABILITY?
    +-- Categorical, 2 raters -> Cohen's kappa
    +-- Categorical, >2 raters -> Fleiss' kappa
    +-- Continuous ratings -> ICC
    +-- Two measurement methods -> Bland-Altman analysis
    +-- Internal consistency -> Cronbach's alpha
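
The group-comparison branch of the flowchart can also be encoded as a small lookup, sketched here in Python. The table and function names are illustrative assumptions, and only the first branch is covered.

```python
# Keys: (number of groups capped at 3, paired/repeated design?)
# Values: (parametric test, non-parametric alternative)
GROUP_TESTS = {
    (2, False): ("Independent t-test", "Mann-Whitney U"),
    (2, True): ("Paired t-test", "Wilcoxon signed-rank"),
    (3, False): ("One-way ANOVA", "Kruskal-Wallis"),
    (3, True): ("Repeated-measures ANOVA", "Friedman"),
}

def choose_group_test(n_groups, paired, normal=True):
    """Return the test suggested by the flowchart's group-comparison branch."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    parametric, nonparametric = GROUP_TESTS[(min(n_groups, 3), paired)]
    return parametric if normal else nonparametric

print(choose_group_test(2, paired=False))
print(choose_group_test(4, paired=True, normal=False))
```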

Quick Reference Table

Research Question	Data Type	Normal?	Test	Non-parametric Alternative
2 independent groups	Continuous	Yes	Independent t-test	Mann-Whitney U
2 paired groups	Continuous	Yes	Paired t-test	Wilcoxon signed-rank
3+ independent groups	Continuous	Yes	One-way ANOVA	Kruskal-Wallis
3+ repeated groups	Continuous	Yes	Repeated-measures ANOVA	Friedman test
2 variables	Continuous	Yes	Pearson r	Spearman rho
Predict continu
