Statistical Analysis
Overview
Statistical analysis is the systematic process of selecting appropriate tests, verifying assumptions, quantifying effect magnitudes, and reporting results. This guide covers test selection, assumption diagnostics, and APA-style reporting for frequentist and Bayesian analyses in academic research.
Key Concepts
Frequentist vs Bayesian Framework
| Aspect | Frequentist | Bayesian |
|---|---|---|
| Core output | p-value, confidence interval | Posterior distribution, credible interval |
| Interpretation | "How likely are data at least this extreme if H0 is true?" | "How likely is H1 given the data?" |
| Null support | Cannot support H0 (only fail to reject) | Can quantify evidence for H0 via Bayes Factor |
| Prior info | Not used | Incorporated via prior distributions |
| Sample size | Requires adequate power | No minimum sample size, but small samples make results sensitive to the prior |
| Best for | Standard analyses, large samples | Small samples, prior info, complex models |
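The "Null support" row can be made concrete with a small binomial example. The sketch below assumes a point null H0: theta = 0.5 and a uniform Beta(1, 1) prior on theta under H1; the prior and the function name `bf01_binomial` are illustrative choices, not something this guide prescribes.

```python
from math import comb


def bf01_binomial(successes: int, trials: int, theta0: float = 0.5) -> float:
    """Bayes factor BF01 for H0: theta = theta0 vs H1: theta ~ Uniform(0, 1)."""
    # Likelihood of the observed count under the point null.
    like_h0 = (
        comb(trials, successes)
        * theta0**successes
        * (1 - theta0) ** (trials - successes)
    )
    # Marginal likelihood under a uniform prior: every count from 0 to
    # `trials` is equally likely, so it equals 1 / (trials + 1).
    marginal_h1 = 1 / (trials + 1)
    return like_h0 / marginal_h1


# Hypothetical data: 10 successes in 20 trials.
# BF01 > 1 means the data favor H0 over H1; BF01 < 1 means the reverse.
print(round(bf01_binomial(10, 20), 2))
```

A frequentist test on the same data could only fail to reject H0, whereas a BF01 above 1 is read as positive evidence for it. A different prior under H1 would change the value.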
Statistical vs Practical Significance
A statistically significant result (p < .05) may be trivially small in practice. Always report:
- Effect size: Magnitude of the effect (Cohen's d, eta-squared, r, R-squared)
- Confidence interval: Precision of the estimate
- Context: Clinical/practical relevance in the domain
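The gap between statistical and practical significance is easy to see numerically. The sketch below assumes a two-sample z-test with known unit standard deviations and equal group sizes (a simplification chosen so that only the standard library is needed); `two_sample_z_p` is a hypothetical helper name.

```python
from math import erfc, sqrt


def two_sample_z_p(d: float, n_per_group: int) -> float:
    """Two-sided p-value for a standardized mean difference d (SD assumed = 1)."""
    z = d * sqrt(n_per_group / 2)  # z = d / sqrt(1/n + 1/n)
    return erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


d = 0.02  # far below Cohen's "small" benchmark of 0.20
for n in (100, 10_000, 100_000):
    print(n, two_sample_z_p(d, n))
```

The effect size stays fixed at d = 0.02 while the p-value shrinks as n grows, which is why the effect size and its confidence interval belong next to every p-value.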
Common Effect Sizes
| Test | Effect Size | Small | Medium | Large |
|---|---|---|---|---|
| t-test | Cohen's d | 0.20 | 0.50 | 0.80 |
| t-test (small n) | Hedges' g | 0.20 | 0.50 | 0.80 |
| ANOVA | Partial eta-squared | 0.01 | 0.06 | 0.14 |
| ANOVA | omega-squared | 0.01 | 0.06 | 0.14 |
| Correlation | r | 0.10 | 0.30 | 0.50 |
| Regression | R-squared | 0.02 | 0.13 | 0.26 |
| Regression | f-squared | 0.02 | 0.15 | 0.35 |
| Chi-square | Cramer's V (df* = 2, where df* = min(rows, columns) - 1) | 0.07 | 0.21 | 0.35 |
| Chi-square 2x2 | phi coefficient | 0.10 | 0.30 | 0.50 |
Cohen's benchmarks are guidelines, not rigid thresholds -- domain context always matters.
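As a minimal sketch of the first two rows of the table, the functions below compute Cohen's d with a pooled standard deviation and Hedges' g with the common approximate correction factor 1 - 3 / (4 * df - 1). The function names, the "negligible" label, and the sample data in the usage lines are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled SD."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def hedges_g(a, b):
    """Hedges' g: Cohen's d times an approximate small-sample correction."""
    df = len(a) + len(b) - 2
    return cohens_d(a, b) * (1 - 3 / (4 * df - 1))


def label(effect, small=0.20, medium=0.50, large=0.80):
    """Map |effect| onto Cohen's benchmark labels for d (guidelines only)."""
    size = abs(effect)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "negligible"  # label chosen for this sketch


# Hypothetical scores for two small groups.
treatment = [2, 4, 6, 8]
control = [1, 3, 5, 7]
print(cohens_d(treatment, control), hedges_g(treatment, control))
```

With small groups g is noticeably smaller than d; as n grows the correction factor approaches 1 and the two converge.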
Assumptions Overview
Most parametric tests require:
- Independence: Observations are independent of each other
- Normality: Data (or residuals) are approximately normally distributed
- Homogeneity of variance: Groups have similar variances (for group comparisons)
- Linearity: Relationship between variables is linear (for regression)
When assumptions are violated:
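- Normality violated, n >= 30 per group: Usually proceed -- by the central limit theorem, parametric tests are fairly robust with large samples, though severe skew or extreme outliers can still distort results
- Normality violated, n < 30 per group: Use a non-parametric alternative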
- Variance heterogeneity: Use Welch's correction (t-test) or Welch's ANOVA
- Linearity violated: Add polynomial terms, transform variables, or use GAMs
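To show what Welch's correction changes, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand. `welch_t` is a hypothetical helper; turning t and df into a p-value requires a t-distribution, which the standard library does not provide, so that step is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se1 = variance(a) / len(a)  # squared standard error of each group mean
    se2 = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1**2 / (len(a) - 1) + se2**2 / (len(b) - 1))
    return t, df


# Hypothetical data with very different spreads in the two groups.
t_stat, df = welch_t([1, 2, 3, 4], [0, 10, 20, 30])
print(t_stat, df)
```

Unlike Student's t-test, no pooled variance is used, and the degrees of freedom typically fall below n1 + n2 - 2 when the group variances differ.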
Test-Specific Assumption Workflows
T-test assumptions: (1) Check normality per group with Shapiro-Wilk + Q-Q plots. (2) Check homogeneity with Levene's test. (3) If normality violated: Mann-Whitney U (independent) or Wilcoxon signed-rank (paired). If variance heterogeneity: use Welch's t-test.
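The t-test workflow above can be sketched as a small decision function. The boolean flags are assumed to come from your own diagnostics (for example Shapiro-Wilk and Levene's test results); the function name and argument names are hypothetical.

```python
def choose_two_group_test(paired: bool, normal: bool, equal_variances: bool = True) -> str:
    """Pick a two-group test from the outcomes of the assumption checks."""
    if paired:
        # Homogeneity of variance is not checked here: a paired test
        # works on the within-pair differences.
        return "Paired t-test" if normal else "Wilcoxon signed-rank"
    if not normal:
        return "Mann-Whitney U"
    return "Independent t-test" if equal_variances else "Welch's t-test"


print(choose_two_group_test(paired=False, normal=True, equal_variances=False))
```

For paired designs, `normal` is best read as normality of the difference scores, which is an interpretation added in this sketch.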
ANOVA assumptions: (1) Normality per group. (2) Homogeneity via Levene's test. (3) For repeated measures: check sphericity (Mauchly's test); if violated, apply Greenhouse-Geisser (epsilon < 0.75) or Huynh-Feldt (epsilon > 0.75) correction. (4) If normality violated: Kruskal-Wallis (independent) or Friedman (repeated).
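The sphericity rule in step (3) can be sketched as follows. The function names are hypothetical, the 0.05 threshold for Mauchly's test is an assumed default, and the rule above does not say what to do at exactly epsilon = 0.75, so this sketch assigns that case to Greenhouse-Geisser.

```python
def sphericity_correction(mauchly_p: float, epsilon: float, alpha: float = 0.05) -> str:
    """Choose a degrees-of-freedom correction for repeated-measures ANOVA."""
    if mauchly_p >= alpha:
        return "none"  # sphericity not rejected
    # Boundary case epsilon == 0.75 goes to Greenhouse-Geisser (assumption).
    return "Huynh-Feldt" if epsilon > 0.75 else "Greenhouse-Geisser"


def corrected_df(k_levels: int, n_subjects: int, epsilon: float):
    """Scale both F-test degrees of freedom by epsilon (one within factor)."""
    df_effect = (k_levels - 1) * epsilon
    df_error = (k_levels - 1) * (n_subjects - 1) * epsilon
    return df_effect, df_error


print(sphericity_correction(mauchly_p=0.01, epsilon=0.62))
print(corrected_df(k_levels=3, n_subjects=11, epsilon=0.62))
```

Either correction leaves the F statistic unchanged and only shrinks the degrees of freedom, which makes the test more conservative.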
Linear regression assumptions: (1) Linearity via residuals-vs-fitted plot. (2) Independence via Durbin-Watson test (1.5-2.5 acceptable). (3) Homoscedasticity via Breusch-Pagan test + scale-location plot. (4) Normality of residuals via Q-Q plot + Shapiro-Wilk. (5) Multicollinearity via VIF (>10 = severe, >5 = moderate).
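Two of the diagnostics above are simple enough to compute by hand. The sketch below implements the Durbin-Watson statistic from a residual series and the VIF for the special case of exactly two predictors, where it reduces to 1 / (1 - r^2). All function names and the "acceptable" label are illustrative; models with more predictors need the R-squared from regressing each predictor on the others.

```python
from math import sqrt
from statistics import mean


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return num / sum(e**2 for e in residuals)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    return 1 / (1 - pearson_r(x1, x2) ** 2)


def vif_label(vif):
    """Apply the >10 severe / >5 moderate cutoffs quoted above."""
    if vif > 10:
        return "severe"
    if vif > 5:
        return "moderate"
    return "acceptable"


# Hypothetical residuals that alternate in sign (negative autocorrelation).
dw = durbin_watson([1, -1, 1, -1])
print(dw, 1.5 <= dw <= 2.5)
```

Durbin-Watson is meaningful only when the observations have a natural order, such as time.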
Logistic regression assumptions: (1) Independence. (2) Linearity of log-odds with continuous predictors (Box-Tidwell test). (3) No perfect multicollinearity (VIF). (4) Adequate sample size (10-20 events per predictor minimum).
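The sample-size rule in step (4) is a one-line calculation. The sketch below assumes that "events" means the rarer of the two outcome classes, which is a common convention but is not stated above; the function names and the example counts are hypothetical.

```python
def events_per_predictor(n_events: int, n_non_events: int, n_predictors: int) -> float:
    """Events per predictor, counting the rarer outcome class as the events."""
    return min(n_events, n_non_events) / n_predictors


def sample_size_ok(n_events: int, n_non_events: int, n_predictors: int, minimum: float = 10) -> bool:
    """Check the ratio against the lower end of the 10-20 guideline."""
    return events_per_predictor(n_events, n_non_events, n_predictors) >= minimum


# Hypothetical study: 60 events, 440 non-events.
print(events_per_predictor(60, 440, 5), sample_size_ok(60, 440, 5))
print(events_per_predictor(60, 440, 8), sample_size_ok(60, 440, 8))
```

Count every estimated coefficient as a predictor, including each dummy variable of a categorical predictor and any interaction terms.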
Specialized Test Categories
Beyond the main decision flowchart, several specialized test families address specific data types:
Survival / time-to-event analysis:
- Log-rank test: Compares survival curves between groups (non-parametric)
- Cox proportional hazards: Models time-to-event with covariates; assumes proportional hazards
- Parametric survival models: Weibull, exponential, log-normal for known distributional forms
- Use when outcome is time until an event (death, relapse, failure) with possible censoring
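A minimal sketch of the two-group log-rank test, written with the standard library only so the arithmetic is visible. `logrank_test` is a hypothetical function, the data in the usage lines are made up, and a dedicated survival package is the better choice for real analyses.

```python
from math import erfc, sqrt


def logrank_test(times1, events1, times2, events2):
    """Two-group log-rank test; events are 1 (observed) or 0 (censored)."""
    data = [(t, e, 1) for t, e in zip(times1, events1)]
    data += [(t, e, 2) for t, e in zip(times2, events2)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed1 = expected1 = var = 0.0
    for t in event_times:
        # At risk at time t: everyone whose follow-up lasts at least until t.
        n1 = sum(1 for u, _, g in data if u >= t and g == 1)
        n2 = sum(1 for u, _, g in data if u >= t and g == 2)
        d1 = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        n = n1 + n2
        observed1 += d1
        expected1 += d * n1 / n  # expected events in group 1 under H0
        if n > 1:  # the hypergeometric variance needs at least two subjects
            var += n1 * n2 * d * (n - d) / (n**2 * (n - 1))

    # Raises ZeroDivisionError if no event time has both groups at risk.
    chi2 = (observed1 - expected1) ** 2 / var
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical follow-up times; the 0 marks a censored observation.
chi2, p = logrank_test([2, 3, 5, 8], [1, 1, 0, 1], [6, 9, 12, 15], [1, 1, 1, 0])
print(chi2, p)
```

Censored subjects contribute to the at-risk counts up to their censoring time but never to the event counts, which is how the test uses partial follow-up.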
Count outcome models:
- Poisson regression: For count data where mean approximately equals variance
- Negative binomial regression: For overdispersed counts (variance > mean)
- Zero-inflated models: For excess zeros beyond what Poisson/NB predicts
- Use when outcome is a count (number of events, incidents, occurrences)
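A rough first screen for choosing among these models is to compare the sample variance with the mean and the observed share of zeros with the share a Poisson distribution would predict, exp(-mean). The sketch below works on raw counts without covariates, and both cutoffs are hypothetical defaults rather than established rules; a proper check examines dispersion of the fitted model.

```python
from math import exp
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson-like data."""
    return variance(counts) / mean(counts)


def suggest_count_model(counts, ratio_cutoff=1.5, zero_gap_cutoff=0.10):
    """Suggest a count model from a list of non-negative integer counts."""
    m = mean(counts)
    # Observed share of zeros minus the share expected under Poisson(m).
    zero_gap = counts.count(0) / len(counts) - exp(-m)
    if zero_gap > zero_gap_cutoff:
        return "zero-inflated Poisson/NB"
    if dispersion_ratio(counts) > ratio_cutoff:
        return "negative binomial"
    return "Poisson"


# Hypothetical incident counts.
print(suggest_count_model([0, 1, 2, 3, 4]))
print(suggest_count_model([1, 1, 2, 20, 1]))
```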
Agreement and reliability:
- Cohen's kappa: Inter-rater agreement for categorical ratings (2 raters)
- Fleiss' kappa / Krippendorff's alpha: Agreement for >2 raters
- Intraclass correlation coefficient (ICC): Continuous ratings reliability
- Cronbach's alpha: Internal consistency of multi-item scales
- Bland-Altman analysis: Agreement between two measurement methods (continuous)
- Use when assessing measurement reliability or inter-rater consistency
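Two of these coefficients are short enough to write out. The sketch below implements unweighted Cohen's kappa for two raters and Cronbach's alpha from per-item score lists; the function names and the example ratings are illustrative.

```python
from collections import Counter
from statistics import variance


def cohens_kappa(rater1, rater2):
    """Unweighted Cohen's kappa for two raters assigning categorical labels."""
    n = len(rater1)
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label proportions.
    p_chance = sum(c1[k] * c2[k] for k in c1) / n**2
    return (p_observed - p_chance) / (1 - p_chance)


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one score list per item, same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical ratings from two raters on four cases.
print(cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"]))
```

Kappa corrects raw percent agreement for the agreement expected by chance, so 75% raw agreement can correspond to a much lower kappa when one label dominates.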
Categorical data extensions:
- McNemar's test: Paired binary outcomes (2x2)
- Cochran's Q test: Paired binary outcomes (3+ conditions)
- Cochran-Armitage trend test: Ordered categories in contingency tables
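For the simplest of these, McNemar's test, only the two discordant cells of the paired 2x2 table matter. The sketch below uses the chi-square form without continuity correction; `mcnemar_test` and the counts are hypothetical, and an exact binomial version is often preferred when the discordant pairs are few.

```python
from math import erfc, sqrt


def mcnemar_test(b: int, c: int):
    """McNemar's chi-square (no continuity correction) from discordant pair counts.

    b = pairs that went from "yes" to "no", c = pairs that went from "no" to "yes".
    """
    chi2 = (b - c) ** 2 / (b + c)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical before/after study with 10 and 5 discordant pairs.
print(mcnemar_test(10, 5))
```

Concordant pairs (yes/yes and no/no) carry no information about change and do not enter the statistic.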
Decision Framework
Test Selection Flowchart
What is your research question?
|
+-- Comparing GROUPS on a continuous outcome?
|   |
|   +-- How many groups?
|   |   +-- 2 groups
|   |   |   +-- Independent -> Independent t-test (or Mann-Whitney U)
|   |   |   +-- Paired/repeated -> Paired t-test (or Wilcoxon signed-rank)
|   |   +-- 3+ groups
|   |       +-- Independent -> One-way ANOVA (or Kruskal-Wallis)
|   |       +-- Repeated -> Repeated-measures ANOVA (or Friedman)
|   |
|   +-- Multiple factors? -> Factorial ANOVA / Mixed ANOVA
|   +-- With covariates? -> ANCOVA
|
+-- Testing a RELATIONSHIP between variables?
|   |
|   +-- Both continuous?
|   |   +-- Normal -> Pearson correlation
|   |   +-- Non-normal or ordinal -> Spearman correlation
|   |
|   +-- Predicting continuous outcome?
|   |   +-- 1 predictor -> Simple linear regression
|   |   +-- Multiple predictors -> Multiple linear regression
|   |
|   +-- Predicting categorical outcome?
|   |   +-- Binary -> Logistic regression
|   |   +-- Ordinal -> Ordinal logistic regression
|   |
|   +-- Predicting count outcome?
|   |   +-- Equidispersed -> Poisson regression
|   |   +-- Overdispersed -> Negative binomial regression
|   |   +-- Excess zeros -> Zero-inflated Poisson/NB
|   |
|   +-- Time-to-event outcome?
|       +-- Compare survival curves -> Log-rank test
|       +-- With covariates -> Cox proportional hazards
|
+-- Testing ASSOCIATION between categorical variables?
|   +-- Expected cell count >= 5 -> Chi-square test
|   +-- Expected cell count < 5 -> Fisher's exact test
|   +-- Ordered categories -> Cochran-Armitage trend test
|   +-- Paired categories -> McNemar's test
|
+-- Assessing AGREEMENT / RELIABILITY?
    +-- Categorical, 2 raters -> Cohen's kappa
    +-- Categorical, >2 raters -> Fleiss' kappa
    +-- Continuous ratings -> ICC
    +-- Two measurement methods -> Bland-Altman analysis
    +-- Internal consistency -> Cronbach's alpha
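The "Comparing GROUPS" branch of the flowchart can be sketched as a lookup table. The dictionary layout, the function name, and the boolean `normal` flag are assumptions made for illustration; only single-factor designs are covered, so factorial, mixed, and ANCOVA cases are out of scope here.

```python
# (group count category, repeated measures?) -> (parametric, non-parametric)
GROUP_TESTS = {
    ("2", False): ("Independent t-test", "Mann-Whitney U"),
    ("2", True): ("Paired t-test", "Wilcoxon signed-rank"),
    ("3+", False): ("One-way ANOVA", "Kruskal-Wallis"),
    ("3+", True): ("Repeated-measures ANOVA", "Friedman"),
}


def select_group_test(n_groups: int, repeated: bool, normal: bool) -> str:
    """Walk the group-comparison branch for a single-factor design."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    key = ("2" if n_groups == 2 else "3+", repeated)
    parametric, non_parametric = GROUP_TESTS[key]
    return parametric if normal else non_parametric


print(select_group_test(n_groups=3, repeated=False, normal=False))
```

Keeping the mapping in a table rather than nested conditionals makes it easy to audit against the flowchart and to extend with further branches.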
Quick Reference Table
| Research Question | Data Type | Normal? | Test | Non-parametric Alternative |
|---|---|---|---|---|
| 2 independent groups | Continuous | Yes | Independent t-test | Mann-Whitney U |
| 2 paired groups | Continuous | Yes | Paired t-test | Wilcoxon signed-rank |
| 3+ independent groups | Continuous | Yes | One-way ANOVA | Kruskal-Wallis |
| 3+ repeated groups | Continuous | Yes | Repeated-measures ANOVA | Friedman test |
| 2 variables | Continuous | Yes | Pearson r | Spearman rho |
| Predict continu |