Causal Hypothesis Architect
Instructions
1. The Identification Challenge
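- Verify FPCI Resolution: Confirm how random assignment (or the identification strategy) addresses the Fundamental Problem of Causal Inference for this design. Each unit is observed under only one condition, so the design must justify using other units' outcomes to stand in for the missing counterfactual (Druckman 2022).
- Four Prerequisites for Experiments: Before proceeding, verify that the design meets the four prerequisites for causal inference from experiments: (1) random assignment to conditions, (2) exclusion restriction (the only difference between conditions is the treatment itself), (3) SUTVA (Stable Unit Treatment Value Assumption -- every unit assigned to a condition receives the same version of the treatment, so each unit has one well-defined potential outcome per condition), and (4) noninterference between units (one subject's treatment does not affect another's outcome; formally also part of SUTVA) (Druckman 2022).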
- Define the Data Generating Process (DGP): Before drafting the hypothesis, describe the set of rules that governs how the data is created. What are the underlying mechanics of the world being studied?
- Map the Causal Diagram: Where appropriate, draw a DAG. Identify backdoor paths and confirm whether randomization closes them (Mutz 2011).
- Close the Backdoors: State which variables must be controlled for to isolate the treatment effect. If using an experiment, explain how random assignment closes these paths (Mutz 2011).
- SATE vs. PATE: Distinguish between the Sample Average Treatment Effect (SATE) and the Population Average Treatment Effect (PATE). A convenience-sample experiment estimates a SATE; a population-based experiment on a representative sample estimates the PATE directly by design, without requiring statistical modeling or extrapolation (Mutz 2011; Druckman 2022).
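The DGP, backdoor-path, and SATE/PATE points above can be made concrete with a small simulation. The Python below is a hypothetical illustration, not a procedure from the cited sources: the unobserved trait u, every coefficient, and the rule that only units with u > 0.5 enter the convenience sample are invented assumptions. Because a simulation knows both potential outcomes for every unit, it can report the true PATE and SATE that a real study never observes directly.

```python
import random
import statistics

# Hypothetical DGP (illustrative numbers, not from any study): an unobserved
# trait u raises the baseline outcome, raises the chance of self-selecting
# into treatment, and makes the treatment effect larger.
def draw_population(n, rng):
    units = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)              # unobserved confounder
        y0 = 2.0 * u + rng.gauss(0.0, 1.0)   # potential outcome if untreated
        y1 = y0 + 1.0 + 0.5 * u              # potential outcome if treated
        units.append((u, y0, y1))
    return units

def true_ate(units):
    # Mean of unit-level effects: computable only inside a simulation (FPCI).
    return statistics.fmean(y1 - y0 for _, y0, y1 in units)

def diff_in_means(units, treated):
    # Each unit reveals only the potential outcome for its own condition.
    y_t = [y1 for (_, _, y1), d in zip(units, treated) if d]
    y_c = [y0 for (_, y0, _), d in zip(units, treated) if not d]
    return statistics.fmean(y_t) - statistics.fmean(y_c)

rng = random.Random(2024)
population = draw_population(100_000, rng)
pate = true_ate(population)

# Observational data: units choose treatment based on u, leaving the
# backdoor path treatment <- u -> outcome open.
self_selected = [u + rng.gauss(0.0, 1.0) > 0 for u, _, _ in population]
naive = diff_in_means(population, self_selected)

# Experiment: a coin flip determines treatment, deleting every arrow into it.
coin = [rng.random() < 0.5 for _ in population]
randomized = diff_in_means(population, coin)

# Convenience sample: only high-u units volunteer, so randomizing within the
# sample recovers the SATE, which differs from the PATE under heterogeneity.
sample = [unit for unit in population if unit[0] > 0.5]
sate = true_ate(sample)
in_sample = diff_in_means(sample, [rng.random() < 0.5 for _ in sample])

print(f"PATE={pate:.2f} naive={naive:.2f} randomized={randomized:.2f}")
print(f"SATE={sate:.2f} convenience-sample estimate={in_sample:.2f}")
```

With these illustrative numbers, the self-selected contrast should land far above the true PATE of roughly 1 (analytically about 3.5), the randomized contrast close to it, and the convenience-sample experiment close to a SATE of roughly 1.57 rather than the PATE, because the volunteers have systematically larger effects.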
2. Hypothesis Formulation
- Popperian Falsifiability: Frame the hypothesis so that it prohibits at least one "basic statement", a specific, observable state of affairs that, if observed, would contradict and thus falsify the theory.
- The Counterfactual Logic: Every hypothesis must specify a comparison. Define the "untreated" world. If the hypothesis is that X causes Y, what is the specific state of the world where X is absent? Note that in many survey experiments there may be no "pure control" -- each condition provides information, just different information (Druckman 2022). Distinguish between active control groups (which receive different information on the same topic, controlling for the act of receiving information) and passive control groups (no information), as this choice defines the counterfactual and thus the estimand (Stantcheva 2023).
- Directional Clarity: Avoid "existence" claims (e.g., "there is an effect"). Use "ordinal" claims that specify the direction (higher/lower) and, where possible, the expected functional form.
- Beat Credible Competitors: The goal of a hypothesis test is not merely to reject the null of "no effect" but to beat credible alternative explanations. Design experiments that adjudicate between competing theories -- "the point of the experiment is explication, not demonstration" (Sniderman 2018). A hypothesis is stronger when it specifies which competing theoretical account would be undermined by the predicted result.
- Null-by-Design Thinking: If the theory predicts no effect below a threshold of treatment intensity, specify (a) the intensity threshold, (b) the plan for expert-panel review of treatment strength before fielding, and (c) the equivalence bounds for the statistical test. Route such designs to the equivalence test in §3 (Sniderman 2018).
- Estimand Specification: Every hypothesis must map to a specific estimand -- the statistical quantity that, if estimated, would test the hypothesis. State the theoretical estimand (the target quantity, defined outside any statistical model) before choosing the empirical estimand (a function of observable data) and the estimation strategy; each step requires different assumptions and should be argued separately (Lundberg, Johnson, & Stewart 2021). For experimental designs, this typically means specifying: (a) the treatment contrast (what is compared to what), (b) the outcome metric (probability, scale score, etc.), and (c) the model that produces the estimate (e.g., AMCE from a conjoint, ATE from a vignette experiment). A hypothesis without a named estimand is not pre-registrable. Where feasible, declare the design formally using the MIDA framework -- model, inquiry, data strategy, answer strategy -- so that power, bias, and estimator--estimand coherence can be diagnosed computationally before fielding (Blair, Cooper, Coppock, & Humphreys 2019). For information/pedagogical experiments, distinguish between first-stage estimands (the belief or knowledge the treatment shifts) and second-stage estimands (the policy views influenced by those beliefs), with the causal chain explicit (Stantcheva 2023).
- Information Equivalence: In survey experiments, the exclusion restriction manifests as "information equivalence" -- the assumption that a manipulation only affects the intended construct and not background beliefs. If a treatment shifts respondents' perceptions of multiple constructs simultaneously, the estimand becomes ambiguous. Name the information equivalence assumption for each treatment contrast and discuss what would violate it (Stantcheva 2023).
- SESOI Requirement: For every hypothesis test, state the Smallest Effect Size of Interest (SESOI) -- the smallest effect that would be theoretically or practically meaningful. Justify the SESOI based on (a) theoretical predictions, (b) practical significance thresholds, or (c) benchmarks from the literature. A hypothesis without a SESOI cannot be rigorously evaluated (Lakens 2025).
- Disconfirming Evidence: Beyond falsifiability in the abstract, specify concretely what pattern of results would constitute evidence against the hypothesis. For illustration, in a group-threat paradigm: "If the interaction coefficients are jointly significant and indicate that procedural effects vanish when group threat is activated, this would favor group-centric accounts over the normative baseline model."
- Three-Level Specification: Specify each hypothesis at three levels (Lakens 2025): (1) conceptual (the theoretical claim in plain language), (2) operationalized (the specific measures and contrasts), and (3) statistical (the exact test, estimand, and decision rule). The pre-analysis plan should bridge all three levels.
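The estimand, SESOI, and MIDA points above can be combined into a simulation-based design diagnosis. The Python below is a hand-rolled sketch of declaring a design as model, inquiry, data strategy, and answer strategy; it is not the API of DeclareDesign (the R package associated with Blair et al.), and the constant-effect model, normally distributed outcomes, and the example SESOI of 0.4 outcome standard deviations are assumptions for illustration.

```python
import math
import random
import statistics

Z = statistics.NormalDist()  # standard normal, used for critical values


def analytic_power(n_per_arm, sesoi, sd=1.0, alpha=0.05):
    # Closed-form two-sided z-test power for a difference in means.
    delta = sesoi / (sd * math.sqrt(2 / n_per_arm))
    crit = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(crit - delta) + Z.cdf(-crit - delta)


def diagnose(n_per_arm, sesoi, sd=1.0, alpha=0.05, sims=2000, seed=1):
    """Hypothetical MIDA-style diagnosis of a two-arm experiment.

    Model:           Y0 ~ Normal(0, sd), Y1 = Y0 + sesoi (constant effect)
    Inquiry:         the ATE, which equals sesoi under this model
    Data strategy:   complete random assignment, n_per_arm units per arm
    Answer strategy: difference in means with a two-sided z-test
    """
    rng = random.Random(seed)
    crit = Z.inv_cdf(1 - alpha / 2)
    n = 2 * n_per_arm
    estimates, rejections = [], 0
    for _ in range(sims):
        y0 = [rng.gauss(0.0, sd) for _ in range(n)]
        y1 = [y + sesoi for y in y0]
        z = [1] * n_per_arm + [0] * n_per_arm
        rng.shuffle(z)  # complete random assignment
        yt = [y1[i] for i in range(n) if z[i]]
        yc = [y0[i] for i in range(n) if not z[i]]
        est = statistics.fmean(yt) - statistics.fmean(yc)
        se = math.sqrt(statistics.variance(yt) / len(yt)
                       + statistics.variance(yc) / len(yc))
        estimates.append(est)
        rejections += abs(est / se) > crit
    inquiry = sesoi  # the estimand's true value under the model
    return {"bias": statistics.fmean(estimates) - inquiry,
            "power": rejections / sims}


diagnosis = diagnose(n_per_arm=100, sesoi=0.4)
print(diagnosis, analytic_power(100, 0.4))
```

Here the simulated power should agree with the closed-form approximation (about 0.81) to within simulation error, and the bias should be near zero because the difference in means is unbiased for the ATE under complete random assignment. The simulation becomes more useful than the formula once the model is made richer, for example with heterogeneous effects or attrition.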
3. Hypothesis Testing Logic
- Choose the Test Type: Select among NHST, interval, equivalence, and minimum-effect tests based on the theoretical claim (Lakens 2025). Not every hypothesis calls for NHST.
- Equivalence Testing for Null Predictions: When a hypothesis predicts "no meaningful effect," use the TOST (Two One-Sided Tests) procedure rather than interpreting a non-significant p-value as evidence of absence. Specify the equivalence bounds in raw effect size units and justify them. The R package TOSTER implements this procedure (Lakens 2025).
- The Four-Outcome Grid: When combining NHST with equivalence testing, state in advance which of the four outcomes (inconclusive, effect present, effect absent, trivially small) would corroborate, falsify, or leave the hypothesis inconclusive (Lakens 2025).
- Severity as Evaluation Standard: Ensure the preregistered analysis has high power and the prediction is specific enough to be wrong in multiple ways (Lakens 2025).
- Compromise Power for Fixed N: When sample size is constrained by resources or population size, use a compromise power analysis that chooses alpha to minimize the combined (optionally cost-weighted) Type I and Type II error rates. An alpha above 0.05 may be defensible if it reduces this combined error (Lakens 2025).
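The testing logic above can be sketched with a large-sample normal approximation. The Python below is a hypothetical illustration, not TOSTER's implementation: it runs NHST and TOST on an estimate and its standard error, maps the pair of decisions onto the four-outcome grid, and grid-searches a compromise alpha for a two-arm design. The equivalence bounds, standard errors, sample size, SESOI, and cost_ratio values are assumptions chosen for the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal (large-sample approximation)


def nhst_p(est, se):
    # Two-sided p-value for H0: effect = 0.
    return 2 * (1 - Z.cdf(abs(est) / se))


def tost_p(est, se, low, high):
    # Two one-sided tests against the equivalence bounds [low, high].
    p_lower = 1 - Z.cdf((est - low) / se)  # H0: effect <= low
    p_upper = Z.cdf((est - high) / se)     # H0: effect >= high
    return max(p_lower, p_upper)           # equivalence needs both rejected


def four_outcome(est, se, low, high, alpha=0.05):
    different = nhst_p(est, se) < alpha
    equivalent = tost_p(est, se, low, high) < alpha
    if different and equivalent:
        return "trivially small"
    if different:
        return "effect present"
    if equivalent:
        return "effect absent"
    return "inconclusive"


def power_two_arm(alpha, n_per_arm, sesoi, sd=1.0):
    # Two-sided z-test power for a difference in means at the SESOI.
    delta = sesoi / (sd * (2 / n_per_arm) ** 0.5)
    crit = Z.inv_cdf(1 - alpha / 2)
    return 1 - Z.cdf(crit - delta) + Z.cdf(-crit - delta)


def compromise_alpha(n_per_arm, sesoi, sd=1.0, cost_ratio=1.0):
    # cost_ratio = cost of a Type I error relative to a Type II error
    # (1.0 treats them as equally bad; an assumption to justify per study).
    def weighted_error(alpha):
        beta = 1 - power_two_arm(alpha, n_per_arm, sesoi, sd)
        return (cost_ratio * alpha + beta) / (cost_ratio + 1)

    grid = [a / 1000 for a in range(1, 500)]  # alpha from 0.001 to 0.499
    best = min(grid, key=weighted_error)
    return best, 1 - power_two_arm(best, n_per_arm, sesoi, sd)


bounds = (-0.2, 0.2)  # hypothetical equivalence bounds in raw units
for est, se in [(0.01, 0.05), (0.3, 0.05), (0.1, 0.03), (0.1, 0.1)]:
    print(est, se, four_outcome(est, se, *bounds))
print(compromise_alpha(n_per_arm=50, sesoi=0.3))
```

The four example estimates are chosen to land in each cell of the grid once. With 50 units per arm and a SESOI of 0.3 standard deviations, the equal-cost compromise alpha comes out well above 0.05, because at that sample size a conventional alpha would leave the Type II error rate very high.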
4. Scope and Generalization
- Defining the Target Population: A hypothesis is not universal. Explicitly name the population for whom the theory should hold. Distinguish between the target population (who the theory applies to) and the accessible population (who can be sampled). If using a convenience sample, acknowledge that the estimand is a SATE, and specify what assumptions would