Pre-Analysis Plan Writer
Standards anchor. This skill operationalizes the pre-data-collection component of DA-RT (Data Access and Research Transparency; see Druckman-Green 2021 ch. 18 §18.1.3 for the canonical political-science framing and the APSA Guide to Professional Ethics). DA-RT obligates researchers to facilitate evaluation of evidence-based claims through data access, production transparency, and analytic transparency; a pre-analysis plan (PAP) is how analytic transparency is established before the data arrive. Reporting downstream of the PAP is the domain of the methods-reporting skill (APSA Experimental Section guidelines via Gerber et al. 2014; JARS, the APA Journal Article Reporting Standards).
Instructions
1. Registry Selection
- OSF Registries (Open Science Framework): Use for maximum flexibility. Supports free-form documents, file attachments (analysis code, stimuli), version control, and optional embargo periods. Registration is timestamped and immutable once confirmed. Best for complex designs that require supplementary materials. Also the default destination for political-science PAPs since the EGAP registry closed: OSF offers an "EGAP Registration" form template that mirrors the old EGAP fields.
- EGAP (Evidence in Governance and Politics) -- CLOSED: EGAP stopped accepting new registrations on October 15, 2023. Existing EGAP registrations remain searchable through OSF. Researchers should now submit new registrations to OSF Registries (using the EGAP form template) or, for randomized experiments, to the AEA RCT Registry. Do not direct a user to "register at EGAP" -- the registry is closed.
- AEA RCT Registry (American Economic Association): Use for randomized controlled trials, particularly in economics, development, and governance. The form is tightly structured around RCT fields (intervention, outcomes, randomization unit, power) and, alongside OSF, is the recommended successor destination for field experiments that would previously have been registered at EGAP.
- AsPredicted: Use for simple designs requiring fast registration. The structured 9-question format enforces brevity and is completable in under an hour. Registrations are private until the authors choose to make them public. Best for straightforward experiments with few analytical degrees of freedom. Requires an academic email for access.
- Registered Reports: A distinct format where the journal peer-reviews the introduction and methods before data collection. In-principle acceptance is contingent on design quality, not results; final acceptance is contingent on the authors following through on the registered methods. Pursue registered reports when the research question is important but the expected results are uncertain or likely null -- because the publication decision is made before any results exist, the format removes results-dependent publication bias by design. Registered reports require substantially more lead time than standard pre-registration.
2. PAP Document Structure
- Recommended Section Order: Organize the PAP into: (1) study information (title, authors, timeline, registry), (2) theoretical motivation and hypotheses, (3) research design (experimental conditions, randomization, sample), (4) sampling and recruitment, (5) variable definitions and measurement, (6) analysis plan (models, tests, decision rules), and (7) contingency plans. This order mirrors the research process and makes the document navigable for reviewers.
- Cross-Reference, Don't Repeat: For hypothesis specification, reference the three-level specification framework (conceptual, operationalized, statistical) from the hypothesis-building skill. For reporting elements, reference the JARS six elements from the methods-reporting skill. For experiment-type-specific design fields, reference the conjoint-design or survey-design skills (e.g., attribute architecture, wording protocols, attention checks). The PAP should implement these frameworks, not redefine them.
- Write for a Reader: The PAP is a communication document, not a private notebook. Every analytical decision must be (a) decidable from the PAP alone, (b) expressed in formal notation or code where possible, and (c) unambiguous to a reader without prior knowledge of the authors' local practices. Avoid shorthand, undefined acronyms, and references to "the usual approach."
- Version Control: Use the registry's built-in versioning. If amendments are needed after initial registration, create a new version rather than editing the original. Each version is timestamped, preserving the audit trail.
3. Specifying the Analytical Strategy
- Three-Tier Classification: Classify every planned analysis as locked (primary hypothesis tests -- cannot be changed), conditional (executed only if a pre-specified condition is met, e.g., "if the manipulation check passes, estimate the interaction model"), or exploratory (clearly labeled hypothesis-generating analyses with uncontrolled error rates). This mirrors the JARS primary/secondary/exploratory distinction (Lakens 2025 §13.4) and the confirmatory-exploratory continuum in Waldron & Allen (2022); it generalizes the conjoint-specific version in the conjoint-design skill to all experimental designs.
- Decision Rules: For each confirmatory hypothesis, state in advance what constitutes support, falsification, or an inconclusive result. Specify: the test statistic, the alpha level, the SESOI (smallest effect size of interest), and the decision mapping (e.g., "If p < 0.05 and the coefficient exceeds 3 percentage points in the predicted direction, the hypothesis is supported; if the equivalence test rejects effects larger than 3 percentage points, the hypothesis is falsified; otherwise, the result is inconclusive"). Illustrative thresholds (e.g., the 3-percentage-point SESOI used above) must be justified from the user's own design, prior literature, and decision context -- they are not defaults.
- Exact Model Specifications: Write out every primary model in formal notation or code. For regression models, specify: the dependent variable, all independent variables, interaction terms, fixed effects, clustering structure, and the estimator. Ambiguous prose descriptions ("we will control for demographics") are insufficient -- name every variable. This is the remedy for the "garden of forking paths" problem (Gelman & Loken 2014): implicit analytical choices, even when made in good faith and before seeing results, inflate the false-positive rate unless pinned down in advance.
- Multiple Testing Corrections: Pre-specify the correction procedure and define which tests belong to the same family. For families of related tests (e.g., AMCEs across attributes within a single hypothesis), specify either Benjamini-Hochberg (false discovery rate control) or Bonferroni (family-wise error rate control). Document the family groupings and the rationale for each.
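The decision mapping and the family-wise correction can both be written as code before any data exist, so the PAP leaves nothing to interpretation. The sketch below is a minimal illustration under stated assumptions, not a prescribed implementation: it assumes large-sample z-tests computed from a reported coefficient and standard error, treats all listed estimates as one hypothesis family corrected with Benjamini-Hochberg (Bonferroni optional), and uses the illustrative 3-percentage-point SESOI from the example rule. The names `Estimate`, `adjust`, `decide`, and the hypothesis labels are hypothetical. It corrects only the tests of no effect; whether the equivalence tests are also corrected is a choice the PAP itself must state.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    """One confirmatory estimate (hypothetical structure, for illustration)."""
    name: str
    coef: float          # point estimate, in percentage points
    se: float            # standard error, same units as coef
    predicted_sign: int  # +1 or -1: the direction stated in the PAP


def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function (large-sample assumption).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_two_sided(coef: float, se: float) -> float:
    # Two-sided z-test of H0: effect = 0.
    return 2.0 * (1.0 - normal_cdf(abs(coef / se)))


def p_equivalence(coef: float, se: float, sesoi: float) -> float:
    # TOST: two one-sided z-tests against the bounds -sesoi and +sesoi.
    p_lower = 1.0 - normal_cdf((coef + sesoi) / se)  # H0: effect <= -sesoi
    p_upper = normal_cdf((coef - sesoi) / se)        # H0: effect >= +sesoi
    return max(p_lower, p_upper)


def adjust(pvals: list[float], method: str = "bh") -> list[float]:
    m = len(pvals)
    if method == "bonferroni":
        # Family-wise error rate control.
        return [min(1.0, p * m) for p in pvals]
    # Benjamini-Hochberg step-up adjusted p-values (false discovery rate control).
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def decide(family: list[Estimate], sesoi: float, alpha: float = 0.05,
           method: str = "bh") -> dict[str, str]:
    """Apply the pre-registered decision mapping to one hypothesis family."""
    p_adj = adjust([p_two_sided(e.coef, e.se) for e in family], method)
    decisions = {}
    for e, p in zip(family, p_adj):
        if p < alpha and e.predicted_sign * e.coef > sesoi:
            decisions[e.name] = "supported"
        elif p_equivalence(e.coef, e.se, sesoi) < alpha:
            decisions[e.name] = "falsified"
        else:
            decisions[e.name] = "inconclusive"
    return decisions


family = [
    Estimate("H1", coef=6.0, se=1.5, predicted_sign=1),
    Estimate("H2", coef=0.2, se=0.8, predicted_sign=1),
    Estimate("H3", coef=2.0, se=2.0, predicted_sign=1),
]
print(decide(family, sesoi=3.0))
# {'H1': 'supported', 'H2': 'falsified', 'H3': 'inconclusive'}
```

With these made-up estimates, H2 is labeled falsified not merely because it is non-significant but because the equivalence test rules out effects outside the ±3-point band; H3 is too imprecise to do either, so it stays inconclusive.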
4. Analysis Code Pre-Registration
- Simulated Data Approach: Generate a mock dataset that matches the expected data structure (variable names, types, distributions, sample size, missingness patterns). Write all analysis code -- data cleaning, primary models, robustness checks, planned figures -- to run on this simulated dataset. Register the code alongside the PAP. Lakens (2025 §13.6) calls this "the gold standard for a preregistration."
- Tooling for Simulated Data: For formal declare-design-diagnose workflows, use DeclareDesign (Blair, Cooper, Coppock, & Humphreys 2019, cited in Druckman-Green 2021 Table 18.1), which lets researchers specify the data-generating model, the inquiry, the data strategy, and the answer strategy and then diagnose the design before running it. Simpler simulations can use faux or simstudy in R, or numpy and pandas in Python. The point is to produce a runnable pipeline, not a polished simulation.
- Benefits: Code pre-registration eliminates ambiguity about analytical decisions that prose alone cannot resolve (e.g., how exactly are covariates centered? What happens to observations with missing values on one covariate but not others?). It also catches specification errors before data collection -- if the code does n