Survey Instrument Designer
Instructions
1. Question Construction
Item-Specific Wording: Frame questions with item-specific response options rather than agree/disagree, true/false, or yes/no formats. Instead of "Do you agree that immigration benefits the economy?" use "How much does immigration benefit or harm the economy?" with a substantive scale. This reduces acquiescence bias and forces respondents to process the item content (Stantcheva 2023).
Open-Ended vs. Closed-Ended: Use open-ended questions to discover respondent frames and vocabulary before designing closed-ended items. Deploy open-ended items in pilots to generate response categories, then convert to closed-ended for the main study. In the main survey, reserve open-ended items for exploratory or manipulation-check purposes.
Behavioral vs. Attitudinal: Prefer behavioral measures (what respondents would do) over attitudinal measures (what respondents feel) when the research question concerns real-world consequences. Attitudinal items are appropriate when the construct of interest is itself an attitude, but note that attitude-behavior gaps are well documented.
Avoid Double-Barreled Questions: Each item should measure exactly one construct. "Do you support increased immigration and refugee resettlement?" conflates two distinct policy domains. Split into separate items.
Avoid Leading and Loaded Language: Avoid terms that signal a "correct" answer or carry strong normative connotations. Pilot-test whether question framing shifts responses -- if it does, the wording is a treatment, not a measure (Stantcheva 2023).
Numeric vs. Qualitative Response Options: For cross-country or cross-group comparisons, prefer qualitative response options ("a lot," "somewhat," "not at all") over exact numeric quantities. Specific numbers carry different informational weight across contexts -- "$50,000 income" means different things in the US and South Korea (Stantcheva 2023).
2. Scale Design
Number of Scale Points: 5- to 7-point scales are a common convention for attitudinal items, trading off discrimination against cognitive load. Fewer points can lose meaningful variance; more points may not add measurement precision. Reliability depends more on whether each point is meaningfully labeled than on the raw count; assess test-retest and internal consistency for the target construct rather than defaulting to a fixed number. For knowledge or factual questions, binary or categorical formats are often sufficient.
Labeled vs. Endpoint-Only: Label all scale points when feasible. Fully labeled scales reduce respondent uncertainty about the meaning of intermediate values and improve cross-respondent comparability.
Unipolar vs. Bipolar: Match scale polarity to the construct. Bipolar scales (oppose--support) suit constructs with a natural midpoint. Unipolar scales (not at all--extremely) suit constructs with a natural zero point (e.g., frequency, intensity).
Feeling Thermometers: Use with caution. Feeling thermometers (0--100) introduce measurement noise because respondents interpret the scale differently. They are useful for relative comparisons across targets within respondents but unreliable for absolute-level interpretation across respondents.
Index Construction: When combining multiple items into an index, assess internal consistency (Cronbach's alpha, McDonald's omega, or composite reliability) and report the method used alongside the chosen threshold and its rationale. The historical alpha > 0.70 convention is a starting point, not a standard; omega is generally preferred for multidimensional scales. For population-based survey experiments specifically, multi-item indices of the dependent variable are strongly preferred over single-item measures because heterogeneous samples inflate within-group variance (Mutz 2011). Pre-specify index construction rules in the pre-analysis plan; do not construct indices after seeing the data (see also methods-reporting).
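As an illustration of the internal-consistency check described above, here is a minimal sketch of Cronbach's alpha using only the Python standard library. The function name and the example scores are hypothetical, and the sketch assumes complete data (no missing responses) with all items coded in the same direction. It does not cover McDonald's omega, which needs a factor model.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item index.

    items: list of item columns; each column is a list of scores,
    one per respondent, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")

    # Summed index score for each respondent.
    totals = [sum(col[i] for col in items) for i in range(n)]

    item_var_sum = sum(variance(col) for col in items)  # sample variances
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("index scores have zero variance")

    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical pilot data: three items, five respondents.
pilot_items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 4],
    [4, 2, 5, 3, 5],
]
print(round(cronbach_alpha(pilot_items), 3))
```

Whatever threshold is applied to the result should be the one fixed in the pre-analysis plan, not one chosen after inspecting this number.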
Balanced Scales: Include equal numbers of positively and negatively worded options or directional anchors. Unbalanced scales (e.g., three positive options and one negative) bias responses toward the overrepresented direction.
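The labeling and balance guidance in this section can be turned into a simple pre-deployment check. The sketch below is one possible way to do it, not a standard tool: the `ScalePoint` structure, the `direction` coding, and the `bipolar_scale_issues` name are all assumptions made for illustration. The balance test applies to bipolar scales only, since a unipolar scale has no negative side to balance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePoint:
    value: int
    label: str
    direction: int  # -1 = negative side, 0 = midpoint, +1 = positive side


def bipolar_scale_issues(points):
    """Return a list of human-readable problems found in a bipolar scale."""
    issues = []
    # Fully labeled scales: every point needs a non-empty label.
    if any(not p.label.strip() for p in points):
        issues.append("at least one scale point is unlabeled")
    # Balanced scales: same number of options on each side.
    positive = sum(1 for p in points if p.direction > 0)
    negative = sum(1 for p in points if p.direction < 0)
    if positive != negative:
        issues.append(
            f"unbalanced: {positive} positive vs {negative} negative options"
        )
    return issues


# Hypothetical scales for illustration.
support_scale = [
    ScalePoint(1, "Strongly oppose", -1),
    ScalePoint(2, "Somewhat oppose", -1),
    ScalePoint(3, "Neither support nor oppose", 0),
    ScalePoint(4, "Somewhat support", 1),
    ScalePoint(5, "Strongly support", 1),
]
quality_scale = [
    ScalePoint(1, "Poor", -1),
    ScalePoint(2, "Good", 1),
    ScalePoint(3, "Very good", 1),
    ScalePoint(4, "Excellent", 1),
]

print(bipolar_scale_issues(support_scale))
print(bipolar_scale_issues(quality_scale))
```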
3. Survey Flow and Organization
Ordering Effects: Question order affects responses through priming, anchoring, and context effects. Place general questions before specific ones when measuring broad attitudes; reverse this when specific experiences are the construct of interest.
Warm-Up Items: Begin the survey with non-sensitive, low-stakes items to build respondent engagement before introducing experimental blocks or sensitive questions. Demographic items can serve this purpose but should not precede treatment blocks if they could prime identity salience.
Treatment Placement: Place experimental treatments after warm-up items but before primary outcome measures. Separate treatment exposure from outcome elicitation with buffer items to reduce experimenter demand effects (Stantcheva 2023). The timing between treatment and outcome measurement is itself a design choice: too close risks unintentionally signaling the treatment-outcome link, too distant risks the treatment "wearing off" before the outcome is captured (Mutz 2011).
Treatment-Outcome Separation: Insert unrelated items or a brief distractor block between treatment and outcome to reduce the salience of the treatment-outcome link. This mitigates demand effects without introducing significant respondent burden. When the construct of interest is itself a short-lived priming effect, shorten the separation; when the worry is experimenter demand, lengthen it (Mutz 2011).
Block Randomization: Randomize the order of thematic blocks (e.g., policy attitudes, demographic items, secondary measures) across respondents to prevent systematic ordering effects. Within blocks, randomize item order for nominal response sets. Caveat: if a context effect is the estimand (e.g., you are studying how question order shapes response), do not randomize it away; instead manipulate order as a factor.
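A minimal sketch of per-respondent block randomization follows. Survey platforms usually provide this natively, so the code is only meant to make the logic concrete. The function name, block names, item IDs, and the seeding scheme (study seed plus respondent ID, so each respondent's order can be reproduced at analysis time) are assumptions, not part of any platform's API.

```python
import random


def randomized_flow(respondent_id, blocks, shuffle_within=(), seed=2024):
    """Return [(block_name, [item_ids])] in a respondent-specific order.

    blocks: dict mapping block name -> ordered list of item ids.
    shuffle_within: names of blocks whose items may also be reordered.
    """
    # One generator per respondent keeps the order reproducible.
    rng = random.Random(f"{seed}-{respondent_id}")

    block_order = list(blocks)
    rng.shuffle(block_order)

    flow = []
    for name in block_order:
        items = list(blocks[name])  # copy so the input is not mutated
        if name in shuffle_within:
            rng.shuffle(items)
        flow.append((name, items))
    return flow


# Hypothetical blocks for illustration.
blocks = {
    "policy_attitudes": ["p1", "p2", "p3"],
    "demographics": ["d1", "d2"],
    "secondary_measures": ["s1", "s2"],
}
for name, items in randomized_flow("r001", blocks, shuffle_within={"policy_attitudes"}):
    print(name, items)
```

If question order is itself the estimand, the same function should not be used to shuffle it away; order would instead be assigned as an experimental factor and recorded.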
Manipulation Check Placement: Place manipulation checks after outcome elicitation, not immediately after treatment. Post-treatment manipulation checks placed before outcomes can signal the study's purpose and inflate demand effects (Stantcheva 2023; Mutz 2011). Distinguish among: (a) attention checks / instructed-response items that catch satisficing, (b) comprehension checks that verify respondents understood treatment content, and (c) manipulation checks that verify the independent variable moved. These serve different purposes and need not all be placed identically.
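The ordering rules in this section (warm-up before treatment, a buffer between treatment and outcome, manipulation checks after outcomes) can be checked mechanically before launch. The sketch below is one hedged way to do that. The role labels and the `check_flow` function are hypothetical, and the check deliberately covers only manipulation checks, since attention and comprehension checks may be placed differently.

```python
def check_flow(flow):
    """Return a list of ordering problems in a survey flow.

    flow: list of (item_id, role) tuples in presentation order, where role is
    one of "warmup", "treatment", "buffer", "outcome", "manipulation_check",
    or any other label (ignored by this check).
    """
    roles = [role for _, role in flow]
    if "treatment" not in roles or "outcome" not in roles:
        return ["flow needs at least one treatment and one outcome item"]

    first_treatment = roles.index("treatment")
    last_treatment = len(roles) - 1 - roles[::-1].index("treatment")
    first_outcome = roles.index("outcome")
    last_outcome = len(roles) - 1 - roles[::-1].index("outcome")

    problems = []
    if "warmup" not in roles[:first_treatment]:
        problems.append("no warm-up item before the treatment")
    if first_outcome < last_treatment:
        problems.append("an outcome appears before treatment exposure ends")
    elif "buffer" not in roles[last_treatment + 1:first_outcome]:
        problems.append("no buffer item between treatment and outcome")
    for position, role in enumerate(roles):
        if role == "manipulation_check" and position < last_outcome:
            problems.append(
                f"manipulation check at position {position} precedes an outcome"
            )
    return problems


# Hypothetical flows for illustration.
good_flow = [
    ("q1", "warmup"),
    ("t1", "treatment"),
    ("b1", "buffer"),
    ("y1", "outcome"),
    ("mc1", "manipulation_check"),
]
bad_flow = [
    ("t1", "treatment"),
    ("mc1", "manipulation_check"),
    ("y1", "outcome"),
]
print(check_flow(good_flow))
print(check_flow(bad_flow))
```

How many buffer items are enough is a design decision this check cannot make; it only flags their complete absence.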
4. Pretesting and Cognitive Interviewing
Cognitive Interview Protocols: Conduct cognitive interviews using think-aloud protocols (respondents verbalize their reasoning while answering) and/or retrospective probing (follow-up questions about interpretation and processing). A common rule of thumb is 5--10 respondents per round; the substantive rule is to iterate until no new comprehension problems emerge (saturation).
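The saturation rule can be tracked with a simple tally of comprehension problems per interview round. The sketch below assumes each round's findings have been coded into short problem identifiers; the function name and the codes are hypothetical.

```python
def first_saturated_round(rounds):
    """Return the 1-based index of the first round with no new problems.

    rounds: list of sets of problem codes, one set per interview round,
    in chronological order. Returns None if every round found something new.
    """
    seen = set()
    for index, found in enumerate(rounds, start=1):
        new_problems = set(found) - seen
        if not new_problems:
            return index
        seen |= new_problems
    return None


# Hypothetical coded findings from three rounds of cognitive interviews.
interview_rounds = [
    {"q3_ambiguous_timeframe", "q7_jargon"},
    {"q7_jargon", "q9_recall_burden"},
    {"q9_recall_burden"},
]
print(first_saturated_round(interview_rounds))  # 3
```

A round that repeats only known problems counts as saturated here, which signals that discovery has stopped, not that the known problems have been fixed.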
Pilot Studies vs. Soft Launches: These serve distinct purposes. Pilot studies test content: comprehension, response distributions, treatment uptake, manipulation check performance. Soft launches test logistics: survey flow, skip logic, display rendering, timing, and platform-specific issues. Conduct both, in sequence (Stantcheva 2023). Treatment pretesting is especially important in population-based experiments because heterogeneous samples weaken the statistical signal of any given manipulation (Mutz 2011).
Pilot Timing: Pilot once the instrument is drafted, before IRB submission when possible (so findings can inform the registered design), and in any case before full deployment. Budget for at least two rounds of piloting.
What to Test: Assess comprehension (do respondents interpret items as intended?), info