AlterLab GameForge -- Structured Playtest Analysis

Playtesting is not asking players if they had fun. It is the disciplined observation of player behavior to identify where the design succeeds and where it fails. The player's mouth lies -- their hands do not. Nintendo has known this for decades: Miyamoto famously watches players silently, trusting their confusion over their compliments. Larian ran thousands of community playtests during BG3's Early Access, and every major system change traced back to behavioral data, not forum polls. This workflow provides a rigorous behavioral observation framework that transforms raw playtest sessions into actionable design insights.

Purpose & Triggers

Invoke this workflow when:

A build is ready for external eyes and you need structured feedback, not just reactions
Specific design questions need answering: "Do players understand the crafting system?" not "Is the game good?"
Onboarding flow needs validation -- can new players learn the core mechanic without a tutorial?
Difficulty curve assessment -- are players in the flow channel or oscillating between boredom and frustration?
A new feature has been integrated and its impact on the overall experience is unknown
Pre-release polish pass needs data on which rough edges matter most to players
Competitive analysis requires side-by-side comparison with a reference game

Do NOT use this workflow when:

You need to test a raw mechanic in isolation (use game-prototype instead)
The build is so broken that testers will spend most of their time hitting bugs (fix critical bugs first, then playtest)
You want marketing quotes or positive testimonials (that is PR, not playtesting)

Critical Rules

Define questions before inviting testers. Every playtest answers specific questions. "Is it fun?" is not a question -- it is a prayer. "Can players complete the first dungeon without dying more than twice?" is a question. Celeste's playtests asked "can players learn the dash mechanic within the first three screens?" -- specific, observable, actionable.
The facilitator does not play. You observe. You take notes. You do not help, explain, suggest, or react. Your poker face is a scientific instrument.
Minimum 5 testers per session. Fewer than 5 and you are collecting anecdotes, not data. Individual player quirks dominate small samples. At 5+ testers, patterns emerge.
Never test with the development team. They know too much. Their muscle memory, mental models, and context make them incapable of experiencing the game as a new player. Nintendo's internal playtesting teams are deliberately kept away from development discussions so they approach each session cold. Your developers are blind to every onboarding problem they have already internalized.
Behavioral data outranks verbal data. If a player says "the controls feel fine" but you observed them pressing the wrong button 11 times in a 10-minute session, the behavioral data wins. Always. Larian tracked BG3 playtester behavior at the input level -- they knew which dialogue options players hovered over before choosing, and that hesitation data informed their rewrite of Act 1.
Separate observation from interpretation. During the session, record what happened. After the session, interpret what it means. Mixing the two in real-time creates confirmation bias.
Refer to docs/game-design-theory.md for Flow Theory and the MDA Framework when analyzing player engagement and emotional responses.
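The tester-count and behavioral-data rules above can be made concrete with a small aggregation step. The following is a minimal sketch, not part of the workflow itself: the function name `issue_prevalence`, the issue labels, and the input shape (tester ID mapped to the set of issues observed for that tester) are all assumptions made for illustration.

```python
MIN_TESTERS = 5  # taken from the "Minimum 5 testers per session" rule


def issue_prevalence(observed: dict[str, set[str]]) -> dict[str, float]:
    """Map each observed issue to the fraction of testers who exhibited it.

    `observed` maps a tester ID to the set of issue labels the facilitator
    recorded for that tester (behavioral observations, not survey answers).
    """
    if len(observed) < MIN_TESTERS:
        raise ValueError(
            f"need at least {MIN_TESTERS} testers, got {len(observed)}"
        )
    counts: dict[str, int] = {}
    for issues in observed.values():
        for issue in issues:
            counts[issue] = counts.get(issue, 0) + 1
    return {issue: n / len(observed) for issue, n in counts.items()}


# Hypothetical session: four of five testers pressed the wrong button repeatedly.
session = {
    "P1": {"wrong-button presses"},
    "P2": {"wrong-button presses", "missed dodge prompt"},
    "P3": {"wrong-button presses"},
    "P4": set(),
    "P5": {"wrong-button presses"},
}
prevalence = issue_prevalence(session)
```

An issue seen in most testers is a pattern; an issue seen in one tester may be an individual quirk. The cutoff between the two is a judgment call this sketch does not make.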

Workflow

Step 1: Pre-Playtest Preparation

Define the test objectives. Write 3-5 specific questions this playtest will answer. Each question should be:

Observable (you can determine the answer by watching, not just asking)
Actionable (the answer directly informs a design decision)
Scoped (answerable within a single play session)

Good test questions:

"Do players discover the dodge-roll mechanic organically within the first two encounters?"
"At what point in the progression curve do players stop voluntarily exploring and start rushing to objectives?"
"Does the resource scarcity in Act 2 create tension or frustration?"

Prepare the observation sheet. For each test question, define:

What specific player behaviors indicate success (positive signals)
What specific player behaviors indicate failure (negative signals)
Where in the game session to watch most closely (critical observation windows)
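One way to keep the observation sheet honest is to store it as data and check it before the session. The sketch below assumes a hypothetical `PlaytestQuestion` structure and `check_sheet` helper; neither name comes from the workflow, and the checks only mirror the requirements listed above (3-5 questions, each with positive signals, negative signals, and observation windows).

```python
from dataclasses import dataclass, field


@dataclass
class PlaytestQuestion:
    text: str
    positive_signals: list[str] = field(default_factory=list)
    negative_signals: list[str] = field(default_factory=list)
    observation_windows: list[str] = field(default_factory=list)


def check_sheet(questions: list[PlaytestQuestion]) -> list[str]:
    """Return human-readable problems with the sheet; empty means ready."""
    problems = []
    if not 3 <= len(questions) <= 5:
        problems.append(f"expected 3-5 questions, found {len(questions)}")
    for q in questions:
        if not q.positive_signals:
            problems.append(f"no positive signals: {q.text}")
        if not q.negative_signals:
            problems.append(f"no negative signals: {q.text}")
        if not q.observation_windows:
            problems.append(f"no observation windows: {q.text}")
    return problems


# Hypothetical, deliberately incomplete sheet: one question, no failure signals.
sheet = [
    PlaytestQuestion(
        text="Do players discover the dodge-roll within the first two encounters?",
        positive_signals=["rolls through an attack unprompted"],
        observation_windows=["encounter 1", "encounter 2"],
    )
]
problems = check_sheet(sheet)
```

Here `problems` reports both the short question count and the missing negative signals, which is the kind of gap that is easy to overlook when the sheet lives only in someone's head.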

Create the per-player tracking form:

Player ID: ___
Session Date: ___
Session Duration: ___
Test Build Version: ___

Timestamped Observations:
[MM:SS] [Observation] [Category: Action/Hesitation/Confusion/Emotion/Verbal]

Post-Session Survey Responses:
Q1: ___
Q2: ___
Q3: ___
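If notes are taken digitally, the form above maps naturally onto a small record type. This is a minimal sketch under that assumption; the class and method names (`PlayerSession`, `Observation`, `log`, `by_category`) are illustrative, while the fields and the five categories are taken from the form.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    ACTION = "Action"
    HESITATION = "Hesitation"
    CONFUSION = "Confusion"
    EMOTION = "Emotion"
    VERBAL = "Verbal"


@dataclass
class Observation:
    seconds: int  # offset from session start
    note: str
    category: Category

    @property
    def timestamp(self) -> str:
        """Render the offset back into the form's MM:SS notation."""
        return f"{self.seconds // 60:02d}:{self.seconds % 60:02d}"


@dataclass
class PlayerSession:
    player_id: str
    session_date: str
    build_version: str
    observations: list[Observation] = field(default_factory=list)
    survey: dict[str, str] = field(default_factory=dict)

    def log(self, mmss: str, note: str, category: Category) -> None:
        """Record one timestamped observation given an 'MM:SS' string."""
        minutes, seconds = mmss.split(":")
        offset = int(minutes) * 60 + int(seconds)
        self.observations.append(Observation(offset, note, category))

    def by_category(self, category: Category) -> list[Observation]:
        return [o for o in self.observations if o.category is category]


# Hypothetical entries for one tester.
s = PlayerSession(player_id="P3", session_date="(date)", build_version="(build)")
s.log("02:15", "paused at crafting menu, opened and closed it twice", Category.HESITATION)
s.log("02:40", "said 'oh, I see' after combining two items", Category.VERBAL)
```

Storing the offset in seconds rather than as text makes it easier to line observations up against a recording later.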

Set up recording infrastructure:

Screen capture with audio (mandatory -- you will miss things in real-time that the recording catches)
Face camera if available (facial micro-expressions reveal engagement, confusion, and frustration that players will never verbalize)
Input logging if your engine supports it (heatmaps of where players click, where they die, where they spend time)
Ensure recordings are timestamped and synchronized so you can cross-reference player expression with game events
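Where input logging is available, a heatmap is at bottom a count of logged positions per grid cell. The sketch below assumes the engine can export events as (x, y) world coordinates; the function name `bin_positions` and the cell size are hypothetical, and rendering the counts is left to whatever tool you already use.

```python
from collections import Counter


def bin_positions(
    positions: list[tuple[float, float]], cell_size: float
) -> Counter:
    """Count logged positions per grid cell of side `cell_size`.

    Keys are (column, row) cell indices; floor division keeps negative
    coordinates in the correct cell.
    """
    counts: Counter = Counter()
    for x, y in positions:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical death locations exported from one session.
deaths = [(1.0, 1.0), (3.5, 2.0), (12.0, 1.0), (-0.5, 4.0)]
heat = bin_positions(deaths, cell_size=10.0)
hottest_cell, hottest_count = heat.most_common(1)[0]
```

The same binning works for clicks or time-spent samples; only the meaning of a logged position changes.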

Prepare the test environment:

Use a consistent hardware setup across all testers (different frame rates and input devices contaminate results)
Remove development overlays, debug menus, and console access
Disable any developer shortcuts or god-mode toggles
Have a clean save state ready so every tester starts from the same point
Test the recording setup with a dry run before the first tester arrives

Brief your facilitators (if you have helpers):

Their only job is to observe and record. Not to help. Not to explain. Not to react.
If a tester asks "What do I do?" the correct response is: "What do you think you should do?"
If a tester is completely stuck for more than 90 seconds on a non-critical path, the facilitator may offer a single neutral hint ("Have you tried interacting with the glowing object?"). Log the hint as a critical finding.
Facilitators should not sit directly next to the player. Peripheral awareness of being watched changes behavior. Sit behind and to the side.
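The 90-second rule is easier to apply consistently if stuck periods are identified from timestamps rather than from a facilitator's sense of elapsed time. A minimal sketch follows, assuming you can mark "progress" moments (room entered, objective updated, and so on) as offsets in seconds; what counts as progress is a project-specific assumption, and `stuck_windows` is a hypothetical helper name.

```python
HINT_THRESHOLD_SECONDS = 90  # from the facilitator briefing above


def stuck_windows(
    progress_times: list[int],
    session_end: int,
    threshold: int = HINT_THRESHOLD_SECONDS,
) -> list[tuple[int, int]]:
    """Return (start, end) spans longer than `threshold` with no progress.

    The session is assumed to start at offset 0 and end at `session_end`.
    """
    windows = []
    previous = 0
    for t in sorted(progress_times) + [session_end]:
        if t - previous > threshold:
            windows.append((previous, t))
        previous = t
    return windows


# Hypothetical progress marks for a 400-second session.
windows = stuck_windows([30, 60, 200, 230], session_end=400)
```

Each returned span is a candidate for review in the recording, whether or not a hint was actually given during it.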

Step 2: During the Playtest -- Silent Observation Protocol

This is where discipline matters most. You are a scientist. Your personal feelings about the game are irrelevant during this phase.

Real-time observation categories:

Actions -- What is the player doing?

Record moment-to-moment decisions. Not just "player fought the boss" but "player circled the boss for 15 seconds before attacking, suggesting they were looking for a weak point or building courage."
Track navigation patterns. Do players go where you intended? Where do they go instead? Unintended exploration paths reveal what the environment is actually communicating versus what you think it communicates.
Note input patterns. Button mashing (panic or boredom), deliberate presses (strategic engagement), repeated failed inputs (control confusion).
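If input timestamps are logged, press rate gives a rough first pass at separating mashing from deliberate presses before you review the footage. The sketch below is an assumption-heavy illustration: the one-second window and the threshold of six presses are arbitrary placeholders to tune per game, and the function names are hypothetical. It cannot tell panic from boredom; that still requires watching the player.

```python
def peak_press_rate(press_times: list[float], window: float = 1.0) -> int:
    """Return the most presses that fall inside any `window`-second span."""
    times = sorted(press_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the left edge forward until the span fits in the window.
        while t - times[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def looks_like_mashing(press_times: list[float], threshold: int = 6) -> bool:
    """Flag a sequence whose peak rate reaches `threshold` presses per second.

    The threshold is a placeholder, not an established value.
    """
    return peak_press_rate(press_times) >= threshold


# Hypothetical press timestamps in seconds.
deliberate = [0.0, 1.5, 3.2, 5.0]
frantic = [10.0, 10.1, 10.2, 10.35, 10.5, 10.6, 10.8]
```

Flagged spans are pointers into the recording, not findings in themselves.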

Hesitations -- Where does the player pause?

A pause before a door means the player is anticipating what is behind it (good -- you created tension).
A pause at a menu means the player does not understand the options (bad -- your UI is unclear).
A pause in combat means the player is either strategizing (good) or overwhelmed (bad). Their facial expression and subsequent action disambiguate.

Confusions -- Where does the player misunderstand?

Track "expectation mismatches" -- moments where the player clearly expected one outcome and got another. These are the highest-value findings in any playtest.
Note instances where the player uses a mechanic incorrectly but thinks they are using it correctly. This reveals that your feedback systems are not communicating state clearly.
Watch for players reading the same tooltip or sign multiple times -- it means the information was unclear or they do
