Assumption Testing

Every solution rests on assumptions. Test the riskiest ones first with the lightest method possible.

Preflight: Read target canvas file(s) before any Write/Edit

Hard rule: before issuing Write or Edit against any .claude/canvas/*.yml file, use the Read tool on that file in the current session. Claude Code's Read-before-Write check requires the Read tool specifically; cat/head/grep via Bash do NOT satisfy it.

Edit vs Write — different cost profiles (verified 2026-05-14):

Edit (exact-string replacement): Read with limit: 1 satisfies the check at ~50 tokens. State-tracking is per-file, not per-byte — subsequent Edit calls work anywhere in the file. Use this for partial updates against large canvas files (e.g., purpose.yml at 800+ lines).
Write (full replacement): do a full Read first. Write obliterates the file; you should see what you're about to replace. The limit:1 shortcut is not appropriate here.
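To make the per-file rule concrete, here is a toy model of the Edit/Write discipline described above. It is a hypothetical sketch, not Claude Code's actual Read-before-Write implementation. The class name ReadLedger and its methods are invented for illustration, and the "full Read before Write" condition encodes this section's discipline rather than anything the tool itself enforces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReadLedger:
    """Hypothetical model of the per-file Read-before-Edit/Write discipline."""

    # path -> True once any Read of that path covered the whole file
    reads: dict[str, bool] = field(default_factory=dict)

    def read(self, path: str, limit: int | None = None) -> None:
        # Only the Read tool calls this; Bash cat/head/grep never register a file.
        full = limit is None
        # A later full Read upgrades a partial one; a partial Read never downgrades.
        self.reads[path] = self.reads.get(path, False) or full

    def can_edit(self, path: str) -> bool:
        # Tracking is per file, not per byte: any Read (even limit: 1) unlocks Edit anywhere.
        return path in self.reads

    def can_write(self, path: str) -> bool:
        # Write replaces the whole file, so the discipline asks for a full Read first.
        return self.reads.get(path, False)
```

For example, after ledger.read(".claude/canvas/purpose.yml", limit=1), can_edit returns True for that path while can_write stays False until a full Read of the same file.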

ID-bearing entries — scan the ID space before assigning (added 2026-05-15, v0.23.19): When adding a new component, opportunity, solution, or any other ID-bearing entry to a canvas file, run a Bash grep first to confirm the next ID in your prefix sequence is actually free:

```bash
grep "^  - id: <prefix>-" .claude/canvas/<file>.yml | sort -u
```

Replace <prefix> with the canvas's ID prefix (comp for landscape, opp for opportunities, sol for solutions, ht for human-tasks, etc.), then pick the next free integer. Plain `sort` orders lines lexically, so comp-10 is listed before comp-2; compare the numbers rather than trusting the last line. validate_canvas.py has a duplicate-ID check (lines 230-239) that catches the failure on CI, but a duplicate can persist in the working tree for days if CI isn't run between the edit and its discovery; see roadmap-repo corrections.md 2026-05-15 "Duplicate canvas ID created in landscape.yml" for the worked example.
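A Python sketch of the same scan that also reports duplicates. It assumes entries follow the two-space "  - id: <prefix>-<n>" layout that the grep pattern above matches, compares the numbers as integers rather than strings, and takes the highest existing number plus one, so it never reuses a gap left by a deleted entry (a choice of this sketch, not a stated rule). The function name scan_ids is hypothetical and is not part of validate_canvas.py.

```python
from __future__ import annotations

import re
from collections import Counter


def scan_ids(canvas_text: str, prefix: str) -> tuple[str, list[int]]:
    """Return (next free ID, sorted list of duplicated numbers) for one prefix."""
    # Same shape as the grep: "  - id: <prefix>-<digits>" at the start of a line.
    pattern = re.compile(rf"^  - id: {re.escape(prefix)}-(\d+)[ \t]*$", re.MULTILINE)
    counts = Counter(int(m.group(1)) for m in pattern.finditer(canvas_text))
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    # Numeric max avoids the lexical-sort trap; an empty prefix space starts at 1.
    next_id = f"{prefix}-{max(counts, default=0) + 1}"
    return next_id, duplicates
```

Usage: pass the text of a canvas file, e.g. scan_ids(open(".claude/canvas/landscape.yml").read(), "comp"). A non-empty duplicates list means the file already has the failure this section guards against.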

Original failure mode: anti-pattern #7 instance #5, 2026-05-09 — agent conflated Bash head with the Read tool, lost ~14k tokens to a Write-fail → remedial-full-Read → re-Write loop. The limit:1 discipline (graduated 2026-05-14, v0.23.18) prevents the second-order cost where the agent correctly follows the rule but full-Reads every time. The ID-scan discipline (graduated 2026-05-15, v0.23.19) prevents the related class where the agent reads enough of the file to satisfy the Edit check but not enough to see existing ID assignments — kin to anti-pattern #8 (Stale State Read).

If this skill writes to multiple canvas files, register each one first (limit:1 for Edit-only paths; full Read for Write paths) AND ID-scan any prefix you intend to assign.

See CLAUDE.md Canvas writes — Read before Write for the canonical rule.

Assumption Types (Torres / Cagan)

Type	Question	Example
Desirability	Will users want this?	"Users will switch from current tool"
Usability	Can users figure it out?	"Users can complete onboarding in < 5 min"
Feasibility	Can we build this?	"We can process 10K requests/sec"
Viability	Should we build this?	"Unit economics work at scale"
Ethical	Should we build this? (morally)	"This doesn't exploit user vulnerabilities"

Step 1: Map Assumptions

For the target solution, list ALL assumptions. Be honest: most "obvious" things are actually assumptions.

Couple the test to open canvas gaps (per engine/canvas-guidance.yml#learning_target_coupling). Before finalizing the assumption list, scan the canvas for entries already waiting on evidence: ON HOLD / RE-GATED action flags, in-progress human-tasks naming a MISSING SIGNAL, and low-confidence entries with an un-validated assumption. If this test touches any of them, add that gap to the list explicitly and tag it [target → <file>#<anchor>] so /mycelium:log-evidence routes the result back. A test that retires no open gap spends scarce feedback capacity without advancing the canvas. This is NUDGE-tier: zero-target tests are allowed, but make that a deliberate choice.
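One way to keep the assumption list and its gap targets together is a small record per assumption, typed by the five categories in the table above. This is a hypothetical sketch: the class names, the (file, anchor) tuple, and any example anchors are invented; only the [target → <file>#<anchor>] tag format comes from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class AssumptionType(Enum):
    DESIRABILITY = "desirability"
    USABILITY = "usability"
    FEASIBILITY = "feasibility"
    VIABILITY = "viability"
    ETHICAL = "ethical"


@dataclass
class Assumption:
    statement: str
    kind: AssumptionType
    # Optional open canvas gap this test would retire: (file, anchor).
    target: tuple[str, str] | None = None

    def render(self) -> str:
        # Emit the tag only when a target exists, so /mycelium:log-evidence can route it.
        tag = f" [target → {self.target[0]}#{self.target[1]}]" if self.target else ""
        return f"- ({self.kind.value}) {self.statement}{tag}"


def retires_no_gap(assumptions: list[Assumption]) -> bool:
    # NUDGE tier: True means a zero-target test; allowed, but confirm it on purpose.
    return all(a.target is None for a in assumptions)
```

Running retires_no_gap over the finished list is a cheap way to surface the zero-target case before the test is scheduled.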

Step 2: Prioritize (2x2 Matrix)

Plot each assumption on:

X-axis: How much evidence do we have? (low to high)
Y-axis: How important is this to the solution's success? (low to high)

Test first: High importance + Low evidence (top-left quadrant)

Step 3: Choose the Lightest Test

Organized by Gilad's AFTER model (Assessment → Fact-Finding → Tests → Experiments → Release Results). Always start from the top and pick the lightest test that produces meaningful signal. Don't build a prototype when a survey would suffice.

Assessment (internal, cheapest: minutes to hours)

Test Type	Effort	Signal Quality	When to Use
Goals alignment	Minutes	Low	Check if the idea serves a current strategic goal
Business modeling	Hours	Low-Medium	Sketch unit economics or revenue model
ICE analysis	Hours	Low-Medium	Score Impact/Confidence/Ease (see `/mycelium:ice-score`)
Assumption mapping	Hours	Medium	List and prioritize all assumptions (Steps 1-2 above)
Stakeholder review	Hours	Low	Internal expert judgment (beware organizational mythology — Brown)

Fact-Finding (external evidence — hours to days)

Test Type	Effort	Signal Quality	When to Use
Data analysis	Hours	Variable	You have existing behavioral data
Surveys	Hours	Low-Medium	Quick pulse on a specific question
Competitive analysis	Hours	Medium	Map alternatives users already use
User interviews	Days	High	Story-based interviews about past behavior (see `/mycelium:user-interview`)
Field research	Days	High	Observe users in their natural context

Tests (controlled artifacts — days to weeks)

Test Type	Effort	Signal Quality	When to Use
Smoke/fake door test	Days	Medium	Test demand before building
Concierge test	Days	High	Manually deliver the service
Wizard of Oz	Days	High	Fake the backend, real frontend
Usability test	Days	High	Test usability with interactive mockup (see `/mycelium:usability-check`)
Early adopters	Days-Weeks	High	Give access to known enthusiasts, observe behavior
Labs	Days-Weeks	Medium-High	Internal prototype environment for structured exploration
Fishfood	Days-Weeks	Medium-High	Internal-only release (your team uses it)
Dogfood	Weeks	High	Broader internal release (adjacent teams use it)
Alpha	Weeks	High	Controlled external release with selected users, known bugs expected
Beta	Weeks	High	Broader external release, feature-complete, collecting feedback
Preview	Weeks	High	Feature-flagged release to opted-in users
Longitudinal study	Weeks	Very High	Track same users over time for behavior change

Experiments (statistical comparisons — weeks)

Test Type	Effort	Signal Quality	When to Use
A/B test	2+ weeks	Very High	Test one change with real users at scale
A/B/n test	2+ weeks	Very High	Test multiple variants simultaneously
Multivariate test	2+ weeks	Very High	Test combinations of changes

Release Results (staged release — weeks)

Test Type	Effort	Signal Quality	When to Use
% Launch	Weeks	Very High	Roll out to a percentage of users, measure
Holdback	Weeks	Very High	Keep a control group on the old experience
Post-launch analysis	Ongoing	Very High	Measure outcomes after full release

Source: Gilad (AFTER model, Evidence-Guided / Testing Product Ideas Handbook). 28 techniques across 5 stages, ordered by cost and confidence.
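The Step 3 rule ("start from the top and pick the lightest test that produces meaningful signal") can be sketched as a walk down the tables above. Assumptions of this sketch: the ordinal mapping of the Signal Quality labels, treating row order within each stage as cost order, and skipping "Variable" rows because their signal cannot be promised in advance. It ignores the Effort and When to Use columns, so a real choice still needs judgment (e.g. Data analysis only applies when behavioral data exists). The names SIGNAL, AFTER_LADDER, and lightest_test are hypothetical.

```python
from __future__ import annotations

from collections.abc import Container

# Ordinal scale for the "Signal Quality" column; "Variable" maps to None (unknown up front).
SIGNAL = {"Low": 1, "Low-Medium": 2, "Medium": 3, "Medium-High": 4, "High": 5, "Very High": 6}

# (stage, technique, signal) in table order, cheapest stage first.
AFTER_LADDER = [
    ("Assessment", "Goals alignment", "Low"),
    ("Assessment", "Business modeling", "Low-Medium"),
    ("Assessment", "ICE analysis", "Low-Medium"),
    ("Assessment", "Assumption mapping", "Medium"),
    ("Assessment", "Stakeholder review", "Low"),
    ("Fact-Finding", "Data analysis", "Variable"),
    ("Fact-Finding", "Surveys", "Low-Medium"),
    ("Fact-Finding", "Competitive analysis", "Medium"),
    ("Fact-Finding", "User interviews", "High"),
    ("Fact-Finding", "Field research", "High"),
    ("Tests", "Smoke/fake door test", "Medium"),
    ("Tests", "Concierge test", "High"),
    ("Tests", "Wizard of Oz", "High"),
    ("Tests", "Usability test", "High"),
    ("Tests", "Early adopters", "High"),
    ("Tests", "Labs", "Medium-High"),
    ("Tests", "Fishfood", "Medium-High"),
    ("Tests", "Dogfood", "High"),
    ("Tests", "Alpha", "High"),
    ("Tests", "Beta", "High"),
    ("Tests", "Preview", "High"),
    ("Tests", "Longitudinal study", "Very High"),
    ("Experiments", "A/B test", "Very High"),
    ("Experiments", "A/B/n test", "Very High"),
    ("Experiments", "Multivariate test", "Very High"),
    ("Release Results", "% Launch", "Very High"),
    ("Release Results", "Holdback", "Very High"),
    ("Release Results", "Post-launch analysis", "Very High"),
]


def lightest_test(min_signal: str, skip: Container[str] = ()) -> tuple[str, str] | None:
    """Walk the ladder top-down; return the first (stage, technique) meeting the signal bar."""
    bar = SIGNAL[min_signal]
    for stage, technique, signal in AFTER_LADDER:
        level = SIGNAL.get(signal)  # None for "Variable"
        if technique in skip or level is None:
            continue
        if level >= bar:
            return stage, technique
    return None
```

Use skip to rule out techniques that don't fit the context; for example, if assumption mapping is already done, lightest_test("Medium", skip={"Assumption mapping"}) moves on to Competitive analysis.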

Session-counter primitive (for shadow logs / longitudinal tests)
