Assumption Testing
Every solution rests on assumptions. Test the riskiest ones first with the lightest method possible.
Preflight: Read target canvas file(s) before any Write/Edit
Hard rule: before issuing Write or Edit against any .claude/canvas/*.yml file, use the Read tool on that file in this session. Claude Code's Read-before-Write check requires the Read tool specifically; cat, head, or grep via Bash do NOT satisfy it.
Edit vs Write — different cost profiles (verified 2026-05-14):
- Edit (exact-string replacement): Read with limit: 1 satisfies the check at ~50 tokens. State-tracking is per-file, not per-byte, so subsequent Edit calls work anywhere in the file. Use this for partial updates against large canvas files (e.g., purpose.yml at 800+ lines).
- Write (full replacement): do a full Read first. Write obliterates the file; you should see what you're about to replace. The limit: 1 shortcut is not appropriate here.
ID-bearing entries — scan the ID space before assigning (added 2026-05-15, v0.23.19): When adding a new component, opportunity, solution, or any other ID-bearing entry to a canvas file, run a Bash grep first to confirm the next ID in your prefix sequence is actually free:
grep "^ - id: <prefix>-" .claude/canvas/<file>.yml | sort -u
Replace <prefix> with the canvas's ID prefix (comp for landscape, opp for opportunities, sol for solutions, ht for human-tasks, etc.), then pick the next free integer. Note that sort -u orders lines as text, so comp-10 sorts before comp-9; read the numbers rather than trusting the last line. validate_canvas.py has a duplicate-ID check (lines 230-239) that catches the failure on CI, but a duplicate can persist in the working tree for days if CI isn't run between the edit and its discovery; see roadmap-repo corrections.md 2026-05-15 "Duplicate canvas ID created in landscape.yml" for the worked example.
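To run the same scan from Python while comparing the numbers as integers rather than as text, here is a minimal sketch. It assumes entries look like "- id: comp-12" at the start of a YAML list item (unquoted, any indentation). The script and its function names are hypothetical and not part of validate_canvas.py.

```python
from __future__ import annotations

import re
import sys
from collections import Counter
from pathlib import Path


def used_ids(text: str, prefix: str) -> list[int]:
    """Integers already assigned under `prefix`, in file order (duplicates kept)."""
    # Assumed entry shape: "  - id: <prefix>-<integer>" at the start of a line.
    pattern = re.compile(rf"^\s*-\s+id:\s*{re.escape(prefix)}-(\d+)\b", re.MULTILINE)
    return [int(n) for n in pattern.findall(text)]


def next_free_id(text: str, prefix: str) -> str:
    """One above the highest integer in use, compared numerically (comp-10 > comp-9)."""
    return f"{prefix}-{max(used_ids(text, prefix), default=0) + 1}"


def duplicate_ids(text: str, prefix: str) -> list[str]:
    """IDs assigned more than once, the class of error CI's duplicate check flags."""
    counts = Counter(used_ids(text, prefix))
    return [f"{prefix}-{n}" for n, c in sorted(counts.items()) if c > 1]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python next_id.py .claude/canvas/landscape.yml comp
    path, prefix = sys.argv[1], sys.argv[2]
    text = Path(path).read_text(encoding="utf-8")
    print("next free:", next_free_id(text, prefix))
    for dup in duplicate_ids(text, prefix):
        print("duplicate:", dup)
```

Taking the maximum plus one never reuses a gap left by a deleted entry, which avoids reviving an ID that older evidence logs may still point at.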
Original failure mode: anti-pattern #7 instance #5, 2026-05-09 — agent conflated Bash head with the Read tool, lost ~14k tokens to a Write-fail → remedial-full-Read → re-Write loop. The limit:1 discipline (graduated 2026-05-14, v0.23.18) prevents the second-order cost where the agent correctly follows the rule but full-Reads every time. The ID-scan discipline (graduated 2026-05-15, v0.23.19) prevents the related class where the agent reads enough of the file to satisfy the Edit check but not enough to see existing ID assignments — kin to anti-pattern #8 (Stale State Read).
If this skill writes to multiple canvas files, register each one first (limit:1 for Edit-only paths; full Read for Write paths) AND ID-scan any prefix you intend to assign.
See CLAUDE.md Canvas writes — Read before Write for the canonical rule.
Assumption Types (Torres / Cagan)
| Type | Question | Example |
|---|---|---|
| Desirability | Will users want this? | "Users will switch from current tool" |
| Usability | Can users figure it out? | "Users can complete onboarding in < 5 min" |
| Feasibility | Can we build this? | "We can process 10K requests/sec" |
| Viability | Should we build this? | "Unit economics work at scale" |
| Ethical | Should we build this? (morally) | "This doesn't exploit user vulnerabilities" |
Step 1: Map Assumptions
For the target solution, list ALL assumptions. Be honest -- most "obvious" things are actually assumptions.
Couple the test to open canvas gaps (per engine/canvas-guidance.yml#learning_target_coupling): before finalizing the assumption list, scan the canvas for entries already waiting on evidence — ON HOLD / RE-GATED action flags, in-progress human-tasks naming a MISSING SIGNAL, low-confidence entries with an un-validated assumption. If this test touches any of them, add that gap to the list explicitly and tag it [target → <file>#<anchor>] so /mycelium:log-evidence routes the result back. A test that retires no open gap spends scarce feedback capacity without advancing the canvas. NUDGE-tier — zero-target tests are allowed, but make it a choice.
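A rough sketch of that canvas scan follows. It assumes the flags appear verbatim in the YAML text and that the nearest preceding entry id works as the <anchor> in the target tag. Both are assumptions about the canvas schema, and "confidence: low" is a guessed spelling for low-confidence entries. open_gaps is a hypothetical helper, not part of /mycelium:log-evidence.

```python
from __future__ import annotations

import re
from pathlib import Path

# Flags named in the canvas guidance; "confidence: low" is an assumed spelling.
GAP_MARKERS = ("ON HOLD", "RE-GATED", "MISSING SIGNAL", "confidence: low")
ID_LINE = re.compile(r"^\s*-\s+id:\s*(\S+)")


def open_gaps(filename: str, text: str) -> list[tuple[str, str]]:
    """Return (marker, suggested tag) pairs for lines that look like open gaps.

    Hypothetical convention: the anchor is the nearest preceding '- id:' value.
    """
    gaps = []
    current_id = None
    for line in text.splitlines():
        match = ID_LINE.match(line)
        if match:
            current_id = match.group(1)
        for marker in GAP_MARKERS:
            if marker in line and current_id is not None:
                gaps.append((marker, f"[target → {filename}#{current_id}]"))
    return gaps


if __name__ == "__main__":
    # Prints nothing if .claude/canvas does not exist in the working directory.
    for path in sorted(Path(".claude/canvas").glob("*.yml")):
        for marker, tag in open_gaps(path.name, path.read_text(encoding="utf-8")):
            print(f"{marker:16} {tag}")
```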
Step 2: Prioritize (2x2 Matrix)
Plot each assumption on:
- X-axis: How much evidence do we have? (low to high)
- Y-axis: How important is this to the solution's success? (low to high)
Test first: High importance + Low evidence (top-left quadrant)
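The 2x2 placement can be sketched as below, assuming a hypothetical 1-5 score on each axis with 3 and above counting as "high". The example statements come from the Assumption Types table; the scores are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Assumption:
    statement: str
    importance: int  # Y-axis: 1 (low) to 5 (high), hypothetical scale
    evidence: int    # X-axis: 1 (low) to 5 (high), hypothetical scale


def quadrant(a: Assumption, midpoint: int = 3) -> str:
    """Place an assumption on the 2x2; scores at or above the midpoint count as high."""
    vertical = "top" if a.importance >= midpoint else "bottom"
    horizontal = "right" if a.evidence >= midpoint else "left"
    return f"{vertical}-{horizontal}"


def riskiest_first(assumptions: list[Assumption]) -> list[Assumption]:
    """Top-left quadrant only, most important and least evidenced first."""
    risky = [a for a in assumptions if quadrant(a) == "top-left"]
    return sorted(risky, key=lambda a: (-a.importance, a.evidence))


items = [
    Assumption("Users will switch from current tool", importance=5, evidence=1),
    Assumption("We can process 10K requests/sec", importance=4, evidence=4),
    Assumption("Users can complete onboarding in < 5 min", importance=4, evidence=2),
    Assumption("Unit economics work at scale", importance=2, evidence=1),
]
for a in riskiest_first(items):
    print(a.statement)
```

The numeric scores only force an explicit ordering; the placement itself is still a team judgment.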
Step 3: Choose the Lightest Test
Organized by Gilad's AFTER model (Assessment → Fact-Finding → Tests → Experiments → Release Results). Always start from the top and pick the lightest test that produces meaningful signal. Don't build a prototype when a survey would suffice.
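The "start from the top, stop at the first test with enough signal" rule can be sketched as a walk down an ordered ladder. The rows are an abbreviated copy of the tables below. The numeric signal ranks and the min_signal threshold are assumptions layered on the tables' Signal Quality column, and "Variable" rows are left out because they have no fixed rank.

```python
from __future__ import annotations

# Ordinal ranks for the Signal Quality column (hypothetical numbering).
SIGNAL_RANK = {
    "Low": 1,
    "Low-Medium": 2,
    "Medium": 3,
    "Medium-High": 4,
    "High": 5,
    "Very High": 6,
}

# (stage, test, signal quality) in the order the AFTER tables list them.
LADDER = [
    ("Assessment", "Goals alignment", "Low"),
    ("Assessment", "Business modeling", "Low-Medium"),
    ("Assessment", "Assumption mapping", "Medium"),
    ("Fact-Finding", "Surveys", "Low-Medium"),
    ("Fact-Finding", "Competitive analysis", "Medium"),
    ("Fact-Finding", "User interviews", "High"),
    ("Tests", "Smoke/fake door test", "Medium"),
    ("Tests", "Concierge test", "High"),
    ("Tests", "Longitudinal study", "Very High"),
    ("Experiments", "A/B test", "Very High"),
    ("Release Results", "% Launch", "Very High"),
]


def lightest_test(min_signal: str, allowed_stages: set[str] | None = None) -> tuple[str, str] | None:
    """Walk the ladder from the top; return the first test whose signal clears the bar."""
    needed = SIGNAL_RANK[min_signal]
    for stage, test, signal in LADDER:
        if allowed_stages is not None and stage not in allowed_stages:
            continue
        if SIGNAL_RANK[signal] >= needed:
            return stage, test
    return None  # nothing allowed on the ladder gives enough signal
```

Raising min_signal moves the pick further down the ladder and therefore makes it more expensive; deciding how much signal is "meaningful" stays the judgment call.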
Assessment (internal, cheapest — hours)
| Test Type | Effort | Signal Quality | When to Use |
|---|---|---|---|
| Goals alignment | Minutes | Low | Check if the idea serves a current strategic goal |
| Business modeling | Hours | Low-Medium | Sketch unit economics or revenue model |
| ICE analysis | Hours | Low-Medium | Score Impact/Confidence/Ease (see /mycelium:ice-score) |
| Assumption mapping | Hours | Medium | List and prioritize all assumptions (Steps 1-2 above) |
| Stakeholder review | Hours | Low | Internal expert judgment (beware organizational mythology — Brown) |
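For the ICE analysis row, a minimal sketch assumes the multiplicative form Gilad describes (Impact × Confidence × Ease) with each factor on a 1-10 scale. /mycelium:ice-score may use different scales or weights, and the idea names here are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Idea:
    name: str
    impact: int      # 1-10: how much it would move the goal metric if it works
    confidence: int  # 1-10: how strong the supporting evidence is
    ease: int        # 1-10: how cheap it is to build and test

    def ice(self) -> int:
        """Multiplicative ICE score (assumed formula, see lead-in)."""
        for label, value in (("impact", self.impact), ("confidence", self.confidence), ("ease", self.ease)):
            if not 1 <= value <= 10:
                raise ValueError(f"{label} must be in 1-10, got {value}")
        return self.impact * self.confidence * self.ease


def rank(ideas: list[Idea]) -> list[Idea]:
    """Highest ICE score first."""
    return sorted(ideas, key=lambda idea: idea.ice(), reverse=True)
```

Because the factors are multiplied, a low score on any one of them (often Confidence) pulls the whole score down. That is why a high-impact idea with thin evidence can rank below a modest but well-supported one.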
Fact-Finding (external evidence — hours to days)
| Test Type | Effort | Signal Quality | When to Use |
|---|---|---|---|
| Data analysis | Hours | Variable | You have existing behavioral data |
| Surveys | Hours | Low-Medium | Quick pulse on a specific question |
| Competitive analysis | Hours | Medium | Map alternatives users already use |
| User interviews | Days | High | Story-based interviews about past behavior (see /mycelium:user-interview) |
| Field research | Days | High | Observe users in their natural context |
Tests (controlled artifacts — days to weeks)
| Test Type | Effort | Signal Quality | When to Use |
|---|---|---|---|
| Smoke/fake door test | Days | Medium | Test demand before building |
| Concierge test | Days | High | Manually deliver the service |
| Wizard of Oz | Days | High | Fake the backend, real frontend |
| Usability test | Days | High | Test usability with interactive mockup (see /mycelium:usability-check) |
| Early adopters | Days-Weeks | High | Give access to known enthusiasts, observe behavior |
| Labs | Days-Weeks | Medium-High | Internal prototype environment for structured exploration |
| Fishfood | Days-Weeks | Medium-High | Internal-only release (your team uses it) |
| Dogfood | Weeks | High | Broader internal release (adjacent teams use it) |
| Alpha | Weeks | High | Controlled external release with selected users, known bugs expected |
| Beta | Weeks | High | Broader external release, feature-complete, collecting feedback |
| Preview | Weeks | High | Feature-flagged release to opted-in users |
| Longitudinal study | Weeks | Very High | Track same users over time for behavior change |
Experiments (statistical comparisons — weeks)
| Test Type | Effort | Signal Quality | When to Use |
|---|---|---|---|
| A/B test | 2+ weeks | Very High | Test one change with real users at scale |
| A/B/n test | 2+ weeks | Very High | Test multiple variants simultaneously |
| Multivariate test | 2+ weeks | Very High | Test combinations of changes |
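When the A/B metric is a conversion rate, a two-proportion z-test is a common way to read the result. A minimal sketch, assuming independent users, a binary outcome per user, and a sample size fixed before the test started (checking the p-value repeatedly along the way inflates false positives):

```python
from __future__ import annotations

import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of control (A) and variant (B).

    Returns (z, two-sided p-value). A positive z means B converts better.
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("both groups need at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # shared rate under H0: no difference
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

For an A/B/n test, running this once per variant against control raises the chance of a false positive; a correction such as Bonferroni (dividing the significance threshold by the number of comparisons) keeps it in check.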
Release Results (staged release — weeks)
| Test Type | Effort | Signal Quality | When to Use |
|---|---|---|---|
| % Launch | Weeks | Very High | Roll out to a percentage of users, measure |
| Holdback | Weeks | Very High | Keep a control group on the old experience |
| Post-launch analysis | Ongoing | Very High | Measure outcomes after full release |
Source: Gilad (AFTER model, Evidence-Guided / Testing Product Ideas Handbook). 28 techniques across 5 stages, ordered by cost and confidence.
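A % launch and a holdback both need each user's assignment to stay the same across sessions. A minimal sketch uses hash bucketing, a common technique assumed here rather than taken from Gilad. The feature name, bucket count, and percentages are illustrative.

```python
import hashlib

BUCKETS = 10_000  # resolution of 0.01 percentage points


def bucket(user_id: str, feature: str) -> int:
    """Deterministic bucket in [0, BUCKETS) for this user and feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def exposure(user_id: str, feature: str, rollout_percent: float, holdback_percent: float = 0.0) -> str:
    """Return 'holdback', 'new', or 'old' for one user.

    Holdback users come from the top of the bucket range and always stay on
    the old experience. Rollout fills from the bottom, so raising
    rollout_percent only adds users and never removes anyone already exposed.
    """
    if not (0 <= rollout_percent <= 100 and 0 <= holdback_percent <= 100):
        raise ValueError("percentages must be in 0-100")
    if rollout_percent + holdback_percent > 100:
        raise ValueError("rollout and holdback together cannot exceed 100%")
    b = bucket(user_id, feature)
    if b >= BUCKETS * (100 - holdback_percent) / 100:
        return "holdback"
    if b < BUCKETS * rollout_percent / 100:
        return "new"
    return "old"
```

Salting the hash with the feature name keeps one feature's buckets independent of another's, so the same users are not always first in line for every launch.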
Session-counter primitive (for shadow logs / longitudinal tests)
Tests in