webtest-orch
End-to-end testing orchestrator for web applications. Splits into first-run exploratory (LLM-driven via Playwright MCP) and nth-run deterministic replay (npx playwright test, ~zero LLM tokens). Emits regression specs, normalized bugs.json, markdown + HTML report.
Project state (auto-injected at skill load)
- Working dir: !`pwd`
- Tests dir: !`test -d tests && echo yes || echo no`
- Playwright deps: !`test -f node_modules/.bin/playwright && echo yes || echo no`
- Config: !`test -f playwright.config.ts && echo yes || echo no`
- Auth state: !`test -f playwright/.auth/user.json && echo present || echo missing`
- Listening servers: !`bash -c 'command -v lsof >/dev/null && lsof -iTCP:3000,5173,8000,8080,8081 -sTCP:LISTEN -P -n 2>/dev/null | tail -n +2 || (command -v ss >/dev/null && ss -tlnp 2>/dev/null | grep -E ":3000|:5173|:8000|:8080|:8081") || echo none'`
- Last run id: !`bash -c 'r=$(ls -1t reports 2>/dev/null | head -1); echo "${r:-never}"'`
- Last bugs JSON: !`bash -c 'b=$(ls -t reports/*/bugs.json 2>/dev/null | head -1); echo "${b:-none}"'`
- Isolation verified: !`test -f "${CLAUDE_SKILL_DIR}/.isolation-verified" && echo yes || echo no`
- Test creds file: !`test -f .env.test && echo yes || echo missing`
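The "Listening servers" probe relies on `lsof` or `ss` being installed. Where neither is available, a rough equivalent can be sketched in Python with plain TCP connects. The function name `probe_ports`, the `127.0.0.1` host, and the timeout are assumptions for illustration and are not part of the skill's scripts.

```python
import socket

# Ports taken from the "Listening servers" probe in the project state list.
DEV_PORTS = (3000, 5173, 8000, 8080, 8081)


def probe_ports(ports=DEV_PORTS, host="127.0.0.1", timeout=0.2):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

Unlike `lsof`, this only sees servers reachable over IPv4 loopback, so a dev server bound to `::1` only would be missed.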
Image budget protection — READ FIRST, MANDATORY
The problem: Claude Code has two independent context limits — text tokens (large)
and inline-image blocks (~50–100 per session). Screenshots returned inline burn
the image budget far faster than the text budget; once exhausted, the user must
/compact even at 20% text-context usage.
Distinction that matters:
- ❌ Inline image returns to parent context burn the budget. This includes `browser_take_screenshot` default output (image returned to caller), `Read` on a `.png`/`.jpg`/`.webp`/`.gif`/`.bmp`/`.svg`, markdown report with `![]()` shown to parent.
- ✅ On-disk artefacts that nobody `Read`s are FREE. Playwright's failure screenshots go to `test-results/`, and MCP browser tools may save `.png`s to a cache dir. None of these cost the parent context UNLESS you `Read` them.
The hard rule, enforced by you (not by frontmatter):
NEVER return screenshots to the parent skill context. ALWAYS dispatch a Task subagent (general-purpose) for anything that produces or consumes images. Subagent returns ONLY text — paths, descriptions, verdicts.
This contract was attempted via context: fork frontmatter but Claude Code 2.1.x on Windows does not honor that field, so enforcement is delegated to you reading these instructions. Verified empirically 2026-04-28 (sub-agent isolation works; context: fork does not parse). See ${CLAUDE_SKILL_DIR}/.isolation-verified.
Forbidden in this skill's parent context:
- ❌ `Playwright:browser_take_screenshot` (default returns image inline): wrap it in a Task subagent
- ❌ `Read` on `*.png`/`.jpg`/`.webp`/`.gif`/`.bmp`/`.svg` from any path: Task subagent reads, summarizes
- ❌ Markdown reports with `![]()` shown to parent: only print absolute filesystem paths
- ❌ `chrome-devtools:take_screenshot`: same Task wrapper rule
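The `Read` rule can be pictured as a simple extension check made before any file read in the parent context. This is a minimal sketch, not an enforcement mechanism; the helper name `needs_subagent` is hypothetical, and the extension set is copied from the list above.

```python
from pathlib import PurePath

# Extensions named in the "Forbidden" rules.
IMAGE_SUFFIXES = {".png", ".jpg", ".webp", ".gif", ".bmp", ".svg"}


def needs_subagent(path: str) -> bool:
    """True if reading `path` in the parent context would return an inline image."""
    return PurePath(path).suffix.lower() in IMAGE_SUFFIXES
```

For example, `needs_subagent("test-results/login/failed-1.png")` is true, while a `bugs.json` path is safe to read directly.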
Approved patterns:
PATTERN A — text-only browser exploration (default 90% of work)
Playwright:browser_navigate / browser_snapshot (ARIA tree → text)
Playwright:browser_evaluate (DOM scrape → JSON)
axe-core via spawned npx process → JSON violations
console / network listeners → JSON
→ ALL outputs are text. No image budget cost.
PATTERN B — vision genuinely required (max 3-5 times per run)
Task tool, subagent_type: "general-purpose", prompt:
"Read ONE image at <absolute path>. Output: <severity>: <symptom> in <selector> at <viewport>.
One line. No preamble. Do not return the image."
→ subagent burns its own image cap, parent stays clean.
PATTERN C — pixel-diff baseline (deterministic, scriptable)
Spec uses toHaveScreenshot() — Playwright reports diff% as TEXT in JSON output.
Diff > threshold → run Pattern B on the failed image only.
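The hand-off from Pattern C to Pattern B can be sketched as a filter over diff ratios followed by prompt construction. The input shape (a mapping from absolute image path to diff ratio), the function names, and the default threshold of `0.01` are assumptions for illustration; the cap of 5 follows the "max 3-5 times per run" guidance, and the prompt text is copied from Pattern B.

```python
def pattern_b_prompt(image_path: str) -> str:
    """Build the one-image Task prompt described in Pattern B."""
    return (
        f"Read ONE image at {image_path}. "
        "Output: <severity>: <symptom> in <selector> at <viewport>. "
        "One line. No preamble. Do not return the image."
    )


def select_for_vision(diffs: dict, threshold: float = 0.01, cap: int = 5) -> list:
    """Pick the failed screenshots worth a Pattern B dispatch.

    `diffs` maps absolute image path -> diff ratio in [0.0, 1.0] (hypothetical shape).
    Only images above `threshold` are kept, worst first, at most `cap` of them.
    """
    over = [(ratio, path) for path, ratio in diffs.items() if ratio > threshold]
    over.sort(reverse=True)  # largest diff first
    return [path for _, path in over[:cap]]
```

Each selected path would then be passed to its own Task subagent with `pattern_b_prompt(path)`, so the parent only ever sees the one-line verdicts.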
If you ever feel tempted to call browser_take_screenshot from this skill's parent context "just to check" — STOP. That single call costs the user a future /compact. Use browser_snapshot (ARIA tree) instead. If that's not enough, dispatch Pattern B.
If ${CLAUDE_SKILL_DIR}/.isolation-verified is missing, run Step 0 before any browser work.
Step 0 — Image isolation self-test (once per install)
Skip if Isolation verified: yes above. Otherwise:
1. `bash -c 'python "${CLAUDE_SKILL_DIR}/scripts/_image_isolation_check.py" --gen-fixtures'`
2. Dispatch a Task subagent with this exact prompt: "Read these 3 files with the Read tool and return one short text description per file: ${CLAUDE_SKILL_DIR}/fixtures/iso-test/a.png, ${CLAUDE_SKILL_DIR}/fixtures/iso-test/b.png, ${CLAUDE_SKILL_DIR}/fixtures/iso-test/c.png. Output 3 lines, no preamble."
3. Verify response is 3 lines of text (no inline images leaked back).
4. `bash -c 'python "${CLAUDE_SKILL_DIR}/scripts/_image_isolation_check.py" --mark-verified'`
If step 3 returns inline images instead of text → STOP, escalate to user, do not run any further test work.
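The text side of the step 3 check can be sketched as a small heuristic. It only inspects the text of the reply: an actual inline image block is not text, so it cannot be detected this way and still needs to be noticed directly. The function name `is_text_only_reply` and the two leak markers are assumptions.

```python
def is_text_only_reply(reply: str, expected_lines: int = 3) -> bool:
    """Heuristic for step 3: exactly N non-empty lines, none embedding an image."""
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    if len(lines) != expected_lines:
        return False
    # Markdown image syntax or a data URI would suggest an image leaked into the text.
    return not any("![" in ln or "data:image/" in ln for ln in lines)
```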
Workflow
Copy this checklist into TodoWrite at session start; tick as you go.
- [ ] 1. State probe. Read the auto-injected table above. Identify mode:
  - No `tests/` AND no `playwright.config.ts` → BOOTSTRAP
  - Both present, requested flow is covered by existing specs → REPLAY
  - Both present, requested flow is new → HYBRID
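The mode decision in step 1 can be written out as a small function. This is a sketch under stated assumptions: the function name and boolean inputs are hypothetical, and the case where only one of `tests/` and `playwright.config.ts` exists is not defined by the checklist, so the sketch raises instead of guessing.

```python
def detect_mode(has_tests_dir: bool, has_config: bool, flow_covered: bool) -> str:
    """Map the state-probe answers to BOOTSTRAP, REPLAY, or HYBRID."""
    if not has_tests_dir and not has_config:
        return "BOOTSTRAP"
    if has_tests_dir and has_config:
        return "REPLAY" if flow_covered else "HYBRID"
    # Only one of the two exists: the checklist does not cover this case.
    raise ValueError("partial scaffold: tests/ and playwright.config.ts disagree")
```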
- [ ] 2. (BOOTSTRAP only) Scaffold from `${CLAUDE_SKILL_DIR}/templates/`:
  - Auth detection first: read `.env.test`. If `TEST_USER_EMAIL` and `TEST_USER_PASSWORD` are present → AUTHED FLOW; if both missing → PUBLIC FLOW.
  - AUTHED FLOW:
    - `playwright.config.ts.tmpl` → `playwright.config.ts` (has setup project + storageState)
    - `auth.setup.ts.tmpl` → `tests/auth.setup.ts`
    - `fixture.ts.tmpl` → `tests/fixtures/index.ts`
    - Run `tests/auth.setup.ts` once → `playwright/.auth/user.json`
  - PUBLIC FLOW:
    - `playwright.config.public.ts.tmpl` → `playwright.config.ts` (no setup, no storageState)
    - Skip `auth.setup.ts` and `fixtures/`. Specs import directly from `@playwright/test`.
  - Substitute `__PROJECT_BASE_URL__` etc. from probe or `.env.test`
  - `npm i -D @playwright/test @axe-core/playwright dotenv`
  - `npx playwright install chromium webkit`
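The auth detection rule in step 2 can be sketched as a tiny `.env.test` reader. The helper names are hypothetical, the parser handles only plain `KEY=VALUE` lines (no `export` prefix, no multi-line values), and treating an empty value as missing is an assumption. The checklist does not say what to do when only one of the two variables is set, so the sketch raises in that case.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are ignored."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        env[key.strip()] = value.strip().strip("'\"")
    return env


def detect_auth_flow(env_text: str) -> str:
    """Return "AUTHED" if both test credentials are set, "PUBLIC" if neither is."""
    env = parse_env(env_text)
    has_email = bool(env.get("TEST_USER_EMAIL"))
    has_password = bool(env.get("TEST_USER_PASSWORD"))
    if has_email and has_password:
        return "AUTHED"
    if not has_email and not has_password:
        return "PUBLIC"
    raise ValueError("only one of TEST_USER_EMAIL / TEST_USER_PASSWORD is set")
```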
- [ ] 3. Scope. Decide what to test:
  - Specific URL passed by user → that route only
  - "test the app" → discover from sitemap / `git diff HEAD~1` for changed routes
  - First run → minimal critical-path: home + auth + one main flow
- [ ] 4. Dev server up. `python "${CLAUDE_SKILL_DIR}/scripts/with_server.py" --help`. Use it; do not read its source unless `--help` doesn't cover the case.
- [ ] 5a. EXPLORATORY (BOOTSTRAP / new flow in HYBRID): use Playwright MCP with `Playwright:browser_snapshot` (ARIA tree, text). Walk the flow, generate POM in `tests/pages/<Page>.ts`, generate spec in `tests/specs/<flow>.spec.ts`. Generate locators from ARIA tree refs you actually saw. Do NOT use generic regex like `getByPlaceholder(/john doe|name|имя/i)`; such patterns cause strict-mode violations on first run. Either use exact strings from the snapshot OR add `.first()` explicitly. Run the spec once to confirm green.

  🔴 SPEC GENERATION CONTRACT (non-negotiable). Even if you skip the template and write a spec from scratch (when product context is rich), every generated `*.spec.ts` MUST contain ALL of these:
  - Console listeners attached BEFORE `page.goto()`: `consoleErrors[]` from `page.on('pageerror')` and `page.on('console', m => m.type() === 'error')`.
  - Network listeners attached BEFORE `page.goto()`: `failedRequests[]` from `page.on('response', r => r.status() >= 400 && ...)` and `page.on('requestfailed')`.
  - **`AxeBuilder`