
webtest-orch

Development

End-to-end web app testing. Use when user says "test the app", "run e2e", "smoke test", "regression run", "check the login/onboarding/chat flow", "audit accessibility", "test responsive", or "find bugs in <url>" — even when Playwright is not named. Bootstraps Playwright + axe-core, runs LLM exploration on first run, deterministic replay afterward, emits markdown report + bugs.json + *.spec.ts file

3 stars
Author: CreatmanCEO · License: MIT

webtest-orch

End-to-end testing orchestrator for web applications. Splits into first-run exploratory (LLM-driven via Playwright MCP) and nth-run deterministic replay (npx playwright test, ~zero LLM tokens). Emits regression specs, normalized bugs.json, markdown + HTML report.

Project state (auto-injected at skill load)

  • Working dir: !pwd
  • Tests dir: !test -d tests && echo yes || echo no
  • Playwright deps: !test -f node_modules/.bin/playwright && echo yes || echo no
  • Config: !test -f playwright.config.ts && echo yes || echo no
  • Auth state: !test -f playwright/.auth/user.json && echo present || echo missing
  • Listening servers: !bash -c 'command -v lsof >/dev/null && lsof -iTCP:3000,5173,8000,8080,8081 -sTCP:LISTEN -P -n 2>/dev/null | tail -n +2 || (command -v ss >/dev/null && ss -tlnp 2>/dev/null | grep -E ":3000|:5173|:8000|:8080|:8081") || echo none'
  • Last run id: !bash -c 'r=$(ls -1t reports 2>/dev/null | head -1); echo "${r:-never}"'
  • Last bugs JSON: !bash -c 'b=$(ls -t reports/*/bugs.json 2>/dev/null | head -1); echo "${b:-none}"'
  • Isolation verified: !test -f "${CLAUDE_SKILL_DIR}/.isolation-verified" && echo yes || echo no
  • Test creds file: !test -f .env.test && echo yes || echo missing

Image budget protection — READ FIRST, MANDATORY

The problem: Claude Code has two independent context limits — text tokens (large) and inline-image blocks (~50–100 per session). Screenshots returned inline burn the image budget far faster than the text budget; once exhausted, the user must /compact even at 20% text-context usage.

Distinction that matters:

  • Inline image returns to parent context burn the budget. This includes browser_take_screenshot default output (image returned to caller), Read on a .png/.jpg/.webp/.gif/.bmp/.svg, markdown report with ![]() shown to parent.
  • On-disk artefacts that nobody Reads are FREE. Playwright's failure screenshots go to test-results/, MCP browser tools may save .pngs to a cache dir — none of these cost the parent context UNLESS you Read them.

The hard rule, enforced by you (not by frontmatter):

NEVER return screenshots to the parent skill context. ALWAYS dispatch a Task subagent (general-purpose) for anything that produces or consumes images. Subagent returns ONLY text — paths, descriptions, verdicts.

This contract was attempted via context: fork frontmatter but Claude Code 2.1.x on Windows does not honor that field, so enforcement is delegated to you reading these instructions. Verified empirically 2026-04-28 (sub-agent isolation works; context: fork does not parse). See ${CLAUDE_SKILL_DIR}/.isolation-verified.

Forbidden in this skill's parent context:

  • Playwright:browser_take_screenshot (default returns image inline) — wrap it in a Task subagent
  • Read on *.png/.jpg/.webp/.gif/.bmp/.svg from any path — Task subagent reads, summarizes
  • ❌ Markdown reports with ![](path.png) shown to parent — only print absolute filesystem paths
  • chrome-devtools:take_screenshot — same Task wrapper rule
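The forbidden list above can be expressed as a small guard. This is only a sketch: the tool names and file extensions are taken from the list, while the `ToolCall` shape and the function names `mustDelegateToSubagent` and `imageRefsToPaths` are hypothetical and not part of the skill.

```typescript
// Hypothetical shape of a tool invocation; only the fields this sketch needs.
interface ToolCall {
  tool: string;   // e.g. "Playwright:browser_take_screenshot" or "Read"
  path?: string;  // file path argument, when the tool takes one
}

// Extensions and tool names copied from the forbidden list.
const IMAGE_EXTENSIONS = [".png", ".jpg", ".webp", ".gif", ".bmp", ".svg"];
const SCREENSHOT_TOOLS = [
  "Playwright:browser_take_screenshot",
  "chrome-devtools:take_screenshot",
];

function isImagePath(path: string): boolean {
  const lower = path.toLowerCase();
  return IMAGE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// True when the call would put an image into the parent context and
// therefore has to run inside a Task subagent instead.
function mustDelegateToSubagent(call: ToolCall): boolean {
  if (SCREENSHOT_TOOLS.indexOf(call.tool) !== -1) return true;
  if (call.tool === "Read" && call.path !== undefined) {
    return isImagePath(call.path);
  }
  return false;
}

// Replaces simple ![alt](path) image references with the bare path, so a
// report shown to the parent carries text only. It does not resolve relative
// paths to absolute ones and ignores references that carry a title.
function imageRefsToPaths(markdown: string): string {
  return markdown.replace(/!\[[^\]]*\]\(([^)\s]+)\)/g, "$1");
}
```

Text-only tools such as `Playwright:browser_snapshot` pass through the guard untouched, which matches the intent that most work stays in Pattern A.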

Approved patterns:

PATTERN A — text-only browser exploration (default 90% of work)
  Playwright:browser_navigate / browser_snapshot (ARIA tree → text)
  Playwright:browser_evaluate (DOM scrape → JSON)
  axe-core via spawned npx process → JSON violations
  console / network listeners → JSON
  → ALL outputs are text. No image budget cost.

PATTERN B — vision genuinely required (max 3-5 times per run)
  Task tool, subagent_type: "general-purpose", prompt:
    "Read ONE image at <absolute path>. Output: <severity>: <symptom> in <selector> at <viewport>.
     One line. No preamble. Do not return the image."
  → subagent burns its own image cap, parent stays clean.

PATTERN C — pixel-diff baseline (deterministic, scriptable)
  Spec uses toHaveScreenshot() — Playwright reports diff% as TEXT in JSON output.
  Diff > threshold → run Pattern B on the failed image only.

If you ever feel tempted to call browser_take_screenshot from this skill's parent context "just to check" — STOP. That single call costs the user a future /compact. Use browser_snapshot (ARIA tree) instead. If that's not enough, dispatch Pattern B.
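Patterns B and C could be combined roughly as follows. The sketch assumes the pixel-diff results have already been parsed from Playwright's JSON output into plain objects; the `ScreenshotDiff` shape, the 0..1 ratio unit, the worst-first ordering, and the function names are assumptions made for illustration. The prompt text is the one from Pattern B.

```typescript
// Assumed, already-parsed diff result for one toHaveScreenshot() failure.
interface ScreenshotDiff {
  imagePath: string;  // absolute path to the failed screenshot on disk
  diffRatio: number;  // fraction of differing pixels, 0..1 (assumed unit)
}

// Upper end of the "max 3-5 times per run" vision budget.
const MAX_VISION_CHECKS = 5;

// Builds the Pattern B prompt for a single image.
function buildVisionPrompt(absolutePath: string): string {
  return (
    `Read ONE image at ${absolutePath}. ` +
    "Output: <severity>: <symptom> in <selector> at <viewport>. " +
    "One line. No preamble. Do not return the image."
  );
}

// Returns one subagent prompt per image whose diff exceeds the threshold,
// worst diffs first, capped at the per-run vision budget.
function planVisionChecks(diffs: ScreenshotDiff[], threshold: number): string[] {
  return diffs
    .filter((d) => d.diffRatio > threshold)      // filter() copies, input stays unsorted
    .sort((a, b) => b.diffRatio - a.diffRatio)
    .slice(0, MAX_VISION_CHECKS)
    .map((d) => buildVisionPrompt(d.imagePath));
}
```

Each returned string would be passed as the prompt of a separate `general-purpose` Task subagent, so the parent only ever sees the one-line verdicts.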

If ${CLAUDE_SKILL_DIR}/.isolation-verified is missing, run Step 0 before any browser work.

Step 0 — Image isolation self-test (once per install)

Skip if Isolation verified: yes above. Otherwise:

  1. bash -c 'python "${CLAUDE_SKILL_DIR}/scripts/_image_isolation_check.py" --gen-fixtures'
  2. Dispatch a Task subagent with this exact prompt:

    "Read these 3 files with the Read tool and return one short text description per file: ${CLAUDE_SKILL_DIR}/fixtures/iso-test/a.png, ${CLAUDE_SKILL_DIR}/fixtures/iso-test/b.png, ${CLAUDE_SKILL_DIR}/fixtures/iso-test/c.png. Output 3 lines, no preamble."

  3. Verify response is 3 lines of text (no inline images leaked back).
  4. bash -c 'python "${CLAUDE_SKILL_DIR}/scripts/_image_isolation_check.py" --mark-verified'

If step 3 returns inline images instead of text → STOP, escalate to user, do not run any further test work.
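The check in step 3 can be pictured as a predicate over the subagent's reply. The `ContentBlock` shape below is an assumption made for this sketch (the actual Task result format is not described here), and `isolationHolds` is a hypothetical name.

```typescript
// Assumed reply shape: a list of text or image blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image" };

// Isolation holds when no image block came back and the text adds up to
// exactly the expected number of non-empty lines (3 in Step 0).
function isolationHolds(blocks: ContentBlock[], expectedLines: number = 3): boolean {
  let nonEmptyLines = 0;
  for (const block of blocks) {
    if (block.type === "image") return false; // an image leaked to the parent
    for (const line of block.text.split("\n")) {
      if (line.trim().length > 0) nonEmptyLines += 1;
    }
  }
  return nonEmptyLines === expectedLines;
}
```

A `false` result corresponds to the STOP condition above: escalate to the user and do not mark the install as verified.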

Workflow

Copy this checklist into TodoWrite at session start; tick as you go.

  • 1. State probe. Read the auto-injected table above. Identify mode:

    • No tests/ AND no playwright.config.ts → BOOTSTRAP
    • Both present, requested flow is covered by existing specs → REPLAY
    • Both present, requested flow is new → HYBRID
  • 2. (BOOTSTRAP only) Scaffold from ${CLAUDE_SKILL_DIR}/templates/:

    • Auth detection first: read .env.test. If TEST_USER_EMAIL and TEST_USER_PASSWORD are present → AUTHED FLOW; if both missing → PUBLIC FLOW.
    • AUTHED FLOW:
      • playwright.config.ts.tmpl → playwright.config.ts (has setup project + storageState)
      • auth.setup.ts.tmpl → tests/auth.setup.ts
      • fixture.ts.tmpl → tests/fixtures/index.ts
      • Run tests/auth.setup.ts once → playwright/.auth/user.json
    • PUBLIC FLOW:
      • playwright.config.public.ts.tmpl → playwright.config.ts (no setup, no storageState)
      • Skip auth.setup.ts and fixtures/. Specs import directly from @playwright/test.
    • Substitute __PROJECT_BASE_URL__ etc. from probe or .env.test
    • npm i -D @playwright/test @axe-core/playwright dotenv
    • npx playwright install chromium webkit
  • 3. Scope. Decide what to test:

    • Specific URL passed by user → that route only
    • "test the app" → discover routes from the sitemap, or from git diff HEAD~1 for changed routes
    • First run → minimal critical-path: home + auth + one main flow
  • 4. Dev server up. Run python "${CLAUDE_SKILL_DIR}/scripts/with_server.py" --help, then use the script as documented; do not read its source unless --help doesn't cover the case.

  • 5a. EXPLORATORY (BOOTSTRAP / new flow in HYBRID): use Playwright MCP with Playwright:browser_snapshot (ARIA tree, text). Walk the flow, generate a POM in tests/pages/<Page>.ts, and generate a spec in tests/specs/<flow>.spec.ts. Generate locators from ARIA tree refs you actually saw. Do NOT use generic regexes like getByPlaceholder(/john doe|name|имя/i); they match several elements and cause strict-mode violations on first run. Either use exact strings from the snapshot OR add .first() explicitly. Run the spec once to confirm green.

    🔴 SPEC GENERATION CONTRACT — non-negotiable. Even if you skip the template and write a spec from scratch (when product context is rich), every generated *.spec.ts MUST contain ALL of these:

    1. Console listeners attached BEFORE page.goto(): consoleErrors[] from page.on('pageerror') and page.on('console', m => m.type() === 'error').
    2. Network listeners attached BEFORE page.goto(): failedRequests[] from page.on('response', r => r.status() >= 400 && ...) and page.on('requestfailed').
    3. **AxeBuilder
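Contract items 1 and 2 above (console and network listeners attached before navigation) might look like the following. To stay self-contained the sketch uses minimal structural stand-ins (`PageLike` and friends) for the parts of Playwright's `Page`, `ConsoleMessage`, `Response`, and `Request` it touches; the helper name `attachDiagnostics` is hypothetical. The response filter applies only the `status() >= 400` part, because the rest of that condition is elided in the text.

```typescript
// Minimal stand-ins for the Playwright objects this sketch relies on.
interface ConsoleMessageLike { type(): string; text(): string; }
interface ResponseLike { status(): number; url(): string; }
interface RequestLike { url(): string; failure(): { errorText: string } | null; }

interface PageLike {
  on(event: "pageerror", handler: (error: Error) => void): unknown;
  on(event: "console", handler: (msg: ConsoleMessageLike) => void): unknown;
  on(event: "response", handler: (res: ResponseLike) => void): unknown;
  on(event: "requestfailed", handler: (req: RequestLike) => void): unknown;
}

interface Diagnostics {
  consoleErrors: string[];
  failedRequests: string[];
}

// Call this BEFORE page.goto() so errors raised during the initial load
// are captured too.
function attachDiagnostics(page: PageLike): Diagnostics {
  const consoleErrors: string[] = [];
  const failedRequests: string[] = [];

  // Uncaught exceptions thrown inside the page.
  page.on("pageerror", (error) => {
    consoleErrors.push(error.message);
  });
  // console.error(...) calls; other console levels are ignored.
  page.on("console", (msg) => {
    if (msg.type() === "error") consoleErrors.push(msg.text());
  });
  // HTTP responses with an error status.
  page.on("response", (res) => {
    if (res.status() >= 400) failedRequests.push(`${res.status()} ${res.url()}`);
  });
  // Requests that never produced a response (DNS failure, abort, ...).
  page.on("requestfailed", (req) => {
    const failure = req.failure();
    failedRequests.push(`${failure ? failure.errorText : "failed"} ${req.url()}`);
  });

  return { consoleErrors, failedRequests };
}
```

In a generated spec the returned arrays would typically be asserted empty at the end of the test, so that console and network regressions fail the run rather than passing silently.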

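The decisions in workflow steps 1 and 2 (mode selection and auth-flow detection) reduce to two small functions. This is a sketch under stated assumptions: the `ProbeState` shape, the function names, and the `UNSPECIFIED` result are hypothetical, and `parseEnv` is a deliberately simple `KEY=VALUE` reader rather than a full dotenv parser. The workflow does not say what happens when only one of `tests/` and `playwright.config.ts` exists, or when only one credential is set, so those cases are reported rather than guessed.

```typescript
type Mode = "BOOTSTRAP" | "REPLAY" | "HYBRID" | "UNSPECIFIED";
type AuthFlow = "AUTHED" | "PUBLIC" | "UNSPECIFIED";

// Hypothetical summary of the auto-injected project state.
interface ProbeState {
  hasTestsDir: boolean;  // "Tests dir: yes"
  hasConfig: boolean;    // "Config: yes"
  flowCovered: boolean;  // requested flow already has a spec
}

function detectMode(state: ProbeState): Mode {
  if (!state.hasTestsDir && !state.hasConfig) return "BOOTSTRAP";
  if (state.hasTestsDir && state.hasConfig) {
    return state.flowCovered ? "REPLAY" : "HYBRID";
  }
  return "UNSPECIFIED"; // exactly one present: not covered by the workflow
}

// Simplified .env reader: skips blanks and # comments, no quoting or export.
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line.length === 0 || line.charAt(0) === "#") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Assumption: an empty value counts as missing.
function detectAuthFlow(envText: string): AuthFlow {
  const env = parseEnv(envText);
  const hasEmail = (env["TEST_USER_EMAIL"] || "") !== "";
  const hasPassword = (env["TEST_USER_PASSWORD"] || "") !== "";
  if (hasEmail && hasPassword) return "AUTHED";
  if (!hasEmail && !hasPassword) return "PUBLIC";
  return "UNSPECIFIED"; // only one credential set: not covered by the workflow
}
```

Returning an explicit `UNSPECIFIED` value keeps the ambiguous cases visible, so the caller can ask the user instead of silently scaffolding the wrong template.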
How to add

/plugin marketplace add CreatmanCEO/webtest-orch

The exact command may vary depending on the repository. Check the README on GitHub.
