Test Runner Skill

Project-instruction file resolution: CLAUDE.md and AGENTS.md (Codex CLI) are transparent aliases — see skills/_shared/instruction-file-resolution.md. Wherever this skill mentions CLAUDE.md, the alias rule applies.

Soul

Before anything else, read and internalize soul.md in this skill directory. It defines WHO you are — your role as an orchestrator, your delegation boundaries, and your non-negotiable constraints.

Phase 0: Bootstrap Gate

Read skills/_shared/bootstrap-gate.md and execute the gate check. If GATE_CLOSED, invoke skills/bootstrap/SKILL.md and wait for completion. If GATE_OPEN, continue to Phase 1.

<HARD-GATE> Do NOT proceed past Phase 0 if GATE_CLOSED. There is no bypass. Refer to `skills/_shared/bootstrap-gate.md` for the full HARD-GATE constraints. </HARD-GATE>

Phase 1: Read Session Config + Resolve Target / Profile

Read and parse Session Config per skills/_shared/config-reading.md. Store result as $CONFIG.

Test-runner-specific fields to parse:

- test-runner.default-profile (default: smoke)
- test-runner.retention-days (default: 30)
- test-command, typecheck-command, lint-command (used for context only; not driven here)
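
As a rough illustration of the defaults listed above, the following sketch extracts the test-runner fields from an already-parsed config object. The function name `readTestRunnerConfig` and the object shape (a nested `test-runner` key holding `default-profile` and `retention-days`) are assumptions made for this example; the real shape is whatever skills/_shared/config-reading.md produces.

```javascript
// Hypothetical helper: assumes the parsed Session Config exposes keys exactly
// as written in the field list, e.g. config['test-runner']['default-profile'].
function readTestRunnerConfig(config = {}) {
  const section = config['test-runner'] ?? {};
  return {
    // Defaults taken from the field list: smoke profile, 30 days of retention.
    defaultProfile: section['default-profile'] ?? 'smoke',
    retentionDays: section['retention-days'] ?? 30,
    // Context only: these commands are recorded but never executed here.
    contextCommands: {
      test: config['test-command'] ?? null,
      typecheck: config['typecheck-command'] ?? null,
      lint: config['lint-command'] ?? null,
    },
  };
}
```

Calling `readTestRunnerConfig({})` under these assumptions yields the documented defaults, so a missing `test-runner` section does not need special handling.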

Target / Profile Resolution

Resolution order (first match wins):

1. CLI arguments --target <name> --profile <name> (explicit, highest priority)
2. Policy file lookup: .orchestrator/policy/test-profiles.json by target name (if present)
3. Convention-based detection (marker files):
   - playwright.config.{ts,js} present → target type web, dispatch playwright-driver
   - Package.swift present → target type mac, dispatch peekaboo-driver (see skills/peekaboo-driver/SKILL.md)
4. Fallback → emit error and halt:

Error: Cannot resolve target — provide --target or add .orchestrator/policy/test-profiles.json
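
The first-match-wins order can be sketched as a pure function. Everything named here is hypothetical: `resolveTarget`, the `cli` / `policy` / `presentFiles` inputs, and in particular the policy entry shape (`{ profile, driver }` keyed by target name), since the schema of test-profiles.json is not described in this skill. The sketch also assumes that the policy lookup uses the name passed via --target, and that marker files are checked in the order listed.

```javascript
// Marker files from the convention list, checked in the order given there.
const MARKERS = [
  { files: ['playwright.config.ts', 'playwright.config.js'], type: 'web', driver: 'playwright-driver' },
  { files: ['Package.swift'], type: 'mac', driver: 'peekaboo-driver' },
];

function resolveTarget({ cli = {}, policy = null, presentFiles = [] }) {
  // 1. Both flags given explicitly: highest priority.
  if (cli.target && cli.profile) {
    return { source: 'cli', target: cli.target, profile: cli.profile };
  }
  // 2. Policy lookup by target name (entry shape is an assumption).
  const entry = cli.target && policy ? policy[cli.target] : undefined;
  if (entry) {
    return { source: 'policy', target: cli.target, ...entry };
  }
  // 3. Convention-based detection via marker files.
  for (const marker of MARKERS) {
    if (marker.files.some((file) => presentFiles.includes(file))) {
      return { source: 'convention', type: marker.type, driver: marker.driver };
    }
  }
  // 4. Nothing matched: the caller emits the error text shown above and halts.
  throw new Error('Cannot resolve target');
}
```

Keeping the function free of filesystem access means the caller decides how `presentFiles` is gathered, and the ordering can be checked in isolation.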

Run ID

Generate a run ID immediately after target resolution:

import { makeRunId } from 'scripts/lib/test-runner/artifact-paths.mjs';
const runId = makeRunId(); // e.g. "your-target-app-1715688000123"

All artifact paths in subsequent phases derive from this run ID. Never use ad-hoc paths.
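
To illustrate what "derive from the run ID" can look like, here is a minimal sketch. It does not reproduce artifact-paths.mjs: `sketchRunId` and `artifactPath` are hypothetical stand-ins, the `<target>-<epoch ms>` shape is inferred only from the example value above, and the segment validation is an added assumption rather than documented behavior.

```javascript
// Hypothetical stand-in for makeRunId(): the real helper takes no arguments;
// the inputs are explicit here so the sketch is deterministic.
function sketchRunId(targetSlug, nowMs = Date.now()) {
  return `${targetSlug}-${nowMs}`;
}

// Builds a path under the run directory and refuses segments that could
// escape it, so every artifact stays below one root.
function artifactPath(runDir, ...segments) {
  for (const segment of segments) {
    if (segment === '..' || segment.includes('/') || segment.includes('\\')) {
      throw new Error(`Invalid artifact path segment: ${segment}`);
    }
  }
  // Drop trailing slashes on the root before joining.
  return [runDir.replace(/\/+$/, ''), ...segments].join('/');
}
```

Routing every write through one function like `artifactPath` is one way to make ad-hoc paths hard to introduce by accident.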

Status Report

After resolution, emit: Test Runner: target=[name] profile=[name] run_id=[runId] driver=[driver]

Phase 2: Driver Dispatch

Determine ${RUN_DIR} from artifact-paths.mjs:runDirPath(runId) before dispatching any driver. All drivers write artifacts under ${RUN_DIR}/.

--since Filtering (when `since_ref` is provided)

When since_ref is set (passed from the /test --since <git-ref> handoff contract):

1. Import and call changedFilesSince(since_ref) from scripts/lib/discovery/helpers.mjs.
2. If the helper throws (ref unresolvable), surface the error to the user and halt.
3. If the result is [] (no files changed since the ref), emit:
   ```
   No files changed since <since_ref>. Skipping test run.
   ```
   and exit with status 0. Do NOT fall back to a full-repo test run.
4. If the result is a non-empty array, JSON-stringify it and set TEST_CHANGED_FILES in the driver subprocess environment (see driver invocations below). Driver-side filtering is deferred: drivers receive the env var but do not yet filter by it in this wave.
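
The branching above can be sketched as a small decision function. `planSinceFilter` and its return shape are hypothetical, and `changedFilesSince` is passed in as a parameter so the sketch runs without the real helper. The sketch assumes the helper is synchronous and returns an array of paths, which the source does not confirm.

```javascript
// Hypothetical planner for the --since branch.
function planSinceFilter(sinceRef, changedFilesSince) {
  if (!sinceRef) {
    // No --since given: run normally with no changed-file list.
    return { action: 'run', changedFiles: null };
  }
  // A throw here (unresolvable ref) propagates to the caller, which
  // surfaces it to the user and halts.
  const changed = changedFilesSince(sinceRef);
  if (changed.length === 0) {
    // Never fall back to a full-repo run in this case.
    return {
      action: 'skip',
      exitStatus: 0,
      message: `No files changed since ${sinceRef}. Skipping test run.`,
    };
  }
  return { action: 'run', changedFiles: changed };
}
```

Returning a plan object rather than exiting inside the function keeps the "skip with status 0" rule visible to whatever code owns process exit.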

For each resolved driver:

Web (playwright-driver)

Dispatch via Bash per skills/playwright-driver/SKILL.md. Pass ${RUN_DIR} so the driver writes all artifacts (screenshots, AX dumps, HAR) under it.

# Example invocation shape (exact flags defined by playwright-driver SKILL.md)
TEST_CHANGED_FILES="${CHANGED_FILES_JSON}" node scripts/lib/playwright-driver/runner.mjs \
  --run-dir "${RUN_DIR}" \
  --profile "${PROFILE}" \
  --target "${TARGET}"

Where ${CHANGED_FILES_JSON} is JSON.stringify(changedFiles) when --since was provided, or an empty string otherwise.
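
For callers that launch the driver from Node rather than a shell, the same invocation shape can be expressed as data. `buildPlaywrightInvocation` is a hypothetical name; the script path and flags are copied from the example above, and the exact flags remain defined by playwright-driver SKILL.md.

```javascript
// Hypothetical builder: returns the command, argument list, and extra
// environment for the driver subprocess without launching anything.
function buildPlaywrightInvocation({ runDir, profile, target, changedFiles = null }) {
  return {
    command: 'node',
    args: [
      'scripts/lib/playwright-driver/runner.mjs',
      '--run-dir', runDir,
      '--profile', profile,
      '--target', target,
    ],
    env: {
      // JSON array when --since was provided, empty string otherwise.
      TEST_CHANGED_FILES: changedFiles ? JSON.stringify(changedFiles) : '',
    },
  };
}
```

The result can be handed to Node's `child_process.spawn(command, args, { env: { ...process.env, ...env } })`; passing arguments as an array avoids shell quoting issues with paths that contain spaces.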

Capture exit code. A non-zero exit from Playwright means test failures — these become findings for the UX evaluator. They are NOT a fatal error for the orchestrator. Continue to Phase 3 regardless of exit code.

Log: playwright-driver exited [code] — [N] test files captured under ${RUN_DIR}

macOS (peekaboo-driver)

See skills/peekaboo-driver/SKILL.md for the full dispatch contract, permission probe, and artifact layout.

Pre-dispatch platform check: The driver's Phase 1 gate handles the platform and version checks (darwin + macOS 15.0+) and exits 0 (non-fatal skip) on incompatible systems. The orchestrator does not need to replicate these checks.

Permission probe: The driver runs its own Phase 2 permission probe via peekaboo permissions status --json. If required permissions (Screen Recording, Accessibility) are not granted, the driver surfaces an AUQ and exits 2 on failure. The orchestrator treats exit 2 as a driver-framework error, not a test failure.

Invocation:

# All inputs via environment variables — no positional arguments
RUN_DIR="${RUN_DIR}" TARGET="${TARGET}" PROFILE="${PROFILE}" bash skills/peekaboo-driver/SKILL.md

Outputs the orchestrator must parse:

| Artifact | Description |
| --- | --- |
| `${RUN_DIR}/exit_code` | Plain integer file written by driver before exit |
| `${RUN_DIR}/results.json` | Driver summary: `exit_code`, `scenarios_attempted`, `scenarios_passed`, `scenarios_failed` |
| `${RUN_DIR}/ax-snapshots/<scenario>.json` | peekaboo AX-tree output per scenario |
| `${RUN_DIR}/ax-snapshots/glass-modifiers-<ts>.json` | Liquid Glass conformance artifact (consumed by ux-evaluator Check 4) |
| `${RUN_DIR}/screenshots/<step>-<ts>.png` | Per-step screenshots (evidence for ux-evaluator findings) |
| `${RUN_DIR}/console.ndjson` | Driver log events as NDJSON |
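
A minimal sketch of parsing the two summary artifacts, working on file contents that have already been read. `parseExitCodeFile` and `parseResults` are hypothetical names, and treating all four `results.json` fields as integers is an assumption; the table only names the fields.

```javascript
// Parses the contents of ${RUN_DIR}/exit_code (a plain integer, possibly
// followed by a newline).
function parseExitCodeFile(text) {
  const trimmed = text.trim();
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`exit_code is not a plain integer: ${JSON.stringify(text)}`);
  }
  return Number(trimmed);
}

// Parses the contents of ${RUN_DIR}/results.json and checks the four
// summary fields named in the table (integer type is assumed).
function parseResults(text) {
  const results = JSON.parse(text);
  const required = ['exit_code', 'scenarios_attempted', 'scenarios_passed', 'scenarios_failed'];
  for (const key of required) {
    if (!Number.isInteger(results[key])) {
      throw new Error(`results.json: "${key}" must be an integer`);
    }
  }
  return results;
}
```

Validating before use means a half-written or malformed summary shows up as a driver error instead of silently skewing the report.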

Exit-code semantics:

| Code | Meaning | Orchestrator Action |
| --- | --- | --- |
| 0 | All captures succeeded (or platform skip) | Record pass or skip, continue to Phase 3 |
| 1 | At least one capture failed | Failures become findings (non-fatal); continue to Phase 3 |
| 2 | Framework error (missing binary, permission denied, OS mismatch) | Surface as driver error; still continue to Phase 3 with available artifacts |

Capture exit code. Exit 1 (capture failures) produces findings for the UX evaluator — it is NOT fatal for the orchestrator. Exit 2 (framework error) is surfaced in the report but does not halt Phase 3. Continue to Phase 3 regardless of exit code.

Log: peekaboo-driver exited [code] — [N] scenarios captured under ${RUN_DIR}
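
The exit-code semantics can be captured in one lookup so that no code path halts before Phase 3. `classifyDriverExit` and the outcome labels are hypothetical; handling codes other than 0, 1, and 2 as a driver error is an assumption, since the contract does not define them.

```javascript
// Hypothetical classifier for the peekaboo-driver exit code.
function classifyDriverExit(code) {
  switch (code) {
    case 0:
      // All captures succeeded, or the platform gate skipped the run.
      return { outcome: 'pass-or-skip', driverError: false, continueToPhase3: true };
    case 1:
      // Capture failures become findings for the UX evaluator.
      return { outcome: 'findings', driverError: false, continueToPhase3: true };
    case 2:
      // Framework error: reported, but available artifacts are still evaluated.
      return { outcome: 'framework-error', driverError: true, continueToPhase3: true };
    default:
      // Undefined by the contract; treated like a framework error here.
      return { outcome: 'unknown', driverError: true, continueToPhase3: true };
  }
}
```

Every branch sets `continueToPhase3` to true on purpose: the distinction between codes affects the report, not the control flow.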

Phase 3: UX Evaluator Dispatch

Invoke the ux-evaluator agent (agents/ux-evaluator.md) via the Agent tool. The agent reads driver artifacts under ${RUN_DIR}/ and applies skills/test-runner/rubric-v1.md (4 checks). The agent writes its findings directly to ${RUN_DIR}/findings.jsonl, so the orchestrator does NOT need to forward findings through prompt context.

Agent({
  description: `UX evaluate run ${runId}`,
  prompt: `<scope: ${RUN_DIR}, rubric: skills/test-runner/rubric-v1.md, output: ${RUN_DIR}/findings.jsonl>`,
  subagent_type: "ux-evaluator",
  run_in_background: false
})

run_in_background: false is mandatory — Phase 4 depends on findings.jsonl being fully written before reconciliation begins.

After the agent completes, verify ${RUN_DIR}/findings.jsonl exists. If missing, emit a warning and skip Phase 4 (no findings to reconcile).
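
Once the file is confirmed to exist, its contents can be read line by line. The sketch below assumes one JSON object per line and makes no assumption about the fields inside each finding; `parseFindingsJsonl` is a hypothetical name, and collecting malformed lines instead of throwing is a design choice of this example, not documented behavior.

```javascript
// Parses JSONL text: one JSON value per non-empty line.
function parseFindingsJsonl(text) {
  const findings = [];
  const errors = [];
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // tolerate blank lines and a trailing newline
    try {
      findings.push(JSON.parse(line));
    } catch (err) {
      // Record the 1-based line number so the report can point at it.
      errors.push({ line: index + 1, message: err.message });
    }
  });
  return { findings, errors };
}
```

Collecting errors keeps one bad line from discarding the valid findings around it.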

Phase 4: Issue Reconciliation

Read ${RUN_DIR}/findings.jsonl. Use the helpers in scripts/lib/test-runner/issue-reconcile.mjs for all glab/gh interactions — never call glab or gh directly.
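
One plausible reconciliation step is to split findings into those already tracked and those that still need an issue. This is a sketch under assumptions: `partitionFindings` is a hypothetical name, each finding is assumed to carry a `fingerprint` property, and the set of existing fingerprints is assumed to come from the `fingerprints` value that `listExistingFindings` returns. The actual reconciliation policy lives in issue-reconcile.mjs.

```javascript
// Splits findings by whether their fingerprint is already tracked.
function partitionFindings(findings, existingFingerprints) {
  const toCreate = [];
  const alreadyTracked = [];
  // Copy the set so the caller's data is not mutated.
  const seen = new Set(existingFingerprints);
  for (const finding of findings) {
    if (seen.has(finding.fingerprint)) {
      alreadyTracked.push(finding);
    } else {
      toCreate.push(finding);
      // Later findings with the same fingerprint in this run count as tracked.
      seen.add(finding.fingerprint);
    }
  }
  return { toCreate, alreadyTracked };
}
```

Only the `toCreate` list would then be passed, one finding at a time, to the `createFinding` helper.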

Available Helper Functions (issue-reconcile.mjs)

| Function | Purpose |
| --- | --- |
| `listExistingFindings({glabPath, project, label, maxBuffer})` | Query the tracker for all open `from:test-runner` issues; returns `{ok, issues[], fingerprints: Set}` |
| `createFinding({glabPath, project, fingerprint, title, body, labels, dryRun, maxBuffer})` | Create a new issue; returns `{ok, action:
