Test Runner Skill
Project-instruction file resolution:
CLAUDE.md and AGENTS.md (Codex CLI) are transparent aliases; see skills/_shared/instruction-file-resolution.md. Wherever this skill mentions CLAUDE.md, the alias rule applies.
Soul
Before anything else, read and internalize soul.md in this skill directory. It defines WHO you are — your role as an orchestrator, your delegation boundaries, and your non-negotiable constraints.
Phase 0: Bootstrap Gate
Read skills/_shared/bootstrap-gate.md and execute the gate check. If GATE_CLOSED, invoke skills/bootstrap/SKILL.md and wait for completion. If GATE_OPEN, continue to Phase 1.
Phase 1: Read Session Config + Resolve Target / Profile
Read and parse Session Config per skills/_shared/config-reading.md. Store result as $CONFIG.
Test-runner specific fields (parse these specifically):
- test-runner.default-profile (default: smoke)
- test-runner.retention-days (default: 30)
- test-command, typecheck-command, lint-command (used for context only; not driven here)
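The defaults listed above could be applied roughly as in the following sketch. It assumes the parsed Session Config is a flat object keyed by the field names shown; the helper name withTestRunnerDefaults and the shape of its return value are hypothetical and not defined by config-reading.md.

```javascript
// Hypothetical helper: fills in the documented defaults for test-runner fields.
// Assumption: the parsed Session Config is a flat object keyed by field name.
function withTestRunnerDefaults(config = {}) {
  return {
    defaultProfile: config['test-runner.default-profile'] ?? 'smoke',
    retentionDays: Number(config['test-runner.retention-days'] ?? 30),
    // Context only: this skill records these commands but never runs them.
    contextCommands: {
      test: config['test-command'] ?? null,
      typecheck: config['typecheck-command'] ?? null,
      lint: config['lint-command'] ?? null,
    },
  };
}
```

Calling withTestRunnerDefaults({}) yields the smoke profile and a 30-day retention, while any value present in the config overrides the default.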
Target / Profile Resolution
Resolution order (first match wins):
- CLI argument --target <name> --profile <name> (explicit, highest priority)
- Policy file lookup: .orchestrator/policy/test-profiles.json by target name (if present)
- Convention-based detection (marker files):
  - playwright.config.{ts,js} present → target type web, dispatch playwright-driver
  - Package.swift present → target type mac, dispatch peekaboo-driver (see skills/peekaboo-driver/SKILL.md)
- Fallback → emit error and halt:
Error: Cannot resolve target — provide --target or add .orchestrator/policy/test-profiles.json
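As an illustration of the convention-based step only, the marker-file lookup might be sketched as below. The helper name detectTargets is hypothetical, and returning every match (rather than just the first) is an assumption based on Phase 2 iterating over each resolved driver.

```javascript
// Marker files and the target type / driver each one implies (from the list above).
const MARKERS = [
  { files: ['playwright.config.ts', 'playwright.config.js'], type: 'web', driver: 'playwright-driver' },
  { files: ['Package.swift'], type: 'mac', driver: 'peekaboo-driver' },
];

// Hypothetical helper: takes the file names found in the project root and
// returns every matching target. An empty array means detection found nothing
// and the caller should fall through to the fallback error.
function detectTargets(rootFileNames) {
  const present = new Set(rootFileNames);
  return MARKERS
    .filter((marker) => marker.files.some((file) => present.has(file)))
    .map(({ type, driver }) => ({ type, driver }));
}
```

The CLI and policy-file steps would run before this one; they are omitted here because their exact data shapes are not specified in this section.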
Run ID
Generate a run ID immediately after target resolution:
import { makeRunId } from 'scripts/lib/test-runner/artifact-paths.mjs';
const runId = makeRunId(); // e.g. "your-target-app-1715688000123"
All artifact paths in subsequent phases derive from this run ID. Never use ad-hoc paths.
Status Report
After resolution, emit: Test Runner: target=[name] profile=[name] run_id=[runId] driver=[driver]
Phase 2: Driver Dispatch
Determine ${RUN_DIR} from artifact-paths.mjs:runDirPath(runId) before dispatching any driver. All drivers write artifacts under ${RUN_DIR}/.
--since Filtering (when since_ref is provided)
When since_ref is set (passed from the /test --since <git-ref> handoff contract):
- Import and call changedFilesSince(since_ref) from scripts/lib/discovery/helpers.mjs.
- If the helper throws (ref unresolvable), surface the error to the user and halt.
- If the result is [] (no files changed since the ref), emit:
  No files changed since <since_ref>. Skipping test run.
  and exit with status 0. Do NOT fall back to a full-repo test run.
- If the result is a non-empty array, JSON-stringify it and set TEST_CHANGED_FILES in the driver subprocess environment (see driver invocations below). Driver-side filtering is deferred: drivers receive the env var but do not yet filter by it in this wave.
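The three outcomes of the --since step can be sketched as a small decision function. This is only an illustration: planSinceFilter and its return shape are hypothetical, the real changedFilesSince is injected as a parameter rather than imported, and it is assumed to be synchronous.

```javascript
// Hypothetical helper. `changedFilesSince` is injected so the sketch does not
// depend on the real module in scripts/lib/discovery/helpers.mjs.
// Assumption: the helper is synchronous and returns an array of file paths.
function planSinceFilter(sinceRef, changedFilesSince) {
  if (!sinceRef) {
    // No --since given: drivers receive an empty TEST_CHANGED_FILES.
    return { action: 'run', env: { TEST_CHANGED_FILES: '' } };
  }
  // A throw here (unresolvable ref) propagates to the caller, which halts.
  const changed = changedFilesSince(sinceRef);
  if (changed.length === 0) {
    // Nothing changed: skip entirely, never fall back to a full run.
    return {
      action: 'skip',
      exitStatus: 0,
      message: `No files changed since ${sinceRef}. Skipping test run.`,
    };
  }
  return { action: 'run', env: { TEST_CHANGED_FILES: JSON.stringify(changed) } };
}
```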
For each resolved driver:
Web (playwright-driver)
Dispatch via Bash per skills/playwright-driver/SKILL.md. Pass ${RUN_DIR} so the driver writes all artifacts (screenshots, AX dumps, HAR) under it.
# Example invocation shape (exact flags defined by playwright-driver SKILL.md)
TEST_CHANGED_FILES="${CHANGED_FILES_JSON}" node scripts/lib/playwright-driver/runner.mjs \
--run-dir "${RUN_DIR}" \
--profile "${PROFILE}" \
--target "${TARGET}"
Where ${CHANGED_FILES_JSON} is JSON.stringify(changedFiles) when --since was provided, or an empty string otherwise.
Capture exit code. A non-zero exit from Playwright means test failures — these become findings for the UX evaluator. They are NOT a fatal error for the orchestrator. Continue to Phase 3 regardless of exit code.
Log: playwright-driver exited [code] — [N] test files captured under ${RUN_DIR}
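If the dispatch were scripted in Node rather than typed as a shell line, the invocation shape above might be assembled as in this sketch. The helper buildPlaywrightInvocation is hypothetical; the flags simply mirror the example invocation, and the authoritative flag set remains whatever playwright-driver SKILL.md defines.

```javascript
// Hypothetical helper: builds the command, argument list, and extra
// environment for the playwright-driver runner shown in the example above.
function buildPlaywrightInvocation({ runDir, profile, target, changedFiles = null }) {
  return {
    command: 'node',
    args: [
      'scripts/lib/playwright-driver/runner.mjs',
      '--run-dir', runDir,
      '--profile', profile,
      '--target', target,
    ],
    // Empty string when --since was not provided.
    env: { TEST_CHANGED_FILES: changedFiles ? JSON.stringify(changedFiles) : '' },
  };
}
```

Passing the arguments as an array (for example to child_process.spawnSync, with env merged over process.env) avoids shell quoting problems in paths. The returned status would then be logged and carried into Phase 3 without being treated as fatal.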
macOS (peekaboo-driver)
See skills/peekaboo-driver/SKILL.md for the full dispatch contract, permission probe, and artifact layout.
Pre-dispatch platform check: The driver's Phase 1 gate handles the platform and version checks (darwin + macOS 15.0+) and exits 0 (non-fatal skip) on incompatible systems. The orchestrator does not need to replicate these checks.
Permission probe: The driver runs its own Phase 2 permission probe via peekaboo permissions status --json. If required permissions (Screen Recording, Accessibility) are not granted, the driver surfaces an AUQ and exits 2 on failure. The orchestrator treats exit 2 as a driver-framework error, not a test failure.
Invocation:
# All inputs via environment variables — no positional arguments
RUN_DIR="${RUN_DIR}" TARGET="${TARGET}" PROFILE="${PROFILE}" bash skills/peekaboo-driver/SKILL.md
Outputs the orchestrator must parse:
| Artifact | Description |
|---|---|
| ${RUN_DIR}/exit_code | Plain integer file written by driver before exit |
| ${RUN_DIR}/results.json | Driver summary: exit_code, scenarios_attempted, scenarios_passed, scenarios_failed |
| ${RUN_DIR}/ax-snapshots/<scenario>.json | peekaboo AX-tree output per scenario |
| ${RUN_DIR}/ax-snapshots/glass-modifiers-<ts>.json | Liquid Glass conformance artifact (consumed by ux-evaluator Check 4) |
| ${RUN_DIR}/screenshots/<step>-<ts>.png | Per-step screenshots (evidence for ux-evaluator findings) |
| ${RUN_DIR}/console.ndjson | Driver log events as NDJSON |
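Parsing the first two artifacts in the table might look like the following sketch. It works on the raw file contents so that it stays independent of how the files are read; parseDriverOutputs and the consistent flag are hypothetical additions, not part of the driver contract.

```javascript
// Hypothetical helper: takes the raw text of ${RUN_DIR}/exit_code and
// ${RUN_DIR}/results.json and returns a validated summary.
function parseDriverOutputs(exitCodeText, resultsJsonText) {
  const exitCode = Number.parseInt(exitCodeText.trim(), 10);
  if (!Number.isInteger(exitCode)) {
    throw new Error(`exit_code is not an integer: ${JSON.stringify(exitCodeText)}`);
  }
  const results = JSON.parse(resultsJsonText);
  // Fields named in the artifact table above.
  const required = ['exit_code', 'scenarios_attempted', 'scenarios_passed', 'scenarios_failed'];
  for (const key of required) {
    if (!(key in results)) throw new Error(`results.json is missing "${key}"`);
  }
  // Assumption: the two exit codes should agree; a mismatch is only flagged.
  return { exitCode, results, consistent: results.exit_code === exitCode };
}
```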
Exit-code semantics:
| Code | Meaning | Orchestrator Action |
|---|---|---|
| 0 | All captures succeeded (or platform skip) | Record pass or skip, continue to Phase 3 |
| 1 | At least one capture failed | Failures become findings (non-fatal) — continue to Phase 3 |
| 2 | Framework error (missing binary, permission denied, OS mismatch) | Surface as driver error; still continue to Phase 3 with available artifacts |
Capture exit code. Exit 1 (capture failures) produces findings for the UX evaluator — it is NOT fatal for the orchestrator. Exit 2 (framework error) is surfaced in the report but does not halt Phase 3. Continue to Phase 3 regardless of exit code.
Log: peekaboo-driver exited [code] — [N] scenarios captured under ${RUN_DIR}
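The exit-code table can be expressed as a small lookup, shown here as a sketch. The function name and status labels are hypothetical, and treating unlisted codes as a driver error is an assumption; the source only defines codes 0, 1, and 2.

```javascript
// Hypothetical helper mirroring the exit-code table above.
function interpretPeekabooExit(code) {
  // Phase 3 always runs, whatever the driver returned.
  const base = { continueToPhase3: true };
  switch (code) {
    case 0:
      return { ...base, status: 'pass-or-skip', producesFindings: false, driverError: false };
    case 1:
      return { ...base, status: 'capture-failures', producesFindings: true, driverError: false };
    case 2:
      return { ...base, status: 'framework-error', producesFindings: false, driverError: true };
    default:
      // Assumption: codes outside the table are reported like a framework error.
      return { ...base, status: 'unknown', producesFindings: false, driverError: true };
  }
}
```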
Phase 3: UX Evaluator Dispatch
Invoke the ux-evaluator agent (agents/ux-evaluator.md) via the Agent tool. The agent reads driver artifacts under ${RUN_DIR}/ and applies skills/test-runner/rubric-v1.md (4 checks). The agent writes findings.jsonl directly to ${RUN_DIR}/findings.jsonl — the coordinator does NOT need to forward findings through prompt context.
Agent({
description: `UX evaluate run ${runId}`,
prompt: `<scope: ${RUN_DIR}, rubric: skills/test-runner/rubric-v1.md, output: ${RUN_DIR}/findings.jsonl>`,
subagent_type: "ux-evaluator",
run_in_background: false
})
run_in_background: false is mandatory — Phase 4 depends on findings.jsonl being fully written before reconciliation begins.
After the agent completes, verify ${RUN_DIR}/findings.jsonl exists. If missing, emit a warning and skip Phase 4 (no findings to reconcile).
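The existence check and the hand-off to Phase 4 might be sketched as follows. The helper parseFindings is hypothetical; it takes the file contents (or null when the file is absent) and assumes only that findings.jsonl holds one JSON object per line, since the finding schema is not described in this section.

```javascript
// Hypothetical helper: parses the text of findings.jsonl.
// Pass null when ${RUN_DIR}/findings.jsonl does not exist.
function parseFindings(jsonlText) {
  if (jsonlText === null) {
    return {
      skipPhase4: true,
      findings: [],
      warning: 'findings.jsonl missing; skipping Phase 4',
    };
  }
  const findings = jsonlText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // tolerate blank and trailing lines
    .map((line) => JSON.parse(line));
  return { skipPhase4: false, findings, warning: null };
}
```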
Phase 4: Issue Reconciliation
Read ${RUN_DIR}/findings.jsonl. Use the helpers in scripts/lib/test-runner/issue-reconcile.mjs for all glab/gh interactions — never call glab or gh directly.
Available Helper Functions (issue-reconcile.mjs)
| Function | Purpose |
|---|---|
| listExistingFindings({glabPath, project, label, maxBuffer}) | Query the tracker for all open from:test-runner issues; returns {ok, issues[], fingerprints: Set} |
| createFinding({glabPath, project, fingerprint, title, body, labels, dryRun, maxBuffer}) | Create a new issue; returns `{ok, action: