Paths: File paths (references/, ../ln-*) are relative to this skill directory.
Type: L2 Coordinator
Category: 3XX Planning
Multi-Agent Validator
Evaluation-platform coordinator for:
- mode=story
- mode=plan_review
This skill uses the evaluation platform for:
- mandatory official-doc, MCP Ref, Context7, and current-web research
- parallel read-only evidence lanes
- sequential documentation, repair, merge, refinement, and approval
- runtime-backed worker plans, worker summaries, agent sync, and cleanup verification
Inputs
| Input | Required | Source | Description |
|---|---|---|---|
| storyId | mode=story | args, git branch, kanban, user | Story to validate |
| plan {file} | mode=plan_review | args or auto | Plan file to validate |
Mode detection:
- plan or plan {file} -> mode=plan_review
- otherwise -> mode=story
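The detection rule above can be sketched as a small function. This is a minimal illustration, not the coordinator's actual parser: the name `detect_mode` and the whitespace-based tokenization are assumptions, and a plan path containing spaces would need different handling.

```python
from typing import Optional, Tuple


def detect_mode(args: str) -> Tuple[str, Optional[str]]:
    """Map raw skill arguments to (mode, plan_file). Hypothetical helper."""
    tokens = args.split()
    if tokens and tokens[0] == "plan":
        # "plan" alone leaves the plan file to auto-detection (returned as None).
        plan_file = tokens[1] if len(tokens) > 1 else None
        return "plan_review", plan_file
    # Anything else is treated as a story reference.
    return "story", None
```

For example, `detect_mode("plan docs/rollout.md")` yields `("plan_review", "docs/rollout.md")`, while any other argument string falls through to story mode.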
Mandatory Read
MANDATORY READ: Load references/environment_state_contract.md, references/storage_mode_detection.md, references/input_resolution_pattern.md
MANDATORY READ: Load references/evaluation_coordinator_runtime_contract.md, references/evaluation_summary_contract.md, references/evaluation_parallelism_policy.md, references/evaluation_research_contract.md
MANDATORY READ: Load references/agent_delegation_pattern.md
MANDATORY READ: Load references/penalty_points.md
MANDATORY READ: Load references/researchgraph_mcp_usage.md when researchgraph files changed or the target claims hypothesis, goal, benchmark, or proposal readiness.
Conditional read: load references/phase2_research_audit.md only when the coordinator performs inline criteria mapping instead of consuming ln-312 findings summaries.
Agent review policy: run health check, record skipped reason when no advisor is available, verify every advisor claim before merge, and treat transport/auth/tool failures as operator evidence rather than domain findings. Load references/agent_review_workflow.md only when debugging lifecycle/liveness details outside the evaluation runtime.
Worker Set
The coordinator uses these evaluation workers:
- ln-311-review-research-worker
- ln-312-review-findings-worker
- ln-313-review-docs-worker
- ln-314-review-repair-worker
- ln-315-review-merge-worker
- ln-316-review-refinement-worker
Worker Invocation (MANDATORY)
Host Skill Invocation: Skill(skill: "...", args: "...") is mandatory delegation.
- Claude: call the Skill tool exactly as shown.
- Codex: if no Skill tool exists, locate the named skill in available skills, read its SKILL.md, treat args as $ARGUMENTS, execute that skill workflow, then return here with its result/artifact.
- Do not inline worker logic or mark the worker complete without executing the target skill.
Use the Skill tool for delegated workers. Do not inline worker logic inside the coordinator.
TodoWrite format (mandatory):
- Resolve target and build runtime manifest
- Load target artifacts and metadata
- Launch external agents and verify health
- Run research and findings workers in parallel
- Generate documentation updates
- Apply accepted low-risk repairs
- Sync agents and merge all evidence
- Run refinement (MANDATORY in ALL modes when advisor available; do NOT skip)
- Compute verdict and write review output
- Verify runtime cleanup and self-check
Representative invocations:
Skill(skill: "ln-311-review-research-worker", args: "{identifier} research")
Skill(skill: "ln-312-review-findings-worker", args: "{identifier} findings")
Skill(skill: "ln-313-review-docs-worker", args: "{identifier} docs")
Skill(skill: "ln-314-review-repair-worker", args: "{identifier} repair")
Skill(skill: "ln-315-review-merge-worker", args: "{identifier} merge")
Skill(skill: "ln-316-review-refinement-worker", args: "{identifier} refinement")
Runtime Contract
MANDATORY READ: Load references/loop_health_contract.md
Runtime family:
evaluation-runtime
Identifier:
- story-{storyId} for story mode
- plan-{slug} for plan review
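A minimal sketch of how the two identifier shapes could be produced. The source does not say how `{slug}` is derived, so the slug rule here (lowercased file stem with non-alphanumerics collapsed to hyphens) is an assumption, as is the function name `build_identifier`.

```python
import os
import re
from typing import Optional


def build_identifier(mode: str,
                     story_id: Optional[str] = None,
                     plan_file: Optional[str] = None) -> str:
    """Return story-{storyId} or plan-{slug}. Hypothetical helper."""
    if mode == "story":
        if not story_id:
            raise ValueError("story mode needs a storyId")
        return f"story-{story_id}"
    if mode == "plan_review":
        if not plan_file:
            raise ValueError("plan_review mode needs a plan file")
        # ASSUMPTION: the slug is the lowercased file stem with every run of
        # non-alphanumeric characters replaced by a single hyphen.
        stem = os.path.splitext(os.path.basename(plan_file))[0]
        slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
        return f"plan-{slug}"
    raise ValueError(f"unknown mode: {mode}")
```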
Phase order:
- PHASE_0_CONFIG
- PHASE_1_DISCOVERY
- PHASE_2_AGENT_LAUNCH
- PHASE_3_EVIDENCE_LANES
- PHASE_4_DOCS
- PHASE_5_REPAIR
- PHASE_6_MERGE
- PHASE_7_REFINEMENT
- PHASE_8_APPROVAL
- PHASE_9_SELF_CHECK
Phase policy:
- delegate_phases = [PHASE_3_EVIDENCE_LANES, PHASE_4_DOCS, PHASE_5_REPAIR, PHASE_6_MERGE, PHASE_7_REFINEMENT]
- aggregate_phase = PHASE_6_MERGE
- report_phase = PHASE_8_APPROVAL
- cleanup_phase = PHASE_9_SELF_CHECK
- self_check_phase = PHASE_9_SELF_CHECK
- agent_resolve_before = [PHASE_6_MERGE]
- required_phases_when_advisor_available = [PHASE_7_REFINEMENT]
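The phase order and phase policy can be written down as plain data and checked for consistency before the runtime starts. The sketch below copies the values listed above; the validator `validate_phase_policy` is a hypothetical helper, and the real runtime may enforce different or additional rules.

```python
from typing import Dict, List, Union

PHASE_ORDER: List[str] = [
    "PHASE_0_CONFIG", "PHASE_1_DISCOVERY", "PHASE_2_AGENT_LAUNCH",
    "PHASE_3_EVIDENCE_LANES", "PHASE_4_DOCS", "PHASE_5_REPAIR",
    "PHASE_6_MERGE", "PHASE_7_REFINEMENT", "PHASE_8_APPROVAL",
    "PHASE_9_SELF_CHECK",
]

PHASE_POLICY: Dict[str, Union[str, List[str]]] = {
    "delegate_phases": [
        "PHASE_3_EVIDENCE_LANES", "PHASE_4_DOCS", "PHASE_5_REPAIR",
        "PHASE_6_MERGE", "PHASE_7_REFINEMENT",
    ],
    "aggregate_phase": "PHASE_6_MERGE",
    "report_phase": "PHASE_8_APPROVAL",
    "cleanup_phase": "PHASE_9_SELF_CHECK",
    "self_check_phase": "PHASE_9_SELF_CHECK",
    "agent_resolve_before": ["PHASE_6_MERGE"],
    "required_phases_when_advisor_available": ["PHASE_7_REFINEMENT"],
}


def validate_phase_policy(order: List[str],
                          policy: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Return a list of problems; an empty list means the policy is consistent."""
    known = set(order)
    errors: List[str] = []
    for key, value in policy.items():
        # Every phase named by the policy must exist in the phase order.
        for phase in (value if isinstance(value, list) else [value]):
            if phase not in known:
                errors.append(f"{key}: unknown phase {phase}")
    # Delegated phases should be listed in execution order.
    delegated = [p for p in policy.get("delegate_phases", []) if p in known]
    if delegated != sorted(delegated, key=order.index):
        errors.append("delegate_phases: not in phase order")
    return errors
```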
Parallelism Rules
Allowed overlap:
- external agents
- ln-311
- ln-312
- local repo inspection and evidence gathering
Sequential only:
- ln-313
- ln-314
- ln-315
- ln-316
- approval and status mutation
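One way to apply these rules is to partition pending activities into a concurrent group and an ordered queue. This is a sketch only: the labels `external-agents`, `local-inspection`, and `approval` are made-up shorthand for the prose entries above, and treating unknown names as sequential is a conservative assumption rather than a documented rule.

```python
from typing import Iterable, List, Tuple

# Activities that may overlap, per the "Allowed overlap" list.
PARALLEL_OK = frozenset({"external-agents", "ln-311", "ln-312", "local-inspection"})


def partition_activities(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split activities into (may run concurrently, must run one at a time)."""
    parallel: List[str] = []
    sequential: List[str] = []
    for name in names:
        # ASSUMPTION: anything not explicitly allowed to overlap is queued.
        (parallel if name in PARALLEL_OK else sequential).append(name)
    return parallel, sequential
```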
Workflow
Phase 0: Config
- Resolve mode, identifier, and storage mode.
- Resolve story or plan target.
- Build evaluation runtime manifest with:
  - expected_agents
  - required_research=true
  - exact phase_order
  - phase_policy
  - report path
- Start runtime:
node references/scripts/evaluation-runtime/cli.mjs start \
--skill ln-310 \
--identifier {identifier} \
--manifest-file .hex-skills/evaluation/{identifier}_manifest.json
- Checkpoint Phase 0.
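A minimal sketch of assembling the manifest described in Phase 0. The authoritative schema lives in the runtime contract reference, so the key name `report_path` and the helper names are assumptions; only the listed fields and the manifest file location come from the text above.

```python
import json
from typing import Any, Dict, List


def manifest_path(identifier: str) -> str:
    """Location passed to --manifest-file in the start command."""
    return f".hex-skills/evaluation/{identifier}_manifest.json"


def build_manifest(expected_agents: List[str],
                   phase_order: List[str],
                   phase_policy: Dict[str, Any],
                   report_path: str) -> Dict[str, Any]:
    """Collect the fields the coordinator must put in the manifest."""
    return {
        "expected_agents": list(expected_agents),
        "required_research": True,   # research is mandatory in every mode
        "phase_order": list(phase_order),
        "phase_policy": dict(phase_policy),
        "report_path": report_path,  # ASSUMPTION: key name is not given in the source
    }


def to_json(manifest: Dict[str, Any]) -> str:
    """Serialize the manifest for writing to manifest_path(identifier)."""
    return json.dumps(manifest, indent=2)
```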
Phase 1: Discovery
- Materialize the exact target artifact.
- Load only the metadata needed for the current mode.
- In mode=story, resolve Story and child tasks.
- In mode=plan_review, resolve the plan file.
- If researchgraph files changed or the target cites H##, G##, run IDs, benchmark manifests, or readiness claims, run read-only researchgraph verification/audits and attach the result as validation evidence.
- Checkpoint Phase 1 with resolved refs.
Phase 2: Agent Launch
- Run agent health check.
- Exclude disabled agents from .hex-skills/environment_state.json.
- If no agents are available:
  - record agents_skipped_reason
  - checkpoint Phase 2
  - continue
- Otherwise:
  - build per-agent prompts
  - launch each available agent
  - register each launched agent:
node references/scripts/evaluation-runtime/cli.mjs register-agent \
--skill ln-310 \
--identifier {identifier} \
--agent {name} \
--prompt-file {promptPath} \
--result-file {resultPath} \
--metadata-file {metadataPath}
- Checkpoint Phase 2 with health_check_done, agents_available, agents_required, and optional agents_skipped_reason.
- Classify each external agent result before domain verdict:
  - rate_limited, tool_missing, auth_missing, permission_denial, and asked_question are transport/operator states.
  - Do not convert them into NO-GO without domain evidence from artifacts or findings.
- Record loop health for repeated advisor/session failures and pause when retry usefulness is exhausted.
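The classification step can be sketched as a filter that separates transport/operator states from results that may carry domain findings. The five state names come from the list above; the function name, the input shape (agent name to status string), and the example status `completed` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# States that describe the transport or operator environment, not the target.
TRANSPORT_STATES = frozenset({
    "rate_limited", "tool_missing", "auth_missing",
    "permission_denial", "asked_question",
})


def split_agent_results(statuses: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (operator_evidence, agents whose results may feed the verdict)."""
    operator_evidence: Dict[str, str] = {}
    domain_candidates: List[str] = []
    for agent, status in statuses.items():
        if status in TRANSPORT_STATES:
            # Recorded as operator evidence; never a NO-GO on its own.
            operator_evidence[agent] = status
        else:
            domain_candidates.append(agent)
    return operator_evidence, domain_candidates
```

Only agents in the second group should have their claims verified and merged; the first group is recorded for loop-health tracking.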
Phase 3: Evidence Lanes
This phase is the mandatory parallel evidence barrier.
- Build worker_plan with:
  - ln-311 lane research (mandatory)
  - ln-312 lane findings (mandatory)
- Launch all planned workers in parallel.
- While those workers run, continue local repo inspection and collect additional evidence.
- Sync agents opportunistically, but do not block on them until merge.
- Record each worker summary with:
node references/scripts/evaluation-runtime/cli.mjs record-worker-result \
--skill ln-310 \
--identifier {identifier} \
--payload-file {childSummaryArtifactPath}
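The barrier behaviour of this phase can be illustrated with a thread pool: both lanes start together, local inspection proceeds while they run, and the phase only completes once every lane has returned. The real workers are invoked through the Skill tool, so the zero-argument callables here are stand-ins and `run_evidence_lanes` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

MANDATORY_LANES = frozenset({"research", "findings"})


def run_evidence_lanes(worker_plan: Dict[str, Callable[[], Any]],
                       local_inspection: Callable[[], Any]) -> Tuple[Dict[str, Any], Any]:
    """Run all lanes in parallel and wait for every one before returning."""
    missing = MANDATORY_LANES - worker_plan.keys()
    if missing:
        raise ValueError(f"worker_plan is missing mandatory lanes: {sorted(missing)}")
    with ThreadPoolExecutor(max_workers=len(worker_plan)) as pool:
        futures = {lane: pool.submit(fn) for lane, fn in worker_plan.items()}
        # Local repo inspection continues while the workers are in flight.
        local_evidence = local_inspection()
        # Barrier: result() blocks until each lane has produced its summary.
        summaries = {lane: future.result() for lane, future in futures.items()}
    return summaries, local_evidence
```

Each entry in `summaries` would then be written to a payload file and recorded with the `record-worker-result` command shown above.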
Research is mandatory in every mode:
- official documentation or standards
- MCP Ref
- Context7 when a library or framework is involved
- current web best-practice research
For mode=story, findings must still produce penalty-point evidence and coverage analysis.
Phase 4: Docs
- In mode=story, run ln-313-review-docs-worker when documentatio