a* (autostar)
A generalised autonomous optimisation loop — soft RLVR for the masses. The user defines a goal; the system runs structured experiments, evaluates progress across independent tracks, reflects at strategic checkpoints, and learns from every attempt — including learning how to learn better the next time.
If you can measure it, you can improve it.
Experimental-first principle
a* is an experimental optimisation loop. Do not reach for external mathematical
optimisers or solvers (e.g. scipy.optimize, cvxpy, linear/quadratic
programming solvers, evolutionary algorithm libraries, Bayesian optimisation
frameworks, or any other off-the-shelf optimisation package) as a shortcut to
improving the artifact. The value of a* is in the structured
explore-evaluate-reflect cycle, not in delegating the search to a solver.
If at any point during onboarding, pre-run analysis, or execution you believe the problem is well suited to a closed-form or mathematical optimisation approach, you must ask the user before pursuing it. Present it as an alternative:
"This problem looks like it could be approached with a mathematical optimiser (e.g. [specific method]). Would you like me to try that instead of running the experimental loop, or would you prefer to proceed with a*?"
Do not silently install, import, or invoke an external optimiser. Do not reframe the a* loop as a wrapper around a solver. If the user explicitly opts for a mathematical approach, that is a different workflow — not an a* run.
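One way to make this rule checkable is a static scan of any code the loop generates before it runs. The sketch below is illustrative only: the `FORBIDDEN` denylist and the `forbidden_imports` helper are hypothetical names, not part of a*, and the list covers just the two packages named above. It catches explicit imports only, so treat it as a tripwire rather than a guarantee.

```python
import ast

# Hypothetical denylist: only the packages named in the principle above.
# Extend it to cover whatever solver libraries matter in your environment.
FORBIDDEN = ("scipy.optimize", "cvxpy")


def forbidden_imports(source: str) -> list[str]:
    """Return the denylisted modules that `source` imports explicitly."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from scipy import optimize` must resolve to "scipy.optimize".
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        for name in names:
            for banned in FORBIDDEN:
                if name == banned or name.startswith(banned + "."):
                    found.add(banned)
    return sorted(found)
```

If the scan returns a non-empty list, stop and put the question above to the user instead of running the code.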
Concepts
Before running, ensure you understand these terms precisely:
| Term | Meaning |
|---|---|
| Step | One execution with one parameter set. Atomic unit of work. |
| Play | A named bundle of parameters that move together (optional; disable with plays: false). |
| Lap | A set of steps sharing the same parameter family. Establishes statistical confidence in a direction. |
| Round | A set of laps. Ends with a mandatory reflection: worth pursuing? ask user? pivot? |
| Run | One user-initiated process. Lasts until budget is exhausted or goal is met. |
| Track | One independently verifiable sub-goal. Has its own verifier and ratchet. |
| Disposition | A learned prior on how to approach a (problem class, action intent) pair. Stored in long-term memory; conditions all significant actions. |
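The nesting of Step, Lap, Round, and Run can be pictured as plain data containers. This is a minimal sketch for orientation, not the a* data model: every field name, the `mean_score` aggregate, and the `best_lap` helper are assumptions introduced here.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Step:
    """One execution with one parameter set (the atomic unit of work)."""
    params: dict
    score: float


@dataclass
class Lap:
    """Steps that share one parameter family."""
    family: str
    steps: list = field(default_factory=list)

    def mean_score(self) -> float:
        # Assumption: a lap's confidence signal is the mean of its step scores.
        return mean(step.score for step in self.steps)


@dataclass
class Round:
    """A set of laps that ends with a mandatory reflection."""
    laps: list = field(default_factory=list)
    reflection: Optional[str] = None  # worth pursuing? ask user? pivot?

    def best_lap(self) -> Lap:
        return max(self.laps, key=lambda lap: lap.mean_score())


@dataclass
class Run:
    """One user-initiated process made of rounds."""
    rounds: list = field(default_factory=list)
```

Tracks and dispositions are left out on purpose: a track carries its own verifier and ratchet, and a disposition lives in long-term memory, so neither nests inside a single run in the same way.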
Runtime capability contract
Before Phase 1, detect the host runtime's capabilities and map them onto the
abstract adapter contract in references/runtime-capabilities.md.
Use abstract capabilities first:
- structured_choice for bounded approvals
- freeform_input for open-ended elicitation
- file_presentation / local_html for rubric builder and visualiser
- subprocess for external-tool verifiers and render scripts
- pause_resume for human gates and round escalations
Claude-specific tools are examples of adapters, not the specification:
- Claude Code: ask_user + shell + browser/file paths
- Claude.ai: structured chat + present_files
If a capability is missing, follow the fallback policy in
references/runtime-capabilities.md before onboarding the mission.
Concrete runtime profiles and adapters live in:
- runtime-profiles/claude-code.json
- runtime-profiles/codex.json
- runtime-profiles/gemini.json
- runtime-profiles/claude-ai.json
- runtime-profiles/pi.json
- runtime-profiles/chat-only.json
- runtime-profiles/template.json
- references/adapter-claude-code.md
- references/adapter-codex.md
- references/adapter-gemini.md
- references/adapter-claude-ai.md
- references/adapter-pi.md
- references/adapter-chat-only.md
- references/adapter-template.md
- scripts/runtime_profile.py
Before detailed verifier/rubric work, check that the active runtime can support
the proposed mission. Use scripts/runtime_profile.py check-mission with the
current runtime profile and planned verifier types. If it fails, stop and
reconfigure before proceeding.
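The idea behind the mission check can be sketched as a set difference between what the planned verifiers need and what the runtime offers. This is not the implementation of scripts/runtime_profile.py. The `REQUIRED_CAPABILITIES` mapping and the `missing_capabilities` helper are hypothetical, and the mapping is only a guess based on the capability list above; the authoritative requirements are in references/runtime-capabilities.md.

```python
# Hypothetical mapping from verifier type to the abstract capabilities it needs.
# Guessed from the capability list above; not taken from runtime-capabilities.md.
REQUIRED_CAPABILITIES = {
    "llm_judge": {"local_html"},        # rubric builder
    "external_tool": {"subprocess"},    # external-tool verifiers
    "human_gate": {"pause_resume"},     # human gates
    "deterministic": set(),
}


def missing_capabilities(runtime_caps, verifier_types):
    """Return {verifier_type: sorted missing capabilities} for unsupported verifiers.

    Raises KeyError for a verifier type that is not in the mapping, so an
    unknown type is never silently treated as supported.
    """
    available = set(runtime_caps)
    gaps = {}
    for vtype in verifier_types:
        missing = REQUIRED_CAPABILITIES[vtype] - available
        if missing:
            gaps[vtype] = sorted(missing)
    return gaps
```

An empty result means the mission fits the runtime. Anything else is the list of gaps to resolve through the fallback policy before onboarding continues.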
Phase 1: Onboarding
Do not begin execution until onboarding is complete and the user has approved the mission.
Onboarding is an interactive dialogue, not a monologue. At every decision point you
must stop and ask the user rather than inferring and proceeding. Use the host
runtime's structured_choice capability for bounded decisions; in Claude Code
this maps to ask_user. Use open prose questions for genuinely open-ended inputs
(e.g. goal description, rubric wording).
The mandatory user-confirmation checkpoints are:
- Goal decomposition confirmed — present inferred tracks as choices; user approves, removes, or adds before proceeding
- Required vs preferred — for each track, explicitly ask; do not infer
- Verifier type per track — present options; user selects
- Hard constraints confirmed — present inferred list; user amends
- Budget — present three concrete options; user selects
- Plays — enabled/disabled, and approval of proposed bundles
- Final mission confirmation — full summary; explicit go/no-go before any step runs
Never skip a checkpoint. If the user's initial message contained enough information to pre-populate an answer, present it as a pre-selected option and ask them to confirm or change it. Do not silently accept it.
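The checkpoint rule behaves like a gate that stays closed until every confirmation is recorded. The sketch below assumes the checkpoints are confirmed in the order listed, which the list implies but does not state. The `OnboardingGate` class and the checkpoint identifiers are hypothetical names.

```python
# Hypothetical identifiers for the seven checkpoints listed above, in order.
CHECKPOINTS = (
    "goal_decomposition",
    "required_vs_preferred",
    "verifier_types",
    "hard_constraints",
    "budget",
    "plays",
    "final_mission_confirmation",
)


class OnboardingGate:
    """Records explicit user confirmations and blocks execution until all exist."""

    def __init__(self):
        self.confirmed = {}

    def confirm(self, checkpoint, answer):
        """Store the user's answer; refuse to skip an earlier checkpoint."""
        if checkpoint not in CHECKPOINTS:
            raise ValueError(f"unknown checkpoint: {checkpoint}")
        earlier = CHECKPOINTS[: CHECKPOINTS.index(checkpoint)]
        pending = [name for name in earlier if name not in self.confirmed]
        if pending:
            raise RuntimeError(f"confirm {pending[0]} before {checkpoint}")
        self.confirmed[checkpoint] = answer

    def ready(self):
        """True only once every checkpoint has a recorded confirmation."""
        return all(name in self.confirmed for name in CHECKPOINTS)
```

A pre-populated answer still goes through `confirm`: the agent proposes it, and the value is recorded only after the user accepts or changes it.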
Rubric builder: When configuring LLM judge tracks (onboarding checkpoint 2+),
surface the bundled rubric builder through the runtime's local_html or
file_presentation capability so the user can describe score anchors
interactively and get a generated rubric they can edit and confirm:
# Claude Code / terminal
open assets/rubric-builder.html # macOS
xdg-open assets/rubric-builder.html # Linux
start assets/rubric-builder.html # Windows
If running in Claude.ai, use present_files on assets/rubric-builder.html instead.
The user exports a tracks.md from the tool; load that as the confirmed track
configuration. If the runtime cannot surface local HTML, fall back to manual
rubric elicitation as defined in references/runtime-capabilities.md. Otherwise,
elicit manually only for tracks the tool did not cover. Tracks of type
external_tool, deterministic, or human_gate do not need a rubric.
Read references/onboarding.md for the full dialogue flow, question wording, and
decision trees at each checkpoint. Read references/runtime-capabilities.md
before adapting this flow to a non-Claude host.
Rubric builder UI: When Phase B (verifier elicitation) reaches an llm_judge
or hybrid track, present assets/rubric-builder.html to the user before
configuring that track. The builder calls Claude to generate the rubric from the
user's anchor descriptions, lets them review and edit it inline, and exports
a tracks.md file you can use directly. Tell the user:
"I'm opening the rubric builder for the [track name] track. Describe the score anchors, and it will draft the rubric for you to review and confirm."
After the user exports tracks.md from the builder, read it and use it as the
track configuration. Do not re-elicit rubrics that are already confirmed there.
The onboarding produces four documents, all stored in the run directory:
mission.md
GOAL: [plain language description of success]
ARTIFACT: [what is being mutated and where it lives]
PLAYS: enabled | disabled
BUDGET: [strategy + ceiling — see references/budgeting.md]
STOPPING_CRITERIA: [score threshold | plateau_n | budget_exhausted]
REPORTING: [what the final report must contain]
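The three stopping criteria can be read as a single check run after each step. The sketch below is one possible interpretation, not the a* definition: it assumes a single composite score per step, that higher is better, and that plateau_n means "no new best score in the last n steps". The function name and signature are hypothetical.

```python
def should_stop(scores, threshold, plateau_n, budget_left):
    """Return the reason to stop, or None to keep going.

    scores      -- composite score per completed step, oldest first
    threshold   -- target score that counts as meeting the goal
    plateau_n   -- steps without a new best score that count as a plateau
    budget_left -- remaining budget in whatever unit the mission uses
    """
    if scores and max(scores) >= threshold:
        return "score_threshold"
    if budget_left <= 0:
        return "budget_exhausted"
    if plateau_n > 0 and len(scores) > plateau_n:
        best_before = max(scores[:-plateau_n])
        # Plateau: nothing in the last plateau_n steps beat the earlier best.
        if max(scores[-plateau_n:]) <= best_before:
            return "plateau"
    return None
```

For example, `should_stop([0.5, 0.6, 0.6, 0.59], threshold=0.9, plateau_n=2, budget_left=10)` returns `"plateau"`, because neither of the last two steps improved on 0.6.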
tracks.md
One block per track. See Verification taxonomy below for verifier types.
TRACK: <name>
required: true | false
weight: 0.0–1.0 (weights across non-required tracks must sum to 1.0)
verifier: <see taxonomy>
threshold: <pass/fail cutoff or target score>
ratchet: independent | composite (default: independent)
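The weight rule is easy to get wrong by hand, so it is worth checking when tracks.md is loaded. The sketch below assumes the file is a flat sequence of `key: value` lines with each block starting at `TRACK:`. The `parse_tracks` and `check_weights` helpers are hypothetical and are not part of the a* scripts.

```python
import math


def parse_tracks(text):
    """Parse `TRACK:` blocks of `key: value` lines into a list of dicts."""
    tracks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "TRACK":
            current = {"name": value}
            tracks.append(current)
        elif current is not None:
            current[key] = value
    return tracks


def check_weights(tracks):
    """Raise ValueError unless non-required track weights are valid and sum to 1.0."""
    preferred = [t for t in tracks if t.get("required") != "true"]
    weights = [float(t["weight"]) for t in preferred]
    for track, weight in zip(preferred, weights):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"{track['name']}: weight {weight} outside 0.0-1.0")
    if preferred and not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError(f"non-required weights sum to {sum(weights)}, expected 1.0")
```

Required tracks are skipped by the check because the rule above only constrains the non-required ones.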
constraints.md
HARD: