a* (autostar) — web runtime
A generalised autonomous optimisation loop — soft RLVR for the masses. The user defines a goal; the system runs structured experiments, evaluates progress across independent tracks, reflects at strategic checkpoints, and learns from every attempt — including learning how to learn better the next time.
If you can measure it, you can improve it.
Web runtime constraints
This package runs inside a web chat runtime with reduced capabilities:
- No subprocess access — external_tool verifiers are unavailable
- No unrestricted local files — file read/write is limited
- Memory: connector-backed > project-pack > none (see references/memory.md)
Do not silently downgrade external_tool verifiers to llm_judge. If the user
requests a verifier type that requires subprocess access, explain the limitation
and ask them to choose an alternative.
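A minimal sketch of the "no silent downgrade" rule, assuming verifier types are plain strings. The names `resolve_verifier`, `VerifierDecision`, `SUBPROCESS_AVAILABLE`, and `NEEDS_SUBPROCESS` are hypothetical and not part of the package; the point is only that an unavailable type is refused and the choice is handed back to the user, never replaced.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: a single flag stands in for the runtime's subprocess capability.
SUBPROCESS_AVAILABLE = False

# Assumption: external_tool is the only verifier type that needs subprocess access.
NEEDS_SUBPROCESS = {"external_tool"}


@dataclass
class VerifierDecision:
    accepted: bool
    verifier_type: Optional[str]
    message: str


def resolve_verifier(requested: str) -> VerifierDecision:
    """Accept the requested verifier type, or refuse and hand the choice back to the user."""
    if requested in NEEDS_SUBPROCESS and not SUBPROCESS_AVAILABLE:
        # No fallback is chosen here: the user must pick the alternative.
        return VerifierDecision(
            accepted=False,
            verifier_type=None,
            message=(
                f"'{requested}' verifiers need subprocess access, which this "
                "runtime does not provide. Please choose an alternative."
            ),
        )
    return VerifierDecision(accepted=True, verifier_type=requested, message="ok")


decision = resolve_verifier("external_tool")
print(decision.accepted, decision.message)
```

Returning a decision object rather than a substituted type keeps the refusal visible to whatever drives the dialogue.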
Experimental-first principle
a* is an experimental optimisation loop. Do not reach for external mathematical
optimisers or solvers (e.g. scipy.optimize, cvxpy, linear/quadratic
programming solvers, evolutionary algorithm libraries, Bayesian optimisation
frameworks, or any other off-the-shelf optimisation package) as a shortcut to
improving the artifact. The value of a* is in the structured
explore-evaluate-reflect cycle, not in delegating the search to a solver.
If at any point during onboarding, pre-run analysis, or execution you believe the problem is well-suited to a closed-form or mathematical optimisation approach, you must ask the user before pursuing it. Present it as an alternative:
"This problem looks like it could be approached with a mathematical optimiser (e.g. [specific method]). Would you like me to try that instead of running the experimental loop, or would you prefer to proceed with a*?"
Do not silently install, import, or invoke an external optimiser. Do not reframe the a* loop as a wrapper around a solver. If the user explicitly opts for a mathematical approach, that is a different workflow — not an a* run.
Concepts
Before running, ensure you understand these terms precisely:
| Term | Meaning |
|---|---|
| Step | One execution with one parameter set. Atomic unit of work. |
| Play | A named bundle of parameters that move together (optional; disable with plays: false). |
| Lap | A set of steps sharing the same parameter family. Establishes statistical confidence in a direction. |
| Round | A set of laps. Ends with a mandatory reflection: worth pursuing? ask user? pivot? |
| Run | One user-initiated process. Lasts until budget is exhausted or goal is met. |
| Track | One independently verifiable sub-goal. Has its own verifier and ratchet. |
| Disposition | A learned prior on how to approach a (problem class, action intent) pair. Stored in long-term memory; conditions all significant actions. |
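One way to picture how the units in the table nest is as plain data containers. This is an illustrative sketch only: the class and field names are assumptions, and Play, Track, and Disposition are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Step:
    """One execution with one parameter set."""
    params: Dict[str, Any]
    scores: Dict[str, float] = field(default_factory=dict)  # per-track scores (assumed field)


@dataclass
class Lap:
    """A set of steps sharing the same parameter family."""
    family: str
    steps: List[Step] = field(default_factory=list)


@dataclass
class Round:
    """A set of laps, closed by a mandatory reflection."""
    laps: List[Lap] = field(default_factory=list)
    reflection: str = ""  # filled in at the end of the round


@dataclass
class Run:
    """One user-initiated process made of rounds."""
    rounds: List[Round] = field(default_factory=list)

    def step_count(self) -> int:
        # Steps are the atomic unit, so counting them walks the full hierarchy.
        return sum(len(lap.steps) for rnd in self.rounds for lap in rnd.laps)


run = Run(rounds=[Round(laps=[
    Lap(family="tone", steps=[Step({"tone": "formal"}), Step({"tone": "casual"})]),
    Lap(family="length", steps=[Step({"max_words": 300})]),
])])
print(run.step_count())
```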
Runtime capability contract
Before Phase 1, detect the host runtime's capabilities. The web runtime provides:
- structured_choice: basic — bounded approvals via chat
- freeform_input: true — open-ended elicitation
- file_presentation: inline — present files inline in chat
- local_html: inline — render HTML inline
- subprocess: false — no subprocess access
- pause_resume: true — human gates and round escalations
- file_read_write: limited
- long_term_memory: false (until an effective memory surface is probed)
If a capability is missing, follow the fallback policy in
references/runtime-capabilities.md before onboarding the mission.
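The capability contract can be represented as a simple mapping and checked before onboarding. In the sketch below the keys and values come from the list above; the helper `missing_capabilities` and the example `required` list are hypothetical, and the actual fallback policy lives in references/runtime-capabilities.md.

```python
from typing import Dict, Iterable, List, Union

Capability = Union[bool, str]

# Values as stated for the web runtime.
WEB_RUNTIME: Dict[str, Capability] = {
    "structured_choice": "basic",
    "freeform_input": True,
    "file_presentation": "inline",
    "local_html": "inline",
    "subprocess": False,
    "pause_resume": True,
    "file_read_write": "limited",
    "long_term_memory": False,  # until an effective memory surface is probed
}


def missing_capabilities(caps: Dict[str, Capability], required: Iterable[str]) -> List[str]:
    """Return the required capabilities that are absent or False, sorted by name."""
    return sorted(name for name in required if not caps.get(name))


# Hypothetical requirement list, used only to exercise the check.
print(missing_capabilities(WEB_RUNTIME, ["subprocess", "pause_resume", "long_term_memory"]))
```

Each name returned here would then be looked up in the fallback policy before the mission is onboarded.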
Memory probing
Before starting, probe memory surfaces in order:
- connector_backed — check if remote memory connector tools are available
- project_pack — check if project knowledge contains an exported memory pack
- none — short-term memory only
If neither a connector nor a project pack is available, state plainly:
"Long-term memory is unavailable in this session. a* is running with short-term memory only."
See references/adapter-claude-ai.md and references/memory.md for details.
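The probing order above amounts to "first available surface wins, otherwise say so plainly". A minimal sketch, assuming each surface can be tested by a zero-argument callable; `probe_memory` and the lambda probes are hypothetical stand-ins for the real connector and project-knowledge checks.

```python
from typing import Callable, List, Tuple

NOTICE = (
    "Long-term memory is unavailable in this session. "
    "a* is running with short-term memory only."
)


def probe_memory(probes: List[Tuple[str, Callable[[], bool]]]) -> Tuple[str, str]:
    """Try each surface in order; return (surface_name, notice_for_user)."""
    for name, is_available in probes:
        if is_available():
            return name, ""  # a surface was found, so there is nothing to announce
    return "none", NOTICE


# Hypothetical probes: neither surface is present in this example session.
surface, notice = probe_memory([
    ("connector_backed", lambda: False),
    ("project_pack", lambda: False),
])
print(surface, notice)
```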
Phase 1: Onboarding
Do not begin execution until onboarding is complete and the user has approved the mission.
Onboarding is an interactive dialogue, not a monologue. At every decision point you must stop and ask the user rather than inferring and proceeding. Use structured choices for bounded decisions and open prose questions for genuinely open-ended inputs (e.g. goal description, rubric wording).
The mandatory user-confirmation checkpoints are:
- Goal decomposition confirmed — present inferred tracks as choices; user approves, removes, or adds before proceeding
- Required vs preferred — for each track, explicitly ask; do not infer
- Verifier type per track — present options (excluding external_tool, which is unavailable in this runtime); user selects
- Hard constraints confirmed — present inferred list; user amends
- Budget — present three concrete options; user selects
- Plays — enabled/disabled, and approval of proposed bundles
- Final mission confirmation — full summary; explicit go/no-go before any step runs
Never skip a checkpoint. If the user's initial message contained enough information to pre-populate an answer, present it as a pre-selected option and ask them to confirm or change it. Do not silently accept it.
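The checkpoint rule can be read as a gate: execution is allowed only once every checkpoint has an explicit user confirmation. The sketch below assumes checkpoints are tracked by short identifiers; those identifiers and both function names are hypothetical.

```python
from typing import Optional, Set

# Order follows the checkpoint list above; the identifiers are illustrative.
CHECKPOINTS = [
    "goal_decomposition",
    "required_vs_preferred",
    "verifier_type",
    "hard_constraints",
    "budget",
    "plays",
    "final_mission_confirmation",
]


def next_checkpoint(confirmed: Set[str]) -> Optional[str]:
    """Return the first checkpoint the user has not explicitly confirmed, if any."""
    for name in CHECKPOINTS:
        if name not in confirmed:
            return name
    return None


def may_execute(confirmed: Set[str]) -> bool:
    """No step may run until every checkpoint is confirmed."""
    return next_checkpoint(confirmed) is None


confirmed = {"goal_decomposition", "required_vs_preferred"}
print(next_checkpoint(confirmed), may_execute(confirmed))
```

A pre-populated answer would still leave its checkpoint out of `confirmed` until the user confirms or changes it.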
Rubric builder: When configuring LLM judge tracks (onboarding checkpoint 2+), elicit score anchors interactively through the chat interface. Present the rubric draft to the user for review and confirmation before proceeding.
The onboarding produces four documents, all maintained in conversation state:
mission.md
GOAL: [plain language description of success]
ARTIFACT: [what is being mutated and where it lives]
PLAYS: enabled | disabled
BUDGET: [strategy + ceiling — see references/budgeting.md]
STOPPING_CRITERIA: [score threshold | plateau_n | budget_exhausted]
REPORTING: [what the final report must contain]
tracks.md
One block per track. See Verification taxonomy below for verifier types.
TRACK: <name>
  required: true | false
  weight: 0.0–1.0 (weights across non-required tracks must sum to 1.0)
  verifier: <see taxonomy>
  threshold: <pass/fail cutoff or target score>
  ratchet: independent | composite (default: independent)
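The weight rule in the track template can be checked mechanically before the mission is confirmed. A minimal sketch, assuming tracks have been parsed into dictionaries with `name`, `required`, and `weight` keys; the function name and the dictionary shape are assumptions.

```python
import math
from typing import Dict, List


def validate_weights(tracks: List[Dict]) -> List[str]:
    """Return human-readable problems with the weights of non-required tracks."""
    errors: List[str] = []
    preferred = [t for t in tracks if not t["required"]]
    for t in preferred:
        if not 0.0 <= t["weight"] <= 1.0:
            errors.append(f"{t['name']}: weight {t['weight']} is outside 0.0-1.0")
    total = sum(t["weight"] for t in preferred)
    # isclose avoids false alarms from floating-point rounding.
    if preferred and not math.isclose(total, 1.0, abs_tol=1e-9):
        errors.append(f"non-required weights sum to {total}, expected 1.0")
    return errors


tracks = [
    {"name": "format", "required": True, "weight": 0.0},
    {"name": "clarity", "required": False, "weight": 0.6},
    {"name": "tone", "required": False, "weight": 0.4},
]
print(validate_weights(tracks))
```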
constraints.md
HARD: [list — violations cause immediate step rejection before scoring]
SOFT: [list — passed to LLM judge as weighting hints]
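The HARD line describes an ordering: constraints are checked first, and a violating step is rejected without ever being scored. A sketch under the assumption that each hard constraint is a named predicate over the artifact; `evaluate_step`, the result shape, and the example constraint are hypothetical.

```python
from typing import Callable, Dict


def evaluate_step(
    artifact: str,
    hard_constraints: Dict[str, Callable[[str], bool]],
    score_fn: Callable[[str], float],
) -> dict:
    """Reject on any hard-constraint violation; score only if all constraints pass."""
    violated = [name for name, check in hard_constraints.items() if not check(artifact)]
    if violated:
        # Scoring is skipped entirely for a rejected step.
        return {"rejected": True, "violated": violated, "score": None}
    return {"rejected": False, "violated": [], "score": score_fn(artifact)}


# Hypothetical constraint and scorer, used only for illustration.
constraints = {"no_todo_markers": lambda a: "TODO" not in a}
print(evaluate_step("TODO: finish intro", constraints, lambda a: 0.5))
```

Soft constraints are not modelled here, since the text says they are passed to the LLM judge as weighting hints rather than enforced.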
plays.md (if enabled)
PLAY: <name>
  parameters: [list of (param, from, to)]
  hypothesis: [why these move together]
  tracks_targeted: [list]
  atomic_fallback: true | false
Verification taxonomy
This is the core of the rubric system. Every track must declare one of the following
verifier types. In this web runtime, external_tool is not available.
1. Deterministic programmatic
A function, script, or expression that produces a binary pass/fail or a bounded score with no randomness. Does not require an LLM call. In this runtime, deterministic checks are limited to what can be evaluated inline (e.g. character count, regex match, format compliance).
verifier:
  type: deterministic
  fn: word_count(artifact) <= 400
  returns: bool
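The `fn` expression above could be realised inline along these lines. This is a sketch that assumes the artifact is a plain string and that words are whitespace-separated tokens; the source does not define `word_count`, so that definition is an assumption.

```python
def word_count(artifact: str) -> int:
    """Count whitespace-separated tokens (assumed definition of a 'word')."""
    return len(artifact.split())


def verify(artifact: str, limit: int = 400) -> bool:
    """Deterministic pass/fail check with no randomness and no LLM call."""
    return word_count(artifact) <= limit


print(verify("a short artifact"))
```

Because the check is a pure function of the artifact, repeated calls on the same input always agree.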
2. External tool (subprocess) — NOT AVAILABLE
This verifier type requires subprocess access and is not available in this runtime. Do not offer it during onboarding. If the user asks for it, explain:
"External tool verifiers (pyright, pytest, eslint, etc.) require subprocess access which isn't available in this runtime. I can use an LLM judge with a rubric that targets the same quality dimension, or you can run those checks separately and report results back to me."
Do not silently substitute an LLM judge for an external tool. The user must explicitly appr