Labloop — Autonomous Experiment Loop
You are managing an autonomous experiment loop. The user defines what to optimize; you run experiments forever until interrupted.
First, locate the skill directory by finding this SKILL.md file:
- Use Glob with pattern
**/skills/**/labloop/SKILL.mdto find its path, then derive SKILL_DIR. - The project config file is
labloop.mdin the current working directory.
Core philosophy
Extracted from Karpathy's autoresearch and generalized:
- One metric rules all — every experiment is judged by a single number
- Fixed budget per experiment — fair comparison, no "it would have been better with more time"
- Keep or discard — binary decision after each run; the branch only moves forward
- Never stop — once started, the agent runs indefinitely until the human interrupts
- Simplicity wins — equal results with less complexity = improvement
Command parsing
Parse $ARGUMENTS into one of these subcommands:
| User says (examples) | Subcommand |
|---|---|
init, setup, 初始化, 配置实验 | init |
go, start, run, 开始, 跑起来, 启动 | go |
status, 进度, 状态, 跑到哪了 | status |
history, 历史, 记录, 实验记录 | history |
rewind, rewind abc1234, 回退, 回到最佳 | rewind |
Subcommand: init
Interactive setup. Collect info to generate labloop.md in the project root.
Step 1 — Understand the project
Read the current directory structure. Ask the user:
- 研究目标:你想优化什么?(一句话描述)
- 可修改文件:agent 可以改哪些文件?(支持 glob,如
src/*.py) - 不可修改文件:哪些文件不能动?(评估逻辑、数据加载等)
Step 2 — Define the experiment
- 运行命令:跑一次实验的完整命令(如
python train.py,npm test,cargo bench) - 评估指标:
- 指标名称(如
val_loss,latency_ms,accuracy) - 方向:
lower_is_betterorhigher_is_better - 提取方式:一个 grep/regex 能从输出中抓到指标值(如
grep "^accuracy:" run.log)
- 指标名称(如
- 超时时间:单次实验最长运行时间(默认 5 分钟)
Step 3 — Constraints and hints
- 约束条件(可选):不能装新依赖、内存限制、不能改接口签名等
- 研究方向提示(可选):给 agent 的领域知识或优先尝试的方向
Step 4 — Generate config
- Read
SKILL_DIR/assets/labloop-template.mdas the template - Fill in all collected values
- Write to
./labloop.mdin the project root - Show summary and ask user to confirm
- Tell user: "配置完成!运行
/labloop go启动实验循环。"
Subcommand: go
Pre-flight checks
- Verify
labloop.mdexists in the current directory. If not → tell user to runinitfirst. - Read
labloop.mdfully — this is your operating manual for this project. - Read all files listed in the "可修改文件" and "不可修改文件" sections to build full context.
- Check if the project is a git repo. If not, run
git initand make an initial commit of all current files. - Verify the run command works: do a quick dry-run or check dependencies.
Baseline run
- Create experiment branch:
git checkout -b labloop/<tag>where<tag>is today's date (e.g.,jun21). If it exists, append a number (jun21-2). - Run the experiment command as-is to establish the baseline.
- Redirect output:
<run_command> > labloop-run.log 2>&1 - Extract the metric using the configured extraction command.
- If extraction fails → the run crashed. Read
tail -n 50 labloop-run.logfor diagnosis. - Initialize
labloop-results.tsvwith header and baseline row. - Commit baseline: this is experiment #0.
The experiment loop
LOOP FOREVER:
- Analyze history: Read
labloop-results.tsvto understand what's been tried, what worked, what failed. - Form hypothesis: Based on the code, results history, constraints, and research hints — decide what to try next. Write a one-line description of the experiment.
- Modify code: Edit only files listed in "可修改文件". Make targeted, reviewable changes.
- Commit:
git add <modified files> && git commit -m "experiment: <description>" - Run experiment:
<run_command> > labloop-run.log 2>&1- Set a timeout based on the configured limit. If exceeded, kill and treat as failure.
- Extract metric: Use the configured extraction command.
- If empty output → crash. Run
tail -n 50 labloop-run.logto read the error.
- If empty output → crash. Run
- Record results: Append to
labloop-results.tsv(do NOT commit this file):commit metric_value status description- status:
keep,discard, orcrash
- status:
- Decision:
- If metric improved (respecting direction) →
keep. The branch advances. - If metric is equal or worse →
discard. Rungit reset --hard HEAD~1to revert. - If crash → attempt a quick fix (typo, import error). If unfixable after 2 attempts → discard and move on.
- If metric improved (respecting direction) →
- Continue — go back to step 1. Do NOT ask the user anything. Do NOT stop.
Experiment strategy
When choosing what to try, follow this priority:
- Low-hanging fruit first — obvious improvements, known best practices
- One variable at a time — don't change 5 things at once; isolate the effect
- Learn from history — if increasing X helped, try increasing it more; if a direction consistently fails, stop trying it
- Diminishing returns — when small tweaks plateau, try a bigger structural change
- Combine near-misses — two changes that each barely missed improvement might work together
- Read the code deeply — the answer is often in the code, not in guessing
Crash handling
- Simple fix (typo, missing import, wrong variable name): fix and re-run, counts as same experiment
- Fundamental issue (approach doesn't work, OOM): log as crash, discard, move on
- Repeated crashes (3+ in a row): step back, re-read the code, try a completely different direction
Output discipline
- ALWAYS redirect experiment output to
labloop-run.log— never let it flood context - Extract only the metric line(s) from the log
- For crash diagnosis, read only
tail -n 50of the log - Never
catthe entire log file
Browser & desktop automation
When the experiment loop involves visual evaluation (screenshots, browser comparison), use background automation to avoid disrupting the user's active window.
Screenshot capture priority:
- CDP Proxy (browser-control skill) — open dev server in a background tab, screenshot without any user disruption:
TAB_ID=$(curl -s "http://localhost:3456/new?url=http://localhost:3000" | jq -r '.targetId') curl -s "http://localhost:3456/screenshot?target=$TAB_ID&file=/tmp/labloop-screenshot.png" - macOS Desktop Control MCP — capture a specific window via
targetparameter, no focus stealing:mcp__macos-desktop-control__screenshot(target: { app: "Google Chrome", title: "localhost" }) - Foreground screenshot — proceed directly if background methods are unavailable. Do NOT ask for user permission — autonomous execution must not be blocked.
Browser control priority:
- CDP Proxy — open/navigate/scroll/interact in background tabs via HTTP API
- macOS Desktop Control — click/scroll/type with
targetparameter for background operation - AppleScript/osascript — direct Chrome control (may briefly flash the window)
Key rule: The user and the agent work simultaneously. Never steal focus for routine operations. All screenshot, navigation, and comparison operations should default to background mode.
If companion skills (browser-control, chrome-control) or MCP servers (macos-desktop-control) are not available, log a one-time suggestion for the user to install them, then continue with whatever method works.
Subcommand: status
- Read
labloop-results.tsv - Show a compact summary:
- Total experiments run
- Current best metric and which commit
- Last 5 experiments (commit, metric, status, description)
- Improvement from baseline: percentage or absolute delta
- Show current git branch and HEAD commit
Subcommand: history
- Read and display the full
labloop-results.tsvas a formatted table - Add a summary row at the bottom: total experiments, keeps, discards, crashes, best metric