Autopilot
The ultimate autonomous shipping skill. User gave vision. They are asleep, away, or working on something else. Job: ship a real, production-ready, reviewable result before they look at it next. No babysitting. No technical questions. No slop.
The contract
- User gave the vision. Treat as gospel. If something is missing, think harder, spawn research, write a test, read the codebase. Do not interrupt them with a question reasoning + agents can answer.
- Ship to the current branch. No worktrees. No feature branches. Atomic commits straight on the branch you start on.
- Never push.
git pushis forbidden. User reviews diff and pushes themselves. - Never block except for one of the hard stops in references/decision-rules.md.
- Prod-ready or it does not ship. Real tests. Real edge cases. Real security review. Match references/prod-readiness.md.
- No playwright. Real unit + integration tests against real services. See references/testing.md.
- Always /frontend-design for any UI work. See references/ui-protocol.md.
- No AI slop, no padding, no half-shipped state, no dead code, no feature flags hiding incomplete work.
If a rule above conflicts with anything else, the rule above wins.
When to invoke
Trigger when user says any of:
- "autopilot" / "auto pilot" / "/autopilot"
- "ship this for me"
- "build this while I sleep"
- "you handle the whole thing"
- "take it and run"
- "make it real, I will review when I am back"
The arc
0. Bootstrap — capture vision, init run dir, snapshot state
1. Plan — invoke /autoplan, get reviewed plan back
2. Research blitz — parallel subagents (only when truly needed)
3. UX & design — /frontend-design (skip if no UI)
4. TDD build — failing test first, then minimum code, then refactor
5. Adversarial — try to break it: edges, security, races, failures
6. Prod gate — /code-review high, /cso, real prod checklist
7. Commit — atomic commits to current branch (NO push)
8. Notify — push + desktop banner + REVIEW_SUMMARY.md
Each phase is detailed in references/workflow.md. Read it once at start of run.
Boot sequence (run these in order, immediately)
- Confirm session memory loaded — if you have a memory/context MCP tool configured, call its session-start fetch now. Skip if not configured.
- Capture vision — re-read user's full message. Pull every requirement, constraint, brand cue, UX hint. Save verbatim to
$AUTOPILOT_RUN_DIR/vision.md. - Run init script —
bash ${SKILL_DIR}/scripts/init_run.sh "<short-slug>". Creates run dir, snapshots git state, createsRUN_LOG.md,DECISION_LOG.md. ExportsAUTOPILOT_RUN_DIRpath on stdout. Capture and export it. - Acknowledge briefly — one line to user. Example:
Autopilot engaged. Will notify on review-ready or hard stop.Then go silent until done. - Begin phase 1.
Phase 1 — Plan
- Invoke
autoplanSkill via the Skill tool. Pass the vision as input. /autoplanruns CEO + design + eng + devex review pipeline.- Save reviewed plan to
$AUTOPILOT_RUN_DIR/plan.md. - If
/autoplansurfaces a taste decision you can resolve with research/reasoning, resolve it autonomously and log toDECISION_LOG.md. Only escalate if it matches a hard-stop.
Phase 2 — Research blitz (only when needed)
Do NOT spawn research subagents reflexively. Spawn them only when the plan has a concrete unknown reasoning cannot close. Examples warranting research:
- Library API surface or breaking change in last 18 months → context7
- Existing implementation pattern for a tricky algorithm → Nia (github / tracer mode) if available, else web search
- Research paper or technique → Nia papers if available, else web search
- Latest world state (model versions, pricing, SDK changes) → web search MCP / live-fetch MCP
- Prior decisions for this user → your memory MCP tool, if configured
When in doubt: spawn research, do not interrupt the user. See references/research-protocol.md.
Hard rule on parallelism: When 2+ independent research questions exist, fire in a single message with multiple Agent tool blocks. Never serial.
Phase 3 — UX & design
If work touches any UI surface:
- Invoke
frontend-designSkill. Read its full guidance. - If greenfield UI with no established design language, also invoke
design-shotgunto explore variants. Pick the best one autonomously using heuristics in references/ui-protocol.md. Log pick + rationale toDECISION_LOG.md. - Generate design assets (HTML mocks or component sketches) into
$AUTOPILOT_RUN_DIR/design/.
If no UI: skip phase 3 entirely.
Phase 4 — TDD build
Invoke superpowers:test-driven-development Skill and follow rigorously.
Build loop per feature slice:
- Write failing test first (real test, real assertions, real fixtures, real services where possible).
- Run test → confirm it fails for the right reason.
- Write minimum code to pass.
- Run all tests. Confirm green.
- Refactor if needed. Re-run.
- Atomic commit with real conventional-commit message.
Test stack rules: references/testing.md. Banned: Playwright. Allowed: vitest, bun test, jest, pytest, real-service integration tests, contract tests, fuzz tests where appropriate.
For server/full-stack behavior verification, use /browse (gstack) to take screenshots and assert page state — that is verification, not the test suite.
Phase 5 — Adversarial break
Once green, spawn a general-purpose Agent with the prompt: "You are an adversarial QA + security engineer. Your job is to break the code in $AUTOPILOT_RUN_DIR. Try every angle from [references/prod-readiness.md] and [references/security.md]. List every break + repro + suggested fix. Do not propose fixes — just find breaks."
Run it. Fix every legitimate break. Re-run all tests. Repeat until adversarial agent finds nothing real (cosmetic-only findings OK to defer with a logged decision).
Phase 6 — Prod gate
Run these in parallel as Agents and/or Skills:
code-reviewSkill at high effort across the diffcsoSkill (security audit, daily mode)bash ${SKILL_DIR}/scripts/prod_check.sh— typecheck, lint, build, full test suite, dependency audit, secret scan, RLS verification (if Supabase/Postgres), bundle size, observability hooks present, health checks honest
Every red item gets fixed. Re-run. Only proceed when prod_check.sh exits zero.
Phase 7 — Commit
- Atomic conventional commits on current branch.
- NO
Co-Authored-By. Ever. - NO
git push. Ever. - One feature = several small commits is fine. One commit per feature is fine. One giant commit is not fine.
- Write final
REVIEW_SUMMARY.mdto$AUTOPILOT_RUN_DIR/REVIEW_SUMMARY.mdusingassets/REVIEW_SUMMARY.md.template.
Phase 8 — Notify
bash ${SKILL_DIR}/scripts/notify.sh done writes a structured .notify-queue.json describing the message for each channel. The skill then invokes whatever notify MCP tools the user has configured (push, TTS, HUD). The local desktop banner fires directly via osascript (macOS) or notify-send (Linux).
Configured channels (substitute whichever MCP tools the user has):
push: a phone-reaching tool (e.g. Telegram bot, Pushover, ntfy, Discord webhook)tts: a text-to-speech tool (optional)hud: a wearable/HUD tool (optional)desktop: always on, local banner
Notification body MUST link to REVIEW_SUMMARY.md and the commit range.
After notify: if a persistent-memory MCP tool is configured, write a one-paragraph summary of what shipped + run dir path. Otherwise skip.
Hard-stop protocol
If a hard-stop fires (see references/decision-rules.md):
- Stop work immediately. Commit whatever