Autopilot

The ultimate autonomous shipping skill. User gave vision. They are asleep, away, or working on something else. Job: ship a real, production-ready, reviewable result before they look at it next. No babysitting. No technical questions. No slop.

The contract

User gave the vision. Treat as gospel. If something is missing, think harder, spawn research, write a test, read the codebase. Do not interrupt them with a question that reasoning + agents can answer.
Ship to the current branch. No worktrees. No feature branches. Atomic commits straight on the branch you start on.
Never push. git push is forbidden. User reviews diff and pushes themselves.
Never block except for one of the hard stops in references/decision-rules.md.
Prod-ready or it does not ship. Real tests. Real edge cases. Real security review. Match references/prod-readiness.md.
No Playwright. Real unit + integration tests against real services. See references/testing.md.
Always /frontend-design for any UI work. See references/ui-protocol.md.
No AI slop, no padding, no half-shipped state, no dead code, no feature flags hiding incomplete work.

If a rule above conflicts with anything else, the rule above wins.

When to invoke

Trigger when user says any of:

"autopilot" / "auto pilot" / "/autopilot"
"ship this for me"
"build this while I sleep"
"you handle the whole thing"
"take it and run"
"make it real, I will review when I am back"

The arc

0. Bootstrap        — capture vision, init run dir, snapshot state
1. Plan             — invoke /autoplan, get reviewed plan back
2. Research blitz   — parallel subagents (only when truly needed)
3. UX & design      — /frontend-design (skip if no UI)
4. TDD build        — failing test first, then minimum code, then refactor
5. Adversarial      — try to break it: edges, security, races, failures
6. Prod gate        — /code-review high, /cso, real prod checklist
7. Commit           — atomic commits to current branch (NO push)
8. Notify           — push notification + desktop banner, both linking REVIEW_SUMMARY.md (never git push)

Each phase is detailed in references/workflow.md. Read it once at start of run.

Boot sequence (run these in order, immediately)

Confirm session memory loaded — if you have a memory/context MCP tool configured, call its session-start fetch now. Skip if not configured.
Capture vision — re-read user's full message. Pull every requirement, constraint, brand cue, UX hint. Save verbatim to $AUTOPILOT_RUN_DIR/vision.md as soon as the init script in the next step has created the run dir.
Run init script — bash ${SKILL_DIR}/scripts/init_run.sh "<short-slug>". Creates run dir, snapshots git state, creates RUN_LOG.md, DECISION_LOG.md. Prints the run dir path on stdout. Capture it and export it as AUTOPILOT_RUN_DIR.
Acknowledge briefly — one line to user. Example: Autopilot engaged. Will notify on review-ready or hard stop. Then go silent until done.
Begin phase 1.
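
The init step above can be sketched as follows. This is a hypothetical reconstruction of what scripts/init_run.sh might do, not the actual script: the AUTOPILOT_BASE_DIR variable, the timestamped directory name, and the git_snapshot.txt file name are all assumptions for illustration.

```shell
# Hypothetical sketch of scripts/init_run.sh; the real script may differ.
# AUTOPILOT_BASE_DIR, the directory naming, and git_snapshot.txt are assumptions.
init_run() (
  slug="$1"
  base="${AUTOPILOT_BASE_DIR:-.autopilot/runs}"
  run_dir="$base/$(date +%Y%m%d-%H%M%S)-$slug"
  mkdir -p "$run_dir" || exit 1

  # Snapshot git state when inside a repository; record the absence otherwise.
  if command -v git >/dev/null 2>&1 &&
     git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    {
      git rev-parse --abbrev-ref HEAD
      git rev-parse HEAD
      git status --porcelain
    } > "$run_dir/git_snapshot.txt" 2>&1 || true
  else
    echo "not a git repository" > "$run_dir/git_snapshot.txt"
  fi

  # Create the two empty logs named in the boot sequence.
  : > "$run_dir/RUN_LOG.md"
  : > "$run_dir/DECISION_LOG.md"

  # The run dir path is the only thing written to stdout, so callers can capture it.
  printf '%s\n' "$run_dir"
)

# Caller side: capture the path, then export it for later phases.
# AUTOPILOT_RUN_DIR="$(init_run "checkout-flow")" && export AUTOPILOT_RUN_DIR
```

Keeping stdout limited to the path is what makes the capture-and-export step reliable; any progress output would have to go to stderr.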

Phase 1 — Plan

Invoke autoplan Skill via the Skill tool. Pass the vision as input.
/autoplan runs CEO + design + eng + devex review pipeline.
Save reviewed plan to $AUTOPILOT_RUN_DIR/plan.md.
If /autoplan surfaces a taste decision you can resolve with research/reasoning, resolve it autonomously and log to DECISION_LOG.md. Only escalate if it matches a hard-stop.

Phase 2 — Research blitz (only when needed)

Do NOT spawn research subagents reflexively. Spawn them only when the plan has a concrete unknown reasoning cannot close. Examples warranting research:

Library API surface or breaking change in last 18 months → context7
Existing implementation pattern for a tricky algorithm → Nia (github / tracer mode) if available, else web search
Research paper or technique → Nia papers if available, else web search
Latest world state (model versions, pricing, SDK changes) → web search MCP / live-fetch MCP
Prior decisions for this user → your memory MCP tool, if configured

When in doubt: spawn research, do not interrupt the user. See references/research-protocol.md.

Hard rule on parallelism: When 2+ independent research questions exist, fire in a single message with multiple Agent tool blocks. Never serial.

Phase 3 — UX & design

If work touches any UI surface:

Invoke frontend-design Skill. Read its full guidance.
If greenfield UI with no established design language, also invoke design-shotgun to explore variants. Pick the best one autonomously using heuristics in references/ui-protocol.md. Log pick + rationale to DECISION_LOG.md.
Generate design assets (HTML mocks or component sketches) into $AUTOPILOT_RUN_DIR/design/.

If no UI: skip phase 3 entirely.

Phase 4 — TDD build

Invoke superpowers:test-driven-development Skill and follow rigorously.

Build loop per feature slice:

Write failing test first (real test, real assertions, real fixtures, real services where possible).
Run test → confirm it fails for the right reason.
Write minimum code to pass.
Run all tests. Confirm green.
Refactor if needed. Re-run.
Atomic commit with real conventional-commit message.
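
Step 2 of the loop ("fails for the right reason") can be checked mechanically. A minimal sketch, assuming a hypothetical helper named confirm_red that is not part of the skill: it passes only when the test command exits non-zero and its output contains the failure text you expected.

```shell
# Hypothetical helper, not part of the skill: succeeds only when the test
# command fails AND its output contains the expected failure text.
confirm_red() (
  expected="$1"; shift
  if output="$("$@" 2>&1)"; then
    echo "unexpected pass: test is green before the code exists" >&2
    exit 1
  fi
  case "$output" in
    *"$expected"*)
      exit 0 ;;
    *)
      echo "failed for the wrong reason:" >&2
      printf '%s\n' "$output" >&2
      exit 1 ;;
  esac
)

# Example (hypothetical test file and message):
# confirm_red "applyDiscount is not a function" bun test tests/discount.test.ts
```

A test that fails on an import typo or a missing fixture is red for the wrong reason; matching on the expected message catches that before any production code is written.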

Test stack rules: references/testing.md. Banned: Playwright. Allowed: vitest, bun test, jest, pytest, real-service integration tests, contract tests, fuzz tests where appropriate.

For server/full-stack behavior verification, use /browse (gstack) to take screenshots and assert page state — that is verification, not the test suite.

Phase 5 — Adversarial break

Once green, spawn a general-purpose Agent with the prompt: "You are an adversarial QA + security engineer. Your job is to break the code in $AUTOPILOT_RUN_DIR. Try every angle from [references/prod-readiness.md] and [references/security.md]. List every break + repro. Do not propose fixes; just find breaks."

Run it. Fix every legitimate break. Re-run all tests. Repeat until adversarial agent finds nothing real (cosmetic-only findings OK to defer with a logged decision).

Phase 6 — Prod gate

Run these in parallel as Agents and/or Skills:

code-review Skill at high effort across the diff
cso Skill (security audit, daily mode)
bash ${SKILL_DIR}/scripts/prod_check.sh — typecheck, lint, build, full test suite, dependency audit, secret scan, RLS verification (if Supabase/Postgres), bundle size, observability hooks present, health checks honest

Every red item gets fixed. Re-run. Only proceed when prod_check.sh exits zero.
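
The "exits zero only when everything is green" behavior could take roughly this shape. This is a sketch under assumptions, not the contents of prod_check.sh: run_check, gate_result, and GATE_FAILURES are hypothetical names, and the commented-out check commands are placeholders.

```shell
# Hypothetical shape for a gate runner; prod_check.sh itself may differ.
# Every check runs even after a failure, so all red items are listed in one pass.
GATE_FAILURES=0

run_check() {
  name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "GREEN $name"
  else
    echo "RED   $name"
    GATE_FAILURES=$((GATE_FAILURES + 1))
  fi
}

gate_result() {
  # Succeeds only when no check was red.
  [ "$GATE_FAILURES" -eq 0 ]
}

# Example wiring (placeholder commands, not the skill's actual list):
# run_check typecheck npx tsc --noEmit
# run_check tests     bun test
# gate_result || exit 1
```

Collecting failures instead of stopping at the first one fits the fix-everything-then-re-run loop: one pass produces the full list of red items.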

Phase 7 — Commit

Atomic conventional commits on current branch.
NO Co-Authored-By. Ever.
NO git push. Ever.
Several small commits per feature is fine. One commit per feature is fine. One giant commit covering everything is not fine.
Write final REVIEW_SUMMARY.md to $AUTOPILOT_RUN_DIR/REVIEW_SUMMARY.md using assets/REVIEW_SUMMARY.md.template.

Phase 8 — Notify

bash ${SKILL_DIR}/scripts/notify.sh done writes a structured .notify-queue.json describing the message for each channel. The skill then invokes whatever notify MCP tools the user has configured (push, TTS, HUD). The local desktop banner fires directly via osascript (macOS) or notify-send (Linux).

Configured channels (substitute whichever MCP tools the user has):

push: a phone-reaching tool (e.g. Telegram bot, Pushover, ntfy, Discord webhook)
tts: a text-to-speech tool (optional)
hud: a wearable/HUD tool (optional)
desktop: always on, local banner

Notification body MUST link to REVIEW_SUMMARY.md and the commit range.
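
The local desktop banner could be fired along these lines. A minimal sketch, assuming hypothetical function names banner_backend and desktop_banner; notify.sh itself may work differently. Only osascript and notify-send come from the text above; the stderr fallback is an assumption.

```shell
# Hypothetical sketch of the local banner step; notify.sh may differ.
# Prints which backend is available: osascript, notify-send, or none.
banner_backend() {
  if command -v osascript >/dev/null 2>&1; then
    echo osascript
  elif command -v notify-send >/dev/null 2>&1; then
    echo notify-send
  else
    echo none
  fi
}

desktop_banner() {
  title="$1"; body="$2"
  case "$(banner_backend)" in
    osascript)
      # Text is passed via argv, so quotes in the message need no escaping.
      osascript -e 'on run argv' \
                -e 'display notification (item 2 of argv) with title (item 1 of argv)' \
                -e 'end run' "$title" "$body" ;;
    notify-send)
      notify-send "$title" "$body" ;;
    none)
      # Assumed fallback: no banner tool found, so write to stderr instead.
      printf '%s: %s\n' "$title" "$body" >&2 ;;
  esac
}

# Example (the body carries the summary path and the commit range):
# desktop_banner "Autopilot done" "$AUTOPILOT_RUN_DIR/REVIEW_SUMMARY.md, commits <start>..HEAD"
```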

After notify: if a persistent-memory MCP tool is configured, write a one-paragraph summary of what shipped + run dir path. Otherwise skip.

Hard-stop protocol

If a hard-stop fires (see references/decision-rules.md):

Stop work immediately. Commit whatever
