The job: turn workflows that need a human in the inner loop into workflows the LLM closes itself. The two halves are removing the trigger gate and opening observability.
Surface the gate first
Before proposing changes, name the trigger gate explicitly:
- What action requires a human right now? (button click, screenshot inspection, terminal interaction, web-form submission)
- What signal does the human provide that the LLM cannot get on its own? (visual confirmation, copy-paste, secret value, eyeball verdict)
- Where does the result go? (chat memory, screenshot, mental note)
Most loops have one or two gates that, removed, collapse the cycle to seconds. Pick the smallest gate first.
Structural fixes
Web-UI trigger → CLI trigger
If the workflow is gated by clicking in a web app, find or build the equivalent CLI command. Webhooks, REST endpoints, gh / aws / gcloud CLI subcommands, internal just targets — anything programmatically invokable. The LLM can then loop without leaving its session.
Stdout-only output → file-based output
If the workflow's result lives in chat memory or a screenshot, redirect to a file the LLM can read back: structured JSON dumps, markdown reports, append-only logs with addressable offsets. Why: file outputs survive compaction, support diff, and are inspectable by future sessions without replaying context.
Dashboards → structured logs
If verification requires eyeballing a Grafana / Datadog dashboard, surface the same metrics through a CLI query (PromQL, Datadog API, log aggregation tail). Anything that produces a pass/fail/warn verdict the LLM can read.
Eyeball verdicts → contract assertions
If the human's role is "looks right to me", encode the criterion as a test, schema, or assertion. The contract becomes the loop's done-criterion (pair with strict-validation-setup for the bootstrap of those gates).
Trap-or-abandon decision
After the structural fixes above, some steps still cannot be made autonomous — they involve genuine human judgment, external compliance, or capability the LLM lacks. For each remaining gate, apply this rule:
- Trap — if the step can be wrapped in a verification-and-iteration loop where the LLM proposes, the human approves once, and the LLM iterates until the contract passes, keep it. The human is at the outer loop, not the inner.
- Abandon — if a step requires the human in the inner loop and resists wrapping (e.g., new SOC2 review per iteration, real-time customer chat, hardware-mediated test), do not babysit. Either remove the step from the LLM's loop entirely (escalate to the human as a discrete handoff) or improve the harness so the step disappears (e.g., automate the SOC2 documentation pipeline).
Naming the rule: babysitting an unloopable step is the failure mode this skill exists to prevent. Pre-existing chat consensus: "what can't be looped — abandon firmly and improve the harness."
What this skill does not do
- It does not author project rules — defer to
initfor AGENTS.md. - It does not bootstrap strict-mode validation gates — defer to
strict-validation-setup. - It does not pick the test framework — defer to
test-drivenor the language's idiomatic tester.
Posture
Surgical, not architectural. Remove one gate at a time. After each fix, re-evaluate whether the loop now closes — sometimes one trigger removal is enough. Resist the temptation to redesign the whole system.