Overview

Andrej Karpathy's key insight: "LLMs are exceptionally good at looping until they meet specific goals. Don't tell it what to do — give it success criteria and watch it go."

This skill converts vague imperative instructions ("make the login work") into declarative goals with concrete, testable success criteria. Agents with clear goals self-correct autonomously. Agents with vague goals produce vague results and require constant intervention.

When to Use

Before starting any multi-step task
When a task has been described imperatively ("do X, then Y, then Z")
When you're unsure how you'll know when you're "done"
For long-running or complex implementations

Process

Step 1: Extract the Underlying Goal

Read the full request.
Ask: What is the user trying to achieve, not just what they asked for?
Write the goal as: "The task is complete when [observable, verifiable outcome]."

Example transformation:

❌ Imperative: "Add error handling to the API."
✅ Goal: "The task is complete when: all API endpoints return structured error responses for 4xx/5xx cases, error responses include a code, message, and requestId, and the existing tests pass."

Verify: The goal statement is observable and testable by a third party.

Step 2: Define Success Criteria

List 3–7 specific, binary success criteria:

Success when:
- [ ] All existing tests pass
- [ ] New behavior X is demonstrated by test Y
- [ ] No regressions in file Z
- [ ] Manual check: [describe what to look for]

Each criterion must be falsifiable — you can clearly state when it passes or fails.

Verify: Every criterion can be checked without the original author.

Step 3: Define the Execution Plan

Break the goal into ordered steps, each with its own verify check:

1. [Step] → verify: [command or check]
2. [Step] → verify: [command or check]
3. [Step] → verify: [command or check]

Identify the first failure mode — what's most likely to go wrong? Plan for it.

Verify: The plan is readable and each step is independently verifiable.

Step 4: Execute and Loop

Follow the plan step-by-step.
At each verify checkpoint — actually run the check. Do not skip.
If a check fails: diagnose, fix, re-verify. Do not proceed past a failing check.
When all checks pass: report completion with evidence.

Common Rationalizations (and Rebuttals)

Excuse	Rebuttal
"The goal is obvious"	Obvious goals still need explicit success criteria. What's obvious to you is ambiguous to an agent.
"I'll know when it's done"	That's not a verifiable criterion. Write it down.
"The tests will tell me"	Which tests? What do they cover? What don't they cover?
"It's too simple for a plan"	Simple tasks rarely fail. Complex tasks without a plan always do.

Red Flags

The task is described as a to-do list, not a goal
You don't know how you'll verify completion
You're 80% through and realize the original framing was wrong
"It seems to work" is your verification strategy

Verification

Goal is stated as an observable, testable outcome
Success criteria are listed and binary (pass/fail)
Execution plan has verify steps for each phase
All verify checks have been run (not just assumed passing)
Evidence of completion is documented

goal-driven-execution

How to add

Drop this on your repo README

Related skills

template-skill

slack-gif-creator

baoyu-compress-image

zzz-one-dragon-player

Get new Outros skills every Monday