Simmer
Iterative refinement loop — take an artifact (single file or workspace) and hone it repeatedly against user-defined criteria until it's as good as it can get.
Related skills (test-kitchen family):
- test-kitchen:omakase-off: don't know what you want → parallel designs → react → pick
- test-kitchen:cookoff: know what you want, it's code → parallel implementations → fixed criteria → steal the best
- simmer: know what you want, it's anything → user-defined criteria → iterate until good
Flow
"Simmer this" / "Refine this" / "Optimize this pipeline"
↓
┌─────────────────────────────────────┐
│ SETUP (identify + criteria) │
│ Load simmer-setup subskill │
│ │
│ Output: artifact, rubric, N iters, │
│ evaluator (optional), │
│ background (optional) │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ LOOP (default 3 iterations) │
│ │
│ Each iteration: │
│ 1. Dispatch generator subagent │
│ 2. Run evaluator (if present) │
│ 3. Dispatch judge subagent │
│ 4. Load reflect subskill │
│ │
│ Generator gets: candidate + ASI │
│ + background │
│ Judge gets: candidate + rubric │
│ + evaluator output (if any) │
│ Reflect gets: full score history │
└─────────────────────────────────────┘
↓
┌─────────────────────────────────────┐
│ OUTPUT │
│ Best candidate → result file │
│ Score trajectory displayed │
└─────────────────────────────────────┘
When to Use
Trigger when user wants iterative refinement of any kind:
- "Simmer this", "refine this", "hone this", "iterate on this"
- "Make this better", "improve this over a few rounds"
- "Polish this", "tighten this up"
- "Optimize this pipeline", "find the best model for this task"
- "Tune this configuration", "improve these prompts against this test suite"
- Any request to iteratively improve an artifact or workspace
Judge mode is auto-selected by setup based on problem complexity:
| Condition | JUDGE_MODE |
|---|---|
| text/creative, ≤2 criteria, short artifact (email, tweet, tagline) | single |
| text/creative, ≥3 criteria or long/complex artifact | board |
| code/testable (any) | board |
| pipeline/engineering (any) | board |
| User says "with a single judge" | single (override) |
| User says "with a judge board" or "with a panel" | board (override) |
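The selection table can be read as a small decision function. The following is a minimal sketch, not the setup subskill's actual logic: the function name, the `problem_class` labels, and the `is_short` flag are hypothetical stand-ins for whatever setup uses to classify the problem.

```python
from typing import Optional


def select_judge_mode(
    problem_class: str,            # hypothetical labels: "text", "code", "pipeline"
    num_criteria: int,
    is_short: bool,                # assumed flag for "short artifact" (email, tweet, tagline)
    user_override: Optional[str] = None,
) -> str:
    """Return "single" or "board", following the rows of the table."""
    # An explicit user request wins over auto-selection.
    if user_override in ("single", "board"):
        return user_override
    # Code/testable and pipeline/engineering problems always get a board.
    if problem_class in ("code", "pipeline"):
        return "board"
    # Text/creative: a single judge only for short artifacts with at most 2 criteria.
    if num_criteria <= 2 and is_short:
        return "single"
    return "board"
```

For example, `select_judge_mode("text", 2, True)` selects a single judge, while the same call with three criteria selects a board.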
Plateau upgrade: If the loop started with a single judge and detects a plateau (3 iterations without improvement), offer: "Scores have plateaued. Switch to judge board for deeper diagnosis?" If the user accepts, switch to JUDGE_MODE: board for remaining iterations.
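One way to make the plateau check concrete is sketched below. It assumes that "without improvement" means none of the last 3 iterations beat the best score seen before them; the source does not define this precisely, and the function name and score representation are hypothetical.

```python
from typing import Sequence


def has_plateaued(scores: Sequence[float], window: int = 3) -> bool:
    """True when none of the last `window` scores beat the best earlier score.

    Assumption: `scores` holds one overall score per judged iteration,
    oldest first, and higher is better.
    """
    # Not enough history to compare a full window against anything earlier.
    if len(scores) <= window:
        return False
    best_before = max(scores[:-window])
    return max(scores[-window:]) <= best_before
```

With this definition, a loop using the default of 3 iterations can only plateau on its final round, so the offer is mostly relevant when the user asks for a longer run.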
Not simmer: If the artifact is code and the user wants parallel implementations, use cookoff instead.
Orchestration
Announce: "I'm using the simmer skill to set up iterative refinement."
Track progress (TodoWrite if available, otherwise inline):
- Setup — identify artifact, elicit criteria, determine evaluation method
- Refinement loop (N iterations)
- Output best version with score trajectory
Phase 1: Setup
Invoke simmer:simmer-setup.
Do not attempt to identify the artifact or ask about criteria yourself — that is the setup subskill's job.
Shortcut: If the user (or calling system) has already provided artifact, criteria (each with at least one sentence describing what a high score looks like), iteration count, mode, and optionally evaluator/background, skip the setup subskill entirely. Construct the setup brief directly and proceed to Phase 2.
Setup returns a brief:
ARTIFACT: [content, file path, or directory path]
ARTIFACT_TYPE: [single-file | workspace]
CRITERIA:
- [criterion 1]: [what better looks like]
- [criterion 2]: [what better looks like]
- [criterion 3]: [what better looks like]
PRIMARY: [criterion name — omit if equally weighted]
EVALUATOR: [command to run — omit for judge-only mode]
BACKGROUND: [constraints, available resources, domain knowledge — omit if not needed]
OUTPUT_CONTRACT: [valid output format description — omit for text/creative]
VALIDATION_COMMAND: [quick check command — omit if no cheap validation exists]
SEARCH_SPACE: [what's in scope to explore — omit if unconstrained]
JUDGE_MODE: [single | board — auto-selected by setup based on complexity. User can override]
JUDGE_PANEL: [optional custom judge definitions — omit to use defaults for problem class]
ITERATIONS: [N]
MODE: [seedless | from-file | from-paste | from-workspace]
OUTPUT_DIR: [path, default: docs/simmer]
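When the brief is constructed directly (the shortcut path), it can help to treat it as a typed record and check it before entering the loop. This is an illustrative sketch only: the class name, field names, and the `validate` method are hypothetical, and the only defaults taken from this document are 3 iterations and `docs/simmer`.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ARTIFACT_TYPES = {"single-file", "workspace"}
JUDGE_MODES = {"single", "board"}
MODES = {"seedless", "from-file", "from-paste", "from-workspace"}


@dataclass
class SetupBrief:
    artifact: str                          # content, file path, or directory path
    artifact_type: str                     # "single-file" or "workspace"
    criteria: Dict[str, str]               # criterion name -> what better looks like
    judge_mode: str                        # "single" or "board"
    mode: str                              # one of MODES
    iterations: int = 3
    output_dir: str = "docs/simmer"
    primary: Optional[str] = None          # None means equally weighted
    evaluator: Optional[str] = None        # None means judge-only mode
    background: Optional[str] = None
    output_contract: Optional[str] = None
    validation_command: Optional[str] = None
    search_space: Optional[str] = None
    judge_panel: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the brief is not usable by the loop."""
        if self.artifact_type not in ARTIFACT_TYPES:
            raise ValueError(f"unknown ARTIFACT_TYPE: {self.artifact_type}")
        if self.judge_mode not in JUDGE_MODES:
            raise ValueError(f"unknown JUDGE_MODE: {self.judge_mode}")
        if self.mode not in MODES:
            raise ValueError(f"unknown MODE: {self.mode}")
        if self.iterations < 1:
            raise ValueError("ITERATIONS must be at least 1")
        if not self.criteria or any(not d.strip() for d in self.criteria.values()):
            raise ValueError("every criterion needs a description")
        if self.primary is not None and self.primary not in self.criteria:
            raise ValueError("PRIMARY must name one of the criteria")
```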
Phase 2: Refinement Loop
For single-file mode:
mkdir -p {OUTPUT_DIR}
For workspace mode:
# Create initial commit to snapshot the seed state
cd {ARTIFACT}
git add -A && git commit -m "simmer: iteration 0 — seed state"
Iteration counting:
"N iterations" means N generate-judge-reflect cycles AFTER the initial seed judgment. The seed judgment is iteration 0 (not counted toward N). So ITERATIONS: 3 means:
- Iteration 0: Judge the seed (no generator)
- Iteration 1: Generate → Judge → Reflect
- Iteration 2: Generate → Judge → Reflect
- Iteration 3: Generate → Judge → Reflect
- Total: 3 generation passes + 1 seed judgment = 4 judge rounds
For seedless mode: iteration 1 generates the initial candidate AND judges it. ITERATIONS: 3 means 3 generation passes total.
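The counting rules above can be sketched as a schedule builder. The function name and the step labels are hypothetical, and the sketch assumes that a seedless iteration 1 also runs reflect, which the text does not state explicitly.

```python
from typing import List, Tuple


def plan_rounds(iterations: int, seedless: bool) -> List[Tuple[int, List[str]]]:
    """Return (iteration number, steps) pairs for one simmer run."""
    rounds: List[Tuple[int, List[str]]] = []
    if not seedless:
        # Iteration 0 judges the existing seed and is not counted toward N.
        rounds.append((0, ["judge"]))
    for n in range(1, iterations + 1):
        # In seedless mode, iteration 1 is the pass that creates the first candidate.
        rounds.append((n, ["generate", "judge", "reflect"]))
    return rounds


seeded = plan_rounds(3, seedless=False)
judge_rounds = sum("judge" in steps for _, steps in seeded)
generation_passes = sum("generate" in steps for _, steps in seeded)
```

For `ITERATIONS: 3` with a seed this yields 4 judge rounds and 3 generation passes; in seedless mode it yields 3 of each.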
Iteration 0 (seed):
Single-file mode:
- Write the seed artifact to {OUTPUT_DIR}/iteration-0-candidate.md
- If seedless: dispatch generator subagent to produce initial candidate from description + criteria, then judge it
- If from-file or from-paste: the seed IS the starting artifact — judge it directly (no generator)
Workspace mode:
- The seed is the current state of the workspace directory
- If from-workspace: judge the current state directly (no generator)
- If seedless: dispatch generator to scaffold the initial workspace, then judge it
Each iteration:
Step 1: Generator (subagent)
Invoke simmer:simmer-generator as a subagent.
Single-file subagent prompt:
You are the generator in a simmer refinement loop.
Invoke the skill: simmer:simmer-generator
ITERATION: [N]
ARTIFACT_TYPE: single-file
CRITERIA:
[rubric from setup]
CURRENT CANDIDATE:
[full text of current best candidate]
JUDGE FEEDBACK (ASI from previous round):
[ASI text, or "First iteration — generate initial candidate" if seedless iteration 1]
Write your improved candidate to: {OUTPUT_DIR}/iteration-[N]-candidate.md
(or appropriate extension matching artifact type)
Report: what specifically changed and why (2-3 sentences).
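The candidate path in the prompt follows a fixed naming pattern with an extension that matches the artifact. A minimal sketch of that naming, assuming the extension is taken from the seed file when there is one (the helper name and its parameters are hypothetical):

```python
from pathlib import Path
from typing import Optional


def candidate_path(output_dir: str, iteration: int, seed_file: Optional[str] = None) -> Path:
    """Build {OUTPUT_DIR}/iteration-[N]-candidate.<ext>.

    Assumption: reuse the seed file's extension when it has one,
    otherwise fall back to .md as in the prompt template.
    """
    suffix = Path(seed_file).suffix if seed_file else ""
    return Path(output_dir) / f"iteration-{iteration}-candidate{suffix or '.md'}"
```

Pasted or seedless artifacts have no seed file, so they fall back to `.md` under this sketch.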
Workspace subagent prompt:
You are the generator in a simmer refinement loop.
Invoke the skill: simmer:simmer-generator
ITERATION: [N]
ARTIFACT_TYPE: workspace
WORKSPACE: [directory path]
CRITERIA:
[rubric from setup]
BACKGROUND:
[constraints, available resources, domain knowledge from setup]
OUTPUT_CONTRACT:
[valid output format — omit if not specified in setup]
VALIDATION_COMMAND:
[quick check command — omit if not specified in setup]
SEARCH_SPACE:
[what's in scope to explore — omit if not specified in setup]
JUDGE FEEDBACK (ASI from previous round):
[ASI text — may describe coordinated changes across multiple files]
EXPLORATION STATUS:
[from reflect: what's been tried vs untried — omit on iteration 1 or if no search space]
Make your changes directly in the workspace directory.
You may edit multiple files in a single iteration when the ASI calls for coordinated changes.
If making infrastructure changes, run VALIDATION_COMMAND (if available) before reporting success.
Report: what specifically changed and why (2-3 sentences).
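Several sections of the workspace prompt are omitted when setup did not supply them. The sketch below shows one way to assemble the prompt with those omissions; the function name and keyword arguments are hypothetical, and the section labels are copied from the template above.

```python
from typing import List, Optional


def build_workspace_prompt(
    iteration: int,
    workspace: str,
    criteria: str,
    asi: str,
    *,
    background: Optional[str] = None,
    output_contract: Optional[str] = None,
    validation_command: Optional[str] = None,
    search_space: Optional[str] = None,
    exploration_status: Optional[str] = None,
) -> str:
    def section(label: str, value: Optional[str]) -> List[str]:
        # Emit a labeled section only when setup supplied a value.
        return [f"{label}:", value] if value else []

    lines = [
        "You are the generator in a simmer refinement loop.",
        "Invoke the skill: simmer:simmer-generator",
        f"ITERATION: {iteration}",
        "ARTIFACT_TYPE: workspace",
        f"WORKSPACE: {workspace}",
        "CRITERIA:",
        criteria,
    ]
    lines += section("BACKGROUND", background)
    lines += section("OUTPUT_CONTRACT", output_contract)
    lines += section("VALIDATION_COMMAND", validation_command)
    lines += section("SEARCH_SPACE", search_space)
    lines += ["JUDGE FEEDBACK (ASI from previous round):", asi]
    # Exploration status is omitted on iteration 1 or when there is no search space.
    if iteration > 1 and search_space:
        lines += section("EXPLORATION STATUS", exploration_status)
    lines += [
        "Make your changes directly in the workspace directory.",
        "You may edit multiple files in a single iteration when the ASI calls for coordinated changes.",
        "If making infrastructure changes, run VALIDATION_COMMAND (if available) before reporting success.",
        "Report: what specifically changed and why (2-3 sentences).",
    ]
    return "\n".join(lines)
```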
Step 2: Run Evaluator (if present)
If the setup brief includes an EVALUATOR command:
``