Image Gen
Generates AI images via Gemini Imagen 4, OpenRouter (Gemini 2.5 Flash Image / GPT-5 Image), Z.ai GLM-image, or OpenAI DALL-E 3. Applies anti-AI-slop prompt prefixes per style. Four modes: generate, edit, config, update.
Arguments: $ARGUMENTS
Mode Routing
| Mode | Flow |
|---|---|
| generate | Phase 0 -> 1 -> 2 -> 3 -> 4 |
| edit | Phase 0 -> 1 -> 2E -> 3 -> 4 |
| config | Phase 0 -> C |
| update | Phase 0 -> U |
CONTEXT-AWARE MODE DETECTION (CRITICAL): Mode is detected from BOTH explicit flags AND natural-language context. Priority:
- Explicit flags: `--edit`, `--config`, `--update` -> override everything
- Context analysis of `$ARGUMENTS` text:
  - Edit signals: "edit this", "modify image", "change the", "add to image" + image path present -> edit
  - Config signals: "setup", "configure", "set key", "add token" -> config
  - Update signals: "check providers", "update models", "latest API" -> update
  - Everything else -> generate (this is 99% of cases)
- Default: generate (just a prompt; generate the image)
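The priority order above can be illustrated with a small bash function. This is a minimal sketch, not the skill's real detector: `detect_mode` is a hypothetical name, keyword matching is a plain case-insensitive substring search, and "image path present" is approximated (an assumption) by any token ending in .png, .jpg, .jpeg, or .webp.

```bash
# Hypothetical sketch of the mode-priority rules; not the skill's real parser.
detect_mode() {
  local args lower
  args="$*"
  # 1. Explicit flags override everything.
  case " $args " in
    *" --edit "*)   echo edit;   return 0 ;;
    *" --config "*) echo config; return 0 ;;
    *" --update "*) echo update; return 0 ;;
  esac
  # 2. Context analysis: case-insensitive substring matching on the raw text.
  lower=$(printf '%s\n' "$args" | tr '[:upper:]' '[:lower:]')
  # Edit needs both an edit phrase and something that looks like an image path
  # (assumption: a token ending in .png, .jpg, .jpeg or .webp).
  if printf '%s\n' "$lower" | grep -Eq 'edit this|modify image|change the|add to image' &&
     printf '%s\n' "$lower" | grep -Eq '[^[:space:]]+\.(png|jpe?g|webp)'; then
    echo edit
    return 0
  fi
  if printf '%s\n' "$lower" | grep -Eq 'setup|configure|set key|add token'; then
    echo config
    return 0
  fi
  if printf '%s\n' "$lower" | grep -Eq 'check providers|update models|latest api'; then
    echo update
    return 0
  fi
  # 3. Everything else (the common case) is a plain generate request.
  echo generate
}
```

Substring matching is deliberately naive: a prompt like "change the sky" without an image path falls through to generate, while any prompt that happens to contain "setup" is routed to config. That is one reason explicit flags take priority.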
FAST PATH (99% case): When `$ARGUMENTS` is just prompt text (no flags, no mode signals):
- Skip Steps 3-6 in Phase 1 (count, service, style, output questions)
- Use defaults: count=1, service=gemini, style=photo, output=.claude/reports/images/
- Go straight to the config table (Step 7) and confirmation (Step 8)
- Only call AskUserQuestion if the API key is missing
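A minimal sketch of the fast-path decision, assuming mode detection has already settled on generate. `is_fast_path` and `apply_fast_path` are hypothetical names; "any token starting with `--`" is used as the test for "no flags", and the defaults are the ones listed above.

```bash
# Hypothetical helpers illustrating the fast-path rule.
is_fast_path() {
  local arg
  # No arguments means no prompt, so the fast path does not apply.
  [ "$#" -gt 0 ] || return 1
  for arg in "$@"; do
    case "$arg" in
      --*) return 1 ;;  # any flag disables the fast path
    esac
  done
  return 0
}

apply_fast_path() {
  if is_fast_path "$@"; then
    # Fast-path defaults.
    COUNT=1
    SERVICE=gemini
    STYLE=photo
    OUTPUT=.claude/reports/images/
    return 0
  fi
  return 1
}
```

Usage: `apply_fast_path "a cozy coffee shop at sunset"` sets the defaults and succeeds; `apply_fast_path --style art "a cat"` fails, so the normal Step 3-6 questions run.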
AGENT INVOCATION: This skill can be called by agents (not just users). When called from an agent:
- Treat all provided args as final — do NOT ask for confirmation of values already specified
- Only AskUserQuestion for truly missing required values (prompt, API key)
- The config table is still mandatory, but the confirmation step (Step 8) can be skipped if all parameters are explicit
API KEY PRIORITY (check in order, first found wins):
- Explicit key in `$ARGUMENTS` text (user pasted inline)
- `.env` in project root (`source .env 2>/dev/null`)
- Shell environment variable
- AskUserQuestion (redirect to Phase C)
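The lookup order above can be sketched as a bash function. This is an illustration under assumptions: `resolve_api_key` is a hypothetical name, and the caller passes the environment-variable name to check because the per-service variable names are not listed here (`OPENAI_API_KEY` in the tests is only an example). `.env` is sourced in a subshell so it can win over the shell environment without modifying it.

```bash
# Hypothetical helper. $1 = key pasted inline (may be empty), $2 = env var name to look up.
resolve_api_key() {
  local inline_key="$1" var_name="$2" value=""
  # Only accept plain variable names, since the name is passed to eval below.
  case "$var_name" in
    ''|[0-9]*|*[!A-Za-z0-9_]*) return 1 ;;
  esac
  # 1. Explicit key pasted inline in $ARGUMENTS.
  if [ -n "$inline_key" ]; then
    printf '%s\n' "$inline_key"
    return 0
  fi
  # 2. .env in the project root, sourced in a subshell so the caller's
  #    environment is left untouched.
  if [ -f .env ]; then
    value=$(. ./.env >/dev/null 2>&1; eval "printf '%s' \"\${$var_name:-}\"")
  fi
  # 3. Shell environment variable.
  if [ -z "$value" ]; then
    eval "value=\${$var_name:-}"
  fi
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
    return 0
  fi
  # 4. Nothing found: the caller should fall through to AskUserQuestion / Phase C.
  return 1
}
```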
MANDATORY: Before ANY API call, display the full resolved configuration table. User must see exactly what will be sent.
Phase 0: Parse Arguments
Step 1: Parse Flags
EXECUTE using Bash tool:
bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" $ARGUMENTS && echo "OK" || echo "FAILED"
Output: KEY=VALUE pairs. Store all values.
Also scan the raw `$ARGUMENTS` text for inline values not captured by flags:
- API key pasted in prompt text -> extract and use (overrides `.env`)
- Service/style mentioned in free text -> treat as flags
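One way to pull an inline key out of free text is a pattern match on known key prefixes. This sketch assumes two shapes: `sk-` followed by 20 or more key characters (the prefix used by OpenAI and OpenRouter keys) and `AIza` followed by 35 characters (Google API keys). Z.ai keys are not covered, and `extract_inline_key` is a hypothetical name.

```bash
# Hypothetical helper: prints the first token that looks like an API key, or nothing.
# Assumed patterns: "sk-..." (OpenAI / OpenRouter) and "AIza..." (Google API keys).
extract_inline_key() {
  printf '%s\n' "$*" |
    grep -Eo 'sk-[A-Za-z0-9_-]{20,}|AIza[0-9A-Za-z_-]{35}' |
    head -n 1
}
```

The match is a heuristic: a long hyphenated word that happens to contain `sk-` could be picked up, so the extracted value should still go through the Phase 1 key validation.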
| Key | Default | Options |
|---|---|---|
| PROMPT | (empty) | Free-text image description |
| MODE | generate | generate, edit, config, update |
| SERVICE | gemini | gemini, openrouter, openrouter-gpt5, zai, openai |
| STYLE | photo | photo, illustration, art |
| COUNT | 1 | 1-10 |
| OUTPUT | .claude/reports/images/ | Directory path |
| SIZE | 1024x1024 | WxH format |
| EDIT_IMAGE | (empty) | Path to image for edit mode |
| EDIT_INSTRUCTIONS | (empty) | Edit instructions text |
| PROMPT_MISSING | false | true if no prompt in generate mode |
STOP if FAILED -- check parse-args.sh output for error details.
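The checks inside parse-args.sh are not shown, so the sketch below only illustrates, as an assumption, how the ranges in the table could be enforced after parsing. `validate_params` is a hypothetical name; the service list comes from the Step 4 options.

```bash
# Hypothetical check of the documented parameter ranges; parse-args.sh's real rules are not shown.
validate_params() {
  case "${MODE:-}" in
    generate|edit|config|update) ;;
    *) echo "invalid MODE: ${MODE:-}" >&2; return 1 ;;
  esac
  case "${SERVICE:-}" in
    gemini|openrouter|openrouter-gpt5|zai|openai) ;;
    *) echo "invalid SERVICE: ${SERVICE:-}" >&2; return 1 ;;
  esac
  case "${STYLE:-}" in
    photo|illustration|art) ;;
    *) echo "invalid STYLE: ${STYLE:-}" >&2; return 1 ;;
  esac
  # COUNT must be a plain integer between 1 and 10.
  case "${COUNT:-}" in
    ''|*[!0-9]*) echo "invalid COUNT: ${COUNT:-}" >&2; return 1 ;;
  esac
  if [ "$COUNT" -lt 1 ] || [ "$COUNT" -gt 10 ]; then
    echo "COUNT out of range (1-10): $COUNT" >&2
    return 1
  fi
  # SIZE must look like WxH, e.g. 1024x1024.
  if ! printf '%s\n' "${SIZE:-}" | grep -Eq '^[0-9]+x[0-9]+$'; then
    echo "invalid SIZE: ${SIZE:-}" >&2
    return 1
  fi
  # Edit mode needs an existing image file.
  if [ "$MODE" = edit ] && [ ! -f "${EDIT_IMAGE:-}" ]; then
    echo "EDIT_IMAGE not found: ${EDIT_IMAGE:-}" >&2
    return 1
  fi
  return 0
}
```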
Step 2: Route to Mode
| Parsed MODE | Go to |
|---|---|
| generate | Phase 1 |
| edit | Phase 1 |
| config | Phase C |
| update | Phase U |
Phase 1: Validate and Gather
Step 1: Load Environment and Check API Key
EXECUTE using Bash tool:
[ -f .env ] && set -a && . .env && set +a; bash "${CLAUDE_SKILL_DIR}/scripts/validate-key.sh" "SERVICE_HERE" && echo "OK" || echo "FAILED"
Replace SERVICE_HERE with the resolved SERVICE value.
If FAILED (INVALID): Redirect to Phase C (config mode). Tell the user: "No valid API key found for {SERVICE}. Let's configure it."
Step 2: Gather Missing Parameters
If MODE=generate and PROMPT_MISSING=true:
ASK using AskUserQuestion:
What image do you need? Describe the scene, subject, and mood.
Options:
- "Describe your image (e.g., 'a cozy coffee shop at sunset with warm lighting')"
- "Cancel"
Store response as PROMPT.
FAST PATH CHECK: If PROMPT is provided (not missing) AND no explicit `--service`/`--style`/`--count`/`--output` flags were given:
- Skip Steps 3-6 entirely.
- Use defaults (count=1, service=gemini, style=photo, output=.claude/reports/images/).
- Jump to Step 7 (config table).

This is the 99% path: the user just wants an image from their prompt.
Step 3: Confirm Image Count (skip on fast path)
ASK using AskUserQuestion:
How many images to generate?
Options:
- "1 (default, fastest)"
- "2-3 (compare variations)"
- "4+ (batch generation, up to 10)"
Update COUNT with the number. Default: 1.
Provider limit: DALL-E 3 supports only 1 image per request. If SERVICE=openai and COUNT>1, generate COUNT sequential requests.
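The DALL-E 3 rule amounts to a loop of single-image requests. In this sketch `generate_one` is a hypothetical stub that only prints what it would request; a real version would call the provider's API. Sending the whole batch in one call for the other services is an assumption, not something this skill specifies.

```bash
# Hypothetical stub: a real version would call the provider's image API.
# $1 = request number, $2 = images requested in this call.
generate_one() {
  printf 'request %s n=%s\n' "$1" "$2"
}

generate_batch() {
  local service="$1" count="$2" i=1
  if [ "$service" = openai ]; then
    # DALL-E 3 accepts only n=1, so issue COUNT sequential single-image requests.
    while [ "$i" -le "$count" ]; do
      generate_one "$i" 1 || return 1
      i=$((i + 1))
    done
  else
    # Assumption: other providers receive the whole batch in one call.
    generate_one 1 "$count"
  fi
}
```

Stopping at the first failed request (`|| return 1`) avoids spending money on the rest of a batch when, for example, the key has been rejected.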
Step 4: Confirm Service (skip on fast path)
ASK using AskUserQuestion:
Which image generation service?
| Service | Model | Speed | Quality | Cost |
|---------|-------|-------|---------|------|
| openrouter | Gemini 2.5 Flash Image | Fast | High | ~$0.001/image |
| zai | GLM-image | Fast | **Very High** | ~$0.015/image |
| gemini | Imagen 4 | Fast | Very High | Paid plan required |
| openrouter-gpt5 | GPT-5 Image | Medium | **Highest** | ~$0.01/image |
| openai | DALL-E 3 | Medium | High | $0.04-0.12/image |
Options:
- "openrouter (Gemini 2.5 Flash -- cheapest)"
- "zai (GLM-image -- flagship Z.ai, high quality)"
- "gemini (Imagen 4 -- high quality, paid plan)"
- "openrouter-gpt5 (GPT-5 Image -- highest quality, ~$0.01/img)"
- "openai (DALL-E 3 -- reliable, most expensive)"
- "Keep current: {SERVICE}"
Update SERVICE if changed. If the service changed, re-run the Step 1 key check (validate-key.sh) for the new service.
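The "Est. Cost" row of the config table can be derived from the Cost column of the service table. This sketch copies those per-image figures as-is: gemini has no listed price, and DALL-E 3 is reported as a range ($0.04-0.12 per image). `estimate_cost` is a hypothetical name.

```bash
# Hypothetical helper; per-image prices copied from the service table.
# openai is reported as a range because DALL-E 3 is listed at $0.04-0.12 per image.
estimate_cost() {
  local service="$1" count="$2"
  case "$service" in
    openrouter)      awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.001 }' ;;
    zai)             awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.015 }' ;;
    openrouter-gpt5) awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.01 }' ;;
    openai)          awk -v n="$count" 'BEGIN { printf "$%.2f-%.2f\n", n * 0.04, n * 0.12 }' ;;
    gemini)          echo "paid plan required" ;;
    *)               echo "unknown" ;;
  esac
}
```

awk handles the decimal arithmetic because plain shell arithmetic is integer-only.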
Step 5: Confirm Style (skip on fast path)
ASK using AskUserQuestion:
Image style? This controls anti-slop prompt engineering.
- photo: Physically accurate photography -- real lighting, correct anatomy, natural materials
- illustration: Professional illustration -- clean line work, proper color theory, organic imperfections
- art: Consistent artistic medium -- unified brushwork, intentional composition, coherent color temperature
Options:
- "photo (realistic photography)"
- "illustration (clean vector/drawn style)"
- "art (painterly/artistic medium)"
- "Keep current: {STYLE}"
Update STYLE if changed.
Step 6: Confirm Output Directory (skip on fast path)
ASK using AskUserQuestion:
Where to save generated images?
Options:
- ".claude/reports/images/ (default)"
- "Current directory (.)"
- "Custom path (type your preferred directory)"
Update OUTPUT with chosen path.
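The skill does not say whether the output directory is created automatically. This sketch assumes it should be, so a custom path works on first use. `prepare_output_dir` is a hypothetical name, and it falls back to the default directory when no path is given.

```bash
# Hypothetical helper: create the output directory if needed and confirm it is writable.
prepare_output_dir() {
  local dir="${1:-.claude/reports/images/}"
  mkdir -p -- "$dir" || return 1
  [ -d "$dir" ] && [ -w "$dir" ]
}
```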
Step 7: Display Resolved Configuration (MANDATORY)
Output this table before proceeding. Do NOT skip this step.
=== Image Generation Config ===
| Parameter | Value |
|-----------|-------|
| Prompt | {PROMPT (first 80 chars)}... |
| Service | {SERVICE} ({model name}) |
| Style | {STYLE} |
| Count | {COUNT} |
| Size | {SIZE} |
| Output | {OUTPUT} |
| API Key | {first 8 chars}...{last 4 chars} |
| Est. Cost | {estimate based on service and count} |
================================
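Two rows of the table need formatting: the API key shows only its first 8 and last 4 characters, and the prompt shows at most its first 80. A minimal sketch with hypothetical names `mask_key` and `truncate_prompt`. Fully hiding keys of 12 characters or fewer is an added assumption, since showing 8 + 4 characters of such a key would reveal all of it. `printf '%.80s'` truncates by bytes, so non-ASCII prompts may be cut slightly shorter.

```bash
# Hypothetical helpers for the config table.
mask_key() {
  local key="$1" first last
  # Assumption: short keys are hidden entirely so the mask reveals nothing.
  if [ "${#key}" -le 12 ]; then
    printf '****\n'
    return 0
  fi
  first=${key%"${key#????????}"}   # first 8 characters
  last=${key#"${key%????}"}        # last 4 characters
  printf '%s...%s\n' "$first" "$last"
}

truncate_prompt() {
  # Append "..." only when the prompt was actually cut.
  if [ "${#1}" -gt 80 ]; then
    printf '%.80s...\n' "$1"
  else
    printf '%s\n' "$1"
  fi
}
```

Usage: `mask_key sk-abcdefghijklmnop1234` prints `sk-abcde...1234`.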
Step 8: Final Confirmation
ASK using AskUserQuestion:
Proceed with generation?
Options:
- "Yes, generate"
- "No, change settings"
- "Cancel"
If "change settings" -> go back to Step 4. If "Cancel" -> STOP with message "Image generation cancelled."
Phase 2: Build Payload and Generate
Step 1: Load Anti-Slop Instructions
Read the anti-slop reference for the resolved STYLE:
Read file: ${CLAUDE_SKILL_DIR}/references/anti-slop.md
Extract the section matching STYLE (photo, illustration, or art). Store as ANTI_SLOP_PREFIX.
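The layout of anti-slop.md is not shown here. This sketch assumes each style is a level-2 markdown heading (`## Photo`, `## Illustration`, `## Art`) and prints everything up to the next level-2 heading. `extract_style_section` is a hypothetical name.

```bash
# Hypothetical helper. $1 = style (photo|illustration|art), $2 = path to anti-slop.md.
# Assumes one "## <Style>" heading per style; the heading match is case-insensitive.
extract_style_section() {
  awk -v style="$1" '
    /^## / {
      heading = tolower(substr($0, 4))
      sub(/[ \t]+$/, "", heading)
      in_section = (heading == style)
      next
    }
    in_section { print }
  ' "$2"
}
```

Usage: `ANTI_SLOP_PREFIX=$(extract_style_section "$STYLE" "${CLAUDE_SKILL_DIR}/references/anti-slop.md")`. Deeper headings (`###`) inside a section are kept as part of its text.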
Step 2: Build Enhanced Prompt
Combine anti-slop prefix with user prompt:
ENHANCED_PROMPT = ANTI_SLOP_PREFIX + "\n\n" + PROMPT
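In bash the concatenation is a single `printf`, which keeps the blank line between prefix and prompt intact. Sending the bare prompt when no style section was found is an assumed fallback, not part of the documented flow; `build_enhanced_prompt` is a hypothetical name.

```bash
# Hypothetical helper: ENHANCED_PROMPT = ANTI_SLOP_PREFIX + "\n\n" + PROMPT
build_enhanced_prompt() {
  local prefix="$1" prompt="$2"
  if [ -z "$prefix" ]; then
    # Assumed fallback: no style section found, send the prompt unchanged.
    printf '%s' "$prompt"
  else
    printf '%s\n\n%s' "$prefix" "$prompt"
  fi
}
```

Usage: `ENHANCED_PROMPT=$(build_enhanced_prompt "$ANTI_SLOP_PREFIX" "$PROMPT")`.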