Image Gen

Generates AI images via Gemini Imagen 4, OpenRouter (Gemini 2.5 Flash Image / GPT-5-image), Z.ai GLM-image, or OpenAI DALL-E 3. Applies anti-AI-slop prompt prefixes per style. Four modes: generate, edit, config, update.

Arguments: $ARGUMENTS

Mode Routing

| Mode | Flow |
|------|------|
| generate | Phase 0 -> 1 -> 2 -> 3 -> 4 |
| edit | Phase 0 -> 1 -> 2E -> 3 -> 4 |
| config | Phase 0 -> C |
| update | Phase 0 -> U |

CONTEXT-AWARE MODE DETECTION (CRITICAL): Mode is detected from BOTH explicit flags AND natural language context. Priority:

1. Explicit flags: --edit, --config, --update -> override everything
2. Context analysis of $ARGUMENTS text:
   - Edit signals: "edit this", "modify image", "change the", "add to image" + image path present -> edit
   - Config signals: "setup", "configure", "set key", "add token" -> config
   - Update signals: "check providers", "update models", "latest API" -> update
   - Everything else -> generate (this is 99% of cases)
3. Default: generate (just a prompt, generate the image)
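
The priority order above can be sketched as a small shell function. This is a minimal illustration, not the skill's actual parser: `detect_mode` is a hypothetical name, and "image path present" is approximated by looking for a common image extension, which is an assumption.

```shell
# Hypothetical helper mirroring the priority list; not part of the skill's scripts.
detect_mode() {
  args=$1
  lower=$(printf '%s' "$args" | tr '[:upper:]' '[:lower:]')

  # 1. Explicit flags override everything.
  case " $lower " in
    *" --edit "*)   echo edit;   return ;;
    *" --config "*) echo config; return ;;
    *" --update "*) echo update; return ;;
  esac

  # 2. Edit signals count only when an image path is also present
  #    (assumption: detected via a common image file extension).
  case "$lower" in
    *"edit this"*|*"modify image"*|*"change the"*|*"add to image"*)
      case "$lower" in
        *.png*|*.jpg*|*.jpeg*|*.webp*) echo edit; return ;;
      esac ;;
  esac

  # 3. Config and update signals.
  case "$lower" in
    *setup*|*configure*|*"set key"*|*"add token"*) echo config; return ;;
    *"check providers"*|*"update models"*|*"latest api"*) echo update; return ;;
  esac

  # 4. Everything else is treated as a plain prompt.
  echo generate
}
```

Plain substring matching is deliberately naive: a prompt such as "a desk setup with two monitors" would be routed to config, so a real implementation would likely need stricter signals.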

FAST PATH (99% case): When $ARGUMENTS is just a prompt text (no flags, no mode signals):

Skip Steps 3-6 in Phase 1 (count, service, style, output questions)

Use defaults: count=1, service=gemini, style=photo, output=.claude/reports/images/

Go straight to config table (Step 7) + confirmation (Step 8)

Only AskUserQuestion if API key is missing

AGENT INVOCATION: This skill can be called by agents (not just users). When called from an agent:

Treat all provided args as final — do NOT ask for confirmation of values already specified

Only AskUserQuestion for truly missing required values (prompt, API key)

The config table is still mandatory, but the confirmation step can be skipped if all params are explicit

API KEY PRIORITY (check in order, first found wins):

1. Explicit key in $ARGUMENTS text (user pasted inline)
2. .env in project root (source .env 2>/dev/null)
3. Shell environment variable
4. AskUserQuestion (redirect to Phase C)
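
One way to picture the "first found wins" lookup is the sketch below. `resolve_api_key` is a hypothetical helper, and the variable name passed to it (for example `GEMINI_API_KEY`) is an assumption, since the document does not name the variables. For simplicity it reads a plain `NAME=value` line from `.env` instead of sourcing the file.

```shell
# Hypothetical helper. Arguments: inline key (may be empty), path to .env,
# and the name of the environment variable to look up (assumed naming).
resolve_api_key() {
  inline_key=$1 env_file=$2 var_name=$3

  # Refuse anything that is not a plain identifier before using eval below.
  case $var_name in
    ''|*[!A-Za-z0-9_]*) return 2 ;;
  esac

  # 1. Key pasted inline in the arguments.
  if [ -n "$inline_key" ]; then
    printf '%s\n' "$inline_key"
    return 0
  fi

  # 2. .env in the project root (last matching NAME=value line, quotes stripped).
  if [ -f "$env_file" ]; then
    value=$(grep "^${var_name}=" "$env_file" | tail -n 1 | cut -d= -f2- | tr -d "\"'")
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  fi

  # 3. Shell environment variable.
  eval "value=\${$var_name:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
    return 0
  fi

  # 4. Nothing found: the caller should ask the user (Phase C).
  return 1
}
```

A non-zero exit status is the signal to fall through to AskUserQuestion.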

MANDATORY: Before ANY API call, display the full resolved configuration table. User must see exactly what will be sent.

Phase 0: Parse Arguments

Step 1: Parse Flags

EXECUTE using Bash tool:

bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" $ARGUMENTS && echo "OK" || echo "FAILED"

Output: KEY=VALUE pairs. Store all values.

Also scan $ARGUMENTS raw text for inline values not captured by flags:

API key pasted in prompt text -> extract and use (overrides .env)

Service/style mentioned in free text -> treat as flags

| Key | Default | Options |
|-----|---------|---------|
| `PROMPT` | (empty) | Free-text image description |
| `MODE` | generate | generate, edit, config, update |
| `SERVICE` | gemini | gemini, openrouter, openai |
| `STYLE` | photo | photo, illustration, art |
| `COUNT` | 1 | 1-10 |
| `OUTPUT` | .claude/reports/images/ | Directory path |
| `SIZE` | 1024x1024 | WxH format |
| `EDIT_IMAGE` | (empty) | Path to image for edit mode |
| `EDIT_INSTRUCTIONS` | (empty) | Edit instructions text |
| `PROMPT_MISSING` | false | true if no prompt in generate mode |

STOP if FAILED -- check parse-args.sh output for error details.
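
As a rough sketch of consuming that output, the helpers below read KEY=VALUE lines and sanity-check COUNT and SIZE against the table above. `get_value` and `validate_parsed` are hypothetical names, and the sketch assumes parse-args.sh prints one unquoted KEY=VALUE pair per line, which the document does not spell out.

```shell
# Hypothetical helper: reads KEY=VALUE lines on stdin and prints the value
# of the requested key. Splits on the first "=" so values may contain "=".
get_value() {
  wanted=$1
  while IFS= read -r line; do
    case $line in
      "$wanted="*) printf '%s\n' "${line#*=}"; return 0 ;;
    esac
  done
  return 1
}

# Hypothetical check: COUNT must be an integer in 1-10, SIZE must look like WxH.
validate_parsed() {
  count=$1 size=$2
  case $count in
    ''|*[!0-9]*) return 1 ;;
  esac
  if [ "$count" -lt 1 ] || [ "$count" -gt 10 ]; then
    return 1
  fi
  case $size in
    *x*) w=${size%%x*}; h=${size#*x} ;;
    *)   return 1 ;;
  esac
  # Both halves must be non-empty and contain digits only.
  case $w$h in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ -n "$w" ] && [ -n "$h" ]
}
```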

Step 2: Route to Mode

| Parsed MODE | Go to |
|-------------|-------|
| generate | Phase 1 |
| edit | Phase 1 |
| config | Phase C |
| update | Phase U |

Phase 1: Validate and Gather

Step 1: Load Environment and Check API Key

EXECUTE using Bash tool:

[ -f .env ] && set -a && . .env && set +a; bash "${CLAUDE_SKILL_DIR}/scripts/validate-key.sh" "SERVICE_HERE" && echo "OK" || echo "FAILED"

Replace SERVICE_HERE with the resolved SERVICE value.

If FAILED (INVALID): Redirect to Phase C (config mode). Tell the user: "No valid API key found for {SERVICE}. Let's configure it."

Step 2: Gather Missing Parameters

If MODE=generate and PROMPT_MISSING=true:

ASK using AskUserQuestion:

What image do you need? Describe the scene, subject, and mood.

Options:

"Describe your image (e.g., 'a cozy coffee shop at sunset with warm lighting')"
"Cancel"

Store response as PROMPT.

FAST PATH CHECK: If PROMPT is provided (not missing) AND no explicit --service/--style/--count/--output flags were given:

- Skip Steps 3-6 entirely and use defaults (count=1, service=gemini, style=photo, output=.claude/reports/images/).
- Jump to Step 7 (config table).

This is the 99% path: the user just wants an image from their prompt.
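
The fast-path condition could be checked along these lines. `is_fast_path` is a hypothetical helper that inspects the raw argument string for the four flags named above; it is a sketch of the rule, not the skill's own logic.

```shell
# Hypothetical check: returns 0 (take the fast path) when a prompt exists and
# none of the four tuning flags appear in the raw arguments.
is_fast_path() {
  prompt=$1 raw_args=$2
  if [ -z "$prompt" ]; then
    return 1
  fi
  case " $raw_args " in
    *" --service"*|*" --style"*|*" --count"*|*" --output"*) return 1 ;;
  esac
  return 0
}

# Usage: fall back to the documented defaults when the fast path applies.
if is_fast_path "a red fox in snow" "a red fox in snow"; then
  COUNT=1 SERVICE=gemini STYLE=photo OUTPUT=.claude/reports/images/
fi
```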

Step 3: Confirm Image Count (skip on fast path)

ASK using AskUserQuestion:

How many images to generate?

Options:

"1 (default, fastest)"
"2-3 (compare variations)"
"4+ (batch generation, up to 10)"

Update COUNT with the number. Default: 1.

Provider limit: DALL-E 3 supports only 1 image per request. If SERVICE=openai and COUNT>1, send COUNT sequential single-image requests.
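
The provider limit can be illustrated with a small planner that only prints what would be sent. `plan_requests` is a hypothetical name, no API is called, and the assumption that the other services accept the full count in a single request is not confirmed by this document.

```shell
# Hypothetical planner: prints one line per request with its images-per-request value.
plan_requests() {
  service=$1 count=$2
  if [ "$service" = "openai" ] && [ "$count" -gt 1 ]; then
    # DALL-E 3: one image per request, so loop COUNT times.
    i=1
    while [ "$i" -le "$count" ]; do
      echo "request $i: n=1"
      i=$((i + 1))
    done
  else
    # Assumption: other services take the whole count in one request.
    echo "request 1: n=$count"
  fi
}
```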

Step 4: Confirm Service (skip on fast path)

ASK using AskUserQuestion:

Which image generation service?

| Service | Model | Speed | Quality | Cost |
|---------|-------|-------|---------|------|
| openrouter | Gemini 2.5 Flash Image | Fast | High | ~$0.001/image |
| zai | GLM-image | Fast | **Very High** | ~$0.015/image |
| gemini | Imagen 4 | Fast | Very High | Paid plan required |
| openrouter-gpt5 | GPT-5 Image | Medium | **Highest** | ~$0.01/image |
| openai | DALL-E 3 | Medium | High | $0.04-0.12/image |

Options:

"openrouter (Gemini 2.5 Flash -- cheapest, default)"
"zai (GLM-image -- flagship Z.ai, high quality)"
"gemini (Imagen 4 -- high quality, paid plan)"
"openrouter-gpt5 (GPT-5 Image -- highest quality, ~$0.01/img)"
"openai (DALL-E 3 -- reliable, most expensive)"
"Keep current: {SERVICE}"

Update SERVICE if changed. Re-validate key if service changed.
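
For the cost estimate shown later in the config table, a rough calculation from the per-image figures in the service table might look like this. `estimate_cost` is a hypothetical helper, and the prices are the approximate values listed above, not authoritative pricing.

```shell
# Hypothetical estimator using the approximate per-image prices from the table.
estimate_cost() {
  service=$1 count=$2
  case $service in
    openrouter)      awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.001 }' ;;
    zai)             awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.015 }' ;;
    openrouter-gpt5) awk -v n="$count" 'BEGIN { printf "~$%.3f\n", n * 0.01 }' ;;
    # DALL-E 3 is listed as a range, so report low and high bounds.
    openai)          awk -v n="$count" 'BEGIN { printf "$%.2f-$%.2f\n", n * 0.04, n * 0.12 }' ;;
    # Imagen 4 has no per-image figure in the table.
    gemini)          echo "paid plan (no per-image figure listed)" ;;
    *)               echo "unknown" ;;
  esac
}
```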

Step 5: Confirm Style (skip on fast path)

ASK using AskUserQuestion:

Image style? This controls anti-slop prompt engineering.

- photo: Physically accurate photography -- real lighting, correct anatomy, natural materials
- illustration: Professional illustration -- clean line work, proper color theory, organic imperfections
- art: Consistent artistic medium -- unified brushwork, intentional composition, coherent color temperature

Options:

"photo (realistic photography)"
"illustration (clean vector/drawn style)"
"art (painterly/artistic medium)"
"Keep current: {STYLE}"

Step 6: Confirm Output Directory (skip on fast path)

ASK using AskUserQuestion:

Where to save generated images?

Options:

".claude/reports/images/ (default)"
"Current directory (.)"
"Custom path (type your preferred directory)"

Update OUTPUT with chosen path.

Step 7: Display Resolved Configuration (MANDATORY)

Output this table before proceeding. Do NOT skip this step.

=== Image Generation Config ===
| Parameter | Value |
|-----------|-------|
| Prompt | {PROMPT (first 80 chars)}... |
| Service | {SERVICE} ({model name}) |
| Style | {STYLE} |
| Count | {COUNT} |
| Size | {SIZE} |
| Output | {OUTPUT} |
| API Key | {first 8 chars}...{last 4 chars} |
| Est. Cost | {estimate based on service and count} |
================================
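
The API Key row shows only the first 8 and last 4 characters. A minimal sketch of that masking follows. `mask_key` is a hypothetical helper, and replacing short keys with `****` is an added safeguard, not something the document specifies.

```shell
# Hypothetical helper: prints "<first 8 chars>...<last 4 chars>" of a key.
mask_key() {
  key=$1
  # A key of 12 characters or fewer would be fully exposed by an 8+4 mask.
  if [ "${#key}" -le 12 ]; then
    echo '****'
    return
  fi
  first=$(printf '%.8s' "$key")
  # Strip everything except the last four characters.
  last=${key#"${key%????}"}
  printf '%s...%s\n' "$first" "$last"
}
```

For example, `mask_key "sk-abcdefghijklmnopqrstuvwxyz"` prints `sk-abcde...wxyz`.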

Step 8: Final Confirmation

ASK using AskUserQuestion:

Proceed with generation?

Options:

"Yes, generate"
"No, change settings"
"Cancel"

If "change settings" -> go back to Step 4. If "Cancel" -> STOP with message "Image generation cancelled."

Phase 2: Build Payload and Generate

Step 1: Load Anti-Slop Instructions

Read the anti-slop reference for the resolved STYLE.

Read file: ${CLAUDE_SKILL_DIR}/references/anti-slop.md

Extract the section matching STYLE (photo, illustration, or art). Store as ANTI_SLOP_PREFIX.

Step 2: Build Enhanced Prompt

Combine anti-slop prefix with user prompt:

ENHANCED_PROMPT = ANTI_SLOP_PREFIX + "\n\n" + PROMPT
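
The extraction and concatenation could be sketched as below. The layout of anti-slop.md is not shown in this document, so the sketch assumes each style section starts with a markdown heading such as `## photo`; `extract_section` and `build_enhanced_prompt` are hypothetical names.

```shell
# Hypothetical helper: prints the body of the section whose heading equals STYLE.
# Assumes headings like "## photo"; the real file layout may differ.
extract_section() {
  style=$1 file=$2
  awk -v style="$style" '
    /^#+[ \t]/ {
      heading = tolower($0)
      sub(/^#+[ \t]+/, "", heading)   # drop the leading "#" marks
      in_section = (heading == style)
      next
    }
    in_section { print }
  ' "$file"
}

# ENHANCED_PROMPT = ANTI_SLOP_PREFIX + "\n\n" + PROMPT
build_enhanced_prompt() {
  style=$1 file=$2 prompt=$3
  prefix=$(extract_section "$style" "$file")
  printf '%s\n\n%s\n' "$prefix" "$prompt"
}
```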
