Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI GPT Image 2, Azure OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Z.AI GLM-Image, MiniMax, Jimeng (即梦), Seedream (豆包) and Replicate.
User Input Tools
When this skill prompts the user, follow this tool-selection rule (priority order):
- Prefer built-in user-input tools exposed by the current agent runtime — e.g.,
AskUserQuestion,request_user_input,clarify,ask_user, or any equivalent. - Fallback: if no such tool exists, emit a numbered plain-text message and ask the user to reply with the chosen number/answer for each question.
- Batching: if the tool supports multiple questions per call, combine all applicable questions into a single call; if only single-question, ask them one at a time in priority order.
Concrete AskUserQuestion references below are examples — substitute the local equivalent in other runtimes.
Script Directory
{baseDir} = this SKILL.md's directory. All scripts/... paths below are relative to {baseDir}. Main script: {baseDir}/scripts/main.ts. Batch payload helper: {baseDir}/scripts/build-batch.ts. Resolve ${BUN_X}: prefer bun; else npx -y bun; else suggest brew install oven-sh/bun/bun.
Step 0: Load Preferences ⛔ BLOCKING
This step MUST complete before any image generation — generation is blocked until EXTEND.md exists.
Check these paths in order; first hit wins:
| Path | Scope |
|---|---|
.baoyu-skills/baoyu-image-gen/EXTEND.md | Project |
${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-image-gen/EXTEND.md | XDG |
$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md | User home |
- Found → load, parse, apply. If
default_model.[provider]is null → ask model only. - Not found → run first-time setup (
references/config/first-time-setup.md) using AskUserQuestion to collect provider + model + quality + save location. Save EXTEND.md, then continue. Do not generate images before this completes.
Legacy compatibility: if .baoyu-skills/baoyu-imagine/EXTEND.md exists and the new path doesn't, the runtime renames it to baoyu-image-gen. If both exist, the runtime leaves them alone and uses the new path.
EXTEND.md keys: default provider, default quality, default aspect ratio, default image size, OpenAI image API dialect, default models, batch worker cap, provider-specific batch limits. Schema: references/config/preferences-schema.md.
Usage
Minimum working examples — see references/usage-examples.md for the full set including per-provider invocations and batch mode.
Identity-preserving reference prompts
When the user wants a real person/character/object preserved from reference images, do not replace the reference with a long generic description. Prefer short, hard identity-preservation language:
- "Use the person/object in the reference image(s) as the same identity. Do not redesign it or create a similar-looking new subject."
- "Only change scene, clothing, pose, lighting, rendering style, and composition. Keep the face/proportions/hair/key accessories/overall identity from the references."
- If using multiple references, state that they are the same subject and should jointly define identity.
Pitfall: long descriptions like "young East Asian woman, oval face, clear eyes..." can cause the model to synthesize a new person matching the description instead of preserving the referenced person.
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio and high quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9 --quality 2k
# Prompt from files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference image
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider dashscope --model qwen-image-2.0-pro
# OpenAI GPT Image 2
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai --model gpt-image-2
# Codex CLI (uses logged-in Codex subscription — no OPENAI_API_KEY required; requires `codex` on PATH)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider codex-cli --ar 16:9
# Batch mode
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
# Build a batch file from outline.md + prompts/ (e.g. baoyu-article-illustrator output)
${BUN_X} {baseDir}/scripts/build-batch.ts --outline outline.md --prompts prompts --output batch.json --images-dir attachments
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4
Reference-Image Identity Preservation
When the user wants a person/object preserved from reference images:
- Prefer a small curated set of existing source references (usually 2–4) over many images; large multi-megabyte refs can destabilize streaming providers.
- Make the prompt say the references are the same subject and the output must use that identity. Avoid long generic facial-feature descriptions that can cause the model to synthesize a new similar-looking person.
- Do not use newly generated outputs as references unless the user explicitly asks; generated refs compound drift.
- If results become too polished or influencer-like, reduce stylized refs and add explicit anti-beautification constraints (no face slimming, eye enlargement, heavy makeup, commercial travel shoot, over-smoothing).
- If the subject should look younger/older, preserve the face and express age through clothing, posture, scene, and styling; do not ask the model to change facial identity.
Options
| Option | Description |
|---|---|
--prompt <text>, -p | Prompt text |
--promptfiles <files...> | Read prompt from files (concatenated) |
--image <path> | Output image path (required in single-image mode) |
--batchfile <path> | JSON batch file for multi-image generation |
--jobs <count> | Worker count for batch mode (default: auto, max from config, built-in default 10) |
--provider google|openai|azure|openrouter|dashscope|zai|minimax|jimeng|seedream|replicate|codex-cli | Force provider (default: auto-detect; codex-cli is never auto-selected — must be pinned via CLI or EXTEND.md) |
--model <id>, -m | Model ID — see provider references for defaults and allowed values |
--ar <ratio> | Aspect ratio (16:9, 1:1, 4:3, …) |
--size <WxH> | Explicit size (e.g., 1024x1024; for gpt-image-2, width/height must be multiples of 16, max edge 3840px, ratio no wider than 3:1) |
--quality normal|2k | Quality preset (default: 2k) |
--imageSize 1K|2K|4K | Image size for Google/OpenRouter (default: from quality) |
--imageApiDialect openai-native|ratio-metadata | OpenAI-compatible endpoint dialect — use ratio-metadata for gateways that expect aspect-ratio size plus metadata.resolution |
--ref <files...> | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, Azure OpenAI edits (PNG/JPG only), OpenRouter multimodal models, Replicate supported families, MiniMax subject-reference, Seedream 5.0/4.5/4.0, DashScope wan2.7-image-pro/wan2.7-image. Not supported by Jimeng, Seedream 3.0, SeedEdit 3.0, or any DashScope model outside the wan2.7-image* family |
--n <count> | Number of images. Replicate requires --n 1 (single-output save semantics) |
--json | JSON output |
Environment Variables
| Variable | Description |
|---|---|
OPENAI_API_KEY | OpenAI API key |
AZURE_OPENAI_API_KEY | Azure OpenAI API key |
OPENROUTER_API_KEY | OpenRouter API key |
GOOGLE_API_KEY | Google API key |
DASHSCOPE_API_KEY | DashScope API key |
ZAI_API_KEY (alias BIGMODEL_API_KEY) | Z.AI API key |
MINIMAX_API_KEY | MiniMax API key |
REPLICATE_API_TOKEN | Replicate API token |
| `JIMENG_ACCESS_KE |