Codex Media Generation Bridge
Claude in claude.ai web/desktop cannot generate raster images natively. This skill closes that gap by shelling out to Codex CLI's built-in image_gen tool (gpt-image-2) and returning a real file on disk. The user experience should feel native: the user asks, an image appears, no meta-commentary about "I can't do this directly."
Environment expectations (read this first)
This skill runs on the user's local Mac, not in Claude's sandbox. Specifically:
- The Codex CLI binary is at ~/.npm-global/bin/codex on the user's Mac, not in Claude's container.
- Access to the user's Mac is provided by Desktop Commander MCP. If start_process, read_multiple_files, etc. (Desktop Commander tools) are not visible, they are deferred; load them with tool_search. Do not assume they are unavailable.
- The user runs Claude Desktop (or a similar client) that has Desktop Commander MCP connected. If the user is on the web claude.ai client without MCP, this skill cannot run, but that is the only failure case.
Do NOT do these things; they are common mistakes:
- Do NOT run which codex inside Claude's sandbox to "verify" Codex is installed. Codex is on the user's Mac, not the sandbox. The sandbox check will always fail and is irrelevant.
- Do NOT say "this skill requires Desktop Commander which is unavailable in this environment" without first running tool_search("desktop commander"). The system prompt's tool list is partial; many tools are deferred and loaded on demand.
- Do NOT decline by claiming "I cannot generate images in claude.ai." This skill is the capability. Saying you cannot do this defeats the entire purpose.
- Do NOT offer SVG/mermaid as a fallback unless the user explicitly accepts that as a substitute. The user asked for an AI-generated raster image; deliver one.
First action (mandatory)
Before anything else, run tool_search with the query desktop commander to load the Desktop Commander tools (start_process, read_multiple_files, etc.) into context. This is mandatory because:
- The visible tool list at the top of the system prompt is incomplete — Desktop Commander tools are deferred.
- Without these tools loaded, the rest of this skill cannot execute.
- tool_search is free; failure to call it is the most common reason this skill is incorrectly declined.
Only after tool_search confirms the tools are loaded should you proceed to the Prerequisites section.
Trigger checklist
Use this skill when the user wants a real raster file. Do not use it when:
- The user wants an SVG, mermaid diagram, ASCII art, or any vector/text-based visual Claude can author directly
- The user wants Claude to analyze an existing image (vision already handles that)
- The user explicitly asks for video (out of scope until extended)
Prerequisites (verified 2026-05-16)
These must be true on the machine. If any is missing, fix before invoking:
- Codex CLI installed: ~/.npm-global/bin/codex (v0.130.0+). codex is NOT on PATH; full path is mandatory.
- Codex auth: ~/.codex/auth.json exists with auth_mode: chatgpt (ChatGPT Plus or Pro plan covers image_gen included usage).
- Feature flag enabled in config: ~/.codex/config.toml must contain:
[features]
image_generation = true
Without this, Codex silently falls back to writing a deterministic placeholder PNG with Python instead of calling the real model. The fallback file is small (~5 KB) and obviously fake (perfect geometric circle, no lighting); the real gpt-image-2 output is large (~800 KB+) and clearly AI-rendered. Always verify file size and visual quality after generation.
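If the file has no [features] table yet, append it once (if [features] already exists, add the image_generation = true line under it instead, because TOML does not allow the same table to be defined twice):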
cat >> ~/.codex/config.toml <<'EOF'
[features]
image_generation = true
EOF
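Appending blindly can leave the config with two [features] tables, which TOML parsers reject. The following is a minimal sketch of an idempotent variant, not part of Codex itself: the function name ensure_image_flag is hypothetical, and it assumes the key is not already present with a different value such as false.

```shell
# Hypothetical helper: make sure "image_generation = true" is present in the
# given config file without duplicating the [features] table.
# Usage: ensure_image_flag ~/.codex/config.toml
ensure_image_flag() {
  local cfg="$1" tmp
  touch "$cfg"

  # Flag already enabled: nothing to do.
  if grep -Eq '^[[:space:]]*image_generation[[:space:]]*=[[:space:]]*true' "$cfg"; then
    return 0
  fi

  if grep -Eq '^[[:space:]]*\[features\][[:space:]]*$' "$cfg"; then
    # [features] exists: insert the key directly below its header.
    tmp="$(mktemp)"
    awk '
      { print }
      $1 == "[features]" && NF == 1 && !added {
        print "image_generation = true"
        added = 1
      }
    ' "$cfg" > "$tmp"
    cat "$tmp" > "$cfg"   # overwrite in place so file permissions are kept
    rm -f "$tmp"
  else
    # No [features] table yet: append one.
    printf '\n[features]\nimage_generation = true\n' >> "$cfg"
  fi
}
```

Running it a second time is a no-op, so it is safe to call before every generation.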
Standard invocation
The command Claude runs via Desktop Commander's start_process:
~/.npm-global/bin/codex exec \
--skip-git-repo-check \
-s workspace-write \
--cd /tmp/codex-image-test \
--enable image_generation \
"\$imagegen <english prompt>. Use the built-in image_gen tool. \
After it generates, copy the result from \$CODEX_HOME/generated_images/ \
to <absolute output path>."
Flag rationale:
- --skip-git-repo-check: allow running outside a git repo (Codex refuses by default in non-repo dirs).
- -s workspace-write: sandbox must allow disk writes. Default read-only blocks image output entirely.
- --cd <dir>: working directory must be a writable, dedicated workspace (use /tmp/codex-image-test or a project subdir). The --cd path is added to the sandbox writable set.
- --enable image_generation: belt-and-suspenders with the config flag. Both should be on.
Run it backgrounded with logging:
~/.npm-global/bin/codex exec ... > /tmp/codex-image-run.log 2>&1 &
Then poll the log and the output directory. A typical successful run takes 45–75 seconds end-to-end (model inference 20–40s + Codex tool overhead).
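The polling step can be sketched as a small wait loop. This is an illustration only: the function name wait_for_file and the default 120-second timeout and 5-second interval are assumptions chosen to sit above the typical run time quoted here, and a non-empty file is only a rough signal that the copy has finished.

```shell
# Hypothetical helper: wait until a non-empty file exists at the given path.
# Usage: wait_for_file <path> [timeout_seconds] [interval_seconds]
# Returns 0 once the file is present and non-empty, non-zero on timeout.
wait_for_file() {
  local target="$1" timeout="${2:-120}" interval="${3:-5}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ -s "$target" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  # One last check after the final sleep.
  [ -s "$target" ]
}
```

On timeout, /tmp/codex-image-run.log is the first place to look for the reason.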
Output retrieval pattern
Codex writes the generated PNG to ~/.codex/generated_images/<session-uuid>/ig_<hash>.png, not the path you ask for in the prompt. You have two options:
Option A (preferred): Include an explicit copy instruction in the prompt. Codex's reasoning step will then locate the newest ig_*.png and copy it to your target path. Example tail of the prompt:
After image_gen completes, find the newest file in $CODEX_HOME/generated_images/
and copy it to /absolute/output/path.png. Confirm the copy.
Option B: Let Codex generate, then Claude does the copy manually:
# find returns matches in arbitrary order, so sort by modification time (newest first)
latest=$(find ~/.codex/generated_images -type f -name 'ig_*.png' -newer /tmp/codex-image-test -exec ls -t {} + | head -1)
cp "$latest" "$target_path"
Option A is cleaner and keeps the whole flow inside a single Codex turn.
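If Option B is used, the manual copy can be wrapped with a size guard so that a tiny placeholder file is never presented as a real image. The sketch below is an assumption-laden illustration: the function name copy_newest_image is hypothetical, the default 50 KB floor mirrors the "suspicious" threshold used elsewhere in this skill, and it assumes file paths without newlines.

```shell
# Hypothetical helper: copy the newest ig_*.png under a directory to a target
# path, refusing files smaller than a minimum size.
# Usage: copy_newest_image <source_dir> <absolute_target> [min_bytes]
copy_newest_image() {
  local src_dir="$1" target="$2" min_bytes="${3:-51200}" latest size

  # ls -t sorts the matches newest first; head keeps only the newest.
  latest="$(find "$src_dir" -type f -name 'ig_*.png' -exec ls -t {} + 2>/dev/null | head -n 1)"
  if [ -z "$latest" ]; then
    echo "no ig_*.png found under $src_dir" >&2
    return 1
  fi

  # wc -c is portable across macOS and Linux; strip the padding macOS adds.
  size="$(wc -c < "$latest" | tr -d ' ')"
  if [ "$size" -lt "$min_bytes" ]; then
    echo "suspiciously small file ($size bytes): $latest" >&2
    return 2
  fi

  mkdir -p "$(dirname "$target")"
  cp "$latest" "$target"
}
```

A non-zero return is the cue to stop and diagnose rather than show the file.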
Prompt-writing rules
- English only. gpt-image-2 performs best in EN even when the chat is in Turkish.
- One paragraph, dense. Subject + style + palette + composition + size hint, in that order.
- Explicit tool directive. Add "Use the built-in image_gen tool" so the model doesn't dodge into Python-generated placeholders.
- No copyrighted IP. Brand names, movie characters, celebrities, real artists' styles are refused or substituted poorly.
- Size hint, not size guarantee. Even when you ask for 512×512, gpt-image-2 commonly returns 1024×1024 or 1254×1254. If exact dimensions matter, resize after generation with PIL (Image.thumbnail or Image.resize).
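The rules above can be applied mechanically when assembling the prompt string. The following is a rough sketch under stated assumptions: build_prompt is a hypothetical name, the description is expected to be a single dense English paragraph already, and the wording of the copy instruction is taken from the example prompt tail in this document.

```shell
# Hypothetical helper: combine an English image description with the explicit
# tool directive and the copy instruction.
# Usage: build_prompt "<english description>" /absolute/output/path.png
build_prompt() {
  local description="$1" target="$2"

  # The copy instruction needs an absolute path; reject anything else.
  case "$target" in
    /*) ;;
    *) echo "target must be an absolute path: $target" >&2; return 1 ;;
  esac

  # Single quotes keep $CODEX_HOME literal so Codex resolves it, not this shell.
  printf '%s. Use the built-in image_gen tool. After image_gen completes, find the newest file in $CODEX_HOME/generated_images/ and copy it to %s. Confirm the copy.\n' \
    "$description" "$target"
}
```

The resulting string is what gets passed as the final argument to codex exec.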
Output location policy
Decide silently from conversation context. Don't ask "where should I save this?" unless genuinely ambiguous.
| Context | Destination |
|---|---|
| Ad-hoc request, no project context | ~/Pictures/codex-generated/<YYYY-MM-DD>-<slug>.png |
| User is in a git repo / project dir | <repo>/assets/ (create if missing) |
| User named a path | Exactly there |
| Note in a notes app (Obsidian, etc.) | Adjacent attachments/ folder of that note |
Always pass the full absolute target path into the prompt's copy instruction.
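For the ad-hoc row of the table, the destination path can be derived from a short description. This is a minimal sketch: the function names slugify and adhoc_image_path are hypothetical, and the slug rule (lowercase ASCII letters and digits joined by hyphens) is an assumption, since the table does not define how the slug is formed.

```shell
# Hypothetical helper: turn free text into a lowercase, hyphen-separated slug.
# Non-ASCII characters are treated as separators in this sketch.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Hypothetical helper: build the ad-hoc destination path.
# Usage: adhoc_image_path "<short description>" [YYYY-MM-DD]
adhoc_image_path() {
  local slug day
  slug="$(slugify "$1")"
  day="${2:-$(date +%Y-%m-%d)}"   # defaults to today's date
  printf '%s/Pictures/codex-generated/%s-%s.png\n' "$HOME" "$day" "$slug"
}
```

Because the result is built from $HOME, it is already absolute and can go straight into the copy instruction.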
After generation (mandatory native display)
Generating the file is NOT the end. The user expects native, automatic display — the same way Claude shows a .docx file with both download link and inline preview without being asked. Do all of these in order, in the same response:
- Verify file size. A real gpt-image-2 output is typically 300 KB – 2 MB. Anything under 50 KB is suspicious (likely Python fallback or refusal placeholder). If suspicious, stop and diagnose instead of presenting.
- Resize if needed for inline preview. read_multiple_files on PNGs renders inline but has a ~1 MB decoded limit. If the file is over ~700 KB on disk, resize to 512×512 with PIL first and save as <original>-preview.png. Always preserve the full-resolution original.
- Show the image inline. Call read_multiple_files on the file (or the preview if resized). This is NOT optiona