codex-media-gen

Development

When the user asks for an image, illustration, icon, banner, hero graphic, logo concept, mockup, or any visual asset that needs to be a real bitmap file (PNG/JPEG), use this skill. It generates real AI raster images via the user's locally-installed Codex CLI (gpt-image-2 model), accessed through Desktop Commander MCP on the user's Mac. Trigger this skill whenever the user wants a picture, even if

Author: capitansuat · License: MIT

Codex Media Generation Bridge

Claude in claude.ai web/desktop cannot generate raster images natively. This skill closes that gap by shelling out to Codex CLI's built-in image_gen tool (gpt-image-2) and returning a real file on disk. The user experience should feel native: the user asks, an image appears, no meta-commentary about "I can't do this directly."

Environment expectations (read this first)

This skill runs on the user's local Mac, not in Claude's sandbox. Specifically:

  • The Codex CLI binary is at ~/.npm-global/bin/codex on the user's Mac, not in Claude's container.
  • Access to the user's Mac is provided by Desktop Commander MCP. If start_process, read_multiple_files, etc. (Desktop Commander tools) are not visible, they are deferred — load them with tool_search. Do not assume they are unavailable.
  • The user runs Claude Desktop (or a similar client) that has Desktop Commander MCP connected. If the user is on the web claude.ai client without MCP, this skill cannot run — but that is the only failure case.

Do NOT do any of the following; they are common mistakes:

  • Do NOT run which codex inside Claude's sandbox to "verify" Codex is installed. Codex is on the user's Mac, not the sandbox. The sandbox check will always fail and is irrelevant.
  • Do NOT say "this skill requires Desktop Commander which is unavailable in this environment" without first running tool_search("desktop commander"). The system prompt's tool list is partial; many tools are deferred and loaded on demand.
  • Do NOT decline by claiming "I cannot generate images in claude.ai." This skill is the capability. Saying you cannot do this defeats the entire purpose.
  • Do NOT offer SVG/mermaid as a fallback unless the user explicitly accepts that as a substitute. The user asked for an AI-generated raster image; deliver one.

First action (mandatory)

Before anything else, run tool_search with the query desktop commander to load the Desktop Commander tools (start_process, read_multiple_files, etc.) into context. This is mandatory because:

  1. The visible tool list at the top of the system prompt is incomplete — Desktop Commander tools are deferred.
  2. Without these tools loaded, the rest of this skill cannot execute.
  3. tool_search is free; failure to call it is the most common reason this skill is incorrectly declined.

Only after tool_search confirms the tools are loaded should you proceed to the Prerequisites section.

Trigger checklist

Use this skill when the user wants a real raster file. Do not use it when:

  • The user wants an SVG, mermaid diagram, ASCII art, or any vector/text-based visual Claude can author directly
  • The user wants Claude to analyze an existing image (vision already handles that)
  • The user explicitly asks for video (out of scope until extended)

Prerequisites (verified 2026-05-16)

These must be true on the user's Mac. If any is missing, fix it before invoking:

  1. Codex CLI installed: ~/.npm-global/bin/codex (v0.130.0+). codex is NOT on PATH; full path is mandatory.
  2. Codex auth: ~/.codex/auth.json exists with auth_mode: chatgpt (a ChatGPT Plus or Pro plan includes image_gen usage).
  3. Feature flag enabled in config: ~/.codex/config.toml must contain:
    [features]
    image_generation = true
    
    Without this, Codex silently falls back to writing a deterministic placeholder PNG with Python instead of calling the real model. The fallback file is small (~5 KB) and obviously fake (perfect geometric circle, no lighting); the real gpt-image-2 output is large (~800 KB+) and clearly AI-rendered. Always verify file size and visual quality after generation.

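If image_generation = true is missing, append it once. Skip the append when a [features] table already exists and add the key under that table by hand instead, because a second [features] header makes the TOML file invalid:

# Append only when no [features] table is present yet.
if ! grep -q '^\[features\]' ~/.codex/config.toml 2>/dev/null; then
cat >> ~/.codex/config.toml <<'EOF'

[features]
image_generation = true
EOF
fi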

Standard invocation

The command Claude runs via Desktop Commander's start_process:

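# \$imagegen and \$CODEX_HOME are escaped so the shell passes them to Codex
# literally instead of expanding them (an unset $imagegen would expand to nothing).
~/.npm-global/bin/codex exec \
  --skip-git-repo-check \
  -s workspace-write \
  --cd /tmp/codex-image-test \
  --enable image_generation \
  "\$imagegen <english prompt>. Use the built-in image_gen tool. \
   After it generates, copy the result from \$CODEX_HOME/generated_images/ \
   to <absolute output path>."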

Flag rationale:

  • --skip-git-repo-check: allow running outside a git repo (Codex refuses by default in non-repo dirs)
  • -s workspace-write: sandbox must allow disk writes. Default read-only blocks image output entirely.
  • --cd <dir>: working directory must be a writable, dedicated workspace (use /tmp/codex-image-test or a project subdir). The --cd path is added to the sandbox writable set.
  • --enable image_generation: belt-and-suspenders with the config flag. Both should be on.
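The flag set above can be assembled programmatically so that no flag is forgotten. This is a sketch only: build_codex_command is a hypothetical helper, and the binary path and flags are the ones listed in this section.

```python
import os
import shlex


def build_codex_command(prompt: str, workdir: str,
                        codex_bin: str = "~/.npm-global/bin/codex") -> list[str]:
    """Assemble the argv for `codex exec` with the four flags described above."""
    return [
        os.path.expanduser(codex_bin),   # full path, since codex is not on PATH
        "exec",
        "--skip-git-repo-check",         # allow running outside a git repo
        "-s", "workspace-write",         # sandbox must permit disk writes
        "--cd", workdir,                 # dedicated writable workspace
        "--enable", "image_generation",  # redundant with the config flag, on purpose
        prompt,
    ]


argv = build_codex_command("<english prompt>", "/tmp/codex-image-test")
# shlex.join quotes every argument, so the printed line can be pasted into a shell.
print(shlex.join(argv))
```

Because shlex.join single-quotes arguments that contain special characters, any $ in the prompt reaches Codex unexpanded.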

Run it backgrounded with logging:

~/.npm-global/bin/codex exec ... > /tmp/codex-image-run.log 2>&1 &

Then poll the log and the output directory. A typical successful run takes 45–75 seconds end-to-end (model inference 20–40s + Codex tool overhead).
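Polling for the output file might look like the sketch below. The function name, the 120-second default timeout, and the 2-second interval are assumptions chosen to sit comfortably above the 45–75 second typical run; they are not values from Codex itself.

```python
import os
import time


def wait_for_output(path: str, timeout_s: float = 120.0, interval_s: float = 2.0) -> bool:
    """Poll until `path` exists and is non-empty, or until the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A non-empty file only shows that writing has started. The size check described later in this document is still needed before presenting the image.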

Output retrieval pattern

Codex writes the generated PNG to ~/.codex/generated_images/<session-uuid>/ig_<hash>.png, not the path you ask for in the prompt. You have two options:

Option A (preferred): Include explicit copy instruction in the prompt. Codex's reasoning step will then locate the newest ig_*.png and copy it to your target path. Example tail of the prompt:

After image_gen completes, find the newest file in $CODEX_HOME/generated_images/
and copy it to /absolute/output/path.png. Confirm the copy.

Option B: Let Codex generate, then Claude does the copy manually:

# ls -t sorts newest first; find alone returns files in no guaranteed order.
latest=$(ls -t ~/.codex/generated_images/*/ig_*.png 2>/dev/null | head -n 1)
[ -n "$latest" ] && cp "$latest" "$target_path"

Option A is cleaner and keeps the whole flow inside a single Codex turn.
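For Option B, the manual copy can also be done in Python. This is a sketch under the directory layout described above (one subdirectory per session, files named ig_*.png); copy_newest_image is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def copy_newest_image(generated_root: str, target_path: str) -> Path:
    """Copy the most recently modified ig_*.png under generated_root to target_path."""
    candidates = list(Path(generated_root).expanduser().glob("*/ig_*.png"))
    if not candidates:
        raise FileNotFoundError(f"no ig_*.png found under {generated_root}")
    # Pick by modification time rather than by name or directory order.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    target = Path(target_path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(newest, target)
    return newest
```

Selecting by modification time rather than by session directory avoids having to know the session UUID.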

Prompt-writing rules

  • English only. gpt-image-2 performs best with English prompts, even when the chat is in another language such as Turkish.
  • One paragraph, dense. Subject + style + palette + composition + size hint, in that order.
  • Explicit tool directive. Add "Use the built-in image_gen tool" so the model doesn't dodge into Python-generated placeholders.
  • No copyrighted IP. Brand names, movie characters, celebrities, real artists' styles are refused or substituted poorly.
  • Size hint, not size guarantee. Even when you ask for 512×512, gpt-image-2 commonly returns 1024×1024 or 1254×1254. If exact dimensions matter, resize after generation with PIL (Image.thumbnail or Image.resize).
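The rules above can be folded into a small prompt builder. This is an illustrative sketch: build_image_prompt and its parameter names are hypothetical, and the example subject is invented. The tool directive and copy instruction are the sentences quoted elsewhere in this document.

```python
def build_image_prompt(subject: str, style: str, palette: str,
                       composition: str, size_hint: str, output_path: str) -> str:
    """Join the parts in the order given above, then append the tool and copy directives."""
    description = ", ".join([subject, style, palette, composition, size_hint])
    return (
        f"{description}. Use the built-in image_gen tool. "
        "After image_gen completes, find the newest file in "
        "$CODEX_HOME/generated_images/ "
        f"and copy it to {output_path}. Confirm the copy."
    )


prompt = build_image_prompt(
    subject="A lighthouse on a rocky coast at dusk",
    style="flat vector-style illustration",
    palette="teal and warm orange palette",
    composition="centered subject with wide negative space",
    size_hint="square 1024x1024",
    output_path="/tmp/codex-image-test/lighthouse.png",
)
print(prompt)
```

The result is a single dense paragraph. The size in it remains a hint, so any exact-dimension requirement still has to be met by resizing afterwards.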

Output location policy

Decide silently from conversation context. Don't ask "where should I save this?" unless genuinely ambiguous.

Context | Destination
Ad-hoc request, no project context | ~/Pictures/codex-generated/<YYYY-MM-DD>-<slug>.png
User is in a git repo / project dir | <repo>/assets/ (create if missing)
User named a path | Exactly there
Note in a notes app (Obsidian, etc.) | Adjacent attachments/ folder of that note

Always pass the full absolute target path into the prompt's copy instruction.
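The destination policy can be expressed as a small lookup. This is a sketch with several assumptions: resolve_destination is a hypothetical helper, the precedence between a note and a repo is a guess, and the <slug>.png file name inside assets/ and attachments/ is not specified by the policy itself.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def resolve_destination(slug: str, user_path: Optional[str] = None,
                        repo_root: Optional[str] = None,
                        note_path: Optional[str] = None,
                        today: Optional[date] = None) -> Path:
    """Map the conversation context to an output path, following the policy above."""
    if user_path:    # user named a path: use exactly that
        return Path(user_path).expanduser()
    if note_path:    # note in a notes app: attachments/ next to the note (assumed precedence)
        return Path(note_path).expanduser().parent / "attachments" / f"{slug}.png"
    if repo_root:    # git repo / project dir: <repo>/assets/
        return Path(repo_root).expanduser() / "assets" / f"{slug}.png"
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    return Path.home() / "Pictures" / "codex-generated" / f"{day}-{slug}.png"
```

The function only computes the path. Creating a missing assets/ directory is left to the caller.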

After generation (mandatory native display)

Generating the file is NOT the end. The user expects native, automatic display — the same way Claude shows a .docx file with both download link and inline preview without being asked. Do all of these in order, in the same response:

  1. Verify file size. A real gpt-image-2 output is typically 300 KB – 2 MB. Anything under 50 KB is suspicious (likely Python fallback or refusal placeholder). If suspicious, stop and diagnose instead of presenting.
  2. Resize if needed for inline preview. read_multiple_files on PNGs renders inline but has a ~1 MB decoded limit. If the file is over ~700 KB on disk, resize to 512×512 with PIL first and save as <original>-preview.png. Always preserve the full-resolution original.
  3. Show the image inline. Call read_multiple_files on the file (or the preview if resized). This is NOT optional.
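The size thresholds from steps 1 and 2 can be checked mechanically before displaying anything. This is a sketch: the function names are hypothetical, and treating 1 KB as 1024 bytes is an assumption, since the thresholds above are approximate.

```python
import os

SUSPICIOUS_BELOW = 50 * 1024   # under ~50 KB: likely Python fallback or refusal placeholder
PREVIEW_ABOVE = 700 * 1024     # over ~700 KB: create a resized preview before inline display


def classify_output(size_bytes: int) -> str:
    """Return 'suspicious', 'needs_preview', or 'ok' based on file size alone."""
    if size_bytes < SUSPICIOUS_BELOW:
        return "suspicious"
    if size_bytes > PREVIEW_ABOVE:
        return "needs_preview"
    return "ok"


def classify_file(path: str) -> str:
    """Convenience wrapper that reads the size from disk."""
    return classify_output(os.path.getsize(path))
```

A "suspicious" result means stop and diagnose. A "needs_preview" result means resize a copy and keep the full-resolution original untouched.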

How to add

/plugin marketplace add capitansuat/codex-media-gen

The exact command may vary by repository. Check the README on GitHub.
