GPT Atelier
OpenAI GPT Image generation and editing. Wraps both the Image API (one-shot) and Responses API (multi-turn conversational) with 6 scripts covering generate, edit, compose, converse, stream, and test workflows.
Prerequisite: OPENAI_API_KEY environment variable.
Models
| Flag | Model | Strengths |
|---|---|---|
| (default) | gpt-image-2 | Reasoning-based, ~99% text rendering, up to 8 consistent images, 4K, streaming |
--fast | gpt-image-1.5 | Region-aware editing, 4x faster, cheaper |
--mini | gpt-image-1-mini | Cheapest ($0.006/image low quality) |
Quick Start
# Test connectivity
python3 scripts/test_connection.py --check-models
# Generate an image
python3 scripts/generate_image.py "A Minoan bull-leaper under golden light"
# Compare models side-by-side (HTML page)
python3 scripts/compare_models.py "A bronze seal stamp in Minoan style" --all --open
# Edit with mask
python3 scripts/edit_image.py "Replace the sky with a dramatic sunset" photo.png --mask sky_mask.png
# Compose from references
python3 scripts/compose_images.py "Create a gift basket containing these items" item1.png item2.png item3.png
# Multi-turn editing session
python3 scripts/converse_image.py
# Streaming with partial images
python3 scripts/stream_image.py "An ancient fresco being restored" --partials 3
Image API scripts share: --output DIR, --filename NAME, --quality low|medium|high, --format png|jpeg|webp, --fast, --mini. converse_image.py uses --orchestrator instead of --fast/--mini.
Core Workflows
1. Text-to-Image Generation
python3 scripts/generate_image.py "prompt" [options]
| Option | Default | Description |
|---|---|---|
--size | auto | WxH or preset: square, landscape, portrait, wide, 2k, 4k, 4k-portrait |
--quality | high | low, medium, high, auto |
--n | 1 | Number of images (1-8) |
--format | png | png, jpeg (faster), webp |
--compression | — | 0-100 for jpeg/webp |
--background | — | opaque, transparent, auto |
--moderation | auto | auto, low |
# Product photography
python3 scripts/generate_image.py \
"High-end product photography of luxury watch on black marble, dramatic key light, f/5.6, commercial quality" \
--size square --quality high
# Budget thumbnails
python3 scripts/generate_image.py "Quick sketch of a coffee cup" --mini --quality low
# Multiple variations
python3 scripts/generate_image.py "Logo design for a tech startup" --n 4 --size square
2. Image Editing
python3 scripts/edit_image.py "instruction" input.png [options]
Additional options: --mask, --images (extra references).
Auto-converts B&W masks to RGBA (requires Pillow: pip install Pillow).
# Region-aware edit with mask
python3 scripts/edit_image.py "Add a flamingo to the pool" lounge.png --mask pool_mask.png
# Full-image restyle
python3 scripts/edit_image.py "Convert to watercolor painting style" photo.jpg
# Edit with reference images
python3 scripts/edit_image.py "Replace the car with this bicycle" street.png --images bicycle.png
3. Multi-Reference Composition
python3 scripts/compose_images.py "instruction" img1.png img2.png [img3.png ...] [options]
Requires 2+ reference images. The model creates a new image incorporating all references.
python3 scripts/compose_images.py \
"Create a mood board combining these design elements" \
texture.png palette.png sketch.png --size landscape
4. Multi-Turn Conversational Editing (Responses API)
python3 scripts/converse_image.py [prompt] [options]
Without a prompt, enters interactive REPL. Maintains conversation state via previous_response_id.
# Interactive session
python3 scripts/converse_image.py --auto-save
> A cyberpunk street scene at night
> Now add neon signs with Japanese text
> Make it rain and add reflections
> /save final_scene
# Single-shot
python3 scripts/converse_image.py "Design a coffee brand logo" --output ./logos
Interactive commands: /save, /action auto|generate|edit, /model, /clear, /history, /help, /quit.
Attach images with @path: @logo.png Add this logo to the top-right corner.
Additional options:
| Flag | Orchestrator | Tradeoff |
|---|---|---|
| (default) | gpt-5.4 | Standard quality, fastest |
--thinking | gpt-5.4-thinking | Better composition for complex scenes, slower |
--pro | gpt-5.4-pro | Highest quality, most expensive |
--orchestrator MODEL | any | Manual override |
5. Model Comparison (HTML Page)
python3 scripts/compare_models.py "A Minoan bull-leaper" --open
python3 scripts/compare_models.py "prompt" --all --open
python3 scripts/compare_models.py "" --list-models
Generates the same prompt across multiple models and outputs a dark-themed HTML comparison page. See references/compare-models-reference.md for full options.
6. Streaming with Partial Images
python3 scripts/stream_image.py "prompt" --partials N [options]
Outputs partial images as they generate, then the final image.
# Stream with 3 progressively sharper partials
python3 scripts/stream_image.py "A detailed architectural drawing" --partials 3 --save-partials
Additional option: --save-partials to save intermediate images.
Prompting Quick-Hits
- Lead with scene/style, not subject. First words carry highest visual weight. Specify intended use (ad, UI mockup, editorial) so the model picks the right polish level.
- Always double-quote literal text.
"HELLO WORLD"engages the high-accuracy text rendering engine. - Pixel dimensions in prompt. For custom aspect ratios, append
"Output in exactly WxH (R:R ratio) resolution"— the APIsizeparam alone is unreliable. Done automatically byinject_size_hint()for gpt-image-2. - Use
--thinkingfor complex scenes. The orchestrator model matters — Thinking models produce significantly better multi-element compositions. - Generate fresh, don't edit. Reference-image editing on gpt-image-2 produces yellow tint and poor prompt adherence. For design-final work, generate from scratch.
See references/prompting-guide.md for full details.
When to Use GPT Atelier vs Nano Banana Pro
| Task | GPT Atelier | Nano Banana Pro |
|---|---|---|
| Text rendering (complex, non-Latin) | Best | Good |
| Mask-based inpainting | Native, low drift | Higher drift (~40%) |
| Reference-image editing | Yellow tint risk — use --fast | Better fidelity |
| Multi-turn editing with state | Responses API | Multi-turn chat |
| Photorealism | Excellent | Excellent |
| Cinematic digital painting | Good | Best |
| UI mockups / screenshots | Best | Good |
| Multi-image consistency | Up to 8/prompt | Up to 14 references |
| Streaming partial delivery | Native | Not available |
| Dark/artistic themes | Stricter policy | More permissive |
| Budget/volume | $0.006/img (mini low) | Gemini pricing |
| Arbitrary aspect ratios | Any (16px multiples) | 10 presets |
Cost Control
Start with --quality low ($0.006/image) for ideation. Graduate to medium ($0.05) for review, high ($0.21) for final assets. Use --fast for cheaper generation, --mini for maximum cost savings. Use --format jpeg for faster response times.
Script Reference
| Script | Purpose | API |
|---|---|---|
generate_image.py | Text-to-image | Image API |
edit_image.py | Edit with mask/references | Image API |
compose_images.py | Multi-reference composition | Image API |
converse_image.py | Multi-turn editing | Responses API |
compare_models.py | Side-by-side model comparison (HTML) | Image API |
stream_image.py | Streaming with partials | Image API |
test_connection.py | Connectivity check | Models API |
Reference Documentation
| File | Contents |
|---|---|
| `references/ |