GLM Design-to-Code
Converts design inputs (screenshots, text descriptions, HTML files, URLs) to working frontend code using GLM vision models. Three modes: CREATE, REVIEW, FIX.
Arguments: $ARGUMENTS
Mode Routing
| Mode | Flow |
|---|---|
| CREATE | Phase 0 → 0.5 → 1 → 2 → 3 (with auto-fix) → 4 (mandatory verify) → 5 (if --review) |
| REVIEW | Phase 0 → 0.5 → 1 → 5 |
| FIX | Phase 0 → 0.5 → 1 → 6 |
PARAMETER PRIORITY: User prompt arguments ALWAYS override defaults and environment.
- Explicit flags in $ARGUMENTS (--model, --profile, --provider, --framework) → highest priority
- API keys from the $ARGUMENTS prompt text (if the user pasted a key inline) → override .env
- Environment variables (.env, shell env) → fallback
- Defaults from parse-args.sh → lowest priority

MANDATORY OUTPUT: Before ANY API call, output the full resolved configuration table (see Step 2.5 in Phase 2). This applies to ALL modes (CREATE, REVIEW, FIX). The user must always see what was resolved from their input.
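In practice, the priority order above means "the first non-empty value wins". The following is a minimal sketch of that rule, not the logic inside parse-args.sh. The helper name resolve_param and the placeholder values are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first non-empty argument.
# Pass candidates in priority order: flag, inline prompt value, env, default.
resolve_param() {
  local candidate
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no source provided a value
}

# Example 1: --profile efficient was passed, so it beats the default "max".
PROFILE="$(resolve_param "efficient" "" "max")"
# Example 2: no key was pasted inline, so the .env value is used (placeholder).
API_KEY="$(resolve_param "" "key-from-dotenv")"
echo "PROFILE=$PROFILE"
```

Keeping resolution in a single helper makes the precedence easy to audit: the call site lists its sources in priority order.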
Phase 0: Parse Arguments and Gather Config
Step 1: Parse Flags
EXECUTE using Bash tool:
```bash
bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" "$ARGUMENTS" && echo "OK" || echo "FAILED"
```
Output: key=value pairs. Store all values.
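Because parse-args.sh prints key=value lines, those lines can be loaded without eval and echoed back for the mandatory configuration table. The sketch below assumes the keys are uppercase names like those in the table below (IMAGE, PROFILE, MAX_TOKENS). The function names are hypothetical. It uses printf -v and indirect expansion instead of associative arrays, so it also runs on bash 3.2.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: store key=value lines as shell variables, without eval.
load_config() {
  local line key value
  while IFS= read -r line; do
    [[ "$line" == *=* ]] || continue          # skip "OK"/"FAILED" and blank lines
    key="${line%%=*}"                         # text before the first "="
    value="${line#*=}"                        # everything after it (may contain "=")
    [[ "$key" =~ ^[A-Z_][A-Z0-9_]*$ ]] || continue   # accept safe names only
    printf -v "$key" '%s' "$value"
  done
}

# Print the named variables as a markdown table (leave API keys out, or mask them).
print_config_table() {
  local name
  printf '| Key | Value |\n|---|---|\n'
  for name in "$@"; do
    printf '| %s | %s |\n' "$name" "${!name:-}"
  done
}

# Usage (process substitution keeps the variables in the current shell):
# load_config < <(bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" "$ARGUMENTS")
# print_config_table IMAGE INPUT_TYPE FRAMEWORK PROFILE PROVIDER MODEL MAX_TOKENS
```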
Also scan the raw $ARGUMENTS text for any inline values not captured by flags:
- API key pasted in the prompt text → extract and use (overrides .env)
- Model name mentioned in free text (e.g., "use glm-4.6v") → treat as --model
- Profile/provider mentioned in text → treat as the corresponding flags
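Inline extraction can be approximated with grep -oE. This is a hypothetical sketch. Both patterns are assumptions: the model regex covers names shaped like glm-4.6v, and the key regex assumes OpenRouter-style keys beginning with "sk-or-". Neither is a documented format, and Z.ai keys are not matched here.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: pull inline values out of the free-form prompt text.
# Both regexes are assumptions, not documented Z.ai/OpenRouter formats.

extract_inline_model() {
  # Matches names such as glm-4.6v or glm-4.5v-flash; prints the first match.
  printf '%s\n' "$1" | grep -oE 'glm-[0-9]+(\.[0-9]+)?[a-z]*(-[a-z]+)*' | head -n 1
}

extract_inline_openrouter_key() {
  # Assumes OpenRouter-style keys that start with "sk-or-".
  printf '%s\n' "$1" | grep -oE 'sk-or-[A-Za-z0-9_-]+' | head -n 1
}

PROMPT='mockup.png use glm-4.6v with key sk-or-v1-abc123'
INLINE_MODEL="$(extract_inline_model "$PROMPT")"
INLINE_KEY="$(extract_inline_openrouter_key "$PROMPT")"
echo "INLINE_MODEL=$INLINE_MODEL"
echo "INLINE_KEY=${INLINE_KEY:0:6}..."   # never echo the full key
```

An empty result means "not found", so the next source in the priority order applies.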
| Key | Default | Options |
|---|---|---|
| IMAGE | (required) | Path to screenshot file, URL, HTML file, or text description |
| INPUT_TYPE | auto | image, html, text, url |
| FRAMEWORK | html | html, react, flutter, custom |
| PROFILE | max | max, optimal, efficient |
| PROVIDER | zai | zai, openrouter |
| OUTPUT | ./d2c-output | Output directory |
| REVIEW | false | true/false |
| MODE | create | create, review, fix |
| FIX_TEXT | (empty) | Text from --fix "..." |
| REVIEW_FILE | (empty) | Path from --review-file |
| MODEL | (empty) | Model override from --model |
| MAX_TOKENS | 32768 | 32768 (max), 16384 (optimal), 8192 (efficient) |
STOP if FAILED -- check parse-args.sh.
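The MAX_TOKENS row is a fixed mapping from PROFILE. Below is a sketch of that mapping using the values from the table. The function name is hypothetical, and parse-args.sh may already perform this step itself.

```bash
#!/usr/bin/env bash
# Sketch: derive MAX_TOKENS from PROFILE (values from the table above).
max_tokens_for_profile() {
  case "$1" in
    max)       echo 32768 ;;
    optimal)   echo 16384 ;;
    efficient) echo 8192 ;;
    *)         echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

PROFILE="max"   # default profile
MAX_TOKENS="$(max_tokens_for_profile "$PROFILE")" || exit 1
echo "PROFILE=$PROFILE MAX_TOKENS=$MAX_TOKENS"
```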
Step 1.5: Detect Mode
| Condition | Mode |
|---|---|
| --fix flag present | FIX |
| --review flag present | REVIEW |
| Otherwise | CREATE |
Step 1.7: Classify Intent (MODE=create only)
Skip this step for REVIEW and FIX modes.
Analyze the user's prompt text ($ARGUMENTS) and the input type to classify intent. Opus classifies automatically -- no AskUserQuestion needed (exception: INPUT_TYPE=html with ambiguous signal -- ask).
| Intent | Signals | Default GLM instruction |
|---|---|---|
| reproduce | Polished mockup, "exact", "copy", "pixel-perfect", no modification language | "Reproduce this design as working code. Match every visual detail exactly." |
| creative | "sketch", "wireframe", "rough", "make it look professional", "polish" | "This is a rough sketch. Create a polished, professional UI based on this layout. Use modern design, clean typography, harmonious colors." |
| enhance | "add a", "include", "put a ... on", existing design + additions | "This is an existing design. Enhance it: {user request}. Keep all existing content intact." |
| modify | "change", "update", "make darker", color/font/layout changes | "Modify this design: {user changes}. Keep everything else unchanged." |
| convert | "to React", "to Flutter", "convert", INPUT_TYPE=html + different framework | "Convert this {source} to {FRAMEWORK}. Preserve visual appearance." |
Default: reproduce (used when no specific signal is detected; this matches the current behavior).
Store as variables for later use:
- INTENT -- one of: reproduce, creative, enhance, modify, convert
- GLM_INSTRUCTION -- the instruction text (from the table above, with placeholders filled from the user prompt)
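The Signals column can be illustrated with a crude keyword heuristic. This sketch does not describe how Opus classifies intent; that classification is model-driven. The first-match precedence (convert, creative, enhance, modify, then reproduce) is an assumption, and the "INPUT_TYPE=html + different framework" signal is not covered because it is not a text cue.

```bash
#!/usr/bin/env bash
# Crude keyword heuristic mirroring the "Signals" column (hypothetical).
# First match wins; the precedence order is an assumption.
classify_intent() {
  local text
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$text" in
    *"to react"*|*"to flutter"*|*convert*)                 echo convert ;;
    *sketch*|*wireframe*|*rough*|*polish*|*professional*)   echo creative ;;
    *"add a"*|*include*|*"put a"*)                          echo enhance ;;
    *change*|*update*|*darker*)                             echo modify ;;
    *)                                                      echo reproduce ;;
  esac
}

classify_intent "Make it look professional"   # creative
classify_intent "Change the header to navy"    # modify
classify_intent "mockup.png"                   # reproduce
```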
Exception: If INPUT_TYPE=html and the prompt has no clear intent signal -- ASK using AskUserQuestion:
HTML input detected. What would you like to do?
Options:
- "Convert to {FRAMEWORK} (preserve appearance)"
- "Reproduce as clean HTML/CSS from scratch"
- "Use as reference -- create improved version"
Step 2: Process Input by Type
Based on INPUT_TYPE from parse-args.sh:
If INPUT_TYPE=image
EXECUTE using Bash tool:
```bash
IMAGE="IMAGE_PATH_HERE"
[ -f "$IMAGE" ] && file --mime-type "$IMAGE" | grep -qE ': image/' && echo "VALID_IMAGE" || echo "INVALID"
```
If INVALID:
ASK using AskUserQuestion:
The file "{IMAGE}" is not a valid image. Please provide a valid input:
Options:
- "Enter path to screenshot file (PNG/JPG/WebP)"
- "Enter a URL to screenshot"
- "Enter a text description instead"
On answer:
- File path -> re-validate as image, update IMAGE and INPUT_TYPE=image
- URL -> update IMAGE, set INPUT_TYPE=url, go to URL processing below
- Text description -> update IMAGE with text, set INPUT_TYPE=text
If INPUT_TYPE=url
Take a Playwright screenshot of the URL first: EXECUTE using Bash tool:
```bash
URL="URL_HERE"
npx playwright screenshot --full-page "$URL" /tmp/d2c-url-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"
```
If SCREENSHOT_OK: Set IMAGE=/tmp/d2c-url-screenshot.png and continue as image input.
If SCREENSHOT_FAILED: Try the Playwright MCP tools browser_navigate + browser_take_screenshot. If Playwright MCP also fails:
ASK using AskUserQuestion:
Could not take screenshot of "{URL}". The URL may be unreachable or Playwright is not available. Choose alternative:
Options:
- "I'll provide a screenshot file instead"
- "Convert from text description"
- "Skip -- I'll paste the HTML source"
On answer:
- Screenshot file -> ask for path, set INPUT_TYPE=image
- Text description -> ask for description, set INPUT_TYPE=text
- HTML source -> ask for file path, set INPUT_TYPE=html
If INPUT_TYPE=html
EXECUTE using Bash tool:
```bash
HTML_FILE="HTML_PATH_HERE"
[ -f "$HTML_FILE" ] && echo "HTML_VALID ($(wc -l < "$HTML_FILE" | tr -d ' ') lines)" || echo "HTML_MISSING"
```
If HTML_VALID: Attempt to screenshot the HTML for dual input (image + HTML source):
EXECUTE using Bash tool:
```bash
HTML_FILE="HTML_PATH_HERE"
npx playwright screenshot --full-page "file://$(cd "$(dirname "$HTML_FILE")" && pwd)/$(basename "$HTML_FILE")" /tmp/d2c-html-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"
```
If SCREENSHOT_OK: Set HTML_SCREENSHOT=/tmp/d2c-html-screenshot.png and DUAL_INPUT=true. Will use glm-build-request.sh with both the screenshot and the HTML source.
If SCREENSHOT_FAILED: Try the wkhtmltoimage fallback (EXECUTE using Bash tool):
```bash
HTML_FILE="HTML_PATH_HERE"
command -v wkhtmltoimage >/dev/null 2>&1 && wkhtmltoimage --quality 90 --width 1440 "$HTML_FILE" /tmp/d2c-html-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"
```
If still FAILED: Try the Playwright MCP tools: browser_navigate to the file:// URL, then browser_take_screenshot.
If all fail:
ASK using AskUserQuestion:
Could not screenshot the HTML file. Choose how to proceed:
Options:
- "I'll provide a screenshot file" (ask for path, set DUAL_INPUT=true)
- "Continue without screenshot (text-only)" (set DUAL_INPUT=false)
When DUAL_INPUT=false: Will use glm-build-text-request.sh (text-only, with the HTML content).
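The shell part of this fallback chain can be wrapped in a single function. The sketch below reuses the same two commands from the steps above. The function names to_file_url and screenshot_html are hypothetical. The Playwright MCP attempt and the AskUserQuestion step are tool calls, not shell commands, so the agent handles them when the function returns 1. On first run, npx may need to download Playwright.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the HTML screenshot fallback chain.

# Build an absolute file:// URL for a local HTML file (same approach as above).
to_file_url() {
  printf 'file://%s/%s\n' "$(cd "$(dirname "$1")" && pwd)" "$(basename "$1")"
}

# Try the Playwright CLI first, then wkhtmltoimage if it is installed.
# On success, set HTML_SCREENSHOT and DUAL_INPUT=true; on failure, return 1
# so the agent can try Playwright MCP and then ask the user.
screenshot_html() {
  local html_file="$1" out="${2:-/tmp/d2c-html-screenshot.png}"
  if npx playwright screenshot --full-page "$(to_file_url "$html_file")" "$out" >/dev/null 2>&1 ||
     { command -v wkhtmltoimage >/dev/null 2>&1 &&
       wkhtmltoimage --quality 90 --width 1440 "$html_file" "$out" >/dev/null 2>&1; }; then
    HTML_SCREENSHOT="$out"
    DUAL_INPUT=true
    echo "SCREENSHOT_OK"
    return 0
  fi
  echo "SCREENSHOT_FAILED"
  return 1
}

# Usage: call directly (not inside $(...)) so the variables stay set.
# screenshot_html "$HTML_FILE" || echo "try Playwright MCP next"
```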
If INPUT_TYPE=text
The description text is in the IMAGE field. No validation needed -- will use glm-build-text-request.sh in Phase 2.
Step 3: Confirm Settings (if no flags provided)
If IMAGE was the only argument (no flags), ASK using AskUserQuestion:
Design-to-Code Configuration:
Input: {IMAGE} ({INPUT_TYPE})
Framework: html (HTML/CSS), react (React 18 + CSS Modules), flutter (Flutter Web), custom
Profile: max (pixel-p