GLM Design-to-Code

Converts design inputs (screenshots, text descriptions, HTML files, URLs) to working frontend code using GLM vision models. Three modes: CREATE, REVIEW, FIX.

Arguments: $ARGUMENTS

Mode Routing

Mode	Flow
CREATE	Phase 0 → 0.5 → 1 → 2 → 3 (with auto-fix) → 4 (mandatory verify) → 5 (if --review)
REVIEW	Phase 0 → 0.5 → 1 → 5
FIX	Phase 0 → 0.5 → 1 → 6
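The routing table above can be read as a simple lookup from mode to phase sequence. The following is a minimal sketch of that lookup; the function name `phases_for_mode` and the second argument (the value of `REVIEW`) are illustrative assumptions, not part of the skill's scripts.

```shell
# Hypothetical helper: print the phase sequence for a mode.
# $1 = mode (create | review | fix), $2 = REVIEW flag value (true | false)
phases_for_mode() (
  mode=$1
  review=${2:-false}
  case "$mode" in
    create)
      phases="0 0.5 1 2 3 4"
      # Phase 5 only runs in CREATE mode when --review was requested.
      if [ "$review" = "true" ]; then
        phases="$phases 5"
      fi
      echo "$phases"
      ;;
    review) echo "0 0.5 1 5" ;;
    fix)    echo "0 0.5 1 6" ;;
    *)      echo "unknown mode: $mode" >&2; exit 1 ;;
  esac
)
```

For example, `phases_for_mode create true` prints the CREATE sequence with Phase 5 appended.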

PARAMETER PRIORITY: User prompt arguments ALWAYS override defaults and environment.

Explicit flags in $ARGUMENTS (--model, --profile, --provider, --framework) → highest priority

API keys from $ARGUMENTS prompt text (if user pasted a key inline) → override .env

Environment variables (.env, shell env) → fallback

Defaults from parse-args.sh → lowest priority
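One way to picture the priority order above is "first non-empty value wins". The sketch below assumes the candidate values have already been collected into separate variables; `first_nonempty`, `FLAG_MODEL`, `INLINE_MODEL`, and `ENV_MODEL` are hypothetical names used only for illustration, and the actual resolution in parse-args.sh may work differently.

```shell
# Hypothetical helper: print the first non-empty argument.
# Pass candidates ordered from highest to lowest priority.
first_nonempty() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Example: no --model flag and no inline mention, so the environment value is used.
FLAG_MODEL=""
INLINE_MODEL=""
ENV_MODEL="glm-4.6v"
MODEL=$(first_nonempty "$FLAG_MODEL" "$INLINE_MODEL" "$ENV_MODEL")
echo "MODEL=$MODEL"
```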

MANDATORY OUTPUT: Before ANY API call, output the full resolved configuration table (see Step 2.5 in Phase 2). This applies to ALL modes (CREATE, REVIEW, FIX). The user must always see what was resolved from their input.

Phase 0: Parse Arguments and Gather Config

Step 1: Parse Flags

EXECUTE using Bash tool:

bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" "$ARGUMENTS" && echo "OK" || echo "FAILED"

Output: key=value pairs. Store all values.
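Storing the values could look like the sketch below. It assumes parse-args.sh prints exactly one `KEY=value` pair per line with upper-case keys; the `load_config` function and the `CFG_` variable prefix are hypothetical and chosen here only to avoid clobbering existing shell variables.

```shell
# Assumption: the parser output is one KEY=value pair per line.
# load_config stores each pair as a CFG_<KEY> shell variable.
load_config() {
  while IFS='=' read -r key value; do
    case "$key" in
      ''|*[!A-Z0-9_]*) continue ;;  # skip blank lines and unexpected keys
    esac
    eval "CFG_$key=\$value"         # key is validated above; value is assigned, not re-parsed
  done <<EOF
$1
EOF
}

# Possible usage (not executed here):
#   CONFIG=$(bash "${CLAUDE_SKILL_DIR}/scripts/parse-args.sh" "$ARGUMENTS") || exit 1
#   load_config "$CONFIG"
#   echo "$CFG_MODE $CFG_FRAMEWORK"
```

The here-document keeps the loop in the current shell, so the variables remain set after the function returns; piping into the loop would lose them in a subshell.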

Also scan $ARGUMENTS raw text for any inline values not captured by flags:

API key pasted in prompt text → extract and use (overrides .env)

Model name mentioned in free text (e.g., "use glm-4.6v") → treat as --model

Profile/provider mentioned in text → treat as flags

Key	Default	Options
`IMAGE`	(required)	Path to screenshot file, URL, HTML file, or text description
`INPUT_TYPE`	auto	image, html, text, url
`FRAMEWORK`	html	html, react, flutter, custom
`PROFILE`	max	max, optimal, efficient
`PROVIDER`	zai	zai, openrouter
`OUTPUT`	`./d2c-output`	Output directory
`REVIEW`	false	true/false
`MODE`	create	create, review, fix
`FIX_TEXT`	(empty)	Text from --fix "..."
`REVIEW_FILE`	(empty)	Path from --review-file
`MODEL`	(empty)	Model override from --model
`MAX_TOKENS`	32768	32768 (max), 16384 (optimal), 8192 (efficient)

STOP if FAILED -- check parse-args.sh.
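The `MAX_TOKENS` row in the table above ties the token budget to the profile. A minimal sketch of that mapping follows; the function name `max_tokens_for_profile` is an assumption, and parse-args.sh may derive the value in some other way.

```shell
# Hypothetical helper: map a PROFILE value to its MAX_TOKENS budget,
# using the numbers listed in the configuration table.
max_tokens_for_profile() {
  case "$1" in
    max)       echo 32768 ;;
    optimal)   echo 16384 ;;
    efficient) echo 8192 ;;
    *)         echo "unknown profile: $1" >&2; return 1 ;;
  esac
}
```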

Step 1.5: Detect Mode

Condition	Mode
`--fix` flag present	FIX
`--review` flag present	REVIEW
Otherwise	CREATE
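The table above can be sketched as a scan over the raw arguments. This sketch assumes the rows are evaluated top to bottom, so `--fix` wins when both flags are present; the function name `detect_mode` is hypothetical, and the real detection in parse-args.sh may differ.

```shell
# Hypothetical helper: derive the mode from the raw argument list.
# Assumption: --fix takes precedence over --review, following the table order.
detect_mode() (
  mode=create
  for arg in "$@"; do
    case "$arg" in
      --fix|--fix=*) mode=fix; break ;;
      --review)      mode=review ;;
    esac
  done
  echo "$mode"
)
```

For example, `detect_mode shot.png --review` prints `review`, while a call with no mode flags prints `create`.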

Step 1.7: Classify Intent (MODE=create only)

Skip this step for REVIEW and FIX modes.

Analyze the user's prompt text ($ARGUMENTS) and the input type to classify intent. Opus classifies automatically -- no AskUserQuestion needed (exception: INPUT_TYPE=html with ambiguous signal -- ask).

Intent	Signals	Default GLM instruction
`reproduce`	Polished mockup, "exact", "copy", "pixel-perfect", no modification language	"Reproduce this design as working code. Match every visual detail exactly."
`creative`	"sketch", "wireframe", "rough", "make it look professional", "polish"	"This is a rough sketch. Create a polished, professional UI based on this layout. Use modern design, clean typography, harmonious colors."
`enhance`	"add a", "include", "put a ... on", existing design + additions	"This is an existing design. Enhance it: {user request}. Keep all existing content intact."
`modify`	"change", "update", "make darker", color/font/layout changes	"Modify this design: {user changes}. Keep everything else unchanged."
`convert`	"to React", "to Flutter", "convert", INPUT_TYPE=html + different framework	"Convert this {source} to {FRAMEWORK}. Preserve visual appearance."

Default: reproduce (matches current behavior when no specific signals detected).

Store as variables for later use:

INTENT -- one of: reproduce, creative, enhance, modify, convert
GLM_INSTRUCTION -- the instruction text (from table above, with placeholders filled from user prompt)
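Once the intent is known, filling the placeholders is mechanical. The sketch below covers only that template step, not the classification itself, which the skill leaves to the model. The function name `glm_instruction` and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: build GLM_INSTRUCTION from the intent table.
# $1 = intent, $2 = user request / changes / source description, $3 = FRAMEWORK
glm_instruction() (
  intent=$1
  detail=${2:-}
  framework=${3:-}
  case "$intent" in
    reproduce) echo "Reproduce this design as working code. Match every visual detail exactly." ;;
    creative)  echo "This is a rough sketch. Create a polished, professional UI based on this layout. Use modern design, clean typography, harmonious colors." ;;
    enhance)   echo "This is an existing design. Enhance it: $detail. Keep all existing content intact." ;;
    modify)    echo "Modify this design: $detail. Keep everything else unchanged." ;;
    convert)   echo "Convert this $detail to $framework. Preserve visual appearance." ;;
    *)         echo "unknown intent: $intent" >&2; exit 1 ;;
  esac
)
```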

Exception: If INPUT_TYPE=html and no clear intent signal in prompt -- ASK using AskUserQuestion:
HTML input detected. What would you like to do?
Options:

"Convert to {FRAMEWORK} (preserve appearance)"

"Reproduce as clean HTML/CSS from scratch"

"Use as reference -- create improved version"

Step 2: Process Input by Type

Based on INPUT_TYPE from parse-args.sh:

If INPUT_TYPE=image

EXECUTE using Bash tool:

IMAGE="IMAGE_PATH_HERE"
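# -b prints only the MIME type, so a path that happens to contain ": image/" cannot cause a false match
[ -f "$IMAGE" ] && file -b --mime-type "$IMAGE" | grep -q '^image/' && echo "VALID_IMAGE" || echo "INVALID"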

If INVALID:

ASK using AskUserQuestion:

The file "{IMAGE}" is not a valid image. Please provide a valid input:

Options:

"Enter path to screenshot file (PNG/JPG/WebP)"
"Enter a URL to screenshot"
"Enter a text description instead"

On answer:

File path -> re-validate as image, update IMAGE and INPUT_TYPE=image
URL -> update IMAGE, set INPUT_TYPE=url, go to URL processing below
Text description -> update IMAGE with text, set INPUT_TYPE=text
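Routing the user's answer to the right input type could be sketched as follows. The heuristics are assumptions made for illustration: anything starting with `http://` or `https://` is treated as a URL, an existing file path as an image candidate, and everything else as a text description. The function name `classify_answer` is hypothetical.

```shell
# Hypothetical helper: map a free-form answer to an INPUT_TYPE value.
classify_answer() (
  answer=$1
  case "$answer" in
    http://*|https://*) echo url ;;
    *)
      if [ -f "$answer" ]; then
        echo image   # must still pass the MIME check before use
      else
        echo text
      fi
      ;;
  esac
)
```

An `image` result only means the path exists; the validation command shown earlier in this step still has to confirm the MIME type.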

If INPUT_TYPE=url

Take a Playwright screenshot of the URL first.

EXECUTE using Bash tool:

URL="URL_HERE"
npx playwright screenshot --full-page "$URL" /tmp/d2c-url-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"

If SCREENSHOT_OK: Set IMAGE=/tmp/d2c-url-screenshot.png and continue as image input.

If SCREENSHOT_FAILED: Try using Playwright MCP browser_navigate + browser_take_screenshot.

If Playwright MCP also fails:

ASK using AskUserQuestion:

Could not take screenshot of "{URL}". The URL may be unreachable or Playwright is not available. Choose alternative:

Options:

"I'll provide a screenshot file instead"
"Convert from text description"
"Skip -- I'll paste the HTML source"

On answer:

Screenshot file -> ask for path, set INPUT_TYPE=image
Text description -> ask for description, set INPUT_TYPE=text
HTML source -> ask for file path, set INPUT_TYPE=html

If INPUT_TYPE=html

EXECUTE using Bash tool:

HTML_FILE="HTML_PATH_HERE"
[ -f "$HTML_FILE" ] && echo "HTML_VALID ($(wc -l < "$HTML_FILE" | tr -d ' ') lines)" || echo "HTML_MISSING"

If HTML_VALID: Attempt to screenshot the HTML for dual input (image + HTML source):

EXECUTE using Bash tool:

HTML_FILE="HTML_PATH_HERE"
npx playwright screenshot --full-page "file://$(cd "$(dirname "$HTML_FILE")" && pwd)/$(basename "$HTML_FILE")" /tmp/d2c-html-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"

If SCREENSHOT_OK: Set HTML_SCREENSHOT=/tmp/d2c-html-screenshot.png, DUAL_INPUT=true. Will use glm-build-request.sh with both screenshot and HTML source.

If SCREENSHOT_FAILED: Try fallback:

HTML_FILE="HTML_PATH_HERE"
command -v wkhtmltoimage >/dev/null 2>&1 && wkhtmltoimage --quality 90 --width 1440 "$HTML_FILE" /tmp/d2c-html-screenshot.png 2>&1 && echo "SCREENSHOT_OK" || echo "SCREENSHOT_FAILED"

If still FAILED: Try Playwright MCP browser_navigate to file:// URL + browser_take_screenshot.

If all fail:

ASK using AskUserQuestion:

Could not screenshot the HTML file. Choose how to proceed:

Options:

"I'll provide a screenshot file" (ask for path, set DUAL_INPUT=true)
"Continue without screenshot (text-only)" (set DUAL_INPUT=false)

When DUAL_INPUT=false: Will use glm-build-text-request.sh (text-only with HTML content).

If INPUT_TYPE=text

The description text is in the IMAGE field. No validation needed -- will use glm-build-text-request.sh in Phase 2.
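Taken together, the input-type branches above decide which request builder Phase 2 will call. The sketch below summarizes that choice. The `html` and `text` cases follow the text directly; the `image` and `url` case is an assumption (a URL continues as image input after the screenshot, and this section does not name the script for plain images). The function name `request_script` is hypothetical.

```shell
# Hypothetical helper: pick the request-builder script.
# $1 = INPUT_TYPE, $2 = DUAL_INPUT (true | false, only relevant for html)
request_script() (
  input_type=$1
  dual_input=${2:-false}
  case "$input_type" in
    text) echo "glm-build-text-request.sh" ;;
    html)
      if [ "$dual_input" = "true" ]; then
        echo "glm-build-request.sh"        # screenshot + HTML source
      else
        echo "glm-build-text-request.sh"   # HTML content only
      fi
      ;;
    image|url) echo "glm-build-request.sh" ;;  # assumption, not stated in this section
    *) echo "unknown input type: $input_type" >&2; exit 1 ;;
  esac
)
```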

Step 3: Confirm Settings (if no flags provided)

If IMAGE was the only argument (no flags), ASK using AskUserQuestion:

Design-to-Code Configuration:

Input: {IMAGE} ({INPUT_TYPE})
Framework: html (HTML/CSS), react (React 18 + CSS Modules), flutter (Flutter Web), custom
Profile: max (pixel-p
