Kimi WebBridge

Control the user's real browser (with their login sessions) via a local daemon at http://127.0.0.1:10086.

Health check (always do this first)

~/.kimi-webbridge/bin/kimi-webbridge status

Then act on the result:

running: true and extension_connected: true — healthy. Proceed with the tool calls below.
Anything else (command not found, running: false, extension_connected: false, errors) — Read references/operations.md in this skill directory. It has the install / start / diagnose routing table.

Don't guess fixes here — every non-healthy state is handled in references/operations.md.

Tools

Tool	Args	Returns	Note
`navigate`	`url`, `newTab`(bool), `group_title`	`{success, url, tabId}`	Always use `newTab:true` on first call. `group_title` sets the tab group's visible label
`find_tab`	`url`, `active`(bool)	`{success, url, tabId}`	Reuse an already-open tab — pass any URL or domain; `active:true` picks the tab the user is currently viewing, default picks the leftmost match
`snapshot`	—	`{url, title, tree}` with `@e` refs	Accessibility tree (text) — use this to read page content and locate elements
`click`	`selector` (@e ref or CSS)	`{success, tag, text}`	Synthetic `el.click()`
`fill`	`selector`, `value`	`{success, tag, mode}`	Works on `<input>`/`<textarea>` AND `[contenteditable]` (ProseMirror/Lexical/Slate). `mode` is `"value"` or `"contenteditable"`
`evaluate`	`code` (supports async/await)	`{type, value}`
`screenshot`	`format`(png\|jpeg), `quality`(0-100), optional `selector` (@e/CSS)	`{format, dataLength, data}` (base64)	Full page floods context — use helper script (saves to disk, returns path). With `selector` only that element is captured (small, OK to call API directly)
`network`	`cmd`(start\|stop\|list\|detail), `filter`, `requestId`	request/response data
`upload`	`selector`, `files`(string[])	`{success, fileCount}`
`save_as_pdf`	`paper_format`, `landscape`, `scale`, `print_background`, `file_name`	`{path, sizeBytes, mimeType, pageTitle}`	Render current page → PDF, saved under `/tmp/kimi-webbridge-pdfs/`
`list_tabs`	—	`{success, tabs:[{tabId, url, title, active, groupTitle}]}`	Inspect tabs in the current session
`close_tab`	—	`{success, closed: bool}`	Close the current tab in the session
`close_session`	—	`{success, closed: int}`	Close all tabs — `closed` is the count. Always call at task end

Using find_tab

Use find_tab when the user explicitly asks to operate on an already-open tab. It matches by domain, so any URL on the same site works. Without active:true, returns the leftmost matching tab; with active:true, returns the tab the user is currently viewing — pass it when the user says "用我打开的 X" / "在我当前的 X 页面上".

curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"find_tab","args":{"url":"https://www.kimi.com","active":true},"session":"kimi"}'

If find_tab returns "no open tab found", the page is not open — fall back to navigate with newTab:true.

Call Format

curl -s -X POST http://127.0.0.1:10086/command \
  -H 'Content-Type: application/json' \
  -d '{"action":"navigate","args":{"url":"https://example.com","newTab":true}}'

Sessions

Each session maps to a separate browser tab group. Use different session names for different sites to keep operations isolated.

Add "session":"name" to the request body:

curl -s -X POST http://127.0.0.1:10086/command \
  -d '{"action":"navigate","args":{"url":"https://example.com","newTab":true},"session":"my-task"}'

Always assign distinct session names when working with multiple sites in parallel.

Screenshots: Use the Helper Script

Never call the screenshot API directly — it returns base64-encoded image data (hundreds of KB of text) that will flood the context window.

Use scripts/screenshot.sh instead. It decodes and saves the image to disk, returning only the file path:

# Default — saves to /tmp/kimi-webbridge-screenshots/{timestamp}.png
bash "$(dirname "$SKILL_PATH")/scripts/screenshot.sh"

# With a session
bash "$(dirname "$SKILL_PATH")/scripts/screenshot.sh" -s my-task

# Custom output path
bash "$(dirname "$SKILL_PATH")/scripts/screenshot.sh" -o /tmp/page.png

# JPEG format, quality 60
bash "$(dirname "$SKILL_PATH")/scripts/screenshot.sh" -f jpeg -q 60

After getting the file path, use the Read tool to view the image.

If $SKILL_PATH is unavailable, call the script by its absolute path.

Prefer snapshot over CSS/JS selectors

snapshot returns interactive elements with @e refs based on semantic role/name. Use them directly with click/fill — they survive CSS class hash changes that break manually-written selectors.

Fall back to evaluate (JS) only when:

The target has no @e ref in the snapshot
You need attributes not in the snapshot (e.g., href)
You need to dispatch complex event sequences, or scroll

Evaluate Tips

Always use compact JSON.stringify(data) — never add null, 2 formatting. Indentation and newlines can inflate the response several times over, causing truncation during transmission.
evaluate calls share the page's JS realm — re-declaring the same const/let across two calls throws SyntaxError. Wrap in an IIFE for a fresh scope: (() => { const x = ...; return x; })().

Text input — use `fill`

fill handles all three text input shapes. Pass selector (CSS or @e ref) + value:

Target	What `fill` does	Returned `mode`
`<input>` / `<textarea>`	Sets `.value` via native setter, fires `input`/`change`.	`"value"`
`[contenteditable]` (ProseMirror / TipTap / Lexical / Slate / Quill etc.)	Focuses, selects all existing content, calls `document.execCommand('insertText', ...)` which fires `beforeinput`/`input` with `inputType:'insertText'` and `data:value`.	`"contenteditable"`
Other element	Best-effort `.value` + events.	`"value"`

fill is clear-and-insert: existing content is replaced. For "append to existing text", read the current value via evaluate, concatenate, then fill with the result.

Form submit / special keys

There's no separate "press Enter" tool. To submit a form, click the submit button directly (click on the @e ref or selector). To dispatch a key event programmatically (e.g. Escape to close a modal):

{"action":"evaluate","args":{"code":"document.activeElement.dispatchEvent(new KeyboardEvent('keydown',{key:'Escape',bubbles:true}))"}}

Save the current page as PDF

save_as_pdf renders the current page to PDF, writes it to /tmp/kimi-webbridge-pdfs/, and returns the file path (the daemon strips the base64 — agent never sees raw PDF bytes).

All args optional:

paper_format: letter (default) | a4 | legal | a3 | tabloid
landscape: false (default)
scale: 1.0 (default), range [0.1, 2.0]
print_background: true (default) — keep background colors
file_name: caller-supplied name; if absent, derived from page title

Decoded PDF cap is 100 MB. Above that the daemon refuses; reduce scale or split the page.

Known limitations

Sites that strictly check event.isTrusted (some banking portals, captcha challenges) reject fill and click because both go through DOM-level synthetic events (isTrusted=false). This is a product boundary, not a bug — no automation primitive that runs on the user's machine without stealing OS focus can produce trusted events on these sites.
Cross-origin iframes: fill, click, evaluate, and snapshot operate on the top frame. If a target element lives in a same-page iframe from a different origin (e.g. embedded sandbox demos), navigate to the iframe's URL directly instead.

Versions

Daemon, extension, and this skill share a 1:1 version s

kimi-webbridge

How to add

Drop this on your repo README

Related skills

dev-browser

agent-browser

understand-chat

understand-dashboard

Get new Pesquisa e Web skills every Monday

Kimi WebBridge

Health check (always do this first)

Tools

Using find_tab

Call Format

Sessions

Screenshots: Use the Helper Script

Prefer snapshot over CSS/JS selectors

Evaluate Tips

Text input — use `fill`

Form submit / special keys

Save the current page as PDF

Known limitations

Versions

Comments · No comments

How to add

Drop this on your repo README

Related skills

dev-browser

agent-browser

understand-chat

understand-dashboard

Get new Pesquisa e Web skills every Monday

Kimi WebBridge

Health check (always do this first)

Tools

Using find_tab

Call Format

Sessions

Screenshots: Use the Helper Script

Prefer snapshot over CSS/JS selectors

Evaluate Tips

Text input — use fill

Form submit / special keys

Save the current page as PDF

Known limitations

Versions

Comments · No comments

Text input — use `fill`