agent-browser core
Fast browser automation CLI for AI agents. Chrome/Chromium via CDP, no
Playwright or Puppeteer dependency. Accessibility-tree snapshots with compact
@eN refs let agents interact with pages in ~200-400 tokens instead of
parsing raw HTML.
Most normal web tasks (navigate, read, click, fill, extract, screenshot) are covered here. Load a specialized skill when the task falls outside browser web pages — see When to load another skill.
The core loop
agent-browser open <url> # 1. Open a page
agent-browser snapshot -i # 2. See what's on it (interactive elements only)
agent-browser click @e3 # 3. Act on refs from the snapshot
agent-browser snapshot -i # 4. Re-snapshot after any page change
Refs (@e1, @e2, ...) are assigned fresh on every snapshot. They become
stale the moment the page changes — after clicks that navigate, form
submits, dynamic re-renders, dialog opens. Always re-snapshot before your
next ref interaction.
Quickstart
# Install once
npm i -g agent-browser && agent-browser install
# Take a screenshot of a page
agent-browser open https://example.com
agent-browser screenshot home.png
agent-browser close
# Search, click a result, and capture it
agent-browser open https://duckduckgo.com
agent-browser snapshot -i # find the search box ref
agent-browser fill @e1 "agent-browser cli"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i # refs now reflect results
agent-browser click @e5 # click a result
agent-browser screenshot result.png
The browser stays running across commands so these feel like a single
session. Use agent-browser close (or close --all) when you're done.
Reading a page
agent-browser snapshot # full tree (verbose)
agent-browser snapshot -i # interactive elements only (preferred)
agent-browser snapshot -i -u # include href urls on links
agent-browser snapshot -i -c # compact (no empty structural nodes)
agent-browser snapshot -i -d 3 # cap depth at 3 levels
agent-browser snapshot -s "#main" # scope to a CSS selector
agent-browser snapshot -i --json # machine-readable output
Snapshot output looks like:
Page: Example - Log in
URL: https://example.com/login
@e1 [heading] "Log in"
@e2 [form]
@e3 [input type="email"] placeholder="Email"
@e4 [input type="password"] placeholder="Password"
@e5 [button type="submit"] "Continue"
@e6 [link] "Forgot password?"
For unstructured reading (no refs needed):
agent-browser get text @e1 # visible text of an element
agent-browser get html @e1 # innerHTML
agent-browser get attr @e1 href # any attribute
agent-browser get value @e1 # input value
agent-browser get title # page title
agent-browser get url # current URL
agent-browser get count ".item" # count matching elements
Interacting
agent-browser click @e1 # click
agent-browser click @e1 --new-tab # open link in new tab instead of navigating
agent-browser dblclick @e1 # double-click
agent-browser hover @e1 # hover
agent-browser focus @e1 # focus (useful before keyboard input)
agent-browser fill @e2 "hello" # clear then type
agent-browser type @e2 " world" # type without clearing
agent-browser press Enter # press a key at current focus
agent-browser press Control+a # key combination
agent-browser check @e3 # check checkbox
agent-browser uncheck @e3 # uncheck
agent-browser select @e4 "option-value" # select dropdown option
agent-browser select @e4 "a" "b" # select multiple
agent-browser upload @e5 file1.pdf # upload file(s)
agent-browser scroll down 500 # scroll page (up/down/left/right)
agent-browser scrollintoview @e1 # scroll element into view
agent-browser drag @e1 @e2 # drag and drop
When refs don't work or you don't want to snapshot
Use semantic locators:
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact # exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
agent-browser find first ".card" click
agent-browser find nth 2 ".card" hover
Or a raw CSS selector:
agent-browser click "#submit"
agent-browser fill "input[name=email]" "user@test.com"
agent-browser click "button.primary"
Rule of thumb: snapshot + @eN refs are fastest and most reliable for
AI agents. find role/text/label is next best and doesn't require a prior
snapshot. Raw CSS is a fallback when the others fail.
Waiting (read this)
Agents fail more often from bad waits than from bad selectors. Pick the right wait for the situation:
agent-browser wait @e1 # until an element appears
agent-browser wait 2000 # dumb wait, milliseconds (last resort)
agent-browser wait --text "Success" # until the text appears on the page
agent-browser wait --url "**/dashboard" # until URL matches pattern (glob)
agent-browser wait --load networkidle # until network idle (post-navigation)
agent-browser wait --load domcontentloaded # until DOMContentLoaded
agent-browser wait --fn "window.myApp.ready === true" # until JS condition
After any page-changing action, pick one:
- Wait for a specific element you expect to appear:
wait @reforwait --text "...". - Wait for URL change:
wait --url "**/new-page". - Wait for network idle (catch-all for SPA navigation):
wait --load networkidle.
Avoid bare wait 2000 except when debugging — it makes scripts slow and
flaky. Timeouts default to 25 seconds.
Common workflows
Log in
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Pick the email/password refs out of the snapshot, then:
agent-browser fill @e3 "user@example.com"
agent-browser fill @e4 "hunter2"
agent-browser click @e5
agent-browser wait --url "**/dashboard"
agent-browser snapshot -i
Credentials in shell history are a leak. For anything sensitive, use the auth vault (see references/authentication.md):
agent-browser auth save my-app --url https://app.example.com/login \
--username user@example.com --password-stdin
# (type password, Ctrl+D)
agent-browser auth login my-app # fills + clicks, waits for form
Persist session across runs
# Log in once, save cookies + localStorage
agent-browser state save ./auth.json
# Later runs start already-logged-in
agent-browser --state ./auth.json open https://app.example.com
Or use --session-name for auto-save/restore:
AGENT_BROWSER_SESSION_NAME=my-app agent-browser open https://app.example.com
# State is auto-saved and restored on subsequent runs with the same name.
Extract data
# Structured snapshot (best for AI reasoning over page content)
agent-browser snapshot -i --json > page.json
# Targeted extraction with refs
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get attr @e10 href
# Arbitrary shape via JavaScript
cat <<'EOF' | agent-browser eval --stdin
const rows = document.querySelectorAll("table tbody tr");
Array.from(rows).map(r => ({
name: r.cells[0].innerText,
price: r.cells[1].innerText,
}));
EOF
Prefer eval --stdin (heredoc) or eval -b <base64> for any JS with
quotes or special characters. Inline agent-browser eval "..." works
only for simple expressions.
Screenshot
a