agent-browser core

Fast browser automation CLI for AI agents. Chrome/Chromium via CDP, no Playwright or Puppeteer dependency. Accessibility-tree snapshots with compact @eN refs let agents interact with pages in ~200-400 tokens instead of parsing raw HTML.

Most normal web tasks (navigate, read, click, fill, extract, screenshot) are covered here. Load a specialized skill when the task falls outside browser web pages — see When to load another skill.

The core loop

agent-browser open <url>        # 1. Open a page
agent-browser snapshot -i       # 2. See what's on it (interactive elements only)
agent-browser click @e3         # 3. Act on refs from the snapshot
agent-browser snapshot -i       # 4. Re-snapshot after any page change

Refs (@e1, @e2, ...) are assigned fresh on every snapshot. They become stale the moment the page changes — after clicks that navigate, form submits, dynamic re-renders, dialog opens. Always re-snapshot before your next ref interaction.

Quickstart

# Install once
npm i -g agent-browser && agent-browser install

# Take a screenshot of a page
agent-browser open https://example.com
agent-browser screenshot home.png
agent-browser close

# Search, click a result, and capture it
agent-browser open https://duckduckgo.com
agent-browser snapshot -i                      # find the search box ref
agent-browser fill @e1 "agent-browser cli"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i                      # refs now reflect results
agent-browser click @e5                        # click a result
agent-browser screenshot result.png

The browser stays running across commands so these feel like a single session. Use agent-browser close (or close --all) when you're done.

Reading a page

agent-browser snapshot                    # full tree (verbose)
agent-browser snapshot -i                 # interactive elements only (preferred)
agent-browser snapshot -i -u              # include href urls on links
agent-browser snapshot -i -c              # compact (no empty structural nodes)
agent-browser snapshot -i -d 3            # cap depth at 3 levels
agent-browser snapshot -s "#main"         # scope to a CSS selector
agent-browser snapshot -i --json          # machine-readable output

Snapshot output looks like:

Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"

For unstructured reading (no refs needed):

agent-browser get text @e1                # visible text of an element
agent-browser get html @e1                # innerHTML
agent-browser get attr @e1 href           # any attribute
agent-browser get value @e1               # input value
agent-browser get title                   # page title
agent-browser get url                     # current URL
agent-browser get count ".item"           # count matching elements

Interacting

agent-browser click @e1                   # click
agent-browser click @e1 --new-tab         # open link in new tab instead of navigating
agent-browser dblclick @e1                # double-click
agent-browser hover @e1                   # hover
agent-browser focus @e1                   # focus (useful before keyboard input)
agent-browser fill @e2 "hello"            # clear then type
agent-browser type @e2 " world"           # type without clearing
agent-browser press Enter                 # press a key at current focus
agent-browser press Control+a             # key combination
agent-browser check @e3                   # check checkbox
agent-browser uncheck @e3                 # uncheck
agent-browser select @e4 "option-value"   # select dropdown option
agent-browser select @e4 "a" "b"          # select multiple
agent-browser upload @e5 file1.pdf        # upload file(s)
agent-browser scroll down 500             # scroll page (up/down/left/right)
agent-browser scrollintoview @e1          # scroll element into view
agent-browser drag @e1 @e2                # drag and drop

When refs don't work or you don't want to snapshot

Use semantic locators:

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact     # exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
agent-browser find first ".card" click
agent-browser find nth 2 ".card" hover

Or a raw CSS selector:

agent-browser click "#submit"
agent-browser fill "input[name=email]" "user@test.com"
agent-browser click "button.primary"

Rule of thumb: snapshot + @eN refs are fastest and most reliable for AI agents. find role/text/label is next best and doesn't require a prior snapshot. Raw CSS is a fallback when the others fail.

Waiting (read this)

Agents fail more often from bad waits than from bad selectors. Pick the right wait for the situation:

agent-browser wait @e1                     # until an element appears
agent-browser wait 2000                    # dumb wait, milliseconds (last resort)
agent-browser wait --text "Success"        # until the text appears on the page
agent-browser wait --url "**/dashboard"    # until URL matches pattern (glob)
agent-browser wait --load networkidle      # until network idle (post-navigation)
agent-browser wait --load domcontentloaded # until DOMContentLoaded
agent-browser wait --fn "window.myApp.ready === true"  # until JS condition

After any page-changing action, pick one:

Wait for a specific element you expect to appear: wait @ref or wait --text "...".
Wait for URL change: wait --url "**/new-page".
Wait for network idle (catch-all for SPA navigation): wait --load networkidle.

Avoid bare wait 2000 except when debugging — it makes scripts slow and flaky. Timeouts default to 25 seconds.

Common workflows

Log in

agent-browser open https://app.example.com/login
agent-browser snapshot -i

# Pick the email/password refs out of the snapshot, then:
agent-browser fill @e3 "user@example.com"
agent-browser fill @e4 "hunter2"
agent-browser click @e5
agent-browser wait --url "**/dashboard"
agent-browser snapshot -i

Credentials in shell history are a leak. For anything sensitive, use the auth vault (see references/authentication.md):

agent-browser auth save my-app --url https://app.example.com/login \
  --username user@example.com --password-stdin
# (type password, Ctrl+D)

agent-browser auth login my-app    # fills + clicks, waits for form

Persist session across runs

# Log in once, save cookies + localStorage
agent-browser state save ./auth.json

# Later runs start already-logged-in
agent-browser --state ./auth.json open https://app.example.com

Or use --session-name for auto-save/restore:

AGENT_BROWSER_SESSION_NAME=my-app agent-browser open https://app.example.com
# State is auto-saved and restored on subsequent runs with the same name.

Extract data

# Structured snapshot (best for AI reasoning over page content)
agent-browser snapshot -i --json > page.json

# Targeted extraction with refs
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get attr @e10 href

# Arbitrary shape via JavaScript
cat <<'EOF' | agent-browser eval --stdin
const rows = document.querySelectorAll("table tbody tr");
Array.from(rows).map(r => ({
  name: r.cells[0].innerText,
  price: r.cells[1].innerText,
}));
EOF

Prefer eval --stdin (heredoc) or eval -b <base64> for any JS with quotes or special characters. Inline agent-browser eval "..." works only for simple expressions.

agent-browser

How to add

Drop this on your repo README

Related skills

claude-api

skill-creator

claude-mem

oh-my-issues

Get new Desenvolvimento skills every Monday