Browser Ops
Browser automation via agent-browser. 25 tools wrapping Playwright for navigation, interaction, observation, and session management. Validated on two benchmark suites: 12/15 pass on a 15-task suite (100% excluding external blockers), 9/10 on a 10-task progressive suite. Standout: Notion end-to-end signup with AgentMail OTP verification.
Terminology used in this file:
- Playwright: A browser automation framework that lets tools control Chromium/Chrome.
- a11y tree: The accessibility tree (screen-reader-friendly page structure) used by browser_snapshot.
- DOM: Document Object Model, the browser's structured representation of page elements.
- CSS selector: A rule for targeting specific DOM elements (for example, .price or #submit).
- OAuth: A standard login/authorization flow that redirects through an identity provider (for example, "Sign in with GitHub").
Setup
npm install -g @anthropic-ai/agent-browser
agent-browser start
- Claude Code: copy this skill folder into .claude/skills/browser-ops/
- Codex CLI: append this SKILL.md content to your project's root AGENTS.md
For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.
Staying Updated
This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.
After installing, tell your agent: "Check UPDATES.md in the browser-ops skill for any new features or changes."
When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."
Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.
Quick Start
The simplest possible browser flow: navigate, inspect, capture.
browser_navigate(url="https://example.com")
browser_snapshot(mode="interactive")
browser_screenshot(path="/tmp/example.png")
browser_close()
Decision Tree: Browser vs Other Tools
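Ask this FIRST. Getting it wrong is expensive: a static lookup costs ~100 tokens with WebSearch / WebFetch, while every browser snapshot costs ~1,400 tokens or more.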
Need data from the web?
|
+-- Is it static content? (prices, articles, search results, public data)
| YES --> Use WebSearch / WebFetch (built-in tools)
| ~100 tokens. No browser overhead.
|
+-- Does it require interaction? (login, form fill, click sequences, session state)
| YES --> Use browser tools
|
+-- Does it require email verification?
| YES --> Use browser + AgentMail (see Email Verification section)
|
+-- Is the target known to block bots? (Cloudflare-protected, etc.)
YES --> Check references/failure-log.md before starting.
May need stealth config or alternative approach.
Rule of thumb: If you can get the data with curl, you don't need a browser.
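The decision tree above can be sketched as a small function. This is only an illustration: plan_web_task and its parameter names are hypothetical and not part of agent-browser, and the returned strings are labels, not tool calls.

```python
from typing import List


def plan_web_task(
    static_content: bool,
    needs_interaction: bool = False,
    needs_email_verification: bool = False,
    known_to_block_bots: bool = False,
) -> List[str]:
    """Return the ordered steps suggested by the decision tree (hypothetical helper)."""
    # Static content with no interaction: skip the browser entirely.
    if static_content and not (needs_interaction or needs_email_verification):
        return ["WebSearch / WebFetch"]

    plan: List[str] = []
    # Known bot-blocking targets: read the failure log before spending tokens.
    if known_to_block_bots:
        plan.append("check references/failure-log.md")
    plan.append("browser tools")
    # Signup / OTP flows need a mailbox in addition to the browser.
    if needs_email_verification:
        plan.append("AgentMail")
    return plan


print(plan_web_task(static_content=True))
print(plan_web_task(static_content=False, needs_interaction=True,
                    needs_email_verification=True, known_to_block_bots=True))
```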
Core Workflow
Every browser task follows this loop:
1. browser_navigate(url) -- go to the page
2. browser_snapshot(mode='interactive') -- get refs (@e1, @e2...)
3. Identify target ref from snapshot -- find the button/input/link
4. browser_click(@ref) / browser_fill(@ref, text) -- act
5. browser_snapshot(mode='interactive') -- verify result
6. Repeat 3-5 until done
7. browser_close() -- ALWAYS close when done
The ref system: Snapshot returns element references like @e1, @e2. Use these refs with click/fill/type. Refs are stable within a page state but reset after navigation, so take a fresh snapshot before acting on a new page.
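Step 3 of the loop (identify the target ref) can be sketched as a text search over the snapshot. The snapshot format below is an assumption made up for illustration; the real browser_snapshot output may be laid out differently, and find_ref is a hypothetical helper.

```python
import re
from typing import Optional

# Refs look like @e1, @e2, ... according to the ref system described above.
REF_PATTERN = re.compile(r"@e\d+")


def find_ref(snapshot_text: str, label: str) -> Optional[str]:
    """Return the ref on the first line that mentions `label` (case-insensitive)."""
    for line in snapshot_text.splitlines():
        if label.lower() in line.lower():
            match = REF_PATTERN.search(line)
            if match:
                return match.group(0)
    return None


# Hypothetical snapshot text; the real layout is not specified in this file.
sample_snapshot = """\
button "Sign in" @e1
textbox "Email" @e2
link "Forgot password?" @e3
"""

print(find_ref(sample_snapshot, "email"))
```

Because refs reset after navigation, a ref found this way should only be used with the snapshot it came from.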
Token Efficiency: Snapshot Modes
| Mode | Tokens/page | Shows | Use when |
|---|---|---|---|
| interactive | ~1,400 | Buttons, links, inputs only | Default for everything |
| compact | ~3,000-5,000 | Condensed full tree | Need text content + interactive |
| full | ~15,000 | Complete a11y tree | Last resort, known need |
Default to interactive. At ~1,400 tokens per page it is roughly 10x cheaper than full (~15,000) and sufficient for about 90% of tasks.
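To make the cost difference concrete, here is a rough budget estimate using the per-page figures from the table. The figures are approximate, and estimate_tokens is a hypothetical helper, not part of the tool set.

```python
from typing import Dict, Tuple

# Approximate (low, high) tokens per snapshot, taken from the table above.
SNAPSHOT_COST: Dict[str, Tuple[int, int]] = {
    "interactive": (1_400, 1_400),
    "compact": (3_000, 5_000),
    "full": (15_000, 15_000),
}


def estimate_tokens(mode: str, snapshots: int) -> Tuple[int, int]:
    """Return the (low, high) token estimate for `snapshots` snapshots in `mode`."""
    low, high = SNAPSHOT_COST[mode]
    return low * snapshots, high * snapshots


# A task that takes six snapshots (navigate, act, verify, repeated).
for mode in SNAPSHOT_COST:
    print(mode, estimate_tokens(mode, 6))
```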
Tiered Access Model
Tier 1: A11y Tree Snapshot (~1,400 tokens/page)
browser_snapshot(mode='interactive') --> get refs --> click/fill
For: navigation, form filling, structured page interaction
This is your DEFAULT.
Tier 2: Screenshot + VLM (0 API tokens) [EXPERIMENTAL]
browser_screenshot() --> local VLM (Qwen3-VL-2B / UI-TARS-1.5-7B)
For: visual-only content, CAPTCHAs, pages where a11y tree misses data
Tier 3: Targeted DOM Extraction (variable tokens)
browser_evaluate('document.querySelector(sel).textContent')
For: known pages with known CSS selectors, JSON-LD extraction
Use when you know EXACTLY what element contains the data.
Escalation path: Start at Tier 1. If snapshot doesn't show the data you need, try Tier 3 with a targeted selector. Only use Tier 2 when visual understanding is required.
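The escalation path can be sketched as "try each tier in order, stop at the first one that returns data". The three tier functions below are stubs standing in for real tool calls, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Attempt = Callable[[], Optional[str]]


def extract_with_escalation(attempts: List[Tuple[str, Attempt]]) -> Tuple[str, str]:
    """Run attempts in order and return (tier_name, data) from the first that yields data."""
    for tier_name, attempt in attempts:
        data = attempt()
        if data:
            return tier_name, data
    raise LookupError("no tier produced the requested data")


# Stubs: in a real run these would wrap browser_snapshot, browser_evaluate,
# and browser_screenshot + a local VLM respectively.
def tier1_snapshot() -> Optional[str]:
    return None  # pretend the a11y snapshot did not contain the price


def tier3_dom() -> Optional[str]:
    return "$19.99"  # pretend a targeted selector found it


def tier2_vlm() -> Optional[str]:
    return "$19.99 (read from screenshot)"


tier, value = extract_with_escalation([
    ("tier1-snapshot", tier1_snapshot),
    ("tier3-dom", tier3_dom),
    ("tier2-vlm", tier2_vlm),  # listed last: only when visual understanding is required
])
print(tier, value)
```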
Token Optimization for Data-Heavy Pages
For content-rich pages (HN, Reddit, forums, dashboards), the interactive snapshot balloons from ~1,400 tokens (simple pages) to ~47K tokens (dense pages). This wrecks budgets.
Pattern: Snapshot first to understand page structure, then browser_evaluate with targeted JS for bulk extraction.
1. browser_navigate(url)
2. browser_snapshot(mode='interactive') -- understand structure (pay cost once)
3. browser_evaluate(' -- extract data surgically
JSON.stringify(
[...document.querySelectorAll(".titleline a")]
.map(a => ({title: a.textContent, href: a.href}))
)
')
4. Parse JSON result -- structured data at ~200 tokens vs 47K snapshot
When to use: Any page where you need to extract 10+ items of the same type. Snapshot gives you the selector knowledge; eval gives you the data cheaply.
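Step 4 (parse the JSON result) might look like the sketch below. It assumes browser_evaluate hands back the JSON.stringify output as a plain string; parse_links and the sample payload are made up for illustration.

```python
import json
from typing import Dict, List


def parse_links(raw_result: str) -> List[Dict[str, str]]:
    """Parse the JSON array produced by the extraction snippet and drop empty entries."""
    items = json.loads(raw_result)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of {title, href} objects")
    cleaned: List[Dict[str, str]] = []
    for item in items:
        title = (item.get("title") or "").strip()
        href = item.get("href") or ""
        if title and href:
            cleaned.append({"title": title, "href": href})
    return cleaned


# Hypothetical payload in the shape the extraction snippet above would produce.
sample = (
    '[{"title": " Example story ", "href": "https://example.com/a"},'
    ' {"title": "", "href": "https://example.com/b"}]'
)
links = parse_links(sample)
print(links)
```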
Email Verification (AgentMail)
For tasks requiring email verification (account signup, OTP flows).
Setup
- AgentMail Python wrapper: ./scripts/mailbox.py (self-contained)
- CLI wrapper: ./scripts/agentmail.sh
- Dependencies: ./scripts/requirements.txt
- First-time setup: ./scripts/agentmail.sh setup
- Create your own mailbox (see pattern below)
AgentMail provides disposable email inboxes for AI agents. You create a mailbox, use the address in signup forms, then poll for incoming verification emails and extract OTP codes or links.
The Pattern (Validated on Notion Signup)
1. Create mailbox: ./scripts/agentmail.sh create <username>
2. Fill signup form: browser_fill(ref, "username@agentmail.to")
3. Submit form: browser_click(ref)
4. Poll for email: ./scripts/agentmail.sh poll username@agentmail.to --timeout 120
5. Extract OTP/link: ./scripts/agentmail.sh extract <inbox_id> <msg_id>
6. Enter OTP: browser_fill(ref, "123456")
7. Submit: browser_click(ref)
Gotchas
- Emails take 5-30 seconds to arrive. Always poll with timeout.
- Some services detect agentmail.to domain -- have backup strategy.
- OTP codes expire. Extract and submit promptly after polling.
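A minimal sketch of the poll-then-extract step, assuming a six-digit numeric OTP in the message body. fetch_latest_body is a hypothetical callable standing in for whatever reads the inbox (for example, a wrapper around ./scripts/agentmail.sh poll); it is not the actual mailbox.py API.

```python
import re
import time
from typing import Callable, Optional

# Assumption: the OTP is a standalone run of exactly six digits.
OTP_PATTERN = re.compile(r"\b(\d{6})\b")


def extract_otp(body: str) -> Optional[str]:
    """Return the first six-digit code in `body`, or None if there is none."""
    match = OTP_PATTERN.search(body)
    return match.group(1) if match else None


def poll_for_otp(
    fetch_latest_body: Callable[[], Optional[str]],
    timeout: float = 120.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until a message containing an OTP arrives or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_latest_body()
        otp = extract_otp(body) if body else None
        if otp:
            return otp
        if time.monotonic() >= deadline:
            raise TimeoutError("no verification email arrived before the timeout")
        sleep(interval)


# Simulated inbox: empty twice, then the verification email shows up.
responses = iter([None, None, "Your verification code is 482913."])
code = poll_for_otp(lambda: next(responses), sleep=lambda seconds: None)
print(code)
```

Returning as soon as a code is found matters because OTP codes expire: the caller can go straight to browser_fill.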
Validated Flows
- Notion signup: Full end-to-end -- signup, OTP poll, extract, submit, onboarding, page creation.
- PKP forum: Email verification worked. Blocked by moderator approval gate (external).
Session Rules
CRITICAL: No parallel browser sessions.
- All tools share one browser daemon per session
- Parallel usage causes state collisions (one action navigates, another loses its page)
- Run browser tasks SEQUENTIALLY. Always.
- AGENT_BROWSER_SESSION env var controls session name (default: "mcp")
- Per-session isolation is NOT yet implemented
Always close the browser when done:
browser_close() -- releases the session for the next task
Forgetting to close leaves an orphaned Chromium process.
Stealth Configuration
Layer 1 provides basic stealth via environment variables. All browser sessions can run with headed mode, custom UA, persistent profile, and aut