Browser Ops

Browser automation via agent-browser: 25 tools wrapping Playwright for navigation, interaction, observation, and session management. Validated on two benchmark suites: 12/15 on a 15-task suite (100% once tasks stopped by external blockers are excluded) and 9/10 on a 10-task progressive suite. Standout: end-to-end Notion signup with AgentMail OTP verification.

Terminology used in this file:

Playwright: A browser automation framework that lets tools control Chromium/Chrome.
a11y tree: The accessibility tree (screen-reader-friendly page structure) used by browser_snapshot.
DOM: Document Object Model, the browser's structured representation of page elements.
CSS selector: A rule for targeting specific DOM elements (for example .price or #submit).
OAuth: A standard login/authorization flow that redirects through an identity provider (for example, "Sign in with GitHub").

Setup

npm install -g @anthropic-ai/agent-browser
agent-browser start

Claude Code: copy this skill folder into .claude/skills/browser-ops/
Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the browser-ops skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Quick Start

The simplest possible browser flow: navigate, inspect, capture.

browser_navigate(url="https://example.com")
browser_snapshot(mode="interactive")
browser_screenshot(path="/tmp/example.png")
browser_close()

Decision Tree: Browser vs Other Tools

Ask this FIRST. Getting it wrong wastes significant token budget.

Need data from the web?
  |
  +-- Is it static content? (prices, articles, search results, public data)
  |     YES --> Use WebSearch / WebFetch (built-in tools)
  |             ~100 tokens. No browser overhead.
  |
  +-- Does it require interaction? (login, form fill, click sequences, session state)
  |     YES --> Use browser tools
  |
  +-- Does it require email verification?
  |     YES --> Use browser + AgentMail (see Email Verification section)
  |
  +-- Is the target known to block bots? (Cloudflare-protected, etc.)
        YES --> Check references/failure-log.md before starting.
              May need stealth config or alternative approach.

Rule of thumb: If you can get the data with curl, you don't need a browser.
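
The decision tree can be read as a small routing function. The sketch below is illustrative only: the `Task` fields and the returned labels are hypothetical names, and the order in which the branches are checked (bot blocking first, then email, then interaction) is an assumption, since the tree itself does not rank overlapping cases.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a web task; field names are illustrative."""
    needs_interaction: bool = False
    needs_email_verification: bool = False
    known_bot_blocker: bool = False


def choose_tool(task: Task) -> str:
    """Return a label for the tool family the decision tree points to."""
    if task.known_bot_blocker:
        # Check references/failure-log.md before spending any tokens.
        return "check-failure-log"
    if task.needs_email_verification:
        return "browser+agentmail"
    if task.needs_interaction:
        return "browser"
    # Static content: WebSearch / WebFetch is enough.
    return "websearch-webfetch"
```

A task with no flags set falls through to the static-content branch, which matches the rule of thumb: if curl could fetch it, skip the browser.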

Core Workflow

Every browser task follows this loop:

1. browser_navigate(url)                       -- go to the page
2. browser_snapshot(mode='interactive')        -- get refs (@e1, @e2...)
3. Identify target ref from snapshot           -- find the button/input/link
4. browser_click(@ref) / browser_fill(@ref, text) -- act
5. browser_snapshot(mode='interactive')        -- verify result
6. Repeat 3-5 until done
7. browser_close()                             -- ALWAYS close when done

The ref system: Each snapshot returns element references such as @e1 and @e2. Pass these refs to click/fill/type. Refs are stable within a page state but reset after navigation, so take a fresh snapshot before acting on a new page.
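
Step 3 of the loop (finding the target ref) can be sketched as a text search over the snapshot. The line format assumed here, `- role "name" [ref=eN]`, is an assumption for illustration and may not match the actual snapshot output; `find_ref` is a hypothetical helper, not one of the 25 tools.

```python
import re
from typing import Optional

# ASSUMPTION: each snapshot line looks like: - button "Sign in" [ref=e2]
# The real snapshot format may differ; adjust the pattern to match it.
LINE_RE = re.compile(
    r'^\s*-\s*(?P<role>\w+)\s+"(?P<name>[^"]*)".*\[ref=(?P<ref>e\d+)\]'
)


def find_ref(snapshot: str, role: str, name: str) -> Optional[str]:
    """Return '@eN' for the first element matching role and name, else None."""
    for line in snapshot.splitlines():
        m = LINE_RE.match(line)
        if m and m["role"] == role and name.lower() in m["name"].lower():
            return "@" + m["ref"]
    return None
```

Because refs reset after navigation, a helper like this should always run against the most recent snapshot, never a cached one.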

Token Efficiency: Snapshot Modes

| Mode | Tokens/page | Shows | Use when |
|---|---|---|---|
| `interactive` | ~1,400 | Buttons, links, inputs only | Default for everything |
| `compact` | ~3,000-5,000 | Condensed full tree | Need text content + interactive |
| `full` | ~15,000 | Complete a11y tree | Last resort, known need |

Default to interactive. At ~1,400 tokens per page it is roughly 10x cheaper than full (~15,000) and sufficient for 90% of tasks.
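
To see what the mode choice costs over a multi-page task, here is a back-of-the-envelope estimate using the per-page figures from the table above. The 4,000 value for compact is an assumed midpoint of the stated range, and `estimate_tokens` is a hypothetical helper.

```python
# Approximate per-page costs from the table; compact uses an assumed midpoint.
TOKENS_PER_PAGE = {"interactive": 1_400, "compact": 4_000, "full": 15_000}


def estimate_tokens(mode: str, pages: int) -> int:
    """Rough snapshot cost for `pages` snapshots taken in the given mode."""
    return TOKENS_PER_PAGE[mode] * pages


if __name__ == "__main__":
    for mode in TOKENS_PER_PAGE:
        print(mode, estimate_tokens(mode, pages=10))
```

These are estimates for simple pages only; dense pages can cost far more per snapshot.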

Tiered Access Model

Tier 1: A11y Tree Snapshot (~1,400 tokens/page)
  browser_snapshot(mode='interactive') --> get refs --> click/fill
  For: navigation, form filling, structured page interaction
  This is your DEFAULT.

Tier 2: Screenshot + VLM (0 API tokens) [EXPERIMENTAL]
  browser_screenshot() --> local VLM (Qwen3-VL-2B / UI-TARS-1.5-7B)
  For: visual-only content, CAPTCHAs, pages where a11y tree misses data

Tier 3: Targeted DOM Extraction (variable tokens)
  browser_evaluate('document.querySelector(sel).textContent')
  For: known pages with known CSS selectors, JSON-LD extraction
  Use when you know EXACTLY what element contains the data.

Escalation path: Start at Tier 1. If snapshot doesn't show the data you need, try Tier 3 with a targeted selector. Only use Tier 2 when visual understanding is required.
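
The escalation path amounts to "try the cheapest tier first and stop at the first one that yields data". A minimal sketch, assuming each tier is wrapped in a callable that returns the extracted text or None; the tier labels and the `escalate` helper are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Extractor = Callable[[], Optional[str]]


def escalate(tiers: Sequence[Tuple[str, Extractor]]) -> Tuple[str, Optional[str]]:
    """Try each tier in order; return the first tier name that produced data."""
    for name, extractor in tiers:
        result = extractor()
        if result is not None:
            return name, result
    return "none", None


if __name__ == "__main__":
    # Stand-in extractors: the snapshot misses the data, the selector finds it.
    tiers = [
        ("tier1-snapshot", lambda: None),
        ("tier3-dom", lambda: "$19.99"),
        ("tier2-vlm", lambda: "not reached"),
    ]
    print(escalate(tiers))
```

Listing Tier 2 last keeps the experimental VLM path out of the way unless both cheaper tiers come back empty.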

Token Optimization for Data-Heavy Pages

For content-rich pages (HN, Reddit, forums, dashboards), the interactive snapshot balloons from ~1,400 tokens (simple pages) to ~47K tokens (dense pages). This wrecks budgets.

Pattern: Snapshot first to understand page structure, then browser_evaluate with targeted JS for bulk extraction.

1. browser_navigate(url)
2. browser_snapshot(mode='interactive')   -- understand structure (pay cost once)
3. browser_evaluate('                     -- extract data surgically
     JSON.stringify(
       [...document.querySelectorAll(".titleline a")]
         .map(a => ({title: a.textContent, href: a.href}))
     )
   ')
4. Parse JSON result -- structured data at ~200 tokens vs 47K snapshot

When to use: Any page where you need to extract 10+ items of the same type. Snapshot gives you the selector knowledge; eval gives you the data cheaply.
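
If you drive this pattern from a script, the JS expression and the result parsing can be kept in two small helpers. This is a sketch under assumptions: `build_extract_js` and `parse_links` are hypothetical names, and how the string reaches browser_evaluate depends on your harness.

```python
import json
from typing import Dict, List


def build_extract_js(selector: str) -> str:
    """Build a JS expression that returns title/href pairs as a JSON string."""
    # json.dumps quotes and escapes the selector as a valid JS string literal.
    return (
        "JSON.stringify([...document.querySelectorAll("
        + json.dumps(selector)
        + ")].map(a => ({title: a.textContent, href: a.href})))"
    )


def parse_links(raw: str) -> List[Dict[str, str]]:
    """Parse the JSON string from the page, dropping entries with blank titles."""
    return [item for item in json.loads(raw) if item.get("title", "").strip()]
```

Escaping the selector with `json.dumps` avoids broken quoting when a selector contains quotes or attribute matchers such as `a[rel="nofollow"]`.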

Email Verification (AgentMail)

For tasks requiring email verification (account signup, OTP flows).

Setup

AgentMail Python wrapper: ./scripts/mailbox.py (self-contained)
CLI wrapper: ./scripts/agentmail.sh
Dependencies: ./scripts/requirements.txt
First-time setup: ./scripts/agentmail.sh setup
Create your own mailbox (see pattern below)

AgentMail provides disposable email inboxes for AI agents. You create a mailbox, use the address in signup forms, then poll for incoming verification emails and extract OTP codes or links.

The Pattern (Validated on Notion Signup)

1. Create mailbox:     ./scripts/agentmail.sh create <username>
2. Fill signup form:   browser_fill(@ref, "<username>@agentmail.to")
3. Submit form:        browser_click(@ref)
4. Poll for email:     ./scripts/agentmail.sh poll <username>@agentmail.to --timeout 120
5. Extract OTP/link:   ./scripts/agentmail.sh extract <inbox_id> <msg_id>
6. Enter OTP:          browser_fill(@ref, "123456")
7. Submit:             browser_click(@ref)

Gotchas

Emails take 5-30 seconds to arrive. Always poll with timeout.
Some services detect the agentmail.to domain -- have a backup strategy.
OTP codes expire. Extract and submit promptly after polling.
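
If you poll from your own code rather than through the CLI wrapper, the first and third gotchas suggest a bounded polling loop followed by immediate extraction. The sketch below is generic: `fetch` stands in for whatever call returns the newest message body (it is not the mailbox.py API), and the 6-digit default is an assumption about the OTP format.

```python
import re
import time
from typing import Callable, Optional


def poll_for_message(fetch: Callable[[], Optional[str]],
                     timeout: float = 120.0,
                     interval: float = 5.0) -> Optional[str]:
    """Call fetch() until it returns a message body or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        if body is not None:
            return body
        if time.monotonic() + interval > deadline:
            return None  # give up rather than sleep past the deadline
        time.sleep(interval)


def extract_otp(body: str, digits: int = 6) -> Optional[str]:
    """Return the first standalone run of exactly `digits` digits, or None."""
    m = re.search(rf"(?<!\d)\d{{{digits}}}(?!\d)", body)
    return m.group(0) if m else None
```

The lookarounds keep the pattern from matching part of a longer number, such as an order ID in the same email.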

Validated Flows

Notion signup: Full end-to-end -- signup, OTP poll, extract, submit, onboarding, page creation.
PKP forum: Email verification worked. Blocked by moderator approval gate (external).

Session Rules

CRITICAL: No parallel browser sessions.

All tools share one browser daemon per session
Parallel usage causes state collisions (one action navigates, another loses its page)
Run browser tasks SEQUENTIALLY. Always.
AGENT_BROWSER_SESSION env var controls session name (default: "mcp")
Per-session isolation is NOT yet implemented

Always close the browser when done:

browser_close()  -- releases the session for the next task

Forgetting to close leaves an orphaned Chromium process.

Stealth Configuration

Layer 1 provides basic stealth via environment variables. All browser sessions can run with headed mode, custom UA, persistent profile, and aut
