Dmytro Stanchiev 036278bab4 chore: add agent-browser skills

Signed-off-by: Dmytro Stanchiev <git@dmytros.dev>

2026-04-21 12:23:35 -04:00

Command Reference

Complete reference for all agent-browser commands. For quick start and common patterns, see SKILL.md.

Navigation

agent-browser open            # Launch browser (no navigation); stays on about:blank.
                              # Pair with `network route`, `cookies set --curl`, or
                              # `addinitscript` to stage state before the first navigation.
agent-browser open <url>      # Launch + navigate (aliases: goto, navigate)
                              # Supports: https://, http://, file://, about:, data://
                              # Auto-prepends https:// if no protocol given
agent-browser back            # Go back
agent-browser forward         # Go forward
agent-browser reload          # Reload page
agent-browser pushstate <url> # SPA client-side navigation. Auto-detects
                              # window.next.router.push (triggers RSC fetch on Next.js);
                              # falls back to history.pushState + popstate/navigate events.
agent-browser close           # Close browser (aliases: quit, exit)
agent-browser connect 9222    # Connect to browser via CDP port

Pre-navigation setup (one-turn batch)

agent-browser batch \
  '["open"]' \
  '["network","route","*","--abort","--resource-type","script"]' \
  '["cookies","set","--curl","cookies.curl","--domain","localhost"]' \
  '["navigate","http://localhost:3000/target"]'

Running open with no URL gives you a clean launch, so any interception rules, cookies, or init scripts you register take effect on the first real navigation. Use this for SSR-only debugging (--resource-type script), protected-origin auth, or capturing fresh react suspense / vitals state without noise from a prior page.
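The SSR-only case can be wrapped in a small shell function. This is a minimal sketch that reuses only the batch steps shown above. `AB_BIN` and `ssr_only_open` are hypothetical names introduced for illustration and are not part of agent-browser.

```shell
# AB_BIN is a hypothetical override so the binary can be swapped out
# (for example with `echo` for a dry run); it is not an agent-browser setting.
AB_BIN="${AB_BIN:-agent-browser}"

# Launch on about:blank, block every script request, then navigate,
# so the first real page load shows server-rendered HTML only.
# The URL is pasted into a JSON array as-is, so it must not contain
# double quotes or backslashes.
ssr_only_open() {
  url="$1"
  "$AB_BIN" batch \
    '["open"]' \
    '["network","route","*","--abort","--resource-type","script"]' \
    "[\"navigate\",\"$url\"]"
}
```

Calling `AB_BIN=echo ssr_only_open http://localhost:3000/target` prints the batch invocation without launching a browser, which is a cheap way to check the quoting.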

Snapshot (page analysis)

agent-browser snapshot            # Full accessibility tree
agent-browser snapshot -i         # Interactive elements only (recommended)
agent-browser snapshot -c         # Compact output
agent-browser snapshot -d 3       # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector

Interactions (use @refs from snapshot)

agent-browser click @e1           # Click
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser dblclick @e1        # Double-click
agent-browser focus @e1           # Focus element
agent-browser fill @e2 "text"     # Clear and type
agent-browser type @e2 "text"     # Type without clearing
agent-browser press Enter         # Press key (alias: key)
agent-browser press Control+a     # Key combination
agent-browser keydown Shift       # Hold key down
agent-browser keyup Shift         # Release key
agent-browser hover @e1           # Hover
agent-browser check @e1           # Check checkbox
agent-browser uncheck @e1         # Uncheck checkbox
agent-browser select @e1 "value"  # Select dropdown option
agent-browser select @e1 "a" "b"  # Select multiple options
agent-browser scroll down 500     # Scroll page (default: down 300px)
agent-browser scrollintoview @e1  # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2        # Drag and drop
agent-browser upload @e1 file.pdf # Upload files

Get Information

agent-browser get text @e1        # Get element text
agent-browser get html @e1        # Get innerHTML
agent-browser get value @e1       # Get input value
agent-browser get attr @e1 href   # Get attribute
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser get cdp-url         # Get CDP WebSocket URL
agent-browser get count ".item"   # Count matching elements
agent-browser get box @e1         # Get bounding box
agent-browser get styles @e1      # Get computed styles (font, color, bg, etc.)

Check State

agent-browser is visible @e1      # Check if visible
agent-browser is enabled @e1      # Check if enabled
agent-browser is checked @e1      # Check if checked

Screenshots and PDF

agent-browser screenshot          # Save to temporary directory
agent-browser screenshot path.png # Save to specific path
agent-browser screenshot --full   # Full page
agent-browser pdf output.pdf      # Save as PDF

Video Recording

agent-browser record start ./demo.webm    # Start recording
agent-browser click @e1                   # Perform actions
agent-browser record stop                 # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new

Wait

agent-browser wait @e1                     # Wait for element
agent-browser wait 2000                    # Wait milliseconds
agent-browser wait --text "Success"        # Wait for text (or -t)
agent-browser wait --url "**/dashboard"    # Wait for URL pattern (or -u)
agent-browser wait --load networkidle      # Wait for network idle (or -l)
agent-browser wait --fn "window.ready"     # Wait for JS condition (or -f)

Mouse Control

agent-browser mouse move 100 200      # Move mouse
agent-browser mouse down left         # Press button
agent-browser mouse up left           # Release button
agent-browser mouse wheel 100         # Scroll wheel

Semantic Locators (alternative to refs)

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact      # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hover

Browser Settings

agent-browser set viewport 1920 1080          # Set viewport size
agent-browser set viewport 1920 1080 2        # 2x retina (same CSS size, higher res screenshots)
agent-browser set device "iPhone 14"          # Emulate device
agent-browser set geo 37.7749 -122.4194       # Set geolocation (alias: geolocation)
agent-browser set offline on                  # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}'     # Extra HTTP headers
agent-browser set credentials user pass       # HTTP basic auth (alias: auth)
agent-browser set media dark                  # Emulate color scheme
agent-browser set media light reduced-motion  # Light mode + reduced motion

Cookies and Storage

agent-browser cookies                     # Get all cookies
agent-browser cookies set name value      # Set cookie
agent-browser cookies clear               # Clear cookies
agent-browser storage local               # Get all localStorage
agent-browser storage local key           # Get specific key
agent-browser storage local set k v       # Set value
agent-browser storage local clear         # Clear all

Network

agent-browser network route <url>              # Intercept requests
agent-browser network route <url> --abort      # Block requests
agent-browser network route <url> --body '{}'  # Mock response
agent-browser network unroute [url]            # Remove routes
agent-browser network requests                 # View tracked requests
agent-browser network requests --filter api    # Filter requests

Tabs and Windows

agent-browser tab                              # List tabs with tabId and label
agent-browser tab new [url]                    # New tab
agent-browser tab new --label docs [url]       # New tab with a memorable label
agent-browser tab t2                           # Switch to tab by id
agent-browser tab docs                         # Switch to tab by label
agent-browser tab close                        # Close current tab
agent-browser tab close t2                     # Close tab by id
agent-browser tab close docs                   # Close tab by label
agent-browser window new                       # New window

Tab ids are stable strings of the form t1, t2, t3. They're never reused within a session, so the same id keeps referring to the same tab across commands. Positional integers are not accepted — tab 2 errors with a teaching message; use t2.

User-assigned labels (docs, app, admin) are interchangeable with ids everywhere a tab ref is accepted. Labels are the agent-friendly way to write multi-tab workflows:

agent-browser tab new --label docs https://docs.example.com
agent-browser tab new --label app  https://app.example.com
agent-browser tab docs                   # switch to docs
agent-browser snapshot                   # populate refs for docs
agent-browser click @e1                  # ref click on docs
agent-browser tab app                    # switch to app
agent-browser tab close docs             # close by label

Labels are never auto-generated, never rewritten on navigation, and must be unique within a session. To interact with another tab, switch to it first: the daemon maintains a single active tab, so refs (@eN) belong to the tab that was active when the snapshot ran.
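Because refs belong to the tab that was active when the snapshot ran, one defensive habit is to take a fresh snapshot right after every tab switch. The helper below is a sketch of that habit. `AB_BIN` and `switch_and_snapshot` are hypothetical names, not agent-browser features.

```shell
# AB_BIN is a hypothetical override for the binary name (useful for dry runs).
AB_BIN="${AB_BIN:-agent-browser}"

# Switch to a tab by id (t2) or label (docs), then take a fresh
# interactive snapshot so later @eN refs belong to the now-active tab.
# The snapshot is skipped if the tab switch fails.
switch_and_snapshot() {
  "$AB_BIN" tab "$1" && "$AB_BIN" snapshot -i
}
```

After `switch_and_snapshot docs`, commands such as `agent-browser click @e1` refer to elements in the docs tab.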

Frames

agent-browser frame "#iframe"     # Switch to iframe by CSS selector
agent-browser frame @e3           # Switch to iframe by element ref
agent-browser frame main          # Back to main frame

Iframe support

Iframes are detected automatically during snapshots. When the main-frame snapshot runs, Iframe nodes are resolved and their content is inlined beneath the iframe element in the output (one level of nesting; iframes within iframes are not expanded).

agent-browser snapshot -i
# @e3 [Iframe] "payment-frame"
#   @e4 [input] "Card number"
#   @e5 [button] "Pay"

# Interact directly — refs inside iframes already work
agent-browser fill @e4 "4111111111111111"
agent-browser click @e5

# Or switch frame context for scoped snapshots
agent-browser frame @e3               # Switch using element ref
agent-browser snapshot -i             # Snapshot scoped to that iframe
agent-browser frame main              # Return to main frame

The frame command accepts:

Element refs — frame @e3 resolves the ref to an iframe element
CSS selectors — frame "#payment-iframe" finds the iframe by selector
Frame name/URL — matches against the browser's frame tree

Dialogs

By default, alert and beforeunload dialogs are automatically accepted so they never block the agent. confirm and prompt dialogs still require explicit handling. Use --no-auto-dialog to disable this behavior.

agent-browser dialog accept [text]  # Accept dialog
agent-browser dialog dismiss        # Dismiss dialog
agent-browser dialog status         # Check if a dialog is currently open

JavaScript

agent-browser eval "document.title"          # Simple expressions only
agent-browser eval -b "<base64>"             # Any JavaScript (base64 encoded)
agent-browser eval --stdin                   # Read script from stdin

Use -b/--base64 or --stdin for reliable execution. Shell escaping with nested quotes and special characters is error-prone.

# Base64 encode your script, then:
agent-browser eval -b "ZG9jdW1lbnQucXVlcnlTZWxlY3RvcignW3NyYyo9Il9uZXh0Il0nKQ=="

# Or use stdin with heredoc for multiline scripts:
cat <<'EOF' | agent-browser eval --stdin
const links = document.querySelectorAll('a');
Array.from(links).map(a => a.href);
EOF
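The base64 argument can be produced with the standard `base64` utility. The sketch below assumes `base64` and `tr` are on the PATH. `AB_BIN`, `encode_js`, and `eval_b64` are hypothetical helper names used only for illustration.

```shell
# AB_BIN is a hypothetical override for the binary name (useful for dry runs).
AB_BIN="${AB_BIN:-agent-browser}"

# Base64-encode a script read from stdin. Newlines are stripped because
# some base64 implementations wrap long output across lines.
encode_js() {
  base64 | tr -d '\n'
}

# Encode a script passed as a single argument and hand it to `eval -b`.
# printf '%s' avoids appending a trailing newline to the script.
eval_b64() {
  "$AB_BIN" eval -b "$(printf '%s' "$1" | encode_js)"
}
```

For instance, `eval_b64 "document.querySelector('[src*=\"_next\"]')"` passes the same encoded string as the `eval -b` example above, with only one layer of shell quoting to get right.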

State Management

agent-browser state save auth.json    # Save cookies, storage, auth state
agent-browser state load auth.json    # Restore saved state

Global Options

agent-browser --session <name> ...    # Isolated browser session
agent-browser --json ...              # JSON output for parsing
agent-browser --headed ...            # Show browser window (not headless)
agent-browser --full ...              # Full page screenshot (-f)
agent-browser --cdp <port> ...        # Connect via Chrome DevTools Protocol
agent-browser -p <provider> ...       # Cloud browser provider (--provider)
agent-browser --proxy <url> ...       # Use proxy server
agent-browser --proxy-bypass <hosts>  # Hosts to bypass proxy
agent-browser --headers <json> ...    # HTTP headers scoped to URL's origin
agent-browser --executable-path <p>   # Custom browser executable
agent-browser --extension <path> ...  # Load browser extension (repeatable)
agent-browser --ignore-https-errors   # Ignore SSL certificate errors
agent-browser --help                  # Show help (-h)
agent-browser --version               # Show version (-V)
agent-browser <command> --help        # Show detailed help for a command

Debugging

agent-browser --headed open example.com   # Show browser window
agent-browser --cdp 9222 snapshot         # Connect via CDP port
agent-browser connect 9222                # Alternative: connect command
agent-browser console                     # View console messages
agent-browser console --clear             # Clear console
agent-browser errors                      # View page errors
agent-browser errors --clear              # Clear errors
agent-browser highlight @e1               # Highlight element
agent-browser inspect                     # Open Chrome DevTools for this session
agent-browser trace start                 # Start recording trace
agent-browser trace stop trace.zip        # Stop and save trace
agent-browser profiler start              # Start Chrome DevTools profiling
agent-browser profiler stop trace.json    # Stop and save profile

React / Web Vitals

The react ... commands require --enable react-devtools at launch. vitals and pushstate are framework-agnostic and work without it.

agent-browser open --enable react-devtools <url>    # Launch with React hook installed
agent-browser react tree                            # Full component tree
agent-browser react inspect <fiberId>               # Props, hooks, state, source
agent-browser react renders start                   # Begin re-render recording
agent-browser react renders stop [--json]           # Stop and print render profile
agent-browser react suspense [--only-dynamic] [--json]  # Suspense boundaries + classifier
                                                         # --only-dynamic hides the "static" list
agent-browser vitals [url] [--json]                 # LCP/CLS/TTFB/FCP/INP + hydration
agent-browser pushstate <url>                       # SPA client-side nav (auto-detects Next router)
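A typical use of the render recorder is to bracket a single interaction. The sketch below chains commands listed above and assumes the browser was already launched with `open --enable react-devtools <url>`. `AB_BIN` and `profile_click` are hypothetical names.

```shell
# AB_BIN is a hypothetical override for the binary name (useful for dry runs).
AB_BIN="${AB_BIN:-agent-browser}"

# Record the re-renders triggered by one click on the given ref and
# print the render profile as JSON. Each step runs only if the
# previous one succeeded.
profile_click() {
  "$AB_BIN" react renders start &&
    "$AB_BIN" click "$1" &&
    "$AB_BIN" react renders stop --json
}
```

`profile_click @e1` would then report the renders caused by clicking `@e1`, where the ref comes from a prior snapshot.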

Init scripts

agent-browser open --init-script <path>             # Register before first navigation (repeatable)
agent-browser addinitscript <js>                    # Register at runtime (returns identifier)
agent-browser removeinitscript <identifier>         # Remove a previously registered init script

cURL cookie import

agent-browser cookies set --curl <file>                             # Auto-detects JSON/cURL/Cookie-header
agent-browser cookies set --curl <file> --domain example.com        # Scope to a domain

Supported formats: JSON array of {name, value}, a cURL dump from DevTools -> Network -> Copy as cURL, or a bare Cookie header. Errors never echo cookie values.
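As an illustration of the JSON format, the sketch below writes a one-element array of {name, value} to a temporary file and imports it. It assumes `mktemp` is available and that cookie names and values contain no double quotes or backslashes. `AB_BIN` and `import_cookie` are hypothetical names.

```shell
# AB_BIN is a hypothetical override for the binary name (useful for dry runs).
AB_BIN="${AB_BIN:-agent-browser}"

# Write a single cookie as a JSON array of {name, value}, import it
# scoped to a domain, then delete the temporary file.
import_cookie() {
  name="$1"
  value="$2"
  domain="$3"
  file="$(mktemp)" || return 1
  printf '[{"name":"%s","value":"%s"}]\n' "$name" "$value" > "$file"
  "$AB_BIN" cookies set --curl "$file" --domain "$domain"
  status=$?
  rm -f "$file"
  return "$status"
}
```

A call such as `import_cookie session_id abc123 localhost` pairs naturally with a bare `agent-browser open`, so the cookie is in place before the first navigation.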

Network route by resource type

agent-browser network route '*' --abort --resource-type script       # Block scripts only (SSR-lock pattern)
agent-browser network route '*' --resource-type image,font --body '' # Stub images and fonts

Environment Variables

AGENT_BROWSER_SESSION="mysession"            # Default session name
AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
AGENT_BROWSER_EXTENSIONS="/ext1,/ext2"       # Comma-separated extension paths
AGENT_BROWSER_INIT_SCRIPTS="/a.js,/b.js"     # Comma-separated init script paths
AGENT_BROWSER_ENABLE="react-devtools"        # Comma-separated built-in init script features
AGENT_BROWSER_PROVIDER="browserbase"         # Cloud browser provider
AGENT_BROWSER_STREAM_PORT="9223"             # Override WebSocket streaming port (default: OS-assigned)
AGENT_BROWSER_HOME="/path/to/agent-browser"  # Custom install location
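These variables can also be set per invocation instead of being exported for the whole shell. The sketch below shows that pattern for the session name. `AB_BIN` and `in_session` are hypothetical names, and the sketch assumes AGENT_BROWSER_SESSION is read on each invocation.

```shell
# AB_BIN is a hypothetical override for the binary name (useful for dry runs).
AB_BIN="${AB_BIN:-agent-browser}"

# Run one agent-browser command with the session name supplied through
# the environment for that invocation only.
in_session() {
  session="$1"
  shift
  AGENT_BROWSER_SESSION="$session" "$AB_BIN" "$@"
}
```

With this helper, `in_session ci get url` runs `agent-browser get url` with AGENT_BROWSER_SESSION set to `ci`.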
