Browser Automation
CDP Chrome automation — snapshot, click, navigate, screenshot, tabs. Control Chrome through ref-based element targeting with cli-jaw browser commands.
cli-jaw serve to be running and Google Chrome to be installed.Quick Reference
| Skill name | browser |
| Category | Automation |
| Status | Active (loaded by default) |
| Requires | cli-jaw binary, Google Chrome |
| Optional | cliclick (Homebrew, for coordinate-based clicks) |
| Screenshots | ~/.cli-jaw/screenshots/ |
| Related skills | desktop-control, web-ai, vision-click |
| Skill file | ~/.cli-jaw/skills/browser/SKILL.md |
Prerequisites
cli-jaw servemust be running — the browser skill communicates through the cli-jaw server.- Google Chrome must be installed on the machine.
playwright-coremust be installed in the cli-jaw project (used for CDP communication).- Optional:
brew install cliclickfor coordinate-based mouse operations.
Core Workflow
All browser automation follows the same loop: take a snapshot to see interactive elements, act on them by ref ID, then snapshot again to verify the result.
Use a fresh snapshot after navigation, reload, tab changes, or any action that substantially changes the page. Ref IDs belong to the latest usable snapshot and can go stale.
Quick Start
# Start an automation session (headless, no visible window)
cli-jaw browser start --agent
# Start an interactive browser (manual, visible)
cli-jaw browser start
# Navigate to a URL
cli-jaw browser navigate "https://example.com"
# Take an interactive snapshot (shows ref IDs)
cli-jaw browser snapshot --interactive
# Click a ref
cli-jaw browser click e3
# Type text and submit
cli-jaw browser type e5 "hello" --submit
# Take a screenshot
cli-jaw browser screenshot
Snapshot Output
The snapshot --interactive command returns a compact list of interactive elements on the page, each with a short ref ID you use in subsequent commands.
e2 link "Images"
e3 textbox "Search" ← To type here: type e3 "query"
e4 button "Google Search" ← To click: click e4
e5 button "I'm Feeling Lucky"
Ref IDs like e3 are short-lived and scoped to the latest snapshot. Always re-snapshot after page changes.
Command Reference
Browser Management
| Command | Description |
|---|---|
cli-jaw browser start [--port <auto>] [--headless] [--agent] | Launch Chrome. --agent = headless automation; plain = interactive visible. |
cli-jaw browser stop | Close the browser session. |
cli-jaw browser status | Report current CDP connection state and port. |
cli-jaw browser doctor [--json] | Diagnose CDP/runtime ownership mismatch and orphan cleanup scope. |
cli-jaw browser cleanup-runtimes [--json] [--close --force] | Dry-run by default. Only closes jaw-owned orphan runtimes with --close --force. |
cli-jaw browser reset [--force] | Clear browser profile and screenshots. Only when explicitly requested. |
Observe
| Command | Description |
|---|---|
cli-jaw browser snapshot | Full page snapshot (all nodes). |
cli-jaw browser snapshot --interactive | Interactive elements only with ref IDs. Preferred for token budget. |
cli-jaw browser snapshot --interactive --max-nodes 30 --json | Limit node count and output as JSON. |
cli-jaw browser screenshot | Capture current viewport. Saves to ~/.cli-jaw/screenshots/. |
cli-jaw browser screenshot --full-page | Full-page screenshot (scrolled). |
cli-jaw browser screenshot --ref e5 | Screenshot a specific element by ref ID. |
cli-jaw browser screenshot --clip 0 0 320 180 --json | Clip region screenshot with JSON output path. |
cli-jaw browser text | Extract all visible text from the page. |
cli-jaw browser text --format html | Extract page content as HTML. |
cli-jaw browser get-dom --selector ".card" --max-chars 2000 --json | Get DOM subtree matching a CSS selector. |
cli-jaw browser console --json --limit 20 | Retrieve console log entries. |
cli-jaw browser network --json --limit 20 | Retrieve network request log. |
Act
| Command | Description |
|---|---|
cli-jaw browser click e3 | Click element by ref ID. |
cli-jaw browser click e3 --double | Double-click element. |
cli-jaw browser click e3 --right | Right-click (context menu). |
cli-jaw browser type e3 "hello" | Type text into an input field. |
cli-jaw browser type e3 "hello" --submit | Type text then press Enter. |
cli-jaw browser press Enter | Press a key (Enter, Escape, Tab, etc.). |
cli-jaw browser hover e5 | Hover over an element. |
cli-jaw browser select e7 "option1" | Select an option in a dropdown. |
cli-jaw browser drag e3 e5 | Drag from one element to another. |
cli-jaw browser mouse-click 400 300 | Click at absolute coordinates. |
cli-jaw browser mouse-click 400 300 --double | Double-click at coordinates. |
cli-jaw browser move-mouse 400 300 | Move mouse to coordinates (no click). |
cli-jaw browser mouse-down | Press mouse button down. |
cli-jaw browser mouse-up --right | Release mouse button (optionally right). |
Navigate and Inspect
| Command | Description |
|---|---|
cli-jaw browser navigate "https://example.com" | Go to a URL. |
cli-jaw browser open "https://example.com" | Alias for navigate. |
cli-jaw browser tabs | List all open tabs. |
cli-jaw browser tabs --json | List tabs as JSON. |
cli-jaw browser active-tab --json | Get active tab info as JSON. |
cli-jaw browser tab-switch 2 | Switch to tab by index. |
cli-jaw browser reload | Reload the current page. |
cli-jaw browser resize 1440 900 | Resize the viewport. |
cli-jaw browser scroll --x 0 --y 1000 | Scroll to absolute position. |
cli-jaw browser wait-for-selector ".toast-success" --timeout 30000 | Wait until a CSS selector appears. |
cli-jaw browser wait-for-text "Dashboard" --timeout 30000 | Wait until specific text appears on the page. |
cli-jaw browser evaluate "document.title" | Evaluate JavaScript in the page context. |
evaluate is a diagnostic command. Do not expose arbitrary user-provided JavaScript through higher-level vendor workflows such as web-ai.Support Labels
| Surface | Label | Notes |
|---|---|---|
Local cli-jaw browser primitives | ready | Server-backed local Chrome/CDP only |
doctor and cleanup-runtimes | ready | Dry-run by default; close requires --force |
| Dashboard visible/headless start split | ready | Visible manual and headless agent modes are separate |
| web-ai provider workflows | beta | Use the web-ai skill and provider-specific gates |
| External hosted/cloud CDP | deferred | Do not claim remote browser hosting support |
Rules and Constraints
- Ref IDs are ephemeral. They are scoped to the latest snapshot. Always re-run
snapshot --interactiveafter navigation, reload, tab changes, or any action that changes the page. - Prefer
--interactivefor snapshots to save token budget — full snapshots include non-interactive nodes and can be large. - Use
--agentfor automation sessions. It starts headless and avoids popping a visible browser window. - Never close Chrome processes found via general
psoutput. Only usecleanup-runtimes --close --forcewhich validates jaw-owned processes via durable runtime records. - Ask before
reset. Theresetcommand clears the browser profile and screenshots. Only use when the user explicitly requests it. evaluateis for diagnostics only. Do not pipe arbitrary user-provided JavaScript throughevaluatein higher-level workflows.- Non-DOM elements (Canvas, WebGL, cross-origin iframes, custom UI) should fall back to the
vision-clickskill only after confirming no usable ref exists.
Common Workflows
Web Search
cli-jaw browser start --agent
cli-jaw browser navigate "https://www.google.com"
cli-jaw browser snapshot --interactive
cli-jaw browser type e3 "search query" --submit
cli-jaw browser snapshot --interactive
cli-jaw browser click e7
Form Filling
cli-jaw browser snapshot --interactive
cli-jaw browser type e1 "John Doe"
cli-jaw browser type e2 "john@example.com"
cli-jaw browser click e3 # Submit button
cli-jaw browser snapshot # Verify submission
Read Page Content
cli-jaw browser navigate "https://news.ycombinator.com"
cli-jaw browser text # Plain text extraction
cli-jaw browser text --format html # HTML content
cli-jaw browser snapshot --interactive
AI Web Workflows
For ChatGPT or other web-ai workflows, use the web-ai skill. The browser skill owns primitive page control; web-ai owns structured question rendering, active-tab safety, and response baseline handling.
~해줌 Examples
Natural language requests that trigger the browser skill. Use Korean or English — both work.
"구글 검색해줌 — CLI-JAW 설치 방법"
browser start --agent + navigate + type --submit + snapshot."이 웹사이트 스크린샷 찍어줌" / "이 페이지 캡처해줌"
navigate + screenshot --full-page."이 페이지에서 로그인 폰 채워줌 — ID: admin, PW: test1234"
snapshot --interactive + type + click."탭 목록 보여줌" / "열 려있는 탭 중에 구글 탭으로 전환해줌"
tabs, then switches to the matching tab with tab-switch. Multi-tab management workflow."브라우저 상태 확인해줌" / "Chrome 안 되면 진단해줌"
browser status and browser doctor --json to check CDP connectivity, port ownership, and orphan runtime state. Diagnostic workflow.Headless Mode
Three ways to run Chrome headless:
# Recommended for automation
cli-jaw browser start --agent
# Explicit headless for manual mode
cli-jaw browser start --headless
# Via environment variable
CHROME_HEADLESS=1 cli-jaw browser start
Use --agent for all agent automation. It avoids popping a visible browser window and is the expected default in automated sessions.
Environment Variables
| Variable | Description |
|---|---|
CHROME_HEADLESS=1 | Enable headless mode for manual starts. |
CHROME_NO_SANDBOX=1 | Disable Chrome sandbox. Docker/CI only — do not use on regular machines. |
The default CDP port is derived from the cli-jaw server port. Use cli-jaw browser start --port <port> only when you need an explicit override.
Standalone agbrowse Alternative
When you explicitly want to drive a single Chrome instance (e.g., keep one logged-in profile open, avoid running both cli-jaw serve and a second CDP session), the same browser commands are available through the standalone agbrowse CLI (npm install -g agbrowse). The flag surface is identical; only the binary prefix changes.
cli-jaw browser and agbrowse against the same --port simultaneously — the second start will reuse the first CDP and the persisted state files can collide.Runtime Cleanup
# Diagnose runtime state
cli-jaw browser doctor --json
# Dry-run: see what would be cleaned
cli-jaw browser cleanup-runtimes
# Actually close orphan jaw-owned runtimes
cli-jaw browser cleanup-runtimes --close --force
cleanup-runtimes is intentionally conservative. It only acts on a durable browser-runtime-owner.json record written by jaw-owned Chrome launches. The process command line must still match the recorded PID, CDP port, and profile. Chrome helper processes containing --type= are rejected. Never use general ps output as proof that a Chrome process is safe to close.
Recovery Strategy
When things go wrong, stop and inspect state before taking the next action.
screenshot for visual inspection of what is actually on the page.snapshot --interactive. Refs go stale after page changes.wait-for-selector or wait-for-text with an appropriate timeout.status. Only stop/start when that is the selected recovery path.reset unless the user already requested a destructive reset.vision-click skill as an explicit fallback only after confirming no usable ref exists.Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| CDP connection refused | Chrome not started or wrong port | cli-jaw browser status, then start with the expected port |
running:false with owner:jaw-owned | Stale runtime metadata or dead CDP | cli-jaw browser doctor, then retry browser start |
| Old headless jaw Chrome remains | Durable jaw-owned orphan candidate | cli-jaw browser cleanup-runtimes dry-run, then --close --force |
| Window only opens test browser | Chrome singleton absorbed launch | Close all Chrome windows, then use start --agent |
| Headless CDP not opening | Headless not requested in GUI-less env | Add --headless or use --agent |
| Port conflict | Another process owns the CDP port | Choose a different --port |
| Snapshot too large | Page has many nodes | Use --interactive or --max-nodes to limit output |
Planned Runtime Delta
The following commands are planned parity targets from the 30_browser reference design. They are not available in the current runtime unless explicitly upgraded in a later PRD.
Planned Observe and Diagnostics
cli-jaw browser console --clear --reload --duration 3000
cli-jaw browser network --reload --duration 1000
cli-jaw browser wait 2000
Planned Actions
cli-jaw browser resize 0 0 --fullscreen
cli-jaw browser scroll down
cli-jaw browser scroll up --amount 1000
wait-for <ref> is deprecated in the reference design because refs are snapshot-scoped. Prefer selector-based or text-based waits.Related Skills
| Skill | Relationship |
|---|---|
desktop-control | Unified desktop + browser automation. Routes DOM targets to CDP (browser skill), desktop apps to Computer Use. |
web-ai | AI-powered web workflows (e.g., ChatGPT). Uses browser primitives but adds structured question rendering and response handling. |
vision-click | Vision-based click targeting. Fallback for non-DOM elements (Canvas, WebGL, cross-origin iframes). |
screen-capture | Screen capture and recording. Complements browser screenshots with system-level capture. |
Notes
- Ref IDs are short-lived and should be treated as latest-snapshot scoped.
- Always re-run
snapshot --interactiveafter navigation or major page changes. - Prefer
--interactivefor token budget. - Screenshots save to
~/.cli-jaw/screenshots/. start --agentshould be the default for agent automation.- Non-DOM elements such as Canvas, WebGL, cross-origin iframes, and custom UI should use the
vision-clickskill only as an explicit fallback.