Browser Automation

Default Active Automation

CDP Chrome automation — snapshot, click, navigate, screenshot, tabs. Control Chrome through ref-based element targeting with cli-jaw browser commands.

This skill is active by default and injected into every session. It requires cli-jaw serve to be running and Google Chrome to be installed.

Quick Reference

Skill namebrowser
CategoryAutomation
StatusActive (loaded by default)
Requirescli-jaw binary, Google Chrome
Optionalcliclick (Homebrew, for coordinate-based clicks)
Screenshots~/.cli-jaw/screenshots/
Related skillsdesktop-control, web-ai, vision-click
Skill file~/.cli-jaw/skills/browser/SKILL.md

Prerequisites

Core Workflow

All browser automation follows the same loop: take a snapshot to see interactive elements, act on them by ref ID, then snapshot again to verify the result.

snapshot --interactive act by ref or key snapshot verify

Use a fresh snapshot after navigation, reload, tab changes, or any action that substantially changes the page. Ref IDs belong to the latest usable snapshot and can go stale.

Quick Start

# Start an automation session (headless, no visible window)
cli-jaw browser start --agent

# Start an interactive browser (manual, visible)
cli-jaw browser start

# Navigate to a URL
cli-jaw browser navigate "https://example.com"

# Take an interactive snapshot (shows ref IDs)
cli-jaw browser snapshot --interactive

# Click a ref
cli-jaw browser click e3

# Type text and submit
cli-jaw browser type e5 "hello" --submit

# Take a screenshot
cli-jaw browser screenshot

Snapshot Output

The snapshot --interactive command returns a compact list of interactive elements on the page, each with a short ref ID you use in subsequent commands.

e1   link        "Gmail"
e2   link        "Images"
e3   textbox     "Search"             ← To type here: type e3 "query"
e4   button      "Google Search"      ← To click: click e4
e5   button      "I'm Feeling Lucky"

Ref IDs like e3 are short-lived and scoped to the latest snapshot. Always re-snapshot after page changes.

Command Reference

Browser Management

CommandDescription
cli-jaw browser start [--port <auto>] [--headless] [--agent]Launch Chrome. --agent = headless automation; plain = interactive visible.
cli-jaw browser stopClose the browser session.
cli-jaw browser statusReport current CDP connection state and port.
cli-jaw browser doctor [--json]Diagnose CDP/runtime ownership mismatch and orphan cleanup scope.
cli-jaw browser cleanup-runtimes [--json] [--close --force]Dry-run by default. Only closes jaw-owned orphan runtimes with --close --force.
cli-jaw browser reset [--force]Clear browser profile and screenshots. Only when explicitly requested.

Observe

CommandDescription
cli-jaw browser snapshotFull page snapshot (all nodes).
cli-jaw browser snapshot --interactiveInteractive elements only with ref IDs. Preferred for token budget.
cli-jaw browser snapshot --interactive --max-nodes 30 --jsonLimit node count and output as JSON.
cli-jaw browser screenshotCapture current viewport. Saves to ~/.cli-jaw/screenshots/.
cli-jaw browser screenshot --full-pageFull-page screenshot (scrolled).
cli-jaw browser screenshot --ref e5Screenshot a specific element by ref ID.
cli-jaw browser screenshot --clip 0 0 320 180 --jsonClip region screenshot with JSON output path.
cli-jaw browser textExtract all visible text from the page.
cli-jaw browser text --format htmlExtract page content as HTML.
cli-jaw browser get-dom --selector ".card" --max-chars 2000 --jsonGet DOM subtree matching a CSS selector.
cli-jaw browser console --json --limit 20Retrieve console log entries.
cli-jaw browser network --json --limit 20Retrieve network request log.

Act

CommandDescription
cli-jaw browser click e3Click element by ref ID.
cli-jaw browser click e3 --doubleDouble-click element.
cli-jaw browser click e3 --rightRight-click (context menu).
cli-jaw browser type e3 "hello"Type text into an input field.
cli-jaw browser type e3 "hello" --submitType text then press Enter.
cli-jaw browser press EnterPress a key (Enter, Escape, Tab, etc.).
cli-jaw browser hover e5Hover over an element.
cli-jaw browser select e7 "option1"Select an option in a dropdown.
cli-jaw browser drag e3 e5Drag from one element to another.
cli-jaw browser mouse-click 400 300Click at absolute coordinates.
cli-jaw browser mouse-click 400 300 --doubleDouble-click at coordinates.
cli-jaw browser move-mouse 400 300Move mouse to coordinates (no click).
cli-jaw browser mouse-downPress mouse button down.
cli-jaw browser mouse-up --rightRelease mouse button (optionally right).

Navigate and Inspect

CommandDescription
cli-jaw browser navigate "https://example.com"Go to a URL.
cli-jaw browser open "https://example.com"Alias for navigate.
cli-jaw browser tabsList all open tabs.
cli-jaw browser tabs --jsonList tabs as JSON.
cli-jaw browser active-tab --jsonGet active tab info as JSON.
cli-jaw browser tab-switch 2Switch to tab by index.
cli-jaw browser reloadReload the current page.
cli-jaw browser resize 1440 900Resize the viewport.
cli-jaw browser scroll --x 0 --y 1000Scroll to absolute position.
cli-jaw browser wait-for-selector ".toast-success" --timeout 30000Wait until a CSS selector appears.
cli-jaw browser wait-for-text "Dashboard" --timeout 30000Wait until specific text appears on the page.
cli-jaw browser evaluate "document.title"Evaluate JavaScript in the page context.
evaluate is a diagnostic command. Do not expose arbitrary user-provided JavaScript through higher-level vendor workflows such as web-ai.

Support Labels

SurfaceLabelNotes
Local cli-jaw browser primitivesreadyServer-backed local Chrome/CDP only
doctor and cleanup-runtimesreadyDry-run by default; close requires --force
Dashboard visible/headless start splitreadyVisible manual and headless agent modes are separate
web-ai provider workflowsbetaUse the web-ai skill and provider-specific gates
External hosted/cloud CDPdeferredDo not claim remote browser hosting support

Rules and Constraints

Common Workflows

Web Search

cli-jaw browser start --agent
cli-jaw browser navigate "https://www.google.com"
cli-jaw browser snapshot --interactive
cli-jaw browser type e3 "search query" --submit
cli-jaw browser snapshot --interactive
cli-jaw browser click e7

Form Filling

cli-jaw browser snapshot --interactive
cli-jaw browser type e1 "John Doe"
cli-jaw browser type e2 "john@example.com"
cli-jaw browser click e3           # Submit button
cli-jaw browser snapshot           # Verify submission

Read Page Content

cli-jaw browser navigate "https://news.ycombinator.com"
cli-jaw browser text               # Plain text extraction
cli-jaw browser text --format html # HTML content
cli-jaw browser snapshot --interactive

AI Web Workflows

For ChatGPT or other web-ai workflows, use the web-ai skill. The browser skill owns primitive page control; web-ai owns structured question rendering, active-tab safety, and response baseline handling.

~해줌 Examples

Natural language requests that trigger the browser skill. Use Korean or English — both work.

Example 1
"구글 검색해줌 — CLI-JAW 설치 방법"
Starts a headless browser, navigates to Google, types the search query, and returns results. Triggers browser start --agent + navigate + type --submit + snapshot.
Example 2
"이 웹사이트 스크린샷 찍어줌" / "이 페이지 캡처해줌"
Navigates to the given URL, takes a full-page screenshot, and returns the saved file path. Uses navigate + screenshot --full-page.
Example 3
"이 페이지에서 로그인 폰 채워줌 — ID: admin, PW: test1234"
Snapshots to find the login form fields, types the credentials into the username and password inputs, and submits. Uses snapshot --interactive + type + click.
Example 4
"탭 목록 보여줌" / "열 려있는 탭 중에 구글 탭으로 전환해줌"
Lists open tabs with tabs, then switches to the matching tab with tab-switch. Multi-tab management workflow.
Example 5
"브라우저 상태 확인해줌" / "Chrome 안 되면 진단해줌"
Runs browser status and browser doctor --json to check CDP connectivity, port ownership, and orphan runtime state. Diagnostic workflow.

Headless Mode

Three ways to run Chrome headless:

# Recommended for automation
cli-jaw browser start --agent

# Explicit headless for manual mode
cli-jaw browser start --headless

# Via environment variable
CHROME_HEADLESS=1 cli-jaw browser start

Use --agent for all agent automation. It avoids popping a visible browser window and is the expected default in automated sessions.

Environment Variables

VariableDescription
CHROME_HEADLESS=1Enable headless mode for manual starts.
CHROME_NO_SANDBOX=1Disable Chrome sandbox. Docker/CI only — do not use on regular machines.

The default CDP port is derived from the cli-jaw server port. Use cli-jaw browser start --port <port> only when you need an explicit override.

Standalone agbrowse Alternative

When you explicitly want to drive a single Chrome instance (e.g., keep one logged-in profile open, avoid running both cli-jaw serve and a second CDP session), the same browser commands are available through the standalone agbrowse CLI (npm install -g agbrowse). The flag surface is identical; only the binary prefix changes.

cli-jaw browser form
agbrowse form
cli-jaw browser start --agent
agbrowse start
cli-jaw browser status
agbrowse status
cli-jaw browser navigate "<url>"
agbrowse navigate "<url>"
cli-jaw browser snapshot --interactive
agbrowse snapshot --interactive
cli-jaw browser click e3
agbrowse click e3
cli-jaw browser type e5 "hello" --submit
agbrowse type e5 "hello" --submit
cli-jaw browser screenshot
agbrowse screenshot
cli-jaw browser stop
agbrowse stop
Do not run cli-jaw browser and agbrowse against the same --port simultaneously — the second start will reuse the first CDP and the persisted state files can collide.

Runtime Cleanup

# Diagnose runtime state
cli-jaw browser doctor --json

# Dry-run: see what would be cleaned
cli-jaw browser cleanup-runtimes

# Actually close orphan jaw-owned runtimes
cli-jaw browser cleanup-runtimes --close --force

cleanup-runtimes is intentionally conservative. It only acts on a durable browser-runtime-owner.json record written by jaw-owned Chrome launches. The process command line must still match the recorded PID, CDP port, and profile. Chrome helper processes containing --type= are rejected. Never use general ps output as proof that a Chrome process is safe to close.

Recovery Strategy

When things go wrong, stop and inspect state before taking the next action.

1
Snapshot fails — take a screenshot for visual inspection of what is actually on the page.
2
Ref not found — re-run snapshot --interactive. Refs go stale after page changes.
3
Async UI not ready — use wait-for-selector or wait-for-text with an appropriate timeout.
4
CDP connection fails — report the exact error, then use status. Only stop/start when that is the selected recovery path.
5
Chrome/profile is stuck — ask before reset unless the user already requested a destructive reset.
6
DOM ref unavailable — use the vision-click skill as an explicit fallback only after confirming no usable ref exists.

Troubleshooting

SymptomCauseFix
CDP connection refusedChrome not started or wrong portcli-jaw browser status, then start with the expected port
running:false with owner:jaw-ownedStale runtime metadata or dead CDPcli-jaw browser doctor, then retry browser start
Old headless jaw Chrome remainsDurable jaw-owned orphan candidatecli-jaw browser cleanup-runtimes dry-run, then --close --force
Window only opens test browserChrome singleton absorbed launchClose all Chrome windows, then use start --agent
Headless CDP not openingHeadless not requested in GUI-less envAdd --headless or use --agent
Port conflictAnother process owns the CDP portChoose a different --port
Snapshot too largePage has many nodesUse --interactive or --max-nodes to limit output

Planned Runtime Delta

The following commands are planned parity targets from the 30_browser reference design. They are not available in the current runtime unless explicitly upgraded in a later PRD.

Planned Observe and Diagnostics

cli-jaw browser console --clear --reload --duration 3000
cli-jaw browser network --reload --duration 1000
cli-jaw browser wait 2000

Planned Actions

cli-jaw browser resize 0 0 --fullscreen
cli-jaw browser scroll down
cli-jaw browser scroll up --amount 1000
wait-for <ref> is deprecated in the reference design because refs are snapshot-scoped. Prefer selector-based or text-based waits.

Related Skills

SkillRelationship
desktop-controlUnified desktop + browser automation. Routes DOM targets to CDP (browser skill), desktop apps to Computer Use.
web-aiAI-powered web workflows (e.g., ChatGPT). Uses browser primitives but adds structured question rendering and response handling.
vision-clickVision-based click targeting. Fallback for non-DOM elements (Canvas, WebGL, cross-origin iframes).
screen-captureScreen capture and recording. Complements browser screenshots with system-level capture.

Notes