Your agent reads the screen as stable element ids instead of pixels, verifies every action, and routes each one through a single safety gate. Runs locally, with any model.
Then run `clawdcursor consent --accept`. On macOS, also run `clawdcursor grant`. Done.
Same tools, two entry points. Pick one at install time.
MCP over stdio: your AI lives in your editor, which spawns clawdcursor as a subprocess. No daemon, no port.
```json
{
  "mcpServers": {
    "clawdcursor": {
      "command": "clawdcursor",
      "args": ["mcp", "--compact"]
    }
  }
}
```
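With the config above, the editor and `clawdcursor mcp --compact` exchange JSON-RPC 2.0 messages, one per line on stdin/stdout, as the MCP stdio transport specifies. The Python sketch below shows only that framing: an `initialize` handshake followed by a `tools/call` using the compact `computer` tool with `{"action": "key", "combo": "mod+s"}`. The helper names and the client info are hypothetical, and spawning the real process appears only in a comment.

```python
import json

PROTOCOL_VERSION = "2024-11-05"  # one MCP protocol revision; the server may negotiate another


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # Compact json.dumps output never contains a raw newline, which the
    # stdio transport forbids inside a message.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(req_id: int) -> dict:
    # The first request a client sends before it may call any tool.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "initialize",
        "params": {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-host", "version": "0.1"},  # hypothetical host
        },
    }


def initialized_notification() -> dict:
    # Sent after the initialize response; notifications carry no "id".
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def tool_call(req_id: int, name: str, arguments: dict) -> dict:
    # Standard MCP tools/call request.
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# A host would spawn the server roughly like this (not executed here):
#   proc = subprocess.Popen(["clawdcursor", "mcp", "--compact"],
#                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
#   proc.stdin.write(frame(initialize_request(1))); proc.stdin.flush()
#   then read one response line from proc.stdout before sending the rest.
messages = [
    frame(initialize_request(1)),
    frame(initialized_notification()),
    frame(tool_call(2, "computer", {"action": "key", "combo": "mod+s"})),
]
for m in messages:
    print(m.decode("utf-8"), end="")
```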
Daemon mode: HTTP MCP on `127.0.0.1:3847/mcp`. Run `clawdcursor doctor`, then `clawdcursor agent`, to start the built-in autonomous loop; `clawdcursor agent --no-llm` serves the tools only, for when your agent has its own brain.
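With the daemon running, any HTTP client can reach the same tools. This standard-library Python sketch builds, but does not send, a `tools/call` POST to that endpoint. It assumes the daemon follows the MCP Streamable HTTP transport: the client accepts either JSON or an SSE stream, and in a real session it would first `initialize` and then echo back the `Mcp-Session-Id` header the server returns. The session id below is a placeholder.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "http://127.0.0.1:3847/mcp"  # daemon address started by `clawdcursor agent`


def build_tool_call(req_id: int, name: str, arguments: dict,
                    session_id: Optional[str] = None) -> urllib.request.Request:
    """Build an MCP tools/call POST request without sending it."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        headers["Mcp-Session-Id"] = session_id  # issued by the server during initialize
    return urllib.request.Request(ENDPOINT, data=body, headers=headers, method="POST")


req = build_tool_call(2, "computer", {"action": "key", "combo": "mod+s"},
                      session_id="placeholder-session")
print(req.get_method(), req.full_url)
# Sending it requires a running daemon:
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.read().decode("utf-8"))
```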
A11y tree before pixels. Vision only when needed.
Fuse the accessibility (a11y) tree and OCR into one confidence-scored `el_NN` map. Act on an element by its stable id; no image bytes go to the model. Vision is the last resort.
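To make the fused map concrete, here is a hypothetical Python sketch, not clawdcursor's actual algorithm: a11y nodes enter with a fixed high confidence, OCR words are dropped when they merely repeat the label of an a11y element they sit inside, and the survivors get `el_NN` ids in reading order. The `Element` type, the duplicate rule, and the 0.95 confidence are all assumptions.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # x, y, width, height in screen pixels


@dataclass
class Element:
    id: str
    label: str
    source: str        # "a11y" or "ocr"
    confidence: float  # 0.0 .. 1.0
    box: Box


def center_inside(inner: Box, outer: Box) -> bool:
    cx, cy = inner[0] + inner[2] / 2, inner[1] + inner[3] / 2
    x, y, w, h = outer
    return x <= cx <= x + w and y <= cy <= y + h


def fuse(a11y_nodes: list[tuple[str, Box]],
         ocr_words: list[tuple[str, Box, float]],
         a11y_confidence: float = 0.95) -> dict[str, Element]:
    """Merge a11y nodes and OCR words into one id-keyed map (hypothetical scheme)."""
    elements = [Element("", label, "a11y", a11y_confidence, box) for label, box in a11y_nodes]
    for text, box, conf in ocr_words:
        # Skip OCR text already covered by an a11y element with a matching label.
        duplicate = any(
            e.source == "a11y" and center_inside(box, e.box)
            and text.lower() in e.label.lower()
            for e in elements
        )
        if not duplicate:
            elements.append(Element("", text, "ocr", conf, box))
    # Assign ids in reading order: top-to-bottom, then left-to-right.
    elements.sort(key=lambda e: (e.box[1], e.box[0]))
    for i, e in enumerate(elements, start=1):
        e.id = f"el_{i:02d}"
    return {e.id: e for e in elements}


screen = fuse(
    a11y_nodes=[("Save button", (100, 10, 60, 20)), ("File name field", (10, 50, 200, 20))],
    ocr_words=[("Save", (110, 12, 30, 16), 0.91), ("Untitled", (10, 90, 80, 16), 0.88)],
)
for el in screen.values():
    print(el.id, el.source, el.label, el.confidence)
```

The OCR word "Save" is absorbed by the "Save button" node, while "Untitled", which no a11y node covers, survives as its own OCR-sourced element.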
Add OCR when the tree is sparse. Take a screenshot only when you truly need pixels, such as for canvas-only apps or spatial reasoning.
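The escalation order can be summarized as a small decision function. This Python sketch is a hypothetical policy: the `sparse_threshold` node count and the flag names are assumptions, and clawdcursor's real heuristics are not described here.

```python
from enum import Enum


class Perception(Enum):
    A11Y = "a11y tree"
    OCR = "a11y tree + OCR"
    SCREENSHOT = "screenshot"


def choose_perception(tree_nodes: int, canvas_only: bool = False,
                      needs_spatial_reasoning: bool = False,
                      sparse_threshold: int = 5) -> Perception:
    """Pick the cheapest perception layer that can describe the screen (hypothetical policy)."""
    if canvas_only or needs_spatial_reasoning:
        return Perception.SCREENSHOT  # only pixels will do
    if tree_nodes < sparse_threshold:
        return Perception.OCR         # tree too sparse: supplement it with OCR text
    return Perception.A11Y            # tree is rich enough on its own


print(choose_perception(40), choose_perception(2), choose_perception(40, canvas_only=True))
```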
Pass `expect` and the action verifies its own outcome, reporting a `DEVIATION` if the UI did not respond as expected. Every call routes through one safety layer.
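The shape of `expect` is not specified here, so this Python sketch models it as a dict of expected UI properties. After the action runs, the helper re-reads the state a few times and returns either `OK` or a `DEVIATION` listing each mismatched property. `act_with_expect`, its retry policy, and the fake UI are hypothetical illustrations of the verify-after-act idea, not clawdcursor's interface.

```python
import time
from typing import Any, Callable


def act_with_expect(action: Callable[[], None],
                    read_state: Callable[[], dict[str, Any]],
                    expect: dict[str, Any],
                    retries: int = 3,
                    delay_s: float = 0.2) -> dict[str, Any]:
    """Perform an action, then confirm the UI reached the expected state (hypothetical helper)."""
    action()
    diff: dict[str, tuple[Any, Any]] = {}
    for attempt in range(max(1, retries)):
        observed = read_state()
        # Every expected property whose observed value differs, as (expected, observed).
        diff = {k: (v, observed.get(k)) for k, v in expect.items() if observed.get(k) != v}
        if not diff:
            return {"status": "OK"}
        if attempt < retries - 1:
            time.sleep(delay_s)  # let the UI settle before re-reading
    return {"status": "DEVIATION", "diff": diff}


# Usage against a fake UI state.
ui = {"checkbox": "unchecked"}


def tick() -> None:
    ui["checkbox"] = "checked"


ok = act_with_expect(tick, lambda: dict(ui), {"checkbox": "checked"}, delay_s=0)
bad = act_with_expect(lambda: None, lambda: dict(ui), {"dialog": "open"}, retries=2, delay_s=0)
print(ok, bad)
```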
Windows x64/ARM64 · macOS 12+ · Linux X11/Wayland
Platform-aware key combos: `mod` means Cmd on macOS and Ctrl elsewhere, resolved deterministically with no LLM cost.
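Resolving a portable modifier such as the `mod` in `"combo": "mod+s"` needs no model call. Here is a minimal Python sketch. The lowercase key names and the `resolve_combo` helper are assumptions for illustration, not clawdcursor's internal code.

```python
import sys


def resolve_combo(combo: str, platform: str = sys.platform) -> str:
    """Expand the portable 'mod' modifier to cmd on macOS and ctrl elsewhere."""
    mod = "cmd" if platform == "darwin" else "ctrl"
    keys = [k.strip().lower() for k in combo.split("+")]
    return "+".join(mod if k == "mod" else k for k in keys)


print(resolve_combo("mod+s", "darwin"), resolve_combo("mod+shift+z", "win32"))
```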
Collapse N deterministic tool calls into one safety-gated batch: N calls become 1.
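A batch can be pictured as one call that checks every step before any of them runs, then executes them in order and stops at the first failure. In this Python sketch, `run_batch`, the result dictionaries, and the check-everything-first rule are hypothetical design choices. The real batch tool's name and semantics are not documented here.

```python
from typing import Any, Callable

Step = dict[str, Any]


def run_batch(steps: list[Step],
              is_allowed: Callable[[Step], bool],
              execute: Callable[[Step], Any]) -> dict[str, Any]:
    """Run several deterministic steps as one call (hypothetical batch semantics).

    Every step passes the safety gate before anything runs, so a batch that
    contains one disallowed step performs no actions at all.
    """
    blocked = [i for i, step in enumerate(steps) if not is_allowed(step)]
    if blocked:
        return {"status": "BLOCKED", "blocked_steps": blocked, "results": []}
    results: list[Any] = []
    for i, step in enumerate(steps):
        try:
            results.append(execute(step))
        except Exception as exc:  # stop at the first failure and report where it happened
            return {"status": "FAILED", "failed_step": i, "error": str(exc), "results": results}
    return {"status": "OK", "results": results}


# Usage with a stand-in executor and a toy policy.
performed: list[str] = []


def allowed(step: Step) -> bool:
    return step["action"] != "close"  # toy policy for the example


def execute(step: Step) -> str:
    performed.append(step["action"])
    return step["action"]


ok = run_batch([{"action": "focus"}, {"action": "type"}, {"action": "key"}], allowed, execute)
blocked = run_batch([{"action": "focus"}, {"action": "close"}], allowed, execute)
print(ok, blocked)
```

Gating the whole batch up front means a partially applied sequence can come only from an execution error, never from a policy refusal halfway through.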
| Compound | Purpose | Actions |
|---|---|---|
| `computer` | Mouse, keyboard, screenshots. Raw I/O. | screenshot · click · double_click · right_click · triple_click · hover · scroll · scroll_horizontal · drag · drag_path · type · key · wait |
| `accessibility` | Drive UI by element name, not pixel. Survives DPI, resize, layout shifts. | read_tree · find · get_element · focused · invoke · focus · set_value · get_value · expand · collapse · toggle · select · state · list_children · wait_for |
| `window` | Launch, focus, resize. App-level state. | list · active · focus · maximize · minimize · restore · close · resize · list_displays · screen_size · open_app · open_file · open_url · switch_tab · navigate |
| `system` | Clipboard, OCR, shortcuts, undo, webview detection, CDP relaunch, task delegation. The meta surface for an external brain. | clipboard_read · clipboard_write · system_time · ocr · undo · shortcuts_list · shortcuts_run · delegate · detect_webview · relaunch_with_cdp · system_prompt |
| `browser` | Chrome DevTools Protocol: real DOM access for Electron / WebView2 apps whose a11y tree is sparse. | connect · page_context · read_text · click · type · select_option · evaluate · wait_for · list_tabs · switch_tab · scroll |
| `task` | Hand the whole task to the autonomous loop. Daemon mode only: needs `clawdcursor agent` with an LLM configured. | Single arg: `{ instruction: string }`, no action enum |
Compact (~1,500 tokens): `computer({ "action": "key", "combo": "mod+s" })`
Granular: `key_press({ "key": "mod+s" })`
Both hit the same `safety.evaluate()` chokepoint. Pass `--granular` for the granular surface.
See `schema.snapshot.json` for every parameter.
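The two surfaces differ only in how the call arrives. This Python sketch normalizes both to the same (tool, action, args) triple before a single `evaluate` gate. Only `computer`, `key_press`, and the `mod+s` example come from the text. The `GRANULAR` mapping table, the `key` to `combo` argument rename, and the deny list are hypothetical, and this `evaluate` only stands in for clawdcursor's `safety.evaluate()`.

```python
from typing import Any

# Hypothetical policy: (tool, action) pairs the gate refuses outright.
DENY = {("window", "close")}


def evaluate(tool: str, action: str, args: dict[str, Any]) -> bool:
    """Single safety chokepoint shared by both surfaces (stand-in policy)."""
    return (tool, action) not in DENY


def dispatch(tool: str, action: str, args: dict[str, Any]) -> str:
    if not evaluate(tool, action, args):
        return f"BLOCKED {tool}.{action}"
    return f"ran {tool}.{action} {args}"


def compact_call(tool: str, params: dict[str, Any]) -> str:
    # Compact surface: one compound tool; the action travels inside the params.
    params = dict(params)
    action = params.pop("action")
    return dispatch(tool, action, params)


# Granular surface: one tool per action, mapped to (compound tool, action, argument renames).
GRANULAR = {"key_press": ("computer", "key", {"key": "combo"})}


def granular_call(name: str, params: dict[str, Any]) -> str:
    tool, action, renames = GRANULAR[name]
    args = {renames.get(k, k): v for k, v in params.items()}
    return dispatch(tool, action, args)


print(compact_call("computer", {"action": "key", "combo": "mod+s"}))
print(granular_call("key_press", {"key": "mod+s"}))
```

Because both paths funnel into `dispatch`, a policy change in `evaluate` applies to both surfaces at once.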
```shell
# Install & setup
clawdcursor consent          # one-time desktop-control authorization (always required)
clawdcursor grant            # macOS only: Accessibility + Screen Recording prompts. MCP setup ends here.
clawdcursor doctor           # ONLY for `agent` mode: configures the daemon's built-in LLM (+ diagnostics)
clawdcursor status           # readiness check (consent, permissions, AI config)

# Run
clawdcursor mcp              # stdio MCP server for editor hosts
clawdcursor mcp --compact    # same, with 7 compound tools (recommended)
clawdcursor agent            # HTTP MCP daemon at :3847/mcp, optional built-in LLM
clawdcursor agent --no-llm   # tool surface only: your agent brings its own brain
clawdcursor stop             # stop every running mode
clawdcursor uninstall        # remove all clawdcursor config and data
```
Open source. Any model. Localhost-only by default, enforced. No telemetry.
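Enforcing localhost-only binding amounts to rejecting any listen address that is not loopback. This Python sketch uses the standard `ipaddress` module. The `check_bind_host` helper and its handling of the name `localhost` are assumptions, since clawdcursor's actual check is not shown here.

```python
import ipaddress


def check_bind_host(host: str) -> str:
    """Refuse any bind address that is not loopback (hypothetical enforcement check)."""
    if host == "localhost":
        return "127.0.0.1"  # pin the name to IPv4 loopback rather than trusting resolution
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"not an IP literal: {host!r}") from None
    if not addr.is_loopback:
        raise ValueError(f"refusing non-loopback bind address {host}")
    return host


print(check_bind_host("127.0.0.1"), check_bind_host("::1"), check_bind_host("localhost"))
```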