🔒 Fully self-hosted
The daemon runs on your Mac; your screen never leaves home. Password + bearer auth, login rate-limiting.
Give AI agents (and your browser) eyes and hands on a real iPhone. Live WebRTC remote control in the browser; agents use one HTTP API to screenshot, read the UI element tree, tap and type — CJK text lands in one shot. Runs entirely on your own Mac, no third-party cloud.
One command installs the daemon; one more teaches your agent to drive the phone.
One agent API; the daemon auto-routes to the best path — fast paths get faster, the fallback never goes away.
One curated bridge shortcut reaches native iOS APIs — battery, Health, location… Structured JSON comes back: fastest, deterministic, no vision needed.
Reads iOS's own accessibility tree: tap by element label, send Unicode straight into fields — no host-cursor contention, no coordinate drift, CJK lands clean. The agent keeps seeing and acting even while a human holds the phone.
Screen stream + system-level input over iPhone Mirroring — the universal fallback that can see and tap anything. Also the human's low-latency WebRTC remote desktop.
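The three paths above suggest a "fastest first, universal fallback last" routing order. The daemon's actual routing logic is not described here, so the following is only a minimal sketch of that idea; the path names, the request fields (`kind`, `label`) and the `route` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Path:
    name: str
    can_handle: Callable[[dict], bool]  # predicate over a request dict


# Hypothetical ordering, fastest first. The last entry accepts everything,
# which is what makes it a fallback that "never goes away".
PATHS: List[Path] = [
    Path("shortcut", lambda r: r.get("kind") == "native_query"),
    Path("accessibility", lambda r: r.get("label") is not None),
    Path("mirroring", lambda r: True),
]


def route(request: dict, paths: List[Path] = PATHS) -> str:
    """Return the name of the first path whose predicate accepts the request."""
    for path in paths:
        if path.can_handle(request):
            return path.name
    raise RuntimeError("no path available")


if __name__ == "__main__":
    print(route({"kind": "native_query", "what": "battery"}))
    print(route({"kind": "tap", "label": "Send"}))
    print(route({"kind": "tap", "x": 120, "y": 640}))
```

Because the catch-all path sits last, adding a faster path earlier in the list never removes coverage for requests that only the fallback can serve.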
Same phone, same Pinyin keyboard — the real recorded results of both paths:
Three moves: see (elements or screenshot) → act → verify. A ready-made MCP server (9 tools) plugs straight into Claude.
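A rough sketch of the see → act → verify loop, written against an injected transport so it runs without a phone. Only `/agent/elements` is named on this page; the `/agent/tap` path, the `{"elements": [{"label": ...}]}` response shape and the `fake_phone` stand-in are assumptions made for illustration.

```python
from typing import Callable, List, Optional

# transport(method, path, body) -> parsed JSON response.
# Injected so the sketch can run offline against a stand-in.
Transport = Callable[[str, str, Optional[dict]], dict]


def visible_labels(transport: Transport) -> List[str]:
    """See: fetch the element list and keep the labels (assumed shape)."""
    data = transport("GET", "/agent/elements", None)
    return [e["label"] for e in data.get("elements", []) if e.get("label")]


def tap_and_verify(transport: Transport, label: str, expect: str) -> bool:
    """Tap an element by label, then confirm the expected label appeared."""
    if label not in visible_labels(transport):           # see
        return False
    transport("POST", "/agent/tap", {"label": label})    # act (hypothetical endpoint)
    return expect in visible_labels(transport)           # verify


def fake_phone() -> Transport:
    """Stand-in for the daemon: tapping 'Settings' swaps the visible screen."""
    state = {"screen": ["Settings", "Photos"]}

    def transport(method: str, path: str, body: Optional[dict]) -> dict:
        if path == "/agent/elements":
            return {"elements": [{"label": name} for name in state["screen"]]}
        if path == "/agent/tap" and body and body.get("label") == "Settings":
            state["screen"] = ["Wi-Fi", "Bluetooth"]
        return {}

    return transport


if __name__ == "__main__":
    print(tap_and_verify(fake_phone(), "Settings", "Wi-Fi"))
```

In real use the transport would wrap an HTTP client with the bearer token; keeping it as a parameter means the same loop can be exercised in tests.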
VideoToolbox H.264 over WebRTC gives near-native latency in the browser; Cloudflare TURN relays the stream when viewer and Mac are on different networks.
One config line for Claude Desktop / Claude Code; npx skills add teaches agents the “vision once → script forever” method.
/agent/elements turns the screen into an element list — reasoning gets drastically cheaper, labels feed straight into taps.
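To illustrate why an element list is cheaper to reason over than a screenshot, here is a small sketch that flattens a list of elements into a few lines of text an agent can read and tap from. The element shape (`type` and `label` keys) is an assumption; the real `/agent/elements` response may differ.

```python
from typing import List


def summarize(elements: List[dict]) -> str:
    """Render labeled elements as one short line each.

    Elements without a label are skipped, since they cannot be tapped by label.
    """
    lines = []
    for el in elements:
        label = el.get("label")
        if not label:
            continue
        lines.append(f'{el.get("type", "element")}: {label}')
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"type": "button", "label": "Send"},
        {"type": "image"},                      # unlabeled, dropped
        {"label": "Message"},                   # no type, generic fallback
    ]
    print(summarize(sample))
```

Each surviving line carries a label that can be passed directly to a tap-by-label call.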
Scroll must be a wheel event; Mirroring only takes keycodes; focus races; TCC signing… Every pitfall is covered in the docs.
A stable local signing identity — upgrades and reinstalls never re-prompt Screen Recording / Accessibility.