src — browser-automation

Module: src-browser-automation Cohesion: 0.80 Members: 0

The src/browser-automation module provides an interface through which AI agents control web browsers. Inspired by the Native Engine project, it offers a unified set of tools for web interaction, data extraction, and environment manipulation.

Purpose and Overview

This module serves as the primary interface for AI agents to interact with web content. It abstracts away the complexities of browser automation frameworks, offering a high-level, action-oriented API. Key capabilities include:

The module is built on Playwright, leveraging its capabilities for reliable cross-browser automation and Chrome DevTools Protocol (CDP) control.

Architecture

The module's architecture is designed with a clear separation of concerns:

  1. BrowserManager: The core engine that directly interacts with the Playwright API. It manages browser instances, contexts, pages, and implements all the low-level automation logic. It also acts as an EventEmitter to broadcast browser-related events.
  2. BrowserTool: A facade that provides a simplified, action-based interface for AI agents. It translates high-level BrowserAction inputs into calls to BrowserManager methods, handles input validation, and formats results into a standard ToolResult structure.
  3. Supporting Subsystems: Specialized modules for tasks like profile management, route interception, screenshot annotation, and Chrome discovery, which BrowserManager or BrowserTool integrate as needed.

graph TD
    A[AI Agent / Client] --> B{"BrowserTool.execute(action)"}
    B --> C[BrowserManager]
    C --> D[Playwright API]
    C --> E[BrowserProfileManager]
    C --> F[RouteInterceptor]
    C --> G[ScreenshotAnnotator]
    B --> H[Chrome Discovery]
    B --> I[Built-in Profiles]
    C --> J[EventEmitter]
    J --> A
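
The layering in the diagram can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the module's actual code: the class names SketchBrowserManager and SketchBrowserTool, the "navigated" event name, and the payload and result shapes are all hypothetical, and the Playwright call is replaced by a comment.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event payload; the module's real event names and shapes are not documented here.
interface NavigatedEvent {
  tabId: number;
  url: string;
}

// Stand-in for BrowserManager: does the low-level work and broadcasts events.
class SketchBrowserManager extends EventEmitter {
  private nextTabId = 1;

  openTab(url: string): number {
    const tabId = this.nextTabId++;
    // A real implementation would drive Playwright here.
    const event: NavigatedEvent = { tabId, url };
    this.emit("navigated", event);
    return tabId;
  }
}

// Stand-in for BrowserTool: the only surface the agent calls directly.
class SketchBrowserTool {
  private readonly manager: SketchBrowserManager;

  constructor(manager: SketchBrowserManager) {
    this.manager = manager;
  }

  execute(action: { type: "open"; url: string }): { success: boolean; output: string } {
    const tabId = this.manager.openTab(action.url);
    return { success: true, output: `opened tab ${tabId}` };
  }
}

// The agent subscribes to manager events and issues actions through the tool.
const manager = new SketchBrowserManager();
const seen: NavigatedEvent[] = [];
manager.on("navigated", (event: NavigatedEvent) => seen.push(event));

const tool = new SketchBrowserTool(manager);
const result = tool.execute({ type: "open", url: "https://example.com" });
```

The agent never touches the manager's methods directly in this sketch; it only receives events from it, which mirrors the two arrows that reach the client in the graph.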

Key Components

BrowserManager (browser-manager.ts)

This class is the heart of the module. It holds the Playwright browser instance (Browser) and context (BrowserContext), and manages multiple pages (Page) as tabs.
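
Treating pages as tabs usually comes down to a small registry that maps tab ids to page objects and tracks which one is active. The sketch below is one possible shape, not the module's implementation: TabRegistry and PageLike are hypothetical names, and PageLike is a minimal structural subset chosen so that a Playwright Page (which has compatible url() and close() methods) could be stored in it.

```typescript
// Minimal structural subset of a page. Assumption: this is all the registry needs.
interface PageLike {
  url(): string;
  close(): Promise<void>;
}

// Hypothetical tab registry: numeric ids, one active tab at a time.
class TabRegistry<P extends PageLike> {
  private readonly tabs = new Map<number, P>();
  private nextId = 1;
  private activeId: number | null = null;

  // Registers a page as a new tab and makes it the active one.
  add(page: P): number {
    const id = this.nextId++;
    this.tabs.set(id, page);
    this.activeId = id;
    return id;
  }

  switchTo(id: number): void {
    if (!this.tabs.has(id)) throw new Error(`no such tab: ${id}`);
    this.activeId = id;
  }

  active(): P | null {
    if (this.activeId === null) return null;
    return this.tabs.get(this.activeId) ?? null;
  }

  list(): Array<{ id: number; url: string; active: boolean }> {
    return Array.from(this.tabs.entries()).map(([id, page]) => ({
      id,
      url: page.url(),
      active: id === this.activeId,
    }));
  }

  // Closes the page, then falls back to the most recently added remaining tab.
  async close(id: number): Promise<void> {
    const page = this.tabs.get(id);
    if (page === undefined) throw new Error(`no such tab: ${id}`);
    await page.close();
    this.tabs.delete(id);
    if (this.activeId === id) {
      const remaining = Array.from(this.tabs.keys());
      this.activeId = remaining.length > 0 ? remaining[remaining.length - 1] : null;
    }
  }
}
```

Keeping the registry generic over a small interface means it can be exercised with plain fake objects, without launching a browser.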

Core Responsibilities:

BrowserTool (browser-tool.ts)

This class acts as the public API for AI agents. It exposes a single execute method that takes a BrowserAction and its associated parameters.
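
A single execute entry point typically validates the action, dispatches to the manager, and wraps the outcome in a uniform result. The following is a sketch under assumptions: SketchAction, SketchResult, SketchManager, and validateAction are hypothetical stand-ins, since the real BrowserAction and ToolResult definitions in types.ts are not reproduced in this document.

```typescript
// Hypothetical action and result shapes (the real ones live in types.ts).
type SketchAction =
  | { action: "navigate"; url: string }
  | { action: "click"; selector: string };

interface SketchResult {
  success: boolean;
  output?: string;
  error?: string;
}

// Hypothetical subset of the manager surface the tool delegates to.
interface SketchManager {
  navigate(url: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Returns an error message for invalid input, or null when the action is acceptable.
function validateAction(input: SketchAction): string | null {
  if (input.action === "navigate") {
    try {
      const parsed = new URL(input.url);
      const ok = parsed.protocol === "http:" || parsed.protocol === "https:";
      return ok ? null : `unsupported protocol: ${parsed.protocol}`;
    } catch {
      return `invalid URL: ${input.url}`;
    }
  }
  return input.selector.trim() === "" ? "selector must not be empty" : null;
}

class SketchBrowserTool {
  private readonly manager: SketchManager;

  constructor(manager: SketchManager) {
    this.manager = manager;
  }

  // Validates first, then dispatches; failures become results instead of thrown errors.
  async execute(input: SketchAction): Promise<SketchResult> {
    const problem = validateAction(input);
    if (problem !== null) return { success: false, error: problem };
    try {
      if (input.action === "navigate") {
        await this.manager.navigate(input.url);
        return { success: true, output: `navigated to ${input.url}` };
      }
      await this.manager.click(input.selector);
      return { success: true, output: `clicked ${input.selector}` };
    } catch (err) {
      return { success: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}
```

Returning errors as data rather than throwing keeps the agent's control flow simple: every call yields a result it can inspect.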

Core Responsibilities:

BrowserProfileManager (profile-manager.ts)

Manages the persistence of browser state to disk.
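
Persisting browser state to disk can be as simple as writing one JSON file per profile. The sketch below assumes a hypothetical SavedProfile shape and SketchProfileStore class; the actual on-disk format used by profile-manager.ts is not described here. In a Playwright-based setup, the data would plausibly come from the context's storageState() method, which returns cookies and per-origin storage.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical profile shape, loosely modeled on cookie data.
interface SavedProfile {
  name: string;
  savedAt: string;
  cookies: Array<{ name: string; value: string; domain: string; path: string }>;
}

// Hypothetical store: one JSON file per profile inside a base directory.
class SketchProfileStore {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    fs.mkdirSync(dir, { recursive: true });
  }

  // Restricts names to a safe character set so a name cannot escape the base directory.
  private fileFor(name: string): string {
    if (!/^[A-Za-z0-9_-]+$/.test(name)) throw new Error(`invalid profile name: ${name}`);
    return path.join(this.dir, `${name}.json`);
  }

  // Writes to a temporary file first, then renames, so readers never see a partial file.
  save(profile: SavedProfile): void {
    const target = this.fileFor(profile.name);
    const tmp = `${target}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(profile, null, 2), "utf8");
    fs.renameSync(tmp, target);
  }

  load(name: string): SavedProfile | null {
    const target = this.fileFor(name);
    if (!fs.existsSync(target)) return null;
    return JSON.parse(fs.readFileSync(target, "utf8")) as SavedProfile;
  }
}

// Usage against a throwaway temp directory.
const store = new SketchProfileStore(fs.mkdtempSync(path.join(os.tmpdir(), "profiles-")));
store.save({
  name: "demo",
  savedAt: new Date().toISOString(),
  cookies: [{ name: "session", value: "abc", domain: "example.com", path: "/" }],
});
const loaded = store.load("demo");
```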

Core Responsibilities:

RouteInterceptor (route-interceptor.ts)

Provides fine-grained control over network requests made by the browser.
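
One way to structure this kind of control is to keep the decision logic as a pure function over an ordered rule list and leave the Playwright wiring as a thin layer on top. The names RouteRule, RouteDecision, and decide below are hypothetical and do not reflect the actual API of route-interceptor.ts; the Playwright calls appear only in the comments.

```typescript
// Hypothetical decision and rule shapes.
type RouteDecision =
  | { kind: "continue" }
  | { kind: "abort" }
  | { kind: "fulfill"; status: number; body: string };

interface RouteRule {
  pattern: RegExp; // should not use the "g" flag, which makes test() stateful
  decision: RouteDecision;
}

// First matching rule wins; unmatched requests pass through untouched.
function decide(url: string, rules: RouteRule[]): RouteDecision {
  for (const rule of rules) {
    if (rule.pattern.test(url)) return rule.decision;
  }
  return { kind: "continue" };
}

const rules: RouteRule[] = [
  { pattern: /\.(png|jpe?g|gif)(\?|$)/i, decision: { kind: "abort" } },
  { pattern: /\/api\/health$/, decision: { kind: "fulfill", status: 200, body: "ok" } },
];

// Wiring with Playwright might look like this (not executed in this sketch):
//
//   await page.route("**/*", async (route) => {
//     const d = decide(route.request().url(), rules);
//     if (d.kind === "abort") return route.abort();
//     if (d.kind === "fulfill") return route.fulfill({ status: d.status, body: d.body });
//     return route.continue();
//   });

const imageDecision = decide("https://cdn.example.com/logo.PNG", rules);
const healthDecision = decide("https://example.com/api/health", rules);
const pageDecision = decide("https://example.com/index.html", rules);
```

Separating the decision from the side effect makes the rules easy to unit test without a running browser.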

Core Responsibilities:

ScreenshotAnnotator (screenshot-annotator.ts)

A utility that annotates screenshots.

Core Responsibilities:

Chrome Discovery & Built-in Profiles (chrome-discovery.ts, builtin-profiles.ts)

These modules facilitate connecting to existing browser instances.
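
A Chrome instance started with remote debugging enabled exposes a /json/version HTTP endpoint whose JSON response includes a webSocketDebuggerUrl field. The sketch below shows how such a response might be validated before attempting a connection. It is an assumption that chrome-discovery.ts works this way; parseVersionResponse and DiscoveredBrowser are hypothetical names.

```typescript
// Hypothetical result of a successful discovery probe.
interface DiscoveredBrowser {
  browser: string;
  webSocketDebuggerUrl: string;
}

// Parses the body of GET http://<host>:<port>/json/version.
// Returns null for anything that does not look like a usable CDP endpoint.
function parseVersionResponse(body: string): DiscoveredBrowser | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const record = data as Record<string, unknown>;
  const browser = record["Browser"];
  const wsUrl = record["webSocketDebuggerUrl"];
  if (typeof browser !== "string" || typeof wsUrl !== "string") return null;
  if (!wsUrl.startsWith("ws://") && !wsUrl.startsWith("wss://")) return null;

  return { browser, webSocketDebuggerUrl: wsUrl };
}

// Illustrative response body; the version string and browser id are made up.
const sampleBody = JSON.stringify({
  Browser: "Chrome/120.0.0.0",
  "Protocol-Version": "1.3",
  webSocketDebuggerUrl: "ws://127.0.0.1:9222/devtools/browser/abc-123",
});
const discovered = parseVersionResponse(sampleBody);

// With Playwright, the discovered endpoint could then be passed to
// chromium.connectOverCDP(discovered.webSocketDebuggerUrl) (not executed here).
```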

Types (types.ts)

This file defines all the interfaces and types used across the browser automation module, ensuring consistency and type safety. This includes:

Integration Points

The browser automation module integrates with several other parts of the codebase and external libraries:

Usage Considerations