src — browser

Module: src-browser Cohesion: 0.80 Members: 0

The src/browser module provides browser automation for two distinct use cases:

  1. Full Browser Automation: Leveraging the Chrome DevTools Protocol (CDP) for headless or headful control of a real Chrome/Chromium instance, offering a Puppeteer/Playwright-like API. This is handled by controller.ts.
  2. Terminal-Embedded Browsing: A lightweight, text-based browser experience designed for terminal environments, capable of fetching content, extracting information, and generating basic screenshots without a full browser process. This is handled by embedded-browser.ts.

This dual approach allows the system to choose the appropriate level of browser interaction based on the context and available resources.


1. CDP-based Browser Automation (controller.ts)

This part of the module controls a Chrome/Chromium browser instance over the Chrome DevTools Protocol (CDP). It wraps the low-level WebSocket communication in higher-level APIs for managing browser processes, pages, and interactions.

Architecture Overview

The CDP-based browser automation is structured into three main components:

graph TD
    A[BrowserController] -- Manages Browser Process --> B(Chrome/Chromium);
    A -- Establishes --> C[CDPConnection];
    C -- Communicates via WebSocket --> B;
    A -- Creates --> D[PageController];
    D -- Uses --> C;
    D -- Emits Events (console, dialog) --> E[Application Logic];
    A -- Emits Events (close) --> E;

Key Components

CDPConnection

The CDPConnection class is the foundational layer for interacting with the Chrome DevTools Protocol. It manages a WebSocket connection to the browser's DevTools endpoint, facilitating the sending of CDP commands and the reception of CDP events.

Internal Mechanism: The class maintains pendingMessages to track outstanding command requests and eventListeners to dispatch incoming events to registered callbacks.
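
The request/response bookkeeping described here can be sketched as follows. This is a minimal illustration, not the code in controller.ts: the Transport interface and the SketchCDPConnection name are hypothetical stand-ins for the real WebSocket wiring. Only the wire format (commands carry an id, responses echo that id, events carry a method and no id) comes from the CDP protocol itself.

```typescript
// Hypothetical transport: anything that can send a string and deliver
// incoming strings (in the real module this would be a WebSocket).
interface Transport {
  send(data: string): void;
  onMessage(handler: (data: string) => void): void;
}

type Pending = { resolve: (result: unknown) => void; reject: (err: Error) => void };
type Listener = (params: unknown) => void;

class SketchCDPConnection {
  private transport: Transport;
  private nextId = 1;
  // Outstanding commands, keyed by the id sent with each request.
  private pendingMessages = new Map<number, Pending>();
  // Event name -> callbacks registered through on().
  private eventListeners = new Map<string, Set<Listener>>();

  constructor(transport: Transport) {
    this.transport = transport;
    transport.onMessage((data) => this.handleMessage(data));
  }

  // Send a CDP command; the promise settles when the matching response arrives.
  send(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pendingMessages.set(id, { resolve, reject });
      this.transport.send(JSON.stringify({ id, method, params }));
    });
  }

  // Register a callback for a CDP event such as "Page.loadEventFired".
  on(event: string, listener: Listener): void {
    let listeners = this.eventListeners.get(event);
    if (!listeners) {
      listeners = new Set();
      this.eventListeners.set(event, listeners);
    }
    listeners.add(listener);
  }

  private handleMessage(data: string): void {
    const msg = JSON.parse(data);
    if (typeof msg.id === "number") {
      // A response: settle and forget the pending command.
      const pending = this.pendingMessages.get(msg.id);
      if (!pending) return;
      this.pendingMessages.delete(msg.id);
      if (msg.error) pending.reject(new Error(msg.error.message));
      else pending.resolve(msg.result);
    } else if (typeof msg.method === "string") {
      // An event: fan out to every registered listener.
      this.eventListeners.get(msg.method)?.forEach((listener) => listener(msg.params));
    }
  }
}
```

Keeping the transport behind an interface makes the dispatch logic easy to exercise with an in-memory fake, without a running browser.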

PageController

The PageController class encapsulates the functionality related to a single browser tab or page. It provides a high-level, Puppeteer/Playwright-style API for common browser interactions. Each PageController instance is associated with a specific browser target (tab) and uses a CDPConnection to send commands and listen for events relevant to that page.
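
The architecture diagram shows PageController emitting console and dialog events. One plausible way to derive them is to translate the corresponding CDP events, Runtime.consoleAPICalled and Page.javascriptDialogOpening. The sketch below assumes that mapping; the PageEvent shape and the translateCdpEvent helper are hypothetical and are not taken from controller.ts.

```typescript
// Hypothetical page-level event shapes; the real PageController may differ.
type PageEvent =
  | { kind: "console"; level: string; text: string }
  | { kind: "dialog"; dialogType: string; message: string };

// Translate a raw CDP event into a page-level event, or null if it is
// not one of the two kinds handled in this sketch.
function translateCdpEvent(method: string, params: any): PageEvent | null {
  if (method === "Runtime.consoleAPICalled") {
    // Each argument is a CDP RemoteObject; primitives carry `value`,
    // other objects usually carry a human-readable `description`.
    const args: Array<{ value?: unknown; description?: string }> = params.args ?? [];
    const text = args
      .map((a) => (a.value !== undefined ? String(a.value) : (a.description ?? "")))
      .join(" ");
    return { kind: "console", level: params.type, text };
  }
  if (method === "Page.javascriptDialogOpening") {
    return { kind: "dialog", dialogType: params.type, message: params.message };
  }
  return null;
}
```

A page object could register this translator for both CDP events and forward any non-null result to the application.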

BrowserController

The BrowserController class is responsible for launching and managing the Chrome/Chromium browser process itself. It acts as the entry point for creating and managing PageController instances.
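
The source does not describe how the process is launched. A common approach is to start Chrome with a remote debugging port and read the DevTools WebSocket URL that Chrome prints to stderr on startup. The sketch below covers only the two pure steps of that approach; LaunchOptions, buildChromeArgs, and parseDevToolsUrl are hypothetical names, while the command-line flags are standard Chrome switches.

```typescript
// Hypothetical option set; the real BrowserController options are not shown in this document.
interface LaunchOptions {
  headless: boolean;
  debuggingPort: number; // 0 asks Chrome to pick a free port
  userDataDir: string;
}

// Build the argument list for the Chrome/Chromium executable.
function buildChromeArgs(opts: LaunchOptions): string[] {
  const args = [
    `--remote-debugging-port=${opts.debuggingPort}`,
    `--user-data-dir=${opts.userDataDir}`,
    "--no-first-run",
    "--no-default-browser-check",
  ];
  if (opts.headless) args.push("--headless");
  args.push("about:blank"); // initial page to open
  return args;
}

// Chrome announces its endpoint on stderr as
// "DevTools listening on ws://host:port/devtools/browser/<id>".
function parseDevToolsUrl(stderrLine: string): string | null {
  const match = /DevTools listening on (ws:\/\/\S+)/.exec(stderrLine);
  return match ? match[1] : null;
}
```

In a full launcher, the argument list would be passed to a child process and each stderr line fed to parseDevToolsUrl until it returns a URL to connect to.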

Singleton Access

The controller.ts module also provides singleton functions for easy access to a single shared browser instance.
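
The names of the singleton functions are not listed in this document. As an illustration of the pattern only, a lazily created shared instance could look like the sketch below; SketchBrowser, getSharedBrowser, and closeSharedBrowser are hypothetical names, not the module's exports.

```typescript
// Stand-in for the real BrowserController, reduced to what the pattern needs.
class SketchBrowser {
  closed = false;
  close(): void {
    this.closed = true;
  }
}

// Module-level holder for the single shared instance.
let sharedBrowser: SketchBrowser | null = null;

// Return the shared instance, creating it on first use or after it was closed.
function getSharedBrowser(): SketchBrowser {
  if (sharedBrowser === null || sharedBrowser.closed) {
    sharedBrowser = new SketchBrowser();
  }
  return sharedBrowser;
}

// Close and forget the shared instance so the next call starts fresh.
function closeSharedBrowser(): void {
  if (sharedBrowser !== null) {
    sharedBrowser.close();
    sharedBrowser = null;
  }
}
```

Callers that only need one browser can then share it without passing a controller reference around.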

Integration with the Codebase

The BrowserController and PageController are central to browser automation tasks.

Execution Flow Example (browser-tool navigation):

graph TD
    A[browser-tool.execute] --> B{initBrowser};
    B --> C[BrowserController.newPage];
    C --> D[PageController.setViewport];
    D --> E[CDPConnection.send];
    A --> F{navigate};
    F --> G[PageController.goto];
    G --> H[CDPConnection.send];
    G --> I[PageController.waitForNavigation];
    I --> J[CDPConnection.on];
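
The flow above can be sketched in code. This is an assumption-laden illustration rather than the module's implementation: CdpLike and SketchPage are hypothetical, and the CDP methods used (Emulation.setDeviceMetricsOverride, Page.enable, Page.navigate, and the Page.loadEventFired event) are one plausible way to back setViewport, goto, and waitForNavigation.

```typescript
// Minimal view of a CDP connection: send commands, subscribe to events.
interface CdpLike {
  send(method: string, params?: Record<string, unknown>): Promise<unknown>;
  on(event: string, listener: (params: unknown) => void): void;
}

class SketchPage {
  private conn: CdpLike;

  constructor(conn: CdpLike) {
    this.conn = conn;
  }

  // Override the page's viewport dimensions.
  async setViewport(width: number, height: number): Promise<void> {
    await this.conn.send("Emulation.setDeviceMetricsOverride", {
      width,
      height,
      deviceScaleFactor: 1,
      mobile: false,
    });
  }

  // Resolve when the page's load event fires.
  // (This sketch never removes the listener; a real version should.)
  waitForNavigation(): Promise<void> {
    return new Promise((resolve) => {
      this.conn.on("Page.loadEventFired", () => resolve());
    });
  }

  // Navigate and wait for the load event.
  async goto(url: string): Promise<void> {
    // Subscribe before navigating so a fast load cannot be missed.
    const loaded = this.waitForNavigation();
    await this.conn.send("Page.enable"); // Page events are only delivered once enabled
    await this.conn.send("Page.navigate", { url });
    await loaded;
  }
}
```

Subscribing before sending Page.navigate avoids a race in which the load event arrives before the listener is registered.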

2. Terminal-Embedded Browser (embedded-browser.ts)

The embedded-browser.ts module offers a simpler, non-interactive browser experience primarily for rendering web content within a terminal. It does not launch a full browser process or use CDP. Instead, it relies on external command-line tools for fetching and rendering.

Key Components

EmbeddedBrowser

The EmbeddedBrowser class provides basic web page navigation, content extraction, and screenshot capabilities suitable for a terminal environment.
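
To illustrate the content-extraction step, here is a minimal sketch that reduces fetched HTML to a title, a list of links, and plain text suitable for terminal display. It is deliberately not the module's approach: embedded-browser.ts is described as delegating to external command-line tools, whereas this hypothetical summarizeHtml helper uses simple regular expressions and will not handle malformed or unusual markup.

```typescript
// Hypothetical result shape for a terminal-friendly page summary.
interface PageSummary {
  title: string;
  links: string[];
  text: string;
}

function summarizeHtml(html: string): PageSummary {
  // Title: contents of the first <title> element, if any.
  const titleMatch = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  const title = titleMatch ? titleMatch[1].trim() : "";

  // Links: every quoted href on an <a> tag, in document order.
  const links: string[] = [];
  const linkPattern = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(html)) !== null) {
    links.push(match[1]);
  }

  // Text: drop non-visible sections, strip remaining tags, collapse whitespace.
  const text = html
    .replace(/<(script|style|head)\b[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();

  return { title, links, text };
}
```

A regex pass like this is only adequate for quick previews; a dedicated text renderer or HTML parser is the safer choice for arbitrary pages.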