tests — desktop-automation

Module: tests-desktop-automation Cohesion: 0.80 Members: 0

The desktop-automation module provides a robust, cross-platform API for interacting with the desktop environment, including mouse, keyboard, window, application, and screen operations. It abstracts away platform-specific implementations, allowing developers to write automation scripts that can run on Linux, Windows, and macOS. Additionally, it includes a "Smart Snapshot" system for intelligent UI element detection, combining accessibility and OCR capabilities.

This documentation focuses on the core components, their interactions, and how to leverage them for desktop automation tasks.

Core Concepts

The module is built around three primary concepts:

  1. DesktopAutomationManager: The central facade that provides a unified API for all desktop automation tasks. It manages the underlying automation providers and offers configuration, event handling, and safety features.
  2. IAutomationProvider: An interface (or abstract class in practice) that defines the contract for platform-specific or library-based automation implementations. The DesktopAutomationManager delegates actual operations to an active IAutomationProvider.
  3. SmartSnapshotManager: A system for taking "snapshots" of the UI, detecting elements using either accessibility APIs or Optical Character Recognition (OCR), and providing a structured view of the interactive elements on the screen.

Architecture Overview

The DesktopAutomationManager acts as a central orchestrator. It can be configured to use a specific IAutomationProvider or automatically select the most suitable one based on the operating system and available tools. All high-level automation commands (e.g., click, type, focusWindow) are routed through the manager to the currently active provider. The SmartSnapshotManager operates in conjunction, providing intelligent element detection capabilities that can be integrated into automation workflows.

graph TD
    A[DesktopAutomationManager] --> B{IAutomationProvider};
    B --> C[MockAutomationProvider];
    B --> D[NutJsProvider];
    B --> E[LinuxNativeProvider];
    B --> F[WindowsNativeProvider];
    B --> G[MacOSNativeProvider];
    A -- Manages --> H[SmartSnapshotManager];
    H -- Uses --> I[ScreenshotTool];
    H -- Uses --> J[OCRTool];
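The delegation shown in the diagram can be sketched in a few lines. This is only an illustration of the facade pattern under simplifying assumptions, not the module's actual code: `MiniProvider`, `RecordingProvider`, and `MiniManager` are hypothetical names, and the real provider contract has many more methods.

```typescript
// Hypothetical, trimmed-down provider contract (illustration only).
interface MiniProvider {
  name: string;
  click(x: number, y: number): Promise<void>;
}

// Hypothetical provider that records calls instead of touching the desktop.
class RecordingProvider implements MiniProvider {
  name = 'recording';
  calls: string[] = [];

  async click(x: number, y: number): Promise<void> {
    this.calls.push(`click(${x},${y})`);
  }
}

// Facade: callers talk to the manager, which forwards to the active provider.
class MiniManager {
  constructor(private provider: MiniProvider) {}

  // Swap the active provider without changing caller code.
  setProvider(provider: MiniProvider): void {
    this.provider = provider;
  }

  click(x: number, y: number): Promise<void> {
    return this.provider.click(x, y);
  }
}
```

Because callers only depend on the facade, swapping the active provider does not require changes to automation scripts.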

DesktopAutomationManager

The DesktopAutomationManager is the primary entry point for developers. It provides a comprehensive set of methods for desktop interaction and manages the lifecycle and configuration of automation providers.

Getting an Instance

The manager is designed as a singleton, accessible via getDesktopAutomation(). This ensures that only one instance manages desktop automation resources at a time.

import { getDesktopAutomation } from '../../src/desktop-automation/index.js';

const manager = getDesktopAutomation();
await manager.initialize(); // Initialize the underlying provider
// ... perform automation ...
await manager.shutdown(); // Clean up resources

You can reset the singleton instance with resetDesktopAutomation(), for example between tests. The first call to getDesktopAutomation() also accepts an initial configuration.

import { getDesktopAutomation, resetDesktopAutomation } from '../../src/desktop-automation/index.js';

resetDesktopAutomation(); // Clear any existing instance
const manager = getDesktopAutomation({ debug: true, provider: 'nutjs' });
await manager.initialize();
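For readers unfamiliar with the pattern, a resettable singleton accessor might look roughly like the sketch below. The names `SketchConfig`, `SketchManager`, `getSketchAutomation`, and `resetSketchAutomation` are hypothetical stand-ins, and the assumption that configuration passed on later calls is ignored is mine, not something the module documents.

```typescript
// Hypothetical config shape; the real configuration has more fields.
interface SketchConfig {
  debug?: boolean;
  provider?: string;
}

class SketchManager {
  constructor(readonly config: SketchConfig) {}
}

let instance: SketchManager | null = null;

// Returns the shared instance, creating it on first use.
// Assumption: config passed on later calls is ignored in this sketch.
function getSketchAutomation(config: SketchConfig = {}): SketchManager {
  if (instance === null) {
    instance = new SketchManager(config);
  }
  return instance;
}

// Drops the shared instance so the next get call builds a fresh one.
function resetSketchAutomation(): void {
  instance = null;
}
```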

Initialization and Provider Selection

Upon initialize(), the manager attempts to find and initialize an IAutomationProvider. By default, it prioritizes native providers (platform-specific tools), then nutjs, and finally mock (for testing). You can explicitly specify a provider in the configuration.
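The priority order described above could be implemented along these lines. This is a sketch under assumptions: `Candidate` and `selectProvider` are hypothetical names, and the real manager may handle an unavailable explicit provider differently (for instance by falling back rather than throwing).

```typescript
// Hypothetical minimal view of a provider for selection purposes.
interface Candidate {
  name: string;
  isAvailable(): Promise<boolean>;
}

// Picks the explicitly requested provider, or else the first available one.
// Candidates are expected in priority order, e.g. native, nutjs, mock.
async function selectProvider(candidates: Candidate[], preferred?: string): Promise<Candidate> {
  if (preferred !== undefined) {
    const match = candidates.find((c) => c.name === preferred);
    if (!match) throw new Error(`Unknown provider: ${preferred}`);
    if (!(await match.isAvailable())) throw new Error(`Provider not available: ${preferred}`);
    return match;
  }
  for (const candidate of candidates) {
    if (await candidate.isAvailable()) return candidate;
  }
  throw new Error('No automation provider is available');
}
```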

Configuration

The manager's behavior can be configured via updateConfig() and retrieved with getConfig().

const config = manager.getConfig();
console.log(config.provider); // e.g., 'native'

manager.updateConfig({
  defaultDelays: {
    mouseMove: 50,
    keyPress: 20,
  },
  safety: {
    failSafe: false, // Disable fail-safe for specific scenarios
  },
});
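The call above passes only part of each nested group, which suggests that updates are merged into the existing configuration rather than replacing it. The sketch below shows one way such a merge could work. It is an assumption about the semantics, not the module's documented behavior, and `SketchConfig`, `PartialSketchConfig`, and `mergeConfig` are hypothetical names.

```typescript
// Hypothetical subset of the configuration, using the fields shown above.
interface SketchConfig {
  defaultDelays: { mouseMove: number; keyPress: number };
  safety: { failSafe: boolean };
}

// Every group and every field within a group is optional in an update.
type PartialSketchConfig = { [K in keyof SketchConfig]?: Partial<SketchConfig[K]> };

// Merges an update into the current config one nesting level deep,
// so fields that are not mentioned in the update keep their values.
function mergeConfig(current: SketchConfig, update: PartialSketchConfig): SketchConfig {
  return {
    defaultDelays: { ...current.defaultDelays, ...update.defaultDelays },
    safety: { ...current.safety, ...update.safety },
  };
}
```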

Safety Features: The manager includes built-in safety mechanisms:

Event System

The DesktopAutomationManager emits events for various desktop interactions, allowing for monitoring or reactive automation.

manager.on('mouse-move', (pos) => {
  console.log(`Mouse moved to: ${pos.x}, ${pos.y}`);
});

manager.on('key-press', (key, modifiers) => {
  console.log(`Key pressed: ${key} with modifiers: ${modifiers.join(',')}`);
});

manager.on('window-focus', (windowInfo) => {
  console.log(`Window focused: ${windowInfo.title} (${windowInfo.handle})`);
});
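To make the listener signatures above concrete, here is a small typed emitter sketch. The event map is inferred from the three examples shown. `SketchEvents` and `SketchEmitter` are hypothetical, and the actual manager may well extend Node's `EventEmitter` instead.

```typescript
// Hypothetical event map based on the three events shown above.
interface SketchEvents {
  'mouse-move': [pos: { x: number; y: number }];
  'key-press': [key: string, modifiers: string[]];
  'window-focus': [info: { title: string; handle: string }];
}

class SketchEmitter {
  // Listeners are stored untyped and cast back when invoked.
  private listeners = new Map<keyof SketchEvents, unknown[]>();

  on<K extends keyof SketchEvents>(event: K, listener: (...args: SketchEvents[K]) => void): this {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
    return this;
  }

  // Calls every listener for the event; returns false if there were none.
  emit<K extends keyof SketchEvents>(event: K, ...args: SketchEvents[K]): boolean {
    const list = this.listeners.get(event) ?? [];
    for (const listener of list) {
      (listener as (...a: SketchEvents[K]) => void)(...args);
    }
    return list.length > 0;
  }
}
```

Typing the event map this way lets the compiler reject a listener whose parameters do not match the event it subscribes to.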

Key events include:

Core Automation Methods

The manager exposes a comprehensive API for desktop interaction, mirroring the IAutomationProvider interface:

Mouse Operations:

Keyboard Operations:

Window Operations:

Application Operations:

Screen Operations:

Clipboard Operations:

IAutomationProvider Interface

The IAutomationProvider interface defines the contract for any desktop automation implementation. Each provider must implement these methods and declare its capabilities.

interface IAutomationProvider {
  name: string;
  capabilities: {
    mouse: boolean;
    keyboard: boolean;
    windows: boolean;
    apps: boolean;
    clipboard: boolean;
    ocr: boolean;
    screenshots?: boolean;
    colorPicker?: boolean;
  };
  isAvailable(): Promise<boolean>;
  initialize(): Promise<void>;
  shutdown(): Promise<void>;
  // ... methods for mouse, keyboard, window, app, screen, clipboard operations ...
}
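Since each provider declares its capabilities, callers can check a flag before attempting an operation. The helper below is a hypothetical sketch of such a guard. `Capability`, `HasCapabilities`, and `requireCapability` are not part of the documented API, and the optional flags are treated as unsupported when absent, which is an assumption.

```typescript
// Capability names taken from the interface above.
type Capability =
  | 'mouse' | 'keyboard' | 'windows' | 'apps'
  | 'clipboard' | 'ocr' | 'screenshots' | 'colorPicker';

// Hypothetical minimal view of a provider for capability checks.
interface HasCapabilities {
  name: string;
  capabilities: Partial<Record<Capability, boolean>>;
}

// Throws a descriptive error unless the provider explicitly declares the capability.
function requireCapability(provider: HasCapabilities, capability: Capability): void {
  if (provider.capabilities[capability] !== true) {
    throw new Error(`Provider "${provider.name}" does not support ${capability}`);
  }
}
```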

Concrete Automation Providers

The module includes several concrete implementations of IAutomationProvider:

1. MockAutomationProvider

2. NutJsProvider

3. Native Providers

These providers leverage platform-specific command-line tools or APIs for optimal performance and deeper integration. They are typically preferred when available.

LinuxNativeProvider

WindowsNativeProvider

MacOSNativeProvider

SmartSnapshotManager

The SmartSnapshotManager provides capabilities for intelligent UI element detection, allowing automation scripts to interact with elements based on their visual or accessibility properties rather than just coordinates.

Getting an Instance

The SmartSnapshotManager is typically instantiated directly, often configured with a detection method and defaultTtl.

import { SmartSnapshotManager } from '../../src/desktop-automation/smart-snapshot.js';

const snapshotManager = new SmartSnapshotManager({
  method: 'ocr', // or 'accessibility'
  defaultTtl: 30_000, // Snapshots are valid for 30 seconds
});
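The defaultTtl option implies that a snapshot goes stale after a fixed interval. A freshness check could be as simple as the sketch below. The `TimedSnapshot` shape and `isSnapshotFresh` are hypothetical, since the fields the module actually stores on a snapshot are not shown here.

```typescript
// Hypothetical snapshot metadata; times are in milliseconds.
interface TimedSnapshot {
  id: number;
  createdAt: number;
  ttl: number;
}

// A snapshot is fresh while less than `ttl` milliseconds have passed since creation.
function isSnapshotFresh(snapshot: TimedSnapshot, now: number = Date.now()): boolean {
  return now - snapshot.createdAt < snapshot.ttl;
}
```

Passing `now` explicitly keeps the check deterministic in tests.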

Snapshot Creation

Element Referencing

Injecting Browser Elements

A powerful feature of the SmartSnapshotManager is its ability to combine desktop-level UI elements with elements sourced from a browser context. This is crucial for hybrid automation scenarios.

// Example: Injecting elements from a browser automation tool
const browserElements = [
  {
    ref: snapshotManager.getNextRef(),
    role: 'button',
    name: 'Submit Form',
    bounds: { x: 100, y: 200, width: 150, height: 40 },
    center: { x: 175, y: 220 },
    interactive: true,
    focused: false,
    enabled: true,
    visible: true,
  },
];

snapshotManager.injectBrowserElements(browserElements, 'my-browser-plugin');

// Now, a subsequent call to manager.findUIElement() could find 'Submit Form'
// even if it's within a browser window that desktop OCR/accessibility can't fully parse.
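One way to picture what injection does internally is a store keyed by element reference that tags each injected element with its source. The sketch below is speculative. `SketchElement` and `SketchElementStore` are hypothetical, and numeric refs are an assumption (the real ref type is not shown in this document).

```typescript
// Hypothetical, reduced element shape (the real one has more fields).
interface SketchElement {
  ref: number; // Assumption: refs are numeric
  role: string;
  name: string;
  center: { x: number; y: number };
}

type StoredElement = SketchElement & { source: string };

class SketchElementStore {
  private nextRef = 1;
  private elements = new Map<number, StoredElement>();

  // Hands out a ref that no stored element uses yet.
  getNextRef(): number {
    return this.nextRef++;
  }

  // Adds externally sourced elements and records where they came from.
  inject(elements: SketchElement[], source: string): void {
    for (const element of elements) {
      this.elements.set(element.ref, { ...element, source });
      // Keep the counter ahead of any ref supplied by the caller.
      if (element.ref >= this.nextRef) this.nextRef = element.ref + 1;
    }
  }

  findByName(name: string): StoredElement | undefined {
    return Array.from(this.elements.values()).find((e) => e.name === name);
  }
}
```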

Developer Guide

Basic Usage Flow

  1. Get the Manager: Obtain the singleton instance.
  2. Initialize: Prepare the underlying automation provider.
  3. Perform Actions: Use the manager's methods for mouse, keyboard, window, etc.
  4. Shutdown: Release resources when done.

import { getDesktopAutomation } from '../../src/desktop-automation/index.js';

async function automateTask() {
  const automation = getDesktopAutomation();
  await automation.initialize();

  try {
    // Move mouse and click
    await automation.moveMouse(100, 100);
    await automation.click();

    // Type text
    await automation.type('Hello, Desktop!');

    // Find and focus a window
    const terminalWindow = await automation.findWindow('Terminal');
    if (terminalWindow) {
      await automation.focusWindow(terminalWindow.handle);
      console.log(`Focused: ${terminalWindow.title}`);
    }

    // Get clipboard content
    await automation.copyText('Copied from automation');
    const clipboardText = await automation.getClipboardText();
    console.log(`Clipboard: ${clipboardText}`);

  } catch (error) {
    console.error('Automation failed:', error);
  } finally {
    await automation.shutdown();
  }
}

automateTask();

Using Smart Snapshots for Intelligent Interaction

import { getDesktopAutomation, SmartSnapshotManager } from '../../src/desktop-automation/index.js';

async function interactWithUI() {
  const automation = getDesktopAutomation();
  await automation.initialize();

  const snapshotManager = new SmartSnapshotManager({ method: 'ocr' }); // Use OCR for element detection

  try {
    // Take a snapshot of the current screen
    const snapshot = await snapshotManager.takeSnapshot();
    console.log(`Snapshot taken with ${snapshot.elements.length} elements.`);

    // Find an element by name (e.g., a button detected by OCR)
    const submitButton = snapshot.elements.find(e => e.name === 'Submit' && e.role === 'button');

    if (submitButton) {
      console.log(`Found 'Submit' button at ${submitButton.center.x}, ${submitButton.center.y}`);
      await automation.click(submitButton.center.x, submitButton.center.y);
    } else {
      console.log('Submit button not found.');
    }

  } catch (error) {
    console.error('UI interaction failed:', error);
  } finally {
    await automation.shutdown();
  }
}

interactWithUI();
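The exact-match lookup in the example can be brittle when names come from OCR, which may return different casing or stray whitespace. A more tolerant lookup might look like the sketch below. `NamedElement`, `normalizeLabel`, and `findByLabel` are hypothetical helpers, not part of the module.

```typescript
// Hypothetical minimal element view for lookups.
interface NamedElement {
  name: string;
  role: string;
}

// Trims, collapses runs of whitespace, and lowercases a label for comparison.
function normalizeLabel(label: string): string {
  return label.trim().replace(/\s+/g, ' ').toLowerCase();
}

// Returns the first element whose normalized name matches, optionally filtered by role.
function findByLabel<T extends NamedElement>(elements: T[], label: string, role?: string): T | undefined {
  const wanted = normalizeLabel(label);
  return elements.find(
    (e) => normalizeLabel(e.name) === wanted && (role === undefined || e.role === role),
  );
}
```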

Extending with a New Automation Provider

To add support for a new platform or library:

  1. Create a new class that implements the IAutomationProvider interface.
  2. Implement all required methods (mouse, keyboard, window, etc.) using the new platform's APIs or tools.
  3. Define name and capabilities for your provider.
  4. Implement isAvailable() to check if the necessary tools/libraries are present on the system.
  5. Register your provider with the DesktopAutomationManager using manager.registerProvider(yourNewProviderInstance). You can then configure the manager to use it.
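The steps above can be sketched as follows. Because the full IAutomationProvider contract is elided in this document, the sketch uses a trimmed, hypothetical `SketchProvider` interface with a single `click` operation, and a hypothetical `SketchRegistry` stands in for the manager's registerProvider(). The duplicate-name check is an assumption.

```typescript
// Hypothetical, trimmed-down provider contract (illustration only).
interface SketchProvider {
  name: string;
  capabilities: { mouse: boolean; keyboard: boolean };
  isAvailable(): Promise<boolean>;
  initialize(): Promise<void>;
  shutdown(): Promise<void>;
  click(x: number, y: number): Promise<void>;
}

// Steps 1 to 4: a provider that records clicks in memory.
class InMemoryProvider implements SketchProvider {
  name = 'in-memory';
  capabilities = { mouse: true, keyboard: false };
  ready = false;
  clicks: Array<{ x: number; y: number }> = [];

  async isAvailable(): Promise<boolean> {
    return true; // No external tools are needed for this sketch
  }
  async initialize(): Promise<void> {
    this.ready = true;
  }
  async shutdown(): Promise<void> {
    this.ready = false;
  }
  async click(x: number, y: number): Promise<void> {
    if (!this.ready) throw new Error('Provider not initialized');
    this.clicks.push({ x, y });
  }
}

// Step 5: a stand-in registry keyed by provider name.
class SketchRegistry {
  private providers = new Map<string, SketchProvider>();

  registerProvider(provider: SketchProvider): void {
    if (this.providers.has(provider.name)) {
      throw new Error(`Provider already registered: ${provider.name}`);
    }
    this.providers.set(provider.name, provider);
  }

  getProvider(name: string): SketchProvider | undefined {
    return this.providers.get(name);
  }
}
```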

This modular design ensures that the desktop-automation module can be easily extended to support new environments or integrate with different automation backends.