src-sidecar

Module: src-sidecar Cohesion: 0.80 Members: 0


The codebuddy-sidecar module is a native Rust application that gives a client application (e.g., a web application or desktop client) local, high-performance features that are difficult or inefficient to implement in the client itself. It acts as a bridge, communicating via a simple JSON-RPC protocol over standard input/output.

Its primary functions are:

  1. Local Speech-to-Text (STT): Leveraging whisper-rs for efficient, offline transcription using pre-trained Whisper models.
  2. Desktop Automation: Interacting with the operating system's clipboard and simulating keyboard input using arboard and enigo.

Architecture Overview

The codebuddy-sidecar operates as a standalone process, communicating with a parent "Client Application" (e.g., a web app, Electron app, or another native process) via newline-delimited JSON-RPC messages over stdin and stdout.

graph TD
    A[Client Application] -->|"JSON-RPC Request (stdin)"| B(codebuddy-sidecar)
    B -->|"JSON-RPC Response (stdout)"| A
    B -->|"Desktop Automation (enigo, arboard)"| C[Operating System / Hardware]
    B -->|"STT (whisper-rs)"| D[Whisper Model Files]

This design allows the client application to offload computationally intensive tasks (like STT) or privileged operations (like desktop automation) to a robust native process, receiving structured JSON responses.
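From the client side, the stdin/stdout exchange described above can be sketched as follows. This is a minimal illustration using only the standard library, not the client's actual code: the helper name `round_trip` is hypothetical, and it handles a single request before closing the pipe, whereas a long-lived client would presumably keep both pipes open.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::process::{Command, Stdio};

/// Spawns `command`, sends one newline-terminated request on its stdin,
/// and returns the first line it prints on stdout (without the newline).
/// `round_trip` is a hypothetical helper, not part of codebuddy-sidecar.
fn round_trip(command: &mut Command, request: &str) -> io::Result<String> {
    let mut child = command
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    {
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(request.as_bytes())?;
        stdin.write_all(b"\n")?;
        stdin.flush()?;
    } // `stdin` is dropped here, so the child sees EOF after the request.

    let stdout = child.stdout.take().expect("stdout was piped");
    let mut line = String::new();
    BufReader::new(stdout).read_line(&mut line)?;

    child.wait()?;
    Ok(line.trim_end().to_string())
}
```

For example, `round_trip(&mut Command::new("target/release/codebuddy-sidecar"), r#"{"id": 1, "method": "ping", "params": {}}"#)` would return the sidecar's reply to a single `ping`.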

Communication Protocol (JSON-RPC)

The sidecar implements a simplified JSON-RPC protocol. Each request and response is a single JSON object, terminated by a newline character.

Request Format

Requests are sent by the client application to the sidecar's stdin.

{
    "id": 1,
    "method": "stt.transcribe",
    "params": {
        "audio_b64": "...",
        "language": "en"
    }
}

Response Format

Responses are written by the sidecar to its own stdout, which the client application reads.

Success Response:

{
    "id": 1,
    "result": {
        "text": "Hello world",
        "duration_secs": 1.2,
        "model_used": "base.en"
    }
}

Error Response:

{
    "id": 1,
    "error": "Model file not found: /path/to/model.bin"
}
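The two response shapes can be produced with a few lines of string formatting. The sketch below is dependency-free for illustration only; the source does not say how the sidecar serializes JSON, and the function names and the assumption that `id` is an unsigned integer are hypothetical.

```rust
/// Escapes a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a success line; `result_json` must already be valid JSON.
fn success_line(id: u64, result_json: &str) -> String {
    format!("{{\"id\":{},\"result\":{}}}\n", id, result_json)
}

/// Builds an error line carrying the message as a plain JSON string.
fn error_line(id: u64, message: &str) -> String {
    format!("{{\"id\":{},\"error\":\"{}\"}}\n", id, escape_json(message))
}
```

Each function ends its output with a newline, which is what terminates a message in this protocol.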

Modules

The codebuddy-sidecar is organized into two primary modules: stt for Speech-to-Text and desktop for desktop automation. The main module handles the JSON-RPC communication and dispatches requests.

src/main.rs - Main Entry Point

The main function in src/main.rs is the entry point of the sidecar. It continuously reads lines from stdin, parses them as Request objects, dispatches the calls to the appropriate module functions, and writes the Response (either result or error) to stdout.

It initializes a shared stt::SttState instance to manage Whisper model contexts across requests.
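The read, dispatch, and write cycle might look roughly like the sketch below. It is generic over the input and output streams so it can be exercised without a real process; the name `serve`, the closure-based handler, and the choice to skip blank lines are assumptions made for illustration, not details taken from src/main.rs.

```rust
use std::io::{self, BufRead, Write};

/// Reads newline-delimited requests from `input`, passes each non-empty
/// line to `handle`, and writes the returned response line to `output`.
/// `serve` is a hypothetical name used only for this sketch.
fn serve<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    mut handle: impl FnMut(&str) -> String,
) -> io::Result<()> {
    for line in input.lines() {
        let line = line?;
        let request = line.trim();
        if request.is_empty() {
            continue; // Skip blank lines rather than answering them.
        }
        let response = handle(request);
        output.write_all(response.as_bytes())?;
        output.write_all(b"\n")?;
        // Flush per response so the client is not left waiting on a buffer.
        output.flush()?;
    }
    Ok(())
}
```

In a real binary this would be called as `serve(io::stdin().lock(), io::stdout().lock(), handler)`, where the handler parses the request and dispatches on its `method` field. The loop ends when stdin reaches EOF.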

Supported Meta Methods: ping and version, both of which appear in the manual-test examples under Building and Running.

src/stt.rs - Speech-to-Text Module

This module provides local Speech-to-Text capabilities using the whisper-rs Rust bindings for OpenAI's Whisper model. It implements a dual-model strategy to optimize for both speed and accuracy based on audio duration.

SttState

The SttState struct manages the loaded Whisper models. It holds two WhisperContext instances, fast_ctx and accurate_ctx, protected by Mutexes for thread-safe access. This allows for loading and switching between different models (e.g., a smaller, faster model for short phrases and a larger, more accurate model for longer dictations).

graph TD
    A[SttState] --> B{fast_ctx: Mutex<Option<WhisperContext>>}
    A --> C{accurate_ctx: Mutex<Option<WhisperContext>>}
    B --> D["WhisperContext (e.g., base.en)"]
    C --> E["WhisperContext (e.g., large-v3)"]
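The two-slot layout can be sketched as below. To keep the example free of dependencies, a hypothetical `ModelHandle` stands in for `WhisperContext`, and the method names `load` and `loaded_names` are illustrative rather than the module's actual API.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for `WhisperContext`, so the sketch has no dependencies.
struct ModelHandle {
    name: String,
}

/// Two independently locked slots, each empty until a model is loaded.
struct SttState {
    fast_ctx: Mutex<Option<ModelHandle>>,
    accurate_ctx: Mutex<Option<ModelHandle>>,
}

/// Returns the name of the model in `slot`, if one is loaded.
fn name_in(slot: &Mutex<Option<ModelHandle>>) -> Option<String> {
    let guard = slot.lock().ok()?;
    guard.as_ref().map(|model| model.name.clone())
}

impl SttState {
    fn new() -> Self {
        SttState {
            fast_ctx: Mutex::new(None),
            accurate_ctx: Mutex::new(None),
        }
    }

    /// Stores `model` in the named slot, replacing any previous model.
    fn load(&self, slot: &str, model: ModelHandle) -> Result<(), String> {
        let target = match slot {
            "fast" => &self.fast_ctx,
            "accurate" => &self.accurate_ctx,
            other => return Err(format!("Unknown slot: {}", other)),
        };
        let mut guard = target
            .lock()
            .map_err(|_| "slot lock poisoned".to_string())?;
        *guard = Some(model);
        Ok(())
    }

    /// Returns the loaded model names as (fast, accurate).
    fn loaded_names(&self) -> (Option<String>, Option<String>) {
        (name_in(&self.fast_ctx), name_in(&self.accurate_ctx))
    }
}
```

Giving each slot its own `Mutex` means that loading a model into one slot does not block reads of the other.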

Methods

All stt methods are called on the SttState instance.

##### stt.load_model

Loads a Whisper model from a specified path into either the "fast" or "accurate" slot.

    {"id": 1, "method": "stt.load_model", "params": {"path": "/path/to/whisper-base.en.bin", "slot": "fast"}}

##### stt.transcribe

Transcribes base64-encoded WAV audio data. It automatically selects between the "fast" and "accurate" models based on the audio duration and a configurable threshold.

Request:

    {"id": 2, "method": "stt.transcribe", "params": {"audio_b64": "UklGRiQAAABXQVZFZm10IBAA...", "language": "en"}}

Result:

    {
        "text": "The transcribed text.",
        "segments": [
            {"text": "The", "start": 0.0, "end": 0.2},
            {"text": "transcribed", "start": 0.2, "end": 0.8},
            {"text": "text.", "start": 0.8, "end": 1.2}
        ],
        "duration_secs": 1.2,
        "processing_ms": 150,
        "model_used": "base.en",
        "model_slot": "fast"
    }
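Duration-based model selection could be sketched as follows. The source does not state the threshold value, how the duration is measured, or what happens when the preferred slot is empty, so the `threshold_secs` parameter, the PCM duration formula, and the fallback to whichever model is loaded are all assumptions.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Slot {
    Fast,
    Accurate,
}

/// Duration of uncompressed PCM audio, from the byte length of the sample data.
fn pcm_duration_secs(
    data_bytes: usize,
    sample_rate: u32,
    channels: u16,
    bits_per_sample: u16,
) -> f64 {
    let bytes_per_second =
        sample_rate as f64 * channels as f64 * (bits_per_sample as f64 / 8.0);
    data_bytes as f64 / bytes_per_second
}

/// Prefers the fast slot for audio at or under the threshold and the accurate
/// slot otherwise, falling back to whichever model is loaded (an assumption).
fn select_slot(
    duration_secs: f64,
    threshold_secs: f64,
    fast_loaded: bool,
    accurate_loaded: bool,
) -> Option<Slot> {
    let preferred = if duration_secs <= threshold_secs {
        Slot::Fast
    } else {
        Slot::Accurate
    };
    match (preferred, fast_loaded, accurate_loaded) {
        (Slot::Fast, true, _) => Some(Slot::Fast),
        (Slot::Accurate, _, true) => Some(Slot::Accurate),
        (_, true, false) => Some(Slot::Fast),
        (_, false, true) => Some(Slot::Accurate),
        _ => None, // Neither slot holds a model.
    }
}
```

With a hypothetical 5-second threshold, the 1.2-second clip in the result above would go to the fast slot, matching its `"model_slot": "fast"` field.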

##### stt.list_models

Returns the names of the models currently loaded in the "fast" and "accurate" slots.

    {"id": 3, "method": "stt.list_models", "params": {}}

##### stt.status

Checks if models are loaded and if the STT system is ready for transcription.

    {"id": 4, "method": "stt.status", "params": {}}

Helper Functions

src/desktop.rs - Desktop Automation Module

This module provides cross-platform desktop automation functionalities, including clipboard manipulation and keyboard input simulation. It uses the arboard crate for clipboard access and enigo for keyboard events.

Methods

All desktop methods are standalone public functions.

##### desktop.paste

Pastes text into the currently focused application. It can use the system clipboard (Ctrl+V) or simulate typing.

    {"id": 5, "method": "desktop.paste", "params": {"text": "Hello from Code Buddy!", "auto_submit": true}}

##### desktop.type_text

Types text directly by simulating individual key presses. This method is slower than paste but can be useful in contexts where clipboard pasting is not reliable.

    {"id": 6, "method": "desktop.type_text", "params": {"text": "This is typed."}}

##### desktop.key_press

Simulates pressing a specific key or a key combination (e.g., Ctrl+C).

    {"id": 7, "method": "desktop.key_press", "params": {"key": "c", "modifiers": ["ctrl"]}}
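Before any key event is simulated, the `modifiers` array has to be mapped from strings to key values. The sketch below shows one way to validate those names. Only `"ctrl"` appears in the source; the other accepted names, the `Modifier` enum, and both function names are hypothetical, and translating the result into enigo key events is left out.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Modifier {
    Ctrl,
    Shift,
    Alt,
    Meta,
}

/// Maps a modifier name from the request to a `Modifier`, ignoring ASCII case.
/// Names other than "ctrl" are assumptions, not taken from the source.
fn parse_modifier(name: &str) -> Option<Modifier> {
    match name.to_ascii_lowercase().as_str() {
        "ctrl" | "control" => Some(Modifier::Ctrl),
        "shift" => Some(Modifier::Shift),
        "alt" => Some(Modifier::Alt),
        "meta" | "cmd" | "super" => Some(Modifier::Meta),
        _ => None,
    }
}

/// Parses every name, failing on the first one that is not recognized.
fn parse_modifiers(names: &[&str]) -> Result<Vec<Modifier>, String> {
    names
        .iter()
        .map(|name| {
            parse_modifier(name).ok_or_else(|| format!("Unknown modifier: {}", name))
        })
        .collect()
}
```

Rejecting unknown names up front lets the sidecar return an error response instead of pressing a partial key combination.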

##### desktop.clipboard_get

Retrieves the current content of the system clipboard.

    {"id": 8, "method": "desktop.clipboard_get", "params": {}}

##### desktop.clipboard_set

Sets the content of the system clipboard.

    {"id": 9, "method": "desktop.clipboard_set", "params": {"text": "New clipboard text"}}

Helper Functions

Building and Running

The codebuddy-sidecar is a standard Rust binary.

To build the release version:

cargo build --release

The executable will be located at target/release/codebuddy-sidecar.

To run and test it manually (e.g., with a single request):

echo '{"id": 1, "method": "ping", "params": {}}' | target/release/codebuddy-sidecar

Or, for continuous interaction:

target/release/codebuddy-sidecar
# Then type JSON requests followed by a newline, e.g.:
# {"id": 1, "method": "version", "params": {}}
# {"id": 2, "method": "stt.load_model", "params": {"path": "/path/to/whisper-base.en.bin"}}

Contribution Guidelines

When contributing to codebuddy-sidecar: