src — talk-mode

Module: src-talk-mode Cohesion: 0.80 Members: 0


The src/talk-mode module provides an extensible Text-to-Speech (TTS) system. It integrates multiple TTS providers behind a single interface and handles speech synthesis, caching of synthesized audio, and queued playback. It is the central entry point for all voice output in the application.

Module Purpose

The primary goals of the talk-mode module are:

  1. Abstract TTS Providers: Offer a unified interface for interacting with different TTS services (e.g., OpenAI, ElevenLabs, Edge TTS, local engines).
  2. Manage Voices: Discover and manage available voices across all integrated providers.
  3. Efficient Synthesis: Provide mechanisms for caching synthesized audio to reduce API calls and latency.
  4. Queued Playback: Handle a queue of speech requests, allowing for prioritized and sequential playback.
  5. Configurability: Allow flexible configuration of providers, default voices, synthesis options, and queue behavior.

Architecture Overview

The talk-mode module follows a provider-based architecture, centered around the TTSManager class.

```mermaid
graph TD
    A[Application Code] --> B("getTTSManager()")
    B --> C(TTSManager)
    C -- "Delegates synthesis & voice listing" --> D{ITTSProvider Interface}
    D --> E(OpenAITTSProvider)
    D --> F(ElevenLabsProvider)
    D --> G(EdgeTTSProvider)
    D --> H(AudioReaderTTSProvider)
    D --> I(MockTTSProvider)

    C -- "Manages queue, cache, config" --> C
    C -- "Emits events (synthesis, playback, queue)" --> A
    C -- "Uses types from" --> J(types.ts)
```

  1. TTSManager: The core class that orchestrates all TTS operations. It manages the lifecycle of TTS providers, selects the active provider, handles voice discovery, performs speech synthesis (including caching), and manages the playback queue. It also emits events for various stages of synthesis and playback.
  2. ITTSProvider: An interface that defines the contract for any TTS service integration. Each concrete TTS provider must implement this interface.
  3. Concrete TTS Providers: Classes like OpenAITTSProvider, ElevenLabsProvider, EdgeTTSProvider, and AudioReaderTTSProvider implement the ITTSProvider interface, providing specific logic to interact with their respective TTS APIs or local binaries. A MockTTSProvider is also included for testing and development.
  4. types.ts: This file centralizes all type definitions, configurations, and default values used throughout the module, ensuring consistency and clarity.

Key Components

TTSManager (src/talk-mode/tts-manager.ts)

The TTSManager is the central component of the talk-mode module. It extends EventEmitter so that other code can subscribe to events covering synthesis, playback, and queue changes.
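The event names are not listed here. The sketch below assumes hypothetical events `synthesis:start`, `synthesis:complete`, and `queue:changed` to show how a class extending Node's `EventEmitter` can report its progress to subscribers. The class name `EventingTTSManager` is illustrative, not the module's.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event payload; the real TTSManager's event names and shapes may differ.
interface SynthesisEvent {
  text: string;
  provider: string;
}

class EventingTTSManager extends EventEmitter {
  private queueLength = 0;

  // Emits start/complete events around a stubbed synthesis step.
  synthesize(text: string): Uint8Array {
    const event: SynthesisEvent = { text, provider: "mock" };
    this.emit("synthesis:start", event);
    const audio = new TextEncoder().encode(text); // stand-in for real audio bytes
    this.emit("synthesis:complete", { ...event, bytes: audio.length });
    return audio;
  }

  // Emits the new queue length whenever an item is enqueued.
  enqueue(_text: string): void {
    this.queueLength += 1;
    this.emit("queue:changed", this.queueLength);
  }
}

const manager = new EventingTTSManager();
const log: string[] = [];
manager.on("synthesis:start", (e: SynthesisEvent) => log.push(`start:${e.text}`));
manager.on("synthesis:complete", (e: SynthesisEvent & { bytes: number }) =>
  log.push(`done:${e.bytes}`),
);
manager.on("queue:changed", (n: number) => log.push(`queue:${n}`));

manager.synthesize("hello");
manager.enqueue("next");
console.log(log); // [ 'start:hello', 'done:5', 'queue:1' ]
```

Because `EventEmitter.emit` calls listeners synchronously, subscribers observe events in exactly the order the manager emits them.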

Initialization and Lifecycle:

Provider Management:
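The overview says the manager selects the active provider. As an assumption-labeled sketch, one common strategy is to try a preferred provider and fall back to the first available one in registration order. `ProviderLike`, `ProviderRegistry`, and `selectActive` are hypothetical names.

```typescript
// Hypothetical minimal provider shape; the real ITTSProvider may have more members.
interface ProviderLike {
  readonly name: string;
  isAvailable(): Promise<boolean>;
}

class ProviderRegistry {
  private readonly providers = new Map<string, ProviderLike>();
  private active: ProviderLike | null = null;

  register(provider: ProviderLike): void {
    this.providers.set(provider.name, provider);
  }

  // Use the preferred provider if it is available; otherwise the first available
  // one in registration order (a Map iterates in insertion order).
  async selectActive(preferred?: string): Promise<ProviderLike | null> {
    const candidate = preferred ? this.providers.get(preferred) : undefined;
    if (candidate && (await candidate.isAvailable())) {
      this.active = candidate;
      return candidate;
    }
    for (const provider of this.providers.values()) {
      if (await provider.isAvailable()) {
        this.active = provider;
        return provider;
      }
    }
    this.active = null;
    return null;
  }

  getActive(): ProviderLike | null {
    return this.active;
  }
}

// Usage: the preferred provider is unavailable (e.g. no API key), so selection falls back.
const stub = (name: string, ok: boolean): ProviderLike => ({
  name,
  isAvailable: async () => ok,
});

const registry = new ProviderRegistry();
registry.register(stub("openai", false));
registry.register(stub("edge", true));
registry.selectActive("openai").then((p) => console.log(p?.name)); // edge
```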

Voice Management:
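One of the module's goals is discovering voices across all integrated providers. A minimal sketch, assuming a hypothetical `listVoices()` method on each provider and a hypothetical `Voice` shape, queries providers in parallel so that one failing provider does not hide the others' voices. The sample voice data is illustrative.

```typescript
// Hypothetical voice descriptor; the real Voice type in types.ts may have other fields.
interface Voice {
  id: string;
  name: string;
  language: string;
  provider: string;
}

interface VoiceSource {
  readonly name: string;
  listVoices(): Promise<Voice[]>;
}

// Query every provider in parallel; a provider that rejects contributes no voices
// instead of failing the whole listing.
async function listAllVoices(sources: VoiceSource[], language?: string): Promise<Voice[]> {
  const results = await Promise.allSettled(sources.map((s) => s.listVoices()));
  const voices: Voice[] = [];
  for (const result of results) {
    if (result.status === "fulfilled") voices.push(...result.value);
  }
  // Optional language-prefix filter: "en" matches "en-US" and "en-GB".
  return language ? voices.filter((v) => v.language.startsWith(language)) : voices;
}

// Usage with sample data: one working provider and one that is offline.
const edge: VoiceSource = {
  name: "edge",
  listVoices: async () => [
    { id: "en-US-AriaNeural", name: "Aria", language: "en-US", provider: "edge" },
    { id: "fr-FR-DeniseNeural", name: "Denise", language: "fr-FR", provider: "edge" },
  ],
};
const offline: VoiceSource = {
  name: "offline",
  listVoices: async () => {
    throw new Error("network unreachable");
  },
};

listAllVoices([edge, offline], "en").then((voices) => console.log(voices.map((v) => v.id)));
// [ 'en-US-AriaNeural' ]
```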

Speech Synthesis:
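Synthesis is described as including caching to cut API calls and latency. A minimal sketch, assuming a hypothetical `CachedSynthesizer` wrapper and request shape, keys the cache on every input that changes the audio and evicts the least recently used entry when full.

```typescript
// Hypothetical synthesis request; field names are assumptions, not the module's API.
interface SynthesisRequest {
  text: string;
  voiceId: string;
  speed?: number;
}

type SynthesizeFn = (req: SynthesisRequest) => Promise<Uint8Array>;

class CachedSynthesizer {
  private readonly cache = new Map<string, Uint8Array>();
  private readonly providerName: string;
  private readonly synthesize: SynthesizeFn;
  private readonly maxEntries: number;
  hits = 0;
  misses = 0;

  constructor(providerName: string, synthesize: SynthesizeFn, maxEntries = 100) {
    this.providerName = providerName;
    this.synthesize = synthesize;
    this.maxEntries = maxEntries;
  }

  // The key covers provider, voice, speed, and text, so different voices or
  // speeds never share a cache entry.
  private key(req: SynthesisRequest): string {
    return JSON.stringify([this.providerName, req.voiceId, req.speed ?? 1, req.text]);
  }

  async get(req: SynthesisRequest): Promise<Uint8Array> {
    const key = this.key(req);
    const cached = this.cache.get(key);
    if (cached) {
      this.hits += 1;
      // Re-insert to mark the entry as most recently used (Map keeps insertion order).
      this.cache.delete(key);
      this.cache.set(key, cached);
      return cached;
    }
    this.misses += 1;
    const audio = await this.synthesize(req);
    this.cache.set(key, audio);
    // Evict the least recently used entry once over capacity.
    if (this.cache.size > this.maxEntries) {
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    return audio;
  }
}

// Setup with a fake provider that counts how often it is actually called.
let providerCalls = 0;
const demo = new CachedSynthesizer(
  "mock",
  async (req) => {
    providerCalls += 1;
    return new TextEncoder().encode(`${req.voiceId}:${req.text}`);
  },
  2,
);
```

This sketch caches finished audio only, so two identical requests issued concurrently both reach the provider; caching the in-flight `Promise` instead would deduplicate them.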

Playback Queue Management:
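The queue is described as supporting prioritized, sequential playback. The sketch below is one way to get that behavior under stated assumptions: higher numbers play first, equal priorities keep FIFO order, and a single drain loop plays one item at a time. `SpeechQueue`, `enqueue`, and `drain` are hypothetical names.

```typescript
// Hypothetical queue item; the module's actual priority levels are not documented here.
interface SpeechItem {
  id: number;
  text: string;
  priority: number; // higher plays first
}

class SpeechQueue {
  private items: SpeechItem[] = [];
  private nextId = 1;
  private playing = false;

  // Insert before the first lower-priority item, so equal priorities stay FIFO.
  enqueue(text: string, priority = 0): number {
    const item: SpeechItem = { id: this.nextId++, text, priority };
    const index = this.items.findIndex((existing) => existing.priority < priority);
    if (index === -1) this.items.push(item);
    else this.items.splice(index, 0, item);
    return item.id;
  }

  get size(): number {
    return this.items.length;
  }

  clear(): void {
    this.items = [];
  }

  // Play items one at a time; items enqueued during playback are picked up in order.
  async drain(play: (item: SpeechItem) => Promise<void>): Promise<void> {
    if (this.playing) return; // a drain loop is already running
    this.playing = true;
    try {
      let item = this.items.shift();
      while (item) {
        await play(item);
        item = this.items.shift();
      }
    } finally {
      this.playing = false;
    }
  }
}

const queue = new SpeechQueue();
queue.enqueue("Build finished", 0);
queue.enqueue("Error: tests failed", 10);
queue.drain(async (item) => console.log(item.text));
// Error: tests failed
// Build finished
```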

Playback (Simulated):
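The heading indicates that playback is simulated rather than routed to an audio device. One plausible approach, assumed here rather than taken from the module, is to estimate the spoken duration from the word count and finish after that delay, with a `stop()` handle for interruption. The 150 words-per-minute rate and all names are illustrative.

```typescript
// Assumed speaking rate; real speech duration depends on the voice and provider.
const WORDS_PER_MINUTE = 150;

function estimateDurationMs(text: string, speed = 1): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round(((words / WORDS_PER_MINUTE) * 60000) / speed);
}

interface PlaybackHooks {
  onStart?: (durationMs: number) => void;
  onEnd?: (interrupted: boolean) => void;
}

interface PlaybackHandle {
  durationMs: number;
  done: Promise<boolean>; // resolves to true if playback was stopped early
  stop: () => void;
}

// `timeScale` shrinks the real wait (e.g. 0 in tests) without changing the reported duration.
function simulatePlayback(
  text: string,
  hooks: PlaybackHooks = {},
  speed = 1,
  timeScale = 1,
): PlaybackHandle {
  const durationMs = estimateDurationMs(text, speed);
  let finished = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let resolveDone: (interrupted: boolean) => void = () => {};
  const done = new Promise<boolean>((resolve) => {
    resolveDone = resolve;
  });

  // Ends playback exactly once, whether it ran to completion or was stopped.
  const finish = (interrupted: boolean): void => {
    if (finished) return;
    finished = true;
    if (timer !== undefined) clearTimeout(timer);
    hooks.onEnd?.(interrupted);
    resolveDone(interrupted);
  };

  hooks.onStart?.(durationMs);
  timer = setTimeout(() => finish(false), durationMs * timeScale);
  return { durationMs, done, stop: () => finish(true) };
}

const playback = simulatePlayback("Deployment complete.", {
  onStart: (ms) => console.log(`playing for ~${ms} ms`), // playing for ~800 ms
  onEnd: (interrupted) => console.log(`ended, interrupted = ${interrupted}`), // ended, interrupted = true
});
playback.stop();
```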

Configuration and Stats:
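The stats the manager exposes are not listed. As a hypothetical sketch, a small tracker could count requests, cache hits, synthesized characters, and errors, and return a defensive copy with a derived cache hit rate. `TTSStats` and `StatsTracker` are illustrative names.

```typescript
// Hypothetical stats shape; the module's real stats fields are not documented here.
interface TTSStats {
  requests: number;
  cacheHits: number;
  charactersSynthesized: number;
  errors: number;
  byProvider: Record<string, number>; // uncached requests per provider
}

function emptyStats(): TTSStats {
  return { requests: 0, cacheHits: 0, charactersSynthesized: 0, errors: 0, byProvider: {} };
}

class StatsTracker {
  private stats: TTSStats = emptyStats();

  recordRequest(provider: string, text: string, cacheHit: boolean): void {
    this.stats.requests += 1;
    if (cacheHit) {
      this.stats.cacheHits += 1;
    } else {
      // Only uncached requests reach the provider, so only they add synthesized characters.
      this.stats.charactersSynthesized += text.length;
      this.stats.byProvider[provider] = (this.stats.byProvider[provider] ?? 0) + 1;
    }
  }

  recordError(): void {
    this.stats.errors += 1;
  }

  // Return a copy so callers cannot mutate the internal counters.
  getStats(): TTSStats & { cacheHitRate: number } {
    const { requests, cacheHits } = this.stats;
    return {
      ...this.stats,
      byProvider: { ...this.stats.byProvider },
      cacheHitRate: requests === 0 ? 0 : cacheHits / requests,
    };
  }

  reset(): void {
    this.stats = emptyStats();
  }
}
```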

ITTSProvider Interface (src/talk-mode/tts-manager.ts)

This interface defines the contract that all TTS provider implementations must adhere to.
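The contract's exact members are not given. A plausible shape, assumed from the architecture overview (the manager delegates voice listing and synthesis to providers), is sketched below; `isAvailable`, `listVoices`, `synthesize`, and the option and result types are hypothetical. The `speakWith` helper shows that code written against the interface works with any conforming provider.

```typescript
// Hypothetical shapes: member names are assumptions, not the module's actual signatures.
interface VoiceInfo {
  id: string;
  name: string;
  language: string;
}

interface SynthesisOptions {
  voiceId?: string;
  speed?: number; // 1 = normal rate
  format?: "mp3" | "wav" | "pcm";
}

interface SynthesisResult {
  audio: Uint8Array;
  format: string;
  provider: string;
}

interface ITTSProvider {
  readonly name: string;
  isAvailable(): Promise<boolean>; // e.g. API key present, local binary installed
  listVoices(): Promise<VoiceInfo[]>;
  synthesize(text: string, options?: SynthesisOptions): Promise<SynthesisResult>;
}

// Depends only on the interface, so any provider implementation can be passed in.
async function speakWith(provider: ITTSProvider, text: string): Promise<SynthesisResult> {
  if (!(await provider.isAvailable())) {
    throw new Error(`${provider.name} is not available`);
  }
  const [firstVoice] = await provider.listVoices();
  return provider.synthesize(text, firstVoice ? { voiceId: firstVoice.id } : {});
}

// A trivial provider that "synthesizes" by encoding the text, just to satisfy the contract.
class EchoProvider implements ITTSProvider {
  readonly name = "echo";
  async isAvailable(): Promise<boolean> {
    return true;
  }
  async listVoices(): Promise<VoiceInfo[]> {
    return [{ id: "echo-1", name: "Echo", language: "en-US" }];
  }
  async synthesize(text: string, options: SynthesisOptions = {}): Promise<SynthesisResult> {
    const tag = options.voiceId ?? "default";
    return { audio: new TextEncoder().encode(`${tag}:${text}`), format: "pcm", provider: this.name };
  }
}

speakWith(new EchoProvider(), "hello").then((r) => console.log(r.provider, r.audio.length)); // echo 12
```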

Concrete TTS Providers (src/talk-mode/providers/)

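The module includes several concrete implementations of ITTSProvider: OpenAITTSProvider, ElevenLabsProvider, EdgeTTSProvider, and AudioReaderTTSProvider, which wrap their respective TTS APIs or local binaries, plus MockTTSProvider for testing and development.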
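MockTTSProvider exists for testing and development; its internals are not described. A test double in that spirit might record calls, return deterministic bytes, and fail on demand. The sketch below uses a trimmed, hypothetical provider contract and the illustrative name `FakeTTSProvider` so that it stands alone.

```typescript
// Hypothetical, trimmed-down provider contract so this sketch is self-contained.
interface MinimalProvider {
  readonly name: string;
  isAvailable(): Promise<boolean>;
  synthesize(text: string, voiceId?: string): Promise<Uint8Array>;
}

interface FakeOptions {
  available?: boolean; // defaults to true
  failWith?: string; // if set, synthesize() rejects with this message
}

class FakeTTSProvider implements MinimalProvider {
  readonly name = "fake";
  readonly calls: Array<{ text: string; voiceId: string | undefined }> = [];
  private readonly options: FakeOptions;

  constructor(options: FakeOptions = {}) {
    this.options = options;
  }

  async isAvailable(): Promise<boolean> {
    return this.options.available ?? true;
  }

  // Records each call and returns the UTF-8 bytes of the text, so tests can
  // assert on output without network access or audio hardware.
  async synthesize(text: string, voiceId?: string): Promise<Uint8Array> {
    this.calls.push({ text, voiceId });
    if (this.options.failWith) throw new Error(this.options.failWith);
    return new TextEncoder().encode(text);
  }
}
```

Recording calls lets a test assert what the manager asked for (text, voice) without inspecting audio.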

types.ts (src/talk-mode/types.ts)

This file defines the data structures and interfaces shared across the module, including configuration types and their default values.
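The concrete fields in types.ts are not listed. As an illustration only, a configuration type with defaults and a merge function might look like the sketch below; every field name, provider key, and default value is hypothetical. Nested sections are merged one level deep so a partial override such as `{ cache: { maxEntries: 10 } }` keeps the other cache defaults.

```typescript
// Hypothetical configuration shape; real fields and defaults in types.ts may differ.
type ProviderName = "openai" | "elevenlabs" | "edge" | "audioreader" | "mock";

interface QueueConfig {
  maxSize: number;
  interruptOnHighPriority: boolean;
}

interface TTSConfig {
  provider: ProviderName;
  fallbackProviders: ProviderName[];
  defaultVoice?: string;
  speed: number;
  cache: { enabled: boolean; maxEntries: number };
  queue: QueueConfig;
}

// Illustrative defaults only.
const DEFAULT_TTS_CONFIG: TTSConfig = {
  provider: "edge",
  fallbackProviders: ["mock"],
  speed: 1,
  cache: { enabled: true, maxEntries: 100 },
  queue: { maxSize: 50, interruptOnHighPriority: false },
};

type TTSConfigInput = Partial<Omit<TTSConfig, "cache" | "queue">> & {
  cache?: Partial<TTSConfig["cache"]>;
  queue?: Partial<QueueConfig>;
};

// Merge overrides over the defaults; copy the array so callers cannot mutate the defaults.
function resolveConfig(input: TTSConfigInput = {}): TTSConfig {
  return {
    ...DEFAULT_TTS_CONFIG,
    ...input,
    fallbackProviders: [...(input.fallbackProviders ?? DEFAULT_TTS_CONFIG.fallbackProviders)],
    cache: { ...DEFAULT_TTS_CONFIG.cache, ...input.cache },
    queue: { ...DEFAULT_TTS_CONFIG.queue, ...input.queue },
  };
}
```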

Integration with the Codebase

The talk-mode module is designed to be a core utility for any part of the application requiring spoken output.
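Callers reach the manager through getTTSManager(), as shown in the architecture graph. A lazily created singleton is a common way to implement such an accessor, presumably so that every caller shares one queue, cache, and configuration. The sketch below assumes that pattern; `TTSManagerStub` and `resetTTSManager` are hypothetical, and in a real module both functions would be exported.

```typescript
// Stand-in for the real TTSManager; only the singleton pattern is illustrated.
class TTSManagerStub {
  speak(text: string): string {
    return `speaking: ${text}`;
  }
}

let instance: TTSManagerStub | null = null;

// Create the manager on first use and return the same instance afterwards.
function getTTSManager(): TTSManagerStub {
  if (!instance) instance = new TTSManagerStub();
  return instance;
}

// Hypothetical helper for tests that need a fresh manager.
function resetTTSManager(): void {
  instance = null;
}

const first = getTTSManager();
const second = getTTSManager();
console.log(first === second); // true
```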

Incoming Calls:

Outgoing Calls:

Taken together, the module gives the rest of the application one entry point for text-to-speech. It hides provider-specific details behind ITTSProvider and handles synthesis, caching, queuing, and playback in a single place.