src — embeddings

Module: src-embeddings Cohesion: 0.80 Members: 0

The src/embeddings module generates vector embeddings for text and for multimodal (text + image) content. These embeddings enable semantic search, similarity comparisons, and Retrieval-Augmented Generation (RAG) within the application.

The module offers two main providers:

  1. EmbeddingProvider: For text-only embeddings, supporting local models, OpenAI, Grok, and mock implementations.
  2. MultimodalEmbeddingProvider: For embeddings that combine text and image inputs into a shared vector space, powered by Google's Gemini API.

Core Concepts

Vector Embeddings: Numerical representations of text or other data, where semantically similar items are closer in the vector space.

Providers: Different services or models that generate these embeddings. The module abstracts away the specifics of each provider.

Dimensions: The length of the embedding vector. Different models produce embeddings of different dimensions.

Cosine Similarity: A common metric used to measure the similarity between two embedding vectors.
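The Cosine Similarity concept can be made concrete with a short sketch. The function name cosineSimilarity and the decision to return 0 for a zero vector are assumptions made for illustration; they are not taken from this module's API.

```typescript
// Cosine similarity between two embedding vectors of equal length.
// The result lies in [-1, 1]; 1 means the vectors point in the same direction.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; this sketch treats its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the result depends only on direction, two vectors that differ by a positive scale factor score as (approximately) 1. Vectors from models with different dimensions cannot be compared, which is why the sketch throws on a length mismatch.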

Text Embedding Provider (EmbeddingProvider)

The EmbeddingProvider class is the primary interface for generating text embeddings. It supports multiple backend services, prioritizing local execution for privacy and cost efficiency.

Overview

The EmbeddingProvider is an EventEmitter that manages the lifecycle and delegation of text embedding requests. It can be configured to use a local model (via @xenova/transformers), OpenAI's API, Grok's API, or a mock implementation for testing and fallback.
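To illustrate why a mock backend is useful for testing, here is a minimal sketch of a deterministic mock embedding. The function mockEmbedding, its hashing scheme, and the default of 8 dimensions are all hypothetical; the module's actual mock implementation may work differently.

```typescript
// Hypothetical mock: maps text to a deterministic, unit-length vector.
// The vector carries no semantic meaning; it only makes tests repeatable.
function mockEmbedding(text: string, dimensions = 8): number[] {
  if (!Number.isInteger(dimensions) || dimensions <= 0) {
    throw new RangeError('dimensions must be a positive integer');
  }
  const vector = new Array<number>(dimensions).fill(0);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Spread character codes across the vector slots.
    const slot = (code + i) % dimensions;
    vector[slot] = (vector[slot] ?? 0) + code;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  // Empty input yields the zero vector; avoid dividing by zero.
  return norm === 0 ? vector : vector.map((v) => v / norm);
}
```

The same input always produces the same vector, so tests that depend on embeddings can run without network access or a model download.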

Configuration (EmbeddingConfig)

The provider is configured using an EmbeddingConfig object, which allows specifying the desired provider, model, API keys, cache directories, and batch sizes.

export type EmbeddingProviderType = 'local' | 'openai' | 'grok' | 'mock';

export interface EmbeddingConfig {
  provider: EmbeddingProviderType;
  modelName?: string; // e.g., 'Xenova/all-MiniLM-L6-v2', 'text-embedding-3-small'
  apiKey?: string;
  apiEndpoint?: string; // For custom API endpoints
  cacheDir?: string; // For local models
  batchSize?: number; // For local batch processing
}

A DEFAULT_CONFIG is provided, setting local as the default provider with the Xenova/all-MiniLM-L6-v2 model.
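As a rough sketch of how caller-supplied options might be layered over such defaults, consider the following. The names SKETCH_DEFAULT_CONFIG and resolveEmbeddingConfig are hypothetical, and only the two default values stated above are included; the real DEFAULT_CONFIG may set further fields.

```typescript
type EmbeddingProviderType = 'local' | 'openai' | 'grok' | 'mock';

interface EmbeddingConfig {
  provider: EmbeddingProviderType;
  modelName?: string;
  apiKey?: string;
  apiEndpoint?: string;
  cacheDir?: string;
  batchSize?: number;
}

// Only these two defaults are stated in this document (assumption: the
// module's own DEFAULT_CONFIG may define more).
const SKETCH_DEFAULT_CONFIG: EmbeddingConfig = {
  provider: 'local',
  modelName: 'Xenova/all-MiniLM-L6-v2',
};

// Hypothetical helper: overlay caller-supplied fields on the defaults.
function resolveEmbeddingConfig(
  overrides: Partial<EmbeddingConfig> = {},
): EmbeddingConfig {
  return { ...SKETCH_DEFAULT_CONFIG, ...overrides };
}
```

With a plain overlay like this, switching the provider without also passing modelName would carry over the local model name, so callers of such a helper would want to supply both together.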

Initialization

The EmbeddingProvider requires explicit asynchronous initialization, especially when using local models.
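One common way to implement an explicit asynchronous initialization step is to cache the in-flight promise so that a slow model load runs only once. The sketch below shows that general pattern; the class InitOnce is hypothetical and is not the module's EmbeddingProvider.

```typescript
// Hypothetical wrapper: runs an async loader at most once and shares the result.
class InitOnce<T> {
  private readonly load: () => Promise<T>;
  private pending: Promise<T> | null = null;
  private value: T | null = null;

  constructor(load: () => Promise<T>) {
    this.load = load;
  }

  // Concurrent callers receive the same promise, so the loader runs once.
  initialize(): Promise<T> {
    if (this.pending === null) {
      this.pending = this.load().then((loaded) => {
        this.value = loaded;
        return loaded;
      });
    }
    return this.pending;
  }

  // Fails loudly when used before initialization has completed.
  get(): T {
    if (this.value === null) {
      throw new Error('Not initialized: await initialize() first');
    }
    return this.value;
  }
}
```

A caller would await initialize() once at startup and use get() afterwards. This sketch does not retry after a failed load; a rejected promise stays cached.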

Embedding Methods

Supported Providers

The EmbeddingProvider encapsulates the logic for interacting with different embedding sources: a local model (via @xenova/transformers), OpenAI's API, Grok's API, and a mock implementation.

Utility Methods

Singleton Access

The module provides singleton functions to ensure a single instance of EmbeddingProvider is used throughout the application; the usage example in How to Use obtains the provider through initializeEmbeddingProvider.

Multimodal Embedding Provider (MultimodalEmbeddingProvider)

The MultimodalEmbeddingProvider class generates embeddings from both text and image inputs, placing them into a shared vector space. This enables cross-modal search, for example retrieving images with a text query.

Overview

This provider uses Google's Gemini API (gemini-embedding-2-preview) to create multimodal embeddings. It takes base64-encoded images and text strings as input.
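Since images are passed as base64 strings, a small pre-flight check can catch malformed inputs before a network call. The sketch below infers the input shape from the usage example later in this document; the type name MultimodalInput and the helper validateMultimodalInputs are hypothetical and not part of the module.

```typescript
// Input shapes inferred from the usage example in this document (assumption).
type MultimodalInput =
  | { type: 'text'; content: string }
  | { type: 'image'; content: string; mimeType: string };

// Hypothetical pre-flight check: returns a list of problems (empty when valid).
function validateMultimodalInputs(inputs: readonly MultimodalInput[]): string[] {
  const problems: string[] = [];
  // Standard base64 alphabet with up to two padding characters.
  const base64Pattern = /^[A-Za-z0-9+\/]+={0,2}$/;
  inputs.forEach((input, index) => {
    if (input.content.length === 0) {
      problems.push(`Input ${index}: content is empty`);
    } else if (input.type === 'image') {
      if (!input.mimeType.startsWith('image/')) {
        problems.push(`Input ${index}: mimeType "${input.mimeType}" is not an image type`);
      }
      // Rejects data URLs ("data:image/png;base64,...") and other non-base64 text.
      if (!base64Pattern.test(input.content) || input.content.length % 4 !== 0) {
        problems.push(`Input ${index}: content is not plain base64`);
      }
    }
  });
  return problems;
}
```

A frequent mistake this would catch is passing a full data URL instead of the bare base64 payload.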

Configuration (MultimodalEmbeddingConfig)

The MultimodalEmbeddingProvider is configured with an API key and optional model/dimensions:

export interface MultimodalEmbeddingConfig {
  apiKey: string;
  model?: string; // Default: 'gemini-embedding-2-preview'
  dimensions?: number; // Default: 768
  baseUrl?: string; // Default: 'https://generativelanguage.googleapis.com/v1beta'
}

Embedding Methods

Singleton Access

Module Exports (index.ts)

The src/embeddings/index.ts file serves as the public API for the module, re-exporting all essential classes, types, and singleton functions from both embedding-provider.ts and multimodal-embedding-provider.ts.

Integration with the Codebase

This module is a foundational component for any feature requiring semantic understanding or similarity comparisons.
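As an illustration of what a similarity-based feature might do with embeddings once it has them, the following sketch ranks stored vectors against a query vector. The names IndexedEmbedding and topK are hypothetical, and the brute-force scan is only one possible approach; nothing here is taken from the codebase.

```typescript
interface IndexedEmbedding {
  id: string;
  embedding: number[];
}

// Dot product over the shared prefix of two vectors.
function dot(a: readonly number[], b: readonly number[]): number {
  let sum = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) sum += (a[i] ?? 0) * (b[i] ?? 0);
  return sum;
}

// Cosine score; zero vectors score 0 in this sketch.
function score(a: readonly number[], b: readonly number[]): number {
  const denom = Math.sqrt(dot(a, a) * dot(b, b));
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Brute-force search: score every item, sort descending, keep the best k.
function topK(
  query: readonly number[],
  items: readonly IndexedEmbedding[],
  k: number,
): Array<{ id: string; score: number }> {
  return items
    .map((item) => ({ id: item.id, score: score(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, Math.max(0, k));
}
```

A linear scan like this is adequate for small collections; larger ones typically call for a vector index, which is outside the scope of this sketch.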

How to Use

  1. For Text Embeddings:
    import { initializeEmbeddingProvider } from './embeddings';

    async function getTextEmbedding(text: string) {
      const provider = await initializeEmbeddingProvider({ provider: 'openai', apiKey: process.env.OPENAI_API_KEY });
      const result = await provider.embed(text);
      console.log('Text embedding:', result.embedding);
    }

  2. For Multimodal Embeddings:
    import { getMultimodalEmbeddingProvider } from './embeddings';

    async function getMultimodalEmbedding(text: string, imageData: string) {
      const provider = getMultimodalEmbeddingProvider();
      if (!provider) {
        console.warn('Multimodal embeddings not available (API key missing).');
        return;
      }
      const results = await provider.embed([
        { type: 'text', content: text },
        { type: 'image', content: imageData, mimeType: 'image/jpeg' }
      ]);
      console.log('Multimodal embeddings:', results);
    }

Key Integrations

Architecture Overview

The following diagram illustrates the high-level architecture of the embeddings module, showing how client code interacts with the two main providers and their underlying mechanisms.

graph TD
    subgraph Text Embeddings
        EP[EmbeddingProvider] --> EP_INIT(initialize);
        EP --> EP_EMBED(embed/embedBatch);
        EP_EMBED --> EP_LOCAL(embedLocal/Batch);
        EP_EMBED --> EP_OPENAI(embedOpenAI/Batch);
        EP_EMBED --> EP_GROK(embedGrok/Batch);
        EP_EMBED --> EP_MOCK(embedMock/Batch);
        EP_LOCAL -- uses --> TRANSFORMERS["@xenova/transformers"];
        EP_OPENAI -- calls --> OPENAI_API[OpenAI API];
        EP_GROK -- calls --> GROK_API[Grok API];
    end

    subgraph Multimodal Embeddings
        MEP[MultimodalEmbeddingProvider] --> MEP_EMBED(embed);
        MEP_EMBED -- calls --> GEMINI_API[Gemini API];
    end

    Client[Client Code] --> EP;
    Client --> MEP;