tests — optimization

Module: tests-optimization Cohesion: 0.80 Members: 0

This documentation covers the Prompt Cache Module, which is primarily implemented in src/optimization/prompt-cache.ts and thoroughly tested by tests/optimization/prompt-cache.test.ts.

Prompt Cache Module

The Prompt Cache module is designed to optimize interactions with Large Language Models (LLMs) by caching frequently used or expensive prompt components. By storing and reusing hashes of system prompts, tool definitions, and contextual information, the module aims to significantly reduce API costs and improve response times. Research suggests prompt caching can lead to substantial cost reductions, potentially up to 90%.

Purpose

The primary goals of this module are:

Key Components

The module exposes several types and a central class for managing the prompt cache.

PromptCacheManager Class

This is the core class responsible for managing the in-memory cache. It handles caching logic, statistics tracking, configuration, and event emission.

Constructor: new PromptCacheManager(config?: Partial<CacheConfig>). Initializes a new cache manager instance. If no config is provided, it uses DEFAULT_CACHE_CONFIG.

CacheConfig Type

Defines the configuration options for the prompt cache.

type CacheConfig = {
  enabled: boolean; // Whether caching is active
  maxEntries: number; // Maximum number of entries the cache can hold
  ttlMs: number; // Time-to-live for cache entries in milliseconds
  minTokensToCache: number; // Minimum token count for content to be considered for caching
  costPerMillion: number; // Estimated cost per million tokens for cost savings calculation
};
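
To illustrate how ttlMs and maxEntries might interact, the sketch below first drops expired entries and then evicts the oldest ones until the cache fits. The function name pruneEntries, the PrunableEntry type, and the oldest-first eviction policy are all assumptions made for this example; the source does not say which policy the module actually applies.

```typescript
// Hypothetical sketch: not the module's actual eviction logic.
type PrunableEntry = { hash: string; timestamp: number };

function pruneEntries(
  entries: Map<string, PrunableEntry>,
  now: number,
  ttlMs: number,
  maxEntries: number,
): void {
  // Drop entries whose age exceeds the time-to-live.
  entries.forEach((entry, hash) => {
    if (now - entry.timestamp > ttlMs) entries.delete(hash);
  });

  // If the cache is still over capacity, evict oldest entries first (assumed policy).
  const byAge = Array.from(entries.values()).sort(
    (a, b) => a.timestamp - b.timestamp,
  );
  while (entries.size > maxEntries) {
    const oldest = byAge.shift();
    if (!oldest) break;
    entries.delete(oldest.hash);
  }
}
```

With this policy, an entry that is refreshed on every hit stays in the cache for as long as it keeps being used.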

DEFAULT_CACHE_CONFIG: A constant providing sensible default values:

CacheEntry Type

Represents a single item stored in the cache.

type CacheEntry = {
  hash: string; // Unique hash of the cached content
  timestamp: number; // Unix timestamp of when the entry was last accessed/updated
  hitCount: number; // Number of times this entry has been hit
  tokens: number; // Estimated token count of the cached content
  type: "system" | "tools" | "context" | "full"; // Type of content cached
};

CacheStats Type

Provides a snapshot of the cache's performance and state.

type CacheStats = {
  hits: number; // Total cache hits
  misses: number; // Total cache misses
  hitRate: number; // Hit rate, computed as hits / (hits + misses)
  totalTokensSaved: number; // Cumulative tokens saved by using the cache
  estimatedCostSaved: number; // Estimated monetary cost saved
  entries: number; // Current number of entries in the cache
};
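
The derived fields can be illustrated with a small calculation. The helper below is a sketch based only on the field comments above: deriveStats is a hypothetical name, and the cost formula (tokens saved divided by one million, times costPerMillion) is an assumption about how the estimate could be produced.

```typescript
// Hypothetical helper: formulas inferred from the field comments, not from the module source.
function deriveStats(
  hits: number,
  misses: number,
  totalTokensSaved: number,
  costPerMillion: number,
): { hitRate: number; estimatedCostSaved: number } {
  const lookups = hits + misses;
  return {
    // Guard against division by zero before any lookup has happened.
    hitRate: lookups === 0 ? 0 : hits / lookups,
    estimatedCostSaved: (totalTokensSaved / 1000000) * costPerMillion,
  };
}

// 3 hits, 1 miss, 2,000,000 tokens saved at a rate of 3 per million tokens.
const derived = deriveStats(3, 1, 2000000, 3);
console.log(derived.hitRate); // 0.75
console.log(derived.estimatedCostSaved); // 6
```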

Core Functionality & API

The PromptCacheManager class provides the following public methods:

cacheSystemPrompt(prompt: string): string

Caches a system prompt string. Returns a unique hash for the prompt. If the prompt is already cached, it updates its timestamp and hitCount.
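
The hit-or-miss behaviour described above can be sketched as follows. This is a simplified stand-in, not the module's code: SketchCache is a hypothetical class, and the 32-bit FNV-1a hash is used only because it needs no imports. The source does not specify which hash function PromptCacheManager uses.

```typescript
// Stand-in hash (32-bit FNV-1a over UTF-16 code units); the real hash function is an unknown.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

type SketchEntry = { hash: string; timestamp: number; hitCount: number };

// Hypothetical, reduced version of the caching logic.
class SketchCache {
  private entries = new Map<string, SketchEntry>();
  hits = 0;
  misses = 0;

  cacheSystemPrompt(prompt: string, now: number = Date.now()): string {
    const hash = fnv1a(prompt);
    const existing = this.entries.get(hash);
    if (existing) {
      // Already cached: refresh the timestamp and count the hit.
      existing.timestamp = now;
      existing.hitCount += 1;
      this.hits += 1;
    } else {
      // First sighting: store a new entry and count a miss.
      this.entries.set(hash, { hash, timestamp: now, hitCount: 0 });
      this.misses += 1;
    }
    return hash;
  }
}
```

Calling cacheSystemPrompt twice with the same string returns the same hash both times, with the first call counted as a miss and the second as a hit.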

cacheTools(tools: Tool[]): string

Caches an array of tool definitions. Returns a unique hash for the tools. This is crucial for scenarios where the available tools remain constant across multiple LLM calls.
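
For the hash of a tools array to stay constant across calls, the serialization that feeds the hash has to be deterministic. The source does not describe how tools are serialized, so the following is only one possible approach: a hypothetical stableStringify helper that sorts object keys recursively, assuming the tool definitions contain JSON-compatible values.

```typescript
// Hypothetical helper: serializes JSON-compatible values with object keys sorted,
// so that logically equal definitions produce identical strings.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is significant and is preserved.
    return "[" + value.map((item) => stableStringify(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + stableStringify(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  return JSON.stringify(value);
}

// The same tool written with two different key orders.
const toolA = { type: "function", function: { name: "search", parameters: {} } };
const toolB = { function: { parameters: {}, name: "search" }, type: "function" };
console.log(stableStringify([toolA]) === stableStringify([toolB])); // true
```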

cacheContext(key: string, content: string): string

Caches arbitrary contextual content, identified by a key. This can be used for caching file contents, database schemas, or other dynamic but reusable context. Returns a unique hash.

isCached(content: string): boolean

Checks if the given content (e.g., a system prompt string) is already present in the cache.

getStats(): CacheStats

Retrieves the current statistics of the cache, including hits, misses, hit rate, and estimated cost savings.

formatStats(): string

Returns a human-readable formatted string of the current cache statistics, suitable for logging or display.
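
The exact output format of formatStats is not shown in the source. As a rough idea of what such a formatter could look like, the sketch below renders a stats snapshot as one line per field. The function name formatStatsSketch, the assumption that hitRate is a fraction between 0 and 1, and the dollar sign are all illustrative choices.

```typescript
// Shape mirrors the CacheStats fields documented above.
type StatsSnapshot = {
  hits: number;
  misses: number;
  hitRate: number; // assumed to be a fraction between 0 and 1
  totalTokensSaved: number;
  estimatedCostSaved: number;
  entries: number;
};

// Hypothetical formatter; the real formatStats() output may differ.
function formatStatsSketch(stats: StatsSnapshot): string {
  return [
    `Hits: ${stats.hits}`,
    `Misses: ${stats.misses}`,
    `Hit rate: ${(stats.hitRate * 100).toFixed(1)}%`,
    `Tokens saved: ${stats.totalTokensSaved}`,
    `Estimated cost saved: $${stats.estimatedCostSaved.toFixed(4)}`,
    `Entries: ${stats.entries}`,
  ].join("\n");
}
```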

clear(): void

Empties the entire cache, resetting all entries and statistics.

warmCache(prompts: { system?: string; tools?: Tool[]; context?: Record<string, string> }): void

Pre-populates the cache with initial prompt components. This is useful for ensuring common components are available immediately upon application startup. It can cache a system prompt, tools array, and a map of context items.
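
Conceptually, warming amounts to calling the individual caching methods for each component that was supplied. The sketch below shows that idea against a minimal interface. Warmable and warmSketch are hypothetical names, and the real warmCache may do additional work that is not described in the source.

```typescript
// Minimal surface needed for warming; method names follow the API in this document.
interface Warmable {
  cacheSystemPrompt(prompt: string): string;
  cacheTools(tools: unknown[]): string;
  cacheContext(key: string, content: string): string;
}

// Hypothetical sketch: delegate each supplied component to its caching method.
function warmSketch(
  cache: Warmable,
  prompts: { system?: string; tools?: unknown[]; context?: Record<string, string> },
): void {
  if (prompts.system !== undefined) cache.cacheSystemPrompt(prompts.system);
  if (prompts.tools !== undefined) cache.cacheTools(prompts.tools);
  const context = prompts.context ?? {};
  for (const key of Object.keys(context)) {
    cache.cacheContext(key, context[key]);
  }
}
```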

structureForCaching(messages: Message[]): Message[]

Reorders a list of LLM messages to optimize for caching. Specifically, it moves system messages to the beginning of the array, as they are often static and good candidates for caching. This helps ensure consistent hashing for the same logical prompt structure.
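
The reordering can be pictured as a stable partition: system messages first, everything else after, with relative order preserved inside each group. The sketch below assumes a simple ChatMessage shape, since the module's actual Message type is not shown in the source, and systemFirst is a hypothetical name.

```typescript
// Assumed message shape for illustration only.
type ChatMessage = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

// Returns a new array with system messages moved to the front.
// The input array is not modified, and order within each group is kept.
function systemFirst(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return system.concat(rest);
}

const reordered = systemFirst([
  { role: "user", content: "Hi" },
  { role: "system", content: "You are a senior software engineer." },
  { role: "assistant", content: "Hello" },
]);
console.log(reordered.map((m) => m.role).join(",")); // system,user,assistant
```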

updateConfig(config: Partial<CacheConfig>): void

Updates the cache's configuration at runtime. Only the properties provided in config are updated; all others retain their current values.
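
A partial update of this kind is commonly implemented with an object spread. The sketch below shows the pattern. SketchConfig, exampleConfig, and mergeConfig are hypothetical, and the numeric values are illustrative only, since the module's real defaults are not listed in the source.

```typescript
// Same fields as CacheConfig, redeclared so the sketch is self-contained.
type SketchConfig = {
  enabled: boolean;
  maxEntries: number;
  ttlMs: number;
  minTokensToCache: number;
  costPerMillion: number;
};

// Illustrative values only; these are NOT the module's DEFAULT_CACHE_CONFIG.
const exampleConfig: SketchConfig = {
  enabled: true,
  maxEntries: 100,
  ttlMs: 60000,
  minTokensToCache: 1000,
  costPerMillion: 3,
};

// Later spreads win, so only the keys present in `patch` change.
function mergeConfig(current: SketchConfig, patch: Partial<SketchConfig>): SketchConfig {
  return { ...current, ...patch };
}

const updated = mergeConfig(exampleConfig, { enabled: false });
console.log(updated.enabled, updated.maxEntries); // false 100
```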

Events

The PromptCacheManager extends EventEmitter and emits the following events:

Singleton Access

For convenience and to ensure a single, global cache instance across the application, the module provides singleton access functions.

getPromptCacheManager(): PromptCacheManager

Returns the singleton instance of PromptCacheManager. If the manager has not been initialized yet, it will create one with DEFAULT_CACHE_CONFIG.

initializePromptCache(config?: Partial<CacheConfig>): PromptCacheManager

Initializes or re-initializes the singleton PromptCacheManager with the provided configuration. This should typically be called once at application startup to configure the cache.
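
The two accessors follow a familiar lazy-singleton pattern, which can be sketched as below. SketchManager, SKETCH_DEFAULTS, getManager, and initializeManager are hypothetical stand-ins with a single config field; the module's real implementation is not shown in the source.

```typescript
type ManagerConfig = { maxEntries: number };

// Hypothetical stand-in for PromptCacheManager.
class SketchManager {
  constructor(readonly config: ManagerConfig) {}
}

// Illustrative default, not the module's DEFAULT_CACHE_CONFIG.
const SKETCH_DEFAULTS: ManagerConfig = { maxEntries: 100 };

let instance: SketchManager | null = null;

// Lazily creates the shared instance with defaults on first use.
function getManager(): SketchManager {
  if (instance === null) {
    instance = new SketchManager({ ...SKETCH_DEFAULTS });
  }
  return instance;
}

// Replaces the shared instance with one built from defaults plus overrides.
function initializeManager(config: Partial<ManagerConfig> = {}): SketchManager {
  instance = new SketchManager({ ...SKETCH_DEFAULTS, ...config });
  return instance;
}
```

In this sketch, re-initializing replaces the instance, so references obtained earlier keep pointing at the old object. That is one reason to configure the cache once at startup, before other code asks for the manager.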

Singleton Architecture

graph TD
    A[Application Code] --> B{"getPromptCacheManager()"};
    A --> C{"initializePromptCache(config)"};
    B --> D[PromptCacheManager Instance];
    C --> D;
    D -- Manages --> E[In-memory Cache];
    D -- Emits --> F[Cache Events];

Usage Example

import {
  initializePromptCache,
  getPromptCacheManager,
} from "./src/optimization/prompt-cache.js";

// 1. Initialize the cache once at application startup
initializePromptCache({
  maxEntries: 500,
  ttlMs: 10 * 60 * 1000, // 10 minutes
  minTokensToCache: 500,
});

// 2. Get the manager instance anywhere in your code
const cacheManager = getPromptCacheManager();

// 3. Warm the cache with common components
cacheManager.warmCache({
  system: "You are a helpful AI assistant.",
  context: {
    "file:utils.ts": "export function add(a, b) { return a + b; }",
  },
});

// 4. Cache system prompts
const systemPrompt = "You are a senior software engineer.";
const systemHash = cacheManager.cacheSystemPrompt(systemPrompt);

// 5. Cache tool definitions
const tools = [
  {
    type: "function" as const,
    function: { name: "search", description: "Search the web", parameters: {} },
  },
];
const toolsHash = cacheManager.cacheTools(tools);

// 6. Check if content is cached
if (cacheManager.isCached(systemPrompt)) {
  console.log("System prompt is cached!");
}

// 7. Listen for cache events
cacheManager.on("cache:hit", (data) => {
  console.log(`Cache HIT for ${data.type} with hash ${data.hash}`);
});

// Trigger a hit
cacheManager.cacheSystemPrompt(systemPrompt);

// 8. Get and format statistics
const stats = cacheManager.getStats();
console.log(cacheManager.formatStats());

// 9. Update configuration dynamically
cacheManager.updateConfig({ enabled: false }); // Disable caching temporarily

Integration with LLM Calls

When constructing prompts for LLM APIs, you would typically:

  1. Use cacheManager.cacheSystemPrompt() for the system message.
  2. Use cacheManager.cacheTools() for the tool definitions.
  3. Use cacheManager.cacheContext() for any other static contextual information.
  4. Combine the hashes (or the original content if not cached) into your final prompt structure. The actual mechanism for using the hashes to reduce API calls would depend on the LLM provider's API (e.g., if they support sending hashes instead of full content, or if you're managing the full prompt construction locally and only sending the diff). The structureForCaching method helps prepare messages for consistent hashing.
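
The steps above can be sketched as a small request builder. Everything here is illustrative: PromptCacheLike, RequestMessage, and buildRequest are hypothetical names, the full content is still sent to the provider, and the hashes are returned only for bookkeeping, because the source leaves the provider-specific mechanism open.

```typescript
// Minimal surface of the manager used in this sketch; method names follow this document.
interface PromptCacheLike {
  cacheSystemPrompt(prompt: string): string;
  cacheContext(key: string, content: string): string;
}

type RequestMessage = { role: "system" | "user"; content: string };

// Hypothetical helper: registers static parts with the cache and places them
// ahead of the volatile user input so the leading part of the prompt stays stable.
function buildRequest(
  cache: PromptCacheLike,
  system: string,
  context: Record<string, string>,
  userInput: string,
): { messages: RequestMessage[]; hashes: string[] } {
  const hashes = [cache.cacheSystemPrompt(system)];
  const contextParts: string[] = [];

  // Sort keys so the same context always yields the same ordering.
  for (const key of Object.keys(context).sort()) {
    hashes.push(cache.cacheContext(key, context[key]));
    contextParts.push(`[${key}]\n${context[key]}`);
  }

  const messages: RequestMessage[] = [{ role: "system", content: system }];
  if (contextParts.length > 0) {
    messages.push({ role: "system", content: contextParts.join("\n\n") });
  }
  messages.push({ role: "user", content: userInput });
  return { messages, hashes };
}
```

Keeping static content at the front and sorting context keys means two requests with the same system prompt and context share an identical leading portion, which is what prefix-based provider caches generally rely on.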