src — performance
The src/performance module is the central hub for all performance-related optimizations, monitoring, and benchmarking within the codebase. Its primary goal is to ensure the application, especially when interacting with LLMs and external tools, operates efficiently, quickly, and within defined resource constraints.
This module provides mechanisms for:
- Reducing startup time: Through lazy loading of heavy dependencies.
- Optimizing runtime execution: Via intelligent caching of tool calls and API responses, and efficient management of external requests.
- Monitoring and analysis: Offering detailed metrics for various operations, resource usage, and comprehensive benchmarking capabilities.
Module Architecture
The performance module is structured around a central PerformanceManager that orchestrates several specialized components.
```mermaid
graph TD
    subgraph core [Core Performance]
        PM[PerformanceManager]
        LL[LazyLoader]
        TC[ToolCache]
        RO[RequestOptimizer]
    end
    subgraph utilities [Utilities]
        BS[BenchmarkSuite]
        MM[MemoryMonitor]
        ST[StartupTimer]
        SC[SemanticCache]
    end
    IDX["src/performance/index.ts"]
    PM -->|"Manages/Integrates"| LL
    PM -->|"Manages/Integrates"| TC
    PM -->|"Manages/Integrates"| RO
    PM -->|"Uses (from ../utils)"| SC
    TC -->|"Uses (from ../utils)"| SC
    IDX -->|"Re-exports"| PM
    IDX -->|"Re-exports"| LL
    IDX -->|"Re-exports"| TC
    IDX -->|"Re-exports"| RO
    IDX -->|"Re-exports"| BS
    IDX -->|"Re-exports"| MM
    IDX -->|"Re-exports"| ST
    MM -->|"from ../utils"| IDX
    ST -->|"from ../utils"| IDX
```
Key Components
- PerformanceManager: The central orchestrator.
- LazyLoader: Manages on-demand loading of modules.
- ToolCache: Caches results of deterministic tool calls.
- RequestOptimizer: Optimizes external API requests.
- BenchmarkSuite: Provides comprehensive LLM performance benchmarking.
- Re-exported Utilities: MemoryMonitor and StartupTimer from src/utils.
1. PerformanceManager (src/performance/performance-manager.ts)
The PerformanceManager is the core component for managing and coordinating all performance-related aspects of the application. It acts as a unified interface for enabling/disabling optimizations, recording metrics, and retrieving performance summaries.
Purpose
- Orchestration: Initializes and integrates LazyLoader, ToolCache, RequestOptimizer, and SemanticCache.
- Monitoring: Provides a centralized recordMetric mechanism for any operation, tracking duration, success, and cache status.
- Budgeting: Emits a budget:exceeded event when an operation exceeds the configured performance budget (budgetMs).
- Reporting: Aggregates statistics from all integrated components into a comprehensive PerformanceSummary.
How it Works
Upon initialize(), the manager sets up instances of LazyLoader, ToolCache, RequestOptimizer, and SemanticCache based on its configuration. It then subscribes to events from these components (e.g., module:loaded from LazyLoader, hit/miss from ToolCache, success/failure/deduplicated from RequestOptimizer) to automatically record performance metrics.
Developers can use measureOperation() (a utility function wrapping manager.measure()) to easily track the performance of any asynchronous function.
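The following minimal sketch illustrates the measure()/recordMetric() flow described above. MiniPerformanceManager and PerfMetric are hypothetical names. The metric fields (operation, durationMs, success, timestamp, metadata) are assumptions, not the module's actual PerformanceMetric type. The sketch only shows the pattern: time an async function, record the metric even when the function fails, and emit budget:exceeded when the duration passes the budget.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical metric shape; the real PerformanceMetric type may differ.
interface PerfMetric {
  operation: string;
  durationMs: number;
  success: boolean;
  timestamp: number;
  metadata?: Record<string, unknown>;
}

// Illustrative stand-in for PerformanceManager's measure()/recordMetric() pair.
class MiniPerformanceManager extends EventEmitter {
  private metrics: PerfMetric[] = [];
  private readonly budgetMs: number;
  private readonly retention: number;

  constructor(budgetMs = 1000, retention = 100) {
    super();
    this.budgetMs = budgetMs;
    this.retention = retention;
  }

  recordMetric(metric: Omit<PerfMetric, "timestamp">): void {
    const full: PerfMetric = { ...metric, timestamp: Date.now() };
    this.metrics.push(full);
    // Keep only the most recent `retention` metrics.
    if (this.metrics.length > this.retention) this.metrics.shift();
    this.emit("metric", full);
    if (full.durationMs > this.budgetMs) this.emit("budget:exceeded", full);
  }

  async measure<T>(
    operation: string,
    fn: () => Promise<T>,
    metadata?: Record<string, unknown>,
  ): Promise<T> {
    const start = performance.now();
    let success = false;
    try {
      const result = await fn();
      success = true;
      return result;
    } finally {
      // Runs whether fn resolved or threw, so failures are recorded too.
      this.recordMetric({ operation, durationMs: performance.now() - start, success, metadata });
    }
  }

  getMetrics(): readonly PerfMetric[] {
    return this.metrics;
  }
}
```

The metric is recorded in a finally block. As a result, failed operations are still tracked with success set to false, and the original error still reaches the caller.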
Core API
- initialize(): Promise<…>: Sets up all configured performance systems.
- recordMetric(metric: Omit<…>): void: Manually records a performance metric.
- measure(operation: string, fn: () => Promise<…>, metadata?: Record<…>): Promise<…>: Measures the execution time of an async function and records it.
- getSummary(): PerformanceSummary: Returns an aggregated summary of all performance metrics and component stats.
- clearCaches(): void: Clears all managed caches (ToolCache, SemanticCache).
- invalidateForFile(filePath: string): void: Invalidates cache entries related to a specific file.
- resetStats(): void: Resets all collected metrics and component statistics.
- updateConfig(config: Partial<…>): void: Updates the manager's configuration.
Configuration (PerformanceConfig)
Controls which optimizations are enabled, performance budgets, and metric retention.
```typescript
interface PerformanceConfig {
  enabled: boolean; // Overall enable/disable switch
  lazyLoading: boolean; // Enable the LazyLoader
  toolCaching: boolean; // Enable the ToolCache
  requestOptimization: boolean; // Enable the RequestOptimizer
  apiCaching: boolean; // Enable caching of API responses
  budgetMs: number; // Threshold for the 'budget:exceeded' event
  enableMetrics: boolean; // Record performance metrics
  metricsRetention: number; // How many metrics to keep
}
```
Events
The PerformanceManager extends EventEmitter and emits various events:
- initialized: When all performance systems are set up.
- metric: When a new metric is recorded.
- budget:exceeded: When an operation exceeds the configured budgetMs.
- caches:cleared: When all caches are cleared.
- cache:invalidated: When a cache entry is invalidated.
- stats:reset: When statistics are reset.
- config:updated: When the configuration is changed.
- error: For errors originating from managed components.
Singleton Access
The getPerformanceManager() function ensures a single instance of the manager throughout the application lifecycle. initializePerformanceManager() is an async helper to get and initialize the manager.
2. LazyLoader (src/performance/lazy-loader.ts)
The LazyLoader is designed to improve application startup time by deferring the loading of heavy modules until they are actually needed.
Purpose
- Faster Startup: Avoids loading all dependencies at once.
- Resource Efficiency: Only loads modules that are actively used.
- Controlled Loading: Allows for priority-based preloading and parallel loading.
How it Works
Modules are register()ed with a loader function (typically an import()) and a name. When get(name) is called, the module's loader is executed only if the module hasn't been loaded yet. Once loaded, the instance is cached for subsequent calls.
The schedulePreload() and scheduleIdlePreload() methods allow for intelligent background loading of modules after initial startup or during idle times, based on configured priorities and dependencies.
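The register()/get() contract can be sketched as follows. MiniLazyLoader is a hypothetical, stripped-down class with no priorities, dependencies, metrics, or preloading. It assumes that concurrent get() calls for the same module should share one in-flight load rather than start a second import().

```typescript
type Loader<T> = () => Promise<T>;

// Hypothetical minimal lazy loader: each loader runs at most once, and
// concurrent get() calls share the same in-flight promise.
class MiniLazyLoader {
  private loaders = new Map<string, Loader<unknown>>();
  private loaded = new Map<string, unknown>();
  private inFlight = new Map<string, Promise<unknown>>();

  register<T>(name: string, loader: Loader<T>): void {
    this.loaders.set(name, loader);
  }

  isLoaded(name: string): boolean {
    return this.loaded.has(name);
  }

  async get<T>(name: string): Promise<T> {
    // Already loaded: return the cached instance.
    if (this.loaded.has(name)) return this.loaded.get(name) as T;
    // Load in progress: reuse the pending promise.
    const pending = this.inFlight.get(name);
    if (pending) return pending as Promise<T>;

    const loader = this.loaders.get(name);
    if (!loader) throw new Error(`Module not registered: ${name}`);

    const promise = loader()
      .then((mod) => {
        this.loaded.set(name, mod);
        return mod;
      })
      // On failure nothing is cached, so a later get() can retry.
      .finally(() => this.inFlight.delete(name));
    this.inFlight.set(name, promise);
    return promise as Promise<T>;
  }

  unload(name: string): boolean {
    return this.loaded.delete(name);
  }
}
```

In practice, the loader is typically a dynamic import, such as loader.register("heavy", () => import("./heavy-module.js")), where the module path is a placeholder.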
Core API
- register(name: string, loader: () => Promise<…>, options?: { priority?: number; dependencies?: string[] }): void: Registers a module for lazy loading.
- get(name: string): Promise<…>: Retrieves a module, loading it if necessary.
- isLoaded(name: string): boolean: Checks whether a module is already loaded.
- preload(moduleNames?: string[]): Promise<…>: Manually triggers preloading of the specified or configured modules.
- schedulePreload(): void: Schedules preloading of configured modules after a delay.
- scheduleIdlePreload(moduleNames: string[]): void: Schedules preloading of modules during idle time.
- unload(name: string): boolean: Unloads a module to free memory.
- getMetrics(): LoadMetrics[]: Returns detailed load metrics for each module.
- getStats(): Provides a summary of loaded modules and total/average load times.
- getOptimizationHints(): Suggests improvements based on load metrics.
Configuration (LazyLoaderConfig)
```typescript
interface LazyLoaderConfig {
  preloadDelay: number; // Delay before starting preload
  preloadModules: string[]; // Modules to preload automatically
  enableMetrics: boolean; // Collect per-module load metrics
  maxParallelLoads: number; // Max concurrent loads during preload
  idlePreload: boolean; // Enable idle-time preloading
}
```
LoadPriority constants (CRITICAL, HIGH, NORMAL, LOW, DEFERRED) help categorize modules for preloading.
Events
- module:registered: When a module is registered.
- module:loaded: When a module successfully loads.
- module:error: If a module fails to load.
- preload:complete: When a preload operation finishes.
- preload:error: If an error occurs during preloading.
Integration
The module provides registerCommonModules(), initializeLazyLoader(), and initializeCLILazyLoader() to quickly set up the loader with common application dependencies and specific strategies for CLI startup. createDeferredLoader() is a helper for deferring initialization until after initial UI render.
Singleton Access
getLazyLoader() provides a singleton instance. resetLazyLoader() clears and resets it.
3. ToolCache (src/performance/tool-cache.ts)
The ToolCache optimizes tool calls by caching their results, especially for deterministic operations. It leverages semantic similarity to match similar queries, not just exact matches.
Purpose
- Reduce Redundant Work: Avoids re-executing expensive or time-consuming tool calls with similar inputs.
- Improve Latency: Serves results instantly from cache.
- Cost Savings: Reduces API calls to external services if tool calls involve them.
How it Works
ToolCache wraps a SemanticCache instance. When getOrExecute() is called, it first checks if the tool call is isCacheable() (based on tool name, arguments, and exclusion patterns). If cacheable, it attempts to retrieve a semantically similar result from the underlying SemanticCache. If a hit occurs, the cached result is returned. Otherwise, the executeFn is called, its result is stored in the cache, and then returned.
Mutable tools (e.g., bash, create_file) are explicitly excluded from caching.
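Below is a simplified sketch of the getOrExecute() decision flow. It deliberately replaces the semantic lookup with an exact-match key built from the tool name plus the JSON-serialized arguments. As a result, it only illustrates the isCacheable() checks, the hit/miss path, and TTL expiry. MiniToolCache, MiniToolCacheConfig, and MUTABLE_TOOL_EXAMPLES are hypothetical stand-ins for ToolCache, ToolCacheConfig, and MUTABLE_TOOLS.

```typescript
// Examples of mutable tools named in this document; never cached.
const MUTABLE_TOOL_EXAMPLES = new Set(["bash", "create_file"]);

interface MiniToolCacheConfig {
  ttlMs: number;
  cacheableTools: Set<string>;
  excludePatterns: RegExp[]; // Avoid the /g flag: RegExp.test() is stateful with it
}

// NOTE: exact-match keys only; the real ToolCache delegates to SemanticCache.
class MiniToolCache {
  private entries = new Map<string, { value: unknown; expiresAt: number }>();
  private readonly config: MiniToolCacheConfig;
  hits = 0;
  misses = 0;

  constructor(config: MiniToolCacheConfig) {
    this.config = config;
  }

  isCacheable(toolName: string, args: Record<string, unknown>): boolean {
    if (MUTABLE_TOOL_EXAMPLES.has(toolName)) return false;
    if (!this.config.cacheableTools.has(toolName)) return false;
    const serialized = JSON.stringify(args);
    return !this.config.excludePatterns.some((re) => re.test(serialized));
  }

  async getOrExecute<T>(
    toolName: string,
    args: Record<string, unknown>,
    executeFn: () => Promise<T>,
  ): Promise<T> {
    // Uncacheable calls always execute and are never stored.
    if (!this.isCacheable(toolName, args)) return executeFn();

    const key = `${toolName}:${JSON.stringify(args)}`;
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > Date.now()) {
      this.hits++;
      return entry.value as T;
    }

    this.misses++;
    const value = await executeFn();
    this.entries.set(key, { value, expiresAt: Date.now() + this.config.ttlMs });
    return value;
  }
}
```

Because the key is built with JSON.stringify, argument objects that hold the same values in a different key order will miss this cache. Semantic matching, as used by the real ToolCache, is one way to relax exact-match limitations like this.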
Core API
- isCacheable(toolName: string, args: Record<…>): boolean: Determines whether a given tool call can be cached.
- getOrExecute(toolName: string, args: Record<…>, executeFn: () => Promise<…>): Promise<…>: Retrieves a cached result, or executes the tool and caches its output.
- invalidate(toolName?: string, pattern?: string | RegExp): number: Invalidates cache entries by tool name or a regex pattern.
- invalidateForFile(filePath: string): number: Invalidates cache entries that might be affected by changes to a specific file.
- clear(): void: Clears the entire tool cache.
- getStats(): ToolCacheStats: Returns statistics on cache hits, misses, and estimated time saved.
- resetStats(): void: Resets cache statistics.
- getCacheInfo(): Provides detailed cache information, including the config and the underlying SemanticCache stats.
Configuration (ToolCacheConfig)
```typescript
interface ToolCacheConfig {
  enabled: boolean;
  ttlMs: number; // Time-to-live for cache entries
  maxEntries: number; // Maximum number of cached entries
  similarityThreshold: number; // Threshold for semantic matching
  cacheableTools: Set<string>; // Tools that can be cached
  excludePatterns: RegExp[]; // Patterns in args that prevent caching
}
```
MUTABLE_TOOLS is a hardcoded set of tools that are never cached.
Events
The ToolCache forwards cache:hit and cache:miss events from its internal SemanticCache as hit and miss respectively.
Integration
- withCache(): A utility function that wraps a tool execution with caching logic.
- @Cacheable decorator: Applies caching to methods directly, simplifying integration for tool implementations.
Singleton Access
getToolCache() provides a singleton instance. resetToolCache() disposes and resets it.
4. RequestOptimizer (src/performance/request-optimizer.ts)
The RequestOptimizer is designed to make external API requests more robust and efficient by managing concurrency, batching, deduplication, and retries.
Purpose
- Concurrency Control: Prevents overwhelming external APIs or local resources.
- Deduplication: Avoids sending identical requests concurrently.
- Resilience: Implements retry logic with exponential backoff for transient failures.
- Batching: Groups requests within a short window to reduce overhead. Explicit batching logic is not fully implemented; the batchWindowMs setting signals this intent and is currently used to schedule queue processing.
How it Works
Requests are submitted via execute() with a key (for deduplication) and an executeFn. Requests are added to an internal queue and processed by processQueue() respecting maxConcurrent limits.
- Deduplication: If deduplicate is enabled and a request with the same key is already pending, the existing promise is returned.
- Retries: executeWithRetry() retries failed requests with exponential backoff, up to maxRetries.
- Timeout: Each request is wrapped with a timeout.
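The deduplication, retry, and timeout behaviour described above can be sketched as follows. MiniRequestOptimizer, MiniRequestConfig, withTimeout, and sleep are hypothetical helpers. The sketch assumes the backoff delay is retryBaseDelayMs * 2 ** attempt, and it omits the queue, priorities, and maxConcurrent.

```typescript
interface MiniRequestConfig {
  maxRetries: number;
  retryBaseDelayMs: number;
  timeoutMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Rejects if `promise` does not settle within `ms`; always clears the timer.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Request timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

class MiniRequestOptimizer {
  private pending = new Map<string, Promise<unknown>>();
  private readonly config: MiniRequestConfig;

  constructor(config: MiniRequestConfig) {
    this.config = config;
  }

  execute<T>(key: string, executeFn: () => Promise<T>): Promise<T> {
    // Deduplication: callers with the same key share one in-flight promise.
    const existing = this.pending.get(key);
    if (existing) return existing as Promise<T>;
    const promise = this.executeWithRetry(executeFn).finally(() => this.pending.delete(key));
    this.pending.set(key, promise);
    return promise;
  }

  private async executeWithRetry<T>(executeFn: () => Promise<T>): Promise<T> {
    for (let attempt = 0; ; attempt++) {
      try {
        return await withTimeout(executeFn(), this.config.timeoutMs);
      } catch (err) {
        if (attempt >= this.config.maxRetries) throw err;
        // Exponential backoff: base, 2 * base, 4 * base, ...
        await sleep(this.config.retryBaseDelayMs * 2 ** attempt);
      }
    }
  }
}
```

In this sketch, the timeout only stops waiting; it does not cancel the underlying executeFn. Cancelling a real HTTP call would require passing something like an AbortSignal into the request.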
Core API
- execute(key: string, executeFn: () => Promise<…>, options?: { priority?: number; deduplicate?: boolean }): Promise<…>: Queues and executes a request with optimizations.
- executeImmediate(executeFn: () => Promise<…>, options?: { retries?: number; timeout?: number }): Promise<…>: Executes a request immediately without queuing, but still applies retries and timeout.
- getStats(): RequestStats: Returns statistics on total, successful, failed, retried, and deduplicated requests, along with average latency and current concurrency.
- resetStats(): void: Resets all collected request statistics.
- clear(): void: Clears the pending request queue and deduplication map.
- getQueueStatus(): Provides current queue and concurrency status.
Configuration (RequestConfig)
```typescript
interface RequestConfig {
  maxConcurrent: number; // Max parallel requests
  batchWindowMs: number; // Window for batching (currently used for scheduling queue processing)
  maxRetries: number;
  retryBaseDelayMs: number; // Base delay for exponential backoff
  timeoutMs: number; // Timeout for individual requests
  deduplicate: boolean; // Reuse the pending promise for requests with the same key
}
```
Events
- success: When a request completes successfully.
- failure: When a request fails after all retries.
- deduplicated: When a request is deduplicated.
- retry: When a request is retried.
Integration
- executeParallel(): A utility function that executes multiple requests in parallel with a specified concurrency limit.
- batchRequests(): A utility function that batches requests, leveraging the optimizer's deduplication.
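A concurrency-limited executeParallel() could be implemented as a small worker pool, as in the sketch below. executeParallelSketch is a hypothetical name and signature. The sketch assumes that results should keep the input order and that a rejected task rejects the whole call.

```typescript
// Hypothetical worker-pool sketch: at most `concurrency` tasks run at once,
// and results are returned in the same order as `tasks`.
async function executeParallelSketch<T>(
  tasks: Array<() => Promise<T>>,
  concurrency: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task index. This is safe
  // because JavaScript runs this synchronous claim step on a single thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

If a task rejects, Promise.all rejects immediately, but the remaining workers are not stopped. Aborting in-flight work would require an explicit cancellation mechanism.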
Singleton Access
getRequestOptimizer() provides a singleton instance. resetRequestOptimizer() clears and resets it.
5. BenchmarkSuite (src/performance/benchmark-suite.ts)
The BenchmarkSuite provides a comprehensive framework for measuring the performance of LLM interactions.
Purpose
- Quantify LLM Performance: Measure key metrics like Time To First Token (TTFT), Tokens Per Second (TPS), and overall latency.
- Resource Usage: Track VRAM usage (if enabled) and estimate cost.
- Comparative Analysis: Allows for comparing different models or configurations.
- Regression Testing: Identify performance regressions over time.
How it Works
The run() method orchestrates the benchmarking process:
- Warmup Runs: Executes a few runs to "warm up" the LLM or system, so that cold-start penalties do not skew results.
- Benchmark Runs: Executes the specified number of runs, either sequentially or concurrently, depending on configuration.
- executeRun(): For each run, calls the provided BenchmarkCallback (which typically wraps an LLM API call), measures TTFT, total time, and token counts, and calculates TPS and cost.
- calculateSummary(): After all runs, aggregates the results and calculates percentile statistics (p50, p95, p99), averages, and standard deviations for key metrics.
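To make the summary statistics concrete, here is a sketch of helpers that a calculateSummary() step could use. The names summarize, percentile, and tokensPerSecond are hypothetical, and so are the definitions: nearest-rank percentiles, population standard deviation, and TPS as output tokens divided by total wall-clock seconds. The real BenchmarkSuite may define these differently, for example by measuring TPS from the first token onward.

```typescript
interface StatSummary {
  mean: number;
  stdDev: number;
  p50: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples
// at or below it. `sorted` must be in ascending order and non-empty.
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

function summarize(samples: number[]): StatSummary {
  if (samples.length === 0) throw new Error("No samples to summarize");
  const sorted = [...samples].sort((a, b) => a - b);
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  // Population variance (divides by n, not n - 1).
  const variance = samples.reduce((sum, x) => sum + (x - mean) ** 2, 0) / samples.length;
  return {
    mean,
    stdDev: Math.sqrt(variance),
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Assumed definition: output tokens over total wall-clock seconds.
function tokensPerSecond(outputTokens: number, totalMs: number): number {
  return totalMs > 0 ? outputTokens / (totalMs / 1000) : 0;
}
```

With few runs, p95 and p99 collapse onto the largest samples. This is one reason to configure enough runs before comparing tail latencies between models.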
Core API
- constructor(config: BenchmarkConfig): Initializes the suite with configuration.
- run(model: string, callback: BenchmarkCallback): Promise<…>: Executes the benchmark. The callback is the function that interacts with the LLM.
- formatResults(results: BenchmarkResults): string: Formats the benchmark results into a human-readable string.
- exportJSON(results: BenchmarkResults): string: Exports results as a JSON string.
- compare(baseline: BenchmarkResults, current: BenchmarkResults): BenchmarkComparison: Compares two sets of benchmark results.
- getConfig(): Required<…>: Returns the current configuration.
- updateConfig(config: Partial<…>): void: Updates the configuration.
Configuration (BenchmarkConfig)
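```typescript
interface BenchmarkConfig {
  warmupRuns?: number; // Runs executed before measurement begins
  runs?: number; // Number of measured runs
  concurrency?: number; // Runs executed in parallel per batch
  timeout?: number; // Per-run timeout
  monitorVRAM?: boolean; // Track VRAM usage
  prompts?: BenchmarkPrompt[]; // Prompts to use for benchmarking
}
```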
DEFAULT_PROMPTS provides a set of diverse prompts for common use cases.
Events
The BenchmarkSuite extends EventEmitter and emits progress events:
- phase: Indicates the current phase (warmup or benchmark).
- warmup: Progress during warmup runs.
- run: Progress during actual benchmark runs.
- runComplete: Details of a single run's result.
- batchComplete: Details of a batch of concurrent runs.
- complete: When the entire benchmark finishes, providing the BenchmarkResults.
Integration
- Uses countTokens and calculateCost from ../utils/token-counter.js for token and cost estimation.
- The BenchmarkCallback is a flexible interface that allows any LLM interaction to be benchmarked.
Singleton Access
getBenchmarkSuite() provides a singleton instance. resetBenchmarkSuite() clears and resets it.
6. Re-exported Utilities (src/performance/index.ts)
The src/performance/index.ts file re-exports several utility modules from src/utils that are crucial for performance monitoring and analysis, making them easily accessible under the performance module namespace.
MemoryMonitor (../utils/memory-monitor.js)
Provides functionality to monitor application memory usage, including RSS, heap total, and heap used. It can track memory pressure and provide snapshots over time.
- Key Functions: getMemoryMonitor, startMemoryMonitoring, stopMemoryMonitoring, getMemoryUsage, getMemoryPressure.
- Types: MemorySnapshot, MemoryMetrics, MemoryMonitorConfig.
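The kind of data MemoryMonitor works with can be sketched directly with Node's process.memoryUsage() and v8.getHeapStatistics(). takeSnapshot, classifyPressure, currentPressure, and the 0.7/0.9 thresholds are all hypothetical. In particular, measuring pressure as used heap over the V8 heap size limit is an assumption and may not match what getMemoryPressure() does.

```typescript
import { getHeapStatistics } from "node:v8";

// Hypothetical snapshot shape; the real MemorySnapshot type may differ.
interface MiniMemorySnapshot {
  timestamp: number;
  rssMB: number;
  heapTotalMB: number;
  heapUsedMB: number;
}

type Pressure = "low" | "moderate" | "high";

// Bytes to megabytes, rounded to one decimal place.
const toMB = (bytes: number) => Math.round((bytes / 1024 / 1024) * 10) / 10;

function takeSnapshot(): MiniMemorySnapshot {
  const { rss, heapTotal, heapUsed } = process.memoryUsage();
  return { timestamp: Date.now(), rssMB: toMB(rss), heapTotalMB: toMB(heapTotal), heapUsedMB: toMB(heapUsed) };
}

// Illustrative thresholds on a 0..1 usage ratio.
function classifyPressure(ratio: number): Pressure {
  if (ratio >= 0.9) return "high";
  if (ratio >= 0.7) return "moderate";
  return "low";
}

// Uses the heap size limit rather than heapTotal, because heapTotal grows
// as V8 expands the heap and would make the ratio look artificially high.
function currentPressure(): Pressure {
  const { used_heap_size, heap_size_limit } = getHeapStatistics();
  return classifyPressure(used_heap_size / heap_size_limit);
}
```

Collecting snapshots periodically (for example, with setInterval) and keeping a bounded history would show memory trends over time.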
StartupTimer (../utils/startup-timing.js)
A utility for measuring and tracking different phases of application startup. This helps identify bottlenecks and optimize initialization sequences.
- Key Functions: initStartupTiming, startPhase, endPhase, measurePhase, completeStartup, getStartupMetrics, getElapsedTime, timedImport, timedLazy.
- Types: StartupPhase, StartupMetrics.
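The phase-timing pattern behind startPhase(), endPhase(), and measurePhase() can be sketched with performance.now(). MiniStartupTimer and its methods are hypothetical and do not mirror the actual signatures in src/utils/startup-timing. The sketch only shows how per-phase durations can be recorded and ranked to find startup bottlenecks.

```typescript
interface PhaseTiming {
  name: string;
  durationMs: number;
}

class MiniStartupTimer {
  private readonly origin = performance.now();
  private open = new Map<string, number>();
  private phases: PhaseTiming[] = [];

  startPhase(name: string): void {
    this.open.set(name, performance.now());
  }

  endPhase(name: string): number {
    const start = this.open.get(name);
    if (start === undefined) throw new Error(`Phase not started: ${name}`);
    this.open.delete(name);
    const durationMs = performance.now() - start;
    this.phases.push({ name, durationMs });
    return durationMs;
  }

  // Times an async step; the phase is closed even if fn throws.
  async measurePhase<T>(name: string, fn: () => Promise<T>): Promise<T> {
    this.startPhase(name);
    try {
      return await fn();
    } finally {
      this.endPhase(name);
    }
  }

  elapsedMs(): number {
    return performance.now() - this.origin;
  }

  // Phases sorted slowest first, to surface startup bottlenecks.
  slowest(): PhaseTiming[] {
    return [...this.phases].sort((a, b) => b.durationMs - a.durationMs);
  }
}
```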
How to Contribute and Extend
- Adding New Lazy Modules: Use getLazyLoader().register() to add new heavy modules that should be loaded on demand. Consider their priority and dependencies.
- Caching New Tools: Use the withCache() helper or the @Cacheable decorator for new deterministic tool implementations. Ensure the tool is added to ToolCacheConfig.cacheableTools.
- Optimizing API Calls: Wrap external API calls with getRequestOptimizer().execute() to benefit from deduplication, concurrency control, and retries.
- Benchmarking New Models/Configurations: Implement a BenchmarkCallback for your LLM interaction and use getBenchmarkSuite().run() to gather performance data.
- Adding New Performance Metrics: Use getPerformanceManager().recordMetric() to track custom operations.
- Extending Monitoring: The PerformanceManager emits events for various activities. Listen to these events to integrate with external monitoring systems or custom dashboards.
- Configuration: All components expose updateConfig() methods, allowing dynamic adjustment of performance parameters at runtime.