benchmarks

Module: benchmarks Cohesion: 0.80 Members: 0

The benchmarks module provides a suite of performance measurement tools for the Code Buddy project. Its primary purpose is to identify and track performance regressions in two areas: the CLI's startup time and the execution speed of internal "tools" (common operations). Running these benchmarks regularly lets developers catch slowdowns early and pinpoint bottlenecks introduced by new features or changes.

This module is designed to be run independently, typically as part of a CI/CD pipeline or during local development when optimizing performance.

The module is composed of two main benchmark scripts:

  1. startup.bench.ts: Focuses on the overall startup time of the Code Buddy CLI.
  2. tools.bench.ts: Measures the execution time of individual, common operations (tools) that Code Buddy might perform.

1. Startup Benchmark (startup.bench.ts)

This benchmark measures how quickly the Code Buddy CLI initializes and becomes responsive. It simulates a user invoking the CLI for basic information, capturing the end-to-end time.

1.1 Purpose

The startup.bench.ts script aims to:

1.2 How to Run

Execute the benchmark directly using tsx:

npx tsx benchmarks/startup.bench.ts

1.3 Configuration

The behavior of the benchmark can be adjusted using environment variables:

1.4 Key Metrics Measured

The benchmark reports several metrics, primarily focusing on the totalTime taken for the CLI to execute codebuddy --help.

1.5 Core Logic and Execution Flow

The runBenchmark function orchestrates the entire process:

  1. CLI Build Check: Verifies that the compiled CLI (dist/index.js) exists. If not, it prompts the user to run npm run build.
  2. Warmup Phase: Executes WARMUP_RUNS of the measureStartup function. These results are discarded.
  3. Benchmark Phase: Executes BENCHMARK_RUNS of the measureStartup function, collecting StartupResult objects.
  4. Summary Calculation: The collected StartupResults are passed to calculateSummary, which computes statistical aggregates (min, max, avg, percentiles, std dev) using percentile and standardDeviation.
  5. Summary Formatting: formatSummary takes the BenchmarkSummary and generates a human-readable report, including a performance rating from getPerformanceRating.
  6. Additional Metrics: measureVersionCheck and measureModuleImports are run, and their results are displayed as additional metrics.
  7. Output and Exit: All results are printed to the console. The script exits with a non-zero code if any benchmark run failed or if overall performance is critically slow (p50 >= 2000ms).
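The summary and exit steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: the field names on StartupResult and BenchmarkSummary, the linear-interpolation percentile, the population standard deviation, and the exitCodeFor helper are all assumptions. Only the function names calculateSummary, percentile, and standardDeviation and the p50 >= 2000ms rule come from the description.

```typescript
// Hypothetical shapes: the real StartupResult and BenchmarkSummary may carry more fields.
interface StartupResult {
  totalTime: number; // milliseconds
  success: boolean;
}

interface BenchmarkSummary {
  runs: number;
  failures: number;
  min: number;
  max: number;
  avg: number;
  p50: number;
  p95: number;
  stdDev: number;
}

// Percentile by linear interpolation over an ascending-sorted array (one common definition).
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = (p / 100) * (sorted.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (rank - lower);
}

// Population standard deviation around a precomputed mean.
function standardDeviation(values: number[], mean: number): number {
  if (values.length === 0) return NaN;
  const variance =
    values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

function calculateSummary(results: StartupResult[]): BenchmarkSummary {
  // Only successful runs contribute timings; failures are counted separately.
  const times = results
    .filter((r) => r.success)
    .map((r) => r.totalTime)
    .sort((a, b) => a - b);
  const hasTimes = times.length > 0;
  const avg = hasTimes ? times.reduce((a, b) => a + b, 0) / times.length : NaN;
  return {
    runs: results.length,
    failures: results.length - times.length,
    min: hasTimes ? times[0] : NaN,
    max: hasTimes ? times[times.length - 1] : NaN,
    avg,
    p50: percentile(times, 50),
    p95: percentile(times, 95),
    stdDev: standardDeviation(times, avg),
  };
}

// Exit rule from the text: fail on any failed run or on p50 >= 2000 ms.
function exitCodeFor(summary: BenchmarkSummary): number {
  return summary.failures > 0 || summary.p50 >= 2000 ? 1 : 0;
}
```

Keeping the exit rule in a small pure function makes the CI threshold easy to test without spawning the CLI.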

measureStartup Function

This asynchronous function is central to the startup benchmark. It spawns a new node process to execute CLI_PATH with the --help argument. It captures stdout and stderr and measures the total time from spawn to close. A fake GROK_API_KEY is injected into the environment to prevent interactive prompts that would skew results. A timeout of 30 seconds is implemented to prevent hung processes.
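A function along these lines might look like the sketch below. It assumes Node's child_process.spawn and perf_hooks APIs. The result fields, the parameter list, and the placeholder key value are illustrative assumptions rather than the script's actual definitions.

```typescript
import { spawn } from "node:child_process";
import { performance } from "node:perf_hooks";

// Hypothetical result shape; the real StartupResult may differ.
interface StartupResult {
  totalTime: number; // milliseconds from spawn to close
  exitCode: number | null;
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

function measureStartup(
  cliPath: string,
  args: string[] = ["--help"],
  timeoutMs: number = 30_000,
): Promise<StartupResult> {
  return new Promise((resolve) => {
    const start = performance.now();
    const child = spawn(process.execPath, [cliPath, ...args], {
      // Placeholder key (assumed value) so the CLI does not stop to prompt for one.
      env: { ...process.env, GROK_API_KEY: "benchmark-placeholder-key" },
    });

    let stdout = "";
    let stderr = "";
    let timedOut = false;
    let settled = false;

    child.stdout.on("data", (chunk: Buffer) => {
      stdout += chunk.toString();
    });
    child.stderr.on("data", (chunk: Buffer) => {
      stderr += chunk.toString();
    });
    child.stdin.end(); // the benchmark never sends interactive input

    // Kill the child if it has not closed within the timeout.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    const finish = (exitCode: number | null): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve({
        totalTime: performance.now() - start,
        exitCode,
        stdout,
        stderr,
        timedOut,
      });
    };

    child.on("close", (code) => finish(code));
    child.on("error", () => finish(null));
  });
}
```

A caller would typically await measureStartup("dist/index.js") once per run and treat a non-zero exitCode or a set timedOut flag as a failed run.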

measureVersionCheck Function

Similar to measureStartup, this function spawns the CLI with --version. However, it specifically measures the time until the first byte of output is received, then immediately kills the process. This provides a lower bound on "time to first interaction".
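The time-to-first-byte idea can be sketched as below. The name measureTimeToFirstByte, its parameters, and the choice to return null when no output arrives are assumptions made for illustration; the real measureVersionCheck may behave differently.

```typescript
import { spawn } from "node:child_process";
import { performance } from "node:perf_hooks";

// Resolves with milliseconds until the first stdout chunk, or null if none arrives.
function measureTimeToFirstByte(
  cliPath: string,
  args: string[] = ["--version"],
  timeoutMs: number = 30_000,
): Promise<number | null> {
  return new Promise((resolve) => {
    const start = performance.now();
    const child = spawn(process.execPath, [cliPath, ...args]);
    let settled = false;

    const timer = setTimeout(() => finish(null), timeoutMs);

    function finish(value: number | null): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      child.kill(); // stop the process as soon as a result is known
      resolve(value);
    }

    // The first chunk of output marks "time to first interaction".
    child.stdout.once("data", () => finish(performance.now() - start));
    // If the process ends or fails before printing anything, report null.
    child.on("close", () => finish(null));
    child.on("error", () => finish(null));
  });
}
```

Because the process is killed right after the first chunk, this measurement excludes shutdown cost, which is why it reads lower than the full spawn-to-close time.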

measureModuleImports Function

This function dynamically imports a predefined list of modules and measures the time each import takes. This helps pinpoint heavy dependencies.
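As a rough sketch of this technique, each module specifier can be loaded with a dynamic import() and timed individually. The ImportTiming shape, the function signature, and the slowest-first sort are assumptions; the actual module list is not shown in this document, so the caller supplies it here.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical result shape for one import measurement.
interface ImportTiming {
  specifier: string;
  timeMs: number;
  error?: string;
}

async function measureModuleImports(specifiers: string[]): Promise<ImportTiming[]> {
  const timings: ImportTiming[] = [];
  for (const specifier of specifiers) {
    const start = performance.now();
    try {
      await import(specifier);
      timings.push({ specifier, timeMs: performance.now() - start });
    } catch (err) {
      // Record the failure instead of aborting the remaining measurements.
      timings.push({
        specifier,
        timeMs: performance.now() - start,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  // Slowest first, so heavy dependencies appear at the top of the report.
  return timings.sort((a, b) => b.timeMs - a.timeMs);
}
```

Note that a module already loaded earlier in the same process resolves from the module cache, so only the first import of each module reflects its real load cost.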

Startup Benchmark Flow

graph TD
    A[runBenchmark] --> B{CLI Built?};
    B -- No --> C[Exit Error];
    B -- Yes --> D[Warmup Runs];
    D --> E[measureStartup];
    E --> F[Benchmark Runs];
    F --> G[measureStartup];
    G --> H["Collect StartupResult[]"];
    H --> I[calculateSummary];
    I --> J[formatSummary];
    J --> K[measureVersionCheck];
    K --> L[measureModuleImports];
    L --> M[Display Additional Metrics];
    M --> N[Exit Status];

1.6 Output Interpretation

The output summarizes startup times with their statistical measures, and the "Performance" rating offers a quick health check. A rating of "Slow" or "Critical", or high import times for individual modules, points to areas for optimization (e.g., lazy loading modules, reducing initial bundle size).


2. Tools Benchmark (tools.bench.ts)

This benchmark focuses on the performance of specific, common operations (referred to as "tools") that Code Buddy might execute. These tools often involve file system operations, Git commands, token counting, or other utility functions.

2.1 Purpose

The tools.bench.ts script aims to:

2.2 How to Run

Execute the benchmark directly using tsx:

npx tsx benchmarks/tools.bench.ts

2.3 Configuration

The benchmark's behavior can be customized with environment variables:

2.4 Tool Definitions (TOOL_BENCHMARKS)

The core of this benchmark is the TOOL_BENCHMARKS array, which defines each operation to be tested. Each entry is a ToolBenchmark object with the following properties:

Examples of defined tools:

2.5 Key Metrics Measured

For each tool, the benchmark reports:

2.6 Core Logic and Execution Flow

The runBenchmark function orchestrates the tools benchmark:

  1. Tool Filtering: If the TOOLS environment variable is set, TOOL_BENCHMARKS is filtered to include only the specified tools.
  2. Individual Tool Benchmarking: Each selected tool is passed to benchmarkTool, which returns a ToolResult that is collected for the summary.
  3. Summary Generation: After all tools are benchmarked, runBenchmark categorizes them into fast, medium, and slow groups.
  4. Results Formatting: formatResults generates a comprehensive report, including a table of tool performance and the speed categories.
  5. Parallel Execution Test: benchmarkParallelExecution is called to run a subset of parallelizable tools both sequentially and in parallel, calculating the speedup.
  6. Optimization Recommendations: Based on the results, the script provides general recommendations for improving performance (e.g., focusing on slow or uncached tools).
  7. Output and Save: All results are printed to the console. A detailed JSON report (.benchmark-results.json) is saved to the project root.
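The filtering and categorization steps can be illustrated with two small pure functions. This sketch assumes TOOLS holds a comma-separated list of tool names. The 10 ms and 100 ms category boundaries, the avgTime field, and the helper names filterTools and categorize are hypothetical, since the document does not state the real thresholds or shapes.

```typescript
// Hypothetical minimal result shape; avgTime is in milliseconds.
interface ToolResult {
  name: string;
  avgTime: number;
}

// Keep only tools named in the TOOLS value (assumed comma-separated); keep all if unset.
function filterTools<T extends { name: string }>(
  tools: T[],
  toolsEnv: string | undefined,
): T[] {
  if (toolsEnv === undefined || toolsEnv.trim() === "") return tools;
  const wanted = new Set(
    toolsEnv
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
  return tools.filter((tool) => wanted.has(tool.name));
}

// Split results into speed groups. The thresholds are illustrative assumptions.
function categorize(results: ToolResult[], fastMs: number = 10, slowMs: number = 100) {
  return {
    fast: results.filter((r) => r.avgTime < fastMs),
    medium: results.filter((r) => r.avgTime >= fastMs && r.avgTime < slowMs),
    slow: results.filter((r) => r.avgTime >= slowMs),
  };
}
```

In a script, filterTools would typically receive process.env.TOOLS, for example when invoked as TOOLS=a,b npx tsx benchmarks/tools.bench.ts (the tool names here are placeholders).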

benchmarkTool Function

This function encapsulates the logic for benchmarking a single ToolBenchmark. It handles setup, warmup, repeated execution, time measurement, error handling, and teardown, returning a ToolResult object.
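The lifecycle described here could be sketched as follows. The ToolBenchmark fields mirror the example at the end of this document; the ToolResult fields, the default run counts, and the decision to swallow warmup errors are assumptions.

```typescript
import { performance } from "node:perf_hooks";

// Fields taken from the example entry in this document.
interface ToolBenchmark {
  name: string;
  cacheable: boolean;
  parallelizable: boolean;
  setup?: () => Promise<void>;
  execute: () => Promise<void>;
  teardown?: () => Promise<void>;
}

// Hypothetical result shape; times are in milliseconds.
interface ToolResult {
  name: string;
  runs: number;
  errors: number;
  avgTime: number;
  minTime: number;
  maxTime: number;
}

async function benchmarkTool(
  tool: ToolBenchmark,
  warmupRuns: number = 2,
  runs: number = 10,
): Promise<ToolResult> {
  const times: number[] = [];
  let errors = 0;
  try {
    await tool.setup?.();
    // Warmup: results are discarded and errors are ignored.
    for (let i = 0; i < warmupRuns; i++) {
      try {
        await tool.execute();
      } catch {
        // ignored during warmup
      }
    }
    // Timed runs: a failing run is counted but does not stop the benchmark.
    for (let i = 0; i < runs; i++) {
      const start = performance.now();
      try {
        await tool.execute();
        times.push(performance.now() - start);
      } catch {
        errors++;
      }
    }
  } finally {
    // Teardown runs even if setup or a run threw unexpectedly.
    await tool.teardown?.();
  }
  const total = times.reduce((a, b) => a + b, 0);
  return {
    name: tool.name,
    runs,
    errors,
    avgTime: times.length > 0 ? total / times.length : NaN,
    minTime: times.length > 0 ? Math.min(...times) : NaN,
    maxTime: times.length > 0 ? Math.max(...times) : NaN,
  };
}
```

Placing teardown in a finally block keeps temporary files or other resources from leaking into the next tool's measurement.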

benchmarkParallelExecution Function

This function demonstrates the potential benefits of parallel execution. It selects a few parallelizable tools, runs them one after another, then runs them all concurrently using Promise.all, and reports the observed speedup.
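A minimal sketch of the sequential-versus-parallel comparison is shown below. The helper name compareSequentialVsParallel and the returned fields are illustrative assumptions; the real function also selects which tools to include, which is omitted here.

```typescript
import { performance } from "node:perf_hooks";

type Task = () => Promise<void>;

// Hypothetical result shape; times are in milliseconds.
interface ParallelComparison {
  sequentialMs: number;
  parallelMs: number;
  speedup: number; // sequentialMs / parallelMs
}

async function compareSequentialVsParallel(tasks: Task[]): Promise<ParallelComparison> {
  // Sequential pass: each task starts only after the previous one finishes.
  const sequentialStart = performance.now();
  for (const task of tasks) {
    await task();
  }
  const sequentialMs = performance.now() - sequentialStart;

  // Parallel pass: all tasks start together and are awaited as a group.
  const parallelStart = performance.now();
  await Promise.all(tasks.map((task) => task()));
  const parallelMs = performance.now() - parallelStart;

  return { sequentialMs, parallelMs, speedup: sequentialMs / parallelMs };
}
```

Promise.all only overlaps work that actually yields, such as file or process I/O. CPU-bound tasks on Node's single JavaScript thread will show a speedup close to 1.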

Tools Benchmark Flow

graph TD
    A[runBenchmark] --> B{Filter Tools?};
    B -- Yes --> C[Filtered Tools];
    B -- No --> D[All Tools];
    C --> E(Loop ToolBenchmarks);
    D --> E;
    E --> F[benchmarkTool];
    F --> G[Collect ToolResult];
    G --> H[Calculate Summary];
    H --> I[formatResults];
    I --> J[benchmarkParallelExecution];
    J --> K[Display Parallel Results];
    K --> L[Display Recommendations];
    L --> M[Save Results to JSON];

2.7 Output Interpretation

The output table provides a quick overview of each tool's performance. Tools categorized as [SLOW] or [MEDIUM] (especially if they are frequently used) are prime candidates for optimization. The Cache and Parallel columns indicate opportunities for performance gains through caching mechanisms or concurrent execution. The "Parallel Execution Test" provides empirical data on how much speedup can be expected from running multiple parallelizable tasks simultaneously.


3. Common Utilities

Both benchmark scripts utilize a few common utility functions:


4. Integration with Codebase

The benchmarks module interacts with the rest of the Code Buddy codebase in several ways:


5. Contributing to Benchmarks

5.1 Extending the Startup Benchmark

5.2 Adding New Tools to the Tools Benchmark

To add a new tool for benchmarking:

  1. Identify the Operation: Determine a specific, self-contained operation that Code Buddy performs and whose performance is critical.
  2. Implement ToolBenchmark: Create a new object conforming to the ToolBenchmark interface.

  3. Add to TOOL_BENCHMARKS: Append your new ToolBenchmark object to the TOOL_BENCHMARKS array in tools.bench.ts.

Example:

// In benchmarks/tools.bench.ts
const TOOL_BENCHMARKS: ToolBenchmark[] = [
  // ... existing tools ...
  {
    name: 'new_complex_calculation',
    cacheable: true, // If the output is deterministic for same input
    parallelizable: true, // If it doesn't interfere with other operations
    async setup() {
      // Optional: Prepare any data needed for the calculation
    },
    async execute() {
      // Perform the complex calculation here
      let result = 0;
      for (let i = 0; i < 1000000; i++) {
        result += Math.sqrt(i);
      }
      // console.log(result); // Avoid logging in execute for clean timing
    },
    async teardown() {
      // Optional: Clean up any resources
    },
  },
];