src — observability

Module: src-observability Cohesion: 0.80 Members: 0

The src/observability module provides monitoring, logging, and tracing for Code Buddy. It gives developers and users insight into the system's performance, resource usage, agent behavior, and error states.

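This module encompasses four areas, each covered under Core Components: real-time metrics and dashboards (dashboard.ts), agent run persistence (run-store.ts, run-viewer.ts), tool performance tracking (tool-metrics.ts), and distributed tracing and error reporting (index.ts, tracing.ts).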

Architecture Overview

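The observability module collects operational data from across the Code Buddy application. API requests, tool executions, errors, session events, and custom metrics flow into MetricsCollector, while agent run events flow into RunStore. Consumers read from these two stores: TerminalDashboard, PrometheusExporter, and DashboardState draw on MetricsCollector, and RunViewer draws on RunStore. OpenTelemetry and Sentry are set up separately by initObservability.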

graph TD
    subgraph Data Sources
        A[API Requests] --> MC
        B[Tool Executions] --> MC
        C[Errors] --> MC
        D[Session Events] --> MC
        E[Custom Metrics] --> MC
        F[Agent Run Events] --> RS
    end

    MC[MetricsCollector] -- emits events --> TD[TerminalDashboard]
    MC -- provides data --> PE[PrometheusExporter]
    MC -- provides data --> DS[DashboardState]

    RS[RunStore] -- provides data --> RV[RunViewer]

    subgraph Global Services
        G[OpenTelemetry]
        H[Sentry]
    end

    init[initObservability] --> G
    init --> H

    style MC fill:#f9f,stroke:#333,stroke-width:2px
    style TD fill:#bbf,stroke:#333,stroke-width:2px
    style PE fill:#bbf,stroke:#333,stroke-width:2px
    style RS fill:#f9f,stroke:#333,stroke-width:2px
    style RV fill:#bbf,stroke:#333,stroke-width:2px

Core Components

1. Real-time Metrics & Dashboard (dashboard.ts)

This file defines the core in-memory metrics collection system and its immediate consumers.

MetricsCollector Class

The MetricsCollector is the central, in-memory repository for real-time operational metrics. It extends EventEmitter, allowing other parts of the application to subscribe to metric updates.

Key Responsibilities:

Data Structures:

Usage Pattern: Most interactions with MetricsCollector happen via the getMetricsCollector() singleton function.

import { getMetricsCollector } from './observability/dashboard.js';

const collector = getMetricsCollector();

// Record an API request
collector.recordAPIRequest({
  provider: 'openai',
  model: 'gpt-4',
  promptTokens: 100,
  completionTokens: 50,
  cost: 0.001,
  latency: 500,
  success: true,
});

// Record a tool execution
collector.recordToolExecution({
  name: 'bash',
  duration: 1200,
  success: true,
});

// Listen for events
collector.on('metric', (data) => {
  console.log(`New metric: ${data.name} = ${data.value}`);
});
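
To illustrate how an EventEmitter-based collector and its singleton accessor can fit together, here is a minimal sketch. The names SketchCollector, getSketchCollector, MetricEvent, and increment are hypothetical and chosen for illustration; the real MetricsCollector's internals and event payload may differ.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical event payload; the real 'metric' payload may carry other fields.
interface MetricEvent {
  name: string;
  value: number;
  timestamp: number;
}

// Illustrative stand-in for MetricsCollector, not the actual implementation.
class SketchCollector extends EventEmitter {
  private readonly counters = new Map<string, number>();

  // Add `delta` to a named counter and notify every 'metric' subscriber.
  increment(name: string, delta = 1): number {
    const next = (this.counters.get(name) ?? 0) + delta;
    this.counters.set(name, next);
    const event: MetricEvent = { name, value: next, timestamp: Date.now() };
    this.emit('metric', event);
    return next;
  }

  // Current value of a counter, or 0 if it was never incremented.
  get(name: string): number {
    return this.counters.get(name) ?? 0;
  }
}

// Lazy singleton accessor: every caller shares one in-memory collector.
let sharedCollector: SketchCollector | null = null;
function getSketchCollector(): SketchCollector {
  if (sharedCollector === null) sharedCollector = new SketchCollector();
  return sharedCollector;
}

const seen: MetricEvent[] = [];
getSketchCollector().on('metric', (event: MetricEvent) => seen.push(event));
getSketchCollector().increment('api_requests_total');
getSketchCollector().increment('api_requests_total', 2);
console.log(getSketchCollector().get('api_requests_total')); // 3
```

Because the accessor always returns the same instance, code that records metrics and code that subscribes to them do not need to pass a collector reference around.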

TerminalDashboard Class

Provides a human-readable, ASCII-art representation of the MetricsCollector's state, suitable for display in a terminal.

Key Responsibilities:

Usage Pattern: Typically instantiated via getTerminalDashboard() and used for CLI monitoring.

import { getTerminalDashboard } from './observability/dashboard.js';

const dashboard = getTerminalDashboard();

// Print once
console.log(dashboard.render());

// Or start live refresh
dashboard.startLiveRefresh(1000, (output) => {
  // Clear console and print new output
  process.stdout.write('\x1Bc');
  console.log(output);
});
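
As a rough idea of what an ASCII rendering of a metric can look like, the sketch below draws a single labelled bar. The renderBar helper, its parameters, and the layout are assumptions made for illustration; the actual output of TerminalDashboard.render() is not shown in this document and may look different.

```typescript
// Hypothetical helper: render one labelled bar, e.g. for a success rate.
function renderBar(label: string, value: number, max: number, width = 10): string {
  // Clamp the ratio to [0, 1] so out-of-range values cannot break the layout.
  const ratio = max > 0 ? Math.min(Math.max(value / max, 0), 1) : 0;
  const filled = Math.round(ratio * width);
  const bar = '#'.repeat(filled) + '-'.repeat(width - filled);
  const percent = String(Math.round(ratio * 100)).padStart(3);
  return `${label.padEnd(12)} [${bar}] ${percent}%`;
}

const successLine = renderBar('success', 9, 10);
const emptyLine = renderBar('errors', 0, 10);
console.log(successLine);
console.log(emptyLine);
```

Fixed-width padding keeps the bars aligned across rows, which matters when the whole screen is redrawn on every refresh.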

PrometheusExporter Class

Converts the MetricsCollector's data into a Prometheus-compatible text format, allowing integration with external monitoring systems like Prometheus and Grafana.

Key Responsibilities:

Usage Pattern: Used when Code Buddy needs to expose its metrics to a Prometheus server.

import { getPrometheusExporter } from './observability/dashboard.js';

const exporter = getPrometheusExporter();
const prometheusMetrics = exporter.export();
// This string can be served via an HTTP endpoint for Prometheus to scrape.
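
For readers unfamiliar with the Prometheus text exposition format, the sketch below formats one counter into that format. The CounterSample type, the formatCounter helper, and the metric name codebuddy_api_requests_total are hypothetical; the metric names that PrometheusExporter actually emits are not listed in this document.

```typescript
// Hypothetical counter sample; the metric name and labels are illustrative only.
interface CounterSample {
  name: string;
  help: string;
  labels: Record<string, string>;
  value: number;
}

// Escape backslash, double quote, and newline in label values, as the text format requires.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Produce the HELP line, the TYPE line, and one sample line for a counter.
function formatCounter(sample: CounterSample): string {
  const labelText = Object.entries(sample.labels)
    .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`)
    .join(',');
  const series = labelText ? `${sample.name}{${labelText}}` : sample.name;
  return [
    `# HELP ${sample.name} ${sample.help}`,
    `# TYPE ${sample.name} counter`,
    `${series} ${sample.value}`,
  ].join('\n') + '\n';
}

const exposition = formatCounter({
  name: 'codebuddy_api_requests_total',
  help: 'Total API requests.',
  labels: { provider: 'openai' },
  value: 3,
});
console.log(exposition);
```

Each metric family gets its own HELP and TYPE comment lines followed by one line per label combination, which is the shape a Prometheus scrape expects.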

2. Agent Run Persistence (run-store.ts, run-viewer.ts)

This subsystem focuses on persistently storing detailed information about individual agent runs, enabling historical analysis, debugging, and replay.

RunStore Class

The RunStore manages the lifecycle and storage of agent runs on the local filesystem. Each run gets its own directory under ~/.codebuddy/runs/run_/, containing events.jsonl, metrics.json, and an artifacts/ directory.

Key Responsibilities:

Data Structures:

Usage Pattern: The RunStore is typically accessed via RunStore.getInstance(). A module-level _activeStore variable, read and written through getActiveRunStore() and setActiveRunStore(), lets other parts of the codebase emit events without passing the store around.

import { RunStore, getActiveRunStore, setActiveRunStore } from './observability/run-store.js';

const store = RunStore.getInstance();

// Start a new run
const runId = store.startRun('Fix bug in file X', { tags: ['bugfix', 'critical'] });

// Set as active for convenience
setActiveRunStore(store);

// Emit events during the run
getActiveRunStore()?.appendEvent('step_start', { description: 'Analyzing problem' });
getActiveRunStore()?.appendEvent('tool_call', { toolName: 'bash', args: { command: 'ls -la' } });

// Save an artifact
store.saveArtifact(runId, 'plan.md', '# Plan to fix bug...');

// Update metrics
store.updateMetrics(runId, { totalTokens: 500, totalCost: 0.01 });

// End the run
store.endRun(runId, 'completed');
setActiveRunStore(null);

// Later, list runs
const recentRuns = store.listRuns();
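
The per-run layout described earlier (an events.jsonl file next to an artifacts/ directory) can be sketched with plain Node.js file APIs. The RunEvent shape and the appendRunEvent and readRunEvents helpers are assumptions for illustration, and a temporary directory stands in for the real run directory; the schema RunStore actually writes is not shown in this document.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical event shape; the real RunStore schema may differ.
interface RunEvent {
  ts: number;
  type: string;
  data: Record<string, unknown>;
}

// Append one event as a single JSON line, creating the run directory if needed.
function appendRunEvent(runDir: string, type: string, data: Record<string, unknown>): void {
  fs.mkdirSync(path.join(runDir, 'artifacts'), { recursive: true });
  const event: RunEvent = { ts: Date.now(), type, data };
  fs.appendFileSync(path.join(runDir, 'events.jsonl'), JSON.stringify(event) + '\n');
}

// Read every event back in write order, skipping blank lines.
function readRunEvents(runDir: string): RunEvent[] {
  return fs
    .readFileSync(path.join(runDir, 'events.jsonl'), 'utf8')
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RunEvent);
}

// A temporary directory stands in for the real run directory.
const runDir = fs.mkdtempSync(path.join(os.tmpdir(), 'run-sketch-'));
appendRunEvent(runDir, 'step_start', { description: 'Analyzing problem' });
appendRunEvent(runDir, 'tool_call', { toolName: 'bash' });
const events = readRunEvents(runDir);
console.log(events.map((e) => e.type).join(', ')); // step_start, tool_call
```

An append-only JSONL file keeps each write small and lets a viewer follow a run in progress by reading new lines as they arrive.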

RunViewer Functions (run-viewer.ts)

Provides utility functions for displaying and interacting with RunStore data in the terminal.

Key Functions:

Usage Pattern: These functions are typically invoked from CLI commands (e.g., buddy run show , buddy run tail ).

import { showRun, tailRun, listRuns } from './observability/run-viewer.js';

// From a CLI command handler
async function handleShowCommand(runId: string) {
  await showRun(runId);
}

async function handleTailCommand(runId: string) {
  await tailRun(runId);
}

function handleListCommand() {
  listRuns();
}

3. Tool Performance Tracking (tool-metrics.ts)

This module tracks the latency and reliability of individual tools. The agent can use these figures when deciding which tool to call, for example in RAG-based tool selection.

ToolMetricsTracker Class

Maintains a rolling window of latency and success/failure counts for each tool.

Key Responsibilities:

Data Structures:

Usage Pattern: Accessed via getToolMetricsTracker() and used by the tool execution layer to record performance and by the agent's decision-making logic to select tools.

import { getToolMetricsTracker } from './observability/tool-metrics.js';

const tracker = getToolMetricsTracker();

// When a tool executes
tracker.record('bash', true, 150);
tracker.record('file_read', false, 300);

// To get a reliability score for RAG
const bashReliability = tracker.getReliabilityScore('bash'); // e.g., 0.9
const fileReadReliability = tracker.getReliabilityScore('file_read'); // e.g., 0.4

// To display a summary
console.log(tracker.formatSummary());
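
The rolling-window idea can be sketched as follows. RollingToolStats, its window size, and the plain success-rate score are assumptions for illustration; the formula behind getReliabilityScore() is not given in this document and likely weighs other factors.

```typescript
// Hypothetical per-call sample kept in the window.
interface ToolSample {
  success: boolean;
  latencyMs: number;
}

// Illustrative rolling-window tracker, not the actual ToolMetricsTracker.
class RollingToolStats {
  private readonly samples = new Map<string, ToolSample[]>();
  private readonly windowSize: number;

  constructor(windowSize = 100) {
    this.windowSize = windowSize;
  }

  // Record one execution, evicting the oldest samples once the window is full.
  record(tool: string, success: boolean, latencyMs: number): void {
    const recent = this.samples.get(tool) ?? [];
    recent.push({ success, latencyMs });
    while (recent.length > this.windowSize) recent.shift();
    this.samples.set(tool, recent);
  }

  // Fraction of successful calls in the window; 1 for unseen tools (an assumption).
  successRate(tool: string): number {
    const recent = this.samples.get(tool) ?? [];
    if (recent.length === 0) return 1;
    return recent.filter((s) => s.success).length / recent.length;
  }

  // Mean latency over the window; 0 for unseen tools.
  averageLatency(tool: string): number {
    const recent = this.samples.get(tool) ?? [];
    if (recent.length === 0) return 0;
    return recent.reduce((sum, s) => sum + s.latencyMs, 0) / recent.length;
  }
}

const stats = new RollingToolStats(3);
stats.record('bash', false, 100);
stats.record('bash', true, 200);
stats.record('bash', true, 300);
stats.record('bash', true, 400); // evicts the first (failed) sample
console.log(stats.successRate('bash')); // 1
console.log(stats.averageLatency('bash')); // 300
```

Bounding the window means an old failure stops affecting the score once enough newer calls have been recorded, so the figures reflect recent behavior instead of lifetime totals.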

4. Distributed Tracing & Error Reporting (index.ts, tracing.ts)

This part of the module handles global observability integrations, specifically for distributed tracing and error reporting.

initObservability() Function (index.ts)

The main entry point for initializing global observability services.

Key Responsibilities:

Usage Pattern: Called once at application startup.

import { initObservability } from './observability/index.js';

// In your main application entry point
initObservability();

initTracing() Function (tracing.ts)

Initializes the OpenTelemetry Node.js SDK for distributed tracing.

Key Responsibilities:

Usage Pattern: Called internally by initObservability(). Developers can add more specific instrumentations here if needed for custom modules.

Sentry Integration (index.ts)

Integrates with Sentry for error reporting.

Key Responsibilities:

Usage Pattern: Errors caught by Sentry's default handlers or explicitly reported via Sentry.captureException() will be sent to the configured Sentry DSN.

Integration Points & Usage Patterns

Configuration & Environment Variables

Future Considerations