src — analytics
This document provides a comprehensive overview of the src/analytics module, detailing its purpose, architecture, key components, and integration points within the codebase.
Analytics Module Overview
The src/analytics module serves as the central hub for all analytical capabilities within the application. Its primary goal is to provide insights into user behavior, system performance, resource consumption (especially LLM API costs), and codebase health. This includes:
- Cost Management: Tracking, predicting, and alerting on LLM API spending.
- Usage Monitoring: Recording user sessions, messages, and tool interactions.
- Codebase Insights: Analyzing code complexity, evolution, and modification patterns.
- Operational Metrics: Exposing real-time performance data for observability.
- Return on Investment (ROI): Quantifying the value generated by the application.
The module employs various strategies for data collection and storage, ranging from in-memory real-time dashboards to file-backed persistence and a robust SQLite-based system for long-term analytics.
Core Concepts
Several core concepts underpin the analytics module:
- Event-Driven Tracking: Most analytics components react to discrete events (e.g., message, tool_call, session_start) to update their internal state and metrics.
- Layered Persistence:
  - In-memory: For real-time, short-lived session metrics (MetricsDashboard).
  - File-backed: For session-level aggregates and configuration (AnalyticsDashboard, ROITracker, ToolAnalytics).
  - SQLite Database: For long-term, aggregated daily/weekly/monthly analytics (PersistentAnalytics).
- Cost Centralization: LLM model pricing is sourced from src/config/model-pricing.ts, ensuring consistent cost calculations across all analytics components.
- Code Analysis: Leveraging Git history and static code analysis to provide insights into the codebase itself, independent of runtime behavior.
- Observability: Providing standard interfaces (like Prometheus) for external monitoring systems.
Architecture and Data Flow
The analytics module is composed of several specialized sub-modules, each handling a specific aspect of data collection, analysis, or reporting.
graph TD
subgraph "Application Events"
A[User Interactions]
B[LLM API Calls]
C[Tool Executions]
D[File System Changes]
E[Git Commits]
end
subgraph "Analytics Module"
subgraph "Runtime Tracking"
F[MetricsDashboard]
G[AnalyticsDashboard]
H[BudgetAlertManager]
I[CostPredictor]
J[ToolAnalytics]
K[ROI Tracker]
end
subgraph "Persistent & Reporting"
L[PersistentAnalytics]
M[PrometheusExporter]
N[Code Evolution]
O[Codebase Heatmap]
P[Complexity Analyzer]
end
end
A & B & C & D -- Emit Events --> F & G & J & K
G -- Triggers --> H
F & G & J & K -- Aggregate Data --> L
F & G & J & K -- Expose Metrics --> M
L -- Provides Historical Data --> I
E -- Analyzed By --> N & O
D -- Analyzed By --> P
I -- Predicts Cost --> B
L -- Stores Aggregates --> DB[(SQLite DB)]
G & J & K -- Persist Data --> JSON[Local JSON Files]
M -- Scraped By --> PG[Prometheus/Grafana]
Key Distinctions between Dashboards:
- MetricsDashboard (src/analytics/metrics-dashboard.ts): An in-memory, event-driven dashboard primarily focused on real-time metrics for the current session. It aggregates token usage, tool executions, costs, and latencies, providing a snapshot of ongoing activity. It does not persist data to disk itself.
- AnalyticsDashboard (src/analytics/dashboard.ts): A more comprehensive dashboard that tracks session, usage, cost, and performance metrics. It uses LRU caches for in-memory storage and persists data to local JSON files. It also integrates BudgetAlertManager for cost monitoring. This dashboard is designed for longer-term, but still local, historical data.
- PersistentAnalytics (src/analytics/persistent-analytics.ts): The most robust analytics component, backed by an SQLite database. It aggregates data daily, weekly, and monthly, providing long-term trends, cost budgeting, and export capabilities. It's designed for durable, cross-session analytics.
Key Components
1. BudgetAlertManager (src/analytics/budget-alerts.ts)
- Purpose: Monitors accumulated costs against defined budget thresholds and emits alerts when limits are approached or exceeded.
- How it Works:
  - Initialized with warningThreshold (default 70%) and criticalThreshold (default 90%).
  - The check(currentCost, limit) method evaluates the current spending against the budget.
  - Alerts (warning, critical, limit_reached) are emitted as 'alert' events.
  - Alerts are deduplicated per type within a session until reset() is called.
- API:
  - new BudgetAlertManager(config?: Partial<…>): Creates an instance.
  - check(currentCost: number, limit: number): BudgetAlert | null: Checks cost and returns an alert if a threshold is crossed.
  - getAlerts(): BudgetAlert[]: Returns a history of emitted alerts.
  - reset(): void: Clears all alert state.
  - updateConfig(config: Partial<…>): void: Updates thresholds.
- Connections: Used by AnalyticsDashboard's checkBudgetAlert() to trigger alerts.
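The threshold check and per-type deduplication described above can be sketched as follows. This is a minimal, hypothetical illustration, not the module's implementation: the class name BudgetAlertSketch, the alert shape, and the reading of thresholds as percentages of the limit are all assumptions.

```typescript
// Hypothetical sketch; the real BudgetAlertManager may differ in names and details.
type AlertType = 'warning' | 'critical' | 'limit_reached';

interface AlertSketch {
  type: AlertType;
  currentCost: number;
  limit: number;
  percentage: number;
}

class BudgetAlertSketch {
  private readonly warningThreshold: number;
  private readonly criticalThreshold: number;
  // Alert types already emitted since the last reset (used for deduplication).
  private readonly emitted = new Set<AlertType>();

  constructor(warningThreshold: number = 70, criticalThreshold: number = 90) {
    this.warningThreshold = warningThreshold;
    this.criticalThreshold = criticalThreshold;
  }

  check(currentCost: number, limit: number): AlertSketch | null {
    if (limit <= 0) return null; // no budget configured
    const percentage = (currentCost / limit) * 100;
    let type: AlertType | null = null;
    if (percentage >= 100) type = 'limit_reached';
    else if (percentage >= this.criticalThreshold) type = 'critical';
    else if (percentage >= this.warningThreshold) type = 'warning';
    // Return nothing if no threshold is crossed or this type was already reported.
    if (type === null || this.emitted.has(type)) return null;
    this.emitted.add(type);
    return { type, currentCost, limit, percentage };
  }

  reset(): void {
    this.emitted.clear();
  }
}
```

In this sketch a second check() at the same severity returns null until reset() is called, which mirrors the deduplication rule stated in the list above.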
2. Code Evolution Report (src/analytics/code-evolution.ts)
- Purpose: Generates reports on how a codebase evolves over time, tracking metrics like lines of code (LOC), file count, and language distribution.
- How it Works:
  - Uses Git commands (git log, git ls-tree, git show) to sample commits over a specified period.
  - For each sampled commit, it analyzes the codebase state to count LOC, files, and language breakdown.
  - Calculates summary statistics (LOC change, file change, commit velocity) and trends.
- API:
  - generateEvolutionReport(options?: EvolutionOptions): EvolutionReport: The main entry point to generate a report.
  - formatEvolutionReport(report: EvolutionReport): string: Formats the report for terminal display.
  - exportEvolutionData(report: EvolutionReport): string: Exports the full report as JSON.
  - exportEvolutionCSV(report: EvolutionReport): string: Exports key data points as CSV.
- Connections: Directly interacts with the local Git repository via child_process.execSync.
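To illustrate the summary step (LOC change, file change, commit velocity), here is a small sketch that works on already-sampled snapshots. The SnapshotSketch shape, the function name, and the trend labels are assumptions for illustration; the actual EvolutionReport types are not shown in this document.

```typescript
// Hypothetical snapshot shape; one entry per sampled commit, oldest first.
interface SnapshotSketch {
  date: string; // ISO date, e.g. "2024-01-15"
  loc: number;
  files: number;
  commitsSincePrevious: number;
}

interface EvolutionSummarySketch {
  locChange: number;
  fileChange: number;
  commitsPerWeek: number;
  trend: 'growing' | 'shrinking' | 'stable';
}

function summarizeEvolution(snapshots: SnapshotSketch[]): EvolutionSummarySketch {
  const first = snapshots[0];
  const last = snapshots[snapshots.length - 1];
  if (first === undefined || last === undefined || snapshots.length < 2) {
    return { locChange: 0, fileChange: 0, commitsPerWeek: 0, trend: 'stable' };
  }
  const msPerDay = 86400000;
  // Guard against a zero-length period to avoid dividing by zero.
  const days = Math.max(1, (Date.parse(last.date) - Date.parse(first.date)) / msPerDay);
  const commits = snapshots.slice(1).reduce((sum, s) => sum + s.commitsSincePrevious, 0);
  const locChange = last.loc - first.loc;
  return {
    locChange,
    fileChange: last.files - first.files,
    commitsPerWeek: (commits / days) * 7,
    trend: locChange > 0 ? 'growing' : locChange < 0 ? 'shrinking' : 'stable',
  };
}
```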
3. Codebase Heatmap (src/analytics/codebase-heatmap.ts)
- Purpose: Visualizes file modification patterns to identify "hotspots" – files or directories that are frequently changed.
- How it Works:
  - Uses Git commands (git log --name-only, git log --numstat) to gather commit history, authors, and line changes for files within a specified timeframe.
  - Calculates a "churn score" (additions + deletions) and assigns a heatLevel (cold to burning) based on churn and commit frequency.
  - Aggregates data to identify top authors and directory-level activity.
- API:
  - generateHeatmap(options?: HeatmapOptions): HeatmapData: Generates the heatmap data.
  - formatHeatmap(data: HeatmapData): string: Formats the heatmap for terminal display.
  - getDirectoryHeatmap(data: HeatmapData): Map<…>: Aggregates churn scores by directory.
- Connections: Directly interacts with the local Git repository via child_process.execSync.
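The churn calculation can be sketched as a pure function over git log --numstat output, where each file line has the form "additions<TAB>deletions<TAB>path" and binary files report "-" for both counts. The function names, the intermediate heat level names, and the numeric thresholds below are placeholders chosen for illustration, not values taken from the module.

```typescript
// Only 'cold' and 'burning' are named in the text; the levels in between are assumed.
type HeatLevelSketch = 'cold' | 'cool' | 'warm' | 'hot' | 'burning';

// Sums additions + deletions per file from `git log --numstat` text.
function churnFromNumstat(output: string): Map<string, number> {
  const churn = new Map<string, number>();
  for (const line of output.split('\n')) {
    const [addedRaw, deletedRaw, file] = line.split('\t');
    if (addedRaw === undefined || deletedRaw === undefined || file === undefined) continue;
    const added = Number.parseInt(addedRaw, 10);
    const deleted = Number.parseInt(deletedRaw, 10);
    // Binary files are reported as "-\t-\tpath" and are skipped here.
    if (Number.isNaN(added) || Number.isNaN(deleted)) continue;
    churn.set(file, (churn.get(file) ?? 0) + added + deleted);
  }
  return churn;
}

// Placeholder scoring: combines churn with commit frequency using made-up weights.
function heatLevelSketch(churn: number, commits: number): HeatLevelSketch {
  const score = churn + commits * 10;
  if (score < 50) return 'cold';
  if (score < 200) return 'cool';
  if (score < 500) return 'warm';
  if (score < 1000) return 'hot';
  return 'burning';
}
```

Keeping the parser separate from the Git invocation makes it testable without a repository; the real module may structure this differently.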
4. Complexity Analyzer (src/analytics/complexity-analyzer.ts)
- Purpose: Performs static analysis of TypeScript/JavaScript code to measure cyclomatic complexity, cognitive complexity, lines of code, and maintainability index.
- How it Works:
  - Scans specified files using fast-glob.
  - For each file, it reads the content and uses regex patterns to extractFunctions().
  - analyzeFunctionContent() then calculates complexity metrics for each function by counting decision points, logical operators, and tracking nesting levels.
  - Assigns a rating (A-F) based on cyclomatic complexity.
  - Calculates a simplified maintainability index for files.
- API:
  - analyzeComplexity(options?: AnalyzerOptions): Promise<…>: The main function to analyze a codebase.
  - formatComplexityReport(report: ComplexityReport): string: Formats the report for terminal display.
  - exportComplexityJSON(report: ComplexityReport): string: Exports the full report as JSON.
  - exportComplexityCSV(report: ComplexityReport): string: Exports function-level data as CSV.
- Connections: Uses fs-extra for file I/O and fast-glob for file discovery.
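A regex-based cyclomatic complexity count of the kind described above might look like the sketch below. The pattern, the function names, and the rating cut-offs are assumptions for illustration. Like any regex approach, it also counts keywords that appear inside strings or comments.

```typescript
// Counts decision points in a function body: branching keywords, logical
// operators, and the ternary "?" (excluding "?.", "??", and optional "?:").
function cyclomaticComplexitySketch(body: string): number {
  const decisionPoints = /\b(?:if|for|while|case|catch)\b|&&|\|\||(?<!\?)\?(?![.?:])/g;
  const matches = body.match(decisionPoints);
  // Complexity starts at 1 for the single straight-line path.
  return 1 + (matches ? matches.length : 0);
}

// Placeholder cut-offs; the module's actual A-F boundaries are not documented here.
function ratingSketch(complexity: number): 'A' | 'B' | 'C' | 'D' | 'E' | 'F' {
  if (complexity <= 5) return 'A';
  if (complexity <= 10) return 'B';
  if (complexity <= 20) return 'C';
  if (complexity <= 30) return 'D';
  if (complexity <= 40) return 'E';
  return 'F';
}
```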
5. CostPredictor (src/analytics/cost-predictor.ts)
- Purpose: Estimates the cost of an LLM API request before it is executed.
- How it Works:
  - Estimates input tokens from message content using a character-based heuristic (CHARS_PER_TOKEN).
  - Estimates output tokens based on historical average output tokens from a CostTracker instance.
  - Applies model-specific pricing (from getPricingPer1k in src/config/model-pricing.ts) to calculate the estimated cost.
  - Determines a confidence level for the prediction based on the amount of historical data available.
- API:
  - new CostPredictor(costTracker: CostTracker): Constructor, requires a CostTracker instance.
  - predict(messages: Array<{ role: string; content: string }>, model: string): CostPrediction: Estimates cost for a given set of messages and model.
  - getAverageCostPerRequest(): number: Returns the average cost from historical data.
  - getCostTrend(): 'increasing' | 'decreasing' | 'stable': Analyzes recent cost trends.
- Connections:
  - Outgoing: getPricingPer1k (from src/config/model-pricing.ts), CostTracker.getReport() (from src/utils/cost-tracker.ts).
  - Incoming: processUserMessageStream (from src/agent/codebuddy-agent.ts) uses this to show cost predictions.
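The prediction arithmetic can be sketched as below. The value of CHARS_PER_TOKEN, the pricing shape, the history shape, and the confidence cut-offs are all assumptions made for this example; the real values come from the module and from src/config/model-pricing.ts.

```typescript
// Assumed heuristic value; the module's actual constant is not shown in this document.
const CHARS_PER_TOKEN_SKETCH = 4;

interface PricingSketch {
  inputPer1k: number;  // cost per 1,000 input tokens
  outputPer1k: number; // cost per 1,000 output tokens
}

interface HistorySketch {
  requestCount: number;
  averageOutputTokens: number;
}

interface PredictionSketch {
  inputTokens: number;
  outputTokens: number;
  estimatedCost: number;
  confidence: 'low' | 'medium' | 'high';
}

function predictCostSketch(
  messages: Array<{ role: string; content: string }>,
  pricing: PricingSketch,
  history: HistorySketch,
): PredictionSketch {
  // Input tokens: total characters divided by the chars-per-token heuristic.
  const chars = messages.reduce((total, m) => total + m.content.length, 0);
  const inputTokens = Math.ceil(chars / CHARS_PER_TOKEN_SKETCH);
  // Output tokens: assume the next reply looks like the historical average.
  const outputTokens = history.averageOutputTokens;
  const estimatedCost =
    (inputTokens / 1000) * pricing.inputPer1k + (outputTokens / 1000) * pricing.outputPer1k;
  // Placeholder cut-offs: more history means more confidence.
  const confidence =
    history.requestCount >= 20 ? 'high' : history.requestCount >= 5 ? 'medium' : 'low';
  return { inputTokens, outputTokens, estimatedCost, confidence };
}
```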
6. AnalyticsDashboard (src/analytics/dashboard.ts)
- Purpose: A comprehensive, file-backed analytics dashboard for tracking user sessions, LLM usage, tool calls, costs, and performance metrics. It provides a local, persistent view of activity.
- How it Works:
  - Manages data in several LRUCache instances (sessions, tools, daily stats) to limit memory usage and retain recent data.
  - Persists these caches to JSON files in ~/.codebuddy/analytics at regular intervals.
  - Tracks events like session_start, message, tool_call, and aggregates them into various metrics.
  - Integrates BudgetAlertManager logic to emit budget:alert events.
  - Provides methods to retrieve aggregated metrics and export data in different formats.
- API:
  - getAnalyticsDashboard(config?: Partial<…>): AnalyticsDashboard: Singleton accessor.
  - startSession(model?: string): string: Initiates a new session.
  - endSession(): void: Concludes the current session.
  - trackMessage(tokensInput: number, tokensOutput: number, model?: string): void: Records LLM message usage and cost.
  - trackToolCall(toolName: string, success: boolean, duration: number, details?: Record<…>): void: Records tool execution details.
  - trackEvent(type: string, data: Record<…>): void: Generic event tracking.
  - getUsageMetrics(): UsageMetrics, getCostMetrics(): CostMetrics, getPerformanceMetrics(): PerformanceMetrics, getToolMetrics(): ToolMetrics[], getDailyStats(): DailyStats[], getRecentSessions(): SessionMetrics[]: Methods to retrieve various aggregated metrics.
  - exportData(format?: 'json' | 'csv' | 'markdown'): Promise<…>: Exports all collected data.
  - renderDashboard(): string: Generates a formatted terminal display of key metrics.
  - setBudget(amount: number): void: Sets a budget limit.
  - reset(): Promise<…>: Clears all analytics data.
  - dispose(): void: Cleans up resources and saves data.
- Connections:
  - Outgoing: LRUCache (from src/utils/lru-cache.ts), fs-extra, path, os, getPricingPer1M (from src/config/model-pricing.ts).
  - Incoming: getAnalyticsDashboard (from src/database/integration.ts), various tests.
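To show why an LRU cache bounds memory while keeping recent data, here is a minimal stand-in built on Map insertion order, with a toJSON() hook so the contents can be serialized for file persistence. It is not the LRUCache from src/utils/lru-cache.ts; the class name and methods are hypothetical.

```typescript
// Minimal LRU sketch: Map iteration order doubles as recency order (oldest first).
class TinyLruSketch<K, V> {
  private readonly maxSize: number;
  private readonly entries = new Map<K, V>();

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  get(key: K): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxSize) {
      // Evict the least recently used entry (the first key in iteration order).
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value);
    }
  }

  // Lets JSON.stringify(cache) produce an array of [key, value] pairs.
  toJSON(): Array<[K, V]> {
    return Array.from(this.entries);
  }
}
```

A dashboard could call JSON.stringify on such a cache on a timer and write the result to disk; the actual persistence format used by AnalyticsDashboard is not documented here.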
7. MetricsDashboard (src/analytics/metrics-dashboard.ts)
- Purpose: An in-memory, event-driven dashboard for real-time metrics specific to the current session. It's designed for immediate feedback and monitoring of ongoing activity.
- How it Works:
  - Aggregates token usage, tool executions, request latencies, and costs as events occur.
  - Maintains internal counters and arrays (e.g., latencies) to calculate averages and percentiles.
  - Emits events (tokenUsage, toolExecution, requestComplete) for external listeners.
  - Does not persist data to disk; its state is reset with the session.
- API:
  - getMetricsDashboard(): MetricsDashboard: Singleton accessor.
  - recordTokenUsage(usage: { promptTokens: number; completionTokens: number; cachedTokens?: number }): void: Records token counts.
  - recordToolExecution(execution: { toolName: string; success: boolean; duration: number; error?: string }): void: Records tool call outcomes and durations.
  - recordRequestComplete(request: { latency: number; model: string; cost?: number }): void: Records LLM request latency and cost.
  - recordCacheHit(): void, recordMessage(type: 'user' | 'assistant'): void, recordToolRound(): void: Records other session events.
  - getMetrics(): DashboardMetrics: Returns a snapshot of all current metrics.
  - formatMetrics(): string: Formats metrics for terminal display.
  - exportMetrics(): string: Exports current metrics as JSON.
  - resetSession(): void: Resets all session-specific metrics.
- Connections:
  - Incoming: The call graph lists recordToolExecution as called from src/analytics/metrics-dashboard.ts itself. This is likely a call-graph artifact; the method is presumably invoked by external callers.
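Averages and percentiles over a latencies array can be computed as in the sketch below, which uses the nearest-rank percentile definition. Whether MetricsDashboard uses nearest-rank or an interpolating method is not stated in this document, so treat the choice as an assumption; the function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentileSketch(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index] ?? 0;
}

function averageSketch(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}
```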
8. PersistentAnalytics (src/analytics/persistent-analytics.ts)
- Purpose: Provides a robust, SQLite-backed system for long-term analytics, cost tracking, and budgeting. It aggregates data daily, weekly, and monthly.
- How it Works:
  - Leverages AnalyticsRepository (from src/database/repositories/analytics-repository.ts) for all database interactions.
  - Records individual UsageEvents and session starts, which are then aggregated by the repository.
  - Tracks session, daily, weekly, and monthly costs against configurable budgets.
  - Emits budget:alert events when thresholds are crossed.
  - Provides comprehensive summaries and trend analysis over various periods.
- API:
  - getPersistentAnalytics(budget?: Partial<…>): PersistentAnalytics: Singleton accessor.
  - record(event: UsageEvent): void: Records a single usage event.
  - recordSession(projectId?: string): void: Records a new session start.
  - calculateCost(model: string, tokensIn: number, tokensOut: number): number: Calculates cost for given tokens and model.
  - getSessionCost(): number, getDailyCost(): number, getWeeklyCost(): number, getMonthlyCost(): number: Retrieve current costs.
  - getSummary(filter?: AnalyticsFilter): AnalyticsSummary: Retrieves aggregated analytics for a period.
  - getDailySummaries(days?: number): Retrieves daily cost/request/token summaries.
  - getBudgetStatus(): Returns current budget usage.
  - setBudget(budget: Partial<…>): void: Updates budget limits.
  - export(filter?: AnalyticsFilter): Analytics[], exportCSV(filter?: AnalyticsFilter): string: Exports raw or CSV data.
  - formatDashboard(): string: Formats a summary dashboard for terminal display.
  - cleanup(daysToKeep?: number): number: Deletes old analytics data from the database.
- Connections:
  - Outgoing: getAnalyticsRepository() (from src/database/repositories/analytics-repository.ts), getPricingPer1M (from src/config/model-pricing.ts).
  - Incoming: getPersistentAnalytics (from src/database/integration.ts), updateSessionStats (from src/database/integration.ts).
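The daily aggregation that the repository performs can be pictured with the in-memory sketch below. In the module this work is done through SQLite via AnalyticsRepository; the event and summary shapes here are hypothetical and exist only to show the grouping logic.

```typescript
// Hypothetical event shape; the real UsageEvent fields are not listed in this document.
interface UsageEventSketch {
  timestamp: string; // ISO 8601, e.g. "2024-03-01T09:30:00Z"
  tokensIn: number;
  tokensOut: number;
  cost: number;
}

interface DailySummarySketch {
  date: string;
  requests: number;
  tokens: number;
  cost: number;
}

// Groups events by calendar day (the first 10 characters of the ISO timestamp).
function aggregateDailySketch(events: UsageEventSketch[]): DailySummarySketch[] {
  const byDay = new Map<string, DailySummarySketch>();
  for (const event of events) {
    const date = event.timestamp.slice(0, 10);
    const day = byDay.get(date) ?? { date, requests: 0, tokens: 0, cost: 0 };
    day.requests += 1;
    day.tokens += event.tokensIn + event.tokensOut;
    day.cost += event.cost;
    byDay.set(date, day);
  }
  // Return days in chronological order.
  return Array.from(byDay.values()).sort((a, b) => a.date.localeCompare(b.date));
}
```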
9. PrometheusExporter (src/analytics/prometheus-exporter.ts)
- Purpose: Exposes application metrics in a format consumable by Prometheus, allowing for external monitoring and visualization (e.g., with Grafana).
- How it Works:
  - Manages various metric types: counter, gauge, histogram.
  - Provides methods (inc, set, observe) to update metric values.
  - Starts an internal HTTP server that serves the /metrics endpoint, returning metrics in Prometheus text format.
  - Includes createMetricsCollector() as an adapter to easily integrate with application events.
- API:
  - getPrometheusExporter(config?: Partial<…>): PrometheusExporter: Singleton accessor.
  - registerMetric(definition: MetricDefinition): void: Defines a new metric.
  - inc(name: string, value?: number, labels?: Record<…>): void: Increments a counter.
  - set(name: string, value: number, labels?: Record<…>): void: Sets a gauge value.
  - observe(name: string, value: number, labels?: Record<…>): void: Records a value for a histogram.
  - formatMetrics(): string: Generates the Prometheus text format output.
  - start(): Promise<…>, stop(): Promise<…>: Manages the HTTP server lifecycle.
  - reset(): void: Resets all metric values.
  - getValue(name: string, labels?: Record<…>): number | undefined: Retrieves a current metric value.
  - pushToGateway(gatewayUrl: string, jobName: string): Promise<…>: Pushes metrics to a Prometheus Pushgateway.
  - createMetricsCollector(exporter: PrometheusExporter): Factory for an event listener object that updates metrics.
- Connections:
  - Outgoing: Node.js http module, fetch for Pushgateway.
  - Incoming: startServer (from src/server/index.ts) initializes and starts the exporter.
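The Prometheus text format that a /metrics endpoint returns consists of "# HELP" and "# TYPE" lines followed by one sample line per label set. The sketch below renders counters and gauges in that format; histograms are omitted for brevity. The interface and function names, and the metric name used in the usage note, are hypothetical and do not reflect the exporter's actual internals.

```typescript
interface SampleSketch {
  labels: Record<string, string>;
  value: number;
}

interface MetricSketch {
  name: string;
  help: string;
  type: 'counter' | 'gauge';
  samples: SampleSketch[];
}

// Label values must escape backslash, double quote, and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

function formatMetricSketch(metric: MetricSketch): string {
  const lines = [
    `# HELP ${metric.name} ${metric.help}`,
    `# TYPE ${metric.name} ${metric.type}`,
  ];
  for (const sample of metric.samples) {
    // Sort label names so the output is stable across calls.
    const pairs = Object.entries(sample.labels)
      .sort(([a], [b]) => a.localeCompare(b))
      .map(([key, value]) => `${key}="${escapeLabelValue(value)}"`);
    const labelText = pairs.length > 0 ? `{${pairs.join(',')}}` : '';
    lines.push(`${metric.name}${labelText} ${sample.value}`);
  }
  return lines.join('\n') + '\n';
}
```

For example, a counter named tool_calls_total with labels tool="bash" and status="ok" and value 3 would render as the sample line tool_calls_total{status="ok",tool="bash"} 3.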
10. ROITracker (src/analytics/roi-tracker.ts)
- Purpose: Tracks the Return on Investment (ROI) for tasks completed with the application, estimating time saved and calculating productivity gains.
- How it Works:
  - Records TaskCompletion events, including API cost, actual time spent, and estimated manual time (based on task type and lines of code).
  - Calculates various ROI metrics such as total time saved, productivity multiplier, and net value (based on a configurable hourlyRate).
  - Persists task data to a local JSON file.
- API:
  - new ROITracker(config?: Partial<…>): Constructor.
  - recordTask(task: Omit<…>): void: Records a completed task.
  - getReport(days?: number): ROIReport: Generates an ROI report for a specified period.
  - reset(): void: Clears all recorded tasks.
- Connections:
  - Outgoing: fs-extra for file I/O.
  - Incoming: Used by cat49ROIExtended and cat19ROITracker scripts.
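One plausible way to derive the ROI metrics named above is sketched here. The formulas (time saved as manual minus actual, multiplier as manual divided by actual, net value as value of time saved minus API cost) and all names are assumptions for illustration; the module's actual definitions may differ.

```typescript
// Hypothetical task shape, loosely based on the fields mentioned in the text.
interface TaskSketch {
  apiCost: number;
  actualMinutes: number;
  estimatedManualMinutes: number;
}

interface RoiSketch {
  timeSavedMinutes: number;
  productivityMultiplier: number;
  netValue: number;
}

function roiSketch(tasks: TaskSketch[], hourlyRate: number): RoiSketch {
  const actual = tasks.reduce((sum, t) => sum + t.actualMinutes, 0);
  const manual = tasks.reduce((sum, t) => sum + t.estimatedManualMinutes, 0);
  const apiCost = tasks.reduce((sum, t) => sum + t.apiCost, 0);
  const timeSavedMinutes = manual - actual;
  return {
    timeSavedMinutes,
    // How many times faster the work was than the manual estimate.
    productivityMultiplier: actual > 0 ? manual / actual : 0,
    // Value of the saved time at the hourly rate, minus what the API calls cost.
    netValue: (timeSavedMinutes / 60) * hourlyRate - apiCost,
  };
}
```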
11. ToolAnalytics (src/analytics/tool-analytics.ts) - Inferred from index.ts and Call Graph
- Purpose: Tracks detailed statistics for individual tool executions, including success/failure rates, average durations, and provides suggestions for tool usage.
- How it Works (Inferred):
  - Records each tool execution, its outcome (success/failure), and duration.
  - Aggregates these events to calculate overall and per-tool success rates and average execution times.
  - Likely persists this data to a local file for historical analysis.
  - Provides methods to query tool performance and suggest tools based on historical success.
- API (Inferred):
  - getToolAnalytics(): ToolAnalytics: Singleton accessor.
  - recordSuccess(toolName: string, duration: number, details?: Record<…>): void
  - recordFailure(toolName: string, duration: number, error?: string, details?: Record<…>): void
  - getSnapshot(): ToolAnalyticsSnapshot: Returns a summary of tool stats.
  - clear(): void: Clears all tool analytics data.
  - save(): Promise<…>, load(): Promise<…>: Persists/loads data.
  - formatAnalytics(): string: Formats for terminal display.
  - exportToJson(): string: Exports data as JSON.
  - getHighestSuccessRate(): ToolStats | undefined, getLowestSuccessRate(): ToolStats | undefined: Query for best/worst performing tools.
  - suggestTools(context: string): ToolSuggestion[]: Suggests tools based on context and history.
- Connections:
  - Outgoing: writeFile, readFile (likely via src/sandbox/e2b-sandbox.ts for file operations).
  - Incoming: executeTool (from src/agent/tool-handler.ts), handleToolAnalytics (from commands/handlers/core-handlers.ts), various tests.
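Since the ToolAnalytics description is itself inferred, the following is only a sketch of how per-tool success rates and average durations could be accumulated. The class name, the stats shape, and the method names other than recordSuccess and recordFailure are hypothetical.

```typescript
interface ToolStatsSketch {
  toolName: string;
  successes: number;
  failures: number;
  totalDurationMs: number;
}

class ToolStatsRecorderSketch {
  private readonly stats = new Map<string, ToolStatsSketch>();

  recordSuccess(toolName: string, duration: number): void {
    this.record(toolName, true, duration);
  }

  recordFailure(toolName: string, duration: number): void {
    this.record(toolName, false, duration);
  }

  // Fraction of successful executions, or undefined for an unknown tool.
  successRate(toolName: string): number | undefined {
    const s = this.stats.get(toolName);
    if (s === undefined) return undefined;
    return s.successes / (s.successes + s.failures);
  }

  averageDuration(toolName: string): number | undefined {
    const s = this.stats.get(toolName);
    if (s === undefined) return undefined;
    return s.totalDurationMs / (s.successes + s.failures);
  }

  // Name of the tool with the best success rate, or undefined if nothing was recorded.
  highestSuccessRate(): string | undefined {
    let best: string | undefined;
    let bestRate = -1;
    for (const s of Array.from(this.stats.values())) {
      const rate = s.successes / (s.successes + s.failures);
      if (rate > bestRate) {
        bestRate = rate;
        best = s.toolName;
      }
    }
    return best;
  }

  private record(toolName: string, success: boolean, duration: number): void {
    const s = this.stats.get(toolName) ??
      { toolName, successes: 0, failures: 0, totalDurationMs: 0 };
    if (success) s.successes += 1;
    else s.failures += 1;
    s.totalDurationMs += duration;
    this.stats.set(toolName, s);
  }
}
```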
Integration Points
The analytics module is deeply integrated throughout the application:
- src/config/model-pricing.ts: Provides the canonical source for LLM model pricing, used by CostPredictor, AnalyticsDashboard, and PersistentAnalytics.
- src/utils/cost-tracker.ts: CostPredictor relies on the CostTracker for historical output token averages and cost trends.
- src/database/repositories/analytics-repository.ts: PersistentAnalytics uses this repository for all its SQLite database operations.
- src/sandbox/e2b-sandbox.ts: File I/O operations for ToolAnalytics (and potentially others like ComplexityAnalyzer's readFile) might be routed through a sandbox wrapper.
- src/agent/codebuddy-agent.ts: The main agent uses CostPredictor to provide cost estimates to the user before making LLM calls.
- src/server/index.ts: Initializes and starts the PrometheusExporter to expose metrics for external monitoring.
- commands/handlers/core-handlers.ts: Handles commands related to tool analytics, interacting with ToolAnalytics.
- src/database/integration.ts: Provides access to PersistentAnalytics and AnalyticsDashboard instances for other parts of the application.
Usage Patterns
Developers typically interact with the analytics module in a few ways:
- Recording Events:
  - When an LLM message is sent/received: AnalyticsDashboard.trackMessage(), MetricsDashboard.recordTokenUsage(), PersistentAnalytics.record().
  - When a tool is executed: AnalyticsDashboard.trackToolCall(), MetricsDashboard.recordToolExecution(), ToolAnalytics.recordSuccess()/recordFailure().
  - When a session starts/ends: AnalyticsDashboard.startSession()/endSession(), PersistentAnalytics.recordSession().
- Retrieving Metrics/Reports:
  - To display real-time session stats: MetricsDashboard.getMetrics(), MetricsDashboard.formatMetrics().
  - To show a local dashboard: AnalyticsDashboard.renderDashboard().
  - For long-term trends or budget status: PersistentAnalytics.getSummary(), PersistentAnalytics.getBudgetStatus(), PersistentAnalytics.formatDashboard().
  - For code analysis: generateEvolutionReport(), generateHeatmap(), analyzeComplexity().
  - For ROI: ROITracker.getReport().
- Configuration:
  - Setting budget limits: AnalyticsDashboard.setBudget(), PersistentAnalytics.setBudget().
  - Configuring analysis options: generateEvolutionReport({ days: 180 }).
- Observability:
  - The PrometheusExporter runs in the background, making metrics available at its configured HTTP endpoint for scraping by Prometheus.
Contribution Guidelines
When contributing to the analytics module:
- Choose the Right Component:
  - For real-time, in-memory session-specific metrics, consider MetricsDashboard.
  - For local, file-backed session aggregates and user-facing dashboards, use AnalyticsDashboard.
  - For long-term, durable, aggregated analytics and budgeting, use PersistentAnalytics (via AnalyticsRepository).
  - For code-specific analysis (complexity, evolution), create new functions or extend existing ones in code-evolution.ts, codebase-heatmap.ts, or complexity-analyzer.ts.
  - For external monitoring, integrate with PrometheusExporter.
- Consistency: Ensure new metrics or data points are consistently tracked across relevant components if they serve multiple purposes (e.g., cost should be in CostPredictor, AnalyticsDashboard, PersistentAnalytics).
- Model Pricing: Always use getPricingPer1k or getPricingPer1M from src/config/model-pricing.ts for cost calculations, rather than hardcoding values.
- Performance: Be mindful of performance, especially when dealing with large datasets or frequent updates. LRUCache is used in AnalyticsDashboard to manage memory. Git operations can be slow, so optimize sampling where possible.
- Testing: Add unit tests for new functionality and ensure existing tests pass.
- Documentation: Update relevant interfaces and JSDoc comments for any new functions or classes.