src — knowledge

Module: src-knowledge Cohesion: 0.80 Members: 0

The src/knowledge module is the core intelligence layer of the CodeBuddy system, responsible for building, maintaining, querying, and analyzing a comprehensive Knowledge Graph of the codebase. This graph represents the structural and semantic relationships between various code entities (modules, classes, functions, interfaces, architectural layers, design patterns, etc.).

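By transforming raw source code and project metadata into a structured graph, this module supports agent capabilities such as LLM context building, transitive impact analysis, architecture drift detection, semantic search over code entities, and graph visualization.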

Core Concept: The Knowledge Graph

At the heart of this module is the KnowledgeGraph class, defined in src/knowledge/knowledge-graph.ts. It acts as an in-memory, RDF-like triple store: relationships are stored as (subject, predicate, object) triples, optionally with metadata.
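
To make the triple-store idea concrete, here is a minimal sketch. The TripleStore class, its add and query methods, and the TriplePattern type are hypothetical names chosen for illustration; they are not the actual KnowledgeGraph API, which this document does not show.

```typescript
// Hypothetical sketch of an in-memory triple store; NOT the real KnowledgeGraph API.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  metadata?: Record<string, unknown>;
}

// Fields left out of a pattern act as wildcards.
type TriplePattern = Partial<Pick<Triple, "subject" | "predicate" | "object">>;

class TripleStore {
  private triples: Triple[] = [];

  // Add a triple unless an identical (subject, predicate, object) already exists.
  add(subject: string, predicate: string, object: string, metadata?: Record<string, unknown>): boolean {
    const exists = this.triples.some(
      (t) => t.subject === subject && t.predicate === predicate && t.object === object,
    );
    if (exists) return false;
    const triple: Triple = { subject, predicate, object };
    if (metadata !== undefined) triple.metadata = metadata;
    this.triples.push(triple);
    return true;
  }

  // Return every triple that matches all fields present in the pattern.
  query(pattern: TriplePattern): Triple[] {
    return this.triples.filter(
      (t) =>
        (pattern.subject === undefined || t.subject === pattern.subject) &&
        (pattern.predicate === undefined || t.predicate === pattern.predicate) &&
        (pattern.object === undefined || t.object === pattern.object),
    );
  }
}

const store = new TripleStore();
store.add("cls:KnowledgeGraph", "definedIn", "mod:src/knowledge/knowledge-graph");
store.add("cls:KnowledgeGraph", "hasMethod", "fn:add");
const methods = store.query({ subject: "cls:KnowledgeGraph", predicate: "hasMethod" });
```

A linear scan keeps the sketch short; a store meant for large graphs would more likely index triples by subject and object.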

Key KnowledgeGraph Operations (inferred from usage):

Entities in the graph follow a consistent naming convention (e.g., mod:src/utils/logger, cls:KnowledgeGraph, fn:add, iface:ImpactOptions, layer:agent, pat:singleton).
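
The prefix convention can be handled with a small parser. The sketch below takes its prefixes from the examples above; the helper name parseEntityId and the kind labels are assumptions made for illustration.

```typescript
// Prefixes come from the examples in this document; the names here are hypothetical.
const ENTITY_KINDS = {
  mod: "module",
  cls: "class",
  fn: "function",
  iface: "interface",
  layer: "layer",
  pat: "pattern",
} as const;

type EntityPrefix = keyof typeof ENTITY_KINDS;

interface EntityId {
  prefix: EntityPrefix;
  kind: (typeof ENTITY_KINDS)[EntityPrefix];
  name: string;
}

// Split "prefix:name" at the first colon and reject unknown prefixes or empty names.
function parseEntityId(id: string): EntityId | null {
  const sep = id.indexOf(":");
  if (sep <= 0) return null;
  const prefix = id.slice(0, sep);
  const name = id.slice(sep + 1);
  const known = Object.prototype.hasOwnProperty.call(ENTITY_KINDS, prefix);
  if (!known || name.length === 0) return null;
  const typedPrefix = prefix as EntityPrefix;
  return { prefix: typedPrefix, kind: ENTITY_KINDS[typedPrefix], name };
}

const parsedModule = parseEntityId("mod:src/utils/logger");
const parsedInvalid = parseEntityId("logger");
```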

Architecture Overview

The knowledge graph is built and utilized through a series of specialized components:

graph TD
    A[CartographyResult] --> B(code-graph-populator)
    C[Source Files] --> D(code-graph-deep-populator)
    B --> KG[KnowledgeGraph]
    D --> KG
    KG --> E(code-graph-context-provider)
    KG --> F(graph-analytics)
    KG --> G(impact-analyzer)
    KG --> H(graph-visualizer)
    E --> I[LLM Context]
    F --> J[Insights]
    G --> K[Impact Report]
    H --> L[HTML Viz]
    M[User Message] --> E
    N[File Changes] --> O(graph-updater)
    O --> KG
    KG --> P(graph-embeddings)
    Q[Semantic Query] --> P
    P --> R[Semantic Search Results]

Graph Population

The graph is populated in stages, combining high-level architectural insights with deep code analysis.

code-graph-populator.ts (Initial Population)

This module (populateCodeGraph) takes a CartographyResult (generated by src/agent/repo-profiling/cartography.ts) and translates its findings into KnowledgeGraph triples. This includes:

This initial population provides a broad, high-level understanding of the project structure.

code-graph-deep-populator.ts (Deep Population)

The populateDeepCodeGraph function enriches the graph with fine-grained structural details by scanning source files. It's designed for on-demand use and delegates to language-specific regex scanners (found in src/knowledge/scanners/).

Process:

  1. File Discovery: Walks source directories (src, app, etc.), skipping common build/test directories and large files.
  2. Symbol Extraction (First Pass): For each relevant source file, it uses getScannerForExt (from src/knowledge/scanners/index.ts) to get a language-specific scanner (e.g., TypeScriptScanner, PythonScanner). The scanner extracts SymbolDef (classes, functions, methods), CallSite (function calls), and InheritanceInfo (extends/implements).
  3. Hierarchy Population: Adds cls:Class definedIn mod:module, cls:Class hasMethod fn:Method, and fn:Function definedIn mod:module triples.
  4. Inheritance Population: Adds cls:Class extends cls:Parent and cls:Class implements iface:Interface triples.
  5. Call Graph Resolution: Resolves CallSite entries to specific fn: entities and adds caller calls callee triples.
  6. Dynamic Imports: Scans for await import() patterns to add mod:source imports mod:target edges.

This deep population provides the detailed call graph and class hierarchy crucial for impact analysis and refactoring.
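
The extraction and population steps above can be sketched for a single TypeScript file as follows. The regexes and the function name scanSourceToTriples are illustrative assumptions, not the scanners in src/knowledge/scanners/. Method extraction and call-site resolution are left out to keep the example short.

```typescript
type Triple = [string, string, string]; // [subject, predicate, object]

// Simplified stand-in for a language scanner plus the population steps.
// The regexes are illustrative assumptions and miss many real-world cases.
function scanSourceToTriples(modulePath: string, source: string): Triple[] {
  const triples: Triple[] = [];
  const mod = `mod:${modulePath}`;
  let m: RegExpExecArray | null;

  // Classes, with optional `extends` and `implements` clauses.
  const classRe = /class\s+(\w+)(?:\s+extends\s+(\w+))?(?:\s+implements\s+([\w\s,]+?))?\s*\{/g;
  while ((m = classRe.exec(source)) !== null) {
    const cls = `cls:${m[1]}`;
    triples.push([cls, "definedIn", mod]);
    if (m[2]) triples.push([cls, "extends", `cls:${m[2]}`]);
    if (m[3]) {
      for (const iface of m[3].split(",")) {
        triples.push([cls, "implements", `iface:${iface.trim()}`]);
      }
    }
  }

  // Top-level function declarations.
  const fnRe = /^(?:export\s+)?(?:async\s+)?function\s+(\w+)/gm;
  while ((m = fnRe.exec(source)) !== null) {
    triples.push([`fn:${m[1]}`, "definedIn", mod]);
  }

  // Dynamic imports; the specifier is kept as written, without path resolution.
  const importRe = /await\s+import\(\s*['"]([^'"]+)['"]\s*\)/g;
  while ((m = importRe.exec(source)) !== null) {
    triples.push([mod, "imports", `mod:${m[1]}`]);
  }
  return triples;
}

const demoSource = [
  "export class Dog extends Animal implements Pet, Walker {",
  "  bark() {}",
  "}",
  "export async function loadCat() {",
  "  return await import('./cat');",
  "}",
].join("\n");

const found = scanSourceToTriples("src/dog", demoSource).map((t) => t.join(" "));
```

For the demo source this yields six triples: one definedIn, one extends, and two implements edges for Dog, a definedIn edge for loadCat, and one imports edge for the dynamic import.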

graph-updater.ts (Incremental Updates)

The updateGraphForFile function handles changes to individual files. When a file is modified or deleted, it:

  1. Identifies and removes all existing triples related to the module and its contained entities.
  2. Re-reads the file (if it still exists).
  3. Re-scans the file using the appropriate language scanner.
  4. Re-populates the graph with new triples for that module and its entities.

This ensures the graph remains up-to-date without requiring a full rebuild, which is critical for responsiveness.
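
The remove-then-repopulate cycle can be sketched over a plain array of triples. The function name updateTriplesForModule is hypothetical, and the handling of edges that point into the module from elsewhere is an assumption: this sketch keeps them when the file is modified and drops them when the file is deleted.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Hypothetical helper: replace everything the graph knows about one module.
// `rescanned` holds the triples from re-scanning the file, or null if the file was deleted.
function updateTriplesForModule(
  triples: Triple[],
  moduleId: string,
  rescanned: Triple[] | null,
): Triple[] {
  // Entities contained in the module are those with a definedIn edge to it.
  const owned = new Set<string>([moduleId]);
  for (const t of triples) {
    if (t.predicate === "definedIn" && t.object === moduleId) owned.add(t.subject);
  }

  // Always drop triples emitted by the module or its entities.
  // On deletion, also drop edges that point at them, so nothing dangles.
  const deleted = rescanned === null;
  const kept = triples.filter(
    (t) => !owned.has(t.subject) && !(deleted && owned.has(t.object)),
  );
  return rescanned === null ? kept : kept.concat(rescanned);
}

const before: Triple[] = [
  { subject: "fn:a", predicate: "definedIn", object: "mod:x" },
  { subject: "fn:a", predicate: "calls", object: "fn:b" },
  { subject: "fn:b", predicate: "definedIn", object: "mod:y" },
  { subject: "fn:b", predicate: "calls", object: "fn:a" },
];

const afterEdit = updateTriplesForModule(before, "mod:x", [
  { subject: "fn:a2", predicate: "definedIn", object: "mod:x" },
]);
const afterDelete = updateTriplesForModule(before, "mod:x", null);
```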

Graph Persistence

code-graph-persistence.ts

This module handles saving and loading the KnowledgeGraph to/from disk.

This allows the graph to persist across sessions, avoiding expensive rebuilds on every startup.

Graph Analysis & Insights

code-graph-context-provider.ts (LLM Context)

The buildCodeGraphContext function supplies relevant code context to LLMs. It extracts entities from a user's message and augments them with graph data.

Context Sources:

  1. User Message Entities: Uses extractEntities (with ENTITY_PATTERNS) to find file paths, class names, module names, etc.
  2. Recent Files: Prioritizes files recently read or written by the agent (trackRecentFile).
  3. Error Messages: Uses extractErrorEntities (with ERROR_PATTERNS) to find function names from stack traces.
  4. Code Review Context: buildReviewContext provides specialized context for review scenarios, including dependencies of recently touched modules.
  5. Semantic Fallback: If no exact entity matches, semanticFallbackSync attempts a fuzzy match using a pre-built embedding index (see graph-embeddings.ts).

The collected entities are resolved against the KnowledgeGraph, sorted by PageRank, and structured into a concise text block, capped at MAX_CONTEXT_CHARS (800 characters) to manage token budgets.
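
The extract, rank, and cap flow might look like the sketch below. The two regexes are stand-ins for the real ENTITY_PATTERNS, which this document does not show, and the function names extractEntitiesSketch and buildContextBlock are hypothetical. Only the 800-character cap comes from the text above.

```typescript
const MAX_CONTEXT_CHARS = 800; // cap stated in the text above

// Hypothetical patterns; the module's real ENTITY_PATTERNS are not shown in this document.
const FILE_PATH_RE = /\b[\w./-]+\.(?:ts|tsx|js|py)\b/g;
const CLASS_NAME_RE = /\b[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+\b/g;

// Collect unique file paths and PascalCase names mentioned in a message.
function extractEntitiesSketch(message: string): string[] {
  const found = new Set<string>();
  for (const re of [FILE_PATH_RE, CLASS_NAME_RE]) {
    for (const hit of message.match(re) ?? []) found.add(hit);
  }
  return Array.from(found);
}

// Emit one line per known entity, highest rank first, stopping before the cap is exceeded.
function buildContextBlock(
  entities: string[],
  facts: Record<string, string[]>,
  rank: Record<string, number>,
  maxChars: number = MAX_CONTEXT_CHARS,
): string {
  const ordered = entities
    .filter((e) => facts[e] !== undefined)
    .sort((a, b) => (rank[b] ?? 0) - (rank[a] ?? 0));
  let out = "";
  for (const e of ordered) {
    const line = `${e}: ${(facts[e] ?? []).join("; ")}\n`;
    if (out.length + line.length > maxChars) break;
    out += line;
  }
  return out.trim();
}

const mentioned = extractEntitiesSketch("Why does src/utils/logger.ts break KnowledgeGraph?");
const contextBlock = buildContextBlock(
  ["AlphaBeta", "GammaDelta"],
  { AlphaBeta: ["x"], GammaDelta: ["x"] },
  { AlphaBeta: 0.1, GammaDelta: 0.9 },
  15,
);
```

Cutting at whole lines rather than at an exact character offset keeps each emitted fact intact, at the cost of sometimes leaving part of the budget unused.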

graph-pagerank.ts (Entity Importance)

The computePageRank function implements the PageRank algorithm to assign importance scores to every entity in the graph. It considers calls and imports as "links." Entities with more incoming links from important sources will have higher PageRank.
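
For reference, a compact power-iteration version of PageRank over calls/imports edges is sketched below. The damping factor of 0.85, the fixed iteration count, and the function name pageRankSketch are assumptions; the parameters used by computePageRank are not stated in this document.

```typescript
type Edge = [string, string]; // [from, to], e.g. a calls or imports link

// Power-iteration PageRank. Damping and iteration count are assumed defaults.
function pageRankSketch(edges: Edge[], damping = 0.85, iterations = 50): Map<string, number> {
  const nodeSet = new Set<string>();
  const outLinks = new Map<string, string[]>();
  for (const [from, to] of edges) {
    nodeSet.add(from);
    nodeSet.add(to);
    const list = outLinks.get(from) ?? [];
    list.push(to);
    outLinks.set(from, list);
  }
  const nodes = Array.from(nodeSet);
  const n = nodes.length;

  let rank = new Map<string, number>();
  for (const node of nodes) rank.set(node, 1 / n);

  for (let i = 0; i < iterations; i++) {
    // Rank held by nodes without outgoing links is spread evenly over all nodes.
    let dangling = 0;
    for (const node of nodes) {
      if (!outLinks.has(node)) dangling += rank.get(node) ?? 0;
    }
    const base = (1 - damping) / n + (damping * dangling) / n;

    const next = new Map<string, number>();
    for (const node of nodes) next.set(node, base);
    for (const node of nodes) {
      const targets = outLinks.get(node);
      if (!targets) continue;
      // Each node passes an equal share of its damped rank to every target.
      const share = (damping * (rank.get(node) ?? 0)) / targets.length;
      for (const t of targets) next.set(t, (next.get(t) ?? 0) + share);
    }
    rank = next;
  }
  return rank;
}

const ranks = pageRankSketch([
  ["fn:a", "fn:c"],
  ["fn:b", "fn:c"],
  ["fn:c", "fn:a"],
]);
```

In this example fn:c receives links from both other functions and ends up with the highest score, fn:a follows because the important fn:c links to it, and fn:b, which nothing links to, ranks lowest.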

community-detection.ts (Architectural Subsystems)

This module's detectCommunities function uses a deterministic Label Propagation algorithm to identify tightly-connected clusters of modules (architectural subsystems).

(Note: community-detector.ts provides an alternative, simpler Label Propagation implementation that is also used in some contexts. community-detection.ts is more tightly integrated with PageRank and is the one used by graph-analytics.ts and graph-visualizer.ts.)
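
One way to make Label Propagation deterministic is to update all nodes synchronously and break ties lexicographically, as in the sketch below. This is an assumption about how determinism could be achieved, not a description of either implementation in this module, and the function name labelPropagationSketch is hypothetical.

```typescript
// Deterministic label propagation: synchronous rounds, ties go to the smallest label.
function labelPropagationSketch(
  edges: Array<[string, string]>,
  maxIterations = 20,
): Map<string, string> {
  // Build an undirected adjacency map.
  const neighbors = new Map<string, Set<string>>();
  const link = (a: string, b: string): void => {
    const set = neighbors.get(a) ?? new Set<string>();
    set.add(b);
    neighbors.set(a, set);
  };
  for (const [a, b] of edges) {
    link(a, b);
    link(b, a);
  }

  const nodes = Array.from(neighbors.keys()).sort();
  let labels = new Map<string, string>();
  for (const node of nodes) labels.set(node, node); // every node starts in its own community

  for (let i = 0; i < maxIterations; i++) {
    const next = new Map<string, string>();
    let changed = false;
    for (const node of nodes) {
      // Count the labels currently held by this node's neighbours.
      const counts = new Map<string, number>();
      (neighbors.get(node) ?? new Set<string>()).forEach((nb) => {
        const l = labels.get(nb) ?? nb;
        counts.set(l, (counts.get(l) ?? 0) + 1);
      });
      // Most frequent label wins; sorted iteration makes ties resolve to the smallest label.
      let best = labels.get(node) ?? node;
      let bestCount = 0;
      for (const l of Array.from(counts.keys()).sort()) {
        const c = counts.get(l) ?? 0;
        if (c > bestCount) {
          best = l;
          bestCount = c;
        }
      }
      if (best !== labels.get(node)) changed = true;
      next.set(node, best);
    }
    labels = next;
    if (!changed) break;
  }
  return labels;
}

// Two triangles (a, b, c) and (d, e, f) joined by a single bridge edge c-d.
const communities = labelPropagationSketch([
  ["a", "b"], ["b", "c"], ["a", "c"],
  ["d", "e"], ["e", "f"], ["d", "f"],
  ["c", "d"],
]);
```

On this input the sketch groups a, b, c into one community and d, e, f into another. Synchronous updates can oscillate on some graphs, which is why the iteration cap is there.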

graph-analytics.ts (Actionable Insights)

This module provides advanced analytical capabilities:

graph-drift.ts (Architecture Monitoring)

The detectDrift function compares the current KnowledgeGraph against a saved snapshot (.codebuddy/code-graph-snapshot.json) to identify architectural changes over time.
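
At its simplest, drift detection is a set difference between the snapshot and the current graph. The sketch below encodes each triple as a "subject predicate object" string; that encoding, the DriftReport shape, and the function name diffTriples are assumptions, since the snapshot format is not described here.

```typescript
interface DriftReport {
  added: string[];   // triples present now but not in the snapshot
  removed: string[]; // triples present in the snapshot but not now
}

// Hypothetical helper: compare two sets of string-encoded triples.
function diffTriples(snapshot: string[], current: string[]): DriftReport {
  const before = new Set(snapshot);
  const after = new Set(current);
  return {
    added: current.filter((t) => !before.has(t)).sort(),
    removed: snapshot.filter((t) => !after.has(t)).sort(),
  };
}

const drift = diffTriples(
  ["mod:src/a imports mod:src/b", "mod:src/a imports mod:src/c"],
  ["mod:src/a imports mod:src/b", "mod:src/a imports mod:src/d"],
);
```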

graph-embeddings.ts (Semantic Search)

The graph-embeddings.ts module provides a GraphEmbeddingIndex for semantic search over graph entities.

This enables fuzzy search capabilities, allowing users to find relevant code even without exact entity names.
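
The sketch below illustrates fuzzy lookup over entity names using character trigram counts and cosine similarity. This is a deliberately simple stand-in: the embedding technique used by GraphEmbeddingIndex is not described in this document, and all function names here are hypothetical.

```typescript
// Stand-in "embedding": a bag of character trigrams. NOT the real GraphEmbeddingIndex.
function trigramVector(text: string): Map<string, number> {
  const normalized = text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const padded = ` ${normalized} `;
  const vector = new Map<string, number>();
  for (let i = 0; i + 3 <= padded.length; i++) {
    const gram = padded.slice(i, i + 3);
    vector.set(gram, (vector.get(gram) ?? 0) + 1);
  }
  return vector;
}

// Cosine similarity between two sparse vectors; 0 if either is empty.
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, key) => {
    normA += x * x;
    const y = b.get(key);
    if (y !== undefined) dot += x * y;
  });
  b.forEach((y) => {
    normB += y * y;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Rank entity ids by similarity to the query and keep the top K.
function searchEntities(query: string, entityIds: string[], topK = 3): Array<{ id: string; score: number }> {
  const queryVector = trigramVector(query);
  return entityIds
    .map((id) => ({ id, score: cosineSimilarity(queryVector, trigramVector(id)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

const hits = searchEntities("logging utility", [
  "cls:KnowledgeGraph",
  "mod:src/utils/logger",
  "fn:add",
]);
```

Here the query shares trigrams such as "log" and "uti" with mod:src/utils/logger, so that entity ranks first even though the query never names it exactly.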

impact-analyzer.ts (Transitive Impact)

The analyzeImpact function performs transitive impact analysis using Breadth-First Search (BFS) on the graph.
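
A BFS over reversed dependency edges gives the set of entities that transitively depend on a changed one, along with their distance from it. The function name impactSketch and the maxDepth parameter are assumptions for illustration; the options accepted by analyzeImpact are not listed in this document.

```typescript
// Breadth-first search over reversed edges: who depends on `changed`, and how far away?
function impactSketch(
  edges: Array<[string, string]>, // [dependent, dependency], e.g. caller calls callee
  changed: string,
  maxDepth: number = Infinity,
): Map<string, number> {
  // Reverse adjacency: dependency -> entities that depend on it.
  const dependents = new Map<string, string[]>();
  for (const [from, to] of edges) {
    const list = dependents.get(to) ?? [];
    list.push(from);
    dependents.set(to, list);
  }

  const depth = new Map<string, number>();
  depth.set(changed, 0);
  const queue: string[] = [changed];
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head] as string;
    const d = depth.get(node) ?? 0;
    if (d >= maxDepth) continue; // do not expand beyond the depth limit
    for (const dep of dependents.get(node) ?? []) {
      if (!depth.has(dep)) {
        depth.set(dep, d + 1);
        queue.push(dep);
      }
    }
  }
  depth.delete(changed); // report only the impacted entities, not the origin
  return depth;
}

const callEdges: Array<[string, string]> = [
  ["fn:a", "fn:b"],
  ["fn:b", "fn:c"],
  ["fn:d", "fn:c"],
];
const impacted = impactSketch(callEdges, "fn:c");
const directOnly = impactSketch(callEdges, "fn:c", 1);
```

Changing fn:c here affects fn:b and fn:d directly (depth 1) and fn:a transitively (depth 2); with maxDepth set to 1 only the direct callers are reported.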

Visualization

graph-visualizer.ts

The generateVisualization function creates a self-contained HTML file (.codebuddy/graph-viz.html) that visualizes the KnowledgeGraph using D3.js.
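
D3 force layouts conventionally consume an array of nodes and an array of links with source and target fields. The sketch below converts triples into that shape. Whether generateVisualization uses a force layout or this exact data shape is an assumption, and the names toVizData, VizNode, and VizLink are hypothetical.

```typescript
interface VizNode {
  id: string;
  group: string; // entity prefix, usable for colouring
}

interface VizLink {
  source: string;
  target: string;
  label: string; // the predicate
}

// Convert triples into node and link arrays suitable for a node-link diagram.
function toVizData(triples: Array<[string, string, string]>): { nodes: VizNode[]; links: VizLink[] } {
  const nodes = new Map<string, VizNode>();
  const links: VizLink[] = [];
  const ensureNode = (id: string): void => {
    if (nodes.has(id)) return;
    const sep = id.indexOf(":");
    nodes.set(id, { id, group: sep > 0 ? id.slice(0, sep) : "unknown" });
  };
  for (const [subject, predicate, object] of triples) {
    ensureNode(subject);
    ensureNode(object);
    links.push({ source: subject, target: object, label: predicate });
  }
  return { nodes: Array.from(nodes.values()), links };
}

const vizData = toVizData([
  ["cls:KnowledgeGraph", "definedIn", "mod:src/knowledge/knowledge-graph"],
  ["cls:KnowledgeGraph", "hasMethod", "fn:add"],
]);
const embeddedJson = JSON.stringify(vizData); // could be inlined into a self-contained HTML page
```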

Integration Points

The src/knowledge module is deeply integrated throughout the CodeBuddy system:

This module forms the "brain" of CodeBuddy, enabling it to understand, reason about, and interact with complex codebases effectively.