src — database
The src/database module is the core persistence layer for Code-Buddy, providing robust and efficient data storage using SQLite. It manages the database connection, schema, migrations, and offers a structured way to interact with various data entities like memories, sessions, code embeddings, analytics, and a general-purpose cache.
This document describes the module's architecture and key components, and explains how to use and contribute to it.
1. Module Overview
The src/database module centralizes all data storage for Code-Buddy. It replaces previous file-based JSON storage with a single, transactional SQLite database, offering improved performance, data integrity, and query capabilities.
Key Features:
- Centralized Database Management: Handles SQLite connection, configuration, and lifecycle.
- Schema Management & Migrations: Defines the database schema and provides a mechanism to migrate existing data from older versions or file-based storage.
- Structured Data Access (Repositories): Provides dedicated repository classes for different data domains (Memories, Sessions, Embeddings, Analytics, Cache), abstracting raw SQL operations.
- Semantic Search Capabilities: Stores and queries vector embeddings for memories and code chunks.
- Usage Analytics & Learning: Tracks tool usage, repair attempts, and overall system analytics.
- High-Level Integration Layer: Offers a simplified facade (DatabaseIntegration) for common database operations, often combining logic from multiple repositories and external services.
- Singleton Pattern: Ensures a single instance of the DatabaseManager and each repository throughout the application lifecycle.
2. Architecture
The database module is structured in layers, promoting separation of concerns and maintainability.
    graph TD
        subgraph Application Layer
            OtherModules[Other Code-Buddy Modules]
        end
        subgraph Database Module
            DI[DatabaseIntegration]
            subgraph Repositories
                MR[MemoryRepository]
                SR[SessionRepository]
                AR[AnalyticsRepository]
                ER[EmbeddingRepository]
                CR[CacheRepository]
            end
            DM[DatabaseManager]
            S[Schema & Migrations]
        end
        subgraph External Dependencies
            EP[EmbeddingProvider]
            PL[PersistentLearning]
            PA[PersistentAnalytics]
        end
        OtherModules --> DI
        DI -- orchestrates --> Repositories
        DI -- uses --> DM
        DI -- interacts with --> EP
        DI -- interacts with --> PL
        DI -- interacts with --> PA
        Repositories -- uses --> DM
        DM -- manages --> S
Explanation of Layers:
- DatabaseManager: The foundational layer. It establishes and manages the SQLite connection, runs schema migrations, and provides the raw better-sqlite3 database instance to repositories. It also handles global database operations like vacuum and backup.
- schema.ts: Defines the database tables, indices, and TypeScript interfaces for all data entities. It also contains the SQL scripts for schema creation and migrations.
- Repositories: These classes (MemoryRepository, SessionRepository, etc.) encapsulate the logic for interacting with specific tables. They perform CRUD operations, complex queries, and data transformations (e.g., serializing/deserializing embeddings and JSON metadata). Each repository obtains its database instance from the DatabaseManager.
- DatabaseIntegration: A high-level facade that simplifies common database interactions for other parts of the application. It often combines operations from multiple repositories and integrates with external services like the EmbeddingProvider, PersistentLearning, and PersistentAnalytics to provide a unified API for complex workflows (e.g., adding a memory with an embedding, or searching code).
- migration.ts: A utility module for migrating data from older Code-Buddy storage formats (JSON files) into the new SQLite database. It uses the DatabaseManager and repositories to perform these operations.
- index.ts: The public entry point for the module, re-exporting all public classes, types, and singleton accessors. It also provides convenience functions like initializeDatabaseSystem and resetDatabaseSystem.
3. Key Components
3.1. DatabaseManager (src/database/database-manager.ts)
The DatabaseManager is the central hub for all database operations. It's a singleton class that ensures only one connection to the SQLite database is active at any time.
Responsibilities:
- Connection Management: Opens and closes the better-sqlite3 database connection.
- Configuration: Accepts DatabaseConfig (e.g., dbPath, inMemory, walMode, verbose) to customize database behavior. Defaults to ~/.codebuddy/codebuddy.db.
- Schema & Migrations: Automatically runs runMigrations() during initialize() to ensure the database schema is up-to-date.
- Performance Optimizations: Applies PRAGMA settings like journal_mode = WAL, synchronous = NORMAL, cache_size, and temp_store for better performance.
- Raw Access: Provides direct access to the better-sqlite3 instance via getDatabase() for advanced use cases or repository implementations.
- Utility Operations: Offers methods for exec, prepare, transaction, vacuum, backup, and clearAll (dangerous!).
- Statistics: getDatabaseStats() provides insights into database size, table counts, and data volumes.
- Event Emitter: Extends TypedEventEmitter to emit lifecycle events like db:initialized, db:error, db:migration, db:closed, etc.
Usage:
    import { getDatabaseManager, initializeDatabase } from './database-manager.js';

    async function setupDatabase() {
      const dbManager = await initializeDatabase({ inMemory: true }); // Or specify dbPath
      console.log('Database initialized:', dbManager.isInitialized());

      const stats = dbManager.getDatabaseStats();
      console.log(dbManager.formatStats());

      // Get the raw database instance for direct queries (use with caution)
      const db = dbManager.getDatabase();
      db.exec('CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY)');

      dbManager.close();
    }
3.2. schema.ts
This file defines the entire database schema using SQL CREATE TABLE statements and corresponding TypeScript interfaces. It's the single source of truth for the database structure.
Key Contents:
- SCHEMA_VERSION: A constant indicating the current version of the database schema.
- SCHEMA_SQL: A multi-line string containing all CREATE TABLE and CREATE INDEX statements for the initial schema.
- MIGRATIONS: An object mapping schema versions to their respective SQL migration scripts. This allows for incremental schema updates.
- TypeScript Interfaces:
  - Memory, MemoryType
  - Session, Message
  - CodeEmbedding
  - ToolStats
  - RepairLearning
  - Analytics
  - Convention
  - Checkpoint, CheckpointFile

These interfaces ensure type safety when interacting with database entities.
Contribution Note: When adding new tables or modifying existing ones, update SCHEMA_VERSION, add a new entry to MIGRATIONS with the necessary ALTER TABLE or CREATE TABLE statements, and update/add corresponding TypeScript interfaces.
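The versioning scheme in the note above can be sketched as follows. This is a minimal illustration, not the module's actual runMigrations() implementation: the shape of the migrations map (version number mapped to an SQL string), the helper names pendingMigrations and applyMigrations, and the example SQL are all assumptions made for the example.

```typescript
// Hypothetical shape: schema version -> SQL that upgrades the database to that version.
type MigrationMap = Record<number, string>;

// Illustrative entries only; table and column names here are made up.
const EXAMPLE_MIGRATIONS: MigrationMap = {
  2: "ALTER TABLE example_table ADD COLUMN example_flag INTEGER DEFAULT 0;",
  3: "CREATE TABLE IF NOT EXISTS example_extra (id TEXT PRIMARY KEY);",
};

// Return the migrations newer than currentVersion, in ascending version order.
function pendingMigrations(
  migrations: MigrationMap,
  currentVersion: number,
  targetVersion: number,
): Array<{ version: number; sql: string }> {
  return Object.keys(migrations)
    .map(Number)
    .filter((v) => v > currentVersion && v <= targetVersion)
    .sort((a, b) => a - b)
    .map((version) => ({ version, sql: migrations[version] }));
}

// Apply pending migrations through a caller-supplied exec function
// (for example, a thin wrapper around the database handle's exec method).
function applyMigrations(
  exec: (sql: string) => void,
  migrations: MigrationMap,
  currentVersion: number,
  targetVersion: number,
): number {
  let version = currentVersion;
  for (const step of pendingMigrations(migrations, currentVersion, targetVersion)) {
    exec(step.sql);
    version = step.version;
  }
  return version; // schema version after all steps ran
}
```

Passing exec in as a function keeps the sketch independent of any database driver; in practice the whole loop would normally run inside a transaction so a failed step does not leave the schema half-upgraded.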
3.3. DatabaseMigration (src/database/migration.ts)
The DatabaseMigration class is responsible for migrating data from Code-Buddy's legacy JSON file-based storage to the new SQLite database. This is a critical step for users upgrading from older versions.
Responsibilities:
- Detecting Legacy Data: needsMigration() and getMigrationStatus() check for the existence of old JSON files (memories.json, sessions/, semantic-cache.json, cost-history.json).
- Data Transformation: Reads old JSON data, transforms it into the new database schema format, and inserts it into the respective repositories.
- Progress & Error Reporting: Emits progress and complete events, and collects errors during the migration process.
- Cleanup: Optionally renames old JSON files to .migrated after successful migration.
Migration Flow:
1. initializeDatabaseIntegration() (or initializeDatabaseSystem()) calls needsMigration().
2. If migration is needed, runMigration() is invoked.
3. DatabaseMigration.migrate() orchestrates the migration of:
   - memories.json (user scope)
   - sessions/ directory (individual session files)
   - cache/semantic-cache.json
   - cost-history.json (analytics)
4. Each migration step uses the appropriate repository (MemoryRepository, SessionRepository, etc.) to insert the data.
Usage:
    import { needsMigration, runMigration, getMigrationStatus } from './migration.js';

    async function performMigrationIfNeeded() {
      if (needsMigration()) {
        console.log('Legacy data detected. Starting migration...');
        const status = getMigrationStatus();
        console.log('Files to migrate:', status.files.filter(f => f.exists).map(f => f.path));

        const migrationResult = await runMigration({ verbose: true, deleteAfterMigration: true });
        if (migrationResult.success) {
          console.log('Migration complete:', migrationResult.migratedItems);
        } else {
          console.error('Migration failed:', migrationResult.errors);
        }
      } else {
        console.log('No migration needed.');
      }
    }
3.4. Repositories (src/database/repositories/)
The repositories directory contains specialized classes for interacting with specific data entities. Each repository provides a clear API for CRUD operations and domain-specific queries. They all depend on DatabaseManager to get the underlying better-sqlite3 instance.
Common Patterns:
- Constructor: Takes an optional Database.Database instance, defaulting to getDatabaseManager().getDatabase().
- Singleton: Each repository has a get...Repository() function and reset...Repository() for testing.
- Serialization/Deserialization: Handles conversion of complex types (e.g., Float32Array for embeddings, Record for metadata) to/from SQLite's BLOB or TEXT (JSON) types.
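The constructor and singleton patterns above might look roughly like the sketch below. It is self-contained and uses stand-ins: DatabaseLike replaces the better-sqlite3 Database type, getSharedDatabase() replaces getDatabaseManager().getDatabase(), and ExampleRepository is a hypothetical repository, not one of the module's classes.

```typescript
// Stand-in for the better-sqlite3 Database type, reduced to what this sketch needs.
interface DatabaseLike {
  name: string;
}

// Hypothetical default source of the shared database handle.
const sharedDb: DatabaseLike = { name: "shared" };
function getSharedDatabase(): DatabaseLike {
  return sharedDb;
}

class ExampleRepository {
  private readonly db: DatabaseLike;

  // An explicitly passed handle (e.g. an in-memory test database) wins over the shared one.
  constructor(db?: DatabaseLike) {
    this.db = db ?? getSharedDatabase();
  }

  databaseName(): string {
    return this.db.name;
  }
}

let instance: ExampleRepository | null = null;

// Lazily create the repository and return the same instance on later calls.
function getExampleRepository(): ExampleRepository {
  if (instance === null) {
    instance = new ExampleRepository();
  }
  return instance;
}

// Drop the cached instance so the next get call builds a fresh one (useful between tests).
function resetExampleRepository(): void {
  instance = null;
}
```

The optional constructor argument is what makes the singleton testable: application code goes through the accessor, while a test can construct a repository directly against its own database handle.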
3.4.1. MemoryRepository (memory-repository.ts)
Manages Memory entities, which represent persistent knowledge.
- create(memory): Adds a new memory. Handles ON CONFLICT to update importance and access count if content already exists.
- getById(id): Retrieves a memory by its ID, updating access stats.
- find(filter): Searches memories based on type, scope, projectId, minImportance, limit, and offset.
- searchSimilar(embedding, filter, topK): Performs semantic search using cosine similarity against stored embeddings.
- update(id, updates): Modifies an existing memory.
- delete(id): Removes a memory.
- deleteExpired(): Cleans up memories past their expires_at date.
- getStats(): Provides statistics on memory count, types, and importance.
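For readers unfamiliar with cosine-similarity search, the idea behind searchSimilar can be sketched as below. This is an illustration under assumptions, not the repository's code: StoredItem, cosineSimilarity, and searchSimilarSketch are hypothetical names, the candidates are held in a plain array rather than loaded from SQLite, and filtering is omitted.

```typescript
interface StoredItem {
  id: string;
  embedding: Float32Array;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 0 for mismatched lengths or zero-length vectors instead of NaN.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) return 0;
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every candidate against the query and keep the topK highest scores.
function searchSimilarSketch(
  query: Float32Array,
  items: StoredItem[],
  topK: number,
): Array<{ id: string; score: number }> {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A linear scan like this is simple and exact, and is usually adequate for the thousands of rows a local assistant stores; it would need an index structure only at much larger scales.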
3.4.2. SessionRepository (session-repository.ts)
Manages Session and Message entities, representing conversation history.
- createSession(session): Creates a new conversation session.
- getSessionById(id): Retrieves a session.
- getSessionWithMessages(id): Retrieves a session along with all its messages.
- findSessions(filter): Searches sessions based on projectId, isArchived, model, and pagination options.
- updateSessionStats(id, stats): Increments token counts, cost, and tool call counts for a session.
- setArchived(id, archived): Archives or unarchives a session.
- deleteSession(id): Deletes a session and its associated messages (due to ON DELETE CASCADE).
- addMessage(message): Adds a message to a session, updating the session's message_count.
- getMessages(sessionId, limit): Retrieves messages for a given session.
- getRecentMessages(sessionId, limit): Retrieves the most recent messages for context.
- deleteMessages(sessionId, fromId): Deletes messages from a session, optionally from a specific message ID onwards.
- getStats(): Provides statistics on total sessions, messages, costs, and tokens.
- getCostByModel(): Summarizes costs per model used.
3.4.3. AnalyticsRepository (analytics-repository.ts)
Manages Analytics, ToolStats, and RepairLearning entities, tracking usage and learning.
- recordAnalytics(data): Records daily aggregated analytics (tokens, cost, requests, errors, etc.). Uses ON CONFLICT to update existing daily records.
- getAnalytics(filter): Retrieves analytics records.
- getDailySummary(days): Provides a summary of daily usage over a period.
- getTotalCost(filter): Calculates total cost for a given period/project.
- recordToolUsage(toolName, success, timeMs, cacheHit, projectId): Records statistics for tool invocations. Uses ON CONFLICT to update existing tool stats.
- getToolStats(projectId): Retrieves tool usage statistics.
- getTopTools(limit): Identifies most used tools.
- recordRepairAttempt(errorPattern, errorType, strategy, success, attempts, options): Records outcomes of repair attempts, learning which strategies work for specific error patterns.
- getBestStrategies(errorPattern, filter, limit): Recommends repair strategies based on past success rates.
- getRepairStats(): Provides overall statistics on repair learning.
- deleteOldAnalytics(daysToKeep): Cleans up old analytics data.
- resetStats(): Clears all analytics and tool stats (for testing/reset).
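The record-then-rank idea behind recordRepairAttempt and getBestStrategies can be illustrated with an in-memory sketch. Everything here is an assumption for illustration: the StrategyRecord shape, the use of a Map in place of the repair_learning table, and the tie-breaking rule are not taken from the module's source.

```typescript
// Hypothetical per-strategy counters, standing in for a row of repair statistics.
interface StrategyRecord {
  strategy: string;
  successCount: number;
  failureCount: number;
}

// Record one outcome, creating the record on first use (the in-memory
// analogue of an INSERT ... ON CONFLICT DO UPDATE upsert).
function recordAttempt(
  records: Map<string, StrategyRecord>,
  strategy: string,
  success: boolean,
): void {
  const rec = records.get(strategy) ?? { strategy, successCount: 0, failureCount: 0 };
  if (success) {
    rec.successCount++;
  } else {
    rec.failureCount++;
  }
  records.set(strategy, rec);
}

function successRate(r: StrategyRecord): number {
  const total = r.successCount + r.failureCount;
  return total === 0 ? 0 : r.successCount / total;
}

// Rank by success rate; on a tie, prefer the strategy with more recorded attempts.
function rankStrategies(records: Map<string, StrategyRecord>, limit: number): StrategyRecord[] {
  return Array.from(records.values())
    .sort((a, b) => {
      const byRate = successRate(b) - successRate(a);
      if (byRate !== 0) return byRate;
      return (b.successCount + b.failureCount) - (a.successCount + a.failureCount);
    })
    .slice(0, limit);
}
```

Ranking purely by rate favors strategies with very few attempts; a production implementation may want a minimum-attempts threshold or a smoothed estimate.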
3.4.4. EmbeddingRepository (embedding-repository.ts)
Manages CodeEmbedding entities, specifically for indexing and searching code.
- upsert(embedding): Creates or updates a code embedding based on project_id, file_path, and chunk_index.
- bulkUpsert(embeddings): Efficiently inserts/updates multiple embeddings within a transaction.
- getById(id): Retrieves an embedding by its ID.
- find(filter): Searches embeddings based on projectId, filePath, symbolType, symbolName, and language.
- searchSimilar(queryEmbedding, filter, topK): Performs semantic search for code chunks using cosine similarity.
- searchBySymbol(symbolName, filter, limit): Searches for code chunks by symbol name.
- deleteForFile(projectId, filePath): Removes all embeddings associated with a specific file.
- deleteForProject(projectId): Removes all embeddings for an entire project.
- deleteStale(projectId, existingFiles): Deletes embeddings for files that no longer exist in a project.
- needsReindex(projectId, filePath, contentHash): Checks if a file's content has changed, indicating a need for re-embedding.
- getStats(projectId): Provides statistics on total embeddings, files, and distribution by language/symbol type.
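The change-detection logic behind needsReindex and deleteStale reduces to a hash comparison and a set difference. The sketch below shows that logic for a single project, with a Map standing in for the code_embeddings table; FileIndex, needsReindexSketch, and deleteStaleSketch are hypothetical names, and the content hash is assumed to be computed by the caller.

```typescript
// Hypothetical in-memory index: file path -> content hash stored at indexing time.
type FileIndex = Map<string, string>;

// A file needs re-embedding when it was never indexed or its hash has changed.
function needsReindexSketch(index: FileIndex, filePath: string, contentHash: string): boolean {
  return index.get(filePath) !== contentHash;
}

// Remove entries for files that no longer exist; returns the removed paths.
function deleteStaleSketch(index: FileIndex, existingFiles: string[]): string[] {
  const existing = new Set(existingFiles);
  const removed: string[] = [];
  // Snapshot the keys first so deleting while looping is safe.
  for (const path of Array.from(index.keys())) {
    if (!existing.has(path)) {
      index.delete(path);
      removed.push(path);
    }
  }
  return removed;
}
```

An indexing pass would typically call the stale cleanup once with the current file list, then check each file's hash and re-embed only those that changed.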
3.4.5. CacheRepository (cache-repository.ts)
Provides a general-purpose key-value cache with optional Time-To-Live (TTL) and semantic search capabilities.
- get(key): Retrieves a cached value, updating its hit count. Returns null if expired or not found.
- set(key, value, options): Stores a value with an optional ttlMs, category, and embedding.
- getOrCompute(key, computeFn, options): Retrieves from cache or computes and stores if not found/expired.
- delete(key): Removes a specific cache entry.
- deleteByPattern(pattern): Deletes entries whose keys match a string pattern or RegExp.
- deleteByCategory(category): Deletes all entries belonging to a specific category.
- deleteExpired(): Cleans up all expired cache entries.
- clear(): Empties the entire cache.
- has(key): Checks if a non-expired key exists.
- keys(filter): Returns all active cache keys, optionally filtered by category.
- searchSimilar(queryEmbedding, category, topK): Performs semantic search within the cache for entries with embeddings.
- getStats(): Provides statistics on cache size, hits, and category distribution.
- getSizeEstimate(): Estimates the total size of cached data in bytes.
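The TTL and get-or-compute semantics described above can be sketched with an in-memory cache. This is a simplified illustration, not CacheRepository itself: TtlCacheSketch is a hypothetical class, storage is a Map instead of the cache table, getOrCompute is synchronous here for brevity, and categories and embeddings are left out.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number | null; // null means the entry never expires
  hits: number;
}

class TtlCacheSketch<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  set(key: string, value: T, ttlMs?: number): void {
    const expiresAt = ttlMs === undefined ? null : this.now() + ttlMs;
    this.entries.set(key, { value, expiresAt, hits: 0 });
  }

  // Returns null when the key is missing or expired; expired entries are evicted lazily.
  get(key: string): T | null {
    const entry = this.entries.get(key);
    if (entry === undefined) return null;
    if (entry.expiresAt !== null && entry.expiresAt <= this.now()) {
      this.entries.delete(key);
      return null;
    }
    entry.hits++;
    return entry.value;
  }

  // Return the cached value, or compute, store, and return it on a miss.
  getOrCompute(key: string, computeFn: () => T, ttlMs?: number): T {
    const cached = this.get(key);
    if (cached !== null) return cached;
    const value = computeFn();
    this.set(key, value, ttlMs);
    return value;
  }
}
```

Because null signals a miss, this sketch cannot distinguish a cached null from an absent key; callers that need to cache null would have to wrap their values.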
3.5. DatabaseIntegration (src/database/integration.ts)
The DatabaseIntegration class acts as a high-level facade, simplifying interactions with the database system for other parts of Code-Buddy. It orchestrates calls to multiple repositories and integrates with external modules.
Responsibilities:
- Orchestrated Initialization: Its initialize() method handles:
  - Initializing the DatabaseManager.
  - Running migrations if autoMigrate is enabled.
  - Initializing the EmbeddingProvider.
- Unified API: Provides methods that often combine logic from several underlying repositories. For example:
  - addMemory(): Creates a memory and optionally generates its embedding using EmbeddingProvider.
  - searchMemories(): Can perform both keyword and semantic search (using EmbeddingProvider and MemoryRepository).
  - createSession(): Creates a session and records it in PersistentAnalytics.
  - updateSessionStats(): Updates session stats and records analytics.
  - indexCodeChunk(): Generates an embedding for a code chunk and stores it in EmbeddingRepository.
  - getOrComputeCached(): Uses CacheRepository to retrieve or compute a value.
- External Module Integration: Directly interacts with EmbeddingProvider, PersistentLearning, and PersistentAnalytics.
- Event Emitter: Emits initialized, migration:starting, migration:complete, and warning events.
Usage:
    import { getDatabaseIntegration, initializeDatabaseIntegration } from './integration.js';

    async function useDatabaseIntegration() {
      const dbIntegration = await initializeDatabaseIntegration({
        dbPath: './my-codebuddy.db',
        autoMigrate: true,
        embeddingProvider: 'local',
      });

      // Memory operations
      const memory = await dbIntegration.addMemory('How to write a good commit message', {
        type: 'instruction',
        generateEmbedding: true,
      });
      const searchResults = await dbIntegration.searchMemories('commit best practices', { semantic: true, limit: 3 });

      // Session operations
      const session = dbIntegration.createSession({ name: 'Refactoring session' });
      dbIntegration.addMessage(session.id, 'user', 'Refactor this code...');

      // Cache operations
      const cachedValue = await dbIntegration.getOrComputeCached('my_expensive_computation', async () => {
        // Simulate expensive computation
        await new Promise(resolve => setTimeout(resolve, 100));
        return { data: 'computed result' };
      }, 60000); // Cache for 1 minute

      // Code embedding operations
      await dbIntegration.indexCodeChunk('my-project-id', 'src/main.ts', 0, 'function foo() { ... }', {
        symbolType: 'function',
        symbolName: 'foo',
      });

      console.log(dbIntegration.formatStats());
    }
4. Usage Patterns
4.1. Initializing the Database System
The recommended way to initialize the entire database system is through the initializeDatabaseSystem function from src/database/index.ts. This ensures the DatabaseManager is set up, and all repositories are ready.
    import { initializeDatabaseSystem } from '../database/index.js';

    async function startApp() {
      await initializeDatabaseSystem({
        dbPath: process.env.CODEBUDDY_DB_PATH || undefined,
        inMemory: process.env.NODE_ENV === 'test',
        verbose: process.env.DEBUG_DB === 'true',
      });
      console.log('Database system ready.');
      // Now you can safely access repositories or DatabaseIntegration
    }
4.2. Accessing Repositories
Once initialized, you can get singleton instances of any repository:
    import { getMemoryRepository, getSessionRepository } from '../database/index.js';

    const memoryRepo = getMemoryRepository();
    const sessionRepo = getSessionRepository();

    const memories = memoryRepo.find({ type: 'fact' });
    const sessions = sessionRepo.getRecentSessions(5);
4.3. Using DatabaseIntegration
For higher-level operations that might span multiple data types or involve external services (like embedding generation), use DatabaseIntegration:
    import { getDatabaseIntegration } from '../database/index.js';

    const dbIntegration = getDatabaseIntegration(); // Assumes initializeDatabaseSystem was called

    async function performComplexOperation() {
      const query = 'how to handle errors in async functions';
      const searchResults = await dbIntegration.searchMemories(query, { semantic: true, limit: 5 });
      console.log('Semantic memory search results:', searchResults);

      await dbIntegration.recordToolUsage('code_fixer', true, 1500, false, 'my-project-id');
    }
4.4. Resetting for Testing
For unit and integration tests, it's crucial to reset the database state between tests. The module provides resetDatabaseSystem() for this purpose.
    import {
      getDatabaseIntegration,
      initializeDatabaseSystem,
      resetDatabaseSystem,
    } from '../database/index.js';

    describe('Database tests', () => {
      beforeEach(async () => {
        // Ensure a clean in-memory database for each test
        await initializeDatabaseSystem({ inMemory: true });
      });

      afterEach(() => {
        resetDatabaseSystem(); // Clears all singletons and closes DB
      });

      test('should add and retrieve a memory', async () => {
        const dbIntegration = getDatabaseIntegration();
        const memory = await dbIntegration.addMemory('Test content', { type: 'fact' });
        expect(memory.content).toBe('Test content');
      });
    });
5. Data Model (Schema)
The src/database/schema.ts file defines the following key tables:
- schema_version: Tracks the current database schema version.
- memories: Stores persistent memories (facts, preferences, patterns, etc.) with optional vector embeddings, importance, and access statistics.
- sessions: Stores conversation sessions, including project context, model used, and aggregated cost/token statistics.
- messages: Stores individual messages within a session, linked by session_id.
- code_embeddings: Stores vector embeddings for code chunks, along with file path, symbol information, and content hash for change detection.
- tool_stats: Tracks usage, success/failure rates, and performance of various tools.
- repair_learning: Stores learned strategies for fixing specific error patterns, including success rates and examples.
- analytics: Aggregates daily usage statistics (cost, tokens, requests) per project and model.
- conventions: (Future/Planned) Stores learned coding conventions.
- checkpoints: (Future/Planned) Stores metadata for project checkpoints.
- checkpoint_files: (Future/Planned) Stores individual file contents for checkpoints.
- cache: A general-purpose key-value cache with TTL and optional embeddings for semantic caching.
Tables include created_at and updated_at timestamps where appropriate, and many use TEXT fields to store metadata as JSON. Embeddings are stored as BLOB (binary data) and deserialized into Float32Array by the repositories.
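The BLOB and JSON round-trips mentioned above could be implemented along the following lines. This is a sketch under assumptions rather than the repositories' actual helpers: the function names are hypothetical, Uint8Array stands in for whatever byte container the driver uses, and values are stored in the platform's native byte order.

```typescript
// Float32Array -> raw bytes suitable for a BLOB column.
function serializeEmbedding(embedding: Float32Array): Uint8Array {
  // View only the bytes this array covers (it may be a window into a larger
  // buffer), then slice() to copy them into a standalone array.
  return new Uint8Array(embedding.buffer, embedding.byteOffset, embedding.byteLength).slice();
}

// Raw bytes -> Float32Array. Throws a RangeError if the byte length is not a multiple of 4.
function deserializeEmbedding(blob: Uint8Array): Float32Array {
  // Copy into a fresh buffer so the result is 4-byte aligned regardless of
  // where the source bytes sit inside their own buffer.
  const copy = new Uint8Array(blob.byteLength);
  copy.set(blob);
  return new Float32Array(copy.buffer);
}

// Metadata objects are stored as JSON text in a TEXT column.
function serializeMetadata(metadata: Record<string, unknown>): string {
  return JSON.stringify(metadata);
}

function deserializeMetadata(text: string | null): Record<string, unknown> {
  return text === null ? {} : (JSON.parse(text) as Record<string, unknown>);
}
```

Each 32-bit float occupies 4 bytes, so an embedding of dimension d takes 4d bytes per row; the copy on deserialization costs little and avoids alignment errors when the driver hands back a view at an odd offset.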
6. Integration Points
The src/database module is a foundational component, integrated across various parts of the Code-Buddy application:
- src/embeddings/: The DatabaseIntegration and EmbeddingRepository directly interact with the EmbeddingProvider to generate and store vector embeddings.
- src/learning/: DatabaseIntegration and AnalyticsRepository are used by PersistentLearning to record repair attempts and tool usage.
- src/analytics/: DatabaseIntegration and AnalyticsRepository are used by PersistentAnalytics to record session costs, tokens, and other usage metrics.
- src/memory/prospective-memory.ts: Uses DatabaseManager and potentially MemoryRepository for managing tasks, goals, and reminders.
- src/utils/graceful-shutdown.ts: Registers a shutdown handler to ensure DatabaseManager.close() is called cleanly.
- server/routes/health.ts: Uses DatabaseManager to check the database connection status for health checks.
- agent/specialized/sql-agent.ts: May use DatabaseManager.transaction() or exec() for direct SQL operations.
- commands/handlers/export-handlers.ts: Uses SessionRepository to export session data.
7. Contributing
When contributing to the src/database module, keep the following guidelines in mind:
- Schema Changes:
  - If you need to modify the database schema (add a table, add a column, change a column type), do not directly modify SCHEMA_SQL.
  - Increment SCHEMA_VERSION in src/database/schema.ts.
  - Add a new entry to the MIGRATIONS object with the SQL statements for your changes.
  - Update or create new TypeScript interfaces in schema.ts to reflect the new structure.
  - Make your migration script idempotent (safe to run more than once) where possible. SQLite supports IF NOT EXISTS for CREATE TABLE and CREATE INDEX, but not for ALTER TABLE ... ADD COLUMN, so column additions need an explicit check.
  - Consider the impact on existing data and write appropriate ALTER TABLE statements.
- Repository Logic:
  - All database interactions for a specific entity should be encapsulated within its dedicated repository.
  - Avoid direct db.prepare().run() calls outside of repository methods.
  - Handle serialization/deserialization of complex types (embeddings, JSON) within the repository.
  - Write comprehensive unit tests for all new or modified repository methods.
- DatabaseIntegration:
  - Use DatabaseIntegration for high-level workflows that combine operations from multiple repositories or external services.
  - Avoid adding simple CRUD methods to DatabaseIntegration if they can be directly accessed via a single repository.
  - Ensure DatabaseIntegration methods are robust and handle potential errors from underlying services (e.g., embedding provider failures).
- Singleton Access:
  - Always use the get...Manager() or get...Repository() functions to obtain instances. Do not directly new up these classes in application code (except within the get functions themselves).
  - For testing, use reset...Manager() or reset...Repository() in afterEach hooks to ensure a clean state.
- Error Handling:
  - Database operations can fail. Ensure proper try...catch blocks, especially in initialize() and migration logic.
  - Emit relevant events (e.g., db:error) for critical failures.
By adhering to these guidelines, we can maintain a robust, scalable, and easy-to-understand database layer for Code-Buddy.