src — sync
The src/sync module provides comprehensive capabilities for managing data synchronization and backup, primarily focusing on cloud integration but also offering a generic state synchronization framework. It is designed to ensure data consistency, handle conflicts, and provide robust backup and restore functionalities for application data.
This module is divided into two main parts:
- src/sync/cloud: Focuses on file-level synchronization and backup with various cloud storage providers (S3, GCS, Azure, Local).
- src/sync/index.ts: Offers a more abstract, state-based synchronization mechanism using vector clocks for conflict detection and resolution, suitable for synchronizing arbitrary application data structures.
1. Cloud Synchronization and Backup (src/sync/cloud)
This sub-module provides high-level APIs for managing file-based synchronization and backups to and from cloud storage. It abstracts away the complexities of interacting with different cloud providers, handling encryption, compression, and conflict resolution.
1.1 Architecture Overview
The cloud sync and backup system is built around a few core components:
- CloudStorage: An abstract class defining the interface for cloud storage operations.
- Provider Implementations: Concrete classes like LocalStorage, S3Storage, GCSStorage, and AzureBlobStorage that implement CloudStorage for specific providers.
- CloudSyncManager: Handles file-level synchronization between local directories and cloud storage.
- BackupManager: Manages the creation, restoration, and lifecycle of backups in cloud storage.
- createCloudSyncSystem: A convenience factory function to instantiate and manage both CloudSyncManager and BackupManager together.
graph TD
A[createCloudSyncSystem] --> B(CloudSyncManager)
A --> C(BackupManager)
B --> D(CloudStorage)
C --> D
D --> E{CloudConfig}
E --> F(LocalStorage)
E --> G(S3Storage)
E --> H(GCSStorage)
E --> I(AzureBlobStorage)
1.2 Cloud Storage Abstraction (src/sync/cloud/storage.ts)
The CloudStorage class provides a unified interface for interacting with various cloud storage services. It handles common concerns like client-side encryption and key derivation.
Key Features:
- Provider Agnostic: The createCloudStorage factory function instantiates the correct storage implementation based on CloudConfig.provider.
- Client-Side Encryption: If CloudConfig.encryptionKey is provided, data is encrypted using AES-256-GCM before upload and decrypted after download. The key is derived using SHA256 from the passphrase.
- Path Prefixing: Supports adding a prefix to all object keys, allowing for logical separation within a bucket.
- Core Operations:
  - upload(key: string, data: Buffer, metadata?: Record): Uploads data to a specified key.
  - download(key: string): Downloads data from a specified key.
  - delete(key: string): Deletes an object.
  - list(options?: ListOptions): Lists objects within a prefix.
  - exists(key: string): Checks if an object exists.
  - getMetadata(key: string): Retrieves object metadata.
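The client-side encryption step can be sketched with Node's built-in crypto module. This is a minimal illustration, not the module's code: the function names and the envelope layout (IV, then auth tag, then ciphertext) are assumptions, and the key is assumed to be the raw SHA-256 digest of the passphrase, as the description above suggests.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// Assumption: the 32-byte AES key is the SHA-256 digest of the passphrase.
function deriveKey(passphrase: string): Buffer {
  return createHash("sha256").update(passphrase, "utf8").digest();
}

// Hypothetical envelope layout: [12-byte IV][16-byte auth tag][ciphertext].
function encrypt(data: Buffer, passphrase: string): Buffer {
  const iv = randomBytes(12); // fresh IV per object; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", deriveKey(passphrase), iv);
  const ciphertext = Buffer.concat([cipher.update(data), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]);
}

function decrypt(envelope: Buffer, passphrase: string): Buffer {
  const iv = envelope.subarray(0, 12);
  const tag = envelope.subarray(12, 28);
  const ciphertext = envelope.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(passphrase), iv);
  decipher.setAuthTag(tag);
  // final() throws if the key is wrong or the data was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Because GCM is authenticated, a wrong passphrase or a corrupted object fails in decrypt rather than yielding garbage. A single unsalted hash is a weak way to stretch a passphrase; a salted KDF such as scrypt is the usual choice when the key comes from a human-chosen secret.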
Implementations:
- LocalStorage: Stores data on the local filesystem. Useful for development, testing, or local-only backups. It simulates cloud behavior by creating directories and storing metadata in .meta files.
- S3Storage, GCSStorage, AzureBlobStorage: These currently provide mock implementations. In a production environment, they would integrate with their respective cloud SDKs (e.g., @aws-sdk/client-s3). They log operations but do not perform actual cloud interactions in the provided code.
1.3 Cloud Sync Manager (src/sync/cloud/sync-manager.ts)
The CloudSyncManager is responsible for synchronizing local files and directories with cloud storage. It supports automatic synchronization, various sync directions, and conflict resolution strategies.
Configuration (SyncManagerConfig, SyncConfig, SyncItem):
- cloud: CloudConfig for the underlying storage.
- sync: SyncConfig defines sync behavior:
  - autoSync: Enables/disables automatic sync on an interval.
  - syncInterval: Frequency of automatic syncs.
  - direction: 'push', 'pull', or 'bidirectional'.
  - conflictResolution: Strategy for handling conflicts ('local', 'remote', 'newest', 'manual').
  - items: An array of SyncItem objects, each specifying a local path, remote path, and type (e.g., 'sessions', 'memory').
  - excludePatterns: Glob-like patterns to ignore files/directories.
  - compression, encryption: Flags for data processing (encryption is handled by CloudStorage).
Core Synchronization Flow (sync() method):
- State Management: Updates the internal SyncState and emits the sync_started event.
- Item Iteration: Loops through all enabled SyncItems.
- Delta Calculation (calculateDelta):
  - scanLocalFiles: Recursively scans local directories, computes checksums, and records modification times.
  - scanRemoteFiles: Lists objects in cloud storage and fetches their metadata (including checksums and modification times).
  - Compares local and remote file lists to identify:
    - toUpload: Local files that are new or newer than remote.
    - toDownload: Remote files that are new or newer than local.
    - conflicts: Files with the same modification time but different content, or concurrent modifications.
- Execution based on direction:
  - 'push': Only uploadFiles runs.
  - 'pull': Only downloadFiles runs.
  - 'bidirectional': Conflicts are resolved first, then uploadFiles and downloadFiles run in parallel.
- Conflict Resolution (resolveConflict): Applies the configured conflictResolution strategy. For 'manual', conflicts are left for external handling via resolveConflictManually.
- Data Transfer (uploadFiles, downloadFiles):
  - Reads/writes files from/to the local filesystem.
  - Applies compression (gzip) if enabled.
  - Interacts with CloudStorage for actual cloud operations.
  - Emits item_uploaded and item_downloaded events.
- Cleanup: Updates lastSync for each item.
- Event Emission: Emits sync_completed or sync_failed events.
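The delta calculation in the flow above can be sketched as a pure function over two file listings. The FileInfo and SyncDelta shapes are assumptions for illustration. This sketch only flags the "same modification time, different content" case as a conflict; detecting concurrent modifications would additionally need the last sync time.

```typescript
// Hypothetical shapes; the real scan results may carry more fields.
interface FileInfo {
  path: string;       // path relative to the sync item root
  checksum: string;   // content hash
  modifiedAt: number; // modification time, epoch milliseconds
}

interface SyncDelta {
  toUpload: string[];
  toDownload: string[];
  conflicts: string[];
}

function calculateDelta(local: FileInfo[], remote: FileInfo[]): SyncDelta {
  const remoteByPath = new Map<string, FileInfo>(
    remote.map((f): [string, FileInfo] => [f.path, f]),
  );
  const localPaths = new Set<string>(local.map((f) => f.path));
  const delta: SyncDelta = { toUpload: [], toDownload: [], conflicts: [] };

  for (const file of local) {
    const other = remoteByPath.get(file.path);
    if (other === undefined) {
      delta.toUpload.push(file.path); // exists only locally
    } else if (other.checksum === file.checksum) {
      continue; // identical content, nothing to transfer
    } else if (file.modifiedAt > other.modifiedAt) {
      delta.toUpload.push(file.path); // local copy is newer
    } else if (file.modifiedAt < other.modifiedAt) {
      delta.toDownload.push(file.path); // remote copy is newer
    } else {
      delta.conflicts.push(file.path); // same mtime, different content
    }
  }
  for (const file of remote) {
    if (!localPaths.has(file.path)) delta.toDownload.push(file.path); // exists only remotely
  }
  return delta;
}
```

Comparing checksums before timestamps avoids transferring files whose modification time changed but whose content did not.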
Eventing:
CloudSyncManager extends TypedEventEmitterAdapter, providing type-safe events for various sync lifecycle stages (e.g., sync:started, sync:completed, sync:item_uploaded, sync:conflict_detected). It also maintains backward compatibility by emitting generic 'sync-event' and specific legacy events.
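A type-safe event layer with a legacy fallback might look like the following sketch. It wraps Node's EventEmitter rather than reproducing TypedEventEmitterAdapter, whose API is not shown here. The class name, method names, and payload shapes are all assumptions.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical event map; payload shapes are illustrative only.
interface SyncEventMap {
  "sync:started": { startedAt: number };
  "sync:completed": { result: { itemsSynced: number } };
}

class TypedSyncEmitter {
  private readonly inner = new EventEmitter();

  // The listener's payload type is inferred from the event name.
  onTyped<K extends keyof SyncEventMap>(
    event: K,
    listener: (payload: SyncEventMap[K]) => void,
  ): this {
    this.inner.on(event, listener);
    return this;
  }

  // Legacy consumers subscribe to one generic 'sync-event' channel.
  onLegacy(listener: (event: { type: keyof SyncEventMap; payload: unknown }) => void): this {
    this.inner.on("sync-event", listener);
    return this;
  }

  emitTyped<K extends keyof SyncEventMap>(event: K, payload: SyncEventMap[K]): void {
    this.inner.emit(event, payload); // typed channel first
    this.inner.emit("sync-event", { type: event, payload }); // then the generic one
  }
}

const emitter = new TypedSyncEmitter();
const seen: string[] = [];
emitter.onTyped("sync:completed", (e) => {
  seen.push(`typed:${e.result.itemsSynced}`);
});
emitter.onLegacy((e) => {
  seen.push(`legacy:${e.type}`);
});
emitter.emitTyped("sync:completed", { result: { itemsSynced: 3 } });
```

With this shape, a misspelled event name or a payload with the wrong fields is rejected at compile time, while older listeners keep working through the generic channel.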
Versioning:
- getVersionHistory(path: string): Retrieves a list of available versions for a given remote path. This relies on the CloudStorage implementation providing versioning capabilities (currently mocked for S3/GCS/Azure).
- restoreVersion(path: string, versionId: string): Downloads and restores a specific version of a file to the local filesystem.
1.4 Cloud Backup Manager (src/sync/cloud/backup-manager.ts)
The BackupManager handles the creation, restoration, and management of application backups in cloud storage. It supports automatic backups, compression, and retention policies.
Configuration (BackupManagerConfig, BackupConfig):
- cloud: CloudConfig for the underlying storage.
- backup: BackupConfig defines backup behavior:
  - autoBackup: Enables/disables automatic backups on an interval.
  - backupInterval: Frequency of automatic backups.
  - maxBackups: Maximum number of backups to retain.
  - items: An array of local file/directory paths to include in the backup.
  - compressionLevel: Zlib compression level (0-9).
  - splitSize: Optional size threshold to split large backups into multiple parts.
Core Backup Flow (createBackup() method):
- Backup ID Generation: A unique ID is generated using date and UUID.
- Item Collection (collectItem, collectDirectory):
  - Recursively scans specified local paths.
  - Reads file content, calculates checksums, and stores metadata (BackupItem).
  - Combines all collected data into a single Buffer.
- Compression (compress): Compresses the combined data using gzip.
- Manifest Creation: Generates a BackupManifest containing metadata about the backup (ID, creation time, items, sizes, checksums, encryption status).
- Upload (uploadBackup, uploadSplitBackup):
  - If splitSize is configured and the compressed data exceeds it, the backup is split into multiple parts and uploaded individually.
  - Otherwise, the entire compressed data is uploaded as a single file (backup.dat).
  - The manifest.json file is always uploaded alongside the data.
- Cleanup (cleanupOldBackups): Deletes older backups based on the maxBackups retention policy.
- Event Emission: Emits backup_started, backup_progress, and backup_created events.
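The compression and split-upload decision in the flow above can be sketched as a planning function that returns the objects to upload. The function name, the UploadPlan shape, the key layout, and the backup.partN naming are assumptions. Only backup.dat comes from the description above.

```typescript
import { createHash } from "node:crypto";
import { gzipSync } from "node:zlib";

interface BackupPart {
  key: string; // object key to upload under
  data: Buffer;
}

// Hypothetical result shape; a real manifest would carry more metadata.
interface UploadPlan {
  checksum: string; // SHA-256 of the compressed payload
  compressedSize: number;
  parts: BackupPart[];
}

function planBackupUpload(
  backupId: string,
  combined: Buffer,
  compressionLevel: number,
  splitSize?: number,
): UploadPlan {
  const compressed = gzipSync(combined, { level: compressionLevel });
  const checksum = createHash("sha256").update(compressed).digest("hex");
  const parts: BackupPart[] = [];

  if (splitSize !== undefined && splitSize > 0 && compressed.length > splitSize) {
    // Slice the compressed payload into fixed-size chunks; the last one may be shorter.
    for (let offset = 0; offset < compressed.length; offset += splitSize) {
      parts.push({
        key: `${backupId}/backup.part${parts.length}`,
        data: compressed.subarray(offset, offset + splitSize),
      });
    }
  } else {
    parts.push({ key: `${backupId}/backup.dat`, data: compressed });
  }
  return { checksum, compressedSize: compressed.length, parts };
}
```

Splitting after compression keeps the part count tied to what is actually uploaded, and recording one checksum over the whole compressed payload lets a restore verify the reassembled parts in a single pass.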
Other Operations:
- listBackups(): Retrieves a list of available backups by downloading and parsing manifest files.
- getBackupManifest(backupId: string): Fetches a specific backup's manifest.
- restoreBackup(backupId: string, targetPath: string, options?):
  - Downloads backup data (handling split parts).
  - Verifies checksums for data integrity.
  - Decompresses the data.
  - Extracts individual items and writes them to the targetPath, respecting overwrite and items filters.
  - Emits restore_started, restore_progress, restore_completed, or restore_error events.
- deleteBackup(backupId: string): Deletes all files associated with a backup (data parts and manifest).
- verifyBackup(backupId: string): Checks for the existence of manifest and data files/parts.
- exportBackup(backupId: string, outputPath: string): Downloads a backup and its manifest to local files.
- importBackup(inputPath: string): Reads a local backup file and its manifest, then uploads it to cloud storage.
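The verification and filtering steps of a restore can be sketched as two small helpers. The names and the RestoreOptions shape are assumptions. The sketch also assumes that the checksum covers the compressed payload and that overwrite defaults to false, neither of which is stated above.

```typescript
import { createHash } from "node:crypto";
import { gunzipSync, gzipSync } from "node:zlib";

// Hypothetical options shape mirroring the overwrite and items filters.
interface RestoreOptions {
  overwrite?: boolean;
  items?: string[];
}

function sha256Hex(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}

// Joins downloaded parts in order, verifies integrity, then decompresses.
function reassembleAndVerify(parts: Buffer[], expectedChecksum: string): Buffer {
  const compressed = Buffer.concat(parts);
  if (sha256Hex(compressed) !== expectedChecksum) {
    throw new Error("Backup checksum mismatch; refusing to restore");
  }
  return gunzipSync(compressed);
}

// Decides whether one backed-up item should be written to the target path.
function shouldRestoreItem(
  itemPath: string,
  existsLocally: boolean,
  options: RestoreOptions = {},
): boolean {
  if (options.items !== undefined && !options.items.includes(itemPath)) return false;
  return !existsLocally || options.overwrite === true;
}

// Usage: a payload split into two parts restores to the original bytes.
const originalData = Buffer.from("session data", "utf8");
const compressedData = gzipSync(originalData);
const restoredData = reassembleAndVerify(
  [compressedData.subarray(0, 10), compressedData.subarray(10)],
  sha256Hex(compressedData),
);
```

Checking the checksum before decompressing means a corrupted download is rejected early, before any file in targetPath is touched.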
1.5 Convenience Functions (src/sync/cloud/index.ts)
The src/sync/cloud/index.ts file serves as the main entry point for cloud-related sync and backup features. It re-exports all public APIs and provides a powerful factory function:
- createCloudSyncSystem(config): This function simplifies the setup by creating both a CloudSyncManager and a BackupManager with sensible default configurations. It returns an object with sync and backup properties, along with convenience methods like startAll(), stopAll(), and dispose().
- Configuration Helpers:
  - createLocalConfig(basePath?): Generates a CloudConfig for local storage.
  - createS3Config(options): Generates a CloudConfig for S3.
  - createDefaultSyncItems(): Provides a standard set of items for CloudSyncManager (e.g., sessions, memory, settings).
  - createDefaultBackupItems(): Provides a standard set of items for BackupManager.
2. Generic State Synchronization (src/sync/index.ts)
This module provides a lower-level, more abstract synchronization framework designed for arbitrary application states (e.g., JSON objects). It implements a distributed synchronization model using vector clocks to track causality and resolve conflicts.
2.1 Key Concepts
- SyncState<T>: The fundamental unit of synchronization. It encapsulates the actual data (T), along with metadata like a unique id, version, timestamp, hash (for content integrity), lastModifiedBy (node ID), and crucially, a vectorClock.
- VectorClock: A map where keys are node IDs and values are integers. It's used to track the causal history of a state across different nodes in a distributed system.
- SyncOperation: Records local changes (create, update, delete) to SyncState objects, including the state ID, data, timestamp, node ID, and the vector clock at the time of the operation. These operations form a log of pending changes.
- SyncConflict: Represents a situation where local and remote states for the same ID have diverged concurrently, as determined by their vector clocks. It includes the localState, remoteState, and conflictType.
- ConflictResolutionStrategy: An interface for defining how conflicts should be automatically resolved.
2.2 Vector Clock Operations
The module provides a set of utility functions for working with vector clocks:
- createVectorClock(nodeId: string): Initializes a new vector clock for a given node.
- incrementVectorClock(clock: VectorClock, nodeId: string): Increments the count for a specific node in a vector clock. This is how a node records that it has made a change to a state.
- mergeVectorClocks(clock1: VectorClock, clock2: VectorClock): Combines two vector clocks by taking the maximum value for each node ID. The result represents the combined knowledge of both clocks.
- compareVectorClocks(clock1: VectorClock, clock2: VectorClock): Compares two vector clocks to determine their causal relationship: 'before', 'after', 'concurrent', or 'equal'.
- isVectorClockDominated(clock1: VectorClock, clock2: VectorClock): Checks if clock1 is causally "before" or "equal" to clock2.
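The vector clock utilities listed above follow the standard definitions, so they can be sketched in a few lines. The bodies below are an illustration rather than the module's code. In particular, starting a new clock at 1 and treating a missing node entry as 0 are assumptions.

```typescript
type VectorClock = Record<string, number>;
type ClockOrder = "before" | "after" | "concurrent" | "equal";

// Assumption: a new clock starts at 1, so creation counts as the node's first event.
function createVectorClock(nodeId: string): VectorClock {
  return { [nodeId]: 1 };
}

// Returns a new clock; the input is not mutated.
function incrementVectorClock(clock: VectorClock, nodeId: string): VectorClock {
  return { ...clock, [nodeId]: (clock[nodeId] ?? 0) + 1 };
}

// Element-wise maximum over the union of node IDs.
function mergeVectorClocks(clock1: VectorClock, clock2: VectorClock): VectorClock {
  const merged: VectorClock = { ...clock1 };
  for (const node of Object.keys(clock2)) {
    merged[node] = Math.max(merged[node] ?? 0, clock2[node] ?? 0);
  }
  return merged;
}

function compareVectorClocks(clock1: VectorClock, clock2: VectorClock): ClockOrder {
  let firstAhead = false;
  let secondAhead = false;
  for (const node of Object.keys({ ...clock1, ...clock2 })) {
    const a = clock1[node] ?? 0; // a missing entry means "no events seen from that node"
    const b = clock2[node] ?? 0;
    if (a > b) firstAhead = true;
    if (a < b) secondAhead = true;
  }
  if (firstAhead && secondAhead) return "concurrent";
  if (firstAhead) return "after";
  if (secondAhead) return "before";
  return "equal";
}

function isVectorClockDominated(clock1: VectorClock, clock2: VectorClock): boolean {
  const order = compareVectorClocks(clock1, clock2);
  return order === "before" || order === "equal";
}
```

Two clocks are concurrent exactly when each is ahead of the other on at least one node, which is the condition conflict detection relies on.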
2.3 State Management
- createSyncState<T>(data: T, nodeId: string, existingClock?: VectorClock): Creates a new SyncState object. It generates a unique ID, initializes its version and timestamp, and sets up its vector clock (incrementing an existingClock if provided, or creating a new one).
- updateSyncState<T>(state: SyncState<T>, data: T, nodeId: string): Creates a new version of an existing SyncState. It increments the version, updates the timestamp, recomputes the hash, and increments the vector clock for the modifying node.
- computeHash(data: unknown): Generates a SHA256 hash of the state's data, used for quick content comparison.
- generateId(prefix: string): Utility for generating unique IDs.
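A minimal sketch of these state helpers follows, using the fields named in section 2.1. It makes several assumptions: the hash is taken over JSON.stringify output (so the data must be JSON-serializable and key order affects the hash), IDs are built from a UUID, the version starts at 1, and the bump helper is hypothetical.

```typescript
import { createHash, randomUUID } from "node:crypto";

type VectorClock = Record<string, number>;

interface SyncState<T> {
  id: string;
  version: number;
  timestamp: number;
  hash: string;
  lastModifiedBy: string;
  vectorClock: VectorClock;
  data: T;
}

// Assumption: content hash over the JSON serialization of the data.
function computeHash(data: unknown): string {
  return createHash("sha256").update(JSON.stringify(data)).digest("hex");
}

function generateId(prefix: string): string {
  return `${prefix}_${randomUUID()}`;
}

// Hypothetical helper: returns a copy of the clock with nodeId's counter incremented.
function bump(clock: VectorClock, nodeId: string): VectorClock {
  return { ...clock, [nodeId]: (clock[nodeId] ?? 0) + 1 };
}

function createSyncState<T>(data: T, nodeId: string, existingClock?: VectorClock): SyncState<T> {
  return {
    id: generateId("state"),
    version: 1,
    timestamp: Date.now(),
    hash: computeHash(data),
    lastModifiedBy: nodeId,
    vectorClock: bump(existingClock ?? {}, nodeId),
    data,
  };
}

// Returns a new state object; the previous version is left untouched.
function updateSyncState<T>(state: SyncState<T>, data: T, nodeId: string): SyncState<T> {
  return {
    ...state,
    version: state.version + 1,
    timestamp: Date.now(),
    hash: computeHash(data),
    lastModifiedBy: nodeId,
    vectorClock: bump(state.vectorClock, nodeId),
    data,
  };
}
```

Treating states as immutable values makes it straightforward to keep the previous version around for conflict reporting.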
2.4 Conflict Detection and Resolution
- detectConflict<T>(localState: SyncState<T>, remoteState: SyncState<T>): This core function determines if two states with the same ID are in conflict. It uses compareVectorClocks to check for concurrent updates (where neither state causally precedes the other).
- Conflict Resolution Strategies:
  - LastWriteWinsStrategy: Resolves conflicts by choosing the state with the most recent timestamp.
  - LocalWinsStrategy: Always prefers the local state.
  - RemoteWinsStrategy: Always prefers the remote state.
  - MergeStrategy: Attempts to merge two object states by combining their properties. For conflicting primitive values, it defaults to the local value. This strategy is only applicable to object types.
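Conflict detection and two of the strategies can be sketched as plain functions. The shapes below are reduced to the fields the sketch needs. The conflictType value, the function-style strategies, the tie-break toward local, and the shallow merge are assumptions, and the real strategies may be classes implementing ConflictResolutionStrategy.

```typescript
type VectorClock = Record<string, number>;

interface SyncState<T> {
  id: string;
  data: T;
  timestamp: number;
  vectorClock: VectorClock;
}

interface SyncConflict<T> {
  localState: SyncState<T>;
  remoteState: SyncState<T>;
  conflictType: "concurrent"; // hypothetical value
}

// True when each clock is ahead of the other on at least one node.
function isConcurrent(a: VectorClock, b: VectorClock): boolean {
  let aAhead = false;
  let bAhead = false;
  for (const node of Object.keys({ ...a, ...b })) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  return aAhead && bAhead;
}

function detectConflict<T>(
  localState: SyncState<T>,
  remoteState: SyncState<T>,
): SyncConflict<T> | null {
  if (localState.id !== remoteState.id) return null;
  return isConcurrent(localState.vectorClock, remoteState.vectorClock)
    ? { localState, remoteState, conflictType: "concurrent" }
    : null;
}

// Last write wins: the newer timestamp is kept; ties go to local (assumption).
function lastWriteWins<T>(conflict: SyncConflict<T>): T {
  return conflict.remoteState.timestamp > conflict.localState.timestamp
    ? conflict.remoteState.data
    : conflict.localState.data;
}

// Shallow merge: properties from both sides, local value wins on overlap.
function mergeObjects<T extends object>(conflict: SyncConflict<T>): T {
  return { ...conflict.remoteState.data, ...conflict.localState.data };
}
```

Last write wins depends on reasonably synchronized wall clocks across nodes, whereas the vector clock comparison used for detection does not.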
2.5 Sync Manager (SyncManager)
The SyncManager class orchestrates the entire state synchronization process.
Configuration (SyncConfig):
- nodeId: A unique identifier for the current node.
- conflictStrategy: The default strategy to use for automatic conflict resolution.
- autoSync: Enables/disables automatic synchronization.
- syncInterval: Interval for auto-sync ticks.
- persistPath: Local path to store the manager's state (states, pending operations).
Core Functionality:
- State Storage: Internally maintains a Map for all local states.
- Pending Operations: Stores a list of SyncOperations representing local changes that need to be synchronized with other nodes.
- Persistence (save(), load()): Uses UnifiedVfsRouter to persist the manager's internal state (all SyncState objects and pendingOperations) to a local JSON file. This ensures state is preserved across application restarts.
- CRUD Operations: Provides createState(), getState(), updateState(), and deleteState() methods for managing local SyncState objects. Each modification automatically adds a SyncOperation to pendingOperations and triggers a save().
- applyRemoteState(remoteState: SyncState<T>): This is the heart of reconciliation for a single state.
  - If the remoteState is new, it's simply added.
  - If a local state exists, detectConflict is called.
  - If no conflict, vector clocks determine if local or remote is newer.
  - If a conflict is detected, the configured conflictStrategy is applied. If the strategy cannot resolve it (e.g., 'manual' strategy), the conflict is reported, and the manager's status changes to 'conflict'.
- reconcile(remoteStates: SyncState<T>[]): Orchestrates the application of multiple remote states, calling applyRemoteState for each. It returns a ReconciliationResult detailing applied states, conflicts, and pending operations.
- resolveConflict(conflict: SyncConflict<T>, resolution: ConflictResolution, customData?: T): Allows for manual resolution of conflicts, providing options to choose local, remote, or custom merged data.
- Auto-Sync: startAutoSync() and stopAutoSync() manage an interval timer that emits auto-sync-tick events, allowing external logic to trigger reconciliation.
- Export/Import: exportState() and importState() facilitate transferring the entire set of local states.
- Eventing: Extends EventEmitter and emits events like status-changed, state-created, state-updated, conflict-detected, conflict-resolved, saved, loaded, etc.
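The reconciliation logic of applyRemoteState and reconcile can be sketched with an in-memory map. This is an illustration under assumptions: the class name, the outcome labels, the reduced state shape, and the optional resolver callback are hypothetical, and persistence, pending operations, and events are left out.

```typescript
type VectorClock = Record<string, number>;
type ClockOrder = "before" | "after" | "concurrent" | "equal";
type ApplyOutcome = "added" | "updated" | "ignored" | "conflict";

interface StateSketch<T> {
  id: string;
  data: T;
  vectorClock: VectorClock;
}

type Resolver<T> = (local: StateSketch<T>, remote: StateSketch<T>) => StateSketch<T>;

function compareClocks(a: VectorClock, b: VectorClock): ClockOrder {
  let aAhead = false;
  let bAhead = false;
  for (const node of Object.keys({ ...a, ...b })) {
    const x = a[node] ?? 0;
    const y = b[node] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  return bAhead ? "before" : "equal";
}

class ReconcilerSketch<T> {
  readonly states = new Map<string, StateSketch<T>>();
  status: "idle" | "conflict" = "idle";
  private readonly resolve: Resolver<T> | undefined;

  // Omitting the resolver models a 'manual' strategy: conflicts are only reported.
  constructor(resolve?: Resolver<T>) {
    this.resolve = resolve;
  }

  applyRemoteState(remote: StateSketch<T>): ApplyOutcome {
    const local = this.states.get(remote.id);
    if (local === undefined) {
      this.states.set(remote.id, remote); // unknown ID: simply add it
      return "added";
    }
    const order = compareClocks(local.vectorClock, remote.vectorClock);
    if (order === "before") {
      this.states.set(remote.id, remote); // remote causally follows local
      return "updated";
    }
    if (order !== "concurrent") return "ignored"; // local is newer or identical
    if (this.resolve === undefined) {
      this.status = "conflict";
      return "conflict";
    }
    this.states.set(remote.id, this.resolve(local, remote));
    return "updated";
  }

  reconcile(remotes: StateSketch<T>[]): { applied: string[]; conflicts: string[] } {
    const applied: string[] = [];
    const conflicts: string[] = [];
    for (const remote of remotes) {
      const outcome = this.applyRemoteState(remote);
      if (outcome === "conflict") conflicts.push(remote.id);
      else if (outcome !== "ignored") applied.push(remote.id);
    }
    return { applied, conflicts };
  }
}
```

A resolver would normally also merge the two vector clocks into the winning state so that the result causally follows both inputs. That step is left to the callback here.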
2.6 Singleton Access
- getSyncManager<T>(config?: SyncConfig): Provides a singleton instance of SyncManager. If called multiple times, it returns the same instance, ensuring a single source of truth for state synchronization within the application.
- resetSyncManager(): Disposes of the current singleton instance and allows a new one to be created.
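The singleton accessors can be sketched with a module-level variable. ManagerSketch and SketchConfig are stand-ins for the real SyncManager and SyncConfig, and the fallback node ID is an assumption.

```typescript
// Stand-ins for SyncConfig and SyncManager, reduced to what the sketch needs.
interface SketchConfig {
  nodeId: string;
}

class ManagerSketch {
  readonly nodeId: string;
  disposed = false;

  constructor(config: SketchConfig) {
    this.nodeId = config.nodeId;
  }

  dispose(): void {
    this.disposed = true;
  }
}

let instance: ManagerSketch | null = null;

// Creates the instance on first call; later calls return it unchanged.
function getSyncManager(config?: SketchConfig): ManagerSketch {
  if (instance === null) {
    instance = new ManagerSketch(config ?? { nodeId: "default-node" });
  }
  return instance;
}

// Disposes the current instance so the next getSyncManager call builds a new one.
function resetSyncManager(): void {
  if (instance !== null) instance.dispose();
  instance = null;
}
```

In a design like this, a config passed on a later call is silently ignored, so callers that need different settings (tests, for example) would call resetSyncManager() first.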
3. Integration and Usage
The src/sync module offers two distinct but complementary synchronization paradigms:
- File-level sync/backup (src/sync/cloud): Ideal for synchronizing and backing up user data files, configuration directories, or any file-based assets to cloud storage. Use createCloudSyncSystem for a quick setup.
- Generic state sync (src/sync/index.ts): Best suited for synchronizing structured application data (e.g., settings objects, session states, internal models) across different instances of the application, especially in scenarios requiring robust conflict resolution and version tracking. Use getSyncManager to access the singleton.
Developers can choose the appropriate manager based on their data type and synchronization requirements. For example, a Code Buddy application might use CloudSyncManager to back up user-generated code files and SyncManager to synchronize internal application settings or session metadata across different devices.
The src/sync/cloud/index.ts module also provides createDefaultSyncItems() and createDefaultBackupItems() which define common paths for Code Buddy's internal data (e.g., .codebuddy/sessions, .codebuddy/memory, .codebuddy/settings), demonstrating how these modules are intended to be used within the application.
Example Usage (Conceptual)
import { createCloudSyncSystem, createS3Config, createDefaultSyncItems, createDefaultBackupItems } from './sync/cloud/index.js';
import { getSyncManager } from './sync/index.js';

async function setupSync() {
  // 1. Cloud File Sync & Backup
  const cloudConfig = createS3Config({
    bucket: 'my-codebuddy-data',
    region: 'us-east-1',
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
    encryptionKey: 'my-super-secret-key',
  });

  const cloudSystem = createCloudSyncSystem({
    cloud: cloudConfig,
    sync: {
      autoSync: true,
      syncInterval: 60000, // Every minute
      direction: 'bidirectional',
      items: createDefaultSyncItems(),
    },
    backup: {
      autoBackup: true,
      backupInterval: 3600000, // Every hour
      maxBackups: 5,
      items: createDefaultBackupItems(),
    },
  });

  cloudSystem.sync.onTypedSyncEvent('sync:completed', (event) => {
    console.log(`Cloud sync completed: ${event.result.itemsSynced} items.`);
  });

  cloudSystem.backup.on('backup_created', (event) => {
    console.log(`Cloud backup created: ${event.backup.id}`);
  });

  cloudSystem.startAll();
  console.log('Cloud sync and backup started.');

  // 2. Generic State Synchronization
  interface AppSettings {
    theme: string;
    fontSize: number;
    lastOpenedFile: string;
  }

  const settingsSyncManager = getSyncManager<AppSettings>({
    nodeId: 'user-desktop-client',
    conflictStrategy: 'last-write-wins',
    persistPath: '.codebuddy/sync/settings',
  });

  settingsSyncManager.on('state-updated', (state) => {
    console.log(`Settings updated: ${state.id} - ${JSON.stringify(state.data)}`);
  });

  // Create or update a setting
  let mySettings = settingsSyncManager.getState('app-settings');
  if (!mySettings) {
    mySettings = settingsSyncManager.createState({
      theme: 'dark',
      fontSize: 14,
      lastOpenedFile: '/home/user/project/main.ts',
    });
  } else {
    settingsSyncManager.updateState(mySettings.id, {
      ...mySettings.data,
      fontSize: 16,
    });
  }

  // Simulate receiving remote states (e.g., from another client)
  const remoteSettings: AppSettings = {
    theme: 'light',
    fontSize: 14,
    lastOpenedFile: '/home/user/project/index.js',
  };
  const remoteSyncState = settingsSyncManager.createState(remoteSettings); // In a real scenario, this would come from a remote source

  // Reconcile with remote states
  const reconciliationResult = await settingsSyncManager.reconcile([remoteSyncState]);
  if (reconciliationResult.conflicts.length > 0) {
    console.warn('Conflicts detected during state reconciliation:', reconciliationResult.conflicts);
  }

  // Clean up on shutdown
  process.on('beforeExit', () => {
    cloudSystem.dispose();
    settingsSyncManager.dispose();
    console.log('Sync systems disposed.');
  });
}

setupSync().catch(console.error);