src — models
The src/models module is responsible for the discovery, management, and automatic download of Large Language Models (LLMs) from the HuggingFace Hub. It's designed to facilitate local inference by providing a robust system for handling GGUF-formatted models, including intelligent selection based on available hardware resources.
Purpose and Key Features
The primary goal of this module is to abstract away the complexities of finding, downloading, and managing LLM files. It ensures that the application can easily access the necessary models, optimizing for performance and resource availability.
Key features include:
- Automatic Model Discovery & Download: Seamlessly fetches models from HuggingFace Hub.
- VRAM-based Recommendations: Suggests and selects models and quantizations suitable for the host system's GPU memory.
- Quantization Selection: Supports various GGUF quantization types, allowing a trade-off between model size/speed and quality.
- Progress Tracking: Provides real-time updates during model downloads.
- Local Model Management: Scans for existing models, stores metadata, and allows deletion.
- Pre-defined Model Registry: Offers a curated list of RECOMMENDED_MODELS optimized for common use cases (e.g., coding, general purpose).
Core Concepts and Data Structures
The module defines several interfaces and constants that are central to its operation:
- QuantizationType & QUANTIZATION_TYPES:
  - Defines the properties of different GGUF quantization levels (e.g., Q4_K_M, Q8_0, F16).
  - QUANTIZATION_TYPES is a constant record mapping quantization names to their detailed properties, including bitsPerWeight, qualityScore, and description. This is crucial for VRAM estimation and quality-based selection.
- ModelInfo:
  - Describes a specific LLM, including its id, name, size, parameterCount, huggingFaceRepo, defaultQuantization, supportedQuantizations, contextLength, license, and tags.
  - RECOMMENDED_MODELS is a constant record of pre-configured ModelInfo objects, serving as the module's internal registry of known models.
- ModelSize: A union type defining common LLM parameter counts (e.g., "7b", "13b").
- DownloadProgress: An interface for tracking the status of an ongoing model download, including downloadedBytes, totalBytes, percentage, speed, and eta.
- DownloadedModel: Represents a model that has been successfully downloaded and stored locally, including its id, path, quantization, sizeBytes, and downloadedAt timestamp.
- ModelHubConfig & DEFAULT_MODEL_HUB_CONFIG:
  - Configures the ModelHub instance, specifying the modelsDir (where models are stored), an optional hfToken for gated HuggingFace models, downloadTimeout, chunkSize for streaming, and autoSelectQuantization behavior.
  - DEFAULT_MODEL_HUB_CONFIG provides sensible defaults, typically storing models in ~/.codebuddy/models.
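The field lists above can be read as TypeScript shapes. The following is a minimal sketch, not the module's actual definitions: the field types, the units noted in comments, the optionality of hfToken, and every number in EXAMPLE_QUANTIZATION_TYPES are assumptions made for illustration.

```typescript
// Hypothetical shapes inferred from the documented field names.
// Types, units, and optionality are assumptions.
interface QuantizationType {
  bitsPerWeight: number;
  qualityScore: number;
  description: string;
}

type ModelSize = "7b" | "13b"; // only the two sizes named in the text

interface ModelInfo {
  id: string;
  name: string;
  size: ModelSize;
  parameterCount: number; // assumed unit: billions of parameters
  huggingFaceRepo: string;
  defaultQuantization: string;
  supportedQuantizations: string[];
  contextLength: number;
  license: string;
  tags: string[];
}

interface DownloadProgress {
  downloadedBytes: number;
  totalBytes: number;
  percentage: number;
  speed: number; // assumed unit: bytes per second
  eta: number; // assumed unit: seconds remaining
}

interface DownloadedModel {
  id: string;
  path: string;
  quantization: string;
  sizeBytes: number;
  downloadedAt: Date; // assumed representation of the timestamp
}

interface ModelHubConfig {
  modelsDir: string;
  hfToken?: string;
  downloadTimeout: number;
  chunkSize: number;
  autoSelectQuantization: boolean;
}

// Placeholder numbers for illustration only; the real table may differ.
const EXAMPLE_QUANTIZATION_TYPES: Record<string, QuantizationType> = {
  Q4_K_M: { bitsPerWeight: 4.5, qualityScore: 7, description: "4-bit, smaller and faster" },
  Q8_0: { bitsPerWeight: 8.5, qualityScore: 9, description: "8-bit, close to full quality" },
  F16: { bitsPerWeight: 16, qualityScore: 10, description: "16-bit floats, largest" },
};
```

Keeping quantization properties in one record lets both VRAM estimation and quality ranking read from the same source.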
The ModelHub Class
The ModelHub class is the central component of this module, extending EventEmitter to provide progress and status updates.
graph TD
subgraph "src/models"
ModelHub[ModelHub Class]
RECOMMENDED_MODELS(RECOMMENDED_MODELS)
QUANTIZATION_TYPES(QUANTIZATION_TYPES)
end
subgraph "External"
GPUMonitor[src/hardware/gpu-monitor.ts::getGPUMonitor]
HuggingFace[HuggingFace Hub]
FileSystem[Node.js fs/path/os]
Logger[src/utils/logger.js::logger]
end
ModelHub -- uses --> RECOMMENDED_MODELS
ModelHub -- uses --> QUANTIZATION_TYPES
ModelHub -- calls --> GPUMonitor
ModelHub -- fetches from --> HuggingFace
ModelHub -- interacts with --> FileSystem
ModelHub -- logs via --> Logger
ModelHub -- emits events --> DownloadProgress(DownloadProgress)
ModelHub -- emits events --> DownloadedModel(DownloadedModel)
Initialization and Local Model Scanning
- constructor(config?: Partial<ModelHubConfig>):
  - Initializes the ModelHub with a merged configuration (default + user-provided).
  - Ensures the modelsDir exists, creating it recursively if necessary.
  - Calls scanLocalModels() to discover any .gguf files already present in the modelsDir.
- scanLocalModels():
  - Reads the modelsDir and identifies .gguf files.
  - For each GGUF file, it extracts the modelId and quantization type from the filename using extractModelIdFromFilename and extractQuantizationFromFilename.
  - Stores the discovered models in an internal Map.
- extractModelIdFromFilename(filename: string): Parses a GGUF filename to derive the base model ID, removing quantization suffixes and file extensions.
- extractQuantizationFromFilename(filename: string): Identifies the quantization type present in a GGUF filename by matching against QUANTIZATION_TYPES.
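The filename parsing described above could look roughly like the sketch below. The matching strategy (case-insensitive substring search, longest name first) and the KNOWN_QUANTS list are assumptions; the real methods match against QUANTIZATION_TYPES and may handle more filename layouts.

```typescript
// Assumed subset of quantization names; the module reads these from QUANTIZATION_TYPES.
const KNOWN_QUANTS: string[] = ["Q4_K_M", "Q8_0", "F16"];

// Returns the first known quantization name found in the filename, or null.
function extractQuantizationFromFilename(filename: string): string | null {
  const upper = filename.toUpperCase();
  // Try longer names first so a longer name wins over a shorter one it contains.
  const longestFirst = [...KNOWN_QUANTS].sort((a, b) => b.length - a.length);
  for (const quant of longestFirst) {
    if (upper.includes(quant)) return quant;
  }
  return null;
}

// Strips the .gguf extension, the quantization name, and everything after it.
function extractModelIdFromFilename(filename: string): string {
  const base = filename.replace(/\.gguf$/i, "");
  const quant = extractQuantizationFromFilename(base);
  if (quant === null) return base;
  const index = base.toUpperCase().lastIndexOf(quant);
  // Also drop the separator (-, _ or .) that precedes the quantization name.
  return base.slice(0, index).replace(/[-_.]+$/, "");
}
```

For a hypothetical file named example-model-7b-Q4_K_M.gguf, this yields the model ID example-model-7b and the quantization Q4_K_M. Plain substring matching can produce false positives (for example, F16 inside an unrelated token), so a stricter pattern may be preferable in practice.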
Model Discovery and Listing
- listModels(): ModelInfo[]: Returns the complete list of RECOMMENDED_MODELS known to the hub.
- listDownloaded(): DownloadedModel[]: Returns an array of all models currently downloaded and managed by the hub.
- getModelInfo(modelId: string): ModelInfo | null: Retrieves detailed ModelInfo for a given modelId from RECOMMENDED_MODELS.
- getDownloaded(fileNameOrId: string): DownloadedModel | null: Retrieves DownloadedModel information, searching by exact filename or partial model ID.
- formatModelList(): string: Generates a human-readable string summarizing available and downloaded models, indicating their status and basic information.
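The two-step lookup that getDownloaded performs (exact filename first, then partial model ID) can be sketched as follows. The Map keyed by filename and the helper name findDownloaded are assumptions for illustration.

```typescript
interface DownloadedModel {
  id: string;
  path: string;
  quantization: string;
  sizeBytes: number;
  downloadedAt: Date; // assumed representation of the timestamp
}

// Hypothetical helper: assumes downloaded models are stored in a Map keyed by filename.
function findDownloaded(
  models: Map<string, DownloadedModel>,
  fileNameOrId: string
): DownloadedModel | null {
  // 1. Exact filename match.
  const exact = models.get(fileNameOrId);
  if (exact !== undefined) return exact;
  // 2. Fall back to the first model whose ID contains the query.
  for (const model of models.values()) {
    if (model.id.includes(fileNameOrId)) return model;
  }
  return null;
}
```

With a partial ID, the first match in insertion order wins, so callers that keep several quantizations of one model may prefer passing the exact filename.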
Intelligent Model Selection
This module integrates with the gpu-monitor to make informed decisions about model and quantization selection.
- getRecommendedModel(useCase: "code" | "general" | "fast" = "code"): Promise:
  - Fetches available VRAM using getGPUMonitor().getStats().
  - Filters RECOMMENDED_MODELS based on the specified useCase tags.
  - Sorts candidates by parameter count (preferring larger models that fit).
  - Uses estimateVRAM() to check if a model (with Q4_K_M quantization) fits within 90% of available VRAM.
  - Returns the largest fitting model, or the smallest if none fit.
- estimateVRAM(model: ModelInfo, quantization: string): number:
  - Calculates an approximate VRAM usage in MB for a given model and quantization.
  - The formula considers parameterCount, bitsPerWeight from QUANTIZATION_TYPES, and adds an overhead for the KV cache based on contextLength.
- selectQuantization(model: ModelInfo, targetVRAM?: number): Promise:
  - If autoSelectQuantization is enabled in the config, this method determines the highest quality quantization (QUANTIZATION_TYPES.qualityScore) from model.supportedQuantizations that fits within the targetVRAM (or 85% of detected free VRAM).
  - It iterates from highest to lowest quality, using estimateVRAM() for each.
  - If no quantization fits, it defaults to the lowest quality supported.
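The selection logic above can be sketched end to end. Everything numeric here is an assumption: the quantization table holds placeholder values, parameterCount is taken to be in billions, and KV_MB_PER_1K_TOKENS is a hypothetical stand-in for whatever KV-cache term the real estimateVRAM uses. The functions are synchronous and take VRAM as a parameter, whereas the real methods query the GPU monitor.

```typescript
interface Quant {
  bitsPerWeight: number;
  qualityScore: number;
}

interface Model {
  id: string;
  parameterCount: number; // assumed unit: billions of parameters
  contextLength: number;
  supportedQuantizations: string[];
}

// Placeholder numbers for illustration; not the module's real table.
const QUANTS: Record<string, Quant> = {
  Q4_K_M: { bitsPerWeight: 4.5, qualityScore: 7 },
  Q8_0: { bitsPerWeight: 8.5, qualityScore: 9 },
  F16: { bitsPerWeight: 16, qualityScore: 10 },
};

const KV_MB_PER_1K_TOKENS = 128; // hypothetical KV-cache overhead per 1024 tokens

// Approximate VRAM in MB: weight storage plus a context-dependent overhead.
function estimateVRAM(model: Model, quantization: string): number {
  const quant: Quant | undefined = QUANTS[quantization];
  if (quant === undefined) throw new Error(`Unknown quantization: ${quantization}`);
  const weightsMB = (model.parameterCount * 1e9 * quant.bitsPerWeight) / 8 / (1024 * 1024);
  const kvCacheMB = (model.contextLength / 1024) * KV_MB_PER_1K_TOKENS;
  return Math.ceil(weightsMB + kvCacheMB);
}

// Highest-quality quantization that fits targetVRAM, else the lowest-quality one.
function selectQuantization(model: Model, targetVRAM: number): string {
  const byQuality = model.supportedQuantizations
    .filter((name) => Object.prototype.hasOwnProperty.call(QUANTS, name))
    .sort((a, b) => QUANTS[b].qualityScore - QUANTS[a].qualityScore);
  if (byQuality.length === 0) throw new Error("No known quantization is supported");
  const fitting = byQuality.find((name) => estimateVRAM(model, name) <= targetVRAM);
  return fitting ?? byQuality[byQuality.length - 1];
}

// Largest model whose Q4_K_M estimate fits in 90% of available VRAM, else the smallest.
function pickModel(candidates: Model[], availableVRAM: number): Model {
  if (candidates.length === 0) throw new Error("No candidate models");
  const largestFirst = [...candidates].sort((a, b) => b.parameterCount - a.parameterCount);
  const fitting = largestFirst.find((m) => estimateVRAM(m, "Q4_K_M") <= availableVRAM * 0.9);
  return fitting ?? largestFirst[largestFirst.length - 1];
}
```

A caller following the documented default would pass 85% of free VRAM as the target, for example selectQuantization(model, freeVRAM * 0.85). Falling back to the smallest option instead of failing means a download can still proceed on constrained hardware, at the cost of possibly spilling out of VRAM.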
Model Download Management
The ModelHub handles the entire download process, including resolving file URLs and tracking progress.
- download(modelId: string, quantization?: string): Promise:
  - Retrieves ModelInfo for the modelId.
  - Calls selectQuantization() if quantization is not explicitly provided.
  - Checks if the model is already downloaded.
  - Constructs the local filePath and calls resolveDownloadUrl() to get the actual HuggingFace download link.
  - Emits a download:start event.
  - Calls downloadFile() to perform the actual download.
  - On completion, updates downloadedModels and emits download:complete.
  - Emits download:error if any issue occurs.
- resolveDownloadUrl(model: ModelInfo, quantization: string): Promise:
  - This is a critical method for robust HuggingFace integration.
  - It attempts to construct the download URL using common GGUF filename patterns (e.g., model-id-quant.gguf).
  - It performs HEAD requests to verify if these URLs are valid.
  - As a fallback, it queries the HuggingFace API (/api/models/{repo}/tree/main) to list files in the repository and find a matching GGUF file.
  - This ensures flexibility in case of varying filename conventions on HuggingFace.
- downloadFile(url: string, filePath: string, fileName: string): Promise:
  - Performs the actual HTTP download using fetch.
  - Supports hfToken for authenticated downloads.
  - Streams the response body to fs.createWriteStream().
  - Continuously calculates and emits download:progress events.
  - Handles errors by unlinking partially downloaded files.
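The first stage of URL resolution (try a few filename patterns and keep the first one that exists) can be sketched as below. The specific patterns, the resolve/main URL layout, and the injected urlExists callback are assumptions; the real method issues HEAD requests itself and falls back to the repository tree listing when no pattern matches.

```typescript
// Injected existence check so the sketch runs without network access.
// In practice this could wrap a HEAD request made with fetch.
type UrlCheck = (url: string) => Promise<boolean>;

// Assumed filename patterns; real repositories vary in naming.
function candidateFileNames(modelId: string, quantization: string): string[] {
  const quantLower = quantization.toLowerCase();
  return [
    `${modelId}-${quantization}.gguf`,
    `${modelId}.${quantization}.gguf`,
    `${modelId}-${quantLower}.gguf`,
    `${modelId}.${quantLower}.gguf`,
  ];
}

// Returns the first candidate URL that exists, or null if none does.
async function resolveCandidateUrl(
  repo: string,
  modelId: string,
  quantization: string,
  urlExists: UrlCheck
): Promise<string | null> {
  for (const fileName of candidateFileNames(modelId, quantization)) {
    // Assumed URL layout for files on the main branch of a repository.
    const url = `https://huggingface.co/${repo}/resolve/main/${fileName}`;
    if (await urlExists(url)) return url;
  }
  return null; // the caller would fall back to listing the repository tree
}
```

Probing candidates in order keeps the common case to one or two lightweight requests, and the null result gives the caller a clear signal to try the slower listing fallback.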
Model Deletion
- delete(fileName: string): boolean:
  - Removes a downloaded model file from the file system.
  - Updates the internal downloadedModels map.
  - Emits a delete event.
Configuration
- getConfig(): ModelHubConfig: Returns a copy of the current configuration.
- updateConfig(config: Partial<ModelHubConfig>): void: Merges new configuration properties into the existing one.
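The copy-on-read and merge-on-write behavior described here can be illustrated with a small stand-in class. ConfigHolder, HubConfig, and the numeric defaults are hypothetical; only the field names and the default models directory come from this document.

```typescript
interface HubConfig {
  modelsDir: string;
  hfToken?: string;
  downloadTimeout: number;
  chunkSize: number;
  autoSelectQuantization: boolean;
}

// Placeholder defaults; the timeout and chunk size values are assumptions.
const EXAMPLE_DEFAULTS: HubConfig = {
  modelsDir: "~/.codebuddy/models",
  downloadTimeout: 60000,
  chunkSize: 1024 * 1024,
  autoSelectQuantization: true,
};

// Hypothetical stand-in for the configuration handling inside ModelHub.
class ConfigHolder {
  private config: HubConfig;

  constructor(defaults: HubConfig, overrides: Partial<HubConfig> = {}) {
    // User-provided values win over defaults.
    this.config = { ...defaults, ...overrides };
  }

  // Returns a shallow copy so callers cannot mutate internal state.
  getConfig(): HubConfig {
    return { ...this.config };
  }

  // Merges only the provided keys into the existing configuration.
  updateConfig(patch: Partial<HubConfig>): void {
    this.config = { ...this.config, ...patch };
  }
}
```

Returning a copy keeps later updateConfig calls as the only way to change behavior, which makes configuration changes easier to trace.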
Event Emitter
As an EventEmitter, ModelHub emits the following events:
- download:start: { modelId: string, fileName: string, quantization: string }
- download:progress: DownloadProgress
- download:complete: DownloadedModel
- download:error: { modelId: string, error: Error }
- delete: { fileName: string }
Developers can subscribe to these events to provide UI feedback or react to download lifecycle changes.
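A subscription might look like the sketch below. FakeHub is a hypothetical stand-in that only replays the documented event names and payload shapes, and toProgress shows one plausible way to derive percentage, speed, and eta; the units (bytes per second, seconds) are assumptions.

```typescript
import { EventEmitter } from "node:events";

interface DownloadProgress {
  downloadedBytes: number;
  totalBytes: number;
  percentage: number;
  speed: number; // assumed unit: bytes per second
  eta: number; // assumed unit: seconds remaining
}

// Derives a progress snapshot from raw byte counts and elapsed time.
function toProgress(downloadedBytes: number, totalBytes: number, elapsedSeconds: number): DownloadProgress {
  const speed = elapsedSeconds > 0 ? downloadedBytes / elapsedSeconds : 0;
  const percentage = totalBytes > 0 ? (downloadedBytes / totalBytes) * 100 : 0;
  const eta = speed > 0 ? (totalBytes - downloadedBytes) / speed : 0;
  return { downloadedBytes, totalBytes, percentage, speed, eta };
}

// Stand-in for ModelHub; it performs no I/O and only emits events.
class FakeHub extends EventEmitter {
  // chunkSize must be positive.
  simulateDownload(totalBytes: number, chunkSize: number): void {
    this.emit("download:start", {
      modelId: "example-model",
      fileName: "example-model-Q4_K_M.gguf",
      quantization: "Q4_K_M",
    });
    let downloaded = 0;
    let elapsed = 0;
    while (downloaded < totalBytes) {
      downloaded = Math.min(downloaded + chunkSize, totalBytes);
      elapsed += 1; // pretend each chunk takes one second
      this.emit("download:progress", toProgress(downloaded, totalBytes, elapsed));
    }
    this.emit("download:complete", {
      id: "example-model",
      path: "/models/example-model-Q4_K_M.gguf",
      quantization: "Q4_K_M",
      sizeBytes: totalBytes,
      downloadedAt: new Date(),
    });
  }
}

const hub = new FakeHub();
const percentages: number[] = [];
let completed = false;
hub.on("download:progress", (p: DownloadProgress) => {
  percentages.push(Math.round(p.percentage));
});
hub.on("download:complete", () => {
  completed = true;
});
hub.simulateDownload(400, 100);
```

Against the real ModelHub, the same on calls would be registered before calling download so that no early event is missed.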
Utility Formatting
- formatRecommendations(): Promise<string>: Generates a formatted string showing VRAM recommendations for every model returned by listModels(), indicating which quantizations fit the current system's VRAM.
Singleton Access
The module provides a singleton pattern for ModelHub to ensure a single, consistent instance across the application.
- getModelHub(config?: Partial<ModelHubConfig>): ModelHub:
  - Returns the existing ModelHub instance if one exists.
  - If not, it creates a new ModelHub instance with the provided config (or defaults) and returns it.
- resetModelHub(): void:
  - Disposes of the current ModelHub instance (removing all event listeners).
  - Sets the singleton instance to null, allowing a new instance to be created on the next getModelHub call. This is primarily useful for testing or re-initialization scenarios.
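The accessor pair can be sketched as a lazily created module-level instance. Hub, HubConfig, getHub, and resetHub are hypothetical stand-ins for ModelHub, ModelHubConfig, getModelHub, and resetModelHub, reduced to the parts that matter for the pattern.

```typescript
import { EventEmitter } from "node:events";

interface HubConfig {
  modelsDir: string;
}

const DEFAULTS: HubConfig = { modelsDir: "~/.codebuddy/models" };

// Minimal stand-in for ModelHub.
class Hub extends EventEmitter {
  readonly config: HubConfig;

  constructor(config: HubConfig) {
    super();
    this.config = config;
  }
}

let instance: Hub | null = null;

// Creates the instance on first use; later calls return the same object.
function getHub(config: Partial<HubConfig> = {}): Hub {
  if (instance === null) {
    instance = new Hub({ ...DEFAULTS, ...config });
  }
  return instance;
}

// Drops all listeners and clears the instance so the next getHub call starts fresh.
function resetHub(): void {
  if (instance !== null) {
    instance.removeAllListeners();
    instance = null;
  }
}
```

One consequence of this pattern is that a config passed to a later getHub call is ignored while an instance already exists; in this sketch, applying new settings requires resetHub first.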
External Integrations
The src/models module relies on several external components:
- src/hardware/gpu-monitor.ts: Crucial for getRecommendedModel and selectQuantization to determine available VRAM. It calls getGPUMonitor().initialize() and getGPUMonitor().getStats().
- HuggingFace Hub: The primary source for model downloads. The module interacts with HuggingFace via fetch requests to resolve file URLs and download model binaries.
- Node.js fs, path, os: Used extensively for file system operations (creating directories, reading files, writing streams, deleting files) and determining the user home directory for default model storage.
- src/utils/logger.js: For structured logging of events and errors within the module.
Contributing and Extending
Developers looking to contribute to this module should be familiar with:
- Asynchronous programming with async/await: Heavily used for network requests and file I/O.
- EventEmitter pattern: For handling download progress and status updates.
- File system operations: Understanding fs module functions.
- HuggingFace Hub API/conventions: Especially for resolveDownloadUrl logic.
- GGUF quantization types: To accurately estimate VRAM and manage model quality.
To add new recommended models, update the RECOMMENDED_MODELS constant in model-hub.ts with the appropriate ModelInfo. Ensure the huggingFaceRepo and supportedQuantizations are accurate for the new model.
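Before adding an entry, a quick consistency check can catch the most common mistakes. The sketch below is hypothetical: validateModelInfo is not part of the module, the ModelInfo types are inferred from the documented field names, and exampleEntry uses placeholder values, not a real model or repository.

```typescript
interface ModelInfo {
  id: string;
  name: string;
  size: string;
  parameterCount: number; // assumed unit: billions of parameters
  huggingFaceRepo: string;
  defaultQuantization: string;
  supportedQuantizations: string[];
  contextLength: number;
  license: string;
  tags: string[];
}

// Assumed subset of quantization names; the module defines these in QUANTIZATION_TYPES.
const KNOWN_QUANTIZATIONS: string[] = ["Q4_K_M", "Q8_0", "F16"];

// Hypothetical helper: returns a list of problems, empty when the entry looks consistent.
function validateModelInfo(model: ModelInfo): string[] {
  const problems: string[] = [];
  if (!/^[^\/\s]+\/[^\/\s]+$/.test(model.huggingFaceRepo)) {
    problems.push("huggingFaceRepo should have the form owner/name");
  }
  if (model.supportedQuantizations.length === 0) {
    problems.push("supportedQuantizations must not be empty");
  }
  for (const quant of model.supportedQuantizations) {
    if (!KNOWN_QUANTIZATIONS.includes(quant)) {
      problems.push(`unknown quantization: ${quant}`);
    }
  }
  if (!model.supportedQuantizations.includes(model.defaultQuantization)) {
    problems.push("defaultQuantization must be one of supportedQuantizations");
  }
  return problems;
}

// Placeholder entry for illustration only.
const exampleEntry: ModelInfo = {
  id: "example-model-7b",
  name: "Example Model 7B",
  size: "7b",
  parameterCount: 7,
  huggingFaceRepo: "example-org/example-model-7b-GGUF",
  defaultQuantization: "Q4_K_M",
  supportedQuantizations: ["Q4_K_M", "Q8_0"],
  contextLength: 4096,
  license: "apache-2.0",
  tags: ["general"],
};
```

A check like this only verifies internal consistency; whether the repository actually hosts the listed GGUF files still has to be confirmed against HuggingFace.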