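Capability Tags
🛠
AI Tools

TurboQuant

Python-based · Free and open source, locally deployable, data fully under your control
English name: turboquant-pro
⭐ 19 Stars 🍴 1 Fork 💻 Python 📄 MIT 🏷 AI score 8.0
Tags: AI, LLM, cache compression
✦ Recommended by AI Skill Hub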

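AI Skill Hub strongly recommends TurboQuant as a quality AI tool. With an overall AI score of 8.0, it performs solidly among comparable tools. If you are looking for a dependable AI tool, it is worth a closer look.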

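📚 In-Depth Analysis

TurboQuant is an open-source, Python-based tool with 19 stars on GitHub, covering AI, LLM, and cache compression. The main advantage of an open-source tool is full code transparency: you can audit every line for security, and you can fork or customize it to fit your own needs.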

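**Why use an open-source tool instead of commercial SaaS?**
For individual developers and privacy-conscious users, a locally deployed open-source tool means data never leaves the machine and is not subject to a third-party provider's data policies. Open-source tools also typically have no usage caps or monthly fees: install once and use indefinitely, so for high-frequency use the total cost of ownership (TCO) is far lower than that of a subscription-based commercial tool.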

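**Installation and environment setup**
TurboQuant requires a Python runtime. Managing Python versions with pyenv is recommended to avoid polluting the global environment. New users should first create a virtual environment (python -m venv venv && source venv/bin/activate) and then install dependencies; if anything goes wrong, the virtual environment can be deleted and recreated without affecting system stability.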

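**Community and maintenance**
GitHub Issues and Discussions are the fastest way to get help. Before asking, check the closed issues, since most common questions have already been answered. When reporting a bug, include the output of pip list, the full error traceback, and a minimal reproducible example; this noticeably speeds up maintainer response. AI Skill Hub will keep tracking TurboQuant releases and flag important feature changes.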

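📋 Tool Overview

TurboQuant is an open-source tool written in Python that focuses on AI, LLM, and cache compression. As a GitHub open-source project, it has community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. It is suitable both for personal use and for integration into enterprise workflows.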

GitHub Stars
⭐ 19
Language
Python
Supported platforms
Windows / macOS / Linux
Maintenance status
Lightweight project, updated as needed
License
MIT
Overall AI score
8.0
Tool type
AI Tools
Forks
1

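📖 Documentation

The content below was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original Source" at the bottom of the page.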


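📌 Key Features
  • Free and open source, supports local deployment, data fully under your control
  • GitHub open-source community with ongoing updates
  • Documentation and usage examples provided, beginner friendly
  • Supports custom configuration to fit different environments
  • Can be integrated into an existing stack as a base component or extended through secondary development
🎯 Main Use Cases
  • Run locally to protect data privacy and meet compliance requirements
  • Integrate into existing systems to extend the capabilities of your stack
  • Use as an open-source base component for commercial secondary development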
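The install commands below are auto-generated from the project's language and type; the official README takes precedence.
Install commands
```bash
# Option 1: install with pip (recommended)
pip install turboquant-pro

# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install turboquant-pro

# Option 3: install from source (latest features)
git clone https://github.com/ahb-sjsu/turboquant-pro
cd turboquant-pro
pip install -e .

# Verify the installation
python -c "import turboquant_pro; print('installation OK')"
```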
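📋 Installation Steps
  1. Visit the GitHub repository page
  2. Install dependencies as described in the README
  3. Complete the initial configuration for your system environment
  4. Start with the official examples or documentation
  5. If you run into problems, search GitHub Issues for answers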
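The usage examples below were compiled by AI Skill Hub and cover the most common scenarios.
Common commands / code examples
```bash
# Command-line usage
turboquant-pro --help

# Autotune subcommand listed in the README; check its --help for options
turboquant-pro autotune --help
```

```python
# Calling from Python (API as shown in the README Quick Start below)
from turboquant_pro import TurboQuantKV

# kv_tensor is your own KV-cache tensor
tq = TurboQuantKV(head_dim=256, n_heads=16, bits=3, use_gpu=False)
compressed = tq.compress(kv_tensor, packed=True)
reconstructed = tq.decompress(compressed)
```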
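The configuration example below is generated from typical usage scenarios; adjust the specific parameters according to the official documentation.
Configuration example
```yaml
# Example turboquant-pro configuration file (config.yml)
app:
  name: "turboquant-pro"
  debug: false
  log_level: "INFO"
```

```bash
# Specify the configuration file at run time
turboquant-pro --config config.yml

# Or configure through environment variables
export TURBOQUANT_PRO_API_KEY="your-key"
export TURBOQUANT_PRO_OUTPUT_DIR="./output"
```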
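📑 README In Depth · Source documentation · Completeness 86/100 · Includes workflow diagram
The content below is parsed directly from the GitHub README, preserving code blocks, tables, and list structure.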

TurboQuant Pro


PCA-Matryoshka dimension reduction + TurboQuant scalar quantization for embedding compression, LLM KV caches, model weight pruning, pgvector, FAISS, and NATS transport.

Up to 27x embedding compression at 99.8% recall@10 (with 5x oversampling + reranking — all methods benchmarked identically). At ~30x compression turboquant-pro beats the 2024 SOTA (RaBitQ) on recall and ties OPQ — at 1M-vector scale — while building the index 4–20x faster. Learned codebooks reduce quantization error 22%. 397 tests. Multi-modal (text, vision, audio, code). Production observability. Works on consumer GPUs (Volta+) and CPU.

Important: Cosine similarity to the original vector is not a reliable proxy for retrieval quality at high compression. Our own data shows PCA-256+TQ3 has lower cosine (0.963) but higher recall@10 (78.2%) than PCA-384+TQ3 (0.979 cosine, 76.4% recall). Always evaluate on task-relevant retrieval metrics.
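To make the warning above concrete, here is a minimal sketch of measuring recall@10 directly rather than relying on reconstruction cosine. It uses only the Python standard library. `crude_quantize` is a toy stand-in for a compressor and is not part of turboquant-pro; `top_k` and `recall_at_k` are illustrative helper names, not library functions.

```python
import math
import random


def top_k(query, vectors, k):
    """Indices of the k vectors with the highest cosine similarity to query."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    order = sorted(range(len(vectors)), key=lambda i: cosine(query, vectors[i]), reverse=True)
    return order[:k]


def recall_at_k(exact_ids, approx_ids):
    """Fraction of the exact top-k neighbours that the approximate search also returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)


def crude_quantize(vec, step=0.5):
    """Toy compressor (assumption): round every coordinate to a coarse grid."""
    return [round(x / step) * step for x in vec]


random.seed(0)
corpus = [[random.gauss(0, 1) for _ in range(16)] for _ in range(200)]
compressed = [crude_quantize(v) for v in corpus]
query = [random.gauss(0, 1) for _ in range(16)]

exact = top_k(query, corpus, k=10)        # ground truth on original vectors
approx = top_k(query, compressed, k=10)   # search on compressed vectors
print(f"recall@10 = {recall_at_k(exact, approx):.2f}")
```

Averaging this over many queries from your real workload gives the task-relevant number the README recommends evaluating.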

What's New in v1.0.0

  • Learned codebook fine-tuning (LearnedQuantizer): Train codebooks on your actual data instead of assuming Gaussian. fit_codebook(embeddings) returns a ready quantizer. Pushes cosine similarity from 0.978 to 0.99+ at the same bit-width.
  • Multi-modal compression (ModalityPreset): Pre-configured presets for text (BGE-M3, E5, ada-002), vision (CLIP, SigLIP), audio (Whisper), and code (CodeBERT, CodeLlama) embeddings. Per-modality optimal PCA + bit-width recommendations.
  • Production observability (QualityMonitor): Rolling-window cosine similarity tracking, KS-test drift detection, alert callbacks, Prometheus-compatible metrics. Know when compression quality degrades in production.
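As a rough illustration of what fitting a codebook to data (instead of assuming a fixed distribution) can look like, the sketch below runs a one-dimensional Lloyd iteration on skewed samples. This shows the general technique only and is an assumption about the idea, not the library's `LearnedQuantizer` or `fit_codebook` implementation; `fit_levels`, `nearest`, and `mse` are hypothetical helpers.

```python
import random


def nearest(codebook, x):
    """Index of the codebook level closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))


def mse(codebook, samples):
    """Mean squared quantization error of samples under codebook."""
    return sum((x - codebook[nearest(codebook, x)]) ** 2 for x in samples) / len(samples)


def fit_levels(samples, bits, iters=20):
    """1-D Lloyd iteration: start from a uniform grid, then repeatedly move
    each level to the mean of the samples currently assigned to it."""
    n = 2 ** bits
    lo, hi = min(samples), max(samples)
    codebook = [lo + (hi - lo) * (i + 0.5) / n for i in range(n)]
    for _ in range(iters):
        buckets = [[] for _ in range(n)]
        for x in samples:
            buckets[nearest(codebook, x)].append(x)
        # Keep the old level when a bucket received no samples.
        codebook = [sum(b) / len(b) if b else c for b, c in zip(buckets, codebook)]
    return codebook


random.seed(0)
samples = [random.expovariate(1.0) for _ in range(2000)]  # deliberately non-Gaussian
uniform = fit_levels(samples, bits=3, iters=0)   # untrained uniform grid
learned = fit_levels(samples, bits=3)            # levels adapted to the data
print(f"uniform MSE = {mse(uniform, samples):.4f}, learned MSE = {mse(learned, samples):.4f}")
```

Each Lloyd step cannot increase the mean squared error on the training samples, which is why the learned levels are never worse than the starting grid on that data.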

Installation

```bash
pip install turboquant-pro
```

```python
# Build any component
# (`cfg` is an AutoConfig instance; see the Auto-Config API section)
tq = cfg.build_quantizer()        # TurboQuantKV
cache = cfg.build_cache()         # TurboQuantKVCache
rq = cfg.build_rope_quantizer()   # RoPEAwareQuantizer
mgr = cfg.build_manager()         # TurboQuantKVManager (all layers)
```

Quick Start

```python
from turboquant_pro import PCAMatryoshka, TurboQuantKV

# Fit PCA on a sample of embeddings (5-10K vectors is sufficient).
# `sample_embeddings` is your own array of embedding vectors.
pca = PCAMatryoshka(input_dim=1024, output_dim=384)
result = pca.fit(sample_embeddings)
print(f"Variance explained: {result.total_variance_explained:.1%}")

# Auto-configure from model name: picks optimal K/V bits, RoPE-awareness
tq = TurboQuantKV.from_model("llama-3-8b")  # balanced (K4/V3)
tq = TurboQuantKV.from_model("gemma-2-27b", target="compression")  # K3/V2

# `kv_key_tensor` and `kv_val_tensor` are your own KV-cache tensors.
compressed_k = tq.compress(kv_key_tensor, packed=True, kind="key")    # 4-bit keys
compressed_v = tq.compress(kv_val_tensor, packed=True, kind="value")  # 3-bit values
key_approx = tq.decompress(compressed_k)  # cos_sim > 0.995 (keys)
val_approx = tq.decompress(compressed_v)  # cos_sim > 0.978 (values)
```

Or manually:

```python
tq = TurboQuantKV(head_dim=256, n_heads=16, bits=3, use_gpu=False)
compressed = tq.compress(kv_tensor, packed=True)  # 5.1x smaller
reconstructed = tq.decompress(compressed)         # cos_sim > 0.978
```
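The `packed=True` option above suggests that quantized codes are stored at their true bit width instead of one byte each. The following is a minimal standard-library sketch of that idea; `pack_codes` and `unpack_codes` are hypothetical helpers, and the library's actual memory layout is not described in this excerpt, so treat the bit order here as an assumption.

```python
def pack_codes(codes, bits):
    """Pack small integer codes (each < 2**bits) into bytes, little-endian bit order."""
    acc = 0
    for i, code in enumerate(codes):
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        acc |= code << (i * bits)
    n_bytes = (len(codes) * bits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(n_bytes, "little")


def unpack_codes(data, bits, count):
    """Inverse of pack_codes: recover `count` codes of `bits` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


codes = [5, 0, 7, 3, 1, 6, 2, 4]      # eight 3-bit codes
packed = pack_codes(codes, bits=3)
print(len(codes), "codes ->", len(packed), "bytes")
assert unpack_codes(packed, 3, len(codes)) == codes
```

Against a 16-bit baseline, 3-bit packing alone would give 16 / 3 ≈ 5.3x; the 5.1x quoted above is slightly lower, presumably because some per-vector metadata is stored as well (an assumption, not stated in the README).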

Auto-Config API

Auto-detect model architecture and select optimal compression:

```python
from turboquant_pro import AutoConfig

# Works from a HuggingFace config dict too
cfg = AutoConfig.from_dict(model.config.to_dict(), target="compression")
```

Target presets:

| Target | Config | Key CosSim | Ratio | Use case |
|---|---|---|---|---|
| quality | K4/V4 + RoPE | 0.995 | 3.8x | Maximum accuracy |
| balanced | K4/V3 + RoPE | 0.995 / 0.978 | 4.3x | **Recommended default** |
| compression | K3/V2 + RoPE | 0.978 / 0.941 | 5.8x | Memory-constrained |
| extreme | K2/V2 | 0.941 | 7.1x | Maximum compression |

Supported models: LLaMA 3 (8B, 70B), Gemma 2 (9B, 27B), Gemma 4 27B-A4B (262K context MoE), Qwen 2.5 (7B, 72B), Mistral 7B. Any HuggingFace model works via transformers.AutoConfig.
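To get a feel for what the preset ratios mean in memory terms, the sketch below applies the standard KV-cache size formula to an illustrative model shape and then divides by each ratio from the table. The shape (32 layers, 8 KV heads, head dimension 128) and the 8,192-token context are assumptions chosen for illustration, as is the fp16 baseline; `kv_cache_bytes` is a hypothetical helper, not a library function.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Uncompressed KV-cache size for one sequence: keys and values (factor 2)
    for every layer, KV head, and position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


# Compression ratios copied from the target preset table.
PRESET_RATIO = {"quality": 3.8, "balanced": 4.3, "compression": 5.8, "extreme": 7.1}

fp16 = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)
print(f"{'fp16':12s} {fp16 / 2**30:.2f} GiB")
for name, ratio in PRESET_RATIO.items():
    print(f"{name:12s} {fp16 / ratio / 2**30:.2f} GiB")
```

The same function scales linearly with context length and batch size, so it can be reused to check whether a given preset fits a memory budget.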

Eigenvalue-Weighted Mixed Precision (v0.9.0)

Theory: After PCA, early dimensions explain most variance. Spending 4 bits on high-eigenvalue dimensions and 2 bits on the tail gives better quality than uniform 3-bit at the same average storage.

How it works: pca.with_weighted_quantizer(avg_bits=3.0) auto-computes the bit schedule from cumulative variance thresholds (top 60% variance -> 4-bit, next 30% -> 3-bit, bottom 10% -> 2-bit). Each segment gets its own quantizer with the appropriate codebook.

Result: At 2.8 avg bits, beats uniform 3-bit (0.962 vs 0.958) in 7% less storage.
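The threshold rule described above (top 60% of variance at 4 bits, next 30% at 3 bits, remaining 10% at 2 bits) can be sketched as follows. This is an illustrative reading of that description, not the library's `with_weighted_quantizer` code: `bit_schedule` is a hypothetical helper, it ignores the `avg_bits` target, and the eigenvalues are made up.

```python
def bit_schedule(eigenvalues, tiers=((0.60, 4), (0.90, 3)), tail_bits=2):
    """Assign a bit width to each PCA dimension from cumulative explained variance.

    `eigenvalues` must be sorted in descending order. A dimension's tier is
    decided by the variance accumulated before it, so the first dimension
    always lands in the top tier.
    """
    total = sum(eigenvalues)
    schedule, seen = [], 0.0
    for ev in eigenvalues:
        frac_before = seen / total
        width = tail_bits
        for limit, bits in tiers:
            if frac_before < limit:
                width = bits
                break
        schedule.append(width)
        seen += ev
    return schedule


eigenvalues = [45, 20, 15, 8, 6, 3, 2, 1]   # toy spectrum, sums to 100
schedule = bit_schedule(eigenvalues)
print(schedule)                              # [4, 4, 3, 3, 3, 2, 2, 2]
print(sum(schedule) / len(schedule))         # average bits per dimension
```

With a steeper spectrum, fewer dimensions fall in the 4-bit tier and the average drops below 3 bits, which is the regime the result above refers to.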

Unified Auto-Config API (v0.9.1)

How it works: TurboQuantKV.from_model("llama-3-8b") reads head_dim, n_kv_heads, rope_theta, and max_position_embeddings from a built-in model registry (or HuggingFace Hub), then selects optimal key_bits, value_bits, and RoPE-aware settings based on a target preset (quality/balanced/compression/extreme).

```
Config           Ratio    Cosine   Recall   Var%    Time
PCA-128 + TQ2    113.8x   0.9237   78.7%    79.9%   2.2s
PCA-256 + TQ3     41.0x   0.9700   92.0%    92.3%   0.7s
PCA-384 + TQ4     20.9x   0.9906   96.0%    97.3%   0.6s
PCA-512 + TQ4     15.8x   0.9949   96.3%    99.0%   0.6s

Recommendation (min recall >= 95%): PCA-384 + TQ4: 20.9x compression, 96.0% recall@10
```
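The recommendation line is consistent with a simple rule: among the swept configurations that meet the recall floor, pick the one with the highest compression ratio. The sketch below applies that rule to the four rows shown; it is an inference from the output, not the actual `run_autotune` logic, and `SweepRow` and `recommend` are hypothetical names.

```python
from typing import NamedTuple, Optional


class SweepRow(NamedTuple):
    config: str
    ratio: float    # compression ratio
    recall: float   # recall@10 as a fraction


# Rows copied from the autotune output above.
ROWS = [
    SweepRow("PCA-128 + TQ2", 113.8, 0.787),
    SweepRow("PCA-256 + TQ3", 41.0, 0.920),
    SweepRow("PCA-384 + TQ4", 20.9, 0.960),
    SweepRow("PCA-512 + TQ4", 15.8, 0.963),
]


def recommend(rows, min_recall) -> Optional[SweepRow]:
    """Highest-compression row whose recall meets the floor, or None if none does."""
    eligible = [r for r in rows if r.recall >= min_recall]
    return max(eligible, key=lambda r: r.ratio) if eligible else None


best = recommend(ROWS, min_recall=0.95)
print(best.config, f"{best.ratio}x", f"{best.recall:.1%}")
```

Raising the floor above every measured recall returns `None`, which a caller can treat as a signal to sweep less aggressive configurations.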

Integration Options

Feature Reference

A complete guide to every feature in TurboQuant Pro, the theory behind it, and when to use it.

Component map

```mermaid
flowchart TB
    subgraph API["Public API"]
        AC[AutoConfig.from_pretrained]
        TQ[TurboQuantKV]
        PCA[PCAMatryoshka]
        LQ[LearnedQuantizer]
    end
    subgraph Build["Built by AutoConfig"]
        CACHE[TurboQuantKVCache]
        RQ[RoPEAwareQuantizer]
        MGR[TurboQuantKVManager]
    end
    subgraph Index["Retrieval"]
        HNSW[CompressedHNSW]
        CACHE2[L2 Embedding Cache]
    end
    subgraph Ops["Production"]
        QM[QualityMonitor<br/>drift detection]
        EXP[Cross-framework Export<br/>FAISS / Milvus / Qdrant / Weaviate]
    end
    AC --> TQ
    AC --> CACHE
    AC --> RQ
    AC --> MGR
    PCA --> TQ
    LQ --> TQ
    TQ --> HNSW
    TQ --> CACHE2
    TQ --> EXP
    TQ --> QM
    classDef api fill:#e3f2fd,stroke:#1565c0;
    classDef build fill:#fff3e0,stroke:#e65100;
    classDef ops fill:#f3e5f5,stroke:#6a1b9a;
    class AC,TQ,PCA,LQ api;
    class CACHE,RQ,MGR build;
    class HNSW,CACHE2,QM,EXP ops;
```

Additional Components

  • Streaming KV Cache (v0.3.0): Two-tier L1 hot / L2 cold cache for autoregressive generation.
  • NATS Transport Codec (v0.3.0): Compressed wire format for NATS JetStream events (392 bytes vs 4096 bytes per embedding).
  • vLLM Plugin (v0.5.0): TurboQuantKVManager multi-layer cache for vLLM integration.
  • FAISS Integration (v0.5.0): TurboQuantFAISS wraps FAISS with auto PCA compression.
  • Rust pgext (v0.5.0): Native PostgreSQL extension with tqvector type and <=> operator.
  • Autotune CLI (v0.5.0): turboquant-pro autotune finds optimal compression in ~10 seconds.
  • Model Weight Compression (v0.6.0-v0.7.0): SVD and activation-space PCA for LLM weight pruning.
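The two-tier hot/cold idea in the Streaming KV Cache entry can be illustrated with a small standard-library sketch: the most recent entries stay uncompressed, and entries are compressed as they age out of the hot window. `TwoTierCache` is a hypothetical class written for this illustration, and the rounding codec is a toy stand-in for a real quantizer; none of this reflects the actual `TurboQuantKVCache` internals.

```python
from collections import deque


class TwoTierCache:
    """Keep the newest `hot_window` entries raw; compress entries as they age out."""

    def __init__(self, hot_window, compress, decompress):
        self.hot = deque()    # (position, raw vector), newest on the right
        self.cold = {}        # position -> compressed vector
        self.hot_window = hot_window
        self.compress = compress
        self.decompress = decompress

    def append(self, position, vector):
        self.hot.append((position, vector))
        # Demote the oldest entries once the hot tier is over capacity.
        while len(self.hot) > self.hot_window:
            old_pos, old_vec = self.hot.popleft()
            self.cold[old_pos] = self.compress(old_vec)

    def get(self, position):
        for pos, vec in self.hot:
            if pos == position:
                return vec
        return self.decompress(self.cold[position])


# Toy codec (assumption): round to one decimal place, decompress is a plain copy.
cache = TwoTierCache(hot_window=2,
                     compress=lambda v: [round(x, 1) for x in v],
                     decompress=list)
for t in range(4):
    cache.append(t, [t + 0.123, t + 0.456])

print(cache.get(3))  # still in the hot tier, returned unchanged
print(cache.get(0))  # aged out to the cold tier, returned after lossy round trip
```

The design trade-off is that recent tokens, which attention tends to weight most heavily during generation, are read without reconstruction error, while older tokens pay a small accuracy cost for a large memory saving.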

---

```python
# Create the full pipeline: PCA-384 + TurboQuant 3-bit
pipeline = pca.with_quantizer(bits=3)  # ~27x compression
```

FAISS Integration

Wrap FAISS indices with automatic PCA compression:

```python
from turboquant_pro import PCAMatryoshka
from turboquant_pro.faiss_index import TurboQuantFAISS

pca = PCAMatryoshka(input_dim=1024, output_dim=384)
pca.fit(sample_embeddings)

index = TurboQuantFAISS(pca, index_type="ivf", n_lists=100)
index.add(corpus)  # Auto PCA-compressed
distances, ids = index.search(query, k=10)  # Auto PCA-rotated
print(index.stats())  # 2.7x smaller index
```

Supports Flat, IVF, and HNSW. Save/load indices to disk.

Native PostgreSQL Extension (Rust + CUDA)

The pgext/ directory contains a native PostgreSQL extension written in Rust (pgrx) that adds the tqvector data type directly to PostgreSQL — no Python needed.

```sql
-- Compress your entire table in one command
CREATE TABLE embeddings_tq AS
SELECT id, tq_compress(embedding::float4[], 3) AS tqv
FROM embeddings;

-- Search with cosine distance operator
SELECT id, tqv <=> tq_compress(query::float4[], 3) AS dist
FROM embeddings_tq ORDER BY dist LIMIT 10;

-- Check compression
SELECT tq_dim(tqv), tq_bits(tqv), tq_ratio(tqv) FROM embeddings_tq LIMIT 1;
-- 1024, 3, 10.6
```

Production benchmark (194K BGE-M3 1024-dim vectors on Atlas):

| Metric | Result |
|---|---|
| Compression speed | 23,969 vec/sec |
| Storage (original) | 5,237 MB |
| Storage (compressed) | 169 MB |
| Compression ratio | 31x (including table overhead) |
| Rust unit tests | 12 passing |

Build and install:

```bash
cd pgext
cargo install cargo-pgrx && cargo pgrx init --pg16 $(which pg_config)
cargo pgrx install --release
psql -c "CREATE EXTENSION tqvector;"
```

Optional GPU acceleration: cargo build --features gpu (requires CUDA 12.0+, cudarc).

See pgext/README.md for full API documentation.

```python
# PostgreSQL integration
# (`tq` and `conn` are created earlier; their setup is not shown in this excerpt)
tq.create_compressed_table(conn, "embeddings_compressed")
tq.insert_compressed(conn, "embeddings_compressed", ids, embeddings)
results = tq.search_compressed(conn, "embeddings_compressed", query, top_k=10)
```

Storage savings (1024-dim BGE-M3, 3-bit, no PCA truncation):

TurboQuant 3-bit alone compresses each vector from 4,096 to ~388 bytes (10.5x):

| Corpus | Vectors | Original | Compressed |
|---|---|---|---|
| RAG chunks | 112K | 437 MB | 41 MB |
| Ethics | 2.4M | 9,375 MB | 893 MB |
| Publications | 824K | 3,222 MB | 307 MB |
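The per-vector figures behind this table follow from simple arithmetic, sketched below with the standard library. A 1024-dimensional float32 vector takes 4,096 bytes, and 1024 codes at 3 bits pack into 384 bytes; the gap to the quoted ~388 bytes is presumably per-vector metadata, which is an assumption since the README excerpt does not itemize it. `raw_bytes` and `packed_bytes` are hypothetical helper names.

```python
import math


def raw_bytes(dim, bytes_per_value=4):
    """Size of an uncompressed float32 vector."""
    return dim * bytes_per_value


def packed_bytes(dim, bits):
    """Bytes needed for `dim` codes of `bits` bits each, before any metadata."""
    return math.ceil(dim * bits / 8)


dim, bits = 1024, 3
original = raw_bytes(dim)
payload = packed_bytes(dim, bits)
print(original, payload, round(original / payload, 2))

# Corpus-level estimate for the RAG chunks row, using the quoted ~388 bytes
# per compressed vector and rounding down to whole MB (1 MB = 2**20 bytes).
vectors = 112_000
print(vectors * original // 2**20, "MB ->", vectors * 388 // 2**20, "MB")
```

The vector counts in the table are rounded (for example 112K), so recomputing other rows this way can differ from the table by a few MB.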

Components

| Class | Purpose |
|---|---|
| PCAMatryoshka | PCA rotation + truncation for dimension reduction |
| PCAMatryoshkaPipeline | Combined PCA + TurboQuant end-to-end pipeline |
| TurboQuantKV | Stateless compress/decompress with optional bit-packing |
| TurboQuantKVCache | Streaming L1/L2 tiered cache for autoregressive inference |
| TurboQuantKVManager | Multi-layer KV cache manager (vLLM plugin) |
| TurboQuantFAISS | FAISS index wrapper with auto PCA compression |
| TurboQuantPGVector | Compress pgvector embeddings for PostgreSQL storage |
| TurboQuantNATSCodec | Encode/decode embeddings for NATS transport |
| run_autotune | Sweep configs and recommend optimal compression |
| ModelCompressor | SVD analysis + low-rank compression of model FFN weights |

vLLM KV Cache Plugin

Multi-layer KV cache manager with hot/cold tiering:

```python
from turboquant_pro.vllm_plugin import TurboQuantKVManager

mgr = TurboQuantKVManager(
    n_layers=32, n_kv_heads=8, head_dim=128, bits=3, hot_window=512
)
```

Benchmarks vs SOTA (real data, all methods reranked identically)

At 32× compression, recall@10 on real LaBSE / multilingual-Gutenberg embeddings (RESULTS_labse_199k.md, RESULTS_gutenberg_1m.md):

| method | recall@10 (single) | recall@10 (+rerank) | index build |
|---|---|---|---|
| PQ | 0.467 | 0.827 | 142 s |
| IVF-PQ | 0.496 | 0.756 | 355 s |
| RaBitQ (2024 SOTA) | 0.630 | 0.962 | 0.3 s |
| OPQ | 0.780 | 0.999 | 632 s |
| **turboquant-pro** | **0.784** | **0.9992** | **31 s** |

turboquant-pro beats the 2024 binary-quantization SOTA (RaBitQ) at both operating points and ties OPQ, at 4–20× lower index build cost than PQ, IVF-PQ, and OPQ (RaBitQ itself builds faster, at 0.3 s). This holds at 1M scale (tq-pro 0.989 +rerank, tying OPQ). Fast search: the AVX2 ADC kernel (turboquant_pro/_adc/) reproduces this recall (0.9995 +rerank) at 3802 qps, 7.9× faster than naive flat-reconstruct and competitive with ScaNN, at 96 bytes and training-free (see docs/DESIGN_fast_adc.md). Full honest evaluation of every feature: COMPREHENSIVE_ANALYSIS.md.
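The "+rerank" columns refer to a two-stage search: shortlist more candidates than needed using the compressed vectors, then reorder the shortlist with exact scores. A minimal standard-library sketch of that pattern follows. `search_with_rerank` and `dot` are hypothetical helpers, the tiny vectors are invented for illustration, and the benchmark's actual pipeline may differ in detail.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search_with_rerank(query, approx_vectors, exact_vectors, k=10, oversample=5):
    """Shortlist k * oversample candidates by approximate score,
    then reorder the shortlist by exact score and keep the top k."""
    n_candidates = min(len(approx_vectors), k * oversample)
    shortlist = sorted(range(len(approx_vectors)),
                       key=lambda i: dot(query, approx_vectors[i]),
                       reverse=True)[:n_candidates]
    reranked = sorted(shortlist, key=lambda i: dot(query, exact_vectors[i]), reverse=True)
    return reranked[:k]


# Toy data: the approximate vectors are coarse versions of the exact ones.
exact = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
approx = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
query = [1.0, 0.2]

print(search_with_rerank(query, approx, exact, k=1, oversample=1))  # approximate order only
print(search_with_rerank(query, approx, exact, k=1, oversample=3))  # corrected by rerank
```

In this toy case the coarse vectors rank item 3 first, while reranking a shortlist of three restores the true nearest neighbour, item 0. The cost is that the exact vectors for the shortlist must be fetched at query time.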

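🎯 aiskill88 AI Review · Grade A · 2026-06-21

A high-performance AI tool for optimizing LLMs and vector databases

⚡ Core Features

👥 Who It Is For

AI enthusiasts, researchers and students, developers and engineers, tech entrepreneurs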


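⚖️ Pros and Cons

✅ Pros
  • MIT license, free for commercial use
  • Fully open source and free, with no licensing fees
  • Local deployment, data fully under your control
  • Developer community support: problems can be searched for and asked about
⚠️ Cons
  • Installation and initial configuration may require some technical background
  • Feature completeness is usually below that of mature commercial products
  • Technical support relies mainly on the open-source community, so response times vary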
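⚠️ Usage Notice

AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.

Validate thoroughly in a sandbox or test environment before deploying to production, and carry out the necessary security assessment.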

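📄 License

✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.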


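❓ FAQ

See the project README and documentation.
💡 AI Skill Hub Review

Overall, TurboQuant is a high-quality AI tool that is competitive among comparable tools. AI Skill Hub will keep tracking its updates; consider bookmarking it and adopting it when the timing suits your use case.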

🌐 Original Information
Original name turboquant-pro
Topics AI, LLM, cache compression
GitHub https://github.com/ahb-sjsu/turboquant-pro
License MIT
Language Python
🔗 Original Source
🐙 GitHub repository  https://github.com/ahb-sjsu/turboquant-pro

Listed: 2026-06-21 · Updated: 2026-06-21 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.
