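After AI Skill Hub's curated evaluation, QuillCache is rated "Recommended". The tool performs well on feature completeness, community activity, and ease of use, with an AI score of 7.5, and suits users with some technical background.
QuillCache is an open-source AI tool that provides a distributed KV cache pool and control plane for LLM serving. It aims to improve the reliability and performance of LLM serving and suits large-scale AI applications.
QuillCache is an open-source tool written in Rust, tagged with the GitHub topics ai, kv-cache, and rust. As a GitHub open-source project, it has active community support and ongoing releases. Its code is fully open and auditable, and it can be deployed locally to keep data private. It is intended as a stable, reliable solution for individual use and for integration into enterprise workflows.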
```bash
# Option 1: cargo install (recommended)
cargo install quillcache

# Option 2: build from source
git clone https://github.com/feichai0017/quillcache
cd quillcache
cargo build --release
# The binary is at ./target/release/quillcache
```
```bash
# Show help
quillcache --help

# Basic usage
quillcache [options] <input>

# See the documentation for detailed usage:
# https://github.com/feichai0017/quillcache
```
```bash
# quillcache configuration
# Generate an example config
quillcache --config-example > config.yml

# Common config keys
# output_dir: ./output
# log_level: info
# workers: 4

# Environment variable (overrides the config file)
export QUILLCACHE_CONFIG="/path/to/config.yml"
```
QuillCache is a Mooncake-style distributed KV cache pool and control plane for LLM serving, written in Rust — replicating the architecture of NVIDIA Dynamo and Moonshot's Mooncake, plus two properties the production data planes leave implicit: identity-governed safe reuse and a crash-consistent persistent tier.
QuillCache sits beside real inference engines (vLLM, SGLang) and owns the KV cache as a resource:
- a byte pool: DRAM + SSD tiers that hold real KV block bytes, with capacity-driven demotion and eviction;
- a transfer engine: moves blocks between nodes (TCP today, RDMA reserved);
- a residency index: maps each block (by identity) to where it lives (node + tier), persistent so it survives a restart;
- a control plane / Conductor: makes cache-aware routing decisions (the Dynamo KV-router cost function), governs reuse, and meters SLO goodput.
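The byte-pool bullet can be made concrete with a minimal sketch of capacity-driven demotion and eviction. It makes three simplifying choices. The policy is LRU. Capacities are counted in blocks rather than bytes. An in-memory map stands in for the SSD tier. `TieredPool` and `Tier` are hypothetical names, not QuillCache's `LocalKvStore` API.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical tier labels for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Dram,
    Ssd,
}

/// Two-tier pool: DRAM overflow is demoted to SSD (LRU first),
/// SSD overflow is evicted (LRU first). Capacities are in blocks.
pub struct TieredPool {
    dram_cap: usize,
    ssd_cap: usize,
    dram: HashMap<u64, Vec<u8>>,
    ssd: HashMap<u64, Vec<u8>>, // stands in for files on disk
    dram_lru: VecDeque<u64>,    // front = least recently inserted/used
    ssd_lru: VecDeque<u64>,
}

impl TieredPool {
    pub fn new(dram_cap: usize, ssd_cap: usize) -> Self {
        Self {
            dram_cap,
            ssd_cap,
            dram: HashMap::new(),
            ssd: HashMap::new(),
            dram_lru: VecDeque::new(),
            ssd_lru: VecDeque::new(),
        }
    }

    /// Insert a block into DRAM, then demote and evict until both tiers fit.
    pub fn put(&mut self, id: u64, bytes: Vec<u8>) {
        self.remove(id); // re-inserting a block refreshes its position
        self.dram.insert(id, bytes);
        self.dram_lru.push_back(id);
        while self.dram.len() > self.dram_cap {
            let victim = self.dram_lru.pop_front().expect("LRU tracks DRAM");
            let bytes = self.dram.remove(&victim).expect("victim is in DRAM");
            self.ssd.insert(victim, bytes); // demotion: bytes move down a tier
            self.ssd_lru.push_back(victim);
        }
        while self.ssd.len() > self.ssd_cap {
            let victim = self.ssd_lru.pop_front().expect("LRU tracks SSD");
            self.ssd.remove(&victim); // eviction: bytes leave the pool
        }
    }

    /// Which tier currently holds the block, if any.
    pub fn tier_of(&self, id: u64) -> Option<Tier> {
        if self.dram.contains_key(&id) {
            Some(Tier::Dram)
        } else if self.ssd.contains_key(&id) {
            Some(Tier::Ssd)
        } else {
            None
        }
    }

    fn remove(&mut self, id: u64) {
        if self.dram.remove(&id).is_some() {
            self.dram_lru.retain(|&x| x != id);
        }
        if self.ssd.remove(&id).is_some() {
            self.ssd_lru.retain(|&x| x != id);
        }
    }
}

fn main() {
    let mut pool = TieredPool::new(2, 1);
    for id in 1..=4 {
        pool.put(id, vec![0u8; 16]);
    }
    for id in 1..=4 {
        println!("block {id}: {:?}", pool.tier_of(id));
    }
}
```

Take a DRAM capacity of 2 and an SSD capacity of 1, and insert blocks 1 to 4. Blocks 3 and 4 end up in DRAM and block 2 is demoted to SSD. Block 1 is demoted and later evicted.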
It does not run models — no transformer kernels, no attention. The CUDA tier moves and quantizes KV bytes (the data path), not inference compute.
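The source does not say which quantization scheme the CUDA tier applies to KV bytes. Purely as an illustration, here is a CPU sketch of per-block symmetric int8 quantization, one common way to shrink KV data before moving it. The scheme and the names `quantize_block` and `dequantize_block` are assumptions, not QuillCache code.

```rust
/// Symmetric per-block int8 quantization (an assumed scheme):
/// scale = max|x| / 127, q = round(x / scale) clamped to [-127, 127].
fn quantize_block(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // An all-zero block would give scale 0; use 1.0 to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Inverse mapping; the error per element is at most about scale / 2.
fn dequantize_block(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let block = [0.5f32, -1.0, 0.25, 0.0];
    let (q, scale) = quantize_block(&block);
    let back = dequantize_block(&q, scale);
    println!("q = {q:?}, scale = {scale}, dequantized = {back:?}");
}
```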
```bash
cargo build
cargo test

# CUDA device tier (on a GPU box)
cd crates/quillcache-cuda && cargo build --features cuda
```
QuillCache replicates the production reference designs piece by piece, then adds its differentiation on top:
| Mooncake / Dynamo | QuillCache | Status |
|---|---|---|
| Mooncake Store (pooled DRAM/SSD KV) | LocalKvStore + PooledStore | ✅ real bytes |
| Mooncake Transfer Engine | quillcache-transfer | ✅ TCP / ⊙ RDMA reserved |
| Conductor / scheduler | quillcache-control + router | ✅ |
| Dynamo KV-router cost function | DynamoCostRouter | ✅ reproduces the worked example |
| Dynamo KVBM tiers (G1/G2/G3) | StoreDataPlane (HBM/DRAM/SSD) | ✅ moves real bytes |
| Dynamo KV-Cache Indexer | residency index (Holt ART) | ✅ persistent |
| Dynamo etcd / service discovery | NodeRegistry (StaticRegistry) | ✅ etcd pluggable |
| — *(neither does this)* | **identity guard + crash-consistency** | 🎯 differentiation |
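The Dynamo KV-router row can be pictured as a per-worker cost with two terms, and the lowest-cost worker wins. The first term counts the prompt blocks a worker would still have to prefill, scaled by an overlap weight. The second is the worker's current decode load. The formula `cost = overlap_weight * prefill_blocks + decode_blocks` is our reading of Dynamo's published description, not the code of `DynamoCostRouter`. `Worker`, `cost`, and `pick` are hypothetical names for this sketch.

```rust
/// Hypothetical snapshot of what the router knows about one worker.
struct Worker {
    name: &'static str,
    cached_prefix_blocks: usize, // leading request blocks already resident here
    active_decode_blocks: usize, // KV blocks held by in-flight decodes
}

/// Assumed cost: overlap_weight * blocks still to prefill + decode load.
fn cost(w: &Worker, request_blocks: usize, overlap_weight: f64) -> f64 {
    let prefill_blocks = request_blocks.saturating_sub(w.cached_prefix_blocks);
    overlap_weight * prefill_blocks as f64 + w.active_decode_blocks as f64
}

/// Route to the worker with the lowest cost.
fn pick(workers: &[Worker], request_blocks: usize, overlap_weight: f64) -> Option<&Worker> {
    workers.iter().min_by(|a, b| {
        cost(a, request_blocks, overlap_weight).total_cmp(&cost(b, request_blocks, overlap_weight))
    })
}

fn main() {
    let workers = [
        Worker { name: "A", cached_prefix_blocks: 8, active_decode_blocks: 12 },
        Worker { name: "B", cached_prefix_blocks: 0, active_decode_blocks: 4 },
    ];
    // 10-block request: a high overlap weight favours A's cached prefix,
    // a low one favours B's lighter decode load.
    for weight in [2.0, 0.5] {
        let w = pick(&workers, 10, weight).expect("non-empty worker list");
        println!("overlap_weight={weight}: route to {}", w.name);
    }
}
```

The weight is the knob that trades cache reuse against load balance. At 2.0, A costs 16 and B costs 24. At 0.5, A costs 13 and B costs 9.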
Everything here is real code — there is no simulation (the earlier cost-model sims were removed). The honest distinction is how far each piece is integrated:
- ✅ wired online & measured: gateway, control plane, Dynamo-cost routing, persistent residency index, StoreDataPlane moving real bytes across HBM/DRAM/SSD, the identity guard, live SLO goodput, and the ART-vs-LSM storage study.
- ▣ unit-tested (not yet on the online gateway path): PooledStore cross-node fetch over TCP, and LocalKvStore::recover crash recovery. Both are covered by tests; wiring them into the live gateway needs an engine KV connector for the engine⟷pool byte handoff.
- ⊙ reserved / needs hardware: RdmaTransfer (behind the rdma feature) and the CUDA device tier (build quillcache-cuda with --features cuda on a GPU box). Both are real interfaces, stubbed or given a fallback so the default build needs no special hardware.
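The crash-recovery item can be illustrated with a generic append-only record log. Each record is a length, a checksum, and the payload. Recovery replays records until it reaches a torn or corrupt one, and treats everything after that point as lost. The record layout, the FNV-1a checksum, and the `Vec<u8>` standing in for a file are all assumptions. This is not the on-disk format of `LocalKvStore::recover`.

```rust
use std::convert::TryInto;

/// FNV-1a 64-bit hash used as a cheap record checksum (a real store
/// would more likely use CRC32C).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Append one record: [len: u32 LE][checksum: u64 LE][payload].
fn append(log: &mut Vec<u8>, payload: &[u8]) {
    log.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    log.extend_from_slice(&fnv1a(payload).to_le_bytes());
    log.extend_from_slice(payload);
}

/// Replay complete, checksum-valid records and stop at the first torn or
/// corrupt one. Returns the payloads and the length of the valid prefix,
/// which is where a real store would truncate the file.
fn recover(log: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos + 12 <= log.len() {
        let len = u32::from_le_bytes(log[pos..pos + 4].try_into().unwrap()) as usize;
        let sum = u64::from_le_bytes(log[pos + 4..pos + 12].try_into().unwrap());
        let start = pos + 12;
        if start + len > log.len() {
            break; // torn write: payload incomplete
        }
        let payload = &log[start..start + len];
        if fnv1a(payload) != sum {
            break; // corrupt record
        }
        out.push(payload.to_vec());
        pos = start + len;
    }
    (out, pos)
}

fn main() {
    let mut log = Vec::new();
    append(&mut log, b"block-1");
    append(&mut log, b"block-2");
    append(&mut log, b"block-3");
    log.truncate(log.len() - 3); // simulate a crash mid-write of block-3
    let (records, valid_len) = recover(&log);
    println!("recovered {} records, valid prefix {} bytes", records.len(), valid_len);
}
```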
cargo test — 45 tests pass; cargo fmt --check and cargo clippy are clean.
The residency / prefix index is written on every KV event and read on every request (longest reusable prefix); a persistent control plane needs it on disk. Which storage engine fits a prefix-heavy, write-frequent index? Measured on the same trace via cargo run --features "rocksdb holt" -- bench-index:
| backend | ingest | prefix_scan p50 | p99 | recovery | on-disk | write-amp |
|---|---|---|---|---|---|---|
| memory (flat map) | 706k/s | 494 µs | 1685 µs | — | 0 | — |
| rocksdb (LSM) | 56k/s | 16.8 µs | 29.6 µs | 4.1 ms | small | **10.6×** |
| **holt (ART)** | 55k/s | **9.96 µs** | **13.7 µs** | **2.6 ms** | larger | **1.0×** |
ART gives the lowest prefix-scan latency (~1.7× faster than LSM at p50, ~50× faster than the flat map's O(N) scan), the fastest recovery, and 1× write amplification (append-only — it writes each record once); LSM is far more space-efficient on disk but pays 10.6× write amplification (compaction rewrites). Write amplification is measured from RocksDB's own flush/compaction statistics, not assumed. Pick ART when prefix-scan latency and recovery dominate (the common case for a residency index queried per request), pick LSM when disk footprint is the constraint.
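The headline ratios follow directly from the p50 column of the table. Write amplification here means bytes physically written divided by bytes logically ingested. The sketch below reproduces the arithmetic; the helper names are ours.

```rust
/// Speedup of the fast backend over the slow one (same latency unit).
fn speedup(slow_us: f64, fast_us: f64) -> f64 {
    slow_us / fast_us
}

/// Write amplification = bytes physically written / bytes logically ingested.
fn write_amp(physical_bytes: f64, logical_bytes: f64) -> f64 {
    physical_bytes / logical_bytes
}

fn main() {
    // p50 prefix_scan latencies from the benchmark table (µs).
    let (memory, rocksdb, holt) = (494.0, 16.8, 9.96);
    println!("holt vs rocksdb: {:.2}x", speedup(rocksdb, holt));
    println!("holt vs memory:  {:.1}x", speedup(memory, holt));
    // At 10.6x, every 1 MiB of index records costs about 10.6 MiB of disk writes.
    let mib = 1024.0 * 1024.0;
    println!("rocksdb write amp: {:.1}x", write_amp(10.6 * mib, mib));
}
```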
```bash
cargo run --features "rocksdb holt" -- bench-index --backend holt
cargo run --features "rocksdb holt" -- bench-index --backend rocksdb
```
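The residency index is read per request to find the longest reusable prefix. One common way to key it by block identity is with chained hashes, where each block's key commits to every token before it. The longest reusable prefix is then the run of leading block hashes already in the index. A minimal sketch follows, with several assumptions:

- blocks are a fixed 4 tokens;
- `DefaultHasher` is the hash, which is fine in memory but not stable across Rust releases, so it is unsuitable for on-disk keys;
- a model-identity salt stands in for the identity guard;
- a `HashMap` replaces the persistent ART/LSM backends.

All names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const BLOCK_TOKENS: usize = 4; // assumed block size, in tokens

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Hbm,
    Dram,
    Ssd,
}

/// Chained block hashes: each covers the model identity, the parent block's
/// hash, and this block's tokens. A trailing partial block is skipped.
fn block_hashes(model_id: &str, tokens: &[u32]) -> Vec<u64> {
    let mut parent = 0u64;
    tokens
        .chunks_exact(BLOCK_TOKENS)
        .map(|chunk| {
            let mut h = DefaultHasher::new();
            model_id.hash(&mut h);
            parent.hash(&mut h);
            chunk.hash(&mut h);
            parent = h.finish();
            parent
        })
        .collect()
}

#[derive(Default)]
struct ResidencyIndex {
    map: HashMap<u64, (u32, Tier)>, // block hash -> (node id, tier)
}

impl ResidencyIndex {
    /// Apply a "block stored" KV event.
    fn on_stored(&mut self, hash: u64, node: u32, tier: Tier) {
        self.map.insert(hash, (node, tier));
    }

    /// Number of leading request blocks already resident somewhere.
    fn longest_prefix(&self, hashes: &[u64]) -> usize {
        hashes.iter().take_while(|h| self.map.contains_key(*h)).count()
    }
}

fn main() {
    let mut idx = ResidencyIndex::default();
    let cached: Vec<u32> = (0..8).collect(); // two full blocks
    for h in block_hashes("model-a", &cached) {
        idx.on_stored(h, 1, Tier::Dram);
    }
    // Same first 8 tokens plus 4 new ones: 2 of 3 blocks are reusable.
    let request: Vec<u32> = (0..8).chain(100..104).collect();
    println!("model-a reuse: {}", idx.longest_prefix(&block_hashes("model-a", &request)));
    // A different model identity reuses nothing.
    println!("model-b reuse: {}", idx.longest_prefix(&block_hashes("model-b", &request)));
}
```

Because each hash is chained, the same tokens at a different position do not match. Salting with the model identity keeps one model's blocks from being reused for another.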
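QuillCache is an open-source AI tool worth watching. It provides a distributed KV cache pool and control plane for LLM serving, aimed at improving the reliability and performance of large-scale AI applications. Note, however, that it requires a Rust toolchain and installing its dependencies through Cargo.
AI Skill Hub is a third-party content aggregator; the information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
Validate thoroughly in a sandbox or test environment before deploying to production, and carry out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses; it may be freely used commercially, modified, and distributed, requiring only that the copyright notice be retained.
AI Skill Hub review: QuillCache's core functionality is complete and of good quality. For AI technology enthusiasts, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.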
| Field | Value |
|---|---|
| Original name | quillcache |
| Topics | ai, kv-cache, rust |
| GitHub | https://github.com/feichai0017/quillcache |
| License | MIT |
| Language | Rust |
Listed: 2026-06-12 · Updated: 2026-06-12 · License: MIT