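Open-source AI tool: Virtualized Elastic KV Cache (kvcached) is one of the featured AI tools in this issue of AI Skill Hub. It has 1.0k GitHub stars and an overall score of 7.5, indicating good overall quality. We recommend adding it to your AI toolkit to help improve your productivity.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond: a virtualized, elastic KV cache that supports dynamic GPU sharing and related use cases.
Virtualized Elastic KV Cache is an open-source tool written in Python that focuses on elastic-kvcache, gpu-mutiplexing, and gpu-sharing. As a GitHub open-source project, it has an active community and ongoing releases, its code is fully transparent and auditable, and it can be deployed locally to protect data privacy. It is suitable both for individual use and for integration into enterprise workflows.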
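# Option 1: install with pip (recommended)
pip install kvcached
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install kvcached
# Option 3: install from source (to get the latest features)
git clone https://github.com/ovg-project/kvcached
cd kvcached
pip install -e .
# Verify the installation
python -c "import kvcached; print('Installation succeeded')"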
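The one-line check above only confirms that the import works. A slightly more informative check, sketched below with the standard library only, also reports the installed version and the running Python version. The helper name `check_install` is hypothetical and is not part of kvcached; the sketch assumes only that the distribution and the import name are both `kvcached`, as the commands above indicate.

```python
import importlib.metadata
import importlib.util
import sys


def check_install(package: str = "kvcached") -> dict:
    """Report whether `package` is importable and which version is installed.

    Hypothetical helper for illustration; it does not call any kvcached API.
    """
    # find_spec locates the package without importing it.
    importable = importlib.util.find_spec(package) is not None
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "package": package,
        "importable": importable,
        "version": version,
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
    }


if __name__ == "__main__":
    print(check_install())
```

If `importable` is false or `version` is `None`, the package is not visible to the interpreter that ran the check, which usually means the wrong virtual environment is active.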
# Command-line usage
kvcached --help
# Basic usage
kvcached input_file -o output_file
# Calling from Python code
import kvcached
# Example
result = kvcached.process("input")
print(result)
# kvcached config file example (config.yml)
app:
  name: "kvcached"
  debug: false
  log_level: "INFO"
# Specify the config file at runtime
kvcached --config config.yml
# Or configure via environment variables
export KVCACHED_API_KEY="your-key"
export KVCACHED_OUTPUT_DIR="./output"
<br> <br> <p> <a href="https://www.python.org/"><img alt="Python" src="https://img.shields.io/badge/Python-3.9%E2%80%933.13-blue"></a> <img alt="Engines" src="https://img.shields.io/badge/Engines-SGLang%20%7C%20vLLM-blueviolet"> <a href="https://yifanqiao.notion.site/Solve-the-GPU-Cost-Crisis-with-kvcached-289da9d1f4d68034b17bf2774201b141"><img alt="Blog" src="https://img.shields.io/badge/Blog-Read-FF5722?logo=rss&logoColor=white&labelColor=555555"></a> <a href="https://arxiv.org/abs/2508.08448"><img alt="arXiv: GPU OS vision" src="https://img.shields.io/badge/arXiv-GPU%20OS%20vision-b31b1b?logo=arxiv&logoColor=white&labelColor=555555"></a> <br> <a href="https://arxiv.org/abs/2505.04021"><img alt="arXiv: Multi LLM Serving" src="https://img.shields.io/badge/arXiv-Multi%20LLM%20Serving-b31b1b?logo=arxiv&logoColor=white&labelColor=555555"></a> <a href="https://join.slack.com/t/ovg-project/shared_invite/zt-3fr01t8s7-ZtDhHSJQ00hcLHgwKx3Dmw"><img alt="Slack Join" src="https://img.shields.io/badge/Slack-Join-4A154B?logo=slack&logoColor=white&labelColor=555555"></a> <a href="https://deepwiki.com/ovg-project/kvcached"><img alt="DeepWiki" src="https://img.shields.io/badge/DeepWiki-Docs-6B46C1?logo=book&logoColor=white&labelColor=555555"></a> <a href="LICENSE"><img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"></a> </p>
</div>
<p align="center"> <img src="https://raw.githubusercontent.com/ovg-project/kvcached/refs/heads/main/assets/ads.jpg" alt="Make GPU Sharing Flexible and Easy" width="500" /> </p>
kvcached (KV cache daemon) is a KV cache library for LLM serving/training on shared GPUs. By bringing OS-style virtual memory abstraction to LLM systems, it enables elastic and demand-driven KV cache allocation, improving GPU utilization under dynamic workloads.
kvcached achieves this by decoupling GPU virtual addressing from physical memory allocation for KV caches. It allows serving engines to initially reserve virtual memory only and later back it with physical GPU memory when the cache is actively used. This decoupling enables on-demand allocation and flexible sharing, bringing better GPU memory utilization under dynamic and mixed workloads. Check out more details in the blog.
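The decoupling described above can be illustrated with a toy model. The sketch below is not kvcached's API or implementation: `PhysicalPagePool` and `ElasticKVCache` are hypothetical names, and plain Python integers stand in for GPU memory pages. It only shows the idea that reserving a large virtual range costs nothing until a page is first used.

```python
class PhysicalPagePool:
    """Hypothetical stand-in for a GPU's physical memory, counted in pages."""

    def __init__(self, total_pages: int) -> None:
        self.free_pages = list(range(total_pages))

    def allocate(self) -> int:
        if not self.free_pages:
            raise MemoryError("no free physical pages")
        return self.free_pages.pop()

    def release(self, page: int) -> None:
        self.free_pages.append(page)


class ElasticKVCache:
    """Reserves a virtual page range up front and maps physical pages lazily."""

    def __init__(self, pool: PhysicalPagePool, virtual_pages: int) -> None:
        self.pool = pool
        self.virtual_pages = virtual_pages  # reserved only, uses no physical pages
        self.page_table = {}                # virtual page -> physical page

    def touch(self, vpage: int) -> int:
        """Back a virtual page with a physical page on first use."""
        if not 0 <= vpage < self.virtual_pages:
            raise IndexError("virtual page outside the reserved range")
        if vpage not in self.page_table:
            self.page_table[vpage] = self.pool.allocate()
        return self.page_table[vpage]

    def free(self, vpage: int) -> None:
        """Unmap a virtual page and return its physical page to the pool."""
        ppage = self.page_table.pop(vpage, None)
        if ppage is not None:
            self.pool.release(ppage)


pool = PhysicalPagePool(total_pages=8)
cache = ElasticKVCache(pool, virtual_pages=1024)  # far larger than the pool
cache.touch(0)
cache.touch(512)
print(len(cache.page_table), len(pool.free_pages))  # 2 mapped, 6 still free
cache.free(0)
print(len(cache.page_table), len(pool.free_pages))  # 1 mapped, 7 free
```

Because freed pages go back to a pool rather than staying pinned to one cache, a second `ElasticKVCache` built on the same pool could pick them up, which is the basis for the sharing scenarios in this document.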
kvcached can be installed as a plugin into an existing SGLang or vLLM environment.
```bash
pip install kvcached --no-build-isolation
```
Docker images with kvcached installed on top of the original engine images are also available:
```bash
docker pull ghcr.io/ovg-project/kvcached-sglang:latest # kvcached-v0.1.5-sglang-v0.5.10
docker pull ghcr.io/ovg-project/kvcached-vllm:latest # kvcached-v0.1.5-vllm-v0.19.0
```
We prepare an all-in-one docker for developers:
```bash
docker pull ghcr.io/ovg-project/kvcached-dev:latest
```
More instructions can be found here. GB200 dockers are on the way.
| Multi-LLM serving | kvcached allows multiple LLMs to share a GPU's memory elastically, enabling concurrent deployment without the rigid memory partitioning used today. This improves GPU utilization and saves serving costs. |
| Serverless LLM | By allocating KV cache only when needed, kvcached supports serverless deployments where models can spin up and down on demand. |
| Compound AI systems | kvcached makes compound AI systems practical on limited hardware by elastically allocating memory across specialized models in a pipeline (e.g., retrieval, reasoning, and summarization). |
| GPU workload colocation | kvcached allows LLM inference to coexist with other GPU workloads such as training jobs, fine-tuning, or vision models. |
</div>
See concrete examples here: kvcached/examples.
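To make the multi-LLM serving case more concrete, the following back-of-the-envelope sketch compares rigid partitioning with elastic sharing for two models on one GPU. The demand numbers are made up for illustration, the function names are hypothetical, and the elastic model is deliberately simplistic: it ignores scheduling, fairness, and allocation latency, and is not how kvcached itself decides who gets memory.

```python
def served_static(demands, total_pages):
    """Pages served per time step when each model owns a fixed, equal share."""
    share = total_pages // len(demands[0])
    return [sum(min(d, share) for d in step) for step in demands]


def served_elastic(demands, total_pages):
    """Pages served per time step when models draw from one shared pool."""
    return [min(sum(step), total_pages) for step in demands]


# KV cache pages that two models want at four points in time (illustrative).
demands = [(10, 90), (80, 15), (50, 50), (95, 5)]
total = 100

print(served_static(demands, total))   # [60, 65, 100, 55]
print(served_elastic(demands, total))  # [100, 95, 100, 100]
```

With a fixed 50/50 split, a busy model is capped at its share while the idle model's pages sit unused. With a shared pool, the same GPU covers nearly all of the demand whenever the two peaks do not coincide.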
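This project provides an open-source virtualized elastic KV cache that supports dynamic GPU sharing and integrates with AI inference engines. It is worth watching.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and performing the necessary security assessment.
✅ Apache 2.0: a permissive open-source license. Commercial use is allowed, copyright notices and the NOTICE file must be retained, and the license includes a patent grant.
Based on our overall evaluation, Virtualized Elastic KV Cache is a solid, good-quality entry among AI tools. If you already have a clear use case, you can start using it right away; if you are still evaluating, compare it with similar tools before deciding.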
| Original name | kvcached |
| Topics | elastic-kvcache, gpu-mutiplexing, gpu-sharing, inference-engine, kvcache |
| GitHub | https://github.com/ovg-project/kvcached |
| License | Apache-2.0 |
| Language | Python |
Listed: 2026-05-23 · Updated: 2026-05-23 · License: Apache-2.0 · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.