Krasis 是 AI Skill Hub 本期精选AI工具之一。综合评分 8.2 分,整体质量较高。我们强烈推荐将其纳入你的 AI 工具库,帮助提升工作效率。
Krasis 是一款基于 C++ 开发的开源工具,专注于 cpu-inference、gpu-inference、high-performance-inference 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
Krasis 是一款基于 C++ 开发的开源工具,专注于 cpu-inference、gpu-inference、high-performance-inference 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
# 克隆仓库 git clone https://github.com/brontoguana/krasis cd krasis # 查看安装说明 cat README.md # 按 README 完成环境依赖安装后即可使用
# 查看帮助 krasis --help # 基本运行 krasis [options] <input> # 详细使用说明请查阅文档 # https://github.com/brontoguana/krasis
# krasis 配置说明 # 查看配置选项 krasis --config-example > config.yml # 常见配置项 # output_dir: ./output # log_level: info # workers: 4 # 环境变量(覆盖配置文件) export KRASIS_CONFIG="/path/to/config.yml"
Krasis is an LLM runtime for running large MoE models on NVIDIA consumer GPUs. It is built around fast GPU prompt processing, GPU-executed decode, and HCS expert residency management so models much larger than VRAM can still run locally.
The current runtime is no longer the early Python-hot-path prototype. The serving path is Rust/CUDA focused: Python is used for launcher/setup/model loading work, while the performance-sensitive runtime path uses Rust/CUDA orchestration, CUDA kernels, cached quantized weights, and measured VRAM budgeting.
You can contact me here, but for bugs, setup problems, model requests, or feature requests please open a GitHub issue.
If you want to monitor Krasis during runs, check out ktop.

- Krasis currently targets NVIDIA GPUs with CUDA, including Ampere and newer architectures. The production HQQ attention and compact KV cache modes do not require FP8 support. - Input models should be BF16 safetensors from Hugging Face or another local safetensors source. - First run is slower because Krasis builds optimized local caches. Later runs reuse those caches. - Disk usage must cover the source model plus Krasis cache artifacts under ~/.krasis. - System RAM should be sized for the selected quantized cache and HCS backing store. Larger models need substantial RAM even when GPU VRAM is limited. - Production runs should use quantized INT4/INT8 expert caches and HQQ attention. BF16-heavy modes are validation/debug modes, not normal deployment targets.
krasis-setup
This installs runtime CUDA/PyTorch dependencies when needed. It is usually only required once per machine.
Linux/WSL:
curl -sSf https://raw.githubusercontent.com/brontoguana/krasis/main/install.sh | bash
This creates a managed environment at ~/.krasis/venv, installs Krasis, symlinks commands into ~/.local/bin, and updates PATH for the current shell. No sudo is required for the Krasis install itself.
Native Windows preview:
Download KrasisSetup-*-win64.exe from a GitHub release. The installer creates a per-user install under %LOCALAPPDATA%\Programs\Krasis, installs a private Python runtime and Krasis environment, and adds a Start Menu shortcut. The shortcut opens a maximized PowerShell window running the interactive Krasis launcher. Models and caches still live under %USERPROFILE%\.krasis.
The first native Windows target covers the Marlin/FlashAttention sidecar path. FLA/linear-attention models still need a separate native Windows FLA sidecar port before they should be treated as supported on Windows.
curl -sSf https://raw.githubusercontent.com/brontoguana/krasis/main/install.sh | bash -s -- --uninstall ```
For development builds:
git clone https://github.com/brontoguana/krasis.git
cd krasis
./dev build
./dev run qcn
The ./dev entry point handles environment setup and is preferred for local development commands.
krasis --non-interactive
krasis --config tests/qcn-k4v4-hqq8-int4-benchmark.conf
Krasis exposes an OpenAI-compatible chat endpoint:
http://localhost:8012/v1/chat/completions
Useful endpoints:
GET /healthGET /v1/modelsPOST /v1/timing高性能LLM运行时,支持多种硬件加速
该工具使用 NOASSERTION 协议,商用场景请仔细阅读协议条款,必要时咨询法律意见。
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
📄 NOASSERTION — 请查阅原始协议条款了解具体使用限制。
经综合评估,Krasis 在AI工具赛道中表现稳健,质量优秀。如果你已有明确的使用需求,可以直接上手体验;如果还在评估阶段,建议对比同类工具后再做决策。
| 原始名称 | krasis |
| Topics | cpu-inferencegpu-inferencehigh-performance-inferencehybrid-inference |
| GitHub | https://github.com/brontoguana/krasis |
| License | NOASSERTION |
| 语言 | C++ |
收录时间:2026-07-05 · 更新时间:2026-07-05 · License:NOASSERTION · AI Skill Hub 不对第三方内容的准确性作法律背书。