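AI Skill Hub's curated review rates the llmfit AI skill pack as "Strongly Recommended". With more than 26.4k stars on GitHub, the tool does well on feature completeness, community activity, and ease of use. It has an AI score of 8.8 and suits users with some technical background.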
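A unified CLI tool that supports hundreds of open-source AI models and multiple inference frameworks. It detects your hardware configuration automatically, and a single command finds a suitable model and runs it. It suits developers and AI enthusiasts who want to deploy large models locally but are not familiar with the configuration involved.
llmfit is an open-source tool written in Rust that focuses on local inference, model management, and hardware adaptation. As a GitHub open-source project it has an active community and regular releases, its code is fully open to audit, and it supports local deployment, which keeps data private. It can be used on its own or integrated into an enterprise workflow.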
```sh
# Option 1: cargo install (recommended)
cargo install llmfit

# Option 2: build from source
git clone https://github.com/AlexsJones/llmfit
cd llmfit
cargo build --release
# The binary is at ./target/release/llmfit
```
```sh
# Show help
llmfit --help

# Launch the interactive TUI (default)
llmfit

# Classic table output
llmfit --cli

# For detailed usage, see the documentation
# https://github.com/AlexsJones/llmfit
```
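```sh
# llmfit configuration
# View the configuration options
llmfit --config-example > config.yml

# Common configuration keys
# output_dir: ./output
# log_level: info
# workers: 4

# Environment variable (overrides the configuration file)
export LLMFIT_CONFIG="/path/to/config.yml"
```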
<p align="center"> <img src="assets/icon.svg" alt="llmfit icon" width="128" height="128"> </p>
<p align="center"> <b>English</b> · <a href="README.zh.md">中文</a> · <a href="README.ja.md">日本語</a> </p>
<p align="center"> <a href="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml"><img src="https://github.com/AlexsJones/llmfit/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://crates.io/crates/llmfit"><img src="https://img.shields.io/crates/v/llmfit.svg" alt="Crates.io"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue.svg" alt="License"></a> <a href="https://about.signpath.io"><img src="https://img.shields.io/badge/SignPath-signed-brightgreen?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiIgZmlsbD0id2hpdGUiIHZpZXdCb3g9IjAgMCAxNiAxNiI+PHBhdGggZD0iTTEwLjA2NyA0LjU2N2wtNC43MzQgNC43MzMtMS40LTEuNGExIDEgMCAwIDAtMS40MTQgMS40MTRsMi4xIDIuMWExIDEgMCAwIDAgMS40MTQgMGw1LjQ0LTUuNDRhMSAxIDAgMCAwLTEuNDE0LTEuNDE0eiIvPjwvc3ZnPg==" alt="Signed with SignPath"></a> </p>
New: Community Leaderboard. Browse real-world performance data from actual users. Press b to see measured tok/s, TTFT, and VRAM for any GPU, not just yours. Pick from 27+ hardware presets (RTX 5090 to Apple M1) with H to compare real numbers before you buy or build.
Hundreds of models & providers. One command to find what runs on your hardware.
A terminal tool that right-sizes LLM models to your system's RAM, CPU, and GPU. Detects your hardware, scores each model across quality, speed, fit, and context dimensions, and tells you which ones will actually run well on your machine.
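To make the idea of scoring across several dimensions concrete, here is a minimal sketch that assumes the composite score is a weighted average of the four dimensions named above. The equal weights, the 0 to 100 scale, and the names `ModelScores` and `composite_score` are hypothetical and are not taken from llmfit's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScores:
    """Per-dimension scores for one model (hypothetical 0-100 scale)."""
    quality: float
    speed: float
    fit: float
    context: float


# Hypothetical equal weights; llmfit's real weights are not stated in this document.
HYPOTHETICAL_WEIGHTS = {"quality": 0.25, "speed": 0.25, "fit": 0.25, "context": 0.25}


def composite_score(scores: ModelScores, weights: dict = HYPOTHETICAL_WEIGHTS) -> float:
    """Return the weighted average of the four dimension scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(getattr(scores, dim) * w for dim, w in weights.items())
    return weighted / total_weight


if __name__ == "__main__":
    print(composite_score(ModelScores(quality=80, speed=60, fit=100, context=40)))
```

Dividing by the weight total keeps the result on the same scale as the inputs even when the weights do not sum to 1.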
Ships with an interactive TUI (default) and a classic CLI mode. Supports multi-GPU setups, MoE architectures, dynamic quantization selection, speed estimation, and local runtime providers (Ollama, llama.cpp, MLX, Docker Model Runner, LM Studio).
New: Community Leaderboard (b) — See real-world tok/s, TTFT, and VRAM usage from other users running the same hardware as you. Powered by localmaxxing.com, this bridges the gap between estimated and actual performance.
Also: Download Manager (D), Advanced Configuration (A), and Hardware Simulation — Press D to manage downloads, view history, delete models, and configure the download directory. Press A to tune TPS efficiency, run mode factors, and scoring weights. Press S to simulate different hardware.
Sister projects:
- sympozium: managing agents in Kubernetes.
- llmserve: a simple TUI for serving local LLM models. Pick a model, pick a backend, serve it.
- llama-panel: a native macOS app for managing local llama-server instances.

---
```sh
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --quant mlx-4bit
llmfit plan "Qwen/Qwen3-4B-MLX-4bit" --context 8192 --target-tps 25 --json
```
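When `llmfit plan` is driven from a script, the `--json` form is the natural one to parse. The sketch below builds the argument list from the flags shown above and, optionally, runs the command. The helper names `plan_command` and `run_plan` are hypothetical, and the shape of the JSON output is not documented here, so it is returned unparsed beyond `json.loads`.

```python
import json
import subprocess
from typing import List, Optional


def plan_command(model: str, context: int, quant: Optional[str] = None,
                 target_tps: Optional[int] = None, as_json: bool = False) -> List[str]:
    """Build the argv for `llmfit plan` using only flags shown in this README."""
    cmd = ["llmfit", "plan", model, "--context", str(context)]
    if quant is not None:
        cmd += ["--quant", quant]
    if target_tps is not None:
        cmd += ["--target-tps", str(target_tps)]
    if as_json:
        cmd.append("--json")
    return cmd


def run_plan(model: str, context: int, **options):
    """Run `llmfit plan --json` and return the decoded JSON (requires llmfit in PATH)."""
    options["as_json"] = True
    result = subprocess.run(plan_command(model, context, **options),
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(plan_command("Qwen/Qwen3-4B-MLX-4bit", 8192, target_tps=25, as_json=True))
```

Passing the arguments as a list avoids shell quoting problems with model names that contain slashes.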
```sh
cargo login
cargo publish
```
Before publishing, make sure:
- The version in `Cargo.toml` is correct (bump with each release).
- A `LICENSE` file exists in the repo root (create one if it is missing).
| Crate | Purpose |
|---|---|
| clap | CLI argument parsing with derive macros |
| sysinfo | Cross-platform RAM and CPU detection |
| serde / serde_json | JSON deserialization for model database |
| tabled | CLI table formatting |
| colored | CLI colored output |
| ureq | HTTP client for runtime/provider API integration |
| ratatui | Terminal UI framework |
| crossterm | Terminal input/output backend for ratatui |
---
Requirements:

- Ollama must be running (`ollama serve` or the Ollama desktop app)
- The API must be reachable at `http://localhost:11434` (Ollama's default API port)

```sh
docker run ghcr.io/alexsjones/llmfit
```

This prints JSON from the `llmfit recommend` command. The JSON can be queried further with `jq`:

```sh
podman run ghcr.io/alexsjones/llmfit recommend --use-case coding | jq '.models[].name'
```
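If `jq` is not available, the same extraction can be done in a few lines of Python. This sketch mirrors the `.models[].name` filter above; the only schema it assumes is the one that filter implies (a top-level `models` array whose entries have a `name`). The helper name `model_names` and the sample payload are hypothetical.

```python
import json
import sys
from typing import List


def model_names(recommend_json: str) -> List[str]:
    """Return every models[].name value from `llmfit recommend` JSON output."""
    payload = json.loads(recommend_json)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    # Usage: docker run ghcr.io/alexsjones/llmfit | python3 names.py
    for name in model_names(sys.stdin.read()):
        print(name)
```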
llmfit integrates with Docker Model Runner, Docker Desktop's built-in model serving feature.
Requirements:

- Default endpoint: `http://localhost:12434`

How it works:

- Queries `GET /engines` to list models available in Docker Model Runner
- Models use `ai/<tag>` naming
- Pressing `d` in the TUI pulls via `docker model pull`

To connect to Docker Model Runner on a different host or port, set the `DOCKER_MODEL_RUNNER_HOST` environment variable:
```sh
DOCKER_MODEL_RUNNER_HOST="http://192.168.1.100:12434" llmfit
```
```sh
llmfit recommend --json --use-case coding --limit 3
```
Press A to open the Advanced Configuration popup. This panel lets you tune the parameters behind TPS estimation, run mode penalties, and composite scoring — addressing issue #449 where tok/s was overestimated for certain models (e.g., Qwen3 30B).
All changes are applied immediately and the model table is recalculated. Close with Esc to accept or Ctrl-R to reset to defaults.
| Field | Description | Default |
|---|---|---|
| **Efficiency** | Global efficiency factor for bandwidth-based TPS. Accounts for overhead | 0.55 |
| **GPU factor** | Speed multiplier for pure GPU inference | 1.0 |
| **CPU Offload** | Speed multiplier when weights spill to system RAM | 0.5 |
| **MoE Offload** | Speed multiplier for Mixture-of-Experts expert switching | 0.8 |
| **Tensor Par** | Speed multiplier for tensor-parallel inference | 0.9 |
| **CPU Only** | Speed multiplier for CPU-only execution | 0.3 |
| **Context cap** | Max context length used for memory estimation (leave blank for default) | auto |
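To show how these factors combine, here is a minimal sketch of a bandwidth-based estimate. It assumes the common rule of thumb that decoding is memory-bound, so tokens per second is roughly memory bandwidth divided by the bytes read per token (approximated by model size), then scaled by the efficiency and run-mode factors. The defaults come from the table above; the formula itself and the name `estimate_tps` are assumptions, not llmfit's confirmed implementation.

```python
# Defaults copied from the Advanced Configuration table.
DEFAULT_EFFICIENCY = 0.55
RUN_MODE_FACTORS = {
    "gpu": 1.0,
    "cpu_offload": 0.5,
    "moe_offload": 0.8,
    "tensor_parallel": 0.9,
    "cpu_only": 0.3,
}


def estimate_tps(bandwidth_gb_s: float, model_size_gb: float,
                 run_mode: str = "gpu",
                 efficiency: float = DEFAULT_EFFICIENCY) -> float:
    """Rule-of-thumb tokens/s: bandwidth / model size, scaled by two factors."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    ideal = bandwidth_gb_s / model_size_gb  # upper bound if every byte is read once per token
    return ideal * efficiency * RUN_MODE_FACTORS[run_mode]


if __name__ == "__main__":
    # Hypothetical hardware: 1000 GB/s of memory bandwidth and a 20 GB model.
    for mode in RUN_MODE_FACTORS:
        print(mode, round(estimate_tps(1000, 20, mode), 2))
```

Under this model, lowering Efficiency scales every estimate down uniformly, while the run-mode factors only affect models that fall into that mode.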
| Key | Action |
|---|---|
| Tab / j / k | Switch between fields |
| Type digits / . | Edit the selected field |
| Left / Right | Move cursor within the field |
| Backspace / Delete | Remove characters |
| Ctrl-U | Clear the current field |
| Enter | Apply changes and recalculate all scores |
| Esc / q | Close without applying |
```sh
# Set the key through the environment
export LOCALMAXXING_API_KEY="bhk_your_key_here"
llmfit

# Or pass it as a flag
llmfit --api-key "bhk_your_key_here"
```
| Variable | Description |
|---|---|
| LOCALMAXXING_API_KEY | Bearer token for localmaxxing.com API |
```sh
llmfit bench --provider ollama --url http://my-server:11434 llama3.2
llmfit bench --provider vllm --url http://localhost:8000
```
Use --cli or any subcommand to get classic table output:
```sh
llmfit serve --host 0.0.0.0 --port 8787
```
llmfit serve starts an HTTP API that exposes the same fit/scoring data used by TUI/CLI, including filtering and top-model selection for a node.
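A client for this API could be sketched as follows. The `/api/v1/models/top` path and the `limit`, `min_fit`, and `use_case` parameters are taken from the curl example in this section. The helper names are hypothetical, and because the response schema is not documented here, the decoded JSON is returned as-is.

```python
import json
import urllib.parse
import urllib.request


def top_models_url(base_url: str = "http://localhost:8787", **filters) -> str:
    """Build the /api/v1/models/top URL with URL-encoded query parameters."""
    url = f"{base_url.rstrip('/')}/api/v1/models/top"
    query = urllib.parse.urlencode(filters)
    return f"{url}?{query}" if query else url


def fetch_top_models(base_url: str = "http://localhost:8787",
                     timeout: float = 5.0, **filters):
    """GET the top models from a running `llmfit serve` and decode the JSON body."""
    with urllib.request.urlopen(top_models_url(base_url, **filters), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(top_models_url(limit=5, min_fit="good", use_case="coding"))
```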
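```sh
# Query the top models from a running llmfit serve instance
curl "http://localhost:8787/api/v1/models/top?limit=5&min_fit=good&use_case=coding"
```
```sh
# Run the API test script
python3 scripts/test_api.py --spawn
```
```sh
# Point llmfit at a remote Ollama instance
OLLAMA_HOST="http://192.168.1.100:11434" llmfit --cli
OLLAMA_HOST="http://192.168.1.100:11434" llmfit fit --perfect -n 5
```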
This is useful for:
- Running llmfit on one machine while Ollama serves from another (e.g., GPU server + laptop client)
- Connecting to Ollama running in Docker containers with custom ports
- Using Ollama behind reverse proxies or load balancers
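The override pattern used by `OLLAMA_HOST` and `DOCKER_MODEL_RUNNER_HOST` can be sketched as a lookup with a default. The variable names and default URLs come from this README. The helper name `resolve_endpoint` and the whitespace and trailing-slash handling are assumptions made for illustration.

```python
import os
from typing import Mapping

# Defaults as stated in this README.
DEFAULT_ENDPOINTS = {
    "OLLAMA_HOST": "http://localhost:11434",
    "DOCKER_MODEL_RUNNER_HOST": "http://localhost:12434",
}


def resolve_endpoint(var: str, env: Mapping[str, str] = os.environ) -> str:
    """Return the endpoint from the environment, or the documented default."""
    value = env.get(var, "").strip()
    # An unset or empty variable falls back to the default endpoint.
    return (value or DEFAULT_ENDPOINTS[var]).rstrip("/")


if __name__ == "__main__":
    print(resolve_endpoint("OLLAMA_HOST"))
    print(resolve_endpoint("DOCKER_MODEL_RUNNER_HOST"))
```

Taking the environment as a parameter keeps the function easy to test without mutating `os.environ`.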
If your LM Studio instance has Require API Key enabled (required for MCP server access), set the LMSTUDIO_API_KEY environment variable to provide a Bearer token with all requests:
```sh
export LMSTUDIO_API_KEY="your-api-key-here"
llmfit
```
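As an illustration of what "a Bearer token with all requests" means in practice, the sketch below builds a request to LM Studio's `GET /v1/models` endpoint and attaches the token only when `LMSTUDIO_API_KEY` is set. The endpoint and the address `http://127.0.0.1:1234` are the ones given in the LM Studio section of this README. The function name `lmstudio_models_request` is hypothetical, and this is not llmfit's own code.

```python
import os
import urllib.request
from typing import Mapping


def lmstudio_models_request(env: Mapping[str, str] = os.environ,
                            base_url: str = "http://127.0.0.1:1234") -> urllib.request.Request:
    """Build a GET /v1/models request, adding a Bearer token if one is configured."""
    request = urllib.request.Request(f"{base_url}/v1/models", method="GET")
    api_key = env.get("LMSTUDIO_API_KEY")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


if __name__ == "__main__":
    req = lmstudio_models_request()
    print(req.full_url, "authenticated" if req.has_header("Authorization") else "anonymous")
```

Sending the request with `urllib.request.urlopen(req)` requires a running LM Studio server.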
llmfit supports multiple local runtime providers: Ollama, llama.cpp, MLX, Docker Model Runner, and LM Studio. MLX models come from `mlx-community/*` repos on HuggingFace, not the original model publisher.

When more than one compatible provider is available for a model, pressing `d` in the TUI opens a provider picker modal.
llmfit integrates with Ollama to detect which models you already have installed and to download new ones directly from the TUI.
llmfit integrates with llama.cpp as a runtime/download provider in both TUI and CLI.
Requirements:

- `llama-cli` or `llama-server` available in `PATH` (for runtime detection)

How it works:
| Variable | Default | Description |
|---|---|---|
| LLAMA_CPP_PATH | *(none)* | Directory containing llama.cpp binaries (llama-cli, llama-server). Checked before PATH lookup. |
| LLAMA_SERVER_PORT | 8080 | Port used when probing a running llama-server health endpoint for runtime detection. |
If llama.cpp is installed in a non-standard location, set LLAMA_CPP_PATH so llmfit can find it without requiring it in your PATH.
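The lookup order described in the table (the `LLAMA_CPP_PATH` directory first, then `PATH`) can be sketched like this. The function name `find_llama_binary` is hypothetical, and this is an illustration of the documented order rather than llmfit's implementation, which is written in Rust.

```python
import os
import shutil
from typing import Mapping, Optional


def find_llama_binary(name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate a llama.cpp binary: LLAMA_CPP_PATH directory first, then PATH."""
    custom_dir = env.get("LLAMA_CPP_PATH")
    if custom_dir:
        candidate = os.path.join(custom_dir, name)
        # Only accept a regular file that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to a normal PATH search.
    return shutil.which(name, path=env.get("PATH"))


if __name__ == "__main__":
    for binary in ("llama-cli", "llama-server"):
        print(binary, "->", find_llama_binary(binary))
```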
llmfit integrates with LM Studio as a local model server with built-in model download capabilities.
Requirements:

- LM Studio's local server reachable at `http://127.0.0.1:1234`

How it works:

- Queries `GET /v1/models` to list models available in LM Studio
- Pressing `d` in the TUI triggers a download via `POST /api/v1/models/download`
- Download status is read from `GET /api/v1/models/download-status`

llmfit ships as an OpenClaw skill that lets the agent recommend hardware-appropriate local models and auto-configure Ollama/vLLM/LM Studio providers.
If you're looking for a different approach, check out llm-checker -- a Node.js CLI tool with Ollama integration that can pull and benchmark models directly. It takes a more hands-on approach by actually running models on your hardware via Ollama, rather than estimating from specs. Good if you already have Ollama installed and want to test real-world performance. Note that it doesn't support MoE (Mixture-of-Experts) architectures -- all models are treated as dense, so memory estimates for models like Mixtral or DeepSeek-V3 will reflect total parameter count rather than the smaller active subset.
---
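llmfit is a tool for recommending and deploying large language models. It provides an easy-to-use interface for discovering, deploying, and managing them.
Some model configurations need specific hardware, so plan hardware resources around the model configuration you intend to run.
llmfit can be installed in the following ways:
1. With Docker: `docker run ghcr.io/alexsjones/llmfit`
2. With pip: `pip install llmfit`
3. From source: `cargo build`

llmfit is used through the following commands:
1. `llmfit recommend`: recommend models
2. `llmfit plan`: plan hardware resources
3. `llmfit bench`: benchmark performance

llmfit can be configured through environment variables and CLI flags.
Environment variables: `LOCALMAXXING_API_KEY`, `LOCALMAXXING_API_URL`
CLI flags: `--api-key`, `--url`

llmfit provides an API for accessing model and hardware resources. The API token is supplied with `llmfit --api-key`.
llmfit supports several local runtime providers, including Ollama, llama.cpp, and MLX. The provider is selected with `--provider`, for example `llmfit bench --provider ollama`.
- Ollama integration: llmfit detects the models you already have installed and downloads new ones.
- llama.cpp integration: llmfit uses llama.cpp as a runtime and download provider.
- MLX integration: llmfit supports the Apple Silicon model cache and server.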
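A high-quality open-source project that addresses a core pain point of local model deployment: a high-performance Rust implementation, actively maintained, with strong community interest.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
Validate thoroughly in a sandbox or test environment before deploying to production, and carry out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. Commercial use, modification, and distribution are allowed; only the copyright and license notice must be retained.
AI Skill Hub's take: llmfit's core features are complete and its quality is excellent. For AI enthusiasts, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.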
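| Field | Value |
|---|---|
| Original name | llmfit |
| Original description | Open-source AI tool: Hundreds of models & providers. One command to find what runs on your hardware. ⭐26.4k · Rust |
| Topics | local inference, model management, hardware adaptation, multi-framework support, GGUF format |
| GitHub | https://github.com/AlexsJones/llmfit |
| License | MIT |
| Language | Rust |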
Listed: 2026-05-18 · Updated: 2026-05-19 · License: MIT · AI Skill Hub does not legally endorse the accuracy of third-party content.