经 AI Skill Hub 精选评估,苹果芯片LLM运行时 获评「强烈推荐」。这款AI工具在功能完整性、社区活跃度和易用性方面表现出色,AI 评分 8.0 分,适合有一定技术背景的用户使用。
支持Gemma 4和Qwen 3.6 MTP模式的开源AI工具
苹果芯片LLM运行时 是一款基于 Rust 开发的开源工具,专注于 ai-interface、gemma4、generative-ai 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
支持Gemma 4和Qwen 3.6 MTP模式的开源AI工具
苹果芯片LLM运行时 是一款基于 Rust 开发的开源工具,专注于 ai-interface、gemma4、generative-ai 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
# 方式一:cargo install(推荐) cargo install ax-engine # 方式二:从源码编译 git clone https://github.com/defai-digital/ax-engine cd ax-engine cargo build --release # 二进制在 ./target/release/ax-engine
# 查看帮助 ax-engine --help # 基本运行 ax-engine [options] <input> # 详细使用说明请查阅文档 # https://github.com/defai-digital/ax-engine
# ax-engine 配置说明 # 查看配置选项 ax-engine --config-example > config.yml # 常见配置项 # output_dir: ./output # log_level: info # workers: 4 # 环境变量(覆盖配置文件) export AX_ENGINE_CONFIG="/path/to/config.yml"
AX Engine is a Mac-first LLM inference runtime, local server, SDK layer, and benchmark toolkit for Apple Silicon. It runs direct-support MLX model families natively, and routes other MLX text models or non-MLX models through explicit mlx-lm and llama.cpp compatibility routes.
AX Engine is for developers who want a local OpenAI-compatible model server on Apple Silicon without hiding which runtime path is doing the work.
Install:
pip install "ax-engine[download]"
Download a small model and start the server:
MODEL_DIR="$(ax-engine download mlx-community/Qwen3-4B-4bit --json | python3 -c 'import json,sys; print(json.load(sys.stdin)["dest"])')"
ax-engine serve "$MODEL_DIR" --port 8080
High-memory model shortcuts:
```bash
ax-engine download qwen36-35b --dest /Volumes/Models/qwen36-35b
from ax_engine import download_model path = download_model("mlx-community/Qwen3.6-35B-A3B-4bit")
Built-in download aliases:
| Alias | Repo |
|---|---|
| `qwen36-35b` | `mlx-community/Qwen3.6-35B-A3B-4bit` |
| `qwen36-27b`, `qwen36-27b-5bit`, `qwen36-27b-6bit`, `qwen36-27b-8bit` | `mlx-community/Qwen3.6-27B-{4,5,6,8}bit` |
| `gemma4-e2b`, `gemma4-e2b-5bit`, `gemma4-e2b-6bit`, `gemma4-e2b-8bit` | `mlx-community/gemma-4-e2b-it-{4,5,6,8}bit` |
| `gemma4-12b`, `gemma4-12b-6bit` | `mlx-community/gemma-4-12B-it-{4,6}bit` |
| `gemma4-26b` | `mlx-community/gemma-4-26b-a4b-it-4bit` |
| `gemma4-31b` | `mlx-community/gemma-4-31b-it-4bit` |
Leave downloads in the Hugging Face Hub cache by default — it's shared with `mlx_lm` and other HF-aware tools, avoiding duplicate copies of large weights. Use `--dest` only when you want an explicit copy outside the shared cache.
If you already have `mlx_lm` installed, its downloads land in the same cache and AX Engine can auto-discover them:
bash python -m mlx_lm.generate --model mlx-community/Qwen3-4B-4bit --prompt "x" --max-tokens 1 ax-engine-bench generate-manifest ~/.cache/huggingface/hub/models--mlx-community--Qwen3-4B-4bit/snapshots/<hash> ax-engine serve ~/.cache/huggingface/hub/models--mlx-community--Qwen3-4B-4bit/snapshots/<hash> --port 8080 ```
Direct support means AX has a repo-owned ax-engine-mlx graph for the model family and loads MLX safetensors through the AX manifest path. Other MLX text models can still use the explicit mlx_lm_delegated compatibility route.
| Family | Direct model IDs | Current scope | Architecture notes |
|---|---|---|---|
| Gemma 4 | gemma-4-e2b-it, gemma-4-e4b-it, gemma-4-12b-it, gemma-4-26b-a4b-it, gemma-4-31b-it | Repo-owned MLX runtime; MLX affine 4/5/6/8-bit weights; assistant-MTP benchmark path | Dense unified 12B, per-layer embedding, and MoE variants; sliding-window + full attention, logit softcapping |
| Qwen 3 | Qwen3-4B-4bit and manifest-backed dense checkpoints | Repo-owned MLX runtime | SwiGLU dense FFN; per-head QK norm |
| Qwen 3.5 | Qwen3.5-9B-MLX-4bit | Repo-owned MLX runtime | Linear attention + MoE FFN; attn_output_gate per-head interleaving |
| Qwen 3.6 / Coder Next | Qwen3.6-35B-A3B 4-bit, Qwen3.6-27B 4/5/6/8-bit, Qwen3-Coder-Next-4bit | Repo-owned MLX runtime | qwen3_next: GatedDelta linear attention, full attention with per-head sigmoid gate, sparse top-k MoE |
GLM 4.7 Flash (glm4_moe_lite) was demoted from direct support to themlx_lm_delegatedpassby route: native decode only reachesmlx_lmparity and the 4-bit export has no MTP head. Theglm4.7-flash-4bitpreset now selects the delegated tier and requires--mlx-lm-server-url. Seedocs/SUPPORTED-MODELS.md.
Adding a new architecture means implementing the model graph in ax-engine-mlx, not wiring up a generic loader. Architecture code alone is not a direct-support claim — a model requires a repo-owned graph, manifest, smoke coverage, and benchmark evidence before promotion here. LLaMA, Mistral, Mixtral, DeepSeek, and unlisted Gemma/Qwen variants should use the explicit delegated route.
Before promoting another architecture or checkpoint, run scripts/probe_mlx_model_support.py --model-dir <model-dir>; a model should report repo_owned_runtime_ready only when its manifest, local reference files, and runtime path are all present.
Full list: docs/SUPPORTED-MODELS.md.
高性能AI模型运行时,支持多种模式
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
✅ Apache 2.0 — 宽松开源协议,可商用,需保留版权声明和 NOTICE 文件,含专利授权条款。
AI Skill Hub 点评:苹果芯片LLM运行时 的核心功能完整,质量优秀。对于AI 技术爱好者来说,这是一个值得纳入个人工具库的选择。建议先在非生产环境试用,再逐步推广。
| 原始名称 | ax-engine |
| Topics | ai-interfacegemma4generative-aiinference-enginellmrust |
| GitHub | https://github.com/defai-digital/ax-engine |
| License | Apache-2.0 |
| 语言 | Rust |
收录时间:2026-06-12 · 更新时间:2026-06-12 · License:Apache-2.0 · AI Skill Hub 不对第三方内容的准确性作法律背书。