经 AI Skill Hub 精选评估,Lattice 苹果芯片LLM工具箱 获评「强烈推荐」。这款AI工具在功能完整性、社区活跃度和易用性方面表现出色,AI 评分 8.2 分,适合有一定技术背景的用户使用。
一个专为Apple Silicon设计的纯Rust实现大模型工具,支持LLM的运行、量化与微调。其核心特色是完全脱离Python和CUDA依赖,提供极高性能的推理与嵌入能力,非常适合追求极致性能的Mac开发者和AI研究员。
Lattice 苹果芯片LLM工具箱 是一款基于 Rust 开发的开源工具,专注于 大模型推理、Apple Silicon、Rust语言 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
一个专为Apple Silicon设计的纯Rust实现大模型工具,支持LLM的运行、量化与微调。其核心特色是完全脱离Python和CUDA依赖,提供极高性能的推理与嵌入能力,非常适合追求极致性能的Mac开发者和AI研究员。
Lattice 苹果芯片LLM工具箱 是一款基于 Rust 开发的开源工具,专注于 大模型推理、Apple Silicon、Rust语言 等核心功能。作为 GitHub 开源项目,它拥有活跃的社区支持和持续的版本迭代,代码完全透明可审计,支持本地部署以保护数据隐私。无论是个人使用还是集成到企业工作流,都能提供稳定可靠的解决方案。
# 方式一:cargo install(推荐) cargo install lattice # 方式二:从源码编译 git clone https://github.com/ohdearquant/lattice cd lattice cargo build --release # 二进制在 ./target/release/lattice
# 查看帮助 lattice --help # 基本运行 lattice [options] <input> # 详细使用说明请查阅文档 # https://github.com/ohdearquant/lattice
# lattice 配置说明 # 查看配置选项 lattice --config-example > config.yml # 常见配置项 # output_dir: ./output # log_level: info # workers: 4 # 环境变量(覆盖配置文件) export LATTICE_CONFIG="/path/to/config.yml"
Pure Rust inference engine for transformer models on Apple Silicon, with a native macOS app.
Quick start · Lattice Studio · Benchmarks · Roadmap
No ONNX. No Python. No CUDA. No external ML runtime. Lattice implements the full compute graph in Rust: weight loading, tokenization, forward pass, vector operations, quantization, and LoRA training. Hand-written Metal shaders accelerate inference on Apple Silicon. SIMD kernels (AVX2 on x86, NEON on ARM) handle the CPU path.
---
| Pure Rust compute | Hand-written SIMD kernels (AVX2/NEON). No C++, no ONNX, no CUDA. |
| Metal GPU backend | Native Apple Silicon acceleration via Metal MSL shaders. WGPU fallback for cross-platform. |
| Generation models | Qwen3.5-0.8B / 2B via lattice chat/serve. Qwen3.6-27B via lattice chat/serve from a native Q4 checkpoint (requires the Metal GPU build, --features "f16 metal-gpu"); safetensors 27B is loader-level only. Qwen3.6-35B-A3B (MoE): config + weight loader support. Hybrid GatedDeltaNet + GQA architecture. |
| Embedding models | 9 models: BGE, E5, MiniLM, Qwen3-Embedding families. Auto-download for 7 BERT-family variants. |
| Three tokenizers | WordPiece, SentencePiece, BPE. No Hugging Face tokenizers C extension. |
| Quantization | Q8, Q4, and QuaRot (rotation-based 4-bit). No other engine runs Q4 + LoRA hot-swap on Qwen3.5. |
| LoRA | Inference hook, hot-swap with no reload, PEFT safetensors format, training via lattice-tune. |
| HTTP API | OpenAI-compatible /v1/chat/completions via lattice serve. |
| Safetensors native | Memory-mapped weight loading. Single-file and sharded checkpoints. |
| KV cache | Incremental decoding with key-value caching. |
| Speculative decoding | Draft-model acceleration on the CPU path. |
| Grammar decoding | Constrained output via a pushdown automaton. OpenAI string-level stop sequences. |
| MRL support | Matryoshka truncation for Qwen3-Embedding models (output dimension >= 32). |
| LRU cache | CachedEmbeddingService with sharded in-memory cache and hit/miss stats. |
| Knowledge distillation | Train small models from Claude/GPT/Gemini teacher soft labels via lattice-tune. |
| Optimal transport | Sinkhorn-Knopp solver for embedding drift detection via lattice-transport. |
---
cargo bench -p lattice-inference --features metal-gpu,f16 -- metal_decode
Three ways to get lattice, in order of convenience:
1. cargo install (from crates.io):
```bash
```bash
./apps/macos/scripts/package-app.sh --skip-build
./apps/macos/scripts/package-app.sh --skip-cargo ```
Drag LatticeStudio.app from the .dmg to /Applications.
The app is ad-hoc signed. On first launch, right-click and choose "Open" to bypass Gatekeeper, then click "Open" in the dialog. macOS remembers the exception for subsequent launches. Alternatively:
xattr -dr com.apple.quarantine /Applications/LatticeStudio.app
[dependencies]
lattice-embed = "0.4"
tokio = { version = "1", features = ["full"] }
use lattice_embed::{EmbeddingService, EmbeddingModel, NativeEmbeddingService};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let service = NativeEmbeddingService::default();
// Single embedding (BGE-small-en-v1.5, 384 dimensions)
let embedding = service
.embed_one("The quick brown fox jumps over the lazy dog", EmbeddingModel::default())
.await?;
println!("Dimensions: {}", embedding.len()); // 384
// Batch
let texts = vec![
"First document".to_string(),
"Second document".to_string(),
];
let embeddings = service.embed(&texts, EmbeddingModel::BgeSmallEnV15).await?;
// SIMD-accelerated similarity
let similarity = lattice_embed::utils::cosine_similarity(&embeddings[0], &embeddings[1]);
println!("Similarity: {:.4}", similarity);
Ok(())
}
Model weights are downloaded from HuggingFace on first use and cached at ~/.lattice/models (or $LATTICE_MODEL_CACHE).
lattice serve exposes an OpenAI-compatible endpoint:
curl http://127.0.0.1:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-0.8b",
"messages": [{"role": "user", "content": "Hello"}],
"max_tokens": 128,
"temperature": 0.7
}'
---
./scripts/bench_context_scaling.sh
Performance depends on hardware, model size, batch size, and sequence length. Run benchmarks on your target hardware for representative numbers.
---
aiskill88点评:纯Rust实现极具竞争力,彻底解决了Mac端AI环境配置痛点,是高性能本地AI部署的潜力之选。
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
✅ Apache 2.0 — 宽松开源协议,可商用,需保留版权声明和 NOTICE 文件,含专利授权条款。
AI Skill Hub 点评:Lattice 苹果芯片LLM工具箱 的核心功能完整,质量优秀。对于AI 技术爱好者来说,这是一个值得纳入个人工具库的选择。建议先在非生产环境试用,再逐步推广。
| 原始名称 | lattice |
| Topics | 大模型推理Apple SiliconRust语言 |
| GitHub | https://github.com/ohdearquant/lattice |
| License | Apache-2.0 |
| 语言 | Rust |
收录时间:2026-07-05 · 更新时间:2026-07-05 · License:Apache-2.0 · AI Skill Hub 不对第三方内容的准确性作法律背书。