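Taos Local AI Memory System (taosmd) is one of the AI tools featured in this issue of AI Skill Hub. Its overall score is 7.2, which indicates good overall quality. We recommend adding it to your AI toolkit to help improve productivity.
A local-first AI memory tool that runs offline on any device with 8 GB+ of RAM (Raspberry Pi, mini PC, and so on). It provides framework-agnostic embedded AI capabilities and suits developers working on edge computing and privacy-sensitive scenarios.
Taos Local AI Memory System is an open-source tool written in Python that focuses on local AI, offline operation, and edge computing. As an open-source project on GitHub, it has active community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether used individually or integrated into an enterprise workflow, it provides a stable, reliable solution.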
# Option 1: install with pip (recommended)
pip install taosmd
# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install taosmd
# Option 3: install from source (for the latest features)
git clone https://github.com/jaylfc/taosmd
cd taosmd
pip install -e .
# Verify the installation
python -c "import taosmd; print('Installation successful')"
# Command-line usage
taosmd --help
# Basic usage
taosmd input_file -o output_file
# Calling from Python code
import taosmd
# Example
result = taosmd.process("input")
print(result)
# Example taosmd configuration file (config.yml)
app:
  name: "taosmd"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
taosmd --config config.yml
# Or configure through environment variables
export TAOSMD_API_KEY="your-key"
export TAOSMD_OUTPUT_DIR="./output"
<p align="center"> <img src="logo.png" alt="taOSmd" width="300"> </p>
- POST /ingest/batch dedupes on your content hashes, plus a BM25-only search mode for instant keyword lookups
- taosmd serve ships a message bus with named channels, realtime SSE wake, and a poll cursor, so several agents on one project coordinate over the same server that holds their memory. Every message is an append-only archive event, so the whole conversation is auditable and replayable, not a separate mutable log.
- taosmd tasks gives multi-agent teams a ready queue (open tasks with no open blockers) and a prime briefing endpoint for session handoffs; every mutation is an append-only archive event, so the task tables are replayable history (concept credit: beads)
- valid_to invalidation) and the vector recall layer (matching chunks soft-hidden, not deleted); recall returns only the active fact

Core taOSmd (the 97.0% benchmark) is fully self-contained: it uses only standard packages (SQLite, numpy, ONNX Runtime) plus the MiniLM embedding model. No external servers or forked repos needed.
Optional integrations for the full taOS stack:
| Component | Source | Notes |
|---|---|---|
| QMD (embedding / reranking / query expansion) | [jaylfc/qmd](https://github.com/jaylfc/qmd) (fork, on npm as @jaylfc/qmd) | Tracks upstream [tobi/qmd](https://github.com/tobi/qmd) v2.5.3 and adds a pluggable model backend: qmd serve (HTTP model server) plus remote / Ollama-compatible backends (--server, --backend ollama) so embeddings, reranking and expansion can be served by an Ollama or NPU host. |
| rkllama (NPU model serving) | [NotPunchnox/rkllama](https://github.com/NotPunchnox/rkllama) | Upstream with minor patches for rerank endpoint |
| ONNX MiniLM | [onnx-models/all-MiniLM-L6-v2-onnx](https://huggingface.co/onnx-models/all-MiniLM-L6-v2-onnx) | Standard pre-exported model |
| Qwen3-4B RKLLM | [dulimov/Qwen3-4B-rk3588-1.2.1-base](https://huggingface.co/dulimov/Qwen3-4B-rk3588-1.2.1-base) | Community RK3588 conversion |
The cleanest way to install taOSmd is to ask your agent to do it. Paste this message into Claude Code, Cursor, your OpenClaw shell, whatever:
Please install taOSmd as my memory system. The repo is github.com/jaylfc/taosmd.

1. Read the README so you understand what you're installing.
2. Run the install script: curl -fsSL https://raw.githubusercontent.com/jaylfc/taosmd/master/scripts/setup.sh | bash. Report any errors and stop if it fails.
3. Register yourself as an agent so you have your own isolated index. Pick a stable agent name (lowercase, no spaces), the same name you'll use every time you call the librarian. If I have multiple agents in this framework, ask me what to name this one before registering.
4. Verify the install: call taosmd.search("hello", agent="<your-name>"), it should return an empty result, not an error.
5. Append the "Memory, taOSmd" rules block to my agent file (CLAUDE.md / system prompt / AGENTS.md, whatever your framework reads every turn). Pull the block via python -c "import taosmd; print(taosmd.agent_rules())", the file is shipped as package data at taosmd/docs/agent-rules.md so this works from both editable and wheel installs. Replace <your-agent-name> with the name you registered as.
6. Confirm it's installed and tell me your agent name so I know how to refer to your memory.

Don't summarise the repo or paraphrase the rules. Copy them verbatim, the wording is the contract.
The agent will pull the repo, run the install, register itself, append the per-turn rules block to its own instruction file, and verify everything works. After that, on every turn it runs it will check the librarian when it's uncertain. See taOSmd/docs/agent-rules.md for the rules block it installs (also available via taosmd.agent_rules()).
Multiple agents in one framework? Same install message works. The agent will ask you to name it before registering, so each agent gets its own shelf. The taOSmd service stays one process with one shared set of stores; per-agent isolation is enforced by an agent tag on every row, not by separate files. See docs/multi-agent.md for the full naming convention, project-scoped and cross-agent memory, migration scenarios, and a five-agent worked example.
Inside taOS? Don't use this. taOS provisions taOSmd automatically when you deploy an agent, and the rules block is baked into the agent template. This install path is for standalone framework users.
Install: pip install taosmd (add the MCP server with pip install "taosmd[mcp]"). For a source/dev install instead, run git clone and then pip install -e . from the checkout. The one-line bootstrap below additionally installs Ollama and downloads the embedding and LLM models; it is newer and still being validated across clean machines, so please report issues.
curl -fsSL https://raw.githubusercontent.com/jaylfc/taosmd/master/scripts/setup.sh | bash
This will:

1. Clone the repo and install Python dependencies
2. Download the all-MiniLM-L6-v2 ONNX embedding model (90MB)
3. Install Ollama and pull Qwen3-4B for fact extraction + answering (2.6GB)
4. On RK3588: download the NPU-optimised Qwen3-4B RKLLM model instead (4.6GB)
5. Create the data directory and run a self-test
```bash
git clone https://github.com/jaylfc/taosmd.git
cd taosmd
pip install -e .

hf download dulimov/Qwen3-4B-rk3588-1.2.1-base \
  Qwen3-4B-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm \
  --local-dir ~/.rkllama/models/qwen3-4b-chat
```
Always install taOSmd into a virtual environment, and never with sudo into the system Python. A system-wide copy under /usr/lib/python3/dist-packages/taosmd (left by an earlier sudo pip install) will shadow a venv editable install, so import taosmd resolves to the stale system copy instead of your checkout. To check what is actually being imported and remove a stale system copy:
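One way to see which copy Python would pick up is to ask the import system directly. The sketch below is a generic standard-library check, not a taOSmd tool: `describe_import` is a hypothetical helper name, and the "dist-packages" test is only a heuristic for Debian-style system installs such as the path mentioned above.

```python
import importlib.util
from pathlib import Path


def describe_import(module_name: str) -> dict:
    """Report where `module_name` would be imported from, without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return {"module": module_name, "found": False, "origin": None, "looks_system": False}
    origin = spec.origin  # a file path for ordinary packages; may be None or "built-in"
    # Heuristic (assumption): Debian-style system installs live under "dist-packages".
    looks_system = bool(origin) and "dist-packages" in Path(origin).parts
    return {"module": module_name, "found": True, "origin": origin, "looks_system": looks_system}


if __name__ == "__main__":
    info = describe_import("taosmd")
    print(info)
    if info["looks_system"]:
        print("taosmd resolves to a system-wide copy; it may be shadowing your venv install")
```

If the reported origin points into dist-packages while you expected your checkout or venv, the stale system copy is the one being imported.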
```bash
./scripts/install-server.sh
./scripts/install-client.sh http://pi.local:7900
```
This is the author's primary deployment and the exact stack the 97.0% benchmark was measured on. Other tiers (Pi 4B, Intel mini, Mac mini, GPU box) run the same code; they swap the runtime (Ollama instead of rkllama, CPU/GPU instead of NPU) but keep the same models and the same architecture.
| Component | Model | Purpose | Runtime |
|---|---|---|---|
| **Embedding** | all-MiniLM-L6-v2 (22M params) | Semantic vector search | ONNX Runtime on ARM CPU (0.3ms/embed) |
| **Embedding (alt)** | embeddinggemma-300M | Higher-quality 768-dim embeddings (vs MiniLM 384-dim) | qmd serve (llama.cpp, CPU) |
| **Reranker** | Qwen3-Reranker-0.6B | Result reranking | rkllama on RK3588 NPU |
| **Query Expansion** | qmd-query-expansion 1.7B | Search query enrichment | rkllama on RK3588 NPU |
| **LLM (extraction + answering)** | Qwen3-4B | Fact extraction (72% recall) + QA from context | rkllama on RK3588 NPU (17s/turn) |
| **Vector Store** | SQLite + numpy | Cosine similarity search | CPU |
| **Full-Text Search** | SQLite FTS5 | Keyword search over archive | CPU |
| **Knowledge Graph** | SQLite | Temporal entity-relationship triples | CPU |
Everything in this reference stack runs on the Pi itself; no external server is needed for this tier. The Qwen3-4B handles both fact extraction and question answering on the NPU. The ONNX embedding model runs in-process on the CPU. An optional GPU worker (e.g. Fedora with RTX 3060) can accelerate LLM tasks ~10x but is not required; the Pi is fully self-contained.
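To make the "temporal entity-relationship triples" row in the table above more concrete, here is a minimal sketch of how a temporal triple store with valid_to invalidation could look in plain SQLite. The table name, column names, and helper functions are assumptions for illustration; taOSmd's real schema is not shown in this document.

```python
import sqlite3

# Hypothetical schema: one row per fact, closed by setting valid_to.
SCHEMA = """
CREATE TABLE triples (
    subject    TEXT NOT NULL,
    predicate  TEXT NOT NULL,
    object     TEXT NOT NULL,
    valid_from REAL NOT NULL,
    valid_to   REAL            -- NULL means the fact is still active
);
"""


def assert_fact(conn, subject, predicate, obj, now):
    """Close any active fact for (subject, predicate), then record the new one."""
    conn.execute(
        "UPDATE triples SET valid_to = ? "
        "WHERE subject = ? AND predicate = ? AND valid_to IS NULL",
        (now, subject, predicate),
    )
    conn.execute(
        "INSERT INTO triples (subject, predicate, object, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, NULL)",
        (subject, predicate, obj, now),
    )


def active_facts(conn, subject):
    """Return only the facts that have not been invalidated."""
    rows = conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND valid_to IS NULL ORDER BY predicate",
        (subject,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
assert_fact(conn, "deploy", "runs_on", "staging", now=1.0)
assert_fact(conn, "deploy", "runs_on", "production", now=2.0)
print(active_facts(conn, "deploy"))  # [('runs_on', 'production')]
```

The superseded row is kept with a valid_to timestamp rather than deleted, so history stays queryable while recall sees only the active fact.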
```bash
hf download dulimov/Qwen3-4B-rk3588-1.2.1-base \
  Qwen3-4B-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm \
  --local-dir ~/.rkllama/models/qwen3-4b-chat
```
taosmd install-skill
Copies the bundled taosmd-a2a agent-setup skill into ~/.claude/skills/taosmd-a2a/ so it is available across all Claude Code projects. Pass --force to overwrite an existing installation.
```python
from taosmd import KnowledgeGraph, VectorMemory, Archive
```

```bash
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3:4b
```

```bash
sudo python3 -m pip uninstall -y taosmd  # the SYSTEM python, outside any venv
```
Running taosmd serve as a systemd unit? Point the unit at your venv's interpreter (ExecStart=/path/to/.venv/bin/python -m taosmd serve ...); a PyPI or venv install needs no WorkingDirectory and no repo checkout to start.
```python
for i, turn in enumerate(turns):
    await vmem.add(turn["text"], metadata={"position": i, "session": "conv1"})

hits = await retrieve(
    "what was discussed about the deploy?",
    sources={"vector": vmem},
    adjacent_neighbors=2,      # default 0, opt in for the lever
    position_key="position",
    group_key="session",       # confine neighbours to the same session
)
```
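The adjacent_neighbors, position_key, and group_key parameters suggest that each hit is widened to include the turns around it within the same group. The following is a rough, standalone sketch of that idea over plain dictionaries; `expand_with_neighbors` is a hypothetical helper and is not taken from taOSmd's code, whose actual behaviour may differ.

```python
def expand_with_neighbors(hits, corpus, adjacent_neighbors=0,
                          position_key="position", group_key=None):
    """Return the hits plus up to N turns on either side, ordered by (group, position)."""
    if adjacent_neighbors <= 0:
        return list(hits)

    def slot(item):
        meta = item["metadata"]
        group = meta.get(group_key, "") if group_key else ""
        return group, meta[position_key]

    # Index every stored turn by its (group, position) slot.
    by_slot = {slot(item): item for item in corpus}

    selected = {}
    for hit in hits:
        group, centre = slot(hit)
        for pos in range(centre - adjacent_neighbors, centre + adjacent_neighbors + 1):
            neighbour = by_slot.get((group, pos))
            if neighbour is not None:  # positions outside the session are skipped
                selected[(group, pos)] = neighbour
    return [selected[key] for key in sorted(selected)]


corpus = [
    {"text": f"turn {i}", "metadata": {"position": i, "session": "conv1"}}
    for i in range(6)
]
corpus.append({"text": "other 2", "metadata": {"position": 2, "session": "conv2"}})

expanded = expand_with_neighbors(
    [corpus[3]], corpus, adjacent_neighbors=1,
    position_key="position", group_key="session",
)
print([item["text"] for item in expanded])  # ['turn 2', 'turn 3', 'turn 4']
```

Grouping by session keeps a turn from another conversation out of the window even when it shares a position number.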
A recipe is a named, declared config bundle (retrieval + ingest + generator + librarian settings) that carries its own benchmark scores, target hardware tier, and pros/cons. Instead of leaving the retrieval levers at their defaults, taOSmd ships a small registry of recipes we have actually measured (for example the maxsim-rerank-9b leader for a 12 GB GPU and a lite-pi no-LLM-ingest profile for an Orange Pi / CPU), and a fresh install auto-detects your hardware and applies the best affordable recipe on first use, so you run a benchmarked configuration rather than unconfigured defaults. No taOS or network is required: the hardware probe is local, and the reranker model (when a recipe asks for one) downloads on first use with progress and degrades gracefully if it is not yet present.
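As an illustration of "apply the best affordable recipe", the selection step could be sketched as below. Only the two recipe names and the 12 GB figure come from the text above; the `Recipe` fields, the `pick_recipe` function, and the rank values are placeholders invented for this example and are not measured scores.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Recipe:
    name: str
    min_vram_gb: float  # 0 means a CPU-only machine can run it
    rank: int           # placeholder ordering, NOT a benchmark result


# Hypothetical registry; real recipes carry full retrieval/ingest settings.
REGISTRY = (
    Recipe("lite-pi", min_vram_gb=0.0, rank=1),
    Recipe("maxsim-rerank-9b", min_vram_gb=12.0, rank=2),
)


def pick_recipe(vram_gb: float, registry: Sequence[Recipe] = REGISTRY) -> Optional[Recipe]:
    """Choose the highest-ranked recipe whose hardware requirement is met."""
    affordable = [r for r in registry if r.min_vram_gb <= vram_gb]
    return max(affordable, key=lambda r: r.rank) if affordable else None


for vram in (0.0, 8.0, 12.0):
    print(vram, pick_recipe(vram).name)
```

A real hardware probe would supply the VRAM figure; here it is passed in by hand to keep the sketch self-contained.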
```python
import taosmd
```

```bash
taosmd config show                 # print server_url, whether a token is set, memory_model
taosmd config set-server --clear   # revert to local mode
```
By default there is no authentication. If the server is on a trusted private network (Tailscale, a home LAN), the network boundary is the access control. For defence in depth, you can require a bearer token:
```bash
export TAOSMD_TOKEN=<your-secret-token>
```
The token is sent as Authorization: Bearer <token> on every request. GET /health and the web inspector (GET /) are always public so monitoring probes keep working. Never commit the token to version control.
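For a client written by hand, attaching the header might look like the sketch below. It uses only the standard library and builds the request without sending it; `build_request` is a hypothetical helper, and reading the token from the TAOSMD_TOKEN environment variable on the client side is an assumption based on the export shown above.

```python
import os
import urllib.request
from typing import Optional


def build_request(base_url: str, path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request, adding the bearer header only when a token is set."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


# Assumption: the client reads the same variable name the server is configured with.
token = os.environ.get("TAOSMD_TOKEN")
req = build_request("http://127.0.0.1:7900", "/projects", token)
print(req.full_url, "authenticated" if req.has_header("Authorization") else "anonymous")
# To actually send it: urllib.request.urlopen(req)
```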
Not required for any tier: the LLM runs locally on whatever you've got. A GPU worker accelerates LLM tasks ~10x if you want to offload from a smaller node.
The RemoteClient class (taosmd.remote) mirrors the same async interface as the local service module (taosmd.service). The CLI, Python API, and MCP server all check TAOSMD_SERVER_URL (and config.json) at startup and delegate to RemoteClient when a URL is configured. From the caller's perspective nothing changes: the same taosmd.ingest(), taosmd.search(), and A2A calls work in both modes.
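The startup check described above can be pictured roughly as follows. This is a simplified sketch, not taOSmd's code: `resolve_server_url` and `choose_mode` are hypothetical names, the "server_url" key is inferred from the taosmd config show output, and giving the environment variable precedence over config.json is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_server_url(env: Optional[Mapping[str, str]] = None,
                       config_path: Optional[str] = None) -> Optional[str]:
    """Return the configured server URL, or None when running in local mode."""
    env = os.environ if env is None else env
    url = env.get("TAOSMD_SERVER_URL")
    if url:
        return url  # assumption: the environment variable wins over config.json
    if config_path is not None and Path(config_path).is_file():
        data = json.loads(Path(config_path).read_text(encoding="utf-8"))
        return data.get("server_url") or None
    return None


def choose_mode(env: Optional[Mapping[str, str]] = None,
                config_path: Optional[str] = None) -> str:
    """'remote' means calls would be delegated to a RemoteClient-style wrapper."""
    return "remote" if resolve_server_url(env, config_path) else "local"


print(choose_mode(env={}))
print(choose_mode(env={"TAOSMD_SERVER_URL": "http://pi.local:7900"}))
```

Because the decision is made once in a single place, the calling code can stay identical in both modes.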
taosmd serve starts a local HTTP/REST server (default 127.0.0.1:7900, stdlib only, no new dependencies). It is a thin JSON shell over the same service layer as the Python API and CLI, so behaviour is identical across surfaces. Every endpoint that takes an agent parameter forwards it to the service layer, honouring the same per-agent isolation as the Python API.
Security note: the server binds 127.0.0.1 by default, so no auth is needed because only local processes can reach it. If you pass --host 0.0.0.0 to expose it on a LAN, there is no authentication by default; put it behind your own network controls.
| Method | Path | Request | Response |
|---|---|---|---|
| GET | /health | (none) | {"status": "ok", "version": <str>} |
| POST | /ingest | {"text": str, "agent": str, "project"?: str} | {"archived": int, "agent": str, "project": str\|null, "data_dir": str} |
| POST | /ingest/batch | {"items": [{"text": str, "id"?: str, "metadata"?: obj}], "agent": str, "project"?: str} | {"ingested": int, "skipped": int, ...} |
| POST | /search | {"query": str, "agent": str, "limit"?: int, "project"?: str, "also_include"?: [str], "mode"?: "bm25"} | {"hits": [...]} |
| GET | /search | ?q=<query>&agent=<agent>&limit=<int>&project=<id>&also_include=a,b&mode=bm25 | {"hits": [...]} |
| GET | /projects | (none) | {"projects": [{"project_id", "agents", "last_ingest"}]} |
| GET | /shelves | ?project=<id> | {"shelves": [{"agent", "facts", "last_ingest"}]} |
| GET | /pending | ?agent=<agent>&limit=<int> | {"pending": [...]} |
| POST | /pending/resolve | {"id": str, "decision": "accept"\|"reject"\|"modify", "note"?: str} | {"ok": bool, "applied_kg": bool, "resolution": str} |
| POST | /tasks | {"title": str, "body"?, "project"?, "assignee"?, "priority"?, "depends_on"?: [id]} | task object |
| GET | /tasks | ?status=&project=&assignee=&limit= | {"tasks": [...]} |
| GET | /tasks/ready | ?project=&assignee=&limit= | unblocked tasks, priority order |
| GET | /tasks/prime | ?project=&assignee= | {"text": <briefing>, "tasks": [...]} |
| POST | /tasks/{id} | {"status"?, "assignee"?, "priority"?, "body"?} | updated task |
| POST | /tasks/{id}/edges (+ /edges/remove) | {"to_id": str, "type": "blocks"\|"parent"\|"relates"\|"duplicates"} | edge receipt |
Each hit in /search results has the agent-rules contract shape: {text, source, timestamp, confidence, metadata}.
/ingest/batch is the bulk-import path: each item can carry a stable id (your content hash), preserved as source_id and used to skip already-imported items, so the whole batch can be re-POSTed safely after a partial migration. mode=bm25 on /search skips query embedding entirely and returns keyword-ranked hits in about 10ms, built for search-as-you-type UIs over short-form memory; the default mode remains the full recipe-driven retrieval.
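A bulk import that is safe to re-POST needs ids that are stable across runs. The sketch below builds such a request with the standard library, using a SHA-256 of each text as the id; the endpoint and field names follow the table above, while `build_batch_request`, the agent name, and the choice of SHA-256 are assumptions for illustration. The request is constructed but not sent.

```python
import hashlib
import json
import urllib.request
from typing import Iterable, Optional


def build_batch_request(texts: Iterable[str], agent: str,
                        base_url: str = "http://127.0.0.1:7900",
                        project: Optional[str] = None) -> urllib.request.Request:
    """Build a POST /ingest/batch request whose item ids are content hashes."""
    items = [
        {"text": text, "id": hashlib.sha256(text.encode("utf-8")).hexdigest()}
        for text in texts
    ]
    payload = {"items": items, "agent": agent}
    if project is not None:
        payload["project"] = project
    return urllib.request.Request(
        base_url.rstrip("/") + "/ingest/batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(["first note", "second note"], agent="demo-agent")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Since the id depends only on the text, re-running the same import produces identical ids, which is what lets the server skip items it has already stored.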
| Strategy | Judge accuracy | Delta |
|---|---|---|
| Raw cosine (same algorithm as MemPalace) | 95.0% | baseline |
| Additive keyword boost | 96.6% | +1.6 |
| **Hybrid + query expansion (default)** | **97.0%** | **+2.0** |
| All-turns hybrid (harder test) | 93.2% | -1.8 |
python benchmarks/longmemeval_runner.py
python benchmarks/locomo_runner.py --model gemma4:e2b
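An innovative local AI memory solution whose offline-first design fits the trend toward privacy. It is strongly framework-agnostic and well suited to edge devices. However, the community is small and its stability still needs to be verified.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.
Based on our overall assessment, Taos Local AI Memory System performs steadily among AI tools and is of good quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.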
| Original name | taosmd |
| Original description | Open-source AI tool: Local-first AI memory, runs offline on any machine with 8 GB+ RAM (SBC, mini PC. ⭐44 · Python |
| Topics | Local AI, offline operation, edge computing, embedded, privacy protection |
| GitHub | https://github.com/jaylfc/taosmd |
| License | MIT |
| Language | Python |
Listed: 2026-06-08 · Updated: 2026-06-11 · License: MIT · AI Skill Hub provides no legal endorsement of the accuracy of third-party content.