Capability tags

🔌 MCP 🤖 Agent 🔄 Workflow 👁 OCR 🐳 Docker 💻 CLI 🔗 REST API 🧬 Embedding 📚 RAG 🧠 Claude

MCP tool

Knowledge Retrieval Tool

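Q: What is knowledge-rag?

knowledge-rag is an open-source MCP server written in Python (⭐84). Its GitHub description reads "Drop docs, search instantly from Claude Code: 12 MCP tools, 20 format parsers, …" (the current README lists 13 MCP tools). Its main use case is quickly searching and managing local documents.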

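Q: How do I install and start using knowledge-rag?

Follow the installation steps in the README on GitHub or PyPI. The project requires Python 3.11+, and the npx route also needs Node.js 16+. For Claude Code, the quickest setup is `claude mcp add knowledge-rag -s user -- npx -y knowledge-rag`.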

Q: Is knowledge-rag free? What is the license?

knowledge-rag is completely free and released under the MIT License; anyone may use, modify, and distribute it.

Q: Who is knowledge-rag for?

knowledge-rag is aimed mainly at users with some technical background, such as developers, data analysts, and AI engineers.

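Q: How active is the community, and how well is the project maintained?

On GitHub, knowledge-rag has 84 stars and 14 forks, and its changelog lists frequent releases from v3.4.0 up to v4.2.0.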

Built with Python · Lets your AI assistant operate your systems and tools directly

English name: knowledge-rag

⭐ 84 Stars 🍴 14 Forks 💻 Python 📄 MIT 🏷 AI score 8.0

8.0 overall AI score

mcp · bm25 · chromadb · claude · document-search


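✦ AI Skill Hub pick

The Knowledge Retrieval Tool is one of AI Skill Hub's featured MCP tools this issue. With an overall score of 8.0 it is of high quality, and we strongly recommend adding it to your AI toolkit to improve productivity.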

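📚 In-depth analysis

The Knowledge Retrieval Tool is an AI tool extension built on the MCP (Model Context Protocol) standard. Anthropic developed and open-sourced MCP to give AI models a standard communication interface to external tools, and it has been adopted by mainstream AI tools such as Claude Desktop, Claude Code, and Cursor.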

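Once the Knowledge Retrieval Tool is installed, your AI assistant gains extra tool-calling abilities and can drive the tool's features in natural language, with no command-line syntax to learn. The core value of MCP tools is "configure once, benefit every time": after setup, every conversation with the AI can call these tools seamlessly.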

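Technically, an MCP tool communicates with the AI client over JSON-RPC, and its functions are exposed to the model as a "tool list" that the AI calls on demand. The Knowledge Retrieval Tool provides a structured tool-calling interface, so the model sees exactly what each function does and which arguments it takes, which helps it call the tools correctly.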
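To make the JSON-RPC exchange concrete, here is a minimal Python sketch of the message an MCP client sends when the model invokes a tool. The `tools/call` method and its `name`/`arguments` params come from the MCP specification; the tool name `search_knowledge` appears in the knowledge-rag README, but the argument names `query` and `max_results` are assumptions for illustration, since the real input schema is whatever the server reports in its `tools/list` response.

```python
import json
from itertools import count

# Each JSON-RPC request needs an id so the client can match the server's reply.
_request_ids = count(1)


def make_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps without indent emits no literal newlines, as stdio framing requires.
    return json.dumps(request)


# Hypothetical arguments: check the server's tools/list schema for the real ones.
message = make_tool_call("search_knowledge", {"query": "sql injection", "max_results": 5})
print(message)
```

Over the default stdio transport, each message like this is written to the server's stdin as one line of JSON, and the server's reply carries the same `id`.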

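Compared with traditional API integration, MCP tools need no code: you add a few lines of JSON to a configuration file and the AI gains a new capability. AI Skill Hub rates the Knowledge Retrieval Tool 8.0, which makes it a strong choice among similar tools.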

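📋 Tool overview

The Knowledge Retrieval Tool is an AI tool extension that follows the MCP (Model Context Protocol) standard. Through MCP, mainstream AI clients such as Claude and Cursor can access external tools, data sources, and services directly. Operations such as file access, database queries, or API calls can then be triggered in natural language from an AI conversation; for this tool, those operations are indexing and searching your local documents.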

Field	Value
GitHub Stars	⭐ 84
Language	Python
Platforms	Windows / macOS / Linux
Maintenance	Lightweight project, updated as needed
License	MIT
Overall AI score	8.0
Tool type	MCP tool
Forks	14

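📖 Documentation

The following content was compiled automatically by AI Skill Hub from the project information. For the complete original documentation, see "Original sources" at the bottom.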

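📌 Key features

Integrates with mainstream AI clients such as Claude and Cursor through the standard MCP protocol
Provides a structured tool-calling interface that lowers AI integration complexity
Connects to Claude Desktop and Claude Code and works out of the box
Can be combined with other MCP tools to build a complete AI workstation
Lightweight, non-intrusive design that leaves your existing system architecture unchanged

🎯 Main use cases

Call local tools directly from a Claude Desktop conversation, tying the AI closely to your system
Drive complex multi-step automation in natural language instead of tedious manual work
Combine several MCP tools to build your own personal AI workstation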

The following install commands are generated automatically from the project's language and type; the official README is authoritative.

Install commands

# Option 1: register the server with the Claude Code CLI
claude mcp add knowledge-rag -s user -- npx -y knowledge-rag

# Option 2: edit claude_desktop_config.json manually
{
  "mcpServers": {
    "knowledge-rag": {
      "command": "npx",
      "args": ["-y", "knowledge-rag"]
    }
  }
}

# Config file location
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%\Claude\claude_desktop_config.json

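📋 Installation steps

Make sure Node.js 16 or later and Python 3.11 or later are installed
Open the MCP configuration file for Claude Desktop or Claude Code
Add the JSON configuration above under the mcpServers field
Save the configuration file and restart the Claude client
After the restart, the tool is available in conversations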

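The following usage examples were compiled by AI Skill Hub and cover the most common scenarios.

Common commands / code examples

# After installation, use the tool directly in a Claude conversation
# Example:
User: Search my knowledge base for notes on SQL injection.
Claude: [automatically calls the knowledge-rag MCP tools to handle the request]

# List the available tools
# In Claude, type: "List all available MCP tools"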

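The following configuration example is based on a typical use case; adjust the parameters according to the official documentation.

Configuration example

// claude_desktop_config.json example
{
  "mcpServers": {
    "knowledge-rag": {
      "command": "npx",
      "args": ["-y", "knowledge-rag"],
      "env": {}
    }
  }
}

// knowledge-rag runs its models locally and needs no API key, so "env" can stay empty.
// Save the file and restart Claude Desktop for the change to take effect.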
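Hand-editing this file is error-prone: JSON allows no comments, and one stray comma breaks the whole config. Below is a minimal Python sketch that merges the knowledge-rag entry into an existing config while keeping any other servers. The function name is hypothetical, and the demo writes to a temporary file rather than your real config.

```python
import json
import tempfile
from pathlib import Path


def add_knowledge_rag_server(config_path: Path) -> dict:
    """Hypothetical helper: insert or overwrite mcpServers["knowledge-rag"], keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    config.setdefault("mcpServers", {})["knowledge-rag"] = {
        "command": "npx",
        "args": ["-y", "knowledge-rag"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    # Writing with json.dumps guarantees valid JSON (no comments, no trailing commas).
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file; pass the platform path listed above for real use.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"filesystem": {"command": "npx"}}}), encoding="utf-8")
    demo_config = add_knowledge_rag_server(demo_path)
    reloaded = json.loads(demo_path.read_text(encoding="utf-8"))
```

Back up the real file before pointing the helper at it; restarting Claude Desktop is still required afterwards.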

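📑 README deep dive · documentation completeness 82/100 · includes workflow diagram

The following content is parsed directly from the GitHub README, preserving code blocks, tables, and list structure.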

Knowledge RAG

System Overview

flowchart TB
    subgraph MCP["MCP SERVER (FastMCP)"]
        direction TB
        TOOLS["13 MCP Tools<br/>search | get | add | update | remove<br/>reindex | reindex_status | list | stats | url | similar | evaluate"]
    end
    subgraph SEARCH["HYBRID SEARCH ENGINE"]
        direction LR
        ROUTER["Keyword Router<br/>(word boundaries)"]
        SEMANTIC["Semantic Search<br/>(ChromaDB)"]
        BM25["BM25 Keyword<br/>(inverted-index + expansion)"]
        RRF["Reciprocal Rank<br/>Fusion (RRF)"]
        RERANK["Cross-Encoder<br/>Reranker"]
        ROUTER --> SEMANTIC
        ROUTER --> BM25
        SEMANTIC --> RRF
        BM25 --> RRF
        RRF --> RERANK
    end
    subgraph STORAGE["STORAGE LAYER"]
        direction LR
        CHROMA[("ChromaDB<br/>Vector Database")]
        COLLECTIONS["Collections<br/>security | ctf<br/>logscale | development"]
        CHROMA --- COLLECTIONS
    end
    subgraph EMBED["EMBEDDINGS (In-Process)"]
        FASTEMBED["FastEmbed ONNX<br/>BAAI/bge-small-en-v1.5<br/>(384D, CPU or GPU)"]
        CROSSENC["Cross-Encoder<br/>ms-marco-MiniLM-L-6-v2"]
        FASTEMBED --- CROSSENC
    end
    subgraph INGEST["DOCUMENT INGESTION"]
        PARSERS["20 Parsers<br/>MD | PDF | TXT | PY | C | H | CPP | JS | JSX | TS | TSX | JSON | XML | CSV<br/>DOCX | XLSX | PPTX | IPYNB | MQH | MQ4"]
        CHUNKER["Chunking<br/>MD: section-aware<br/>Other: 1000 chars + 200 overlap"]
        PARSERS --> CHUNKER
    end
    CLAUDE["Claude Code"] --> MCP
    MCP --> SEARCH
    SEARCH --> STORAGE
    STORAGE --> EMBED
    INGEST --> EMBED
    EMBED --> STORAGE
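The diagram shows semantic and BM25 results meeting in a Reciprocal Rank Fusion step before reranking. Here is a minimal Python sketch of standard RRF, where each document scores the sum of 1/(k + rank) over the lists it appears in. The constant k = 60 is the commonly used default from the RRF literature; the value knowledge-rag actually uses is not stated in the README.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. ChromaDB nearest neighbours
bm25 = ["doc_c", "doc_a", "doc_d"]      # e.g. BM25 keyword hits
fused = reciprocal_rank_fusion([semantic, bm25])
```

Because RRF uses only ranks, it needs no calibration between cosine similarities and BM25 scores, which live on different scales.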

What's New in v4.2.0

Recent Highlights

v4.0.0 — Enterprise concurrent access: SSE/HTTP transport (1 server → N clients), thread-safe shared state, optional rate limiting + Prometheus metrics, ChromaDB WAL mode, --transport CLI
v3.9.0 — Quality Gate activated: 35+ automated PR checks across 7 pillars (Security, Stability, Memory Leak, Versatility, Scalability, Versioning, Quality) + nightly resilience suite (chaos, soak, determinism, mutation)
v3.8.1 — Critical hotfix: loud-fail embeddings (no more silent zero-vector corruption); Windows CI flake eradicated (HF_HUB_OFFLINE + shell:bash + atexit wrapper)
v3.8.0 — Lazy-load embeddings, opt-in single-instance guard, version sync across PyPI/NPM/Docker
v3.6.0 — Multi-language code parsing (C/C++/JS/TS/XML), NPM wrapper, Docker image, automated release pipeline
v3.5.2 — CUDA DLL auto-discovery from pip packages, graceful GPU→CPU fallback, explicit CPU provider (no CUDA noise when gpu: false), BASE_DIR resolution fix for editable installs
v3.5.1 — Remove Python <3.13 upper bound — 3.13 and 3.14 now supported
v3.5.0 — Optional GPU acceleration, supported formats table, full README rewrite
v3.4.3 — MCP stdout save/restore fix (v3.4.2 broke JSON-RPC responses)
v3.4.0 — Persistent model cache, exclude patterns, Jupyter Notebook parser, inotify resilience, MetaTrader support

See Changelog for full history.

---

Features

Feature	Description
Hybrid Search	Semantic + BM25 keyword search with Reciprocal Rank Fusion
Cross-Encoder Reranker	Xenova/ms-marco-MiniLM-L-6-v2 re-scores top candidates for precision
GPU Acceleration	Optional ONNX CUDA support for 5-10x faster indexing
YAML Configuration	Fully customizable via `config.yaml` with domain-specific presets
Query Expansion	Configurable synonym mappings (69 security-term defaults)
Markdown-Aware Chunking	`.md` files split by `##`/`###` sections instead of fixed windows
In-Process Embeddings	FastEmbed ONNX Runtime (BAAI/bge-small-en-v1.5, 384D)
Keyword Routing	Word-boundary aware routing for domain-specific queries
20 Format Parsers	MD, TXT, PDF, PY, C, H, CPP, JS, JSX, TS, TSX, JSON, XML, CSV, DOCX, XLSX, PPTX, IPYNB + opt-in MQH/MQ4
Category Organization	Organize docs by folder, auto-tagged by path
Incremental Indexing	Change detection via mtime/size — only re-indexes modified files
Chunk Deduplication	SHA256 content hashing prevents duplicate chunks
Query Cache	LRU cache with 5-min TTL for instant repeat queries
Document CRUD	Add, update, remove documents via MCP tools
URL Ingestion	Fetch URLs, strip HTML, convert to markdown, index
Similarity Search	Find documents similar to a reference document
Retrieval Evaluation	Built-in MRR@5 and Recall@5 metrics
File Watcher	Auto-reindex on document changes via watchdog (5s debounce)
Exclude Patterns	Glob-based file/directory exclusion during indexing
MMR Diversification	Maximal Marginal Relevance reduces redundant results
Persistent Model Cache	Embedding models cached in `models_cache/` — survives reboots
Auto-Migration	Detects embedding dimension mismatch and rebuilds automatically
13 MCP Tools	Full CRUD + search + evaluation via Claude Code
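The table lists built-in MRR@5 and Recall@5 metrics without giving formulas. The sketch below uses the standard definitions, which is our assumption about how knowledge-rag computes them: MRR@k averages 1/rank of the first relevant hit in the top k, and Recall@k averages the share of each query's relevant documents found in the top k.

```python
def mrr_at_k(ranked: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Mean over queries of 1/rank of the first relevant doc in the top k (0 if none)."""
    total = 0.0
    for results, gold in zip(ranked, relevant):
        for rank, doc_id in enumerate(results[:k], start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(ranked)


def recall_at_k(ranked: list[list[str]], relevant: list[set[str]], k: int = 5) -> float:
    """Mean over queries of the fraction of relevant docs found in the top k."""
    total = 0.0
    for results, gold in zip(ranked, relevant):
        total += len(set(results[:k]) & gold) / len(gold)  # assumes gold is non-empty
    return total / len(ranked)


# Two toy queries: results in ranked order, plus the docs judged relevant.
ranked = [["a", "b", "c"], ["x", "y", "z", "w", "v", "u"]]
relevant = [{"b"}, {"z", "u"}]
mrr = mrr_at_k(ranked, relevant)        # (1/2 + 1/3) / 2
recall = recall_at_k(ranked, relevant)  # (1/1 + 1/2) / 2, since "u" sits at rank 6
```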

---

# Server: enterprise features (new in v4.0.0)
server:
  transport: "stdio"         # "stdio" | "sse" | "streamable-http"
  host: "127.0.0.1"          # Bind address (SSE/HTTP only)
  port: 8179                 # Bind port (SSE/HTTP only)
  auth:
    bearer_token: ""         # Set a secret to enable auth (SSE/HTTP only)
  rate_limit:
    enabled: false
    requests_per_minute: 60
    burst: 10
  metrics:
    enabled: false
    port: 9179               # Separate port for Prometheus scraping

See config.example.yaml for the fully documented template with explanations for every field.

Prerequisites

Python 3.11+
Claude Code CLI
…or any other MCP client (Claude Desktop, Cursor, VS Code, Antigravity, opencode, Windsurf) — see Use with other MCP clients
~200MB disk for model cache (auto-downloaded on first run)
Optional: NVIDIA GPU + CUDA 12 for accelerated embeddings (see GPU Acceleration below)

1. Install GPU dependencies (onnxruntime-gpu + all CUDA 12 runtime DLLs)

pip install knowledge-rag[gpu]

5 Ways to Install

npx -y knowledge-rag                    # NPM — zero setup, auto-manages Python venv
pip install knowledge-rag               # PyPI — classic Python install
curl -fsSL .../install.sh | bash        # One-line installer (Linux/macOS/Windows)
docker pull ghcr.io/lyonzin/knowledge-rag  # Docker — models pre-downloaded
git clone ... && pip install -r ...     # From source

All methods produce the same MCP server. See Installation for full instructions.

Installation

Install Methods

Pick one — all produce the same running server.

Option A: NPX (fastest)

Requires Node.js 16+. Handles Python venv, pip install, and version upgrades automatically.

claude mcp add knowledge-rag -s user -- npx -y knowledge-rag

That's it. On first run, npx creates a venv at ~/.knowledge-rag/, installs the PyPI package, and starts the MCP server. Subsequent runs reuse the cached venv.

Option B: One-line installer

Run the installer script listed under "5 Ways to Install" above (`curl -fsSL .../install.sh | bash`, Linux/macOS/Windows).

Reindexing via the MCP tools:

```
# Smart reindex: detect changes + rebuild BM25
reindex_documents(force=True)

# Nuclear rebuild: delete everything, re-embed all (use after model change)
reindex_documents(full_rebuild=True)
```
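Smart reindexing depends on the change detection listed in the Features table: an mtime/size comparison per file, plus SHA-256 hashing to drop duplicate chunks. The following is a minimal sketch of both ideas, not knowledge-rag's code; the function names and the in-memory state dictionary are assumptions (a real indexer would persist that state between runs).

```python
import hashlib
import tempfile
from pathlib import Path

Signature = tuple[float, int]  # (mtime, size in bytes)


def scan_changes(docs_dir: Path, previous: dict[str, Signature]):
    """Compare each file's (mtime, size) with the last scan; return changed, removed, new state."""
    current: dict[str, Signature] = {}
    changed: list[Path] = []
    for path in sorted(p for p in docs_dir.rglob("*") if p.is_file()):
        stat = path.stat()
        key = path.relative_to(docs_dir).as_posix()
        current[key] = (stat.st_mtime, stat.st_size)
        if previous.get(key) != current[key]:
            changed.append(path)
    removed = sorted(set(previous) - set(current))
    return changed, removed, current


def dedupe_chunks(chunks: list[str]) -> list[str]:
    """Keep only the first chunk for each SHA-256 content hash."""
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(chunk)
    return unique


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.md").write_text("# A\nalpha\n", encoding="utf-8")
    (root / "b.md").write_text("# B\nbeta\n", encoding="utf-8")
    first, _, state = scan_changes(root, {})
    (root / "b.md").write_text("# B\nbeta, edited\n", encoding="utf-8")  # size changes
    (root / "a.md").unlink()
    second, removed, state = scan_changes(root, state)
    first_names = [p.name for p in first]
    second_names = [p.name for p in second]
```

Checking mtime and size is cheap because it needs only a `stat()` call; content hashing is reserved for the chunks of files that actually changed.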

Usage

Quick Start


Memory usage

With ~200 documents, expect ~300-500MB RAM. The embedding model (~200MB ONNX runtime resident, lazy-loaded on first query since v3.8.0) and reranker (~25MB, lazy-loaded) are loaded into memory only when actually used. For very large knowledge bases (1000+ documents), consider enabling GPU acceleration and using exclude patterns to limit index scope.

config.yaml

server:
  transport: "sse"           # "stdio" | "sse" | "streamable-http"
  host: "127.0.0.1"
  port: 8179

Or via CLI: knowledge-rag --transport sse

Optional enterprise features (all disabled by default):
- Rate limiting: sliding-window counter, configurable RPM and burst
- Prometheus metrics: /metrics endpoint on a separate port
- Bearer auth: token validation for SSE/HTTP connections

All 13 MCP tools are instrumented with @rate_limited and @instrument decorators — zero overhead when features are disabled. Default transport remains stdio for full backwards compatibility.

Migration: Existing users need zero changes. SSE mode is opt-in via server.transport: "sse" in config.yaml. See Configuration for details.
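The rate limiter is described only as a sliding-window counter with configurable RPM and burst. The sketch below uses the simplest exact variant, a sliding-window log that keeps per-client timestamps; the README's counter may instead use the weighted two-window approximation, and treating `burst` as extra headroom added to `requests_per_minute` is our assumption.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLog:
    """Per-client limiter: at most requests_per_minute + burst calls in any trailing window."""

    def __init__(self, requests_per_minute: int = 60, burst: int = 10,
                 window_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        # Assumption: burst is extra headroom on top of the steady per-minute rate.
        self.limit = requests_per_minute + burst
        self.window = window_seconds
        self.clock = clock
        self.history: dict[str, deque] = {}

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        stamps = self.history.setdefault(client_id, deque())
        # Drop timestamps that have slid out of the trailing window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


# Deterministic demo with a fake clock: limit = 2 + 1 = 3 calls per 60 s.
fake_now = [0.0]
limiter = SlidingWindowLog(requests_per_minute=2, burst=1, clock=lambda: fake_now[0])
first_minute = [limiter.allow("client-a") for _ in range(4)]
fake_now[0] = 60.0
after_window = limiter.allow("client-a")
```

The log variant costs memory proportional to the limit per client, which is negligible at the default 60 + 10 requests per minute.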

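2. Enable GPU acceleration in config.yaml: set `models.embedding.gpu: true` (after installing the GPU dependencies with `pip install knowledge-rag[gpu]` in step 1).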

Configuration

Knowledge RAG is fully configurable via a config.yaml file in the project root. If no config.yaml exists, sensible defaults are used — the system works out of the box with zero configuration.

Option 1: Use a preset

cp presets/cybersecurity.yaml config.yaml   # Offensive/defensive security, CTFs
cp presets/developer.yaml config.yaml       # Software engineering, APIs, DevOps
cp presets/research.yaml config.yaml        # Academic research, papers, studies
cp presets/general.yaml config.yaml         # Blank slate, pure semantic search

Option 2: Start from the documented template

cp config.example.yaml config.yaml

Edit config.yaml to your needs


Restart Claude Code after changing config.yaml.

config.yaml Structure


Configuration Reference

Server

Field	Default	Description
`server.transport`	`"stdio"`	Transport protocol: `"stdio"`, `"sse"`, or `"streamable-http"`
`server.host`	`"127.0.0.1"`	Bind address for SSE/HTTP mode
`server.port`	`8179`	Bind port for SSE/HTTP mode
`server.auth.bearer_token`	`""` (disabled)	Bearer token for SSE/HTTP auth. Empty = no auth
`server.rate_limit.enabled`	`false`	Enable per-client rate limiting
`server.rate_limit.requests_per_minute`	`60`	Max requests per minute
`server.rate_limit.burst`	`10`	Burst allowance above steady rate
`server.metrics.enabled`	`false`	Enable Prometheus `/metrics` endpoint
`server.metrics.port`	`9179`	Port for metrics scraping

In stdio mode (default), server settings are ignored. SSE/HTTP mode auto-enables the single-instance lock.

Paths

Field	Default	Description
`paths.documents_dir`	`./documents`	Root folder scanned recursively for documents
`paths.data_dir`	`./data`	Internal storage for ChromaDB and index metadata
`paths.models_cache_dir`	`./models_cache`	Persistent cache for embedding models (~250MB). Survives reboots

Relative paths resolve from the project root. Absolute paths work too.

Documents

Field	Default	Description
`documents.supported_formats`	.md .txt .pdf .py .json .docx .xlsx .pptx .csv .ipynb	File extensions to index
`documents.exclude_patterns`	`[]` (empty)	Glob patterns for files/dirs to skip during indexing
`documents.chunking.chunk_size`	1000	Max characters per chunk
`documents.chunking.chunk_overlap`	200	Characters shared between consecutive chunks

Chunking guidelines: Short notes → 500/100. General use → 1000/200. Long technical docs → 1500/300.

For .md files, chunking splits at ## and ### header boundaries first. Sections larger than chunk_size are sub-chunked with overlap. Non-markdown files use fixed-size chunking.
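A minimal sketch of the two chunking rules above, with sizes in characters as in the table. The regex and function names are assumptions, and this version would also split on `## ` lines inside fenced code blocks, which a production chunker would probably need to skip.

```python
import re

# Zero-width split point at the start of any line beginning with "## " or "### ".
SECTION_START = re.compile(r"^(?=#{2,3} )", re.MULTILINE)


def fixed_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character windows; consecutive chunks share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def markdown_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split at ##/### headers first; sub-chunk sections longer than chunk_size."""
    chunks = []
    for section in SECTION_START.split(text):
        if not section.strip():
            continue
        if len(section) <= chunk_size:
            chunks.append(section)
        else:
            chunks.extend(fixed_chunks(section, chunk_size, overlap))
    return chunks


doc = "Intro\n## Setup\nInstall it.\n### GPU\nOptional.\n#### Not a split point\n"
sections = markdown_chunks(doc)
windows = fixed_chunks("abcdefghij", chunk_size=4, overlap=2)
```

Splitting on headers first keeps each chunk about one topic, so its embedding is less diluted than a window that straddles two sections.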

Models

Field	Default	Description
`models.embedding.model`	`BAAI/bge-small-en-v1.5`	Embedding model (ONNX, runs locally)
`models.embedding.dimensions`	384	Vector dimensions (must match model)
`models.embedding.gpu`	false	Enable CUDA GPU acceleration. See [GPU Acceleration](#gpu-acceleration) for full setup
`models.reranker.enabled`	true	Enable cross-encoder reranking
`models.reranker.model`	`Xenova/ms-marco-MiniLM-L-6-v2`	Reranker model
`models.reranker.top_k_multiplier`	3	Fetch N*multiplier candidates for reranking

If the reranker model is not available locally and the machine cannot download it, search now falls back to the RRF order from hybrid semantic+BM25 retrieval. This keeps search_knowledge available offline, but result ordering may be less precise for ambiguous queries until the reranker model is cached.
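The interplay of `top_k_multiplier`, `search.max_results`, and the offline fallback can be sketched as follows. The cross-encoder is represented by any callable that scores (query, passage) pairs; the toy word-overlap scorer, the function names, and the exact order of clamping are assumptions, not the project's implementation.

```python
from typing import Callable, Optional

Scorer = Callable[[str, str], float]  # (query, passage) -> relevance score


def rerank_or_fallback(query: str, fused_ids: list[str], texts: dict[str, str],
                       limit: int = 5, top_k_multiplier: int = 3, max_results: int = 20,
                       scorer: Optional[Scorer] = None) -> list[str]:
    """Rerank the top limit*top_k_multiplier RRF candidates, or keep RRF order if no scorer."""
    limit = min(limit, max_results)  # search.max_results is a hard cap
    candidates = fused_ids[: limit * top_k_multiplier]
    if scorer is None:
        return candidates[:limit]  # reranker unavailable: keep RRF order
    # sorted() is stable (also with reverse=True), so equal scores keep their RRF order.
    ranked = sorted(candidates, key=lambda doc_id: scorer(query, texts[doc_id]), reverse=True)
    return ranked[:limit]


def word_overlap(query: str, passage: str) -> float:
    """Toy stand-in for a cross-encoder: count shared lowercase words."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


fused = ["d1", "d2", "d3", "d4"]
texts = {"d1": "alpha", "d2": "beta gamma", "d3": "gamma delta", "d4": "gamma"}
reranked = rerank_or_fallback("gamma delta", fused, texts, limit=2, scorer=word_overlap)
offline = rerank_or_fallback("gamma delta", fused, texts, limit=2)
```

Fetching several times more candidates than requested gives the reranker room to promote a relevant passage that RRF placed just below the cut.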

Embedding model options (fastest → most accurate):
- BAAI/bge-small-en-v1.5: 384D, ~33MB (default)
- BAAI/bge-base-en-v1.5: 768D, ~130MB
- BAAI/bge-large-en-v1.5: 1024D, ~335MB
- intfloat/multilingual-e5-small: 384D, 100+ languages

Warning: Changing the embedding model after indexing requires reindex_documents(full_rebuild=True).

Search

Field	Default	Description
`search.default_results`	5	Results returned when no limit specified
`search.max_results`	20	Hard cap even if client requests more
`search.collection_name`	`knowledge_base`	ChromaDB collection — change for separate KBs

Keyword Routing

Route queries to categories based on keywords. When a query contains listed keywords, results from that category are prioritized (not filtered — other categories still appear, ranked lower).

keyword_routes:
  redteam:
    - pentest
    - exploit
    - sqli

Single-word keywords use regex word boundaries (\b) — "api" won't match "RAPID". Multi-word keywords use substring matching.

Set keyword_routes: {} for pure semantic search.
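A sketch of the matching rule above: single-word keywords are compiled to word-boundary (`\b`) regexes, and multi-word keywords are matched as substrings. It only reports which categories a query hits; how knowledge-rag then boosts those categories in the ranking is not covered, and lowercasing both sides is an assumption.

```python
import re
from typing import Callable


def build_router(routes: dict[str, list[str]]) -> Callable[[str], list[str]]:
    """Return a function mapping a query to the categories whose keywords it mentions."""
    matchers: list[tuple[str, Callable[[str], bool]]] = []
    for category, keywords in routes.items():
        for keyword in keywords:
            kw = keyword.lower()
            if " " in kw:
                # Multi-word keyword: plain substring match.
                matchers.append((category, lambda q, kw=kw: kw in q))
            else:
                # Single word: regex word boundaries, so "api" does not match "rapid".
                pattern = re.compile(r"\b" + re.escape(kw) + r"\b")
                matchers.append((category, lambda q, p=pattern: p.search(q) is not None))

    def route(query: str) -> list[str]:
        q = query.lower()  # assumption: matching is case-insensitive
        hits: list[str] = []
        for category, matches in matchers:
            if category not in hits and matches(q):
                hits.append(category)
        return hits

    return route


route = build_router({"redteam": ["pentest", "exploit", "sqli"], "development": ["api", "rest api"]})
```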

Query Expansion

Expand search terms with synonyms before BM25 search. Supports single tokens, bigrams, and full query matches.

query_expansions:
  sqli:
    - sql injection
    - sqli
  k8s:
    - kubernetes
    - k8s

Set query_expansions: {} for no expansion.

query_expansions is directional: only the key on the left triggers the terms on the right. If you need mutual expansion without duplicating entries, use query_expansion_groups.

query_expansion_groups:
  - ["triple barrier", "tb", "trip_barr"]
  - ["profit factor", "pf"]

Each group is interpreted symmetrically, so every term expands to the rest of the group. The final internal expansion table is built by merging both sources:

query_expansions entries are loaded as-is.
query_expansion_groups adds reciprocal links for every term in each group.
Overlaps are merged by union with duplicate terms removed.

This keeps backward compatibility while allowing concise synonym groups.
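The three merge rules above translate almost directly into code. This sketch is an assumption about the mechanics (the function name included); it keeps first-seen order when removing duplicates, and `profit factor ratio` is a made-up term for the demo.

```python
def build_expansion_table(expansions: dict[str, list[str]],
                          groups: list[list[str]]) -> dict[str, list[str]]:
    """Merge directional query_expansions with symmetric query_expansion_groups."""
    # query_expansions entries are loaded as-is (duplicates dropped, order kept).
    table = {key: list(dict.fromkeys(terms)) for key, terms in expansions.items()}
    for group in groups:
        for term in group:
            # Reciprocal link: each term expands to every other term in its group.
            others = [t for t in group if t != term]
            table[term] = list(dict.fromkeys(table.get(term, []) + others))  # union
    return table


table = build_expansion_table(
    {"sqli": ["sql injection", "sqli"], "pf": ["profit factor ratio"]},
    [["triple barrier", "tb", "trip_barr"], ["profit factor", "pf"]],
)
```

Note that `sql injection` gets no entry of its own: directional expansions fire only from the key on the left.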

API Reference

# Models: AI models for search (all run locally, no API keys)
models:
  embedding:
    model: "BAAI/bge-small-en-v1.5"   # ONNX, ~33MB, auto-downloaded
    dimensions: 384
    gpu: false                        # Set true + pip install knowledge-rag[gpu]
  reranker:
    enabled: true                     # Falls back to RRF if model is unavailable
    model: "Xenova/ms-marco-MiniLM-L-6-v2"
    top_k_multiplier: 3               # Candidates fetched before reranking

Troubleshooting

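🇨🇳 README summary · AI translation · 2026-05-27

The system condensed the English README sections into short summaries for quick reading. For the full original text, see "📑 README deep dive" above.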

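📌 Introduction

Knowledge RAG is a high-performance retrieval-augmented generation (RAG) system built on MCP (Model Context Protocol). It uses FastMCP to implement an MCP server with 13 tools covering search, document CRUD, reindexing, and statistics. Its core is a hybrid search engine that combines a keyword router, semantic search (ChromaDB), and BM25, fuses the retrieved lists with Reciprocal Rank Fusion (RRF), and reranks them with a cross-encoder, giving AI agents a reliable local knowledge base.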

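⚡ Features

Version v3.9.0 introduced a strict quality gate covering 7 pillars, including security, stability, and memory leaks, plus a nightly resilience suite with chaos testing. Core features include hybrid search combining semantic and BM25 retrieval; a cross-encoder reranker (Xenova/ms-marco-MiniLM-L-6-v2) that re-scores top candidates for precision; optional GPU acceleration through ONNX CUDA for 5-10x faster indexing; and flexible YAML configuration for domain-specific customization.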

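📋 Requirements

Python 3.11 or later is required. You also need the Claude Code CLI or another MCP-compatible client such as Claude Desktop, Cursor, VS Code, or Windsurf. The model cache is downloaded automatically on first run, so reserve about 200MB of disk space.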

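🛠 Installation (Docker/pip/source)

The project offers five installation methods, all producing the same MCP server:
1. NPX (fastest): `npx -y knowledge-rag` needs no setup and manages the Python virtual environment automatically.
2. PyPI: the classic `pip install knowledge-rag`.
3. One-line script: a shell installer for Linux/macOS/Windows.
4. Docker: `docker pull` an image with the models pre-downloaded.
5. Source: `git clone` the repository and install the dependencies manually.
Beginners should start with NPX.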

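🚀 Usage

Once installed, the server is called directly from any MCP client. With about 200 documents, RAM usage is around 300-500MB. To save memory, the embedding model and the reranker are lazy-loaded, entering memory only when a query first needs them. For large knowledge bases (1000+ documents), enable GPU acceleration and use exclude patterns to limit the index scope.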

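⚙️ Configuration (including MCP / env)

Knowledge RAG is fully configurable through a `config.yaml` file in the project root. Without that file it runs with sensible defaults out of the box. You can initialize from a preset (cybersecurity, developer, research, or general) or write domain-specific rules starting from the `config.example.yaml` template.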

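🔌 API notes

All AI models run locally, so no external API key is needed. The configuration defines the embedding model (e.g. BAAI/bge-small-en-v1.5) and the reranker model (e.g. Xenova/ms-marco-MiniLM-L-6-v2). Parameters control whether GPU acceleration is enabled (`gpu: true`) and the reranker candidate multiplier (`top_k_multiplier`), balancing retrieval precision against performance.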

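❓ FAQ summary

Version v3.8.1 fixed CI flakiness on Windows and the silent zero-vector corruption of embeddings (embedding failures now fail loudly). If model downloads or environment setup fail, check your network connection and the state of the Python virtual environment.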

🎯 aiskill88 AI review · Grade A · 2026-05-25

A high-quality open-source MCP tool with parsers for many document formats

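📚 Practical guide (long-tail questions)

Who it is for

AI engineers who want Claude / Cursor to operate local tools
Agent developers building multi-agent systems
Teams building enterprise knowledge bases or RAG retrieval applications
Document automation that needs to index text from PDFs and Office files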

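Best practices

When configuring MCP servers, prefer the stdio transport (JSON-RPC) and avoid exposing the server to the public internet
In production, isolate dependencies with Docker (or Docker Compose) and mount a volume to persist data
For local use, the default BAAI/bge-small-en-v1.5 ONNX model (~33MB) runs on CPU; enable GPU acceleration only for large knowledge bases
Chunk sizes are measured in characters: the README suggests 500/100 for short notes, 1000/200 for general use, and 1500/300 for long technical docs; vectors are stored in ChromaDB
Dry-run agent tasks to validate the tool-call chain before enabling autonomous execution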

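Common mistakes

Committing API keys to a git repository (use a .env file and add it to .gitignore)
A mistyped MCP config path or insufficient permissions; changes take effect only after restarting Claude Desktop
Containers cannot reach the host's localhost; use host.docker.internal instead
Indexing and querying with different embedding models breaks retrieval; after changing the model, run reindex_documents(full_rebuild=True)
Running out of GPU memory (OOM): switch to a smaller embedding model or set gpu: false
Python dependency conflicts: isolate the environment with venv / uv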

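Deployment options

Docker: an official image with pre-downloaded models is available (docker pull ghcr.io/lyonzin/knowledge-rag)
CLI: install with npx -y knowledge-rag or pip install knowledge-rag and run it from the command line
Local: about 200 documents need roughly 300-500MB of RAM; a GPU is optional
Hosted: since v4.0.0, the SSE/streamable-http transport lets one server handle many clients, so it can run on container platforms such as Railway or Fly.io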


👥 Target users

Claude Desktop / Claude Code users, AI tool developers, professionals who want to extend their AI's capabilities, and automation engineers


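⚖️ Pros and cons

✅ Pros

+MIT license, free for commercial use
+Standard MCP protocol with strong ecosystem interoperability
+Integrates seamlessly with Claude's official ecosystem
+Plug and play, with quick and simple configuration

⚠️ Cons

−Requires an MCP client (Claude Code, Claude Desktop, Cursor, VS Code, Windsurf, etc.); it is not a standalone search app
−MCP is still evolving, so interfaces may change
−Requires some configuration steps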

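⚠️ Notice

AI Skill Hub is a third-party content aggregator. The information on this page is compiled from public data and implies no legal endorsement of the tool's functionality or quality.

Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.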


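💡 AI Skill Hub review

Overall, the Knowledge Retrieval Tool performs solidly among MCP tools and is of excellent quality. If you already have a clear use case, try it directly; if you are still evaluating, compare it with similar tools before deciding.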


🌐 Original information

Original name	`knowledge-rag`
Original description	Open-source MCP tool. "Drop docs, search instantly from Claude Code: 12 MCP tools, 20 format parsers, …" ⭐84 · Python
Topics	`mcp` `bm25` `chromadb` `claude` `document-search`
GitHub	https://github.com/lyonzin/knowledge-rag
License	MIT
Language	Python

🔗 Original sources

🐙 GitHub repository https://github.com/lyonzin/knowledge-rag 🌐 PyPI package https://pypi.org/project/knowledge-rag/

Added: 2026-05-25 · Updated: 2026-05-30 · License: MIT · AI Skill Hub offers no legal guarantee of the accuracy of third-party content.
