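Following a curated evaluation by AI Skill Hub, Verba (a RAG knowledge base tool) is rated "Highly Recommended". The project has 7.7k GitHub stars and scores well on feature completeness, community activity, and ease of use. Its AI score is 9.1, and it suits users with some technical background.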
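Verba is an open-source tool written in Python that focuses on retrieval augmented generation (RAG). As an open-source project on GitHub, it has an active community and ongoing releases, its code is fully open to audit, and it supports local deployment to keep data private. It can be used on its own or integrated into an enterprise workflow.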
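# Option 1: install with pip (recommended)
pip install goldenverba
# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install goldenverba
# Option 3: install from source (latest features)
git clone https://github.com/weaviate/Verba
cd Verba
pip install -e .
# Verify the installation
python -c "import goldenverba; print('Installation succeeded')"
# Command-line usage
verba --help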
# Basic usage: start the Verba server, then open localhost:8000 in a browser
verba start
# Verba runs as a web application; the README documents the verba start command rather than a Python processing API
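# Verba reads its settings from environment variables, typically placed in a .env file in the directory where Verba is started
OPENAI_API_KEY="your-key"
DEFAULT_DEPLOYMENT="Local"
# Alternatively, export a variable in the shell before starting Verba
export OPENAI_API_KEY="your-key"
verba start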
| 🤖 Model Support | Implemented | Description |
|---|---|---|
| Ollama (e.g. Llama3) | ✅ | Local Embedding and Generation Models powered by Ollama |
| HuggingFace (e.g. MiniLMEmbedder) | ✅ | Local Embedding Models powered by HuggingFace |
| Cohere (e.g. Command R+) | ✅ | Embedding and Generation Models by Cohere |
| Anthropic (e.g. Claude Sonnet) | ✅ | Generation Models by Anthropic |
| OpenAI (e.g. GPT4) | ✅ | Embedding and Generation Models by OpenAI |
| Groq (e.g. Llama3) | ✅ | Generation Models by Groq (LPU inference) |
| Novita AI (e.g. Llama3.3) | ✅ | Generation Models by Novita AI |
| Upstage (e.g. Solar) | ✅ | Embedding and Generation Models by Upstage |

| 🤖 Embedding Support | Implemented | Description |
|---|---|---|
| Weaviate | ✅ | Embedding Models powered by Weaviate |
| Ollama | ✅ | Local Embedding Models powered by Ollama |
| SentenceTransformers | ✅ | Embedding Models powered by HuggingFace |
| Cohere | ✅ | Embedding Models by Cohere |
| VoyageAI | ✅ | Embedding Models by VoyageAI |
| OpenAI | ✅ | Embedding Models by OpenAI |
| Upstage | ✅ | Embedding Models by Upstage |

| 📁 Data Support | Implemented | Description |
|---|---|---|
| [UnstructuredIO](https://docs.unstructured.io/welcome) | ✅ | Import Data through Unstructured |
| [Firecrawl](https://www.firecrawl.dev/) | ✅ | Scrape and Crawl URL through Firecrawl |
| [UpstageDocumentParse](https://upstage.ai/) | ✅ | Parse Documents through Upstage Document AI |
| PDF Ingestion | ✅ | Import PDF into Verba |
| GitHub & GitLab | ✅ | Import Files from GitHub and GitLab |
| CSV/XLSX Ingestion | ✅ | Import Table Data into Verba |
| .DOCX | ✅ | Import .docx files |
| Multi-Modal (using [AssemblyAI](https://assemblyai.com)) | ✅ | Import and Transcribe Audio through AssemblyAI |

| ✨ RAG Features | Implemented | Description |
|---|---|---|
| Hybrid Search | ✅ | Semantic Search combined with Keyword Search |
| Autocomplete Suggestion | ✅ | Verba suggests autocompletion |
| Filtering | ✅ | Apply Filters (e.g. documents, document types etc.) before performing RAG |
| Customizable Metadata | ✅ | Free control over Metadata |
| Async Ingestion | ✅ | Ingest data asynchronously to speed up the process |
| Advanced Querying | planned ⏱️ | Task Delegation Based on LLM Evaluation |
| Reranking | planned ⏱️ | Rerank results based on context for improved results |
| RAG Evaluation | planned ⏱️ | Interface for Evaluating RAG pipelines |
| Agentic RAG | out of scope ❌ | Agentic RAG pipelines |
| Graph RAG | out of scope ❌ | Graph-based RAG pipelines |

| 🗡️ Chunking Techniques | Implemented | Description |
|---|---|---|
| Token | ✅ | Chunk by Token powered by [spaCy](https://spacy.io/) |
| Sentence | ✅ | Chunk by Sentence powered by [spaCy](https://spacy.io/) |
| Semantic | ✅ | Chunk and group by semantic sentence similarity |
| Recursive | ✅ | Recursively chunk data based on rules |
| HTML | ✅ | Chunk HTML files |
| Markdown | ✅ | Chunk Markdown files |
| Code | ✅ | Chunk Code files |
| JSON | ✅ | Chunk JSON files |

| 🆒 Cool Bonus | Implemented | Description |
|---|---|---|
| Docker Support | ✅ | Verba is deployable via Docker |
| Customizable Frontend | ✅ | Verba's frontend can be customized from within the frontend itself |
| Vector Viewer | ✅ | Visualize your data in 3D |
| Multi-User Collaboration | out of scope ❌ | Multi-User Collaboration in Verba |

| 🤝 RAG Libraries | Implemented | Description |
|---|---|---|
| LangChain | ✅ | Implement LangChain RAG pipelines |
| Haystack | planned ⏱️ | Implement Haystack RAG pipelines |
| LlamaIndex | planned ⏱️ | Implement LlamaIndex RAG pipelines |
Something is missing? Feel free to create a new issue or discussion with your idea!
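To make the Hybrid Search entry in the RAG Features table more concrete, here is a minimal sketch of one common way to blend a keyword score with a vector-similarity score. It is not Verba's or Weaviate's implementation: the `alpha` weight, the term-overlap keyword score, the function names, and the toy vectors are all assumptions chosen for illustration.

```python
import math


def keyword_score(query: str, text: str) -> float:
    """Toy keyword score: fraction of query terms that appear in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query: str, query_vec: list[float],
                 text: str, text_vec: list[float], alpha: float = 0.5) -> float:
    """Weighted blend: alpha=1.0 is pure vector search, alpha=0.0 is pure keyword search."""
    semantic = cosine_similarity(query_vec, text_vec)
    lexical = keyword_score(query, text)
    return alpha * semantic + (1 - alpha) * lexical


def hybrid_rank(query: str, query_vec: list[float],
                docs: list[tuple[str, list[float]]], alpha: float = 0.5) -> list[str]:
    """Rank (text, vector) pairs by hybrid score, best first."""
    scored = [(hybrid_score(query, query_vec, text, vec, alpha), text) for text, vec in docs]
    return [text for _, text in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

In a real deployment the vectors would come from an embedding model and the keyword score from the database's own keyword index; the sketch only shows how the two signals can be combined into one ranking.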

---
You have three deployment options for Verba:
Install via pip:
pip install goldenverba
Build from source:
git clone https://github.com/weaviate/Verba
pip install -e .
Prerequisites: If you're not using Docker, ensure that you have Python >=3.10.0,<3.13.0 installed on your system.
Deploy with Docker:
git clone https://github.com/weaviate/Verba
docker compose --env-file <your-env-file> up -d --build
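The Python version range given in the prerequisites (>=3.10.0,<3.13.0) can be checked before installing. The following is a small sketch of such a check; the function and constant names are illustrative and not part of Verba.

```python
import sys

MIN_VERSION = (3, 10, 0)  # inclusive lower bound stated in the prerequisites
MAX_VERSION = (3, 13, 0)  # exclusive upper bound stated in the prerequisites


def python_version_supported(version: tuple[int, int, int]) -> bool:
    """Return True if the (major, minor, micro) version lies in the stated range."""
    return MIN_VERSION <= version < MAX_VERSION


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "within" if python_version_supported(current) else "outside"
    print(f"Python {'.'.join(map(str, current))} is {status} the stated range")
```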
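If you're unfamiliar with Python and virtual environments, please read the Python tutorial guidelines.
1. Create and activate a new Python environment (Python >=3.10.0,<3.13.0):
python3 -m virtualenv venv
source venv/bin/activate
2. Install Verba:
pip install goldenverba
3. Launch Verba:
verba start
You can specify the port and host with the --port and --host flags.
4. Open Verba in a browser at localhost:8000.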
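1. Clone the Verba repository and change into it:
git clone https://github.com/weaviate/Verba.git
cd Verba
2. Create and activate a new Python environment:
python3 -m virtualenv venv
source venv/bin/activate
3. Install Verba from source:
pip install -e .
4. Launch Verba:
verba start
You can specify the port and host with the --port and --host flags.
5. Open Verba in a browser at localhost:8000.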
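After `verba start`, it can be handy to confirm that something is listening on the chosen host and port before opening the browser. The sketch below uses only the Python standard library; the helper name is hypothetical, and a successful TCP connection only shows that a server is accepting connections on that port, not that it is Verba.

```python
import socket


def server_is_reachable(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if server_is_reachable():
        print("A server is listening on localhost:8000")
    else:
        print("Nothing is listening on localhost:8000 yet")
```

If Verba was started with different --port or --host values, pass the same values to the helper.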
Docker is a set of platform-as-a-service products that use OS-level virtualization to deliver software in packages called containers. To get started with deploying Verba using Docker, follow the steps below. If you need more detailed instructions on Docker usage, check out the Docker Curriculum.
You can use docker pull semitechnologies/verba to pull the latest Verba Docker image. Note that pulling directly from Docker Hub installs only the vanilla Verba version, which does not include optional packages such as HuggingFace. If you want to use Docker together with HuggingFace, follow the steps below.
To build the image yourself, you can clone the Verba repository and run docker build -t verba . inside the Verba directory.
0. Clone the Verba repository. Ensure you have Git installed on your system, then open a terminal or command prompt and run the following command:
git clone https://github.com/weaviate/Verba.git
1. Set the necessary environment variables. Set your required environment variables in the .env file. You can read more about how to set them up in the API Keys section.
2. Adjust the docker-compose file. You can use docker-compose.yml to add required environment variables under the verba service, and you can adjust the Weaviate Docker settings to enable authentication or change other settings of your database instance. You can read more about the Weaviate configuration in our docker-compose documentation. You can also uncomment the ollama service to use Ollama within the same docker compose.
Please make sure to only add environment variables that you really need.
3. Deploy using Docker. With Docker installed and the Verba repository cloned, navigate to the directory containing the Docker Compose file in your terminal or command prompt. Run the following command to start Verba in detached mode, which lets it run in the background:
docker compose up -d
To pass an environment file and rebuild the image, run this instead:
docker compose --env-file goldenverba/.env up -d --build
This command will download the necessary Docker images, create containers, and start Verba. Remember, Docker must be installed on your system to use this method. For installation instructions and more details about Docker, visit the official Docker documentation.
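When the Docker deployment is driven from a script, the two `docker compose` invocations shown above can be assembled programmatically. This is only a sketch: the helper names are made up for illustration, and running it for real requires Docker to be installed and the current directory to contain the compose file.

```python
import shlex
import subprocess
from typing import Optional


def compose_up_command(env_file: Optional[str] = None, build: bool = False) -> list[str]:
    """Build the argument list for starting the stack in detached mode."""
    cmd = ["docker", "compose"]
    if env_file:
        cmd += ["--env-file", env_file]  # global option, placed before the subcommand
    cmd += ["up", "-d"]
    if build:
        cmd.append("--build")  # rebuild images before starting containers
    return cmd


def run_compose_up(env_file: Optional[str] = None, build: bool = False) -> int:
    """Run the command and return its exit code (0 means success)."""
    cmd = compose_up_command(env_file, build)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Calling `run_compose_up("goldenverba/.env", build=True)` corresponds to the second command above; passing arguments as a list avoids shell-quoting problems with paths that contain spaces.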
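Once the containers are running, the Weaviate instance is reachable at localhost:8080 and the Verba frontend at localhost:8000.
If you want your Docker instance to install a specific version of Verba, you can edit the Dockerfile and change the installation line.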
RUN pip install -e '.'
The first screen you'll see is the deployment screen. Here you can select between Local, Docker, Weaviate Cloud, or Custom deployment. The Local deployment is using Weaviate Embedded under the hood, which initializes a Weaviate instance behind the scenes. The Docker deployment is using a separate Weaviate instance that is running inside the same Docker network. The Weaviate Cloud deployment is using a Weaviate instance that is hosted on Weaviate Cloud Services (WCS). The Custom deployment allows you to specify your own Weaviate instance URL, PORT, and API key.
You can skip this part by setting the DEFAULT_DEPLOYMENT environment variable to Local, Docker, Weaviate, or Custom.
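Because DEFAULT_DEPLOYMENT accepts only the four values listed above, a start-up script can validate it early. The sketch below is an illustration rather than Verba's own logic: the helper name is hypothetical, and treating the value as case-sensitive is an assumption.

```python
import os
from typing import Mapping, Optional

# The four values named in the text; case-sensitivity is an assumption of this sketch.
ALLOWED_DEPLOYMENTS = ("Local", "Docker", "Weaviate", "Custom")


def read_default_deployment(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured deployment, or None if the variable is unset or empty."""
    value = env.get("DEFAULT_DEPLOYMENT", "").strip()
    if not value:
        return None
    if value not in ALLOWED_DEPLOYMENTS:
        allowed = ", ".join(ALLOWED_DEPLOYMENTS)
        raise ValueError(f"DEFAULT_DEPLOYMENT={value!r} is not one of: {allowed}")
    return value
```

Returning None for an unset variable mirrors the documented behaviour that the deployment screen is shown unless the variable is set.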
You can set all API keys in the Verba frontend, but to make your life easier, we can also prepare a .env file in which Verba will automatically look for the keys. Create a .env in the same directory you want to start Verba in. You can find an .env.example file in the goldenverba directory.
Make sure to set only the environment variables you intend to use; variables with missing or incorrect values may lead to errors.
Below is a comprehensive list of the API keys and variables you may require:
| Environment Variable | Value | Description |
|---|---|---|
| WEAVIATE_URL_VERBA | URL to your hosted Weaviate Cluster | Connect to your [WCS](https://console.weaviate.cloud/) Cluster |
| WEAVIATE_API_KEY_VERBA | API Credentials to your hosted Weaviate Cluster | Connect to your [WCS](https://console.weaviate.cloud/) Cluster |
| ANTHROPIC_API_KEY | Your Anthropic API Key | Get Access to [Anthropic](https://www.anthropic.com/) Models |
| OPENAI_API_KEY | Your OpenAI Key | Get Access to [OpenAI](https://openai.com/) Models |
| OPENAI_EMBED_API_KEY | Your OpenAI Key | Use a different endpoint for embeddings |
| OPENAI_BASE_URL | URL to OpenAI instance | Models |
| OPENAI_EMBED_BASE_URL | URL to OpenAI instance | Use a different endpoint for embeddings |
| OPENAI_MODEL | The name of the model to be used when selecting OpenAI as a Generator | Default: the first model in the list returned by the endpoint |
| OPENAI_EMBED_MODEL | The name of the OpenAI embedding model to be used when selecting OpenAI as an Embedder | Default: text-embedding-3-small |
| OPENAI_CUSTOM_EMBED | true \| false | Allow Verba to recognize custom embedding model names (not only OpenAI ones) |
| COHERE_API_KEY | Your API Key | Get Access to [Cohere](https://cohere.com/) Models |
| GROQ_API_KEY | Your Groq API Key | Get Access to [Groq](https://groq.com/) Models |
| NOVITA_API_KEY | Your Novita API Key | Get Access to [Novita AI](https://novita.ai?utm_source=github_verba&utm_medium=github_readme&utm_campaign=github_link) Models |
| OLLAMA_URL | URL to your Ollama instance (e.g. http://localhost:11434 ) | Get Access to [Ollama](https://ollama.com/) Models |
| UNSTRUCTURED_API_KEY | Your API Key | Get Access to [Unstructured](https://docs.unstructured.io/welcome) Data Ingestion |
| UNSTRUCTURED_API_URL | URL to Unstructured Instance | Get Access to [Unstructured](https://docs.unstructured.io/welcome) Data Ingestion |
| ASSEMBLYAI_API_KEY | Your API Key | Get Access to [AssemblyAI](https://assemblyai.com) Data Ingestion |
| GITHUB_TOKEN | Your GitHub Token | Get Access to Data Ingestion via GitHub |
| GITLAB_TOKEN | Your GitLab Token | Get Access to Data Ingestion via GitLab |
| FIRECRAWL_API_KEY | Your Firecrawl API Key | Get Access to Data Ingestion via Firecrawl |
| VOYAGE_API_KEY | Your VoyageAI API Key | Get Access to Embedding Models via VoyageAI |
| EMBEDDING_SERVICE_URL | URL to your Embedding Service Instance | Get Access to Embedding Models via [Weaviate Embedding Service](https://weaviate.io/developers/wcs/embeddings) |
| EMBEDDING_SERVICE_KEY | Your Embedding Service Key | Get Access to Embedding Models via [Weaviate Embedding Service](https://weaviate.io/developers/wcs/embeddings) |
| UPSTAGE_API_KEY | Your Upstage API Key | Get Access to [Upstage](https://upstage.ai/) Models |
| UPSTAGE_BASE_URL | URL to Upstage instance | Models |
| DEFAULT_DEPLOYMENT | Local, Weaviate, Custom, Docker | Set the default deployment mode |
| SYSTEM_MESSAGE_PROMPT | Prompt text value | Default value starts with: "You are Verba, a chatbot for..." |
| OLLAMA_MODEL | Your Ollama Model | Set the default Ollama model to use |
| OLLAMA_EMBED_MODEL | Your Ollama Embedding Model | Set the default Ollama embedding model to use |
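Since variables with missing values may lead to errors, a quick check of the .env file before starting Verba can catch empty entries. The sketch below is a deliberately simple parser that handles only plain KEY=VALUE lines and `#` comments; it is not how Verba itself loads the file, and the function names are illustrative.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def parse_env_file(path: str) -> dict[str, str]:
    """Read a .env file from disk and parse it."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def empty_keys(values: dict[str, str]) -> list[str]:
    """Return the names of variables that are present but have no value."""
    return sorted(key for key, value in values.items() if not value)
```

Running `empty_keys(parse_env_file(".env"))` lists the entries that are present but empty, which are candidates for removal under the advice above.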

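When Verba runs in Docker and Ollama runs on the host machine, set OLLAMA_URL=http://host.docker.internal:11434. If Ollama runs on another machine, set OLLAMA_URL="http://YOUR-IP-OF-OLLAMA:11434".
Weaviate Embedded stores its data in ~/.local/share/weaviate.
To start Verba on a different port and host, run verba start --port 9000 --host 0.0.0.0.
To use a custom OpenAI-compatible server, set OPENAI_BASE_URL in the .env file; this will allow Verba to start up and retrieve the models from your custom OpenAI server. OPENAI_BASE_URL is set to https://api.openai.com/v1 by default.
A separate embedding endpoint can be configured by setting the OPENAI_EMBED_API_KEY and OPENAI_EMBED_BASE_URL environment variables and setting OPENAI_CUSTOM_EMBED=true. For more details, see OpenAI Embeddings.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's features or quality.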
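We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ BSD 3-Clause: a permissive license that allows commercial use, modification, and distribution, but prohibits using the original authors' names for endorsement or promotion.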
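AI Skill Hub review: Verba's core functionality is complete and of high quality. For AI enthusiasts, it is worth adding to a personal toolkit. We recommend trying it in a non-production environment first and rolling it out gradually.

| Field | Value |
|---|---|
| Original name | Verba |
| Original description | Retrieval Augmented Generation (RAG) chatbot powered by Weaviate |
| Topics | rag |
| GitHub | https://github.com/weaviate/Verba |
| License | BSD-3-Clause |
| Language | Python |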
Listed: 2026-05-22 · Updated: 2026-05-22 · License: BSD-3-Clause · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.