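RAG Open-Source Tool is one of the AI tools featured by AI Skill Hub in this issue. Its overall score is 7.5, which indicates fairly high overall quality. We recommend adding it to your AI toolkit to help improve productivity.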
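A Python-based RAG system that supports multiple modules and multiple languages; an open-source AI tool worth watching.
RAG Open-Source Tool is an open-source tool developed in Python, focused on core capabilities such as RAG, Gemma3, and Python. As a GitHub open-source project, it has active community support and continuing version iteration; its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether for personal use or for integration into enterprise workflows, it offers a stable and reliable solution.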
# Option 1: install with pip (recommended)
pip install rag-with-gemma3
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install rag-with-gemma3
# Option 3: install from source (to get the latest features)
git clone https://github.com/Bbs1412/rag-with-gemma3
cd rag-with-gemma3
pip install -e .
# Verify the installation
python -c "import rag_with_gemma3; print('Installation successful')"
# Command-line usage
rag-with-gemma3 --help
# Basic usage
rag-with-gemma3 input_file -o output_file
# Calling from Python code
import rag_with_gemma3
# Example
result = rag_with_gemma3.process("input")
print(result)
# Example rag-with-gemma3 configuration file (config.yml)
app:
  name: "rag-with-gemma3"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
rag-with-gemma3 --config config.yml
# Or configure through environment variables
export RAG_WITH_GEMMA3_API_KEY="your-key"
export RAG_WITH_GEMMA3_OUTPUT_DIR="./output"
This project is a modular Retrieval-Augmented Generation (RAG) system built with Google DeepMind's Gemma 3, served locally using Ollama. It allows users to upload documents (PDF, TXT, Markdown, etc.) and then chat with the content using natural-language queries, all processed through a local setup for privacy and full control.
Designed with modularity and performance in mind, the system handles end-to-end workflows including file ingestion, vector embedding, history summarization, document retrieval, context-aware response generation, and streaming replies to a frontend. It supports multi-file embeddings per user, persistent session history and document storage, and live document previews, making it a complete end-to-end RAG pipeline suitable for educational use or personal assistants.
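The retrieval step in this workflow can be illustrated with a toy sketch that uses only the Python standard library. It is not the project's code: bag-of-words counts stand in for real embeddings, a plain list stands in for the vector store, and the names `Chunk`, `embed`, `cosine`, and `retrieve`, the `owner` field, and the `"public"` marker are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    owner: str  # hypothetical field: a user id, or "public" for shared documents


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: List[Chunk], user: str, k: int = 2) -> List[Chunk]:
    """Return the k most similar chunks that `user` is allowed to see."""
    visible = [c for c in chunks if c.owner in (user, "public")]
    q = embed(query)
    ranked = sorted(visible, key=lambda c: cosine(q, embed(c.text)), reverse=True)
    return ranked[:k]


chunks = [
    Chunk("FAISS stores vectors for fast similarity search", "public"),
    Chunk("Alice's notes on bcrypt password hashing", "alice"),
    Chunk("Bob's private notes on vector search", "bob"),
]
top = retrieve("how does vector search work", chunks, user="alice", k=1)
print(top[0].text)
```

Filtering by owner before ranking means another user's chunks can never be returned, even when they would score highest for the query.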
- User Authentication:
  + Authenticate users with a SQLite 3 database and salted, bcrypt-based password hashing.
  + Store user data securely and automatically clear stale session data.
- UI and User Controls:
  + Build a responsive UI as a Streamlit app. Provide a chat interface where users can ask questions about their documents, preview files, and receive context-aware responses.
  + Track user-uploaded files and their associated data in a SQLite 3 database.
  + Allow users to delete their uploaded documents and manage their session history.
  + Note: File previews are cached for 10 minutes, so a preview may remain available for that long after the file is deleted.
  + Works with FastAPI SSE to show real-time responses from the LLM, along with the retrieved documents and metadata for verification.
  + The UI also supports thinking models, showing the LLM's thought process while it generates responses.
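The authentication items above can be sketched as follows. This is a minimal illustration, not the project's implementation: because bcrypt is a third-party package, the sketch swaps in the standard library's PBKDF2 (`hashlib.pbkdf2_hmac`) as the salted hash, and the table layout, function names, and iteration count are assumptions.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 hash (stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical users and sessions tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, salt BLOB, pw_hash BLOB)")
    db.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, name TEXT, last_seen REAL)")
    return db


def register(db: sqlite3.Connection, name: str, password: str) -> None:
    """Store a fresh random salt and the salted hash, never the password itself."""
    salt = os.urandom(16)
    db.execute("INSERT INTO users VALUES (?, ?, ?)", (name, salt, hash_password(password, salt)))


def verify(db: sqlite3.Connection, name: str, password: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    row = db.execute("SELECT salt, pw_hash FROM users WHERE name = ?", (name,)).fetchone()
    if row is None:
        return False
    salt, stored = row
    return hmac.compare_digest(stored, hash_password(password, salt))


def clear_stale_sessions(db: sqlite3.Connection, max_age_s: float, now: float) -> int:
    """Delete sessions not seen within max_age_s seconds; return how many were removed."""
    cur = db.execute("DELETE FROM sessions WHERE last_seen < ?", (now - max_age_s,))
    return cur.rowcount


db = create_db()
register(db, "alice", "s3cret")
print(verify(db, "alice", "s3cret"), verify(db, "alice", "wrong"))
```

Passing `now` explicitly keeps the session cleanup deterministic, which makes it straightforward to test.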
LangChain components convert documents into vector representations. The mxbai-embed-large model, a lightweight and efficient embedding model, generates the embeddings. FAISS provides efficient vector storage and retrieval of user-specific and public documents.
- FastAPI Backend:
  + Build a FastAPI backend to handle file uploads, document processing, user authentication, and streaming LLM responses.
  + Integrate with the 'LLM System' module to handle LLM tasks.
  + Provide status updates to the UI for long-running tasks.
  + Implement Server-Sent Events (SSE) for real-time streaming of LLM responses to the frontend, with NDJSON as the data-transfer format.
  + Provide the UI with retrieved documents and metadata for verification of responses.
- LLM System:
  + Modular LLM System using LangChain components for:
    1. Document Ingestion: Load files and process them into document chunks.
    1. Vector Embedding: Convert documents into vector representations.
    1. History Summarization: Summarize user session history for querying vector embeddings and retrieving relevant documents.
    1. Document Retrieval: Fetch relevant documents based on the standalone query and the user's metadata filters.
    1. History Management: Maintain session history for context-aware interactions.
    1. Response Generation: Generate context-aware responses using the LLM.
    1. Tracing: Enable tracing of LLM interactions using LangSmith for debugging and monitoring.
    1. Models: Use Ollama to run the Gemma-3 LLM and mxbai embeddings locally for inference, ensuring low latency and privacy.
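The NDJSON streaming mentioned in the backend features can be sketched with the standard library alone. The event shape (`type` and `content` keys) and the function names are hypothetical; the project's actual wire format is not shown in this document. In a FastAPI app, a generator like `to_ndjson` could plausibly be wrapped in a `StreamingResponse`, but that wiring is omitted here.

```python
import json
from typing import Dict, Iterable, Iterator


def to_ndjson(events: Iterable[Dict]) -> Iterator[str]:
    """Serialize each event as one JSON object followed by a newline."""
    for event in events:
        yield json.dumps(event, ensure_ascii=False) + "\n"


def from_ndjson(stream: Iterable[str]) -> Iterator[Dict]:
    """Parse text fragments of any size, emitting one event per complete line."""
    buffer = ""
    for fragment in stream:
        buffer += fragment
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)


events = [
    {"type": "status", "content": "retrieving documents"},
    {"type": "token", "content": "Hello"},
    {"type": "token", "content": " world"},
    {"type": "sources", "content": [{"file": "notes.md", "page": 1}]},
]
wire = "".join(to_ndjson(events))
# Simulate arbitrary network chunking by cutting the wire data every 7 characters.
fragments = [wire[i:i + 7] for i in range(0, len(wire), 7)]
received = list(from_ndjson(fragments))
answer = "".join(e["content"] for e in received if e["type"] == "token")
print(answer)
```

Because `json.dumps` escapes newlines inside strings, each event always occupies exactly one line, so the client only needs to buffer until the next newline.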
Docker setup for easy deployment as either a development or deployment environment. A Dockerfile manages both the FastAPI and Streamlit servers in a single container (mainly due to Hugging Face Spaces limitations).
There are two ways to run this project: directly in a virtual environment, or with Docker.
The Dockerfile is parameterized to support both development and deployment environments.
There are two Docker methods in this repo (single-container vs multi-container), and that is why you see four Docker-related files.
1. Single-container (Hugging Face / simple local run)
   - Dockerfile: Builds one container that runs FastAPI + Streamlit together via entrypoint.sh.
   - Use this when:
     + You are deploying on Hugging Face Spaces (no Compose support), or
     + You want a single container locally with ENV_TYPE=dev or ENV_TYPE=deploy.
   - Uses the root requirements.txt for a full install.
1. Multi-container (Local Docker Compose)
   - Dockerfile.fastapi: Backend-only container, tuned for local dev (Ollama on host).
   - Dockerfile.streamlit: Frontend-only container, Streamlit UI.
   - docker-compose.yml: Wires them together for local dev.
   - Uses docker/requirements_be.txt and docker/requirements_fe.txt to avoid bloat.
   - Update .streamlit/secrets.toml when using Compose:
     + Set server.ip_address = "http://fastapi:8000" (container-to-container URL, not localhost).
TL;DR: Use Dockerfile for HF (single container). Use docker-compose.yml for local dev (two containers).
1. Clone the repository:
git clone --depth 1 https://github.com/Bbs1412/rag-with-gemma3.git
1. Create virtual environment and install dependencies:
# Create environment:
python -m venv venv
# Activate environment:
source venv/bin/activate # On Linux/Mac
# or
venv\Scripts\activate # On Windows
# Install dependencies:
pip install -r requirements.txt
1. (Optional) If you want to use LangSmith tracing, create a .env file in the server directory and add these credentials:
# ./server/.env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
LANGCHAIN_API_KEY="<paste_your_api_key_here>"
LANGCHAIN_PROJECT="rag-with-gemma3"
1. Start the FastAPI server:
cd server
uvicorn server:app
# For development with hot-reloading
# uvicorn server:app --reload --port 8000
1. Start the Streamlit server:
cd ..
streamlit run app.py
1. You can now access these servers:
   - FastAPI backend at http://localhost:8000
   - Streamlit frontend at http://localhost:8501
   - FastAPI Swagger UI at http://localhost:8000/docs
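Before opening the addresses above in a browser, it can help to confirm that both servers are accepting connections. The following is a small sketch using only the standard library; the ports come from the list above, while the helper names `port_open` and `check_services` are hypothetical and not part of the project.

```python
import socket
from typing import Dict

# Ports taken from the addresses listed above.
SERVICES = {"FastAPI": 8000, "Streamlit": 8501}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> Dict[str, bool]:
    """Map each service name to whether its port currently accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name}: {'up' if up else 'not reachable'}")
```

An open port only shows that something is listening; it does not prove that the application behind it is healthy.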
1. Development:
   - Project uses http://host.docker.internal:11434 as the Ollama server for local inference.
   - This ensures that existing Ollama models on the host machine are accessible from the Docker container.
   - In this env, all three ports {8000: FastAPI, 8501: Streamlit, 11434: Ollama} are exposed for easy access.
1. Deployment:
   - Project uses Google Gemini-2.0-Flash-Lite as the LLM and text-embedding-004 as the embedding model.
   - This is primarily due to deployment and API limitations of the Gemma3 model.
   - In this env, only port 7860 is exposed for the Streamlit frontend.
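The difference between the two environments can be summarized as a small settings lookup keyed on ENV_TYPE. This is a hypothetical sketch, not the project's configuration code: the `Settings` class, the `resolve_settings` function, the lowercase model identifiers, and the fallback to `dev` when ENV_TYPE is unset are all assumptions; only the URL, the model families, and the ports come from the description above.

```python
import os
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Settings:
    llm: str
    embedding_model: str
    ollama_url: Optional[str]  # None when no local Ollama server is used
    exposed_ports: Tuple[int, ...]


def resolve_settings(env_type: str) -> Settings:
    """Return the settings described above for ENV_TYPE=dev or ENV_TYPE=deploy."""
    if env_type == "dev":
        return Settings(
            llm="gemma3",  # assumed identifier for the local Gemma 3 model
            embedding_model="mxbai-embed-large",
            ollama_url="http://host.docker.internal:11434",
            exposed_ports=(8000, 8501, 11434),
        )
    if env_type == "deploy":
        return Settings(
            llm="gemini-2.0-flash-lite",  # assumed identifier for Gemini-2.0-Flash-Lite
            embedding_model="text-embedding-004",
            ollama_url=None,
            exposed_ports=(7860,),
        )
    raise ValueError(f"unknown ENV_TYPE: {env_type!r}")


# Assumption: fall back to the dev profile for any missing or unrecognized value.
env_type = os.environ.get("ENV_TYPE", "dev")
settings = resolve_settings(env_type if env_type in ("dev", "deploy") else "dev")
print(settings)
```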
#### Development:
1. Build the Docker image:
docker build -t bbs/rag-with-gemma3:dev --build-arg ENV_TYPE=dev .
1. Create a Docker container:
# The four LANGCHAIN_* variables are optional and enable LangSmith tracing.
# The -p flags publish the FastAPI, Streamlit, and Ollama ports.
docker create --name rag-gemma-cont-dev \
-e ENV_TYPE=dev \
-e LANGCHAIN_TRACING_V2=true \
-e LANGCHAIN_ENDPOINT="https://api.smith.langchain.com" \
-e LANGCHAIN_API_KEY="<paste_your_api_key_here>" \
-e LANGCHAIN_PROJECT="rag-with-gemma3" \
-p 8000:8000 -p 8501:8501 -p 11434:11434 \
bbs/rag-with-gemma3:dev
1. Start the Docker container:
docker start -a rag-gemma-cont-dev
1. You can now access these servers:
   - FastAPI backend at http://localhost:8000
   - Streamlit frontend at http://localhost:8501
#### Deployment:
1. Build the Docker image:
docker build -t bbs/rag-with-gemma3:prod --build-arg ENV_TYPE=deploy .
1. Create a Docker container:
# The four LANGCHAIN_* variables are optional and enable LangSmith tracing.
# GOOGLE_API_KEY is required.
# Only port 7860 is published.
docker create --name rag-gemma-cont-prod \
-e ENV_TYPE=deploy \
-e LANGCHAIN_TRACING_V2=true \
-e LANGCHAIN_ENDPOINT="https://api.smith.langchain.com" \
-e LANGCHAIN_API_KEY="<paste_your_api_key_here>" \
-e LANGCHAIN_PROJECT="deployed-rag-gemma3" \
-e GOOGLE_API_KEY="<paste_your_google_api_key_here>" \
-p 7860:7860 \
bbs/rag-with-gemma3:prod
1. Start the Docker container:
docker start -a rag-gemma-cont-prod
- Run individual modules from the server directory with python -m, as in the commands below. This ensures that relative imports work correctly in the project.
cd server
python -m llm_system.utils.loader
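The effect of running a module with python -m, as in the command above, can be reproduced with a throwaway package. The sketch below builds a temporary package (the names `demo_pkg`, `helper`, and `loader` are made up for illustration and are unrelated to the project's modules) and runs the same file both ways: as a module inside its package, and as a standalone script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def run_demo() -> Tuple[int, int]:
    """Return the exit codes of (python -m demo_pkg.loader, python demo_pkg/loader.py)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        # loader.py uses a relative import, which needs a known parent package.
        (pkg / "loader.py").write_text("from .helper import VALUE\nprint(VALUE)\n")

        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.loader"],
            cwd=tmp, capture_output=True, text=True,
        )
        as_script = subprocess.run(
            [sys.executable, str(pkg / "loader.py")],
            cwd=tmp, capture_output=True, text=True,
        )
        return as_module.returncode, as_script.returncode


module_code, script_code = run_demo()
print("python -m exit code:", module_code)
print("direct script exit code:", script_code)
```

Run as a module, the file knows its parent package and the relative import resolves; run directly as a script, Python raises an ImportError because there is no known parent package.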
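This tool is Python-based, supports multiple modules and multiple languages, and is an open-source AI tool worth watching, but it needs further optimization and refinement.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.
⚠️ GPL 3.0: strong copyleft. Derivative works that are distributed must also be open-sourced under the GPL; the license includes patent protection clauses and does not permit closed-source redistribution.
Based on our overall assessment, RAG Open-Source Tool is a solid performer among AI tools, with good quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.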
| Field | Value |
| --- | --- |
| Original name | rag-with-gemma3 |
| Original description | Open-source AI tool: This project is a modular Retrieval-Augmented Generation (RAG) system built with. ⭐12 · Python |
| Topics | RAG, Gemma3, Python |
| GitHub | https://github.com/Bbs1412/rag-with-gemma3 |
| License | GPL-3.0 |
| Language | Python |
Listed: 2026-05-23 · Updated: 2026-05-23 · License: GPL-3.0 · AI Skill Hub does not legally endorse the accuracy of third-party content.