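PDF Q&A Bot (pdf-qa-bot) is one of the AI tools featured in this issue of AI Skill Hub. It has an overall score of 7.5, which indicates fairly high overall quality. We recommend adding it to your AI toolkit to help improve productivity.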
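PDF Q&A Bot is an open-source tool written in JavaScript, centered on the topics chatbot, huggingface, and langchain. As an open-source GitHub project, it has active community support and ongoing releases, its code is fully transparent and auditable, and it supports local deployment to protect data privacy. Whether used individually or integrated into an enterprise workflow, it provides a stable and reliable solution.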
# Option 1: global install with npm
npm install -g pdf-qa-bot
# Option 2: run directly with npx (no install needed)
npx pdf-qa-bot --help
# Option 3: install as a project dependency
npm install pdf-qa-bot
# Option 4: run from source
git clone https://github.com/FireFistisDead/pdf-qa-bot
cd pdf-qa-bot
npm install
npm start
# Command-line usage
pdf-qa-bot --help
# Basic usage
pdf-qa-bot [options] <input>
// Usage from Node.js code (await needs an async function in CommonJS)
const pdf_qa_bot = require('pdf-qa-bot');
(async () => {
  const result = await pdf_qa_bot.run(options);
  console.log(result);
})();
# pdf-qa-bot configuration
# View configuration options
pdf-qa-bot --config-example > config.yml
# Common settings
# output_dir: ./output
# log_level: info
# workers: 4
# Environment variable (overrides the config file)
export PDF_QA_BOT_CONFIG="/path/to/config.yml"
| Capability | Description |
|---|---|
| **PDF upload** | Multipart upload with server-side parsing, chunking, and vector indexing |
| **Question answering** | Semantic search over document chunks, then local HF model generation |
| **Reading modes** | Choose between Standard, Tutor, Socratic, Simple, or Concise answering styles |
| **Summarization** | Bullet-style summaries from retrieved context |
| **Multi-document UI** | Upload and switch between multiple PDFs (frontend/) |
| **In-browser viewer** | Page-by-page PDF preview with react-pdf |
| **Chat export** | Export conversation history as CSV or plain text |
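The question-answering row above describes retrieval over document chunks followed by generation. The retrieval half can be sketched in plain Python as below. This is only an illustration under stated assumptions: `chunk_text`, `embed`, and `top_chunks` are hypothetical names, the chunk sizes are arbitrary, and word counts stand in for the Hugging Face embedding model and vector index that the project actually uses.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (sizes are illustrative)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

In a real pipeline the selected chunks would be passed to the generation model as context; the overlap between windows reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.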
---
| Tool | Version | Purpose |
|---|---|---|
| **Node.js** | LTS (18+) recommended | Express gateway and React dev server |
| **npm** | Bundled with Node.js | JavaScript dependencies |
| **Python** | 3.10 or newer | RAG service |
| **pip** | Current | Python dependencies |
Optional but recommended:
---
| Error | Fix |
|---|---|
| ModuleNotFoundError | Activate rag-service/venv and pip install -r requirements.txt |
| torch install fails on Windows | Install Python 3.10–3.12 x64; use official pytorch.org wheel instructions if needed |
| faiss-cpu errors | Ensure 64-bit Python; reinstall: pip install --force-reinstall faiss-cpu |
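Most of the fixes above depend on a few facts about the active interpreter: its version, whether it is 64-bit, whether a virtual environment is active, and which packages are importable. A small diagnostic like the following can gather them in one place. It is a sketch, not part of the project; `environment_report` is a hypothetical helper, and the default module list is an assumption based on the packages named in this document.

```python
import importlib.util
import struct
import sys


def environment_report(modules=("torch", "faiss", "langchain_community")) -> dict:
    """Collect interpreter facts without importing the heavy packages themselves."""
    return {
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "python_ok": sys.version_info >= (3, 10),       # documented minimum
        "is_64bit": struct.calcsize("P") * 8 == 64,      # pointer size in bits
        "in_venv": sys.prefix != sys.base_prefix,        # True inside a venv
        "missing_modules": [m for m in modules if importlib.util.find_spec(m) is None],
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it with the interpreter you intend to use for the RAG service; a non-empty `missing_modules` list with `in_venv: False` usually means the virtual environment was not activated.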
---
Install dependencies in all three locations. Use three separate terminal sessions when running locally.
uvicorn main:app --host 0.0.0.0 --port 5000 --reload
Alternative:
python main.py
You can run the entire multi-service application easily using Docker Compose. This ensures reproducibility and eliminates the need to install Python, Node.js, and dependencies manually on your host machine.
docker-compose up -d --build
3. The services will be available at:
   - Frontend UI: http://localhost:3000
   - Express API Gateway: http://localhost:4000
   - FastAPI RAG Service: http://localhost:5000
Note on Initial Startup: On the first run, the RAG service container will download the necessary Hugging Face models (~1GB total). These models are cached in a persistent Docker volume (pdf-qa-bot-hf-cache) so they will not be re-downloaded on subsequent restarts.
cp ../.env.example .env            # macOS / Linux
copy ..\.env.example .env          # Windows (cmd)
Copy-Item ..\.env.example .env     # Windows (PowerShell)
Edit .env if you want a smaller or faster generation model (see Configuration).
Environment variables are read from rag-service/.env (create from .env.example at the repo root).
| Variable | Default | Description |
|---|---|---|
| HF_GENERATION_MODEL | google/flan-t5-base | Hugging Face model ID for answer/summary generation |
| OPENAI_API_KEY | *(empty)* | Reserved; not used by the current local HF pipeline |
| HOST | 127.0.0.1 | Documented for optional deployment tuning |
| PORT | 5000 | Documented RAG port (uvicorn CLI flag takes precedence in dev) |
| INTERNAL_RAG_TOKEN | *(empty)* | Optional shared secret: when set, RAG endpoints require X-Internal-Token |
| PDF_PARSE_TIMEOUT_SECONDS | 20 | Hard timeout for PDF parsing/extraction (mitigates DoS-grade PDFs) |
| MAX_PDF_PAGES | 200 | Reject PDFs with too many pages |
| MAX_PDF_EXTRACT_CHARS | 400000 | Cap extracted text before chunking |
Faster, lighter generation (recommended on CPU-only machines):
HF_GENERATION_MODEL=google/flan-t5-small
Optional frontend override — set before npm start in frontend/:
REACT_APP_API_URL=http://localhost:4000
Leave unset to use the CRA dev proxy (/upload → http://localhost:4000/upload).
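To make the RAG service variables above concrete, here is a minimal sketch of how they could be read with their documented defaults and how the two PDF limits might be applied. `RagSettings`, `load_settings`, and `check_pdf_limits` are hypothetical names, not the project's own code, and the sketch reads the process environment only; loading `rag-service/.env` into that environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RagSettings:
    generation_model: str
    internal_token: str
    parse_timeout_seconds: int
    max_pdf_pages: int
    max_extract_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> RagSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return RagSettings(
        generation_model=env.get("HF_GENERATION_MODEL", "google/flan-t5-base"),
        internal_token=env.get("INTERNAL_RAG_TOKEN", ""),
        parse_timeout_seconds=int(env.get("PDF_PARSE_TIMEOUT_SECONDS", "20")),
        max_pdf_pages=int(env.get("MAX_PDF_PAGES", "200")),
        max_extract_chars=int(env.get("MAX_PDF_EXTRACT_CHARS", "400000")),
    )


def check_pdf_limits(page_count: int, text: str, settings: RagSettings) -> str:
    """Reject PDFs with too many pages and cap extracted text before chunking."""
    if page_count > settings.max_pdf_pages:
        raise ValueError(f"PDF has {page_count} pages; limit is {settings.max_pdf_pages}")
    return text[: settings.max_extract_chars]
```

Passing the environment as a mapping keeps the loader easy to test: `load_settings({"HF_GENERATION_MODEL": "google/flan-t5-small"})` exercises the lighter model setting without touching real environment variables.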
---
python -c "from langchain_community.embeddings import HuggingFaceEmbeddings; HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')"
python -c "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM; AutoTokenizer.from_pretrained('google/flan-t5-base'); AutoModelForSeq2SeqLM.from_pretrained('google/flan-t5-base')"
Set a custom cache directory if needed:
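One way to do this from Python, assuming the cache in question is the Hugging Face one, is the `HF_HOME` environment variable, which the Hugging Face libraries use as their cache root. The directory name below is a hypothetical placeholder.

```python
import os
from pathlib import Path

# Hypothetical location; any writable directory works.
cache_dir = Path.home() / "hf-cache"

# Set HF_HOME before importing transformers or huggingface_hub,
# because the cache location is resolved when those packages are imported.
# setdefault keeps an existing value if the variable is already set.
os.environ.setdefault("HF_HOME", str(cache_dir))

print("Hugging Face cache root:", os.environ["HF_HOME"])
```

Setting the same variable in the shell before starting the RAG service has the same effect.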
cd .. # repository root (parent of rag-service/)
npm install
Multer writes uploads to an uploads/ directory at runtime; it is created automatically on first upload.
Public-facing routes used by the React app. All paths are relative to the gateway origin.
| Method | Endpoint | Content-Type | Request body / form | Success response | Error responses |
|---|---|---|---|---|---|
| POST | /upload | multipart/form-data | Field name **file** (PDF binary) | 200: { "message": string, "session_id": string } | 400: no file; 500: RAG processing failed |
| POST | /ask | application/json | { "question": string, "session_id": string } | 200: { "answer": string } | 500: upstream or internal error |
| POST | /summarize | application/json | { "session_id": string, "pdf"?: string } | 200: { "summary": string } | 500: upstream or internal error |
Example — upload (curl)
curl -X POST http://localhost:4000/upload \
-F "file=@/path/to/document.pdf"
Example — ask
curl -X POST http://localhost:4000/ask \
-H "Content-Type: application/json" \
-d '{"question":"What is the main topic?","session_id":"<uuid-from-upload>"}'
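The same two calls can be made from Python with only the standard library. The sketch below mirrors the curl examples: a multipart body with the single form field `file`, then a JSON body carrying the `session_id` returned by the upload. The helper names (`build_upload_request`, `build_ask_request`, `send`) and the timeout are hypothetical choices for illustration, not part of the project.

```python
import json
import uuid
import urllib.request

GATEWAY = "http://localhost:4000"  # gateway origin from the examples above


def build_upload_request(filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Encode a multipart/form-data body with the single form field 'file'."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(f"{GATEWAY}/upload", data=body, headers=headers, method="POST")


def build_ask_request(question: str, session_id: str) -> urllib.request.Request:
    """Build the JSON POST for /ask."""
    payload = json.dumps({"question": question, "session_id": session_id}).encode()
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(f"{GATEWAY}/ask", data=payload, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response (needs the gateway running)."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

With all three services running, `send(build_upload_request(...))["session_id"]` yields the identifier to pass to `build_ask_request`. A generous timeout is used because the first request may wait on model loading.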
---
Internal service called by Express. You can call it directly for debugging.
| Method | Endpoint | Request body (JSON) | Success response | Notes |
|---|---|---|---|---|
| POST | /process-pdf | { "filePath": string } | { "message": string, "session_id": string } | Absolute or relative path to PDF on the machine running FastAPI |
| POST | /ask | { "question": string, "session_id": string } | { "answer": string } | Returns a friendly message if session_id is unknown |
| POST | /summarize | { "session_id": string, "pdf"?: string \| null } | { "summary": string } | pdf is accepted for API compatibility; indexing uses session_id only |
Interactive OpenAPI docs: http://localhost:5000/docs (recommended for local development only; do not expose publicly)
Example — process PDF (via gateway, recommended)
curl -X POST http://localhost:4000/upload \
-F "file=@/path/to/your.pdf"
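For debugging the RAG service directly rather than through the gateway, the request also needs the `X-Internal-Token` header whenever `INTERNAL_RAG_TOKEN` is set. A minimal sketch, assuming the service listens on port 5000 as documented; `build_rag_request` and `call_rag` are hypothetical helper names.

```python
import json
import os
import urllib.request

RAG_URL = "http://localhost:5000"  # FastAPI RAG service, local debugging only


def build_rag_request(endpoint: str, payload: dict, token: str = "") -> urllib.request.Request:
    """Build a JSON POST; attach X-Internal-Token only when a token is configured."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["X-Internal-Token"] = token
    return urllib.request.Request(
        f"{RAG_URL}{endpoint}",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )


def call_rag(endpoint: str, payload: dict) -> dict:
    """Send the request using INTERNAL_RAG_TOKEN from the environment, if set."""
    request = build_rag_request(endpoint, payload, os.environ.get("INTERNAL_RAG_TOKEN", ""))
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.load(response)
```

For example, `call_rag("/process-pdf", {"filePath": "/path/to/your.pdf"})` indexes a file that already exists on the machine running FastAPI, and the returned `session_id` can then be used with `/ask` or `/summarize`.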
---
| Check | Action |
|---|---|
| Express running? | curl http://localhost:4000 may fail (no GET routes) — test with upload or check Terminal 2 logs |
| Using REACT_APP_API_URL? | Must include scheme: http://localhost:4000 |
| CORS errors in browser | Use npm start with default proxy, or ensure Express cors() remains enabled |
| Mixed content | Use http:// locally, not https://, unless you terminate TLS yourself |
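Because the gateway has no GET routes, an HTTP probe can fail even when Express is running. A TCP-level check separates "nothing is listening" from "listening but no matching route", and the same helper can flag a `REACT_APP_API_URL` value without a scheme. This is a sketch under those assumptions; the function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def has_http_scheme(url: str) -> bool:
    """REACT_APP_API_URL must start with http:// or https://."""
    return url.startswith(("http://", "https://"))


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection succeeds, regardless of HTTP routes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_api_url(url: str) -> str:
    """Classify a configured API URL as 'missing scheme', 'listening', or 'not reachable'."""
    if not has_http_scheme(url):
        return "missing scheme"
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return "listening" if port_is_open(parts.hostname, port) else "not reachable"
```

Calling `check_api_url("http://localhost:4000")` while the gateway is up should report that the port is listening even though a plain GET request would not find a route.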
---
Upload PDF documents, ask natural-language questions grounded in their content, and generate concise summaries — all through a local, three-service stack. The React UI talks to a Node.js API gateway, which orchestrates a Python RAG (retrieval-augmented generation) service powered by Hugging Face embeddings and a configurable text-generation model.
---
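A high-quality PDF question-answering application.
The tool does not explicitly declare an open-source license. Before any commercial use, contact the original author to confirm the scope of authorization and avoid infringement risk.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute a legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment, and completing any necessary security assessment, before deploying it to production.
Based on our overall assessment, PDF Q&A Bot is a solid, good-quality entry among AI tools. If you already have a clear use case, you can start using it right away; if you are still evaluating, compare it with similar tools before deciding.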
| Field | Value |
|---|---|
| Original name | pdf-qa-bot |
| Original description | Open-source AI tool: A PDF Question-Answering App built with RAG (Retrieval-Augmented Generation), al. ⭐11 · JavaScript |
| Topics | chatbot, huggingface, langchain |
| GitHub | https://github.com/FireFistisDead/pdf-qa-bot |
| Language | JavaScript |
Listed: 2026-05-26 · Updated: 2026-05-30 · License: not published · AI Skill Hub provides no legal endorsement of the accuracy of third-party content.