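AI Skill Hub recommendation: Crawllama (listed as 爬虫lama) is a solid agent workflow tool. It has an AI composite score of 7.5 and performs steadily among comparable tools. If you are looking for a reliable agent workflow solution, it is worth a closer look.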
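Crawllama is a complete AI agent automation workflow. Through visual node orchestration, it breaks complex multi-step tasks into clear automated flows that run fully unattended. It integrates with hundreds of external services and APIs, and suits data processing pipelines, business automation, and AI-assisted decision systems.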
# Option 1: install with pip (recommended)
pip install crawllama
# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install crawllama
# Option 3: install from source (to get the latest features)
git clone https://github.com/arn-c0de/Crawllama
cd Crawllama
pip install -e .
# Verify the installation
python -c "import crawllama; print('Installation successful')"
# Command-line usage
crawllama --help
# Basic usage
crawllama input_file -o output_file
# Calling from Python code
import crawllama
# Example
result = crawllama.process("input")
print(result)
# crawllama sample configuration file (config.yml)
app:
  name: "crawllama"
  debug: false
  log_level: "INFO"
# Specify the configuration file at runtime
crawllama --config config.yml
# Or configure through environment variables
export CRAWLLAMA_API_KEY="your-key"
export CRAWLLAMA_OUTPUT_DIR="./output"
AI Research Agent with OSINT and Multi-Hop Reasoning
Current Version: 1.4.10 (Improvements & Fixes)
New in 1.4.8: Company Intelligence Developer Documentation
A locally-capable AI research agent with advanced intelligence features (local processing requires a local LLM backend such as Ollama; cloud-based LLM APIs like Claude or OpenAI are not local):
- OSINT module: email, phone, and IP intelligence; social media analysis; advanced search operators
- Multi-hop reasoning using LangGraph for complex queries
- Adaptive agent selection based on query complexity (low/mid/high)
- REST API with FastAPI for integration
- Plugin system for extensibility
- Performance optimizations with large context support and asynchronous execution
- /query, /plugins, /stats, /health (see app.py)
- setup.bat, setup.sh with auto-configuration
- site:, inurl:, intext:, filetype:, email:, phone:, ip:
- clear; stores emails, phones, IPs, usernames, domains and notes
- forget command
pip install -r requirements.txt
Important: Keep dependencies up to date. If you are using a pre-built / downloaded package release (zip/tar), always install dependencies from the latest requirements.txt in this repository rather than any bundled copy. Package versions are updated regularly to address security vulnerabilities and compatibility issues. Always fetch the current file from the repository before installing to ensure you have the latest, secure package versions.
pip install -r requirements.txt
Windows:
1. Download Crawllama
2. Extract to any folder (e.g., C:\Crawllama)
3. Install Ollama from ollama.ai/download
4. Start Ollama and load model:
ollama serve
ollama pull qwen3:4b
5. In the Crawllama folder:
setup.bat
run.bat
Linux/macOS:
1. Download and extract:
wget https://github.com/arn-c0de/Crawllama/archive/refs/tags/v1.4.7-preview.zip
unzip v1.4.7-preview.zip
cd Crawllama-v1.4.7-preview
2. Install Ollama:
curl -fsSL https://ollama.ai/install.sh | sh
ollama serve &
ollama pull qwen3:4b
3. Setup and start:
chmod +x setup.sh run.sh
./setup.sh
./run.sh
---
Windows:
setup.bat
Linux/macOS:
chmod +x setup.sh
./setup.sh
Note: During the initial setup, you must select at least one LLM model. If a model is already installed, you can skip this step; otherwise, a selection is required to avoid errors in the test program.
The setup script:
- Checks Python version (3.10+)
- Creates virtual environment
- Lets you select features and LLM models to install (core is always installed)
- Installs all selected dependencies
- Creates necessary directories
- Copies .env.example to .env
- Checks Ollama status
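The first item in that list, the Python version gate, can be approximated in a few lines. This is only a sketch of the idea, not the code in `setup.bat` or `setup.sh`; the names `MIN_PYTHON` and `python_is_supported` are made up for illustration.

```python
import sys

# Minimum version stated in the setup notes (3.10+).
MIN_PYTHON = (3, 10)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_supported():
        print(f"Python {current} is supported")
    else:
        print(f"Python {current} is too old; 3.10+ is required")
```

Running the file directly reports whether the current interpreter passes the check.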
Note for initial installation:
When running pip install -r requirements.txt for the first time within the newly created virtual environment, installing all dependencies—especially packages like torch, sentence-transformers, and scientific libraries—may take 5–10 minutes (or longer, depending on connection and hardware). Please wait until the process completes; afterward, the virtual environment is ready for use.
Note on disk space: After installation (including venv), the project typically requires about 1.2–1.5 GB of free disk space (v1.4: ~1.23 GB). This value may vary significantly depending on the operating system, Python packages (e.g., larger PyTorch/CUDA wheels), and additional models. Plan for ample additional space if storage is limited.
Model download sizes (approximate):
- `qwen3:4b` - ~2–4 GB (depending on format/quantization)
- `qwen3:8b` - ~8–12 GB
- `deepseek-r1:8b` - ~6–10 GB
- `llama3:7b` - ~6–9 GB
- `mistral:7b` - ~4–8 GB
- `phi3:14b` - ~12–20+ GB

Note: Model sizes vary significantly depending on the provider, format (FP16, INT8 quantization, etc.), and additional assets. Quantized models (e.g., INT8) can significantly reduce size, while FP32/FP16 or models with additional tokenizer/vocab files require more space. Plan for sufficient additional storage if using larger models or multiple models simultaneously.
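To turn these estimates into a quick pre-install check, the sketch below adds the upper end of the project footprint to the upper-bound size of each chosen model and compares the total with the free disk space. The figures are copied from the estimates above; the helper names (`required_space_gb`, `has_enough_space`) are illustrative and not part of CrawlLama.

```python
import shutil

# Upper-bound download estimates in GB, taken from the list above.
MODEL_SIZE_GB = {
    "qwen3:4b": 4,
    "qwen3:8b": 12,
    "deepseek-r1:8b": 10,
    "llama3:7b": 9,
    "mistral:7b": 8,
    "phi3:14b": 20,
}

PROJECT_SIZE_GB = 1.5  # upper end of the 1.2-1.5 GB project estimate


def required_space_gb(models):
    """Sum the project footprint and the worst-case size of each model."""
    return PROJECT_SIZE_GB + sum(MODEL_SIZE_GB[name] for name in models)


def has_enough_space(models, path="."):
    """Compare free space at `path` with the worst-case requirement."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_space_gb(models)


if __name__ == "__main__":
    wanted = ["qwen3:4b", "qwen3:8b"]
    print(f"Need up to {required_space_gb(wanted):.1f} GB")
    print("Enough space:", has_enough_space(wanted))
```

Using the upper bounds errs on the side of caution, since actual sizes depend on quantization.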
Prerequisites:
- Python 3.10+ (python.org)
- Git (git-scm.com)
- Ollama (ollama.ai/download)
Windows - Step by Step:
```bash
curl -fsSL https://ollama.ai/install.sh | sh   # Linux/macOS
ollama serve &
```
```bash
run.bat    # Windows
./run.sh   # Linux/macOS
```
╭──────────────────────────────────────────────────────────╮
│ CrawlLama - Local Search and Response Agent              │
│ Commands:                                                │
│   clear       - Reset session (history + cache)          │
│   clear-cache - Clear cache only                         │
│   save        - Manually save session                    │
│   load        - Reload session                           │
│   stats       - Display statistics                       │
│   status      - Show context usage                       │
│   settings    - Show/edit settings                       │
│   restart     - Restart agent (reload config)            │
│   exit, quit  - Exit                                     │
╰──────────────────────────────────────────────────────────╯
What is Machine Learning?
**New Commands:**
- `status` - Shows token usage and available context capacity
status
Context Usage Tracker
┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Source            ┃ Tokens    ┃ Share     ┃
┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Conversation      │ 850       │ 8.5%      │
│ Search Results    │ 320       │ 3.2%      │
│ Total Used        │ 1,170     │ 11.7%     │
│ Available         │ 8,830     │ 88.3%     │
│ Maximum           │ 10,000    │ 100%      │
└───────────────────┴───────────┴───────────┘
- `settings` - Interactive configuration editor
settings
Displays all settings and allows:
• Category selection (llm, search, rag, cache, osint, all)
• Change LLM model (qwen3:8b, deepseek-r1:8b, etc.)
• Adjust temperature (0.0-1.0)
• Configure max tokens (now 16,000 for RTX 3080+)
• Change search region (de-de, us-en, wt-wt)
• Configure OSINT max results & rate limits
• Enable/disable RAG
• Enable/disable cache
• Save changes directly to config.json
• Auto-restart after saving (optional)
- `restart` - Restart agent
restart
• Reloads config.json
• Fully reinitializes agent
• Optional session preservation
• No session interruption
./setup.sh # or setup.bat
Note: The first start may take significantly longer than subsequent starts! Initialization, dependency installation, and model downloads may take several minutes, depending on hardware and internet connection. After the first successful start, all subsequent starts are significantly faster.
1. Start API Server
run_api.bat
2. Open API Documentation
- Interactive Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
3. Send Query
curl -X POST http://localhost:8000/query \
-H "X-API-Key: your-key" \
-H "Content-Type: application/json" \
-d '{"query": "What is Python?", "use_tools": false}'
# Windows
python -m venv venv
venv\Scripts\activate
copy .env.example .env
notepad .env   # Optional: Add API keys

# Linux/macOS
python3 -m venv venv
source venv/bin/activate
cp .env.example .env
nano .env   # Optional: Add API keys
CRAWLLAMA_API_KEY=your_secure_api_key_min_32_chars
RATE_LIMIT=100
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:8080
**Usage with API Key:**
| Option | Description |
|---|---|
| `--interactive` | Interactive mode |
| `--debug` | Enable debug logging |
| `--no-web` | Offline mode (no web search) |
| `--model MODEL` | Choose Ollama model |
| `--stats` | Display system statistics |
| `--clear-cache` | Clear cache |

| Option | Description |
|---|---|
| `--multihop` | Enable multi-hop reasoning |
| `--max-hops N` | Max reasoning steps (1-5) |
| `--api` | Start API server |
| `--plugins` | List available plugins |
| `--load-plugin NAME` | Load plugin |
| `--help-extended` | Show extended help |
| `--examples` | Show usage examples |
| `--setup-keys` | Securely set up API keys |
{
"llm": {
"base_url": "http://127.0.0.1:11434",
"model": "qwen3:8b",
"temperature": 0.7,
"max_tokens": 10000,
"stream": true
},
"search": {
"provider": "duckduckgo",
"max_results": 5,
"timeout": 10
},
"rag": {
"enabled": true,
"batch_size": 100,
"max_workers": 4
},
"cache": {
"enabled": true,
"ttl_hours": 24,
"max_size_mb": 500,
"clear_on_startup": false
},
"osint": {
"max_results": 20,
"email_search_limit": 50,
"phone_search_limit": 50,
"general_osint_limit": 100
},
"multihop": {
"enabled": true,
"max_hops": 3,
"confidence_threshold": 0.7,
"enable_critique": true
},
"plugins": {
"example_plugin": {
"enabled": true
}
},
"security": {
"rate_limit": 1.0,
"max_context_length": 8000,
"check_robots_txt": true
}
}
Recommended max_tokens Settings:
| GPU/Hardware | Recommended max_tokens | Model |
|---|---|---|
| RTX 3080+ (10GB+) | 10,000 - 16,000 | qwen3:8b, deepseek-r1:8b |
| RTX 3060/3070 (8GB) | 6,000 - 8,000 | qwen3:4b, llama3:7b |
| CPU Only | 2,000 - 4,000 | qwen3:4b |
Tip: Use the status command to monitor your token usage in real-time!
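The table and the `status` output can be combined into a small helper: pick a `max_tokens` range from the available GPU memory, then compute how much of the window is in use. This is a rough sketch only. The 8 GB and 10 GB cut-offs mirror the table rows, treating anything below 8 GB like the CPU-only row is an assumption, and both function names are hypothetical.

```python
def recommended_max_tokens(vram_gb):
    """Map GPU memory in GB to the (low, high) max_tokens range.

    Pass 0 for CPU-only machines. GPUs with less than 8 GB fall back to
    the CPU-only range, which is an assumption rather than a documented rule.
    """
    if vram_gb >= 10:
        return (10_000, 16_000)
    if vram_gb >= 8:
        return (6_000, 8_000)
    return (2_000, 4_000)


def context_share(used_tokens, max_tokens):
    """Return the percentage of the context window already in use."""
    return round(100 * used_tokens / max_tokens, 1)


if __name__ == "__main__":
    low, high = recommended_max_tokens(10)
    print(f"Suggested max_tokens: {low}-{high}")
    # Figures from the sample status output: 1,170 of 10,000 tokens used.
    print(f"Context used: {context_share(1170, 10000)}%")
```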
```bash
BRAVE_API_KEY=your_brave_api_key
SERPER_API_KEY=your_serper_api_key
HTTP_PROXY=http://proxy:port
HTTPS_PROXY=https://proxy:port
```
```
"security": {
  "rate_limit": 2.0  # 2 req/s
}
```
CrawlLama's adaptive intelligence system with automatic agent selection and interactive commands.

```bash
python main.py --interactive
```
```bash
CRAWLLAMA_DEV_MODE=true
```
```bash
curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_api_key_here" \
  -d '{"query": "test"}'
```
```bash
export CRAWLLAMA_DEV_MODE=true
python app.py
```
**Example Requests:**
CrawlLama provides a complete REST API for integration into custom applications.
Windows:
run_api.bat
Linux/macOS:
./run_api.sh
Or manually:
uvicorn app:app --host 0.0.0.0 --port 8000
- POST /query - Execute queries (with/without web search, multi-hop)
- GET /health - Health check
- GET /stats - System statistics
- POST /memory/remember - Store data (OSINT)
- GET /memory/recall/{category} - Retrieve data
- GET /plugins - Manage plugins
- POST /cache/clear - Clear cache
```bash
curl http://localhost:8000/plugins
curl -X POST http://localhost:8000/plugins/example_plugin/load
```
```python
from core.plugin_manager import Plugin, PluginMetadata


class MyPlugin(Plugin):
    def get_metadata(self) -> PluginMetadata:
        return PluginMetadata(
            name="MyPlugin",
            version="1.0.0",
            description="My custom plugin",
            author="Your Name",
            dependencies=[],
        )

    def get_tools(self):
        # Expose the bound methods this plugin offers as tools.
        return [self.my_tool]

    def my_tool(self, input: str) -> str:
        return f"Processed: {input}"
```
See: Plugin Tutorial for details
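CrawlLama is an AI research agent capable of local processing. It integrates an OSINT (open-source intelligence) module that supports in-depth analysis of email addresses, phone numbers, IP addresses, and social media. Multi-hop reasoning built on LangGraph lets it handle complex queries. Note that although cloud APIs such as Claude or OpenAI can be used to augment it, fully local operation requires a local LLM backend such as Ollama.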
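Core features include: an adaptive agent hopping system that automatically selects a suitable agent based on query complexity and adapts to available resources; multi-step workflows built on LangGraph for complex logical reasoning; OSINT capabilities that use advanced search operators such as site: and inurl: for intelligence gathering; and persistent memory storage that records and links emails, domains, and user information so that research can continue across sessions.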
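Running CrawlLama requires Python 3.10+. While installing dependencies, make sure your network connection is stable; `pip install -r requirements.txt` may take 5-10 minutes. Check your Python version before installing, and make sure the environment meets the project's security and compatibility requirements for dependencies.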
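The official setup scripts are the recommended way to install: run `setup.bat` on Windows or `setup.sh` on Linux/macOS. The script checks the Python version, creates a virtual environment, and guides you through model selection. For a manual install, make sure Git, Python 3.10+, and Ollama are installed. Important: if you use a downloaded archive, always install from the latest `requirements.txt` in the repository to pick up fixes for known security vulnerabilities.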
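Startup takes two steps: first start the API server with `run_api.bat` (or `./run_api.sh` on Linux/macOS), then test it through the interactive docs at http://localhost:8000/docs. The first start takes noticeably longer because it involves initialization, dependency installation, and model downloads; later starts are much faster. You can also send a POST request to the local API with `curl` to run a query.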
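The same POST request can be issued from Python using only the standard library. The sketch below mirrors the documented `curl` call (the `/query` path, the `X-API-Key` header, and the `query` and `use_tools` fields). The function names are illustrative, and because the response schema is not described on this page, `send_query` simply returns whatever JSON the server sends back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default address from the startup notes


def build_query_request(query, api_key, use_tools=False, base_url=BASE_URL):
    """Build (but do not send) a POST /query request."""
    payload = json.dumps({"query": query, "use_tools": use_tools}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=payload,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_query(query, api_key, timeout=60):
    """Send the request to a running API server and decode the JSON reply."""
    request = build_query_request(query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only builds the request, so this runs without a server.
    req = build_query_request("What is Python?", "your-key")
    print(req.get_method(), req.full_url)
```

Separating request construction from sending makes the payload easy to inspect before a server is available.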
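Configuration is managed through a `.env` file. After creating and activating the virtual environment, copy `.env.example` to `.env`. You can then add any API keys you need in an editor (for example Claude or OpenAI keys) to extend model capabilities. For local deployments, make sure the Ollama service is configured correctly and running.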
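As a rough illustration of working with that file, the sketch below generates a random key, parses `KEY=VALUE` lines, and checks the key length. The 32-character minimum is inferred from the placeholder `your_secure_api_key_min_32_chars` in the sample settings; the parser is a simplified stand-in, not the loader CrawlLama itself uses.

```python
import secrets


def generate_api_key(num_bytes=32):
    """Create a URL-safe random key; 32 bytes yields about 43 characters."""
    return secrets.token_urlsafe(num_bytes)


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def api_key_is_strong(env, min_length=32):
    """Check the assumed 32-character minimum for CRAWLLAMA_API_KEY."""
    return len(env.get("CRAWLLAMA_API_KEY", "")) >= min_length


if __name__ == "__main__":
    sample = f"# local settings\nCRAWLLAMA_API_KEY={generate_api_key()}\nRATE_LIMIT=100\n"
    env = parse_env(sample)
    print("Key long enough:", api_key_is_strong(env))
```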
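CrawlLama provides a full API and an interactive CLI. Run `python main.py --interactive` to chat with the agent in interactive mode, or integrate through the REST API. The API supports plugin extensions: query the `/plugins` endpoint for the list of currently loaded plugins, and send a POST request to load a new plugin module dynamically.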
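To show how a plugin's tools might be discovered once it is loaded, here is a self-contained sketch. `Plugin` and `PluginMetadata` below are stand-ins that only mimic the `get_metadata` / `get_tools` shape of the classes in `core.plugin_manager`; `WordCountPlugin` and the `collect_tools` registry are hypothetical and say nothing about how CrawlLama registers tools internally.

```python
from dataclasses import dataclass, field


@dataclass
class PluginMetadata:
    """Stand-in for core.plugin_manager.PluginMetadata (fields assumed)."""
    name: str
    version: str
    description: str = ""
    author: str = ""
    dependencies: list = field(default_factory=list)


class Plugin:
    """Stand-in base class; the real one lives in core.plugin_manager."""

    def get_metadata(self) -> PluginMetadata:
        raise NotImplementedError

    def get_tools(self) -> list:
        raise NotImplementedError


class WordCountPlugin(Plugin):
    def get_metadata(self) -> PluginMetadata:
        return PluginMetadata(name="WordCount", version="1.0.0",
                              description="Counts words in a text")

    def get_tools(self) -> list:
        return [self.count_words]

    def count_words(self, text: str) -> str:
        return f"{len(text.split())} words"


def collect_tools(plugins):
    """Index every tool as '<plugin name>.<function name>'."""
    registry = {}
    for plugin in plugins:
        prefix = plugin.get_metadata().name
        for tool in plugin.get_tools():
            registry[f"{prefix}.{tool.__name__}"] = tool
    return registry


if __name__ == "__main__":
    tools = collect_tools([WordCountPlugin()])
    print(tools["WordCount.count_words"]("multi hop reasoning"))
```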
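CrawlLama's workflow is built on a routing mechanism. The system routes each query to an agent and dispatches tasks according to the query's complexity (low/mid/high). Through reasoning chains built with LangGraph, the agent can carry out multi-step search and verification tasks. The plugin-based design also lets developers load custom plugins dynamically through the API to extend intelligence gathering and processing.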
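Complexity-based routing can be illustrated with a toy classifier. The scoring rules, cue words, and the tier-to-hops mapping below are all assumptions made for this example; the project's actual routing logic is not shown on this page. Only the three tiers (low/mid/high), the search operators, and the 1-5 hop range come from the text.

```python
import re

# Hypothetical cue words hinting at multi-step questions.
MULTI_STEP_CUES = ("compare", "versus", "why", "relationship", "timeline", "and then")
# Search operators listed in the feature overview.
OSINT_OPERATORS = ("site:", "inurl:", "intext:", "filetype:", "email:", "phone:", "ip:")

# Assumed mapping from tier to reasoning hops, within the documented 1-5 range.
HOPS_BY_TIER = {"low": 1, "mid": 3, "high": 5}


def classify_complexity(query):
    """Return 'low', 'mid', or 'high' from simple surface features."""
    text = query.lower()
    score = 0
    if len(re.findall(r"\w+", text)) > 12:
        score += 1  # long queries tend to need more steps
    score += sum(cue in text for cue in MULTI_STEP_CUES)
    if any(op in text for op in OSINT_OPERATORS):
        score += 1
    if score == 0:
        return "low"
    return "mid" if score <= 2 else "high"


def plan_hops(query):
    """Pair the complexity tier with its assumed hop budget."""
    tier = classify_complexity(query)
    return tier, HOPS_BY_TIER[tier]


if __name__ == "__main__":
    for question in ("What is Python?", "site:example.com contact email"):
        print(question, "->", plan_hops(question))
```

A real router would likely weigh resource availability as well, which this sketch leaves out.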
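If startup is slow or fails, first check your network connection and the model download status. For dependency conflicts, recreate the virtual environment and install from the latest `requirements.txt`. For model selection problems, make sure an LLM model was configured during initial setup so that the test program does not fail.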
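One way to check the model side of these problems is to ask the local Ollama server which models it has. The sketch below queries Ollama's `/api/tags` endpoint at the `base_url` from the sample configuration and reports whether a given model is present. The helper names are illustrative, and this is a diagnostic aid rather than part of CrawlLama.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # base_url from the sample config


def installed_models(tags_payload):
    """Extract model names from an Ollama /api/tags response body."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def check_ollama(model, base_url=OLLAMA_URL, timeout=5):
    """Return a short status string describing the local Ollama setup."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError) as exc:
        return f"Ollama not reachable: {exc}"
    if model in installed_models(payload):
        return f"{model} is installed"
    return f"{model} is missing; run: ollama pull {model}"


if __name__ == "__main__":
    print(check_ollama("qwen3:4b"))
```

If the server is not running, the function returns a message instead of raising, so it is safe to call from a startup script.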
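This tool's license is listed as NOASSERTION, which indicates that no standard license was detected. For commercial use, read the license terms carefully and seek legal advice if necessary.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.
📄 NOASSERTION: see the original license terms for specific usage restrictions.
Overall, Crawllama is a good-quality agent workflow tool that is reasonably competitive among comparable tools. AI Skill Hub will continue to track its updates; consider bookmarking it and adopting it when it fits your own use case.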
| Original name | Crawllama |
| Original description | Open-source AI workflow: CrawlLama 🦙 is an local AI agent that answers questions via Ollama and integra. ⭐63 · Python |
| Topics | Automation, Crawler, FastAPI |
| GitHub | https://github.com/arn-c0de/Crawllama |
| License | NOASSERTION |
| Language | Python |
Listed: 2026-05-30 · Updated: 2026-05-30 · License: NOASSERTION · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.