快速爬虫工具 是 AI Skill Hub 本期精选MCP工具之一。综合评分 7.5 分,整体质量较高。我们推荐使用将其纳入你的 AI 工具库,帮助提升工作效率。
快速爬虫工具 是一款遵循 MCP(Model Context Protocol)标准协议的 AI 工具扩展。通过 MCP 协议,它可以让 Claude、Cursor 等主流 AI 客户端直接访问和操作外部工具、数据源和服务,实现 AI 能力的无缝扩展。无论是文件操作、数据库查询还是 API 调用,都可以通过自然语言在 AI 对话中直接触发,极大提升生产效率。
快速爬虫工具 是一款遵循 MCP(Model Context Protocol)标准协议的 AI 工具扩展。通过 MCP 协议,它可以让 Claude、Cursor 等主流 AI 客户端直接访问和操作外部工具、数据源和服务,实现 AI 能力的无缝扩展。无论是文件操作、数据库查询还是 API 调用,都可以通过自然语言在 AI 对话中直接触发,极大提升生产效率。
# 方式一:通过 Claude Code CLI 一键安装
claude skill install https://github.com/us/crw
# 方式二:手动配置 claude_desktop_config.json
{
"mcpServers": {
"------": {
"command": "npx",
"args": ["-y", "crw"]
}
}
}
# 配置文件位置
# macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
# Windows: %APPDATA%/Claude/claude_desktop_config.json
# 安装后在 Claude 对话中直接使用 # 示例: 用户: 请帮我用 快速爬虫工具 执行以下任务... Claude: [自动调用 快速爬虫工具 MCP 工具处理请求] # 查看可用工具列表 # 在 Claude 中输入:"列出所有可用的 MCP 工具"
// claude_desktop_config.json 配置示例
{
"mcpServers": {
"______": {
"command": "npx",
"args": ["-y", "crw"],
"env": {
// "API_KEY": "your-api-key-here"
}
}
}
}
// 保存后重启 Claude Desktop 生效
<a name="readme-top"></a> <p align="center"> <a href="https://fastcrw.com"> <img src="docs/logo.png" alt="fastCRW" height="120" /> </a> <p align="center">The web scraper built for AI agents. Single binary. Zero config.</p> <p align="center"> <a href="https://crates.io/crates/crw-server"><img src="https://img.shields.io/crates/v/crw-server.svg" alt="crates.io"></a> <a href="https://github.com/us/crw/actions"><img src="https://github.com/us/crw/workflows/CI/badge.svg" alt="CI"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-AGPL--3.0-blue.svg" alt="License"></a> <a href="https://github.com/us/crw/stargazers"><img src="https://img.shields.io/github/stars/us/crw?style=social" alt="GitHub Stars"></a> <a href="https://fastcrw.com"><img src="https://img.shields.io/badge/Managed%20Cloud-fastcrw.com-blueviolet" alt="fastcrw.com"></a> </p> <p align="center"> <a href="https://twitter.com/fastcrw"> <img src="https://img.shields.io/badge/Follow%20on%20X-000000?style=for-the-badge&logo=x&logoColor=white" alt="Follow on X" /> </a> <a href="https://www.linkedin.com/company/fastcrw"> <img src="https://img.shields.io/badge/Follow%20on%20LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="Follow on LinkedIn" /> </a> <a href="https://discord.gg/kkFh2SC8"> <img src="https://img.shields.io/badge/Join%20our%20Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Join our Discord" /> </a> </p> <p align="center"> <a href="https://www.producthunt.com/products/fastcrw?utm_source=badge-featured&utm_medium=badge&utm_campaign=badge-fastcrw" target="_blank" rel="noopener noreferrer"><img src="https://api.producthunt.com/widgets/embed-image/v1/featured.svg?post_id=1116966&theme=light&t=1775671073751" alt="fastCRW - Search + scrape live web results for AI agents | Product Hunt" width="250" height="54" /></a> </p> <p align="center"> Works with: Claude Code · Cursor · Windsurf · Cline · Copilot · Continue.dev · Codex · Gemini CLI </p> <p align="center"> <a href="#-quick-start">Quick Start</a> • <a href="#-connect-to-ai-agents">AI Agents</a> • <a href="#-benchmark">Benchmarks</a> • <a href="https://docs.fastcrw.com/#rest-api">API Reference</a> • <a href="https://fastcrw.com">Cloud</a> • <a href="https://discord.gg/kkFh2SC8">Discord</a> </p> <p align="center"> <b>English</b> | <a href="README.zh-CN.md">中文</a> </p> </p>
<p align="center"> <img src="docs/demo.gif" alt="fastCRW Demo" width="700" /> </p>
---
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh
crw setup
client = CrwClient(api_url="http://localhost:3000") results = client.search("open source web scraper 2026", limit=10)
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw sh
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | CRW_BINARY=crw-server sh
docker run -p 3000:3000 ghcr.io/us/crw
Custom port:bash CRW_SERVER__PORT=8080 crw-server # env var docker run -p 8080:8080 -e CRW_SERVER__PORT=8080 ghcr.io/us/crw # Docker
**Docker Compose** ships with `lightpanda` enabled by default; `chrome` is opt-in to keep small VPS deploys lean (~500MB image + 1GB resident):
bash
The CLI includes an interactive setup wizard for easy configuration:
crw setup
The wizard guides you through: - Cloud vs Local mode selection - Browser engine setup (LightPanda or Chrome for JS rendering) - Search engine setup (SearXNG via Docker) - LLM provider configuration (BYOK for AI features) - Shell configuration (auto-adds to .zshrc/.bashrc)
Options: - crw setup --cloud — skip to cloud setup - crw setup --local — skip to local setup - crw setup --no-color — disable colored output (accessibility) - Press ESC to cancel gracefully at any prompt
See the self-hosting guide for production hardening, auth, reverse proxy, and resource tuning.
---
```bash
This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination.
#### Renderer selection & response metadata
CRW picks between three rendering backends per request:
- **`http`** (1 credit) — plain HTTP fetch. Used for static pages.
- **`lightpanda`** (1 credit) — lightweight JS renderer for most SPAs.
- **`chrome`** (2 credits) — full Chromium for sites where LightPanda's hydration crashes (e.g. some Next.js App Router pages).
By default the engine auto-selects, learns per-host preferences after repeated failures, and falls over chrome → lightpanda → http transparently. Pass `"renderer"` to pin one of `auto | http | lightpanda | chrome` (Firecrawl's `engine` is also accepted as an alias).
Every successful response includes routing metadata so callers can audit and debug:
jsonc { "data": { "markdown": "...", "renderDecision": { "kind": "failover", // autoDefault | autoPromoted | userPinned | failover | breakerSkipped "chain": ["lightpanda", "chrome"], // renderers actually attempted "reason": "nextJsClientError" // why the chain advanced }, "creditCost": 2, "warnings": [ "lightpanda returned a failed render (nextjs_client_error)" ], "metadata": { "renderedWith": "chrome", / … / } } } ```
When you hard-pin a renderer that fails (e.g. "renderer":"lightpanda" on a hydration-crashing page), success stays true for protocol compatibility — but data.warnings[] carries an actionable hint suggesting renderer="chrome" or auto mode. Clients should surface the warnings array.
Layered TOML config with environment variable overrides:
config.default.toml — built-in defaultsconfig.local.toml — local overrides (or CRW_CONFIG=myconfig)CRW_ prefix, __ separator (e.g. CRW_SERVER__PORT=8080)```toml [server] host = "0.0.0.0" port = 3000 rate_limit_rps = 10
[renderer] mode = "auto" # auto | lightpanda | playwright | chrome | none
[crawler] max_concurrency = 10 requests_per_second = 10.0 respect_robots_txt = true
[auth]
Power AI agents with clean web data. Single Rust binary, zero config, Firecrawl-compatible API. The open-source Firecrawl alternative you can self-host for free — or use our managed cloud.
Don't want to self-host? Sign up free → — managed cloud with global proxy network, web search, and dashboard. Same API, zero infra. 500 free credits, no credit card required.
---
Core
| Feature | Description |
|---|---|
| [**Scrape**](#scrape) | Convert any URL to markdown, HTML, JSON, or links |
| [**Crawl**](#crawl) | Async BFS website crawler with rate limiting |
| [**Map**](#map) | Discover all URLs on a site instantly |
| [**Search**](#search) | Web search + content scraping — bundled SearXNG sidecar, free Tavily alternative |
More
| Feature | Description |
|---|---|
| [**LLM Extraction**](#llm-structured-extraction) | Send a JSON schema, get validated structured data back |
| **LLM Summary** | formats: ["summary"] on /v1/scrape — clean prose digest of any page. BYOK (Anthropic / OpenAI / Azure / DeepSeek / any OpenAI-compatible). |
| **LLM Search Answer** | answer: true / summarizeResults: true on /v1/search — synthesized answer with citations or per-result summaries |
| [**JS Rendering**](#js-rendering) | Auto-detect SPAs, render via LightPanda or Chrome |
| [**CLI**](#cli) | Scrape any URL from your terminal — no server needed |
| [**MCP Server**](#mcp-server-for-ai-agents) | Built-in stdio + HTTP transport for any AI agent |
Use Cases: RAG pipelines · AI agent web access · content monitoring · data extraction · HTML to markdown conversion · web archiving
---
<details>
<summary><b>cURL</b></summary>
bash
| Method | Endpoint | Description |
|---|---|---|
POST | /v1/scrape | Scrape a single URL, optionally with LLM extraction |
POST | /v1/crawl | Start async BFS crawl (returns job ID) |
GET | /v1/crawl/:id | Check crawl status and retrieve results |
DELETE | /v1/crawl/:id | Cancel a running crawl job |
POST | /v1/map | Discover all URLs on a site |
POST | /v1/search | Web search via SearXNG sidecar, with optional content scraping |
GET | /health | Health check (no auth required) |
POST | /mcp | Streamable HTTP MCP transport |
---
claude mcp add crw -- npx crw-mcp
```bash brew install us/crw/crw
For serving multiple apps, other languages (Node.js, Go, Java), or as a shared microservice.
```bash brew install us/crw/crw-server
crewai-crw — CRW scraping tools for CrewAI agentslangchain-crw — CRW document loader for LangChainNode.js: No official SDK yet — use the REST API directly or npx crw-mcp for MCP. SDK examples →
---
```
See full configuration reference.
---
Frameworks: CrewAI · LangChain · Agno · Dify
Missing your favorite tool? Open an issue → · All integrations →
---
/v1/scrape, /v1/crawl, /v1/map endpoints. HTML to markdown, structured data extraction, website crawler — all built-in| Metric | CRW (self-hosted) | fastcrw.com (cloud) | Firecrawl | Tavily | Crawl4AI |
|---|---|---|---|---|---|
| **Coverage (1K URLs)** | **92.0%** | **92.0%** | 77.2% | — | — |
| **Avg Scrape Latency** | **833ms** | **833ms** | 4,600ms | — | — |
| **Avg Search Latency** | **880ms** | **880ms** | 954ms | 2,000ms | — |
| **Search Win Rate** | **73/100** | **73/100** | 25/100 | 2/100 | — |
| **Idle RAM** | 6.6 MB | 0 (managed) | ~500 MB+ | — (cloud) | — |
| **Cold start** | 85 ms | 0 (always-on) | 30–60 s | — | — |
| **Self-hosting** | **Single binary** | — | Multi-container | No | Python + Playwright |
| **Cost / 1K scrapes** | **$0** (self-hosted) | From $13/mo | $0.83–5.33 | — | $0 |
| **License** | AGPL-3.0 | Managed | AGPL-3.0 | Proprietary | Apache-2.0 |
---
| Metric | CRW | Firecrawl | Tavily |
|---|---|---|---|
| **Avg Latency** | **880ms** | 954ms | 2,000ms |
| **Median Latency** | **785ms** | 932ms | 1,724ms |
| **Win Rate** | **73/100** | 25/100 | 2/100 |
CRW is 2.3x faster than Tavily and won 73% of latency races. Full search benchmark →
Tested on Firecrawl's scrape-content-dataset-v1:
| Metric | CRW | Firecrawl v2.5 |
|---|---|---|
| **Coverage** | **92.0%** | 77.2% |
| **Avg Latency** | **833ms** | 4,600ms |
| **P50 Latency** | **446ms** | — |
| **Noise Rejection** | **88.4%** | noise 6.8% |
| **Idle RAM** | **6.6 MB** | ~500 MB+ |
| **Cost / 1K scrapes** | **$0** (self-hosted) | $0.83–5.33 |
<details> <summary><b>Resource comparison</b></summary>
| Metric | CRW | Firecrawl |
|---|---|---|
| Min RAM | ~7 MB | 4 GB |
| Recommended RAM | ~64 MB (under load) | 8–16 GB |
| Docker images | single ~8 MB binary | ~2–3 GB total |
| Cold start | 85 ms | 30–60 seconds |
| Containers needed | 1 (+optional sidecar) | 5 |
</details>
Run the benchmark yourself:
pip install datasets aiohttp
python bench/run_bench.py
---
| Self-hosted (free) | [fastcrw.com](https://fastcrw.com) Cloud | |
|---|---|---|
| Core scraping | ✅ | ✅ |
| JS rendering | ✅ (LightPanda/Chrome) | ✅ |
| Web search | ✅ (bundled SearXNG sidecar) | ✅ (managed) |
| Global proxy network | ❌ | ✅ |
| Dashboard | ❌ | ✅ |
| Commercial use without open-sourcing | Requires AGPL compliance | ✅ Included |
| Cost | $0 | From $13/mo |
Sign up free → — 500 free credits, no credit card required.
---
快速、轻量级的爬虫工具,值得使用
该工具使用 AGPL-3.0 协议,商用场景请仔细阅读协议条款,必要时咨询法律意见。
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
⚠️ AGPL 3.0 — 最严格的 Copyleft,网络服务端使用也需开源,SaaS 使用受限。
经综合评估,快速爬虫工具 在MCP工具赛道中表现稳健,质量良好。如果你已有明确的使用需求,可以直接上手体验;如果还在评估阶段,建议对比同类工具后再做决策。
| 原始名称 | crw |
| 原始描述 | 开源MCP工具:Fast, lightweight Firecrawl alternative in Rust. Web scraper, crawler & search A。⭐113 · Rust |
| Topics | crawlerdata-extractionrust |
| GitHub | https://github.com/us/crw |
| License | AGPL-3.0 |
| 语言 | Rust |
收录时间:2026-05-25 · 更新时间:2026-05-26 · License:AGPL-3.0 · AI Skill Hub 不对第三方内容的准确性作法律背书。
选择 Agent 类型,复制安装指令后粘贴到对应客户端