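WebReaper is one of the featured MCP tools in this issue of AI Skill Hub. It has an overall score of 7.5, which indicates fairly high overall quality. We recommend adding it to your AI toolkit to help improve your productivity.
WebReaper is an AI tool extension that follows the MCP (Model Context Protocol) standard. Through MCP, it lets mainstream AI clients such as Claude and Cursor directly access and operate external tools, data sources, and services. File operations, database queries, and API calls can all be triggered in natural language from within an AI conversation, without leaving the chat.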
Option 1: one-step install through the Claude Code CLI:

```bash
claude skill install https://github.com/pavlovtech/WebReaper
```
Option 2: configure claude_desktop_config.json manually:

```json
{
  "mcpServers": {
    "webreaper": {
      "command": "npx",
      "args": ["-y", "webreaper"]
    }
  }
}
```

Config file location:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
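Editing the config file by hand risks overwriting servers that are already registered. The following is a minimal Python sketch of the manual step, not part of WebReaper: it merges one entry into `mcpServers` and leaves everything else in the file untouched. The helper names `default_config_path` and `add_mcp_server` are hypothetical; the paths and the `npx -y webreaper` entry are the ones given on this page.

```python
import json
import os
import platform
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config path for macOS or Windows."""
    system = platform.system()
    if system == "Darwin":
        return (Path.home() / "Library" / "Application Support"
                / "Claude" / "claude_desktop_config.json")
    if system == "Windows":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError(f"no config location listed for {system}")


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Merge one server entry into mcpServers, keeping existing entries."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file like a missing one instead of failing to parse it.
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

A call such as `add_mcp_server(default_config_path(), "webreaper", "npx", ["-y", "webreaper"])` would write the same entry as the manual configuration; Claude Desktop still needs a restart afterwards.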
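After installation, use the tool directly in a Claude conversation. For example:

User: Please use WebReaper to carry out the following task...

Claude: [automatically calls the WebReaper MCP tool to handle the request]

To see the available tools, type in Claude: "List all available MCP tools".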
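claude_desktop_config.json configuration example. The `env` object is where environment variables such as `"API_KEY": "your-api-key-here"` would go if the server needs them; it is left empty here because JSON does not allow comments.

```json
{
  "mcpServers": {
    "webreaper": {
      "command": "npx",
      "args": ["-y", "webreaper"],
      "env": {}
    }
  }
}
```

Save the file and restart Claude Desktop for the change to take effect.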
macOS / Linux (Homebrew):

```bash
brew install pavlovtech/webreaper/webreaper
```

Any POSIX shell (install.sh):

```bash
curl -fsSL https://raw.githubusercontent.com/pavlovtech/WebReaper/master/install.sh | sh
```

.NET library:

```bash
dotnet add package WebReaper
```
Windows binaries are on the GitHub Releases page; winget and Scoop are on the v10.1 roadmap.
```bash
webreaper scrape https://example.com --browser --auto-stealth
webreaper init
```
The CLI is built Native-AOT (ADR-0043), ships as a single binary on every tagged GitHub release across six RIDs (linux-x64, linux-arm64, osx-x64, osx-arm64, win-x64, win-arm64), and is bot-check-aware (ADR-0056). The macOS binaries are Apple codesigned and notarized (ADR-0071); Homebrew installs run without Gatekeeper warnings on a clean machine.
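Since each release carries one binary per RID, an installer or wrapper script has to work out which of the six applies to the current machine. The sketch below shows one way to derive that string in Python. It is an illustration only: the function names are hypothetical, and the page does not say how release assets are named, so the code stops at computing the RID itself.

```python
import platform

# The six runtime identifiers (RIDs) named in the text above.
SUPPORTED_RIDS = {
    "linux-x64", "linux-arm64",
    "osx-x64", "osx-arm64",
    "win-x64", "win-arm64",
}

# Values reported by platform.system() mapped to the OS half of a RID.
_OS_NAMES = {"Linux": "linux", "Darwin": "osx", "Windows": "win"}

# Common platform.machine() spellings mapped to the architecture half.
_ARCH_NAMES = {"x86_64": "x64", "amd64": "x64", "arm64": "arm64", "aarch64": "arm64"}


def rid_for(system: str, machine: str) -> str:
    """Translate an OS name and CPU architecture into a .NET-style RID."""
    os_part = _OS_NAMES.get(system)
    arch_part = _ARCH_NAMES.get(machine.lower())
    if os_part is None or arch_part is None:
        raise ValueError(f"no published RID for {system}/{machine}")
    return f"{os_part}-{arch_part}"


def current_rid() -> str:
    """RID for the machine this script is running on."""
    return rid_for(platform.system(), platform.machine())
```

Anything outside the six listed combinations raises `ValueError`, which is the point at which a script would fall back to building from source.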
```csharp
using WebReaper.Extraction.Attributes;

[ScrapeSchema]
public partial class Article
{
    [ScrapeField("h1")] public string? Title { get; set; }
    [ScrapeField(".views", Type = SchemaFieldType.Integer)] public int Views { get; set; }
    [ScrapeField(".tag", IsList = true)] public List<string> Tags { get; set; } = new();
}

// Emitted at compile time, reflection-free, AOT-clean:
//   public static Schema Schema { get; }
//   public static Article Materialize(JsonObject json)

var engine = await ScraperEngineBuilder
    .Crawl("https://example.com/post")
    .Extract(Article.Schema)
    .Subscribe(p => HandleArticle(Article.Materialize(p.Data)))
    .BuildAsync();
```
The WebReaper.Extraction.Generators Roslyn analyzer (ADR-0045) emits the schema and a Materialize function. Schema typos are compile errors; the generated path uses no reflection so it's AOT-clean.
- `webreaper map` plus `webreaper scrape` per URL, piped into a prompt or a vector DB.
- `.WithChangeTracking()` (ADR-0048) so the sink fires only on diff. Hash-based dedup; cron-friendly.
- `LlmAgent.RunAsync(url, goal, chatClient)` decides which links to follow until the goal is met. Durable resume across restarts.
- `--browser --auto-stealth` from the CLI; stealth backend auto-installs on first detected challenge.
- `[ScrapeSchema]` POCO plus the source generator; reflection-free, AOT-compiles into a native binary.
- `dotnet add package WebReaper`; the public registration seam lets you plug Redis, Cosmos DB, your own sink.

```bash
$ webreaper scrape https://news.ycombinator.com
```
Examples/WebReaper.AiNativeShowcase wires every feature in this section:
```bash
dotnet run --project Examples/WebReaper.AiNativeShowcase -- markdown
dotnet run --project Examples/WebReaper.AiNativeShowcase -- sourcegen
dotnet run --project Examples/WebReaper.AiNativeShowcase -- llm
dotnet run --project Examples/WebReaper.AiNativeShowcase -- router
dotnet run --project Examples/WebReaper.AiNativeShowcase -- changetrack
```
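The `changetrack` showcase exercises change tracking, which this page describes as hash-based dedup where the sink fires only on a diff. The Python sketch below illustrates that general idea under stated assumptions; it is not WebReaper's implementation. `ChangeTracker` and `offer` are hypothetical names, and state is held in memory, whereas a cron-driven run would need to persist the hashes between invocations.

```python
import hashlib
import json
from typing import Callable


class ChangeTracker:
    """Forward a record to the sink only when its content hash has changed."""

    def __init__(self, sink: Callable[[str, dict], None]) -> None:
        self._sink = sink
        self._last_hash: dict[str, str] = {}  # URL -> hash of the last forwarded record

    @staticmethod
    def _digest(record: dict) -> str:
        # Sorting keys makes the hash independent of field order.
        canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def offer(self, url: str, record: dict) -> bool:
        """Return True if the record was new or changed and the sink was called."""
        digest = self._digest(record)
        if self._last_hash.get(url) == digest:
            return False  # identical to the previous run, so stay silent
        self._last_hash[url] = digest
        self._sink(url, record)
        return True
```

Hashing a canonical serialization rather than comparing whole records keeps the stored state to one short string per URL, which is what makes repeated scheduled runs cheap.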
The single binary works inside any shell-spawning agent: LangChain ShellTool, OpenAI Assistants code-interpreter, GitHub Actions, internal scripts. Zero runtime to install; one syscall to invoke.
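As a rough illustration of the shell-spawning pattern, an agent written in Python could invoke the binary through `subprocess`. The sketch assumes `webreaper` is on `PATH` and uses only the `scrape` subcommand and the `--browser` and `--auto-stealth` flags shown earlier on this page. The wrapper names `build_scrape_command` and `run_scrape` are hypothetical, and because the page does not describe the output format, stdout is returned as raw text.

```python
import subprocess


def build_scrape_command(url: str, browser: bool = False, auto_stealth: bool = False,
                         executable: str = "webreaper") -> list[str]:
    """Assemble the argument list for one `webreaper scrape` invocation."""
    command = [executable, "scrape", url]
    if browser:
        command.append("--browser")
    if auto_stealth:
        command.append("--auto-stealth")
    return command


def run_scrape(url: str, timeout_seconds: float = 120.0, **options) -> str:
    """Run the CLI once and return whatever it wrote to stdout."""
    command = build_scrape_command(url, **options)
    completed = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain `&` or `?`. A call would look like `run_scrape("https://example.com", browser=True, auto_stealth=True)`.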
The library is a fluent builder over a small set of seams. For the deep seam-by-seam reference (interfaces, main entities, custom sinks), see docs/architecture.md.
The release ships eleven packages (one core, ten satellites), all versioned in lockstep at 10.0.0. The core stays dependency-light and Native-AOT-publishable with zero warnings; satellites bring their own SDK dependencies and quarantine them off the core graph (ADR-0009).
| Package | Add it for | Key builder calls |
|---|---|---|
| **WebReaper** | Core. HTTP crawl and parse, in-memory and file scheduler / visited-link tracker / cookie and config storage, Console / CSV / JSON-Lines sinks, Markdown extractor, schema fold. Dependency-light, Native-AOT-ready, Newtonsoft-free. | Crawl Extract AsMarkdown Follow Paginate WriteToJsonFile WriteToCsvFile WriteToConsole |
| **WebReaper.Cdp** | Raw CDP IPageLoadTransport (ADR-0052). AOT-clean (no PuppeteerSharp / Playwright dependency); System.Net.WebSockets plus System.Text.Json source-gen. Bedrock for the stealth pattern. | .WithCdpPageLoader(cdpUrl) (BYO) or .WithCdpPageLoader(CdpLaunchOptions) (launch managed Chromium) |
| **WebReaper.Playwright** | Microsoft.Playwright-backed transport (ADR-0053). Multi-browser (Chromium default; Firefox / WebKit opt-in). All seven PageAction arms supported. Use for modern multi-browser needs; pair with WebReaper.Cdp for AOT or stealth. | .WithPlaywrightPageLoader() |
| **WebReaper.Stealth.CloakBrowser** | First stealth-backend satellite (ADR-0054). Auto-downloads CloakBrowser on first use; composes on WebReaper.Cdp. Disposable via the ADR-0058 engine teardown chain. | .WithCloakBrowser() |
| **WebReaper.AI** | LLM extraction, LLM action resolver, LLM brain, LLM self-healing, LLM schema inferrer (ADR-0044 / 0050 / 0051 / 0067). Built on Microsoft.Extensions.AI; bring your own IChatClient. | .WithLlmFallback .WithLlmSelfHealing .WithLlmExtractor .WithLlmAgentBrain .WithLlmActionResolver .WithLlmSchemaInferrer .UseAi(client) |
| **WebReaper.Extraction.Attributes** | The [ScrapeSchema] / [ScrapeField] marker types. Standalone, no runtime cost. | [ScrapeSchema] [ScrapeField("selector")] |
| **WebReaper.Extraction.Generators** | Roslyn source generator that emits static Schema plus reflection-free static Materialize(JsonObject) (ADR-0045). DevelopmentDependency=true; does not propagate at runtime. | compile-time only |
| **WebReaper.Mcp** | MCP server Exe exposing scrape / map / extract as MCP tools over stdio (ADR-0049). Interop adapter for MCP-only clients. | the package _is_ the executable |
| **WebReaper.Mongo** | MongoDB result sink and MongoDB-backed config / cookie storage. | .WriteToMongoDb(...) .WithMongoDbConfigStorage(...) .WithMongoDbCookieStorage(...) |
| **WebReaper.Redis** | Redis scheduler, visited-link tracker, result sink, config / cookie storage. | .WithRedisScheduler(...) .TrackVisitedLinksInRedis(...) .WriteToRedis(...) .WithRedisConfigStorage(...) .WithRedisCookieStorage(...) |
| **WebReaper.AzureServiceBus** | Distributed scheduler over an Azure Service Bus queue. | .WithAzureServiceBusScheduler(...) |
| **WebReaper.Cosmos** | Azure Cosmos DB result sink. | .WriteToCosmosDb(...) |
| **WebReaper.Sqlite** | Local **durable** scheduler and visited-link tracker on an embedded SQLite store; resume is a query, no position file. Opt-in robust-local tier (no server, unlike Redis). | .WithSqliteScheduler(...) .TrackVisitedLinksInSqlite(...) |
WebReaper.Cli (the AOT single binary; ADR-0043) is not a NuGet package; it ships as platform binaries on every GitHub release, because Native AOT and `dotnet tool install` are mutually incompatible on one target. Install via Homebrew or install.sh, or build from source.
| | WebReaper | Firecrawl | Crawl4AI | WebFetch (Claude) |
|---|---|---|---|---|
| **License** | MIT | AGPL-3.0 (plus commercial) | Apache 2.0 | bundled with Claude |
| **Install** | one binary, ~12 MB | Docker + Postgres + Redis (self-host) or hosted | Docker + Python + Playwright | nothing to install |
| **Cost** | free | metered API plus free tier | free | included with Claude |
| **BYO LLM** | any IChatClient | no (their model) | yes (LiteLLM) | Claude only |
| **Autonomous agent** | Agent.RunAsync() durable, in-process | /agent endpoint (cloud only) | code it yourself | not available |
| **Bot-protected** | --auto-stealth | cloud yes; self-host degraded (no Fire-engine) | BYO | no |
| **Claude Code skill** | webreaper init bundled | community firecrawl-claude-code-skill wraps the cloud API | none official | not applicable |
Crawlee (Apify's Node/Python library) is also worth knowing; it covers similar ground to the WebReaper library API but doesn't ship a binary, a Claude Code skill, or a built-in LLM safety net. Use it if you're already in the Apify ecosystem.
The closest reference is Firecrawl: same AI-native positioning, opposite distribution shape. Firecrawl optimises for the hosted-API flow; WebReaper optimises for the local-binary flow. If you want a managed cloud with someone else's proxies and infra, Firecrawl is the buy. If you want a binary that runs locally with your own LLM key and no metering, WebReaper is the build.
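An efficient AI-native web scraping tool.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, and only requires that the copyright notice be retained.
In our overall assessment, WebReaper is a solid performer among MCP tools, with good quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.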
| Field | Value |
|---|---|
| Original name | WebReaper |
| Original description | Open-source MCP tool: AI-native web scraper. Single binary with a bundled Claude Code skill. MIT-licen. ⭐134 · C# |
| Topics | ai-agents-automation, crawler, dotnet |
| GitHub | https://github.com/pavlovtech/WebReaper |
| License | MIT |
| Language | C# |
Indexed: 2026-05-27 · Updated: 2026-05-27 · License: MIT · AI Skill Hub does not legally endorse the accuracy of third-party content.