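Scrapling (web scraping framework) is one of AI Skill Hub's featured AI tools in this issue. It has more than 49.2k stars on GitHub and an overall score of 8.2, which reflects high overall quality. We strongly recommend adding it to your AI toolkit to help boost your productivity.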
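Scrapling (web scraping framework) is an open-source tool written in Python. Its core capabilities are web scraping, MCP tooling, and data collection. As an open-source GitHub project, it has active community support and regular releases. Its code is fully transparent and auditable, and it can be deployed locally to keep your data private. It works as a stable, reliable solution for both personal use and integration into enterprise workflows.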
# Option 1: install with pip (recommended)
pip install scrapling
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install scrapling
# Option 3: install from source (to get the latest features)
git clone https://github.com/D4Vinci/Scrapling
cd Scrapling
pip install -e .
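# Verify the installation
python -c "import scrapling; print('Installation successful')"
# Command-line usage (the CLI depends on the optional extras described below)
scrapling --help
# Basic usage: extract a page to a file (requires the scrapling[shell] extra)
scrapling extract get 'https://example.com' content.md
# Calling from Python code: the base install provides the parser
from scrapling.parser import Selector
# Example
page = Selector('<html><body><h1 class="title">Hello</h1></body></html>')
print(page.css('.title::text').get())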
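You can also check the install programmatically instead of using a one-line import. The following standard-library sketch confirms the Python 3.10+ requirement stated in the installation notes and reports the installed package version. `check_environment` is a hypothetical helper, not part of Scrapling. It relies only on `sys.version_info` and `importlib.metadata.version`.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def check_environment(dist_name: str = "scrapling") -> dict:
    """Report whether this interpreter meets the Python 3.10+ requirement
    and which version of the given distribution is installed (None if absent)."""
    python_ok = sys.version_info >= (3, 10)
    try:
        installed = version(dist_name)  # reads the installed package's metadata
    except PackageNotFoundError:
        installed = None  # not installed in the active environment
    return {"python_ok": python_ok, "installed_version": installed}


if __name__ == "__main__":
    report = check_environment()
    print("Python 3.10+:", report["python_ok"])
    print("scrapling version:", report["installed_version"] or "not installed")
```

Because it only reads package metadata, it works even when optional extras such as the fetchers are missing.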
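# Configuration file example (config.yml). This is a generic template: these options do not appear in Scrapling's upstream README, so confirm them in the official documentation before relying on them.
app:
  name: "scrapling"
  debug: false
  log_level: "INFO"
# Specify the configuration file at runtime
scrapling --config config.yml
# Or configure through environment variables
export SCRAPLING_API_KEY="your-key"
export SCRAPLING_OUTPUT_DIR="./output"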
Documentation: Selection methods (https://scrapling.readthedocs.io/en/latest/parsing/selection.html) · Fetchers (https://scrapling.readthedocs.io/en/latest/fetching/choosing.html) · Spiders (https://scrapling.readthedocs.io/en/latest/spiders/architecture.html) · Proxy Rotation (https://scrapling.readthedocs.io/en/latest/spiders/proxy-blocking.html) · CLI (https://scrapling.readthedocs.io/en/latest/cli/overview.html) · MCP (https://scrapling.readthedocs.io/en/latest/ai/mcp-server.html)
Scrapling is an adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl.
Its parser learns from website changes and automatically relocates your elements when pages update. Its fetchers bypass anti-bot systems like Cloudflare Turnstile out of the box. And its spider framework lets you scale up to concurrent, multi-session crawls with pause/resume and automatic proxy rotation - all in a few lines of Python. One library, zero compromises.
Blazing-fast crawls with real-time stats and streaming. It is built by web scrapers for web scrapers and regular users alike, so there's something for everyone.
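The idea of a parser that "relocates your elements when pages update" can be shown with a toy, standard-library-only sketch. When an element is first selected, store a fingerprint of it (tag, attributes, text). After the page changes, pick the candidate with the highest similarity score. This sketch illustrates the general technique under simplifying assumptions. It is not Scrapling's actual algorithm, and `Element`, `similarity`, and `relocate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class Element:
    tag: str
    attrs: dict
    text: str = ""


class ElementCollector(HTMLParser):
    """Collect every element with its attributes and direct text (simplified)."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[Element] = []
        self._open: list[Element] = []

    def handle_starttag(self, tag, attrs):
        element = Element(tag, dict(attrs))
        self.elements.append(element)
        self._open.append(element)

    def handle_endtag(self, tag):
        # Close back to the most recent matching open tag
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i].tag == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        if self._open:
            self._open[-1].text += data.strip()


def parse(html: str) -> list[Element]:
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


def similarity(a: Element, b: Element) -> float:
    """Toy score in [0, 1]: same tag, overlap of attribute pairs, same text."""
    tag_score = 1.0 if a.tag == b.tag else 0.0
    pairs_a, pairs_b = set(a.attrs.items()), set(b.attrs.items())
    union = pairs_a | pairs_b
    attr_score = len(pairs_a & pairs_b) / len(union) if union else 1.0
    text_score = 1.0 if a.text and a.text == b.text else 0.0
    return (tag_score + attr_score + text_score) / 3


def relocate(saved: Element, new_html: str) -> Element | None:
    """Return the element in new_html that best matches the saved fingerprint."""
    return max(parse(new_html), key=lambda el: similarity(saved, el), default=None)


old_html = '<div><p class="price" id="p1">$10</p><p class="note">Sale</p></div>'
new_html = '<section><span class="price-tag" id="p1">$10</span><p class="note">Sale</p></section>'

# First run: remember the fingerprint of the element a selector matched
saved = next(el for el in parse(old_html) if el.attrs.get("class") == "price")
# After a redesign the class changed, so ".price" no longer matches; relocate by similarity
found = relocate(saved, new_html)
print(found)
```

Here the `<span>` wins even though its tag and class changed, because it still shares the `id` and text with the saved fingerprint. A production parser would weigh many more signals, such as ancestors, siblings, and position in the tree.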
from scrapling.fetchers import Fetcher, AsyncFetcher, StealthyFetcher, DynamicFetcher
StealthyFetcher.adaptive = True
p = StealthyFetcher.fetch('https://example.com', headless=True, network_idle=True) # Fetch website under the radar!
products = p.css('.product', auto_save=True) # Scrape data that survives website design changes!
products = p.css('.product', adaptive=True)  # Later, if the website structure changes, pass `adaptive=True` to find them!
# Or scale up to full crawls
from scrapling.spiders import Spider, Response

class MySpider(Spider):
    name = "demo"
    start_urls = ["https://example.com/"]

    async def parse(self, response: Response):
        for item in response.css('.product'):
            yield {"title": item.css('h2::text').get()}

MySpider().start()
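The following deliberately simplified, sequential crawl loop in plain `asyncio` shows what `start_urls`, an async `parse` generator, and `start()` do conceptually. It is a hypothetical sketch: `MiniSpider` and the in-memory `PAGES` dict are made up, and this is not how Scrapling's `Spider` is implemented. Per the description above, the real framework adds concurrent, multi-session crawling, pause/resume, and proxy rotation.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical in-memory "website" used instead of real HTTP responses
PAGES = {
    "https://example.com/": {"products": ["Widget", "Gadget"], "next": "https://example.com/page/2"},
    "https://example.com/page/2": {"products": ["Doohickey"], "next": None},
}


class MiniSpider:
    """Toy crawl loop: start_urls -> fetch -> parse() yields items or URLs to follow."""

    start_urls = ["https://example.com/"]

    async def fetch(self, url: str) -> dict:
        await asyncio.sleep(0)  # stands in for network I/O
        return PAGES[url]

    async def parse(self, url: str, page: dict) -> AsyncIterator:
        for title in page["products"]:
            yield {"title": title, "source": url}  # a dict is a scraped item
        if page["next"]:
            yield page["next"]  # a string is a URL to schedule next

    async def crawl(self) -> list:
        queue = list(self.start_urls)
        seen = set(queue)
        items = []
        while queue:  # sequential for clarity; a real crawler fetches concurrently
            url = queue.pop(0)
            page = await self.fetch(url)
            async for result in self.parse(url, page):
                if isinstance(result, str):
                    if result not in seen:  # skip URLs already scheduled
                        seen.add(result)
                        queue.append(result)
                else:
                    items.append(result)
        return items


items = asyncio.run(MiniSpider().crawl())
print(items)
```

Using a plain string to mean "follow this URL" is a simplification chosen for the sketch. Frameworks usually yield a dedicated request object instead.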
1. If you are going to use any of the extra features below, the fetchers, or their classes, you will need to install the fetchers' dependencies and their browser dependencies as follows:
pip install "scrapling[fetchers]"
scrapling install # normal install
scrapling install --force # force reinstall
This downloads all browsers, along with their system dependencies and fingerprint manipulation dependencies.
Alternatively, you can install them from Python code instead of running the command:
from scrapling.cli import install
install([], standalone_mode=False) # normal install
install(["--force"], standalone_mode=False) # force reinstall
2. Extra features:
   - Install the MCP server feature:
     pip install "scrapling[ai]"
   - Install shell features (Web Scraping shell and the extract command):
     pip install "scrapling[shell]"
   - Install everything:
     pip install "scrapling[all]"
Remember that you need to install the browser dependencies with scrapling install after installing any of these extras (if you haven't already).
Scrapling requires Python 3.10 or higher:
pip install scrapling
This installation only includes the parser engine and its dependencies, without any fetchers or command-line dependencies.
You can also pull a Docker image with all extras and browsers from DockerHub with the following command:
docker pull pyd4vinci/scrapling
Or download it from the GitHub registry:
docker pull ghcr.io/d4vinci/scrapling:latest
This image is automatically built and pushed by GitHub Actions from the repository's main branch.
Let's take a quick look at what Scrapling can do without diving too deep.
HTTP requests with session support:

```python
from scrapling.fetchers import Fetcher, FetcherSession

with FetcherSession(impersonate='chrome') as session:  # Use the latest version of Chrome's TLS fingerprint
    page = session.get('https://quotes.toscrape.com/', stealthy_headers=True)
    quotes = page.css('.quote .text::text').getall()
```

```python
import asyncio
from scrapling.fetchers import FetcherSession, AsyncStealthySession, AsyncDynamicSession

async def main():
    async with FetcherSession(http3=True) as session:  # FetcherSession is context-aware and can work in both sync/async patterns
        page1 = await session.get('https://quotes.toscrape.com/')
        page2 = await session.get('https://quotes.toscrape.com/', impersonate='firefox135')

    async with AsyncStealthySession(max_pages=2) as session:
        tasks = []
        urls = ['https://example.com/page1', 'https://example.com/page2']
        for url in urls:
            task = session.fetch(url)
            tasks.append(task)
        print(session.get_pool_stats())  # Optional - The status of the browser tabs pool (busy/free/error)
        results = await asyncio.gather(*tasks)
        print(session.get_pool_stats())

asyncio.run(main())  # `async with` must run inside a coroutine
```
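The `max_pages=2` argument and the `get_pool_stats()` calls suggest a fixed-size pool of browser tabs shared by the gathered tasks. The following plain-`asyncio` sketch shows that general pattern with a semaphore. `TabPool` is hypothetical and only assumes, without confirming, that Scrapling schedules its tabs this way. No browser is involved.

```python
import asyncio


class TabPool:
    """Hypothetical stand-in for a browser-tab pool capped at max_pages."""

    def __init__(self, max_pages: int) -> None:
        self._slots = asyncio.Semaphore(max_pages)
        self.max_pages = max_pages
        self.busy = 0
        self.peak_busy = 0  # highest number of tabs in use at once

    def stats(self) -> dict:
        return {"busy": self.busy, "free": self.max_pages - self.busy}

    async def fetch(self, url: str) -> str:
        async with self._slots:  # wait here while all tabs are busy
            self.busy += 1
            self.peak_busy = max(self.peak_busy, self.busy)
            try:
                await asyncio.sleep(0.01)  # placeholder for loading the page
                return f"<html>{url}</html>"
            finally:
                self.busy -= 1


async def main() -> tuple:
    pool = TabPool(max_pages=2)  # created inside the running loop
    urls = [f"https://example.com/page{i}" for i in range(1, 6)]
    results = await asyncio.gather(*(pool.fetch(u) for u in urls))
    return pool, results


pool, results = asyncio.run(main())
print(len(results), pool.peak_busy, pool.stats())
```

Five URLs are submitted at once, but no more than two "tabs" are ever busy simultaneously. `asyncio.gather` still returns the results in submission order.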
Scrapling includes a powerful command-line interface:
# Launch the interactive Web Scraping shell
scrapling shell
# Extract pages to a file directly without programming (extracts the content inside the body tag by default).
# If the output file ends with .txt, the text content of the target is extracted; if it ends with .md, you get a Markdown representation of the HTML content; if it ends with .html, you get the HTML content itself.
scrapling extract get 'https://example.com' content.md
scrapling extract get 'https://example.com' content.txt --css-selector '#fromSkipToProducts' --impersonate 'chrome'  # All elements matching the CSS selector '#fromSkipToProducts'
scrapling extract fetch 'https://example.com' content.md --css-selector '#fromSkipToProducts' --no-headless
scrapling extract stealthy-fetch 'https://nopecha.com/demo/cloudflare' captchas.html --css-selector '#padded_content a' --solve-cloudflare
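To script the extract command, for example over a list of URLs, you can assemble its argument list in Python and pass it to `subprocess`. `build_extract_command` and `OUTPUT_KINDS` are hypothetical helpers. The sketch uses only the subcommands and flags shown in the examples above and checks the output file against the .txt/.md/.html rule. It builds the command but does not run it.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Output kinds keyed by file extension, as described in the CLI notes
OUTPUT_KINDS = {".txt": "text content", ".md": "Markdown", ".html": "HTML"}


def build_extract_command(url: str, output: str, mode: str = "get",
                          css_selector: str | None = None,
                          impersonate: str | None = None) -> list[str]:
    """Build an argv list for `scrapling extract` (hypothetical helper)."""
    # Only the modes shown in the examples above; the real CLI may offer more
    if mode not in {"get", "fetch", "stealthy-fetch"}:
        raise ValueError(f"unknown extract mode: {mode!r}")
    suffix = Path(output).suffix.lower()
    if suffix not in OUTPUT_KINDS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    cmd = ["scrapling", "extract", mode, url, output]
    if css_selector:
        cmd += ["--css-selector", css_selector]
    if impersonate:  # shown with the `get` mode in the examples above
        cmd += ["--impersonate", impersonate]
    return cmd


cmd = build_extract_command("https://example.com", "content.txt",
                            css_selector="#fromSkipToProducts", impersonate="chrome")
print(shlex.join(cmd))  # shell-safe rendering for logging
# To execute it (needs the scrapling[shell] extra): subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell-quoting problems with selectors such as `#padded_content a`.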
[!NOTE] There are many additional features, including the MCP server and the interactive Web Scraping Shell, but we want to keep this page concise. Check out the full documentation at https://scrapling.readthedocs.io/en/latest/
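As a scraping tool in the MCP ecosystem, Scrapling combines adaptive scraping with AI integration and has an actively maintained codebase. This makes it well suited to building data pipelines for modern AI applications.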
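AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.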
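We recommend validating the tool thoroughly in a sandbox or test environment and carrying out the necessary security assessment before deploying it to production.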
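✅ BSD 3-Clause: a permissive license that allows commercial use, modification, and distribution, but prohibits using the original authors' names to endorse or promote derived products.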
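Overall, Scrapling (web scraping framework) performs solidly among AI tools and is of excellent quality. If you already have a clear use case, you can start using it right away. If you are still evaluating, we suggest comparing it with similar tools before deciding.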
| Field | Value |
| --- | --- |
| Original name | Scrapling |
| Original description | Open-source MCP tool: 🕷️ An adaptive Web Scraping framework that handles everything from a single req. ⭐49.2k · Python |
| Topics | Web scraping, MCP tool, Data collection, Automation, Python library |
| GitHub | https://github.com/D4Vinci/Scrapling |
| License | BSD-3-Clause |
| Language | Python |
Indexed: 2026-05-13 · Updated: 2026-05-16 · License: BSD-3-Clause · AI Skill Hub does not provide any legal endorsement of the accuracy of third-party content.