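AI Skill Hub has reviewed Apify Python SDK and rates it "Highly recommended". This Agent workflow tool scores well on feature completeness, community activity, and ease of use, with an AI score of 8.0, and suits users with some technical background.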
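The official Python library for building server-side Apify Actors and automating workflows.
Apify Python SDK is the official toolkit for writing Apify Actors in Python. It manages the Actor lifecycle, emulates Apify storages locally, and delivers Actor events, so multi-step scraping and automation tasks can run unattended. It works alongside common scraping libraries such as BeautifulSoup, Playwright, Selenium, Crawlee, and Scrapy, which makes it a good fit for data-extraction pipelines and business automation.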
# Option 1: install with pip (recommended)
pip install apify
# Option 2: install into a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install apify
# Option 3: install from source (for the latest changes)
git clone https://github.com/apify/apify-sdk-python
cd apify-sdk-python
pip install -e .
# Verify the installation
python -c "import apify; print('Installation OK')"
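# Command-line use
# The apify package is a library and ships no apify-sdk-python command. An Actor is a regular Python program, started with the interpreter (main.py is a placeholder file name):
python main.py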
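# Using it from Python code
import asyncio
from apify import Actor
# Example: read the Actor input and store one record in the default dataset
async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        await Actor.push_data({'received_input': actor_input})
asyncio.run(main())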
# Configuration
# The SDK reads its settings from environment variables, which the Apify platform sets automatically; it does not use a config.yml file.
# For local runs that need to reach the Apify API, provide your API token:
export APIFY_TOKEN="your-token"
The Apify SDK for Python is the official library to create Apify Actors in Python. It provides useful features like Actor lifecycle management, local storage emulation, and Actor event handling.
If you just need to access the Apify API from your Python applications, check out the Apify Client for Python instead.
The Apify SDK for Python is available on PyPI as the apify package. For a default installation with pip, run:
pip install apify
For users interested in integrating Apify with Scrapy, we provide a package extra called scrapy. To install Apify with the scrapy extra, use the following command:
pip install apify[scrapy]
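To confirm which of these packages ended up in the current environment, a standard-library check is enough. The sketch below is not part of the Apify SDK: `installed_version` and `describe_install` are hypothetical helper names, and the only assumption is the distribution names `apify` and `scrapy` used in the commands above.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def describe_install(distributions: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in distributions}


if __name__ == '__main__':
    # 'apify' is the SDK itself; 'scrapy' is only present with the scrapy extra.
    for name, found in describe_install(['apify', 'scrapy']).items():
        print(f'{name}: {found or "not installed"}')
```

Run as a script, it prints one line per package, which makes it easy to see whether the scrapy extra was actually installed.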
Below are a few examples demonstrating how to use the Apify SDK with some web scraping-related libraries.
To see how you can use the Apify SDK with other popular libraries used for web scraping, check out our guides for using BeautifulSoup with HTTPX, Parsel with Impit, Playwright, Selenium, Crawlee, or Scrapy.
To learn more about the features of the Apify SDK and how to use them, check out the Usage Concepts section in the sidebar, particularly the guides for the Actor lifecycle, working with storages, handling Actor events or how to use proxies.
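As a rough mental model of the Actor lifecycle, the `async with Actor:` block used in the examples is an asynchronous context manager: setup runs on entry, teardown runs on exit, and an exception in the body is reported as a failure. The toy class below only mimics that shape with the standard library. `ToyActor` and `run` are hypothetical names, and the real SDK does considerably more at each step.

```python
from __future__ import annotations

import asyncio


class ToyActor:
    """Toy stand-in that mimics the shape of `async with Actor:`; not the SDK class."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def __aenter__(self) -> ToyActor:
        # Setup step: the real SDK initializes the Actor at this point.
        self.events.append('init')
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        # Teardown step: record a clean exit or a failure, depending on the body.
        self.events.append('exit' if exc_type is None else f'fail: {exc}')
        return True  # Toy behavior: swallow the exception so the run ends quietly.


async def run(should_fail: bool) -> list[str]:
    """Run a tiny body inside the toy lifecycle and return the recorded steps."""
    actor = ToyActor()
    async with actor:
        actor.events.append('work')
        if should_fail:
            raise RuntimeError('boom')
    return actor.events


if __name__ == '__main__':
    print(asyncio.run(run(should_fail=False)))
    print(asyncio.run(run(should_fail=True)))
```

The point of the pattern is that cleanup is tied to leaving the block, so the scraping code inside does not need its own try/finally bookkeeping.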
This example illustrates how to integrate the Apify SDK with HTTPX and BeautifulSoup to scrape data from web pages.
from bs4 import BeautifulSoup
from httpx import AsyncClient

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = actor_input.get('start_urls', [{'url': 'https://apify.com'}])

        # Open the default request queue for handling URLs to be processed.
        request_queue = await Actor.open_request_queue()

        # Enqueue the start URLs.
        for start_url in start_urls:
            url = start_url.get('url')
            await request_queue.add_request(url)

        # Process the URLs from the request queue.
        while request := await request_queue.fetch_next_request():
            Actor.log.info(f'Scraping {request.url} ...')

            # Fetch the HTTP response from the specified URL using HTTPX.
            async with AsyncClient() as client:
                response = await client.get(request.url)

            # Parse the HTML content using Beautiful Soup.
            soup = BeautifulSoup(response.content, 'html.parser')

            # Extract the desired data.
            data = {
                'url': request.url,
                'title': soup.title.string,
                'h1s': [h1.text for h1 in soup.find_all('h1')],
                'h2s': [h2.text for h2 in soup.find_all('h2')],
                'h3s': [h3.text for h3 in soup.find_all('h3')],
            }

            # Store the extracted data to the default dataset.
            await Actor.push_data(data)
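The loop above passes `start_url.get('url')` straight to the queue, so an entry without a `url` key would hand it `None`. Here is a minimal sketch of normalizing the input first, assuming the same `start_urls` shape as in the example. `extract_start_urls` and `DEFAULT_START_URLS` are hypothetical names, not SDK functions.

```python
from __future__ import annotations

from typing import Any

DEFAULT_START_URLS = [{'url': 'https://apify.com'}]


def extract_start_urls(actor_input: dict[str, Any] | None) -> list[str]:
    """Return the non-empty, de-duplicated URLs from Actor input, in order."""
    entries = (actor_input or {}).get('start_urls', DEFAULT_START_URLS)
    urls: list[str] = []
    for entry in entries:
        url = entry.get('url') if isinstance(entry, dict) else None
        if not isinstance(url, str):
            continue  # Malformed entry: no usable 'url' value.
        url = url.strip()
        # Keep each non-empty URL once, preserving the input order.
        if url and url not in urls:
            urls.append(url)
    return urls


if __name__ == '__main__':
    sample = {'start_urls': [{'url': 'https://apify.com'}, {}, {'url': None}]}
    print(extract_start_urls(sample))
```

Inside the Actor, the returned list could be iterated in place of `start_urls` before the enqueue loop, so only valid strings reach the request queue.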
This example demonstrates how to use the Apify SDK alongside PlaywrightCrawler from Crawlee to perform web scraping.
from crawlee.crawlers import PlaywrightCrawler, PlaywrightCrawlingContext

from apify import Actor


async def main() -> None:
    async with Actor:
        # Retrieve the Actor input, and use default values if not provided.
        actor_input = await Actor.get_input() or {}
        start_urls = [url.get('url') for url in actor_input.get('start_urls', [{'url': 'https://apify.com'}])]

        # Exit if no start URLs are provided.
        if not start_urls:
            Actor.log.info('No start URLs specified in Actor input, exiting...')
            await Actor.exit()

        # Create a crawler.
        crawler = PlaywrightCrawler(
            # Limit the crawl to max requests. Remove or increase it for crawling all links.
            max_requests_per_crawl=50,
            headless=True,
        )

        # Define a request handler, which will be called for every request.
        @crawler.router.default_handler
        async def request_handler(context: PlaywrightCrawlingContext) -> None:
            url = context.request.url
            Actor.log.info(f'Scraping {url}...')

            # Extract the desired data.
            data = {
                'url': context.request.url,
                'title': await context.page.title(),
                'h1s': [await h1.text_content() for h1 in await context.page.locator('h1').all()],
                'h2s': [await h2.text_content() for h2 in await context.page.locator('h2').all()],
                'h3s': [await h3.text_content() for h3 in await context.page.locator('h3').all()],
            }

            # Store the extracted data to the default dataset.
            await context.push_data(data)

            # Enqueue additional links found on the current page.
            await context.enqueue_links()

        # Run the crawler with the starting URLs.
        await crawler.run(start_urls)
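As a rough illustration of what a request cap such as `max_requests_per_crawl` means once links are being enqueued, here is a library-free sketch over an in-memory link graph. It is not how Crawlee is implemented: `bounded_crawl` and the `links` mapping are hypothetical stand-ins for real page fetching and link discovery.

```python
from __future__ import annotations

from collections import deque


def bounded_crawl(start_urls: list[str], links: dict[str, list[str]], max_requests: int) -> list[str]:
    """Visit pages breadth-first, never processing more than max_requests pages."""
    unique_starts = list(dict.fromkeys(start_urls))  # Drop duplicate start URLs.
    queue = deque(unique_starts)
    seen = set(unique_starts)
    visited: list[str] = []
    while queue and len(visited) < max_requests:
        url = queue.popleft()
        visited.append(url)  # Stand-in for "scrape the page and store the data".
        for link in links.get(url, []):
            # Enqueue each newly discovered link only once.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == '__main__':
    site = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['d'], 'd': []}
    print(bounded_crawl(['a'], site, max_requests=3))
```

With the cap at 3, the crawl stops even though page `d` has been discovered and is still waiting in the queue, which is the trade-off the comment in the example describes.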
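A high-quality Python library for building Apify Actors and automating workflows.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and is not a legal endorsement of the tool's functionality or quality.
Validate the tool thoroughly in a sandbox or test environment before deploying it to production, and carry out the necessary security assessment.
✅ Apache 2.0: a permissive open-source license that allows commercial use, requires keeping the copyright notice and NOTICE file, and includes a patent grant.
AI Skill Hub review: Apify Python SDK has complete core functionality and excellent quality. For automation engineers and operations staff, it is worth adding to a personal toolkit. Try it in a non-production environment first, then roll it out gradually.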
| Original name | apify-sdk-python |
| Topics | apify, actor, automation, crawlee, data-extraction |
| GitHub | https://github.com/apify/apify-sdk-python |
| License | Apache-2.0 |
| Language | Python |
Listed: 2026-06-05 · Updated: 2026-06-05 · License: Apache-2.0 · AI Skill Hub provides no legal endorsement of the accuracy of third-party content.