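Data Workflow Baseline is one of the Agent workflows featured by AI Skill Hub in this issue. It has an overall score of 7.5, which indicates good overall quality. We recommend adding it to your AI toolkit to help improve productivity.
Data Workflow Baseline is a complete AI Agent automation workflow solution. Through visual node orchestration, it breaks complex multi-step tasks into clear automated flows that can run unattended from start to finish. It integrates with hundreds of external services and APIs and is suited to building data processing pipelines, business automation, and AI-assisted decision systems.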
# Option 1: install with pip (recommended)
pip install dataspoke-baseline
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install dataspoke-baseline
# Option 3: install from source (for the latest features)
git clone https://github.com/selhorys/dataspoke-baseline
cd dataspoke-baseline
pip install -e .
# Verify the installation
python -c "import dataspoke_baseline; print('Installation successful')"
# Command-line usage
dataspoke-baseline --help
# Basic usage
dataspoke-baseline input_file -o output_file
# Calling from Python code
import dataspoke_baseline
# Example
result = dataspoke_baseline.process("input")
print(result)
# Example dataspoke-baseline configuration file (config.yml)
app:
  name: "dataspoke-baseline"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
dataspoke-baseline --config config.yml
# Or configure through environment variables
export DATASPOKE_BASELINE_API_KEY="your-key"
export DATASPOKE_BASELINE_OUTPUT_DIR="./output"
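The environment variables in the example can be read from Python with a fallback to defaults. The following is a minimal sketch only: the `Settings` class and the `load_settings` helper are hypothetical names and not part of dataspoke-baseline, and the `./output` default is an assumption taken from the example value. Only the two variable names come from the text above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the two settings named in the example."""
    api_key: Optional[str]
    output_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from environment variables, falling back to defaults."""
    return Settings(
        # No default for the key: None signals that it was not provided.
        api_key=env.get("DATASPOKE_BASELINE_API_KEY"),
        # Assumed default, mirroring the example value above.
        output_dir=env.get("DATASPOKE_BASELINE_OUTPUT_DIR", "./output"),
    )


if __name__ == "__main__":
    settings = load_settings()
    # Report only whether the key is set; never print the key itself.
    print("API key set:", settings.api_key is not None)
    print("Output dir:", settings.output_dir)
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in tests without touching the real environment.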
Note: This project is currently under active development and has not been officially released. APIs, features, and documentation are subject to change without notice.
AI-powered sidecar extension for DataHub, built API-first.
DataSpoke is a loosely coupled sidecar to DataHub. DataHub stores metadata (the Hub); DataSpoke extends it with five baseline features (the Spokes): Ingestion Control, Validation, Ontology Generation, Metadata Generation, and Governance. Both UI and API are organised by feature — one function namespace each under /spoke/.
This repository delivers two artifacts:
spec/API.md is the canonical surface; the frontend is a thin reference UI that consumes those routes verbatim.
Fork or copy this repository to create a data catalog for your organization.
DataSpoke ships as an umbrella Helm chart at helm-charts/dataspoke/. The production profile (values.yaml) enables the application components (frontend, API) and infrastructure (PostgreSQL with pgvector + Apache AGE, Redis, Airflow). The optional event-consumer subchart is shipped disabled; baseline UC1–UC5 are schedule-driven via Airflow rather than event-driven.
1. Build and push images: docker build -t <registry>/dataspoke/api:latest -f docker-images/api/Dockerfile . (Frontend image TBD; event-consumer is disabled by default)
2. Configure: Copy helm-charts/dataspoke/values.yaml and customize: container images, ingress hosts/TLS, DataHub connection (config.datahub.gmsUrl), and secrets (PostgreSQL, Redis, JWT, LLM API key). For production secrets management, consider External Secrets Operator.
3. Install:
helm dependency build ./helm-charts/dataspoke
helm upgrade --install dataspoke ./helm-charts/dataspoke \
--namespace dataspoke --create-namespace \
--values ./your-values.yaml
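If the install step needs to be scripted, for example from a deployment tool written in Python, the two Helm commands can be assembled as argument lists. This is a sketch under assumptions: `helm_install_commands` is a hypothetical helper, and the chart path, release name, and namespace simply mirror the commands above.

```python
import shlex
from typing import List


def helm_install_commands(values_file: str,
                          chart: str = "./helm-charts/dataspoke",
                          release: str = "dataspoke",
                          namespace: str = "dataspoke") -> List[List[str]]:
    """Return the argv lists for the dependency build and the install."""
    return [
        # Resolve the subcharts of the umbrella chart first.
        ["helm", "dependency", "build", chart],
        # Install the release, or upgrade it if it already exists.
        ["helm", "upgrade", "--install", release, chart,
         "--namespace", namespace, "--create-namespace",
         "--values", values_file],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in helm_install_commands("./your-values.yaml"):
        print(shlex.join(argv))
```

The sketch only prints the commands. Running them would mean passing each list to `subprocess.run(argv, check=True)` on a machine where `helm` is installed and the Kubernetes context is set.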
Resource sizing: Production defaults total ~5 CPU / ~10 CPU and ~9.5 Gi / ~22 Gi (requests / limits), excluding the opt-in event-consumer. See spec/feature/HELM_CHART.md for the full chart reference.
The dev profile installs infrastructure (DataHub, PostgreSQL with pgvector + Apache AGE, Redis, Airflow, self-hosted Langfuse for LLM observability, example data sources) into a Kubernetes cluster via the umbrella Helm chart plus dev peripherals. The API runs in-cluster alongside Airflow (for workflow callbacks); frontend runs on the host.
cp helm-charts/.env.example helm-charts/.env # Set your Kubernetes context
./helm-charts/bin/install.sh --profile dev # ~5-10 min first run
Using Claude Code? Run /k8s-deploy install for guided setup.
After install, verify all services are reachable:
./helm-charts/bin/health-check.sh # Verify all services respond via nginx-ingress
Services are accessed via nginx-ingress endpoints — HTTP services use virtual-host routing (http://<service>.<INGRESS_IP>.nip.io/) and TCP services use dedicated ports on the ingress IP. See helm-charts/README.md for the full endpoint table, credentials, lock service, namespace architecture, resource budgets, and troubleshooting.
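The virtual-host pattern for HTTP services can be illustrated with a small helper that builds a URL from a service name and the ingress IP. This is a sketch, not project code: `service_url` is a hypothetical function, and the service name `api` used below is an assumed example. The actual names are listed in helm-charts/README.md.

```python
import ipaddress


def service_url(service: str, ingress_ip: str) -> str:
    """Build http://<service>.<INGRESS_IP>.nip.io/ for an HTTP service."""
    # Reject malformed addresses early; raises ValueError on bad input.
    ip = ipaddress.IPv4Address(ingress_ip)
    # nip.io resolves any hostname that embeds an IPv4 address to that address,
    # so the ingress can route on the leading service label.
    return f"http://{service}.{ip}.nip.io/"


if __name__ == "__main__":
    # "api" is an assumed service name used only for illustration.
    print(service_url("api", "192.168.1.10"))
```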
./helm-charts/bin/uninstall.sh --profile dev
Fork this repository and adapt:
spec/MANIFESTO_*.md -- redefine features and product identity
/spec-write -- update architecture and author feature specs
/k8s-deploy install -- bring up the local environment
Use the plan -> approve -> generate -> evaluate workflow:
spec/feature/
backend -> reviewer -> [fix pass if needed]
workflow -> reviewer -> [fix pass if needed]
test -- write and run tests
frontend -> reviewer -> [fix pass if needed]
k8s-helm -- containerize and deploy
See spec/AI_SCAFFOLD.md for the full scaffold reference.
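The "generate -> reviewer -> [fix pass if needed]" pattern amounts to a simple control flow: produce an artifact, review it, and run a fix pass only when the review reports issues. The sketch below illustrates that flow with plain callables. It is not the scaffold's implementation, in which these roles are carried out by AI agents, and `run_stage` and its parameters are hypothetical names.

```python
from typing import Callable, List


def run_stage(generate: Callable[[], str],
              review: Callable[[str], List[str]],
              fix: Callable[[str, List[str]], str]) -> str:
    """Generate an artifact, review it, and apply one fix pass if needed."""
    artifact = generate()
    issues = review(artifact)
    if issues:
        # Only run the fix pass when the reviewer found something.
        artifact = fix(artifact, issues)
    return artifact


if __name__ == "__main__":
    # Toy stand-ins for the generator, reviewer, and fixer roles.
    result = run_stage(
        generate=lambda: "draft",
        review=lambda artifact: ["missing tests"],
        fix=lambda artifact, issues: f"{artifact} (fixed: {', '.join(issues)})",
    )
    print(result)
```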
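A high-quality open-source AI workflow project
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out any necessary security assessment.
✅ Apache 2.0: a permissive open-source license that allows commercial use, requires retaining the copyright notice and NOTICE file, and includes a patent grant.
Based on our overall evaluation, Data Workflow Baseline performs steadily in the Agent workflow category and is of good quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we recommend comparing it with similar tools before deciding.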
| Original name | dataspoke-baseline |
| Original description | Open-source AI workflow: A Baseline Product for an Omnipotent Data Catalog. ⭐15 · Python |
| Topics | AI, data management, workflow |
| GitHub | https://github.com/selhorys/dataspoke-baseline |
| License | Apache-2.0 |
| Language | Python |
Listed: 2026-05-25 · Updated: 2026-05-30 · License: Apache-2.0 · AI Skill Hub does not legally endorse the accuracy of third-party content.