
Capability tags

🤖 Agent 👁 OCR 💻 CLI 🔗 REST API 🧬 Embedding 🧠 Claude ✨ GPT


Agent workflow

ATM-Bench

Python-based · Build complete AI automation pipelines without writing code

⭐ 44 Stars 🍴 2 Forks 💻 Python 📄 MIT 🏷 AI score 8.0

AI overall score: 8.0

ai-agent · benchmark · long-term-memory


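✦ AI Skill Hub recommendation

ATM-Bench is one of this issue's featured Agent workflows on AI Skill Hub. Its overall score of 8.0 indicates good overall quality. We strongly recommend adding it to your AI toolbox to help improve productivity.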

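📚 In-depth analysis

ATM-Bench is a complete AI Agent automation workflow solution. As AI capabilities keep improving, Agent-based automated workflows are becoming a core way to raise individual and team productivity. Unlike traditional RPA automation, which simulates mouse and keyboard actions, AI Agent workflows understand task intent and plan execution paths dynamically, so they can handle more complex, unstructured tasks.

ATM-Bench workflows follow a "minimal configuration, maximum reuse" principle: the core logic is already packaged, so users only need to configure their own API key and business parameters to get started. The workflow has built-in error handling and retry mechanisms and keeps running stably under network fluctuations or API rate limiting, which makes it suitable as automation infrastructure for production environments.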
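The retry behavior mentioned above could look roughly like the following. This is a minimal sketch of retry with exponential backoff, not ATM-Bench's actual implementation; `call_with_retry`, `max_attempts`, `base_delay`, and the injectable `sleep` parameter are hypothetical names chosen for illustration.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff.

    Hypothetical helper for illustration; not part of ATM-Bench.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

A real workflow would usually catch only transient errors (timeouts, rate-limit responses) rather than every exception; the broad `except` here just keeps the sketch short.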

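For an actual deployment, run the workflow 3 to 5 times in a test environment first and verify that each stage's output matches expectations before deploying to production. With an AI Skill Hub score of 8.0, it is a featured recommendation among comparable Agent workflows.

📋 Tool overview

Through visual node orchestration, ATM-Bench breaks complex multi-step tasks into clear automated flows for fully unattended intelligent processing. It supports seamless integration with hundreds of external services and APIs and suits building data-processing pipelines, business automation, and AI-assisted decision systems.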

GitHub Stars: ⭐ 44
Language: Python
Supported platforms: Windows / macOS / Linux
Maintenance status: Lightweight project, updated as needed
License: MIT
AI overall score: 8.0
Tool type: Agent workflow
Forks: 2

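📖 Documentation

The following content was compiled automatically by AI Skill Hub from project information. For the complete original documentation, see "Original source" at the bottom of the page.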

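📌 Key features

Visual Agent workflow orchestration, with no complex code to write
Multi-step automated task chains for fully unattended end-to-end runs
Seamless integration with external APIs, databases, and third-party services
Built-in error handling and automatic retries for stable operation
Reusable automation templates for fast deployment in similar scenarios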

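🎯 Main use cases

Automating repetitive daily work so you can focus on creative tasks
Building complete collect → process → output data pipelines
Moving data and coordinating business processes across platforms and systems

The installation commands below were generated automatically from the project's language and type; the official README is authoritative.

Installation commands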

# Option 1: install with pip (recommended)
pip install atm-bench

# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install atm-bench

# Option 3: install from source (latest features)
git clone https://github.com/JingbiaoMei/ATM-Bench
cd ATM-Bench
pip install -e .

# Verify the installation
python -c "import atm_bench; print('Installation succeeded')"

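📋 Installation steps

Visit the GitHub repository to get the workflow file
In the target platform (Dify / Flowise / Make, etc.), find the "Import workflow" feature
Upload the workflow file
Configure the required environment variables and API key as prompted
Run a test to confirm the flow works before putting it into use

The usage examples below were compiled by AI Skill Hub and cover the most common scenarios.

Common commands / code examples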

# Command-line usage
atm-bench --help

# Basic usage
atm-bench input_file -o output_file

# Calling from Python code
import atm_bench

# Example
result = atm_bench.process("input")
print(result)

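The configuration examples below were generated from typical usage scenarios; adjust the specific parameters according to the official documentation.

Configuration example

# Example atm-bench configuration file (config.yml)
app:
  name: "atm-bench"
  debug: false
  log_level: "INFO"

# Specify the configuration file at run time
atm-bench --config config.yml

# Or configure through environment variables
export ATM_BENCH_API_KEY="your-key"
export ATM_BENCH_OUTPUT_DIR="./output"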

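📑 README deep dive · documentation completeness: 64/100

The following content was parsed directly from the GitHub README by the system, preserving code blocks, tables, and list structure.

Introduction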

📋 Overview

Existing long-term memory benchmarks focus primarily on dialogue history, failing to capture realistic personalized references grounded in lived experience. ATM-Bench addresses this gap with:

🖼️ Multimodal and multi-source data: Images, videos, emails
📅 Long-term horizon: ~4 years of personal memory
🎯 Referential queries: Resolving personalized references (e.g., "Show me the moments where Grace was trying to be sneaky...")
🔍 Evidence-grounded: Human-annotated QA pairs with ground-truth memory evidence
🧩 Multi-evidence reasoning: Queries requiring evidence from multiple sources
⚡ Conflicting evidence: Handling contradictory information
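Because each QA pair carries ground-truth memory evidence, retrieval can be scored against the annotated items. The following is a minimal sketch of one such score, recall of gold evidence within the top-k retrieved items; the function name and arguments are hypothetical, and this is not claimed to be the benchmark's official metric.

```python
def evidence_recall_at_k(retrieved_ids, gold_evidence_ids, k):
    """Fraction of ground-truth evidence items found in the top-k retrieved ids.

    Hypothetical helper for illustration; not part of ATM-Bench.
    """
    gold = set(gold_evidence_ids)
    if not gold:
        raise ValueError("gold_evidence_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(gold & top_k) / len(gold)
```

For a multi-evidence query with two gold items, retrieving only one of them in the top k gives a recall of 0.5 under this definition.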

[Figure: ATM-Bench Overview]

NIAH Evaluation Setup

In addition to end-to-end retrieval + generation evaluation, we provide NIAH (Needle In A Haystack):

Each question is paired with a fixed evidence pool (niah_evidence_ids) that contains all ground-truth items.
The rest of the pool is filled with realistic distractors.
This isolates answer generation/reasoning quality from retrieval quality.
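The pool construction described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the repository's code: `build_niah_pool`, `candidate_ids`, `pool_size`, and `seed` are hypothetical names, and random sampling stands in for however the benchmark actually selects realistic distractors (see docs/niah.md).

```python
import random


def build_niah_pool(gold_ids, candidate_ids, pool_size, seed=0):
    """Return a fixed-size evidence pool with every gold id plus distractors.

    Hypothetical helper for illustration; not part of ATM-Bench.
    """
    gold = list(dict.fromkeys(gold_ids))  # de-duplicate, keep order
    if len(gold) > pool_size:
        raise ValueError("pool_size is smaller than the number of gold items")
    gold_set = set(gold)
    # Distractors are candidates that are not ground-truth evidence.
    distractors = [c for c in dict.fromkeys(candidate_ids) if c not in gold_set]
    n_fill = pool_size - len(gold)
    if n_fill > len(distractors):
        raise ValueError("not enough distractors to fill the pool")
    rng = random.Random(seed)  # seeded so the pool is reproducible
    pool = gold + rng.sample(distractors, n_fill)
    rng.shuffle(pool)  # avoid gold items always appearing first
    return pool
```

Since every gold item is guaranteed to be in the pool, a wrong answer under this setup points at generation or reasoning rather than at retrieval.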

See: docs/niah.md

Installation

conda create -n atmbench python=3.11 -y
conda activate atmbench
pip install -r requirements.txt
pip install -e .

🚀 Quick Start

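Optional but recommended: preload the reverse-geocoding cache.

Run the MMRAG baseline (the endpoint and answerer model can be overridden with the VLLM_ENDPOINT / ANSWERER_MODEL env vars):

bash scripts/QA_Agent/MMRAG/run.sh

Run the Oracle baseline (GPT-5 script; requires OPENAI_API_KEY set in the environment or in api_keys/.openai_key):

bash scripts/QA_Agent/Oracle/run_oracle_gpt5.sh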

Baseline Compatibility and Environments

Core baselines (MMRAG, Oracle, NIAH) are tested in the main atmbench environment.
Third-party memory-system baselines in this repo include:
A-Mem
HippoRAG2
mem0
MemoryOS
MemPalace
SimpleMem
MemoryOS and MemPalace are strongly recommended to run in separate conda environments. MemoryOS uses a FAISS / sentence-transformers stack, while MemPalace uses ChromaDB / ONNX-backed local embeddings; isolating them avoids dependency collisions with the core baseline environment and each other.
A-Mem, HippoRAG2, and mem0 are tested to be compatible with the core baseline environment, but separate environments are still safer for reproducibility and dependency isolation.
SimpleMem runs against a sibling clone of the upstream repo (LanceDB + Tantivy FTS stack); see memqa/qa_agent_baselines/SimpleMem/README.md. Pinned upstream commit: 094027eca4c890dc9912be8cee1da04428de8076 (verified by scripts/QA_Agent/SimpleMem/run.sh).
Setup references for the vendored baselines are under third_party/:
third_party/A-mem/
third_party/HippoRAG/
third_party/mem0/
third_party/MemoryOS/
MemPalace ships as a PyPI package (mempalace==3.3.5) and is installed via memqa/qa_agent_baselines/Mempalace/requirements.txt — no third_party/ vendoring.
SimpleMem is not vendored under third_party/. Clone the upstream repo at the pinned commit alongside ATMBench and point SIMPLEMEM_DIR at it (defaults to ../SimpleMem):

  git clone https://github.com/aiming-lab/SimpleMem.git ../SimpleMem
  git -C ../SimpleMem checkout 094027eca4c890dc9912be8cee1da04428de8076
  pip install -r ../SimpleMem/requirements.txt
  pip install -r memqa/qa_agent_baselines/SimpleMem/requirements.txt
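After the checkout, it can be useful to confirm that the sibling clone really sits at the pinned commit. The sketch below is not the check performed by scripts/QA_Agent/SimpleMem/run.sh; it is a minimal stand-alone illustration, and `matches_pinned` and `current_head` are hypothetical helper names.

```python
import subprocess

PINNED_COMMIT = "094027eca4c890dc9912be8cee1da04428de8076"


def matches_pinned(head, pinned=PINNED_COMMIT):
    """True if a `git rev-parse HEAD` output equals the pinned commit hash."""
    return head.strip().lower() == pinned.lower()


def current_head(repo_dir):
    """Return the checked-out commit of repo_dir via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return result.stdout
```

A typical call would be `matches_pinned(current_head(os.environ.get("SIMPLEMEM_DIR", "../SimpleMem")))`, which needs `import os` and follows the default location given above.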

The General-Purpose Agent evaluation harness for all five agents (Claude Code, Codex, Pi, OpenCode, OpenClaw) ships under agent_systems/.

For detailed setup, data layout, and reproducibility settings, see:
docs/README.md
docs/data.md
docs/reproducibility.md
docs/baseline.md
docs/niah.md

API Keys

Set via environment variables:

export OPENAI_API_KEY="your-key"
export VLLM_API_KEY="your-key"

Or use local key files (gitignored):
api_keys/.openai_key
api_keys/.vllm_key
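The two key sources above can be combined in a small lookup. This is a minimal sketch, assuming the environment variable takes precedence over the key file (the README does not state the order); `load_api_key` is a hypothetical helper, not a function from the repository.

```python
import os
from pathlib import Path


def load_api_key(env_var, key_file):
    """Return the key from env_var if set, else from key_file, else None.

    Hypothetical helper; env-first precedence is an assumption.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    path = Path(key_file)
    if path.is_file():
        # Strip the trailing newline that editors usually add.
        return path.read_text(encoding="utf-8").strip() or None
    return None
```

For example, `load_api_key("OPENAI_API_KEY", "api_keys/.openai_key")` would return the exported value when present and fall back to the gitignored file otherwise.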

The vLLM-backed runs require a running vLLM endpoint at http://127.0.0.1:8000/v1/chat/completions serving Qwen/Qwen3-VL-8B-Instruct-FP8 (override with the VLLM_ENDPOINT / ANSWERER_MODEL env vars).

With a vLLM endpoint serving Qwen/Qwen3-VL-8B-Instruct-FP8, run the raw Qwen3-VL-8B Oracle baseline:

bash scripts/QA_Agent/Oracle/run_oracle_qwen3vl8b_raw.sh

ATM-Bench: Long-Term Personalized Referential Memory QA

The first benchmark for multimodal, multi-source personalized referential memory QA over long time horizons (~4 years), with evidence-grounded retrieval and answering.


📄 Paper: According to Me: Long-Term Personalized Referential Memory QA
🌐 Project Page: https://atmbench.github.io/
🏆 Live Leaderboard: https://atmbench.github.io/leaderboard.html

🎯 aiskill88 AI review · Grade A · 2026-06-01

A high-quality benchmark for AI workflows.