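ATM-Bench is one of the Agent workflows featured in this issue of AI Skill Hub. It has an overall score of 8.0, indicating high overall quality. We strongly recommend adding it to your AI toolbox to improve productivity.
ATM-Bench is a complete AI Agent automated workflow solution. Using visual node orchestration, it breaks complex multi-step tasks into clear automated flows that run fully unattended. It supports seamless integration with hundreds of external services and APIs, and is suited to building data-processing pipelines, business automation, and AI-assisted decision systems.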
# Option 1: install with pip (recommended)
pip install atm-bench
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install atm-bench
# Option 3: install from source (to get the latest features)
git clone https://github.com/JingbiaoMei/ATM-Bench
cd ATM-Bench
pip install -e .
# Verify the installation
python -c "import atm_bench; print('Installation successful')"
# Command-line usage
atm-bench --help
# Basic usage
atm-bench input_file -o output_file
# Calling from Python code
import atm_bench
# Example
result = atm_bench.process("input")
print(result)
# atm-bench configuration file example (config.yml)
app:
  name: "atm-bench"
  debug: false
  log_level: "INFO"
# Specify the configuration file at runtime
atm-bench --config config.yml
# Or configure via environment variables
export ATM_BENCH_API_KEY="your-key"
export ATM_BENCH_OUTPUT_DIR="./output"
Existing long-term memory benchmarks focus primarily on dialogue history, failing to capture realistic personalized references grounded in lived experience. ATM-Bench addresses this gap with:

<a id="memory-ingestion"></a>
In addition to end-to-end retrieval + generation evaluation, we provide NIAH (Needle In A Haystack):
niah_evidence_ids) that contains all ground-truth items.
See:
- docs/niah.md
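The property stated above, that the NIAH pool contains all ground-truth items, can be checked with a few lines of Python. This is a minimal sketch and not code from the repository: only the name `niah_evidence_ids` comes from the text, while the `ground_truth_ids` field, the example ids, and the `missing_ground_truth` helper are hypothetical. See docs/niah.md for the actual data layout.

```python
from typing import Iterable, List


def missing_ground_truth(niah_evidence_ids: Iterable[str],
                         ground_truth_ids: Iterable[str]) -> List[str]:
    """Return the ground-truth ids that are absent from the NIAH pool, sorted."""
    pool = set(niah_evidence_ids)
    return sorted(gt for gt in set(ground_truth_ids) if gt not in pool)


# Hypothetical record: only the niah_evidence_ids name is taken from the text.
example = {
    "niah_evidence_ids": ["img_001", "email_017", "video_003", "img_042"],
    "ground_truth_ids": ["email_017", "img_042"],  # assumed field name
}

missing = missing_ground_truth(example["niah_evidence_ids"],
                               example["ground_truth_ids"])
print("pool is valid" if not missing else f"missing from pool: {missing}")
```

An empty result means every needle is present in the haystack; any returned id indicates a pool that could not be answered from its own evidence.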
<a id="quick-start"></a>
conda create -n atmbench python=3.11 -y
conda activate atmbench
pip install -r requirements.txt
pip install -e .
bash scripts/QA_Agent/MMRAG/run.sh
bash scripts/QA_Agent/Oracle/run_oracle_gpt5.sh
MMRAG, Oracle, NIAH) are tested in the main atmbench environment.

- A-Mem
- HippoRAG2
- mem0
- MemoryOS
- MemPalace
- SimpleMem

- MemoryOS and MemPalace are strongly recommended to run in separate conda environments. MemoryOS uses a FAISS / sentence-transformers stack, while MemPalace uses ChromaDB / ONNX-backed local embeddings; isolating them avoids dependency collisions with the core baseline environment and each other.
- A-Mem, HippoRAG2, and mem0 are tested to be compatible with the core baseline environment, but separate environments are still safer for reproducibility and dependency isolation.
- SimpleMem runs against a sibling clone of the upstream repo (LanceDB + Tantivy FTS stack); see memqa/qa_agent_baselines/SimpleMem/README.md. Pinned upstream commit: 094027eca4c890dc9912be8cee1da04428de8076 (verified by scripts/QA_Agent/SimpleMem/run.sh).

third_party/:
- third_party/A-mem/
- third_party/HippoRAG/
- third_party/mem0/
- third_party/MemoryOS/

MemPalace ships as a PyPI package (mempalace==3.3.5) and is installed via memqa/qa_agent_baselines/Mempalace/requirements.txt, so it is not vendored under third_party/.

SimpleMem is not vendored under third_party/. Clone the upstream repo at the pinned commit alongside ATMBench and point SIMPLEMEM_DIR at it (defaults to ../SimpleMem):
git clone https://github.com/aiming-lab/SimpleMem.git ../SimpleMem
git -C ../SimpleMem checkout 094027eca4c890dc9912be8cee1da04428de8076
pip install -r ../SimpleMem/requirements.txt
pip install -r memqa/qa_agent_baselines/SimpleMem/requirements.txt
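To confirm that the sibling clone really sits at the pinned commit before running the baseline, a check along the following lines may help. It is a sketch under stated assumptions, not the logic of scripts/QA_Agent/SimpleMem/run.sh: the helper names are hypothetical, and only `SIMPLEMEM_DIR`, its `../SimpleMem` default, and the commit hash come from the text above.

```python
import os
import subprocess
from pathlib import Path
from typing import Mapping

# Commit hash quoted in the setup notes above.
PINNED_COMMIT = "094027eca4c890dc9912be8cee1da04428de8076"


def resolve_simplemem_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Use SIMPLEMEM_DIR when set, otherwise fall back to ../SimpleMem."""
    return Path(env.get("SIMPLEMEM_DIR", "../SimpleMem"))


def current_commit(repo: Path) -> str:
    """Ask git for the commit currently checked out in `repo`."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def is_pinned(commit: str, pinned: str = PINNED_COMMIT) -> bool:
    """Compare a commit hash with the pinned one, ignoring whitespace and case."""
    return commit.strip().lower() == pinned


if __name__ == "__main__":
    repo = resolve_simplemem_dir()
    if not (repo / ".git").exists():
        print(f"No git checkout found at {repo}")
    elif is_pinned(current_commit(repo)):
        print(f"{repo} is at the pinned commit")
    else:
        print(f"{repo} is NOT at the pinned commit {PINNED_COMMIT}")
```

The pure helpers (`resolve_simplemem_dir`, `is_pinned`) are kept separate from the `git` call so they can be exercised without a clone on disk.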
- The General-Purpose Agent evaluation harness for all five agents (Claude Code, Codex, Pi, OpenCode, OpenClaw) ships under agent_systems/.
For detailed setup, data layout, and reproducibility settings, see:
- docs/README.md
- docs/data.md
- docs/reproducibility.md
- docs/baseline.md
- docs/niah.md
<a id="repository-structure"></a>
Set via environment variables:
export OPENAI_API_KEY="your-key"
export VLLM_API_KEY="your-key"
Or use local key files (gitignored):
- api_keys/.openai_key
- api_keys/.vllm_key
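The two key sources described above can be combined in a small lookup helper. The following is a minimal sketch, not the repository's own loader: `load_api_key` is a hypothetical name, and the precedence (environment variable first, key file second) is an assumption that the text does not confirm.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_api_key(env_var: str, key_file: Path,
                 env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key from `env_var`, else from `key_file`, else None.

    Assumed precedence: the environment variable wins over the local file.
    """
    env = os.environ if env is None else env
    value = env.get(env_var, "").strip()
    if value:
        return value
    if key_file.is_file():
        text = key_file.read_text(encoding="utf-8").strip()
        return text or None
    return None


# Variable names and file paths are the ones listed above.
openai_key = load_api_key("OPENAI_API_KEY", Path("api_keys/.openai_key"))
vllm_key = load_api_key("VLLM_API_KEY", Path("api_keys/.vllm_key"))
print("OpenAI key found:", openai_key is not None)
print("vLLM key found:", vllm_key is not None)
```

Only whether a key was found is printed, never the key itself, so the output is safe to paste into logs or issue reports.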
bash scripts/QA_Agent/Oracle/run_oracle_qwen3vl8b_raw.sh
The first benchmark for multimodal, multi-source personalized referential memory QA over long time horizons (~4 years), with evidence-grounded retrieval and answering.
</div>
<video src="https://atmbench.github.io/static/videos/ATM-Bench-demo.mp4" controls width="100%"></video>
📄 Paper: According to Me: Long-Term Personalized Referential Memory QA
🌐 Project Page: https://atmbench.github.io/
🏆 Live Leaderboard: https://atmbench.github.io/leaderboard.html
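A high-quality AI workflow benchmark
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute a legal endorsement of any tool's functionality or quality.
We recommend validating the tool thoroughly in a sandbox or test environment before deploying it to production, and carrying out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. Commercial use, modification, and distribution are all permitted, provided the copyright and license notice is retained.
Based on our overall assessment, ATM-Bench performs solidly in the Agent workflow category and is of excellent quality. If you already have a clear use case, you can start using it right away; if you are still evaluating, we suggest comparing it with similar tools before deciding.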
| Original name | ATM-Bench |
| Topics | ai-agent, benchmark, long-term-memory |
| GitHub | https://github.com/JingbiaoMei/ATM-Bench |
| License | MIT |
| Language | Python |
Listed: 2026-06-01 · Updated: 2026-06-01 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.