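After a curated evaluation by AI Skill Hub, MemoryAgentBench is rated "Highly Recommended". This agent workflow scores well on feature completeness, community activity, and ease of use, with an AI score of 8.0, and suits users with some technical background.
Open-source companion project for an ICLR 2026 paper, built specifically to evaluate the memory capabilities of large language model agents. It provides a complete incremental-learning evaluation framework and benchmark datasets, and suits AI researchers and engineers who want to improve the memory mechanisms of agent systems.
MemoryAgentBench is a complete AI agent automation workflow solution. Through visual node orchestration, it breaks complex multi-step tasks into clear automated flows for fully unattended processing. It supports seamless integration with hundreds of external services and APIs, and suits building data-processing pipelines, business automation, and AI-assisted decision systems.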
# Option 1: install with pip (recommended)
pip install memoryagentbench
# Option 2: install in a virtual environment (recommended for production)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install memoryagentbench
# Option 3: install from source (to get the latest features)
git clone https://github.com/HUST-AI-HYZ/MemoryAgentBench
cd MemoryAgentBench
pip install -e .
# Verify the installation
python -c "import memoryagentbench; print('Installation succeeded')"
# Command-line usage
memoryagentbench --help
# Basic usage
memoryagentbench input_file -o output_file
# Calling from Python code
import memoryagentbench
# Example
result = memoryagentbench.process("input")
print(result)
# memoryagentbench configuration file example (config.yml)
app:
  name: "memoryagentbench"
  debug: false
  log_level: "INFO"
# Specify the configuration file at run time
memoryagentbench --config config.yml
# Or configure through environment variables
export MEMORYAGENTBENCH_API_KEY="your-key"
export MEMORYAGENTBENCH_OUTPUT_DIR="./output"
Yuanzhe Hu, Yu Wang, Julian McAuley.
This project benchmarks agents with memory capabilities. Follow the steps below to set up your environment and install dependencies.
Four Core Competencies for Evaluation: * Accurate Retrieval (AR)

We collected and reformulated data from previous benchmarks and datasets. All data is split into chunks to simulate real multi-turn interaction scenarios, much like your daily conversations with an AI assistant. We also constructed two new datasets, EventQA and FactConsolidation.
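A minimal sketch of the chunking idea, assuming a simple fixed word budget per chunk. The function name `split_into_chunks` and the `max_words` parameter are illustrative assumptions, not the project's actual preprocessing code.

```python
def split_into_chunks(text: str, max_words: int = 512) -> list[str]:
    """Split text into consecutive chunks of at most max_words words.

    Hypothetical helper: the real benchmark may chunk by tokens or
    sentences instead of whitespace-separated words.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


chunks = split_into_chunks("a b c d e f g", max_words=3)
print(chunks)  # ['a b c', 'd e f', 'g']
```

Each chunk can then be delivered to the agent as one conversational turn.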
Notably, the team adopted an "inject once, query multiple times" design philosophy: one long text corresponds to multiple questions, which significantly improves evaluation efficiency.
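The "inject once, query multiple times" pattern can be sketched as follows. `ToyMemoryAgent` and `evaluate` are hypothetical stand-ins written for this illustration; the toy agent retrieves by naive keyword overlap and is not one of the agents the benchmark evaluates.

```python
import re
from dataclasses import dataclass, field


def _words(text: str) -> set[str]:
    """Lowercased word set with punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class ToyMemoryAgent:
    """Hypothetical agent that stores chunks verbatim."""
    memory: list[str] = field(default_factory=list)

    def memorize(self, chunk: str) -> None:
        self.memory.append(chunk)

    def answer(self, question: str) -> str:
        # Return the stored chunk sharing the most words with the question.
        q = _words(question)
        return max(self.memory, key=lambda c: len(q & _words(c)), default="")


def evaluate(agent, chunks, qa_pairs) -> float:
    """Feed every chunk once, then ask all questions against that memory."""
    for chunk in chunks:  # inject once
        agent.memorize(chunk)
    hits = sum(  # query multiple times
        expected.lower() in agent.answer(question).lower()
        for question, expected in qa_pairs
    )
    return hits / len(qa_pairs) if qa_pairs else 0.0


chunks = ["Alice moved to Paris in 2019.", "Bob adopted a cat named Miso."]
qa_pairs = [
    ("Where did Alice move?", "Paris"),
    ("What is the name of Bob's cat?", "Miso"),
]
print(evaluate(ToyMemoryAgent(), chunks, qa_pairs))  # 1.0
```

Because the memory is built once and reused across all questions, the cost of ingesting a long text is amortized over every query.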
pip install torch
pip install -r requirements.txt
pip install "numpy<2"
We did not include hipporag in requirements.txt because its current version causes package version conflicts. You can create a separate environment for hipporag instead.
If you hit package-related errors after installing requirements.txt, try installing the missing packages for cognee and letta.
pip install letta
pip uninstall letta
pip install cognee
pip uninstall cognee
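Before running an evaluation, it can help to see which optional backends are importable in the current environment. The sketch below uses only the standard library; the helper name `missing_packages` is an assumption for illustration, and the import names `cognee` and `letta` are assumed to match the pip package names.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


# Assumed import names for the optional memory backends.
print(missing_packages(["cognee", "letta"]))
```

An empty list means both packages resolve; any name printed is a candidate for a manual `pip install`.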
You can run an evaluation using the following example command:
#### Long Context Agents
bash bash_files/eniac/run_memagent_longcontext.sh
- --agent_config: Path to the agent/model configuration file.
- --dataset_config: Path to the dataset configuration file.
bash bash_files/eniac/run_memagent_rag_agents.sh
#### Ablation Study for Chunk Size
bash bash_files/eniac/run_memagent_rag_agents_chunksize.sh
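As a rough illustration of how two flags like these are typically consumed, here is a small `argparse` sketch. It is an assumption about the general shape only: the project's real entry point, its defaults, and the example file paths below are not taken from the source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two documented flags."""
    parser = argparse.ArgumentParser(description="Run one evaluation.")
    parser.add_argument(
        "--agent_config", required=True,
        help="Path to the agent/model configuration file.",
    )
    parser.add_argument(
        "--dataset_config", required=True,
        help="Path to the dataset configuration file.",
    )
    return parser


# Placeholder paths for illustration only.
args = build_parser().parse_args(
    ["--agent_config", "agent.yaml", "--dataset_config", "dataset.yaml"]
)
print(args.agent_config, args.dataset_config)  # agent.yaml dataset.yaml
```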
Note that hipporag (2.0.0a3) requires openai==1.58.1, so some of the latest OpenAI models may not be usable in the same environment.
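To confirm which `openai` version an environment actually has, a standard-library check along these lines may help. The function name `openai_pin_status` is illustrative; only the pinned version string comes from the note above.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED_OPENAI = "1.58.1"  # version pinned by hipporag 2.0.0a3, per the note above


def openai_pin_status(required: str = REQUIRED_OPENAI) -> str:
    """Describe how the installed openai package relates to the pin."""
    try:
        installed = version("openai")
    except PackageNotFoundError:
        return "openai is not installed"
    if installed == required:
        return f"openai=={installed} matches the hipporag pin"
    return f"openai=={installed} differs from the pinned {required}"


print(openai_pin_status())
```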
It’s recommended to use a dedicated conda environment for reproducibility:
conda create --name MABench python=3.10.16
To use this project, you need to download the processed data files and place them in the correct directory.
To run this project, you need to configure your API keys and model settings in a .env file at the project root.
Create a .env file and add the following content, replacing the placeholder values with your actual API keys:
OPENAI_API_KEY= ###your_openai_api_key
#### Settings for Cognee
LLM_MODEL=gpt-4o-mini
LLM_API_KEY= ###your_api_key
#### Other API Keys
Anthropic_API_KEY= ###your_anthropic_api
Google_API_KEY= ###your_google_api
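A quick way to spot keys that still hold placeholder values is to parse the `.env` text and list the unset ones. This is a minimal standard-library sketch, not the project's loader; `parse_env` and `unset_keys` are hypothetical helpers, and treating values that start with `###` as unset is an assumption based on the placeholders shown above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comment/heading lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value.startswith("###"):  # assumed placeholder marker
            value = ""
        values[key.strip()] = value
    return values


def unset_keys(values: dict[str, str], required: list[str]) -> list[str]:
    """Return the required keys that are missing or empty."""
    return [key for key in required if not values.get(key)]


sample = """\
OPENAI_API_KEY=sk-test
#### Settings for Cognee
LLM_MODEL=gpt-4o-mini
LLM_API_KEY= ###your_api_key
"""
values = parse_env(sample)
print(unset_keys(values, ["OPENAI_API_KEY", "LLM_API_KEY"]))  # ['LLM_API_KEY']
```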
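An academic benchmarking tool that provides a standard evaluation framework for memory agent research. Code quality is solid, making it suitable for academic and engineering use and a valuable reference.
AI Skill Hub is a third-party content aggregation platform. The information on this page is compiled from public data and does not constitute any legal endorsement of the tool's functionality or quality.
We recommend validating thoroughly in a sandbox or test environment before deploying to production, and carrying out the necessary security assessment.
✅ MIT License: one of the most permissive open-source licenses. It allows free commercial use, modification, and distribution, provided the copyright notice is retained.
AI Skill Hub review: MemoryAgentBench has complete core functionality and excellent quality. For automation engineers and operations staff, it is worth adding to a personal toolkit. We suggest trying it in a non-production environment first, then rolling it out gradually.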
| Original name | MemoryAgentBench |
| Original description | Open-source AI workflow: Open source code for ICLR 2026 Paper: Evaluating Memory in LLM Agents via Increm. ⭐337 · Python |
| Topics | agents, large language models, memory, evaluation benchmark, datasets, academic papers |
| GitHub | https://github.com/HUST-AI-HYZ/MemoryAgentBench |
| License | MIT |
| Language | Python |
Listed: 2026-05-21 · Updated: 2026-05-30 · License: MIT · AI Skill Hub makes no legal endorsement of the accuracy of third-party content.