AI Skill Hub 强烈推荐:τ-Bench工作流基准测试 是一款优质的Agent工作流。已获得 1.2k 颗 GitHub Star,AI 综合评分 8.2 分,在同类工具中表现稳健。如果你正在寻找可靠的Agent工作流解决方案,这是一个值得深入了解的选择。
τ-Bench工作流基准测试 是一套完整的 AI Agent 自动化工作流方案。通过可视化的节点编排,将复杂的多步骤任务拆解为清晰的自动化流程,实现全程无人值守的智能处理。支持与数百种外部服务和 API 无缝集成,适合构建数据处理管线、业务自动化和 AI 辅助决策系统。
τ-Bench工作流基准测试 是一套完整的 AI Agent 自动化工作流方案。通过可视化的节点编排,将复杂的多步骤任务拆解为清晰的自动化流程,实现全程无人值守的智能处理。支持与数百种外部服务和 API 无缝集成,适合构建数据处理管线、业务自动化和 AI 辅助决策系统。
# 方式一:pip 安装(推荐)
pip install tau2-bench
# 方式二:虚拟环境安装(推荐生产环境)
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install tau2-bench
# 方式三:从源码安装(获取最新功能)
git clone https://github.com/sierra-research/tau2-bench
cd tau2-bench
pip install -e .
# 验证安装
python -c "import tau2_bench; print('安装成功')"
# 命令行使用
tau2-bench --help
# 基本用法
tau2-bench input_file -o output_file
# Python 代码中调用
import tau2_bench
# 示例
result = tau2_bench.process("input")
print(result)
# tau2-bench 配置文件示例(config.yml) app: name: "tau2-bench" debug: false log_level: "INFO" # 运行时指定配置文件 tau2-bench --config config.yml # 或通过环境变量配置 export TAU2_BENCH_API_KEY="your-key" export TAU2_BENCH_OUTPUT_DIR="./output"
From text-only to multimodal, knowledge-aware agent evaluation.
Voice full-duplex · Knowledge retrieval · 75+ task fixes
τ-Voice paper · τ-Knowledge paper · Task fixes paper · Release notes
How do you say $\tau^3$-bench? We just say "tau three," but you do you!
$\tau$-bench is a simulation framework for evaluating customer service agents across multiple domains. It supports text-based half-duplex (turn-based) evaluation and voice full-duplex (simultaneous) evaluation using real-time audio APIs.
Each domain specifies: - A policy that the agent must follow - A set of tools that the agent can use - A set of tasks to evaluate the agent's performance - Optionally: a set of user tools for the user simulator
Available domains: mock · airline · retail · telecom · banking_knowledge
| Mode | Description |
|---|---|
| **Text (half-duplex)** | Turn-based chat with tool use |
| **Voice (full-duplex)** | End-to-end audio via realtime providers (OpenAI, Gemini, xAI) |
banking_knowledge) — A knowledge-retrieval-based customer service domain with configurable RAG pipelines, document search, embeddings, and agentic shell-based search. Learn more →See CHANGELOG.md for the full version history.
Backward compatibility note: If you are evaluating an agent (not training), use the base task split to evaluate on the complete task set that matches the original τ-bench structure. This is the default.
Upgrading from $\tau^2$-bench? Installation now usesuvinstead ofpip install -e ., and Python>=3.12, <3.14is required (was>=3.10). Some internal APIs have been refactored — see CHANGELOG.md for details.
git clone https://github.com/sierra-research/tau2-bench
cd tau2-bench
uv sync # core only (text-mode: airline, retail, telecom, mock)
Optional extras (install what you need):
uv sync --extra voice # + voice/audio-native features
uv sync --extra knowledge # + banking_knowledge domain (retrieval pipeline)
uv sync --extra gym # + gymnasium RL interface
uv sync --extra dev # + pytest, ruff, pre-commit (required for contributing)
uv sync --all-extras # everything
This requires uv. Voice features also need system dependencies (brew install portaudio ffmpeg on macOS). See the full installation guide for details.
| Document | Description |
|---|---|
| [Getting Started](docs/getting-started.md) | Installation, API keys, first run, output structure, configuration |
| [CLI Reference](docs/cli-reference.md) | All tau2 commands and options |
```
```bash cp .env.example .env
专业基准测试工具,填补AI工作流评估空白。实验设计严谨,指标体系完整,是智能体系统优化必备参考
AI Skill Hub 为第三方内容聚合平台,本页面信息基于公开数据整理,不对工具功能和质量作任何法律背书。
建议在沙箱或测试环境中充分验证后,再部署至生产环境,并做好必要的安全评估。
✅ MIT 协议 — 最宽松的开源协议之一,可自由商用、修改、分发,仅需保留版权声明。
总体来看,τ-Bench工作流基准测试 是一款质量优秀的Agent工作流,在同类工具中具备一定竞争力。AI Skill Hub 将持续追踪其更新动态,建议收藏备用,结合自身场景选择合适时机引入使用。
| 原始名称 | tau2-bench |
| 原始描述 | 开源AI工作流:τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains。⭐1.2k · Python |
| Topics | 基准测试智能体评估工作流工具调用LLM评估 |
| GitHub | https://github.com/sierra-research/tau2-bench |
| License | MIT |
| 语言 | Python |
收录时间:2026-05-20 · 更新时间:2026-05-30 · License:MIT · AI Skill Hub 不对第三方内容的准确性作法律背书。
选择 Agent 类型,复制安装指令后粘贴到对应客户端