AI Agent Engineering Delivery Framework
Roll out features with AI agents: move fast, no sprints.
@seanyao/roll · 2026
AI coding tools have evolved from "auto-completing a few lines" to "delivering whole features." The developer's role is shifting from writing code to directing AI.
Before AI, developers wrote every line themselves. Quality depended on individual skill, and companies constrained people through architecture standards, code reviews, and QA processes.
Now AI handles the actual coding. With the same tool and the same model, different people get dramatically different results.
The problem isn't "which AI tool to use"; just pick one and standardize. The real question is: why does the same tool produce great results for some and garbage for others?
We used to have an entire system to constrain people: architecture, processes, reviews. AI is the primary executor now, but the constraint system for AI is almost nonexistent.
AI doesn't know your architecture conventions, module boundaries, or forbidden zones, so different prompts produce different outputs.
Constraints must be built into the project so that AI follows them automatically, not left in documents that people read once and forget.
ROLL solves a people problem. By baking constraints and methodology into the project, it keeps output quality consistent regardless of who's driving.
An autonomous delivery system for software teams. AI agents pick stories from your BACKLOG, execute them with encoded engineering discipline, and ship continuously while you focus on what to build next.
`roll loop on` runs BACKLOG items hourly. Dream scans code health nightly. Humans retain sole release authority.
23 skills encode TDD, TCR, DDD, and INVEST as repeatable workflows any agent can follow. Works with Claude, Cursor, Codex, and Gemini: swap the tool, keep the discipline.
ROLL turns a BACKLOG into shipped code continuously. Engineering practices are encoded as executable skills, reliable enough for an agent to run unattended and disciplined enough to ship production code.
From raw idea to production: five stages, three loops, one continuous flow.
Idea: Raw thought. Anyone can submit.
Backlog: AC ready. Feature doc done.
Build: TCR micro-steps. Spar · Review · CI.
Verify: Deploy to Test/UAT. Live evidence.
Release: Deploy to Prod. Sentinel takes over.
Loop A: Idea → Backlog
Loop B: Backlog → Verify
Loop C: Verify → Release → Patrol
↩ Loop C finds an issue → auto-creates new Idea → back to Pipeline
The two ends need human judgment. The middle runs on autopilot.
"Help me research this, break it into Stories"
Idea → Backlog
ROLL Loop auto-delivers. No babysitting needed.
Backlog → Build → Verify
"UAT passed. Ship it?"
Verify → Release
The more automated the middle is, the more humans can focus on the two ends: deciding what to build and when to ship. These are judgment calls AI shouldn't make alone.
From vague idea to executable Backlog, powered by DDD and structured design.
Establish Bounded Contexts, Ubiquitous Language, and Context Maps. Ensure engineering speaks the same language as the business.
HV Analysis: the vertical axis traces the full lifecycle, the horizontal axis compares competitors, and crossing the two produces insights. Output: PDF report.
Solution exploration with DDD modeling, architecture decisions, interface definitions, and data models. Explores multiple options before committing.
Break the work into INVEST-compliant User Stories with acceptance criteria. Write them to BACKLOG + features/. Each Story is independently deliverable.
Everything flows through BACKLOG.md: four work item types, each with a different Loop A depth.
Business features. Full Loop A: DDD → Research → Design → AC. The heaviest investment.
ID: US-XXX
Tech debt, architecture cleanup. Often surfaced by $roll-.dream nightly scans. Medium Loop A.
ID: REFACTOR-XXX
Bug fixes, Sentinel alerts, user reports. Light Loop A: locate root cause → AC. Fast path.
ID: FIX-XXX
Exploratory research. Loop A only: the output is knowledge, not code. May spawn Stories or Refactors.
ID: SPIKE-XXX
Priority: FIX (bugs first) > US (user value) > REFACTOR (tech debt). Automated by $roll-loop: the autonomous executor scans BACKLOG hourly and routes each item to the right skill.
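The priority rule above can be sketched as a small sort. This is an illustration only, not ROLL's actual implementation: the `WorkItem` type, the `parse_item` and `next_items` helpers, the item titles, and the choice to rank SPIKE last are all assumptions made for the example. Only the ID prefixes and the FIX > US > REFACTOR order come from the text.

```python
import re
from dataclasses import dataclass

# Priority order taken from the text: FIX > US > REFACTOR.
# The text does not rank SPIKE; placing it last is an assumption.
PRIORITY = {"FIX": 0, "US": 1, "REFACTOR": 2, "SPIKE": 3}

# Work item IDs follow the pattern TYPE-NUMBER, e.g. FIX-012.
ID_PATTERN = re.compile(r"^(FIX|US|REFACTOR|SPIKE)-(\d+)$")


@dataclass(frozen=True)
class WorkItem:
    kind: str      # FIX, US, REFACTOR, or SPIKE
    number: int    # numeric part of the ID
    title: str     # illustrative title, not from the source


def parse_item(item_id: str, title: str) -> WorkItem:
    """Split an ID such as 'FIX-012' into its type and number."""
    match = ID_PATTERN.match(item_id)
    if match is None:
        raise ValueError(f"unrecognized work item ID: {item_id!r}")
    return WorkItem(kind=match.group(1), number=int(match.group(2)), title=title)


def next_items(items: list[WorkItem]) -> list[WorkItem]:
    """Order items by type priority, then by ID number within a type."""
    return sorted(items, key=lambda item: (PRIORITY[item.kind], item.number))


backlog = [
    parse_item("REFACTOR-015", "Extract shared auth helpers"),
    parse_item("US-003", "Remember-me login"),
    parse_item("FIX-012", "OAuth endpoint latency"),
]
ordered = [f"{item.kind}-{item.number:03d}" for item in next_items(backlog)]
print(ordered)  # ['FIX-012', 'US-003', 'REFACTOR-015']
```

Sorting on a `(priority, number)` tuple keeps the rule declarative: changing the order means editing one dictionary rather than branching logic.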
Test-Driven Development writes the standard first. TCR (Test && Commit || Revert) enforces it mechanically.
The full delivery pipeline, from Backlog item to verified deployment.
Decompose into minimal deliverable Actions. Independent Actions can run in parallel.
RED test → GREEN code → self-review ($roll-.review) → auto-commit. Fail = auto-revert.
Lint + type check + full test suite + build. All must pass before push.
CI re-verifies in a clean environment and gives the final ruling on "shippable."
Screenshots, curl responses, and test outputs are required. An AI saying "I checked, it works" doesn't count.
Human approves. BACKLOG status: ✅ Done. Sentinel takes over monitoring.
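The live-evidence rule in the pipeline above can be read as a simple gate: the Verify stage is not complete until every required kind of evidence is attached. The sketch below assumes all three kinds are mandatory for every item; the evidence keys, the `missing_evidence` and `verify_stage_complete` helpers, and the file names are hypothetical and not part of ROLL.

```python
# Evidence kinds named in the text; treating all three as mandatory
# for every work item is an assumption of this sketch.
REQUIRED_EVIDENCE = ("screenshot", "curl_response", "test_output")


def missing_evidence(evidence: dict[str, list[str]]) -> list[str]:
    """Return the required evidence kinds that have no artifact attached."""
    return [kind for kind in REQUIRED_EVIDENCE if not evidence.get(kind)]


def verify_stage_complete(evidence: dict[str, list[str]]) -> bool:
    """The stage passes only when nothing is missing; a bare claim is not enough."""
    return not missing_evidence(evidence)


# Hypothetical artifacts for one story: the curl response was never captured.
collected = {
    "screenshot": ["login-page.png"],
    "curl_response": [],
    "test_output": ["unit-tests.log"],
}
gaps = missing_evidence(collected)
print(gaps)                              # ['curl_response']
print(verify_stage_complete(collected))  # False
```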
Every layer is automated. None depends on human patience or memory.
Verify every 2-5 minutes. Define the standard before writing code, and auto-revert if it fails. Bugs are eliminated the instant they appear.
$roll-build
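A minimal sketch of one TCR micro-step (test && commit || revert), assuming the tests, the commit, and the revert are passed in as callables. The function names are illustrative; `git_commit` and `git_revert` show one plausible way to wire the step to git and are not taken from ROLL itself.

```python
import subprocess
from typing import Callable


def tcr_step(
    test: Callable[[], bool],
    commit: Callable[[], object],
    revert: Callable[[], object],
) -> bool:
    """One TCR micro-step: commit if the tests pass, otherwise revert."""
    if test():
        commit()
        return True
    revert()
    return False


def git_commit(message: str) -> None:
    """Stage everything and commit (assumes the cwd is a git work tree)."""
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)


def git_revert() -> None:
    """Throw away all uncommitted work, including untracked files."""
    subprocess.run(["git", "reset", "--hard"], check=True)
    subprocess.run(["git", "clean", "-fd"], check=True)


# Dry run with in-memory fakes, so the control flow is visible without git.
log: list[str] = []
passed = tcr_step(lambda: True, lambda: log.append("commit"), lambda: log.append("revert"))
failed = tcr_step(lambda: False, lambda: log.append("commit"), lambda: log.append("revert"))
print(log)  # ['commit', 'revert']
```

In real use the step might be called as `tcr_step(run_tests, lambda: git_commit("micro-step"), git_revert)`, where `run_tests` is whatever test command the project uses. Because a failing step discards the work, each step has to stay small enough to redo cheaply.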
For critical modules (payments, auth). One AI attacks, another defends. Up to 5 rounds of escalating intensity.
$roll-spar
Self: per-commit 6-dimension check.
Peer: cross-agent negotiation.
Dream: nightly code health scan.
$roll-.review · $roll-peer · $roll-.dream
24/7 random-sample monitoring. Alerts only after 3 consecutive failures, to avoid false alarms. Auto-creates Fix tasks.
$roll-sentinel
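The three-strike rule can be sketched as a counter that resets on any passing check. The `StrikeCounter` class is hypothetical, and firing the alert exactly once, at the moment the threshold is reached, is an assumption of this sketch rather than documented Sentinel behavior.

```python
from dataclasses import dataclass


@dataclass
class StrikeCounter:
    threshold: int = 3              # consecutive failures needed to alert
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if check_passed:
            # Any success breaks the streak, which filters out one-off blips.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, exactly when the streak reaches the threshold.
        return self.consecutive_failures == self.threshold


counter = StrikeCounter()
results = [False, False, True, False, False, False, False]
alerts = [counter.record(ok) for ok in results]
print(alerts)  # [False, False, False, False, False, True, False]
```

The two early failures never alert because the passing check between them and the later streak resets the count.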
Two complementary monitors: Sentinel watches runtime, Dream watches code structure.
Random-sample monitoring of production. Cost-controlled AI validation with intelligent spot-checking.
Patrol Modes:
Light: 5/day · Intensive: 20/hr (post-release) · Full sweep: weekly
Output: FIX-XXX entries in BACKLOG
Runs at 3am. Six dimensions of code health:
1. Dead Code · 2. Architectural Drift · 3. Pruning Candidates · 4. Emerging Patterns · 5. Doc Coverage · 6. Doc Freshness
Output: REFACTOR-XXX entries in BACKLOG
Sentinel monitors behavior. Dream monitors structure. Together they detect both runtime degradation and code-quality decay before users notice.
ROLL operates at three levels of autonomy, each with clear boundaries.
Set goals, review proposals, approve releases. The judgment calls.
Hourly BACKLOG scan. Auto-routes each item to the right skill. FIX > US > REFACTOR.
3am nightly code health scan. 6 dimensions. Generates REFACTOR entries autonomously.
Cross-agent negotiation on high-risk decisions. Up to 3 rounds. No consensus → escalate to human.
Humans set direction and approve releases. Everything else (building, reviewing, monitoring, refactoring) can run autonomously. The system never ships to production without human approval.
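The peer-negotiation boundary (up to 3 rounds, then escalate to a human) can be sketched as a bounded loop. The `negotiate` function, its callable parameters, and the outcome strings are hypothetical names chosen for this example; only the round limit and the escalation rule come from the text.

```python
from typing import Callable, Optional

MAX_ROUNDS = 3  # round limit stated in the text


def negotiate(
    propose: Callable[[int], str],
    review: Callable[[int, str], bool],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[str, Optional[str], int]:
    """Run up to max_rounds of propose/review; escalate if nobody agrees."""
    for round_number in range(1, max_rounds + 1):
        proposal = propose(round_number)
        if review(round_number, proposal):
            return ("consensus", proposal, round_number)
    # No agreement within the limit: the decision goes back to a human.
    return ("escalate_to_human", None, max_rounds)


# The reviewing agent accepts the second revision.
agreed = negotiate(lambda n: f"proposal v{n}", lambda n, p: n >= 2)
# The reviewing agent never accepts, so the loop gives up after 3 rounds.
stalemate = negotiate(lambda n: f"proposal v{n}", lambda n, p: False)
print(agreed)     # ('consensus', 'proposal v2', 2)
print(stalemate)  # ('escalate_to_human', None, 3)
```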
23 skills spanning design, build, check, autonomous, and support. Each maps to a specific phase.
| Skill | Tier | What It Does |
|---|---|---|
| $roll-research | Research | HV analysis: timeline + competitive landscape → PDF report |
| $roll-design | Design | DDD modeling, solution design, INVEST story breakdown |
| $roll-idea | Capture | Fast backlog capture: one-liner in, classified entry out |
| $roll-propose | Propose | Generate 1-3 structured US drafts → proposals.md for human review |
| $roll-onboard | Onboard | Interactive onboarding for legacy projects: 9 questions → onboard-plan.yaml |
| $roll-build | Build | Universal entry: US/FIX/plain text → TCR delivery |
| $roll-spar | Adversarial | Red-blue drill: Attacker writes exploits, Defender patches |
| $roll-fix | Fix | Single-bug fix + mandatory regression test |
| $roll-debug | Diagnose | Black Box probe: Console/Network/DOM/Perf → root cause |
| $roll-sentinel | Patrol | Production random-sample monitoring, 3-strike alerting |
| $roll-review-pr | PR Review | Agent-agnostic PR review with 3-state verdict |
| $roll-doc | Document | Auto-scan, index, gap analysis, fill for project docs |
| $roll-notes | Journal | Project diary: records dev moments chronologically |
| $roll-doctor | Maintain | ROLL self-health check (skills/symlinks/config/templates) |
| $roll-loop | Auto | Hourly BACKLOG executor: routes items to skills |
| $roll-peer | Auto | Cross-agent peer review, up to 3 negotiation rounds |
| $roll-brief | Auto | Owner-facing briefing: done, in-progress, queue, escalations |
| $roll-.dream | Auto | Nightly 6-dimension code health scan → REFACTOR entries |
| $roll-.review | Hidden | Per-commit self-review: correctness, security, maintainability |
| $roll-.changelog | Hidden | Auto-generates CHANGELOG.md from completed stories |
| $roll-.qa | Hidden | Test pyramid standards: unit/E2E/visual/smoke + CI gates |
| $roll-.echo | Hidden | Passive intent clarification for vague inputs |
| $roll-.clarify | Hidden | Scope clarification for under-specified Fly-mode inputs |
Quality assurance isn't removed; the implementation is upgraded.
`roll init`: one command, a few seconds.
`roll setup` syncs conventions and skills to every AI tool simultaneously.
Never overwrites existing configs. ROLL writes its own file and appends a reference to it via @include.
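The "append, never overwrite" behavior can be illustrated with an idempotent helper. This is a sketch under stated assumptions, not ROLL's code: the `ensure_include` function, the exact `@include <path>` line syntax, and the file names `tool-config.md` and `roll-conventions.md` are all hypothetical.

```python
import tempfile
from pathlib import Path


def ensure_include(config_path: Path, include_target: str) -> bool:
    """Append an @include line once; return True if the file was changed."""
    # Assumed directive syntax; the real format may differ per AI tool.
    directive = f"@include {include_target}"
    existing = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    if directive in existing.splitlines():
        return False  # already referenced, so a re-run is a no-op
    # Only ever append: the user's existing content is left untouched.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    with config_path.open("a", encoding="utf-8") as handle:
        handle.write(f"{separator}{directive}\n")
    return True


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "tool-config.md"          # hypothetical tool config
    config.write_text("# Team rules\nUse tabs.", encoding="utf-8")
    first = ensure_include(config, "roll-conventions.md")
    second = ensure_include(config, "roll-conventions.md")
    final_text = config.read_text(encoding="utf-8")

print(first, second)  # True False
```

Making the helper idempotent is what allows setup to be re-run after every update without duplicating lines or disturbing hand-written configuration.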
Update ROLL, re-run setup, and every AI tool upgrades in seconds.
roll setup · roll init · roll update
Example: shipping a "User Login" feature across all three loops.
PM submits: "We need user login." $roll-design runs DDD modeling, decomposes the request into 3 Stories (password, OAuth, remember-me), and writes AC.
$roll-build starts TCR delivery. Verify + commit every 3 minutes. 12 tests passing in 30 min. 8 micro-commits.
Auth module flagged as high-risk. Attacker tries SQL injection, brute force, session hijacking. Defender patches. 5 rounds, coverage: 71% → 93%.
Cross-agent negotiation flags a session management concern. 2 rounds of discussion. Consensus reached, implementation adjusted.
Screenshots + curl responses captured. Verify stage complete. AI nudges: "UAT passed. Ready to release?"
Deployed. BACKLOG status: ✅ Done. $roll-sentinel begins monitoring.
Dream detects an emerging pattern: 3 similar auth helpers could be extracted. Creates REFACTOR-015 in BACKLOG.
Sentinel detects OAuth endpoint response time degrading (3 consecutive failures). Auto-creates FIX-012. $roll-fix patches + regression test. Resolved before users notice.
Take 20 years of proven engineering practices (TDD / TCR / CI / DDD / SRE)
and encode them as standardized AI Agent work instructions.
AI won't cut corners, won't get tired, won't "skip the tests this time" —
because that branch doesn't exist in its instructions.
Requirement to production: hours
Zero-rework micro-step delivery
New dev onboards in minutes
Four automated lines of defense
Live-evidence verification
Sentinel + Dream 24/7 watch
23 skills, one unified system
Three-layer autonomy
Human decides, AI delivers
@seanyao/roll · MIT · 23 skills · npm install -g @seanyao/roll