roll it just works.

AI Agent Engineering Delivery Framework

Roll out features with AI agents — Move fast, no sprints.

BIPO · 2026

Context

A Shift is Happening

AI coding tools have evolved from "auto-completing a few lines" to "delivering entire features."
The developer's role is shifting from writing code to directing AI.

Before: Humans Write Code

Developers write every line themselves. Quality depends on individual skill and experience.
Companies constrain people through architecture standards, code reviews, and QA processes.

Now: AI Writes Code, Humans Decide

AI handles the actual coding. But with the same AI tool, the same model —
different people get dramatically different results.

The problem isn't "which AI tool to use" — just pick one and standardize.
The problem is: why does the same tool produce great results for some and garbage for others?

Core Problem

The Constraint System Hasn't Caught Up

We used to have an entire system to constrain people — architecture standards, processes, code reviews.
Now AI is the primary executor, but the constraint system for AI is almost nonexistent.

1. Architecture Constraints

AI doesn't know your architecture conventions, module boundaries, or forbidden zones. Different prompts = different outputs.

2. Methodology Application

Constraints must be built into the project so AI follows them automatically — not documents people read once.

At its core, ROLL solves a people problem
ROLL bakes constraints and methodology into the project, so output quality is consistent regardless of who's driving.

Solution

What ROLL Actually Is

An autonomous delivery system for software teams — AI agents pick stories from your BACKLOG,
execute them with encoded engineering discipline, and ship continuously while you stay focused on what to build next.

🔄

Autonomous Delivery

roll loop on runs BACKLOG items hourly. Dream scans code health nightly and surfaces maintenance tasks. Humans retain sole release authority — the system never ships to production without your approval.

🔧

Skill-Driven Execution

22 skills encode TDD, TCR, and INVEST practices as reliable, repeatable workflows any agent can follow. Works with Claude, Cursor, Codex, or your own agent — swap the tool, keep the discipline.

In One Sentence

ROLL turns a BACKLOG into shipped code continuously. Engineering practices are encoded as executable skills — reliable enough for an agent to run unattended, disciplined enough to ship production code.

Pipeline

The Delivery Pipeline

From raw idea to production — five stages, three loops, one continuous flow.

Idea

Raw thought
Anyone can submit

Backlog

AC ready
Feature Doc done

Build

TCR micro-steps
Spar · Review · CI

Verify

Deploy to Test/UAT
Live Evidence

Release

Deploy to Prod
Sentinel takes over

Loop A · Think It Through
Loop B · Build It Right
Loop C · Keep Watch

↩ Loop C finds issues → auto-creates new Idea → back to Pipeline

Collaboration

Human × AI: Who Drives Each Phase?

The two ends need human judgment. The middle runs on autopilot.

🧑‍💻 → 🤖

Human Asks AI

"Help me research this, break it into Stories"

Idea → Backlog

🤖 ⚡

AI Self-Drives

ROLL Loop auto-delivers. No babysitting needed.

Backlog → Build → Verify

🤖 → 🧑‍💻

AI Nudges Human

"UAT passed. Ship it?"

Verify → Release

The more automated the middle is, the more humans can focus on the two ends — deciding what to build and when to ship. These are judgment calls AI shouldn't make alone.

Loop A

Loop A: Think It Through

From vague idea to executable Backlog — powered by DDD and structured design.

1. DDD Domain Modeling

Establish Bounded Contexts, Ubiquitous Language, and Context Maps. Ensure engineering speaks the same language as business.

2. Research — $roll-research

HV Analysis: the vertical axis traces the full lifecycle, the horizontal axis compares competitors, and insights come from crossing the two. Output: PDF report.

3. Design — $roll-design

Solution exploration with DDD modeling, architecture decisions, interface definitions, data models. When uncertain, explores multiple options before committing.

4. Decompose + AC

Break into INVEST-compliant User Stories with acceptance criteria. Write to BACKLOG.md + docs/features/. Each Story is independently deliverable.

$roll-research $roll-design $roll-idea $roll-propose
Backlog

Backlog: The Single Source of Truth

Everything flows through BACKLOG.md — four work item types, each with different Loop A depth.

User Story

Business requirements, product features. Full Loop A: DDD → Research → Design → AC. The heaviest investment upfront.

ID: US-XXX

Refactor

Tech debt, architecture improvements. Often generated by $roll-.dream's nightly scans. Medium Loop A: impact analysis → plan → AC.

ID: REFACTOR-XXX

Fix

Bug fixes, Sentinel alerts, user reports. Light Loop A: locate root cause → AC. Fast path from diagnosis to delivery.

ID: FIX-XXX

Spike

Exploratory research. Loop A only — the output is knowledge, not code. May produce new User Stories or Refactors.

ID: SPIKE-XXX

Priority Order

FIX (bugs first) > US (user stories) > REFACTOR (tech debt). Automated by $roll-loop — the autonomous executor scans BACKLOG hourly and routes each item to the right skill.
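
The priority rule above can be sketched in a few lines of Python. This is an illustration only, not ROLL's actual implementation: the names `PRIORITY`, `sort_backlog`, and `next_item` are hypothetical, the input is assumed to be a plain list of IDs, and ranking SPIKE last is an assumption because the slide gives it no rank.

```python
# Hypothetical sketch of the FIX > US > REFACTOR ordering; not ROLL's real code.
PRIORITY = {"FIX": 0, "US": 1, "REFACTOR": 2, "SPIKE": 3}  # SPIKE rank is an assumption


def sort_backlog(item_ids):
    """Order backlog IDs such as 'FIX-012' by type priority, then by number."""
    def key(item_id):
        prefix, _, number = item_id.partition("-")
        return (PRIORITY[prefix], int(number))
    return sorted(item_ids, key=key)


def next_item(item_ids):
    """Return the ID a loop would pick first, or None for an empty backlog."""
    ordered = sort_backlog(item_ids)
    return ordered[0] if ordered else None
```

With the IDs used later in the walkthrough, `next_item(["REFACTOR-015", "US-001", "FIX-012"])` would pick `FIX-012` first.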

Fundamentals

TDD + TCR: The Build Rhythm

Test-Driven Development writes the standard first.
TCR (Test && Commit || Revert) enforces it mechanically.

1. Write Test
2. Write Code
3. Auto-Verify
✅ Pass → Auto-Commit
❌ Fail → Auto-Revert
2-5 min

Per micro-step
Worst case: lose 5 minutes

100%

Every save is verified
Code is always runnable

Fully Auto

AI executes autonomously
No human babysitting
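
The "Test && Commit || Revert" rule is small enough to sketch directly. The Python below is a minimal illustration under stated assumptions, not how ROLL implements it: `tcr_step` and `git_tcr_step` are hypothetical names, and wiring the commit and revert steps to `git commit` and `git reset --hard` is one common choice rather than something the deck specifies.

```python
import subprocess


def tcr_step(run_tests, commit, revert):
    """One TCR micro-step: commit if the tests pass, otherwise revert."""
    if run_tests():
        commit()
        return "committed"
    revert()
    return "reverted"


def git_tcr_step(test_command, message="tcr: micro-step"):
    """Assumed git wiring: stage and commit on green, hard-reset on red.

    test_command is an argument list, for example ["pytest", "-q"].
    """
    def run_tests():
        # A zero exit code is treated as "tests passed".
        return subprocess.run(test_command).returncode == 0

    def commit():
        subprocess.run(["git", "add", "-A"], check=True)
        # Raises CalledProcessError if there is nothing to commit.
        subprocess.run(["git", "commit", "-m", message], check=True)

    def revert():
        # Restores tracked files to HEAD; untracked files are left in place.
        subprocess.run(["git", "reset", "--hard"], check=True)

    return tcr_step(run_tests, commit, revert)
```

Keeping `tcr_step` free of git calls makes the rule itself easy to test with stand-in functions, while `git_tcr_step` shows one possible real wiring.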

Loop B

Loop B: Build It Right

The full delivery pipeline — from Backlog item to verified deployment.

1. Read Story → Break into Actions (2-5 min each)

Decompose into minimal deliverable Actions. Independent Actions run in parallel.

2. TCR Micro-Loop (per Action)

Write test (RED) → Write code (GREEN) → Self-review ($roll-.review) → Auto-commit. Fail = auto-revert.

3. Local CI Gate

Lint + type check + full test suite + build. All must pass before push.

4. Push → Remote CI (Objective Arbiter)

CI re-verifies in a clean environment — the final ruling on "shippable."

5. Deploy to Test/UAT → Verification Gate

Live Evidence required — screenshots, curl responses, test outputs. AI saying "I checked, it works" doesn't count.

6. Release → Deploy to Production

Human approves. BACKLOG status: ✅ Done. Sentinel takes over monitoring.
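
The "all must pass before push" rule of the Local CI Gate in step 3 can be sketched as an ordered list of checks that stops at the first failure. This is a hedged illustration: `run_gates` and the gate names are hypothetical, and each check is assumed to be a zero-argument function returning True or False.

```python
def run_gates(gates):
    """Run (name, check) pairs in order and stop at the first failing gate.

    Returns (True, None) when every gate passes,
    or (False, name) for the first gate that fails.
    """
    for name, check in gates:
        if not check():
            return False, name
    return True, None


# Example wiring with placeholder checks; real checks would invoke the
# project's own lint, type-check, test, and build commands.
example_gates = [
    ("lint", lambda: True),
    ("typecheck", lambda: True),
    ("tests", lambda: True),
    ("build", lambda: True),
]
```

A push would only proceed when `run_gates(example_gates)` returns `(True, None)`. Later gates are skipped after a failure, so the cheapest checks are best placed first.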

Quality System

Four Lines of Defense

Every layer is automated. None depend on human patience or memory.

L1: TCR Micro-Verification

Verify every 2-5 minutes. Define the standard before writing code, and auto-revert if the change fails it. A bug is caught within the micro-step that introduced it.

$roll-build

L2: Spar Adversarial Drill

For critical modules (payments, auth), red-blue testing. One AI attacks, another defends. Max 5 rounds of escalating intensity.

$roll-spar

L3: Auto Review

Self Review — per-commit 6-dimension check.
Peer Review — cross-agent negotiation for risky decisions.
Dream — nightly code health scan (6 dimensions).

$roll-.review · $roll-peer · $roll-.dream

L4: Sentinel Production Patrol

24/7 random-sample monitoring. Alerts only after 3 consecutive failures (false-positive prevention). Auto-creates fix tasks.

$roll-sentinel
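
The "3 consecutive failures" rule can be sketched as a small counter. This is an assumption-labelled illustration rather than Sentinel's actual code: the class name `StrikeCounter` is hypothetical, and firing the alert only once per failure streak is a design assumption the slide does not state.

```python
class StrikeCounter:
    """Hypothetical 3-strike alerting: alert on the Nth consecutive failure."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, check_passed):
        """Record one check result; return True exactly when an alert should fire."""
        if check_passed:
            # Any passing check resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire once, at the moment the streak reaches the threshold.
        return self.consecutive_failures == self.threshold
```

A single transient failure followed by a pass never reaches the threshold, which is the false-positive protection the slide describes.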

Loop C

Loop C: Keep Watch

Two complementary monitors: Sentinel watches runtime, Dream watches code structure.

$roll-sentinel — Runtime Patrol

Random-sample monitoring of production systems. Cost-controlled AI validation with intelligent spot-checking.

Patrol Modes:

Light: 5 checks/day · Intensive: 20 checks/hour (post-release) · Full sweep: weekly

Output: FIX-XXX entries in BACKLOG.md

$roll-.dream — Nightly Health Scan

Runs at 3am. Six dimensions of code health analysis:

1. Dead Code · 2. Architectural Drift · 3. Pruning Candidates · 4. Emerging Patterns · 5. Doc Coverage · 6. Doc Freshness

Output: REFACTOR-XXX entries in BACKLOG.md

Sentinel monitors behavior. Dream monitors structure.
Together they detect degradation in both runtime health and code quality before users notice.

Autonomous

Three-Layer Autonomous Model

ROLL runs three autonomous layers (Loop, Dream, Peer) under a human layer that keeps the judgment calls. Each layer has clear boundaries.

🧑‍💻

Human Layer

Set goals, review proposals, approve releases. The judgment calls.

roll-release · roll-propose
🔄

Loop Layer

Scans BACKLOG hourly (10am–6pm). Auto-routes each item to the right skill. FIX > US > REFACTOR.

roll-loop · roll-brief
🌙

Dream Layer

3am nightly code health scan. 6 dimensions. Generates REFACTOR entries autonomously.

roll-.dream
🤝

Peer Layer

Cross-agent negotiation for high-risk decisions. Up to 3 rounds. No consensus → escalate to human.

roll-peer
Design Principle

Humans set direction and approve releases. Everything else — building, reviewing, monitoring, refactoring — can run autonomously. The system never ships to production without human approval.
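
The Peer Layer rule ("up to 3 rounds, no consensus → escalate to human") can be sketched as a bounded loop. This is a minimal illustration with hypothetical names: `negotiate` and the `review_round` callback are assumptions, and the callback is assumed to return True once the agents agree.

```python
def negotiate(review_round, max_rounds=3):
    """Call review_round(n) for n = 1..max_rounds until it reports consensus.

    Returns ("consensus", n) for the round that reached agreement,
    or ("escalate_to_human", max_rounds) if no round did.
    """
    for n in range(1, max_rounds + 1):
        if review_round(n):
            return "consensus", n
    return "escalate_to_human", max_rounds
```

The explicit round cap keeps the negotiation from running indefinitely, and the escalation result keeps the final call with a human, in line with the design principle above.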

Skill System

ROLL's 22 Skills

13 Core + 4 Autonomous + 5 Support — each maps to a specific phase.

Skill             Tier         What It Does
$roll-research    Research     HV analysis: timeline + competitive landscape → PDF report
$roll-design      Design       DDD modeling, solution design, INVEST story breakdown
$roll-idea        Capture      Fast backlog capture: one-liner in, classified BACKLOG entry out
$roll-propose     Propose      Generate 1-3 structured US drafts → PROPOSALS.md for human review
$roll-build       Build        Universal entry: US-XXX / FIX-XXX / plain text → TCR delivery
$roll-spar        Adversarial  Red-blue drill: Attacker writes exploits, Defender patches
$roll-fix         Fix          Single-bug fix + mandatory regression test
$roll-debug       Diagnose     Black Box probe: Console/Network/DOM/Perf → root cause
$roll-sentinel    Patrol       Production random-sample monitoring, 3-strike alerting
$roll-doc         Document     Auto-scan, index, gap analysis, fill for project docs
$roll-notes       Journal      Project diary: records dev moments chronologically
$roll-doctor      Maintain     ROLL self-health check (symlinks/config/skill status)
$roll-release     Release      Version bump, tag, push; triggers npm auto-publish
$roll-loop        Auto         Hourly BACKLOG executor: routes items to skills
$roll-peer        Auto         Cross-agent peer review, up to 3 negotiation rounds
$roll-brief       Auto         Owner-facing briefing: done, in-progress, queue, escalations
$roll-.dream      Auto         Nightly 6-dimension code health scan → REFACTOR entries
$roll-.review     Hidden       Per-commit self-review: correctness, security, maintainability
$roll-.changelog  Hidden       Auto-generates CHANGELOG.md from completed stories
$roll-.qa         Hidden       Test pyramid standards: unit/E2E/visual/smoke + CI gates
$roll-.echo       Hidden       Passive intent clarification for vague inputs
$roll-.clarify    Hidden       Scope clarification for under-specified Fly mode inputs
Comparison

Scrum QA vs. ROLL Auto-QA

Quality assurance isn't removed — the implementation is upgraded.

Scrum QA (Traditional)

🐌 Dev finishes → wait for QA availability
🐌 Bugs surface only in test phase
🐌 Manual regression: slow, error-prone
🐌 Post-ship issues found by user complaints
🐌 Quality depends on individual diligence

ROLL Auto-QA

Test while writing — bugs caught at birth
Fix cost approaches zero (max 5 min loss)
Auto regression + nightly Dream scans
Issues found before users notice (Sentinel)
Standards built into the system, not people
Structure

What a ROLL-Managed Project Looks Like

roll init — one command, 5 seconds.

my-project/
├── AGENTS.md   ← Engineering conventions (shared by all AI tools)
├── BACKLOG.md   ← Single source of truth (US / FIX / REFACTOR / SPIKE)
├── PROPOSALS.md   ← AI-generated proposals awaiting human review
├── docs/features/   ← Story details + acceptance criteria
├── docs/domain/   ← DDD context maps + ubiquitous language
├── docs/dream/   ← Nightly health scan reports
├── .github/workflows/
│   ├── ci.yml   ← Auto quality gate on every commit
│   └── sentinel.yml   ← Scheduled production patrols
└── ... source code
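
A quick way to check a project against the layout above is to list which expected paths are missing. The sketch below is illustrative only: `EXPECTED` and `missing_paths` are hypothetical names, not part of the ROLL CLI, and it assumes every entry in the tree exists right after `roll init`, which the deck does not confirm (some files may be created later).

```python
from pathlib import Path

# Paths taken from the project tree above; treating all of them as
# required immediately after init is an assumption.
EXPECTED = [
    "AGENTS.md",
    "BACKLOG.md",
    "PROPOSALS.md",
    "docs/features",
    "docs/domain",
    "docs/dream",
    ".github/workflows/ci.yml",
    ".github/workflows/sentinel.yml",
]


def missing_paths(project_root):
    """Return the expected paths that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

An empty list from `missing_paths("my-project")` would mean the directory matches the layout shown on the slide.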
Configuration

One Command, Every AI Tool Aligned

roll setup syncs conventions to all AI tools simultaneously.

~/.roll/
Unified config
~/.claude/ → CLAUDE.md + skills/
~/.gemini/ → GEMINI.md + skills/
~/.cursor/ → rules + skills/
~/.kimi/ ~/.trae/ ...

Zero Intrusion

Never overwrites existing configs. Writes to its own file, appends via @include.

Symlinked Skills

Update ROLL, re-run setup — every AI tool upgrades in seconds.

3 CLI Commands

roll setup · roll init · roll update

Walkthrough

A Feature's Complete Journey

Example: building a "User Login" feature across all three loops.

9:00 AM — Idea enters the Pipeline

PM submits: "We need user login." $roll-design runs DDD modeling, decomposes into 3 Stories (password auth, OAuth, remember me), writes AC. Idea → Backlog.

9:30 AM — $roll-loop picks up US-001

$roll-build starts TCR delivery. Micro-step rhythm: verify + commit every 3 minutes. 12 tests passing in 30 min. 8 micro-commits.

10:00 AM — $roll-spar auto-triggers

Auth module detected as high-risk. Attacker attempts SQL injection, brute force, session hijacking. Defender patches. 5 rounds, coverage: 71% → 93%.

10:15 AM — $roll-peer reviews the design

Cross-agent negotiation flags a session management concern. 2 rounds of discussion. Consensus reached, implementation adjusted.

10:30 AM — CI green → Deploy to UAT → Evidence

Screenshots + curl responses captured. Verify stage complete. AI nudges: "UAT passed. Ready to release?"

10:45 AM — Human approves → Release to Prod

Deployed. BACKLOG status: ✅ Done. $roll-sentinel begins monitoring.

3:00 AM — $roll-.dream runs nightly scan

Detects emerging pattern: 3 similar auth helpers could be extracted. Creates REFACTOR-015 in BACKLOG.

Next Day — $roll-sentinel catches anomaly

OAuth endpoint response time degrading (3 consecutive failures). Auto-creates FIX-012. $roll-fix patches + regression test. Resolved before users notice.

Full Picture

APE × ROLL Framework

Pipeline · Loops · Human×AI · Four Defenses · Three-Layer Autonomy — everything on one page.

Fast
Requirement to production: hours
🛡️
Solid
Four lines of defense · Live evidence
🤝
Aligned
22 Skills · Standards as code
Loop A · Think It Through
Idea
Anyone can submit · Business · Product · Engineers
↓ DDD → Research → Design → AC
Backlog
AC ready · Feature Doc done
User Story · Refactor · Fix · Spike
research · design · idea · propose
Loop B · Build It Right
Build
TCR micro-steps 2-5 min · Spar · Review · CI
build · spar · fix · .review · peer · .qa
↓ CI green · Code merged
Verify
Deploy to Test/UAT · Live Evidence capture
debug
Loop C · Keep Watch
Release
Deploy to Prod
Sentinel runtime patrol
Dream nightly code scan
sentinel · .dream
🧑‍💻→🤖 Human Asks AI
🤖⚡ AI Self-Drives
🤖→🧑‍💻 AI Nudges Human
↩ Loop C finds issues → auto-creates new Idea → back to Pipeline
FOUR LINES OF DEFENSE
L1 · TCR
2-5 min verification
L2 · Spar
Red-blue adversarial drill
L3 · Review
Self · Peer · Dream
L4 · Sentinel
Production patrol
THREE-LAYER AUTONOMOUS
🧑‍💻
Human
Goals · Approvals · Releases
🔄
Loop
Runs hourly
🌙
Dream
3am health check
🤝
Peer
Cross-agent negotiation
Summary

ROLL's Core Logic

Take 20 years of proven engineering practices (TDD / TCR / CI / DDD / SRE)
and encode them as standardized AI Agent work instructions.
AI won't cut corners, won't get tired, won't "skip the tests this time" —
because that branch doesn't exist in its instructions.

Fast

Requirement to production: hours
Zero-rework micro-step delivery
New dev onboards in 5 minutes

🛡️

Solid

Four automated defense lines
Live evidence verification
Sentinel + Dream 24/7 watch

🤝

Aligned

22 skills, one unified system
Three-layer autonomy
Human decides, AI delivers

@seanyao/roll · MIT · 22 Skills · npm install -g @seanyao/roll
