🔁 OrchestKit × Fable 5 Loop Design: Where We Stand & Gaps

Gap analysis vs. Lance Martin's three pattern families (2026-06-09): self-correction loops (/goal · rubrics · Outcomes), verifier sub-agents over self-critique, and the memory progression fail → investigate → verify → distill → consult. Repo state: v8.29.1, post PR #2339 (Fable 5 model vocabulary).

Composite: 5.9/10

🔁 Self-correction loops: 6.2 · ⚖️ Verifier sub-agents: 7.0 · 🧠 Memory progression: 4.4

Pattern families → OrchestKit surfaces

Green = strong implementation · amber = partial · red = missing.

flowchart LR
  classDef good fill:#1d3325,stroke:#3fb950,color:#e6edf3
  classDef warn fill:#332a14,stroke:#d29922,color:#e6edf3
  classDef bad fill:#3a1d1d,stroke:#f85149,color:#e6edf3
  classDef fam fill:#2a1d3a,stroke:#d2a8ff,color:#e6edf3,font-weight:bold

  F1["🔁 Self-correction loops<br/>6.2/10"]:::fam
  F2["⚖️ Verifier sub-agents<br/>7.0/10"]:::fam
  F3["🧠 Memory outer loop<br/>4.4/10"]:::fam
  F1 --> A1["prd-to-goal 9/10<br/>PRD → /goal assertions"]:::good
  F1 --> A2["brainstorm iterative-opt 8.5/10<br/>autoresearch metric loop"]:::good
  F1 --> A3["cover heal loop 7/10<br/>ci-sentinel 7/10"]:::warn
  F1 --> A4["rubric-file convention<br/>grader-in-loop · CMA Outcomes"]:::bad
  F2 --> B1["verify 6-7 fork graders<br/>composite GATES merge"]:::good
  F2 --> B2["adversarial refutation<br/>blind refuters + ledger + quorum"]:::good
  F2 --> B3["assess advisory-only<br/>no dimension blockers"]:::warn
  F2 --> B4["machine-readable<br/>rubric contract"]:::bad
  F3 --> C1["investigate 7/10<br/>dream · staleness cron · lint"]:::good
  F3 --> C2["consult 4/10<br/>only 2/6 auto-inject"]:::warn
  F3 --> C3["VERIFY loop never closes<br/>distill fragmented · journal unbuilt"]:::bad

✅ Strongest: independent-context verification

verify/review-pr/assess already do what Lance recommends: grading in fork contexts, blind adversarial refuters, citation re-verification, anti-sycophancy. The bones of "Outcomes" exist.

verify · review-pr · adversarial-refutation.md · bare-eval

⚠️ Partial: loops exist but each is bespoke

prd-to-goal is a reference rubric-as-assertions implementation, but every loop (cover ×3, brainstorm stuck-5, ci-sentinel hourly) hard-codes its own convergence logic. There is no shared rubric file and no grader auditing /goal assertions.

prd-to-goal · cover · ci-sentinel · ScheduleWakeup

🔴 Weakest: memory stops at step 2–3

Per Lance's taxonomy, OrchestKit memory behaves like "Opus 4.7": good investigation, low verification coverage, fragmented distillation (recent-decisions.md is corrupted), and mostly manual consultation. The Fable 5 differentiator, closing the loop from fail → … → consult, is the open prize.

dream · staleness cron (unconsumed) · recent-decisions.md

🔁 Family 1: Self-correction loops · 6.2/10

"Let Claude run, collect feedback via the goal or rubric, self-correct, and proceed until satisfied."

flowchart LR
  classDef have fill:#1d3325,stroke:#3fb950,color:#e6edf3
  classDef miss fill:#3a1d1d,stroke:#f85149,color:#e6edf3,stroke-dasharray:5 4
  R["📋 Rubric / Goal"]:::have --> RUN["▶️ Run task"]:::have
  RUN --> FB["📊 Collect feedback<br/>tests · metrics · CI"]:::have
  FB --> SC["🔧 Self-correct"]:::have
  SC --> RUN
  FB --> DONE{"rubric<br/>satisfied?"}:::have
  DONE -- yes --> STOP["⏹️ stop"]:::have
  RF["📄 rubric.json file contract<br/>(user-editable, cross-skill)"]:::miss -. missing .-> R
  GR["🧑‍⚖️ grader audits assertions<br/>on /goal timeout"]:::miss -. missing .-> DONE
  CMA["☁️ CMA Outcomes integration"]:::miss -. missing .-> STOP
| Surface | Loop mechanism | Score | Missing |
|---|---|---|---|
| prd-to-goal | PRD → /goal until observable assertions | 9.0 | assertion-quality grader |
| brainstorm iterative-opt | baseline → ideate → measure → keep/discard → stuck-5 | 8.5 | rubric file; underused |
| audit-full | /goal until findings.critical == 0 | 8.0 | configurable severity ladder |
| ScheduleWakeup | cache-aware poll → decide → reschedule | 7.5 | no rubric, imperative only |
| cover (healer) | generate → run → heal failures ×3 | 7.0 | hard-coded 3; no test-quality rubric |
| ci-sentinel | hourly classify-failing-PRs, propose-don't-apply | 7.0 | no auto-learning from approved fixes |
| verify / assess | rubrics exist but grade once; no loop | 5.0 | no "score < target → improve → re-run" |
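The missing "score < target → improve → re-run" step, and the limits each loop hard-codes today (cover's ×3 heals, brainstorm's stuck-5), could share one bounded loop. A minimal sketch, assuming the grader and the improvement step are injected as callables; `loop_until_rubric`, `LoopResult`, `grade`, `improve`, `max_rounds`, and `patience` are hypothetical names, not existing OrchestKit APIs.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass
class LoopResult(Generic[T]):
    artifact: T
    score: float
    rounds: int   # number of improve() calls made
    reason: str   # "target met", "max rounds", or "stuck"


def loop_until_rubric(
    artifact: T,
    grade: Callable[[T], float],        # hypothetical grader; ideally a fresh-context verifier
    improve: Callable[[T, float], T],   # hypothetical producer step that sees the last score
    target: float,
    max_rounds: int = 3,                # cover's hard-coded 3, lifted into a parameter
    patience: int = 5,                  # brainstorm-style "stuck-5" counter
) -> LoopResult[T]:
    best, best_score = artifact, grade(artifact)
    stuck = 0
    for round_no in range(1, max_rounds + 1):
        if best_score >= target:
            return LoopResult(best, best_score, round_no - 1, "target met")
        candidate = improve(best, best_score)
        score = grade(candidate)
        if score > best_score:          # keep/discard: only accept strict improvements
            best, best_score, stuck = candidate, score, 0
        else:
            stuck += 1
            if stuck >= patience:
                return LoopResult(best, best_score, round_no, "stuck")
    reason = "target met" if best_score >= target else "max rounds"
    return LoopResult(best, best_score, max_rounds, reason)


# Toy run: the "artifact" is a number and the grader scores it out of 10.
result = loop_until_rubric(3, grade=lambda n: n / 10, improve=lambda n, s: n + 2, target=0.8)
print(result)  # LoopResult(artifact=9, score=0.9, rounds=3, reason='target met')
```

Keeping `grade` as an injected callable is what would let it run in an independent context (a fork or a --bare judge) instead of as self-critique inside the producer.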

⚖️ Family 2: Verifier sub-agents · 7.0/10

"A verifier sub-agent tends to outperform self-critique, because grading is done in an independent context window." The gap is authority: who can actually block stop?

flowchart TB
  classDef good fill:#1d3325,stroke:#3fb950,color:#e6edf3
  classDef warn fill:#332a14,stroke:#d29922,color:#e6edf3
  classDef bad fill:#3a1d1d,stroke:#f85149,color:#e6edf3
  W["🛠️ Producer (lead context)"] --> V1["verify: 6-7 fork graders<br/>8 dims, composite under 6.0 BLOCKS ✅"]:::good
  W --> V2["review-pr: blind refuters<br/>quorum + ledger + no-auto-flip ✅"]:::good
  W --> V3["assess: refuters but<br/>ADVISORY ONLY ⚠️"]:::warn
  W --> V4["bare-eval: fresh-context judge<br/>primitive, no verdict surface ⚠️"]:::warn
  V1 --> GATE{"stop-gate"}
  V2 --> GATE
  V3 -. no authority .-> GATE
  GATE --> X1["❌ no dimension-level blockers<br/>(security 3.2 hides in passing composite)"]:::bad
  GATE --> X2["❌ no Outcomes-style<br/>grader-must-approve-stop"]:::bad
| Surface | Independent context | Refutation | Stop-gating |
|---|---|---|---|
| verify | ✅ fork ×6-7 | none | YES: composite gates merge |
| review-pr | ✅ fork, no team | ✅ blind, quorum | YES: CRITICAL → request-changes |
| assess | ✅ fork ×4 | ✅ Phase 2.5 | NO: advisory only |
| cover | ✅ worktree/tier | none | implicit (flag after 3 heals) |
| bare-eval / eval-runner | ✅ --bare fresh | none | NO: pipeline primitive |
| brainstorm devil's advocate | ✅ agents | partial | NO: ranking only |
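The "security 3.2 hides in passing composite" failure, and the fix proposed for it (security below 4.0 always blocks), can be shown with a small gate that keeps verify's 6.0 composite check and adds per-dimension vetoes. A minimal sketch; `gate`, `Verdict`, the equal weighting, and every dimension name except security are assumptions, not verify's actual scoring code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Verdict:
    composite: float
    passed: bool
    reasons: list[str] = field(default_factory=list)


def gate(
    scores: dict[str, float],
    weights: dict[str, float],
    composite_min: float = 6.0,                    # verify's existing composite gate
    blockers: Optional[dict[str, float]] = None,   # hypothetical per-dimension floors
) -> Verdict:
    total = sum(weights[d] for d in scores)
    composite = sum(scores[d] * weights[d] for d in scores) / total
    reasons = []
    if composite < composite_min:
        reasons.append(f"composite {composite:.1f} < {composite_min}")
    # A blocker vetoes on its own, so one weak dimension can't hide in the average.
    for dim, floor in (blockers or {}).items():
        if dim in scores and scores[dim] < floor:
            reasons.append(f"{dim} {scores[dim]} < blocker {floor}")
    return Verdict(round(composite, 2), not reasons, reasons)


# Hypothetical dimension names; only "security" and its 3.2 score come from the text.
scores = {"security": 3.2, "correctness": 8.0, "tests": 7.5, "performance": 8.5}
weights = {d: 1.0 for d in scores}
print(gate(scores, weights))                              # composite 6.8: passes
print(gate(scores, weights, blockers={"security": 4.0}))  # vetoed by the security floor
```

The composite check is left untouched; blockers only add reasons to fail, so existing passing verdicts change only where a dimension falls below its floor.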

🧠 Family 3 β€” Memory progression Β· 4.4/10

Fable 5 completes fail → investigate → verify → distill → consult (verification coverage up to 73%). OrchestKit currently exits around step 2–3, which is the "Opus 4.7" profile.

flowchart LR
  classDef s3 fill:#3a1d1d,stroke:#f85149,color:#e6edf3
  classDef s7 fill:#1d3325,stroke:#3fb950,color:#e6edf3
  classDef s5 fill:#332a14,stroke:#d29922,color:#e6edf3
  classDef s4 fill:#332a14,stroke:#d29922,color:#e6edf3
  FAIL["1️⃣ FAIL · 3/10<br/>remember skill, handoffs<br/>❌ no 'ignored-advice' tracking"]:::s3
  INV["2️⃣ INVESTIGATE · 7/10<br/>dream refs, staleness cron,<br/>memory-lint, validator"]:::s7
  VER["3️⃣ VERIFY · 5/10<br/>staleness classified BUT<br/>reports never consumed"]:::s5
  DIS["4️⃣ DISTILL · 3/10<br/>dream dedup exists BUT<br/>recent-decisions.md corrupted,<br/>experiment journal unbuilt"]:::s3
  CON["5️⃣ CONSULT · 4/10<br/>2/6 auto-inject; brainstorm<br/>prints a reminder, not results"]:::s4
  FAIL --> INV --> VER --> DIS --> CON
  CON -. outer loop never closes .-> FAIL
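The 4.4 family score is consistent with an unweighted mean of the five stage scores above, and the 5.9 headline with an unweighted mean of the three family scores (17.6 / 3 ≈ 5.87). The report doesn't state how the scores were computed, so this sketch only checks that arithmetic and ranks the weakest stages; the dictionary keys are illustrative labels.

```python
from statistics import mean

# Stage scores from the memory-progression diagram.
memory_stages = {"fail": 3, "investigate": 7, "verify": 5, "distill": 3, "consult": 4}
# Family scores from the scorecard at the top (keys are illustrative labels).
families = {"self_correction": 6.2, "verifier": 7.0, "memory": 4.4}

memory_score = round(mean(memory_stages.values()), 1)
composite = round(mean(families.values()), 1)
# Stable sort: tied stages keep their diagram order.
weakest = sorted(memory_stages, key=memory_stages.get)[:2]

print(memory_score, composite, weakest)  # 4.4 5.9 ['fail', 'distill']
```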

Gap M1: VERIFY never closes the loop

Nightly staleness reports have zero consumers; no signal records whether a consulted (or ignored) memory changed an outcome. This is the exact step Lance identifies as the Fable 5 differentiator.

staleness_cron.py → docs/reports/* (unread)
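M1's second missing signal, whether a consulted (or ignored) memory changed an outcome, needs an event log before anything can consume it. A minimal sketch as append-only JSONL; the event fields, the `consulted`/`ignored` actions, the pass/fail outcome, and the file name are all hypothetical, not an existing OrchestKit format.

```python
import json
import tempfile
import time
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ConsultEvent:
    memory_id: str  # hypothetical identifier for a memory entry
    action: str     # "consulted" or "ignored"
    outcome: str    # "pass" or "fail" for the task the memory was offered to
    ts: float


def record(log: Path, event: ConsultEvent) -> None:
    # Append-only JSONL: one event per line, never rewritten in place.
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(event)) + "\n")


def outcome_rates(log: Path) -> dict[tuple[str, str], float]:
    """Pass rate per (memory_id, action): did consulting vs. ignoring change outcomes?"""
    tallies: dict[tuple[str, str], list[int]] = {}
    for line in log.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        tally = tallies.setdefault((event["memory_id"], event["action"]), [0, 0])
        tally[0] += int(event["outcome"] == "pass")  # passes
        tally[1] += 1                                # total events
    return {key: passes / total for key, (passes, total) in tallies.items()}


log = Path(tempfile.mkdtemp()) / "consult-events.jsonl"
record(log, ConsultEvent("mem-42", "consulted", "pass", time.time()))
record(log, ConsultEvent("mem-42", "ignored", "fail", time.time()))
record(log, ConsultEvent("mem-42", "consulted", "pass", time.time()))
print(outcome_rates(log))  # {('mem-42', 'consulted'): 1.0, ('mem-42', 'ignored'): 0.0}
```

A real version would need more careful attribution (one task can consult several memories); this only shows the shape of the signal a VERIFY step could read.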

Gap M2: DISTILL fragmented + corrupted index

recent-decisions.md (the distilled-rules index) contains mangled, truncated entries and has no owning write mechanism. Brainstorm's experiment journal is referenced in phase-workflow.md, but the TSV was never implemented.

.claude/rules/recent-decisions.md · dream SKILL.md
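For the unbuilt experiment journal, a minimal sketch using Python's standard `csv` module with a tab delimiter. The column set is hypothetical, chosen to mirror brainstorm's baseline → ideate → measure → keep/discard loop, because the schema intended in phase-workflow.md isn't reproduced here; `append_entry` and `kept_ideas` are illustrative names.

```python
import csv
import tempfile
from pathlib import Path

COLUMNS = ["ts", "idea", "metric", "value", "decision"]  # hypothetical schema


def append_entry(journal: Path, row: dict[str, str]) -> None:
    """Append one experiment; write the header only when creating the file."""
    new_file = not journal.exists()
    with journal.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
        if new_file:
            writer.writeheader()
        # csv quotes any field containing a tab, quote, or newline, so free-text
        # ideas cannot shift columns or split a row.
        writer.writerow(row)


def kept_ideas(journal: Path) -> list[str]:
    with journal.open(newline="", encoding="utf-8") as f:
        return [r["idea"] for r in csv.DictReader(f, delimiter="\t") if r["decision"] == "keep"]


# Toy rows for illustration only.
journal = Path(tempfile.mkdtemp()) / "experiments.tsv"
append_entry(journal, {"ts": "t1", "idea": "cache prompts", "metric": "p50_ms", "value": "412", "decision": "keep"})
append_entry(journal, {"ts": "t2", "idea": "smaller\tbatch", "metric": "p50_ms", "value": "455", "decision": "discard"})
print(kept_ideas(journal))  # ['cache prompts']
```

Giving the file a single writer that delegates quoting to a library is one way to avoid the kind of mangled, truncated entries M2 describes in recent-decisions.md.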

Gap M3: CONSULT is manual

priorDecisionsLoader injects "IMPORTANT: search memory" instead of running the search and injecting results. Only session-handoff + memory-lint auto-inject.

context-loaders-env.ts · session-handoff-injector.ts
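M3's fix has priorDecisionsLoader run the search itself and inject the results, with token cost as the main risk. The real loader is TypeScript (context-loaders-env.ts); this Python sketch only illustrates relevance gating under a budget, and `build_injection`, the `search` callable, `min_score`, and `token_budget` are hypothetical. Tokens are approximated by word count.

```python
from collections.abc import Callable, Iterable

Hit = tuple[float, str]  # (relevance score in [0, 1], decision text)


def build_injection(
    query: str,
    search: Callable[[str], Iterable[Hit]],  # hypothetical memory-search callable
    min_score: float = 0.75,                 # relevance gate (hypothetical default)
    token_budget: int = 200,                 # cap on injected context (hypothetical)
) -> str:
    """Run the search and return text to inject, or "" when nothing clears the gate."""
    picked, used = [], 0
    for score, text in sorted(search(query), reverse=True):
        if score < min_score:
            break  # hits are sorted by score, so the rest are less relevant
        cost = len(text.split())  # crude word-count stand-in for tokens
        if used + cost > token_budget:
            continue  # skip hits that would exceed the budget
        picked.append(f"- ({score:.2f}) {text}")
        used += cost
    return "Prior decisions:\n" + "\n".join(picked) if picked else ""


# Toy index standing in for a real memory search.
toy_index = [
    (0.91, "Use /workflows, don't wrap them."),
    (0.42, "Unrelated note about CI caching."),
    (0.80, "verify: composite under 6.0 blocks merge."),
]
print(build_injection("loop primitive", lambda q: toy_index))
```

Returning an empty string when nothing clears the gate means a session with no relevant history gets no injection at all, rather than a generic "search memory" reminder.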

🕳️ Gap heatmap: ranked by (impact × cross-family reach) / effort

Note that the rubric-file contract and the stop-gating grader each appear in two families: one fix, double coverage.

| # | Gap | Family | Severity | Effort | Fix sketch |
|---|---|---|---|---|---|
| 1 | No machine-readable rubric contract | 🔁 + ⚖️ | HIGH | 3-4 d | ork-rubric/1.0 schema; backfill verify/assess/review-pr/prd-to-goal |
| 2 | assess advisory-only, no stop-gating | ⚖️ | HIGH | 2-3 d | emit assess-verdict.json; gate /ork:implement at <5.5 |
| 3 | Memory VERIFY loop never closes | 🧠 | HIGH | 1 wk | consume staleness reports in CI; record consult→outcome signals |
| 4 | recent-decisions.md corrupted; journal unbuilt | 🧠 | HIGH | 2-3 d | dream owns regeneration; implement brainstorm TSV journal |
| 5 | No grader-in-the-loop for /goal | 🔁 | MED | 2 d | post-timeout grader audits assertion quality |
| 6 | verify lacks dimension-level blockers | ⚖️ | MED | 1-2 d | security <4.0 always blocks regardless of composite |
| 7 | CONSULT manual (4/6 surfaces) | 🧠 | MED | 2-3 d | priorDecisionsLoader runs search, injects results (relevance-gated) |
| 8 | No cross-skill loop primitive | 🔁 | MED | ? | likely a chain-patterns reference, NOT a skill ("/workflows: use don't wrap") |
| 9 | No CMA / Outcomes integration | 🔁 + ⚖️ | LOW* | n/a | watch: hosted product surface, plugin can't reach it yet |

* LOW for the plugin today; HIGH strategically if CMA exposes an API the plugin can target.
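Gap #2's fix turns assess from advisory into a gate: it emits assess-verdict.json and /ork:implement refuses to start below 5.5. Only the file name and the 5.5 threshold come from the table; the JSON layout, the function names, and failing closed when no verdict exists are assumptions in this minimal sketch.

```python
import json
import tempfile
from pathlib import Path

IMPLEMENT_MIN = 5.5  # threshold from gap #2's fix sketch


def write_verdict(path: Path, composite: float) -> None:
    """What assess might emit at the end of a run (hypothetical minimal schema)."""
    path.write_text(json.dumps({"composite": composite}), encoding="utf-8")


def may_implement(path: Path) -> tuple[bool, str]:
    """Pre-flight check for /ork:implement; fails closed when no verdict exists."""
    if not path.exists():
        return False, "no assess verdict; run assess first"
    composite = json.loads(path.read_text(encoding="utf-8"))["composite"]
    if composite < IMPLEMENT_MIN:
        return False, f"composite {composite} < {IMPLEMENT_MIN}"
    return True, "ok"


verdict_file = Path(tempfile.mkdtemp()) / "assess-verdict.json"
print(may_implement(verdict_file))  # (False, 'no assess verdict; run assess first')
write_verdict(verdict_file, 5.1)
print(may_implement(verdict_file))  # (False, 'composite 5.1 < 5.5')
write_verdict(verdict_file, 7.2)
print(may_implement(verdict_file))  # (True, 'ok')
```

A production gate would probably also tie the verdict to the commit it assessed, so a stale passing verdict can't unlock later changes.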

🗺️ Recommended sequencing

A → B → (C reassessed). Approach A closes the top-ranked gap, which spans two families, with one contract.

flowchart TB
  classDef rec fill:#2a1d3a,stroke:#d2a8ff,color:#e6edf3,font-weight:bold
  classDef nor fill:#161b22,stroke:#79c0ff,color:#e6edf3
  classDef opt fill:#161b22,stroke:#8b949e,color:#8b949e,stroke-dasharray:5 4
  A["🅰️ Rubric contract + stop-gating grader · 8.3<br/>ork-rubric/1.0 schema · assess gates implement<br/>verify dimension blockers · /goal assertion grader<br/>≈ 1 week"]:::rec
  B["🅱️ Close the memory VERIFY loop · 7.8<br/>consume staleness reports · consult→outcome signals<br/>fix recent-decisions.md · experiment journal TSV<br/>auto-inject prior decisions · ≈ 1 week"]:::nor
  C["🅲 loop-until-rubric primitive / CMA parity · 6.5<br/>⚠️ devil's advocate: '/workflows: use, don't wrap'<br/>after A this may reduce to a documented pattern"]:::opt
  A -->|"rubric schema unlocks<br/>verification instrumentation"| B
  A -->|"grader gate makes C<br/>mostly free"| C
  B --> DONE["🏁 OrchestKit completes the progression<br/>= 'Fable 5 memory profile' per Lance's taxonomy"]
  C -. reassess .-> DONE

🅰️ Rubric contract + gating: composite 8.3 · RECOMMENDED

One JSON schema (dimensions, weights, min_pass, min_blocker, bands) shared by verify/assess/review-pr/prd-to-goal. assess emits a verdict file that gates /ork:implement; verify gains dimension blockers; /goal gains a post-timeout assertion grader.

Risk: schema versioning burden → keep bands optional, start with the 3 skills that already have rubrics.
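Approach A names the schema's fields (dimensions, weights, min_pass, min_blocker, bands) and keeps bands optional. Because ork-rubric/1.0 is still a proposal, everything else in this minimal loader is an assumption: the `schema` key, placing weight and min_blocker per dimension, the validation rules, and the band labels in the example.

```python
import json
from dataclasses import dataclass
from typing import Optional

SCHEMA_ID = "ork-rubric/1.0"


@dataclass
class Dimension:
    name: str
    weight: float
    min_blocker: Optional[float] = None  # a score below this blocks regardless of composite


@dataclass
class Rubric:
    dimensions: list[Dimension]
    min_pass: float
    bands: dict[str, float]  # optional in the file: label -> lower bound


def load_rubric(text: str) -> Rubric:
    raw = json.loads(text)
    if raw.get("schema") != SCHEMA_ID:
        raise ValueError(f"unsupported schema: {raw.get('schema')!r}")
    dims = [Dimension(d["name"], float(d["weight"]), d.get("min_blocker")) for d in raw["dimensions"]]
    if not dims or any(d.weight <= 0 for d in dims):
        raise ValueError("rubric needs at least one dimension, all with positive weight")
    return Rubric(dims, float(raw["min_pass"]), raw.get("bands", {}))


def band_for(rubric: Rubric, composite: float) -> Optional[str]:
    """Highest band whose lower bound the composite reaches; None if bands are omitted."""
    reached = [(low, label) for label, low in rubric.bands.items() if composite >= low]
    return max(reached)[1] if reached else None


spec = """{
  "schema": "ork-rubric/1.0",
  "min_pass": 6.0,
  "dimensions": [
    {"name": "security", "weight": 2, "min_blocker": 4.0},
    {"name": "correctness", "weight": 3}
  ],
  "bands": {"ship": 8.0, "revise": 6.0, "rework": 0.0}
}"""
rubric = load_rubric(spec)
print([d.name for d in rubric.dimensions], band_for(rubric, 7.1))  # ['security', 'correctness'] revise
```

Treating a missing `bands` key as an empty mapping is one way to honour "keep bands optional": simpler rubrics still load, they just don't get a band label.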

🅱️ Memory verify loop: composite 7.8

Targets the weakest family and the Fable-5-specific differentiator. A weekly job diffs staleness reports and warns on PRs touching stale-memory files; consult events are recorded; dream regenerates recent-decisions.md.

Risk: auto-injection token cost → gate by relevance score.
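Approach B's weekly job reduces to two set operations: diff this week's staleness report against last week's, and intersect the stale set with the files a PR touches. The actual output format of staleness_cron.py under docs/reports/ isn't described here, so this minimal sketch assumes a plain one-path-per-line report; the function names and example paths (other than recent-decisions.md) are hypothetical.

```python
def parse_report(text: str) -> set[str]:
    """Hypothetical report format: one stale memory-file path per line, '#' for comments."""
    paths = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.add(line)
    return paths


def diff_reports(last_week: str, this_week: str) -> tuple[set[str], set[str]]:
    """Return (newly stale, no longer stale) between two weekly reports."""
    old, new = parse_report(last_week), parse_report(this_week)
    return new - old, old - new


def pr_warnings(changed_files: list[str], latest_report: str) -> list[str]:
    """One warning per file in the PR that the latest report marks as stale."""
    stale = parse_report(latest_report)
    return [f"{path}: marked stale in latest staleness report" for path in changed_files if path in stale]


last = "# week A\n.claude/rules/recent-decisions.md\nmemory/old-api.md\n"
now = "# week B\n.claude/rules/recent-decisions.md\nmemory/ci-flakes.md\n"
print(diff_reports(last, now))  # ({'memory/ci-flakes.md'}, {'memory/old-api.md'})
print(pr_warnings(["src/app.ts", ".claude/rules/recent-decisions.md"], now))
```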

🅲 Loop primitive: composite 6.5 · DEFER

A generic loop-until-rubric harness risks wrapping what CC-native /goal + Workflow already provide (prior decision: "use /workflows, don't wrap"). Build A first; C likely becomes a chain-patterns reference.

Reassess after A ships.
