Learning Curator 3.1: Re-audit Report

Date: 2026-05-22 · Branch: feat/learning-curator-pivot · Baseline: 2026-05-21 audit · Merged PRs: #46 #47 #48 #49 #50

TL;DR

Eval results

dashboard-concurrency-guard SHIP

5 trials · k=5 · verdict policy: non-regression · preflight: skip

Trial 1/5: PASS (1)
Trial 2/5: PASS (1)
Trial 3/5: PASS (1)
Trial 4/5: PASS (1)
Trial 5/5: PASS (1)
Verdict: SHIP

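Exercises the Layer 6 optimistic concurrency gate added in PR-D. In all 5 trials the agent correctly: (1) recognized that the candidate's actual status had already changed to dismissed; (2) concluded that Reviewer B's stale request should be rejected; (3) identified HTTP 409 as the response; (4) confirmed that an absent expected_current_status degrades gracefully.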
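The gate this eval exercises amounts to a single check performed before any state transition. The sketch below is minimal and rests on a few assumptions. The request body may carry expected_current_status, and the stored candidate exposes a status field. The function checkExpectedStatus, its return shape, and the "pending" status name are hypothetical. This is not the code at learning-dashboard.js:370-374.

```javascript
// Hypothetical sketch of the Layer 6 optimistic concurrency gate.
// checkExpectedStatus and the result shape are illustrative names only.
function checkExpectedStatus(candidate, requestBody) {
  const expected = requestBody.expected_current_status;
  // Absent field: degrade gracefully by skipping the check.
  if (expected === undefined || expected === null) {
    return { ok: true, checked: false };
  }
  // Stale view: the reviewer acted on a status the server no longer holds.
  if (expected !== candidate.status) {
    return {
      ok: false,
      httpStatus: 409,
      error: `expected_current_status=${expected} but candidate is ${candidate.status}`,
    };
  }
  return { ok: true, checked: true };
}

// Reviewer B loaded the page while the candidate was "pending" (assumed
// status name); Reviewer A has since dismissed it.
const candidate = { id: "c1", status: "dismissed" };
const staleRequest = { action: "accept", expected_current_status: "pending" };
console.log(checkExpectedStatus(candidate, staleRequest).httpStatus); // 409
console.log(checkExpectedStatus(candidate, { action: "accept" }).ok); // true
```

Treating a missing field as "skip the check" keeps older clients working. A stricter variant could make the field mandatory once every caller sends it.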

Layer 0 — Enablement / Scope Gate

Status | Check | Evidence
PASS | shouldObserve() wired to ARCFORGE_OBSERVE_SKIP_PATHS | hooks/observe/main.js:61-73
PASS | NEW: ARCFORGE_OBSERVE_EXPLICIT_SKIP=1 guard (PR-B) | main.js:80
PASS | NEW: ARCFORGE_OBSERVE_SELF_ANALYSIS=1 guard (PR-B) | main.js:81
PASS | Eval-trial path regex | main.js:57-58, 67-68
PASS | Disabled-by-default | learning.js:75-76
PASS | Fail-closed (no fs.write) | main.js:75-90
PASS | Spec codifies env var naming (PR-A) | layer-0-*.md:39-48, 73
FIRST-SLICE-ACCEPT | The daemon spawns its own claude without exporting ARCFORGE_OBSERVE_SELF_ANALYSIS=1 (mitigated because the plugin is disabled in the dev repo) | observer-daemon.sh:245
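The env-var guards can be sketched as one predicate. This sketch does not reproduce hooks/observe/main.js. It assumes ARCFORGE_OBSERVE_SKIP_PATHS is a path.delimiter-separated list of directory prefixes, because the report does not state the real separator. It also leaves out the disabled-by-default switch (learning.js) and the eval-trial regex. shouldObserve here is an illustrative signature taking (env, cwd).

```javascript
// Hypothetical sketch of the Layer 0 scope gate, not hooks/observe/main.js.
// Assumes ARCFORGE_OBSERVE_SKIP_PATHS is a path.delimiter-separated list.
const path = require("path");

// True when child is the same directory as parent or nested inside it.
function isUnder(child, parent) {
  const rel = path.relative(parent, child);
  return rel === "" || (rel.split(path.sep)[0] !== ".." && !path.isAbsolute(rel));
}

function shouldObserve(env, cwd) {
  // Explicit opt-out guard (PR-B).
  if (env.ARCFORGE_OBSERVE_EXPLICIT_SKIP === "1") return false;
  // Self-analysis guard (PR-B): the curator's own claude run must not observe itself.
  if (env.ARCFORGE_OBSERVE_SELF_ANALYSIS === "1") return false;
  const skipRoots = (env.ARCFORGE_OBSERVE_SKIP_PATHS || "")
    .split(path.delimiter)
    .filter(Boolean)
    .map((p) => path.resolve(p));
  const here = path.resolve(cwd);
  return !skipRoots.some((root) => isUnder(here, root));
}

console.log(shouldObserve({ ARCFORGE_OBSERVE_SELF_ANALYSIS: "1" }, "/work/app")); // false
console.log(shouldObserve({ ARCFORGE_OBSERVE_SKIP_PATHS: "/tmp/evals" }, "/tmp/evals/trial-1")); // false
console.log(shouldObserve({}, "/work/app")); // true
```

Comparing via path.relative avoids the classic prefix bug, where a skip entry of /tmp/evals would also match /tmp/evals-old.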

Layer 1 — Observation Collection

Status | Check | Evidence
PASS | Skeleton required fields (including schema_version + source added in PR #46) | main.js:408-420
PASS | Raw tool_input is not persisted | main.js:431-440
PASS | Skill args are not persisted | main.js:425-426
PASS | PostToolUse records only outcome + output_bytes | main.js:447-453
PASS | Per-tool collection contract (all 10 tools) | main.js:157-294
FIRST-SLICE-ACCEPT | observation.skill is set on both pre and post (spec only describes tool_start) | carried from 2026-05-21
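The "skeleton plus outcome and output_bytes" rule is easiest to enforce by building the record from an allowlist. The sketch below makes several assumptions: the event shape (tool_name, tool_input, tool_output, error), the SCHEMA_VERSION value, and the source/phase strings are all placeholders rather than the values used in main.js.

```javascript
// Hypothetical sketch of a PostToolUse record that keeps only the skeleton
// plus outcome and output_bytes. Event shape, SCHEMA_VERSION value and the
// source/phase strings are placeholders, not the real main.js values.
const SCHEMA_VERSION = 1;

function buildPostToolUseObservation(event) {
  const output = typeof event.tool_output === "string" ? event.tool_output : "";
  return {
    schema_version: SCHEMA_VERSION,
    source: "observe-hook", // placeholder value
    phase: "post",
    tool: event.tool_name,
    outcome: event.error ? "error" : "success",
    // Record the size of the output, never the text itself.
    output_bytes: Buffer.byteLength(output, "utf8"),
    // tool_input is deliberately not copied onto the record.
  };
}

const record = buildPostToolUseObservation({
  tool_name: "Bash",
  tool_input: { command: "cat .env" },
  tool_output: "héllo",
});
console.log(record.output_bytes); // 6 ("é" is 2 bytes in UTF-8)
console.log("tool_input" in record); // false
```

Building from an allowlist, rather than deleting known-bad keys from a copy of the event, means a newly added event field stays out of the log by default.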

Layer 2 — Sanitization + Derived Semantic View

Status | Check | Evidence
PASS | EVIDENCE_STATUS frozen constant with 4 values | sanitize-observation.js:174-179
PASS | omitted_no_input vs omitted_safety semantics | classifyOmission
PASS | Decision 5: keyword + value form coverage | sanitize-observation.js:30-94
PASS | Per-tool persistence contract | main.js:198-291
PASS | Field name operation_kind | all correct
PASS | summarizeToolInput is read-time only | learning-observation-view.js
PASS | Fail-closed: raw fallback uses OMITTED_NO_INPUT | main.js:435-440
FIRST-SLICE-ACCEPT | WebSearch writes the query into the url field (spec says "sanitized URL/domain"); minor naming inconsistency | main.js:270-291

Layer 3 — Curator Batch Assembly: all PASS (11/11)

Status | Check | Evidence
FIXED | Reflect + Recall are now read (PR-C); MAX_REFLECTS=10, MAX_RECALLS=10 | batch-assembler.js:35-36, 469-475
FIXED | Diaries return DiaryEvidenceItem[] (PR-C) | :177-215
FIXED | source_windows.{diaries,reflects,recalls} written to the manifest (PR-C) | :584-595
PASS | readRecentEvidence(kind, ...) single helper for all three kinds (PR-C simplify) | :146-219
PASS | New readers: ~/.arcforge/reflections/ + ~/.arcforge/recalls/ | :57-63
PASS | One-way (does not read Layers 5-8) | no related imports
PASS | Manifest persistence path correct + atomic | :555, 617
PASS | Safety raw_*_included: false | :522-531
FIXED | evidence_status_by_id persisted (PR-B); used by Layer 5 omitted_upstream | :610-612
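An "atomic" manifest write usually follows the write-then-rename pattern. The minimal sketch below assumes a JSON manifest. writeJsonAtomic is a hypothetical helper, not the batch-assembler.js API. The source_windows and evidence_status_by_id values, including the id obs-1, are invented for illustration. The sketch does not fsync, so it protects readers from torn files but not against power loss.

```javascript
// Sketch of an atomic JSON write: write a temp file in the same directory,
// then rename it over the target so readers never see a half-written file.
// writeJsonAtomic is a hypothetical name, not the batch-assembler.js API.
const fs = require("fs");
const os = require("os");
const path = require("path");

function writeJsonAtomic(file, value) {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  // Same directory as the target, so rename stays on one filesystem.
  const tmp = `${file}.${process.pid}.${Date.now()}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(value, null, 2) + "\n");
  fs.renameSync(tmp, file); // atomic replace on POSIX filesystems
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "manifest-"));
const file = path.join(dir, "batch-manifest.json");
writeJsonAtomic(file, {
  source_windows: { diaries: 3, reflects: 10, recalls: 10 },
  evidence_status_by_id: { "obs-1": "omitted_safety" },
});
console.log(JSON.parse(fs.readFileSync(file, "utf8")).source_windows.reflects); // 10
```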

Layer 4 — LLM Curator Analysis

Status | Check | Evidence
PASS | Daemon does not read observations.jsonl contents directly | observer-daemon.sh:97-105
PASS | Bash daemon / Node CLI division of labor | :158, 296
PASS | body_source: "llm_curator" enforced at three layers | prompt + ingestor + validator
PASS | First-slice allowed_artifact_types: ["instinct"] | observer-prompt.md:18, 70, 102
FIXED | sanitizer_module lives in metadata, not in the prompt text (PR-A corrected the audit's wording) | proposal-ingestor.js:175
FIXED | observer-prompt now cites all 4 evidence_type values (PR-C) | observer-prompt.md:34-40, 83-86
PASS | Failure modes create no queue state | proposal-ingestor.js:330-372
DRIFT (new) | CuratorRunManifest not written on daemon-side transport/timeout failures | see Drift #1 ↓

Layer 5 — Candidate Queue + Lifecycle

Status | Check | Evidence
FIXED | Full safety metadata (PR-B): 3 versions + 3 scans + 6 raw flags | proposal-ingestor.js:172-185 + validator schema.js:407-477
FIXED | evidence_ref_omitted_upstream emitted (PR-B) | proposal-ingestor.js:437-456
FIXED | Canonical dedupe_basis + superseded transition (PR-C) | :188-212, :478-489
FIXED | rule_version namespaces separated (PR-C): sanitizer 'v1' vs evidence-quality 'v1-project_obs_count' | schema.js:535 + sanitize-observation.js:185
PASS | NEW: INSERTION_STATUSES + isLegalInsertionStatus() guard (PR-C simplify) | lifecycle.js:51-58
PASS | NEW: EVIDENCE_QUALITY_RULE_VERSION + VALIDATOR_VERSION constants | schema.js:18, 535
PASS | Body source 4-value enum | schema.js:26
PASS | promoted_from_* rejected on scope | schema.js:214-228
PASS | Action × Status matrix 8×7 (including the deactivate column) | lifecycle.js:79-88
PASS | applyTransition throws on promote/evolve | lifecycle.js:128-134
PASS | Evidence-quality v1 formula | schema.js:74-86, 366-376
PASS | Atomic queue + rejections + lock | queue-writer.js:51-117
PASS | Replay tolerates a corrupted trailing line | queue-writer.js:270-277
DRIFT (new) | 7 rejection codes defined but never emitted (dead codes) | see Drift #2 ↓
DRIFT (new) | batch_hash round-trip cross-check missing | see Drift #3 ↓
DRIFT (new) | source_manifest_missing throws instead of writing a rejection | see Drift #4 ↓
FIRST-SLICE-ACCEPT | llm_assessment is optional per spec; currently discarded, not propagated | proposal-ingestor.js
FIRST-SLICE-ACCEPT | evidence_quality_metadata.basis only fills project_obs_count; spec marks the other fields "may be populated" | proposal-ingestor.js:163-167
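"Replay tolerates a corrupted trailing line" usually means the following. A torn final append, for example from a crash mid-write, is dropped, while corruption anywhere else is still an error. This is a sketch of that policy under the assumption of a newline-delimited JSON queue. replayJsonl is a hypothetical name, and queue-writer.js may differ in detail.

```javascript
// Sketch of a JSONL replay that drops a torn final line but fails loudly
// on corruption anywhere else. replayJsonl is a hypothetical name.
function replayJsonl(text) {
  const lines = text.split("\n");
  // A well-formed file ends with "\n", which leaves one empty trailing element.
  if (lines[lines.length - 1] === "") lines.pop();
  const records = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;
    try {
      records.push(JSON.parse(lines[i]));
    } catch (err) {
      if (i === lines.length - 1) break; // torn trailing write: drop it
      throw new Error(`corrupt queue entry at line ${i + 1}: ${err.message}`);
    }
  }
  return records;
}

const torn = '{"id":"a","status":"pending"}\n{"id":"b","status":"appr';
console.log(replayJsonl(torn).length); // 1
```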

Layer 6 — Dashboard Review Control Plane: all PASS (19/19)

Status | Check | Evidence
FIXED | Wire model adds evidence_quality_chip + relationships (PR-D) | learning-dashboard.js:109-112, 125, 145
FIXED | evidence_counts {total, by_type} (PR-D) | :90-98, 139
FIXED | risk_note_count + uncertainty_note_count (PR-D) | :140-143
FIXED | expected_current_status optimistic concurrency → HTTP 409 (PR-D) | :370-374 + http.js:144-146
FIXED | safety_ack required for activate/deactivate (PR-D) | :382-393
FIXED | actor default value + reason written to the audit log (PR-D) | :317, 345
FIXED | Detail view adds 4 blocks: evidence_summaries / llm_assessment / materialization / activation (PR-D) | :175-205
FIXED | All detail blocks go through sanitizers (PR-D simplify): assessment + provenance sanitizers | :215-246
FIXED | HTML is XSS-safe (PR-D simplify): no innerHTML, only textContent/createElement | learning-dashboard.html:97-285
FIXED | File size split (PR-D simplify): dashboard.js 559 lines + dashboard-http.js 197 lines (< 700 hard limit) | file sizes
PASS | Privacy invariant: 4 adversarial fixtures (API key / Bearer / JWT / private key) all redacted | tests:253-313
PASS | project_id stripped from the wire model | :80-88
PASS | Server-side Action × Status matrix enforced | :377-379 + lifecycle.js:79-88
PASS | actions.jsonl audit log (records both accept and reject) | :270-278
PASS | Layer 6 does not call the LLM, write skills, or write CLAUDE.md | no related imports
PASS | Promote creates a new global candidate; source status unchanged | :397-423
PASS | Token-gated POST (24-byte random token) | http.js:66-70, 115-117
SHIP | Eval: dashboard-concurrency-guard 5/5 PASS | run in this audit
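A 24-byte token gate for POST can be sketched with Node's crypto module. The sketch makes several assumptions: the token is generated once per server start, it is hex-encoded, and the client presents it as a string. Neither the encoding nor the transport (header name, form field) is stated in the report. createToken and isAuthorizedPost are hypothetical names, not the functions in learning-dashboard-http.js.

```javascript
// Sketch of a POST token gate; encoding and helper names are assumptions.
const crypto = require("crypto");

// 24 random bytes, hex-encoded (48 chars).
function createToken() {
  return crypto.randomBytes(24).toString("hex");
}

function isAuthorizedPost(expectedToken, presented) {
  if (typeof presented !== "string") return false;
  const a = Buffer.from(expectedToken, "utf8");
  const b = Buffer.from(presented, "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}

const token = createToken();
console.log(token.length); // 48
console.log(isAuthorizedPost(token, token)); // true
console.log(isAuthorizedPost(token, "guess")); // false
```

The length pre-check leaks only the token length, which is fixed and not secret. The constant-time comparison covers the part that matters.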

Layer 7 — Materialization: all PASS (8/8)

Status | Check | Evidence
CODIFIED | render_policy.include_evidence_summaries: false written into the spec (PR-A) | materialize.js:84 + spec layer-7 L128-135
PASS | Accepts only approved + deactivated | materialize.js:327-330
PASS | Writes only inactive drafts | materialize.js:56-66
PASS | Path containment prevents escape | :399-403
PASS | MaterializationRecord fully written before reporting to Layer 5 | :464-468
PASS | Atomic + lock-protected | :380, 406, 464, 478
PASS | Idempotent on (candidate_hash, render_policy_version) | :260-288, 361-371
PASS | Body secret scan (strict equality) | :355-358
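Path containment is commonly checked by resolving the target against the root and inspecting the relative path. This is a minimal sketch of that technique. resolveContained and the example root directory are hypothetical, not taken from materialize.js. The check is purely lexical, so a symlink inside the root could still point outside it. Guarding against that would also need fs.realpathSync on the resolved paths.

```javascript
// Sketch of a lexical path-containment check. resolveContained and the
// example root are hypothetical; materialize.js may differ.
const path = require("path");

function resolveContained(root, relativeTarget) {
  const base = path.resolve(root);
  const target = path.resolve(base, relativeTarget);
  const rel = path.relative(base, target);
  // "" means the root itself; a leading ".." segment (or an absolute rel on
  // Windows, e.g. another drive) means the target escapes the root.
  if (rel === "" || rel.split(path.sep)[0] === ".." || path.isAbsolute(rel)) {
    throw new Error(`path escapes materialization root: ${relativeTarget}`);
  }
  return target;
}

console.log(resolveContained("/home/u/.arcforge/drafts", "instincts/x.md"));
// /home/u/.arcforge/drafts/instincts/x.md
```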

Layer 8 — Activation / Runtime Influence Surface: all PASS (15/15)

Status | Check | Evidence
FIXED | deactivate() verifies reviewer_ack (PR-B shared helper) | activate.js:197-203, 266, 481
FIXED | active_path_summary does not leak project_id (PR-B): sha256[:12] | :211-213, 372, 527
CODIFIED | Redaction rule written into the spec (PR-A) | layer-8 L251-269
CODIFIED | Hash-verify ordering equivalence written into the spec (PR-A) | layer-8 L271-273
PASS | Accepts only materialized + deactivated | :255-258
PASS | Materialization record hash check | :290-298
PASS | Activate path requires reviewer_ack | :266-269
PASS | Per-target-kind overwrite policy | :86-93, 343-358
PASS | claude_md_addition is never auto-applied (double block) | :271-280
PASS | Allowed-roots allowlist + path-resolve containment | :303-326
PASS | Atomic write + lock + backup on supersede | :331-359, 362, 434
PASS | ActivationRecord fully written before reporting to Layer 5 | :419-424, :562-568
PASS | SessionStart does not auto-load instincts | inject-context.js:271-273 + e2e test
PASS | Failure creates no activated event | fail() returns before the transition
PASS | Hash computed once and reused (PR-B simplify) | :366, 372
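The sha256[:12] redaction can be sketched as replacing the raw project_id in a path with the first 12 hex characters of its SHA-256. The project directory layout in the example and the helper names are hypothetical. A 12-hex-char (48-bit) prefix is a pseudonym, not encryption: anyone who can guess candidate ids can hash them and compare.

```javascript
// Sketch of redacting project_id in an active path summary with sha256[:12].
// projectTag, redactActivePath and the path layout are hypothetical.
const crypto = require("crypto");

function projectTag(projectId) {
  return crypto.createHash("sha256").update(projectId, "utf8").digest("hex").slice(0, 12);
}

function redactActivePath(activePath, projectId) {
  if (!projectId) return activePath; // nothing to redact
  // Replace every occurrence of the raw id with its 12-hex-char tag.
  return activePath.split(projectId).join(projectTag(projectId));
}

const summary = redactActivePath("/home/u/.arcforge/projects/acme-web/instincts/x.md", "acme-web");
console.log(summary.includes("acme-web")); // false
console.log(projectTag("abc")); // ba7816bf8f01
```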

Newly found drift (decision needed)

Drift #1 — Layer 4: CuratorRunManifest not written on daemon-side failures · DRIFT

File: skills/arc-observing/scripts/observer-daemon.sh:286-290

Spec requirement (layer-4 acceptance #10): a CuratorRunManifest must be written for "every attempted run", including failure paths.

Current implementation: persistRunManifest is only called inside ingestProposal (proposal-ingestor.js:93-100). Daemon-side claude CLI failures (transport_error / timeout / watchdog kill) return immediately without writing a manifest.

Impact: an audit cannot tell that "the daemon attempted N runs, M of which failed on transport", so failure statistics are missing. Pre-existing; not introduced by PR-A/B/C.

Fix: on the abort path, have the daemon invoke ingest-proposal --record-failure transport_error|timeout (or add a sibling record-run-failure command), so that every attempt produces a manifest.
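On its abort branch, the bash daemon would shell out to a small Node entry point along these lines. This is a sketch only, and several parts are assumptions: the JSONL file, the manifest field names, and recordRunFailure itself are hypothetical. It is also assumed that a watchdog kill would be recorded as timeout. The real CuratorRunManifest shape is whatever persistRunManifest writes today.

```javascript
// Hypothetical sketch of a "record-run-failure" path: each daemon-side abort
// appends a failed run record. Field names and file layout are assumptions.
const fs = require("fs");
const os = require("os");
const path = require("path");

const FAILURE_REASONS = Object.freeze(["transport_error", "timeout"]);

function recordRunFailure(manifestFile, { runId, reason, startedAt }) {
  if (!FAILURE_REASONS.includes(reason)) {
    throw new Error(`unknown failure reason: ${reason}`);
  }
  const manifest = {
    run_id: runId,
    status: "failed",
    failure_reason: reason,
    started_at: startedAt,
    finished_at: new Date().toISOString(),
    proposals_ingested: 0, // a failed attempt never reaches ingestProposal
  };
  fs.appendFileSync(manifestFile, JSON.stringify(manifest) + "\n");
  return manifest;
}

const runsFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "runs-")), "curator-runs.jsonl");
recordRunFailure(runsFile, { runId: "run-1", reason: "timeout", startedAt: new Date().toISOString() });
console.log(fs.readFileSync(runsFile, "utf8").trim().split("\n").length); // 1
```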

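Drift #2 — Layer 5: 7 dead rejection codes (defined but never emitted) · DRIFT

File: scripts/lib/learning-curator/schema.js:508-527

Spec requirement: REJECTION_CODES lists 18 valid reasons, and the validator should use all of them.

Current implementation: the following 7 codes are defined in the frozen constant, but a full-text search of the codebase finds no emit site:

Impact: every malformed proposal is classified as schema_invalid, so root-cause information is lost. Evidence-ref counts and evidence types against the batch are never checked, so non-conforming proposals can get through.

Fix: (a) have the validator use dedicated codes for artifact_type / scope.kind; (b) enforce min/max evidence refs, pinning 2-5 as constants; (c) when proposal-ingestor receives an evidence_ref, cross-check it against the evidence_type of the referenced batch item.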
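Fixes (b) and (c) can be sketched together as one validation pass over a proposal's evidence refs. The rejection code strings below are placeholders. The real names are among the dead codes in REJECTION_CODES (schema.js), which this report does not list. The ref shape ({ id, evidence_type }), the batch index as a Map, and the evidence_type values "observation" / "diary" / "reflect" are also assumptions.

```javascript
// Sketch of evidence-ref count pinning (b) and type cross-check (c).
// Code strings and data shapes are placeholders, not schema.js names.
const MIN_EVIDENCE_REFS = 2;
const MAX_EVIDENCE_REFS = 5;

function validateEvidenceRefs(refs, batchItemsById) {
  if (!Array.isArray(refs) || refs.length < MIN_EVIDENCE_REFS) {
    return { ok: false, code: "too_few_evidence_refs" };
  }
  if (refs.length > MAX_EVIDENCE_REFS) {
    return { ok: false, code: "too_many_evidence_refs" };
  }
  for (const ref of refs) {
    const item = batchItemsById.get(ref.id);
    if (!item) return { ok: false, code: "evidence_ref_not_in_batch", ref: ref.id };
    // The cited type must match what the batch actually contained.
    if (item.evidence_type !== ref.evidence_type) {
      return { ok: false, code: "evidence_type_mismatch", ref: ref.id };
    }
  }
  return { ok: true };
}

const batch = new Map([
  ["obs-1", { evidence_type: "observation" }],
  ["diary-2", { evidence_type: "diary" }],
]);
const result = validateEvidenceRefs(
  [{ id: "obs-1", evidence_type: "observation" }, { id: "diary-2", evidence_type: "reflect" }],
  batch,
);
console.log(result.code); // evidence_type_mismatch
```

Returning a specific code instead of a generic schema_invalid is exactly what keeps the root cause visible in the rejections log.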

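Drift #3 — Layer 5: batch_hash round-trip cross-check missing · DRIFT

File: scripts/lib/learning-curator/proposal-ingestor.js:285-291

Spec requirement: the LLM response must echo source.batch_hash, and the validator must compare it with the manifest's batch_hash; stale or wrong responses should be rejected.

Current implementation: the batch manifest is loaded by batch_id, but proposal.source.batch_hash === batchManifest.batch_hash is never checked.

Impact: if the LLM returns an outdated batch_hash (for example, the batch was regenerated before a client retry), the response is silently accepted and produces a candidate tied to the wrong batch.

Fix: in proposal-ingestor.js, compare payload.source.batch_hash with the loaded manifest and reject with source_hash_mismatch when they differ.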
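The proposed round-trip check is a strict equality test between the hash the LLM echoes and the loaded manifest. In the sketch below, source_hash_mismatch is the code named in the fix. checkBatchHash, the rejection fields, and the "sha256:..." hash format are hypothetical.

```javascript
// Sketch of the batch_hash round-trip check. Only source_hash_mismatch comes
// from the report; the function name and record fields are assumptions.
function checkBatchHash(payload, batchManifest) {
  const source = payload && payload.source ? payload.source : {};
  const received = typeof source.batch_hash === "string" ? source.batch_hash : null;
  // A missing hash is treated like a wrong one: the spec says it must be echoed.
  if (received === null || received !== batchManifest.batch_hash) {
    return {
      ok: false,
      rejection: {
        reason: "source_hash_mismatch",
        batch_id: batchManifest.batch_id,
        expected: batchManifest.batch_hash,
        received,
      },
    };
  }
  return { ok: true };
}

const manifest = { batch_id: "b-7", batch_hash: "sha256:aaa" };
console.log(checkBatchHash({ source: { batch_id: "b-7", batch_hash: "sha256:old" } }, manifest).ok); // false
console.log(checkBatchHash({ source: { batch_id: "b-7", batch_hash: "sha256:aaa" } }, manifest).ok); // true
```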

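Drift #4 — Layer 5: source_manifest_missing throws instead of rejecting · DRIFT

File: scripts/lib/learning-curator/proposal-ingestor.js:286-291

Spec requirement (layer-5 L244): a missing batch manifest should produce a rejection record (source_manifest_missing).

Current implementation: throws an Error; the caller (the daemon) treats it as a fatal failure, and no rejection record is written.

Impact: compounds Drift #1. The daemon returns as soon as it sees the throw, so not even the run manifest is written, and failures at both levels vanish from the audit trail.

Fix: write a rejection record (reason: source_manifest_missing) and return normally, so the daemon can continue recording as usual.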
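The fix turns a missing manifest into data instead of an exception. The sketch below covers only the missing-file case (ENOENT) and keeps every other I/O error fatal. The rejections-file path, the record fields besides reason, and loadManifestOrReject are hypothetical.

```javascript
// Sketch of Drift #4's fix: a missing manifest becomes a rejection record
// plus a normal return. File layout and helper names are hypothetical.
const fs = require("fs");
const os = require("os");
const path = require("path");

function loadManifestOrReject(manifestPath, rejectionsFile, batchId) {
  let raw;
  try {
    raw = fs.readFileSync(manifestPath, "utf8");
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // other I/O errors stay fatal
    const rejection = {
      reason: "source_manifest_missing",
      batch_id: batchId,
      rejected_at: new Date().toISOString(),
    };
    fs.appendFileSync(rejectionsFile, JSON.stringify(rejection) + "\n");
    // Normal return: the caller can still write its run manifest.
    return { ok: false, rejection };
  }
  return { ok: true, manifest: JSON.parse(raw) };
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "ingest-"));
const rejectionsFile = path.join(dir, "rejections.jsonl");
const result = loadManifestOrReject(path.join(dir, "batch-123.json"), rejectionsFile, "batch-123");
console.log(result.ok, result.rejection.reason); // false source_manifest_missing
```

Reading directly and catching ENOENT avoids the race between an existsSync check and the subsequent read.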

Overall verdict

Overall PASS rate: 78/82 = 95%.

No P0 bugs. Drifts #1-4 are all "audit completeness" issues: they are unlikely to cause real production incidents, but they lose information needed for root cause analysis.

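Recommended next steps

  1. Decide whether to fix the 4 new drifts: recommend fixing all of them, in batches.
  2. Eval coverage: only dashboard-concurrency-guard has been run so far. The other features added in PR-B/C/D (deactivate ack, omitted_upstream, dedupe superseded, safety_ack) are currently guarded only by unit tests, with no scenario eval. See the assessment table below ↓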

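Eval scenario coverage assessment

Candidate scenario | PR | Value | Recommendation
dashboard-safety-ack-required | PR-D | High: the safety gate is user-facing behavior | Add
deactivate-reviewer-ack-required | PR-B Blocker #3 | High: prevents silent destructive actions | Add
evidence-ref-omitted-upstream | PR-B Blocker #2 | Medium: Layer 3 currently never puts non-present items into a batch, so this is a defensive field | Can defer
dedupe-superseded-transition | PR-C | Medium: lifecycle correctness | Can defer (already guarded by unit tests)
typed-evidence-cite-chain | PR-C | Low: a capability rather than a gate | Skip

Recommendation: add the 2 high-value scenarios first (safety_ack + deactivate_ack) and keep the rest under unit tests. If you want full coverage, I can write all 5 scenarios in one pass.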