🔎 CC-Idiom Conformance Grader

Eval v2 · closes #2193 · scripts/eval/conformance-check.mjs · advisory (WARN-only) · 2026-06-04

Checks (high-precision): 2 · Real findings (0 FP): 5 · FP eliminated (dropped C3): 160→0 · Docs scanned: 1659

What it is: a static, deterministic grader (no LLM, no network, no gh) that scans every skill + agent doc for stale Claude Code idioms. It automates the manual staleness-hunting that recurred all session (the CC 2.1.162 stragglers, the doctor 173-vs-144 hook count, the fumadocs drift). Wired into scripts/eval/static-analysis.sh as Section F; advisory (emits WARN, never blocks CI).

The two checks (precision over recall)

Check | Detects | False-positive guards
C1 model-id | Opus 4.6-and-older used as a current model in SKILL.md / agent bodies | skips URLs, co-author templates, "4.6+" floor form, "3× Opus 4.6" comparisons, migration/historical context
C2 hook-count | a hardcoded "N hooks" disagreeing with hooks.json (144) | requires OrchestKit-hook context (React "19 hooks" etc. excluded); skips "~/about/planned/added"
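
To make the C1 row concrete, here is a minimal sketch of a guarded model-id check. It is not the code in conformance-check.mjs: the function name `findStaleModelIds`, the regexes, and the way each guard is approximated are all assumptions made for illustration.

```javascript
// Hypothetical sketch of a C1-style check; NOT the actual conformance-check.mjs code.
// Assumed pattern: "Opus 4.0".."Opus 4.6" and "opus-4-0".."opus-4-6".
const STALE_MODEL = /\bopus[ -]4[.-][0-6]\b/i;

// Guards from the table above, each an assumed line-level approximation.
const GUARDS = [
  /https?:\/\//i,                                    // line contains a URL
  /co-authored-by/i,                                 // co-author template
  /4\.6\+/,                                          // "4.6+" floor form
  /\d+\s*[×x]\s*opus/i,                              // "3× Opus 4.6" comparisons
  /\b(?:migrat\w*|historical|previously|legacy)\b/i, // migration/historical context
];

function findStaleModelIds(file, text) {
  const findings = [];
  text.split("\n").forEach((line, i) => {
    if (!STALE_MODEL.test(line)) return;            // no old model mentioned
    if (GUARDS.some((g) => g.test(line))) return;   // a guard explains the mention
    findings.push(`${file}:${i + 1}`);              // 1-based line number
  });
  return findings;
}

// Invented sample document, for illustration only.
const modelSample = [
  "model: claude-opus-4-6",     // flagged: old ID used as the current model
  "Requires Opus 4.6+",         // skipped: floor form
  "about 3× Opus 4.6",          // skipped: comparison
  "model: claude-opus-4-8",     // not matched: current model
].join("\n");

console.log(findStaleModelIds("example/SKILL.md", modelSample));
// only line 1 is reported
```

Every guard here works on a single line, which is the simplest way to favor precision: a mention is reported only when nothing on the line explains it.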

What it found (all real, zero false positives)

C1 assess/SKILL.md:74 · implement/SKILL.md:74 · verify/SKILL.md:65 → example full model ID claude-opus-4-6 should be claude-opus-4-8
C2 doctor/references/hook-validation.md:5 → "66 hooks" should be 144
C2 validate-counts/references/count-locations.md:19 → "173 hooks" should be 144
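
The two hook-count findings above could come from a check along the following lines. This is a sketch under assumptions, not the grader's real code: `findHookCountDrift` is a hypothetical name, the "OrchestKit-hook context" guard is reduced to a same-line mention of "OrchestKit", and the true count is passed in as a parameter rather than read from hooks.json.

```javascript
// Hypothetical sketch of a C2-style check; NOT the actual conformance-check.mjs code.
// actualCount stands in for the number derived from hooks.json (144 in this PR).
function findHookCountDrift(file, text, actualCount) {
  const findings = [];
  text.split("\n").forEach((line, i) => {
    // An optional hedge word is captured so approximate or historical counts can be skipped.
    const m = /(~|about |planned |added )?(\d+) hooks\b/i.exec(line);
    if (!m) return;
    if (m[1]) return;                            // "~140 hooks", "added 3 hooks", ...
    if (Number(m[2]) === actualCount) return;    // agrees with the real count
    if (!/orchestkit/i.test(line)) return;       // assumed context guard (excludes React etc.)
    findings.push(`${file}:${i + 1} → "${m[2]} hooks" should be ${actualCount}`);
  });
  return findings;
}

// Invented sample document, for illustration only.
const hookSample = [
  "OrchestKit ships 66 hooks.",          // flagged: disagrees with 144
  "React 19 hooks are unrelated.",       // skipped: no OrchestKit context
  "OrchestKit had ~140 hooks then.",     // skipped: approximate count
  "OrchestKit registers 144 hooks.",     // fine: matches
].join("\n");

console.log(findHookCountDrift("example/hook-validation.md", hookSample, 144));
// reports line 1 only
```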

These 5 are the grader's first backlog: a fast follow-up fix (each edit triggers docs regen, so they are handled separately to keep this PR to the tool itself).

The C3 lesson: why dogfooding mattered

A first draft included a floor-drift check (C3): flag any "CC 2.1.X" cited below the supported floor. Dogfooding produced 160 false positives, because skills correctly annotate features with the version that introduced them (for example, "Effort scaling (CC 2.1.76)"), and a regex cannot distinguish that annotation from a stale requirement. C3 was removed. Floor consistency (frontmatter compatibility:) is already enforced by test-cc-version-floor.sh and is not duplicated here.
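
The failure mode is easy to reproduce. The sketch below is a hypothetical reconstruction of the C3 idea, not the removed code: the floor value `ASSUMED_FLOOR` and the "Requires CC 2.1.76" line are invented for the demonstration; only the "Effort scaling (CC 2.1.76)" annotation comes from the text above.

```javascript
// Hypothetical reconstruction of the dropped C3 check; the floor below is invented.
const ASSUMED_FLOOR = [2, 1, 100];

// Numeric comparison of dotted versions, component by component.
function belowFloor(version, floor) {
  const parts = version.split(".").map(Number);
  for (let i = 0; i < floor.length; i++) {
    if (parts[i] !== floor[i]) return parts[i] < floor[i];
  }
  return false; // equal to the floor
}

// Naive rule: any "CC x.y.z" below the floor is reported.
function naiveFloorDrift(line) {
  const m = /CC (\d+\.\d+\.\d+)/.exec(line);
  return m !== null && belowFloor(m[1], ASSUMED_FLOOR);
}

const annotation = "Effort scaling (CC 2.1.76)"; // correct: names the introducing version
const staleRequirement = "Requires CC 2.1.76";   // invented example of a genuinely stale line

console.log(naiveFloorDrift(annotation), naiveFloorDrift(staleRequirement));
// prints: true true
```

Both lines are flagged, so the rule reports the correct annotation alongside the stale requirement. Telling them apart needs the intent of the sentence, which a version regex does not see.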

Advisory now → gate later

Ships WARN-only (process.exit(0)) so it can't break CI while the signal is validated. Once trusted, CONFORMANCE_STRICT=1 makes it exit non-zero on any finding, which makes it promotable to a blocking gate: the same path the plugins/ and docs drift gates took. Complements (does not duplicate) cc-triage.mjs (adoption gaps), test-cc-version-floor.sh (floor), and test-count-components.sh (manifest counts).
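
The advisory-to-gate switch described above can be pictured as a small pure function. This is an illustrative sketch only: `exitCodeFor` is a hypothetical name and the specific non-zero code (1) is an assumption; the text fixes only the WARN-only exit 0 and the CONFORMANCE_STRICT=1 variable.

```javascript
// Hypothetical sketch of the advisory/strict switch; NOT the actual script.
// Assumption: strict mode exits with code 1 (the text only says "non-zero").
function exitCodeFor(findings, env) {
  const strict = env.CONFORMANCE_STRICT === "1";
  return strict && findings.length > 0 ? 1 : 0;
}

console.log(exitCodeFor(["a.md:1"], {}));                          // 0: advisory, WARN only
console.log(exitCodeFor(["a.md:1"], { CONFORMANCE_STRICT: "1" })); // 1: strict with findings
console.log(exitCodeFor([], { CONFORMANCE_STRICT: "1" }));         // 0: strict, nothing found

// A real entry point would print each finding as a WARN line and then call
// process.exit(exitCodeFor(findings, process.env)).
```

Keeping the decision separate from the process.exit call means the promotion to a blocking gate changes only an environment variable, not the checks themselves.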
#2193 (Eval v2 #156). Scoped via Explore sub-agent against existing eval infra; built high-precision after dogfooding cut 160 false positives to 0, leaving 5 real findings. This is the PR's playground gate artifact. The keystone the session converged on: it turns manual staleness-hunting into a repeatable static check.