/ork:auto routing rules — the tune a real model demanded

The deterministic mirror (route-check.mjs) scores 100% on the same rules. The actual rule text, sent to a real model (sonnet, headless, 50-case routing-benchmark.json), scored 78%. The 22-point gap decomposes into 4 clusters; this PR closes 3 of them at the rules layer.

Baseline (pre-tune): 78% (39/50, real sonnet call)
After tune v2: 96% (48/50, target was 0.95)
Paired-case delta: +16.6pp (77.8% → 94.4%, 0 regressions)
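The headline numbers can be re-derived from the raw counts. A minimal sketch, assuming nothing beyond the figures quoted above; the helper names `accuracy` and `deltaPp` are hypothetical and are not part of route-check.mjs.

```javascript
// Hypothetical helpers for re-deriving the scorecard; not part of route-check.mjs.

// Fraction of benchmark cases routed correctly.
function accuracy(passed, total) {
  return passed / total;
}

// Difference between two percentages, in percentage points, rounded to 0.1.
function deltaPp(beforePct, afterPct) {
  return Math.round((afterPct - beforePct) * 10) / 10;
}

const baseline = accuracy(39, 50);       // pre-tune run
const tuned = accuracy(48, 50);          // after tune v2
const meetsTarget = tuned >= 0.95;       // target quoted in the scorecard
const pairedDelta = deltaPp(77.8, 94.4); // paired-case percentages from the scorecard
```

The rounding step in `deltaPp` is there only to absorb floating-point noise in the subtraction.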

Cluster 1 · "why…" is always diagnose

diagnose → /ork:fix-issue
- A "why…" question is a gentler entry than a fix command.
+ A "why…" question is ALWAYS diagnose, even when it names a failure — the question form is what makes it diagnose, not the absence of a symptom.

"why isn't the build green" → diagnose
"why does the API return 500" → diagnose
"there's a regression in checkout" → fix

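One way a deterministic check could encode the Cluster 1 rule is to test the question form before any symptom matching. This is a sketch under assumptions: `classifyWhyOrFix` and the symptom word list are hypothetical, and the real route-check.mjs may work differently.

```javascript
// Hypothetical sketch of the Cluster 1 rule; not the real route-check.mjs logic.
// The symptom word list is an illustrative assumption.
const SYMPTOM = /\b(regression|broken|failing|crash|error)\b/;

function classifyWhyOrFix(goal) {
  const text = goal.trim().toLowerCase();
  // Question form is checked first, so a named failure cannot override it.
  if (/^why\b/.test(text)) return "diagnose";
  // A declarative symptom report is fix.
  if (SYMPTOM.test(text)) return "fix";
  return null; // this rule does not apply; other rules decide
}
```

Checking the question form first is the design point: the order of the two tests is what keeps "why does the API return 500" out of fix.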
Cluster 2 · "untested" is cover without a % target

cover → /ork:cover --target {N}
- "write more tests" with no target → ask the target %.
+ A surface called out as "untested" is cover even with no % target — the "fix" there repairs a coverage gap, not a bug; ask the target % at invoke time.

"the payments service is untested, fix that" → cover

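The Cluster 2 rule separates the routing decision from the missing parameter: the category is settled by "untested", and the target % is deferred to invoke time. A minimal sketch, assuming a hypothetical `classifyCoverage` helper and a hypothetical `askTarget` flag; neither name comes from the actual rules.

```javascript
// Hypothetical sketch of the Cluster 2 rule; names and return shape are assumptions.
function classifyCoverage(goal) {
  const text = goal.toLowerCase();
  if (!/\buntested\b/.test(text)) return null; // this rule does not apply

  // Pick up an explicit percentage if the goal happens to state one.
  const match = text.match(/(\d{1,3})\s*%/);
  const target = match ? Number(match[1]) : null;

  // "untested" routes to cover even when the verb is "fix";
  // a missing target is asked for at invoke time, not used to block routing.
  return { category: "cover", target, askTarget: target === null };
}
```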
Cluster 3 · a skill/agent target is improve-skill regardless of verb

improve-skill → champion/challenger holdout gate
- "optimize my prompt" (not a skill file) → route to optimize.
+ When the improvement target IS a skill or agent — a SKILL.md, a named skill, an agent prompt — it is improve-skill regardless of the verb.

"optimize the prompt for the security-auditor skill" → improve-skill
"make the brainstorm SKILL.md produce better ideas" → improve-skill

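The Cluster 3 rule keys on the target of the improvement and ignores the verb. The sketch below illustrates that shape; `classifyImprovement` and its three patterns are hypothetical approximations of "a SKILL.md, a named skill, an agent prompt", not the real matching logic.

```javascript
// Hypothetical sketch of the Cluster 3 rule; the patterns are illustrative assumptions.
function classifyImprovement(goal) {
  const text = goal.toLowerCase();
  const targetsSkillOrAgent =
    /\bskill\.md\b/.test(text) ||      // a SKILL.md file
    /\b[\w-]+ skill\b/.test(text) ||   // a named skill, e.g. "security-auditor skill"
    /\bagent prompt\b/.test(text);     // an agent prompt

  // The verb ("optimize", "make ... better") is deliberately never inspected.
  return targetsSkillOrAgent ? "improve-skill" : null;
}
```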
Rule 7 · canonical do-not-guess examples

Disambiguation — when multiple categories match
+ Do not guess when two DIFFERENT skills are plausible and the goal names no artifact, metric, or number.

"fix performance on the dashboard" → fix vs optimize
"review my code" (no PR/branch) → review vs verify
"test the new feature" → cover vs verify vs build