🧭 Agent Activation β€” State of the Work

diagnose β†’ research β†’ audit β†’ experiment β†’ ship Β· OrchestKit Β· the full thread, where it landed

The thesis β€” both halves, both now evidenced

β‘  WIRE β€” so it FIRES

activation = wiring, not merit
  • subagent_type=ork:X from busy skills
  • agentType= on Workflow stages (the 394 bucket)
  • descriptions are Ξ”0 (A/B 18/18) β€” never the lever

β‘‘ GROUND β€” so it CATCHES

merit = grounding (multi-source)
  • CandleKeep books + Tavily current + context7
  • in the agent BODY (spawn-only β†’ 0 baseline cost)
  • recall 2/4 β†’ 4/4 on subtle vulns (proven)
βŠ•
Wire the specialist so it fires; ground it so it catches what a generic misses. The 76%-generic problem and the under-used-catalog problem are the same fix from two sides β€” and both sides now have data.

The recall experiment (v2) β€” click an arm

A Β· ungrounded
B Β· grounded (right)
C Β· grounded (WRONG)

Reduce βŠ• Improve β€” defense funds offense

πŸ›‘ REDUCE (defense)

RTK 61% live Β· vitest 94% / 19M tok
skill descs ~9.6K tok baseline β€” uncapped whale
compress on arrival never on cache (headroom βœ—, Alex's tool βœ“)

βš” IMPROVE (offense)

GROUND CandleKeep ~1:1 β†’ specialists (OWASPβ†’sec, K8sβ†’infra…)
RESEARCH Tavily freshness (current CVEs/docs)
LEARN memory + LLM wiki (p.106) compound across runs
every token REDUCE saves is budget IMPROVE spends β€” and grounding made the review output better AND cheaper ($0.074 vs $0.078, less padding). One data point of the whole thesis.
4/4
grounded recall
2/4
ungrounded Β· control
0b
baseline cost added
$0.45
total experiment

Where we are now

βœ…
Shipped to source β€” grounded security-auditor
src/agents/security-auditor.md: Grounding Protocol (body) + WebSearch/WebFetch Β· validated (test:agents βœ“, token-overhead βœ“, baseline-neutral)
βœ…
Evidence saved β€” durable
memory agent-activation-audit + LLM wiki page 106 extended (Index + Changelog, live-verified)
⏳
Build + commit β€” gated
edit lives in src/ only; npm run build + branch + commit held back β€” ~174 claude procs, couldn't confirm solo session (your worktree rule)
⏳
Stress-test the experiment
stronger model + 3–4 snippets β€” does the 2/4β†’4/4 delta hold? (Haiku, n=1, self-judged so far)
⏳
Extend the pattern
code-quality-reviewer β†’ Code Review v4 (273p) Β· then the rest of the CandleKeep↔specialist map