+ "Ask the docs"ยท Claude full-context (docs <500pp) + prompt cache ยท no RAG
Execution โ all additive (lexical hot path never blocked)
build-time
Embed ~400 docs
@xenova ONNX, ~50-80MB, NOT tfjs-node (383MB > Vercel 250MB)
โ
ship
search-vectors.json
~0.8MB artifact, loaded server-side
โ
runtime
Orama mode:hybrid
load vectors โ lexical โช vector query
โ
measure
recall vs held-out set
must beat lexical-only to keep
Risk heat map
tfjs-node at runtime โ 383MB breaks Vercel 250MB limit๐ดโ๐ข MITIGATED ยท @xenova ONNX, build-time only
recall improvement unproven at ~400 docs๐ก GATE ยท held-out query set must show gain
+0.8MB vector payload๐ก server-side load, NOT shipped to browser
lexical hot path โ untouched๐ข fully additive, zero regression
cold start โ embed is build-time, no runtime model load๐ข no added cold cost
Trigger gate โ should we build this at all?
Does query telemetry show lexical search missing synonym / concept matches users expect?
NO โ stay deferred โ
Live verify 2026-06-08: "agnt"โ783 hits, typo tolerance + 5 tabs all work. Lexical closes ~80-90% of the gap. Don't build speculatively.
YES โ build Approach C
Add the build-time embed step + hybrid mode, then prove recall gain on a held-out set before keeping it. Effort โ 2 sprints.
Decision (ADR-lite): defer. Lexical is sufficient per live verification; semantic adds a ~0.8MB artifact + build complexity for unproven recall gain on a 400-doc corpus. Revisit only when telemetry proves a synonym/concept gap. Fully reversible โ nothing on the lexical hot path changes.