Architecture Reference
How engram-go decides where a document lives, what gets indexed, and what gets handed to the language model when you ask a question about it. Every byte from the first write to the last recall, traced end-to-end.
Architecture Diagram
Size drives routing. Every document is classified once on arrival and never reclassified. The classification determines which tables hold the data and what the chunker sees.
buildSynopsis() in
internal/mcp/ingest_document.go:72 produces the first 8 KiB of the
document verbatim, then appends a --- Outline --- section containing every
Markdown h1/h2 heading found in the full body, capped at 2 KiB.
This is what lands in memories.content for Tier-1 and Tier-2.
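The synopsis layout described above can be sketched as follows. This is a minimal illustration, not the code at internal/mcp/ingest_document.go: the name sketchSynopsis, the constants, the rune-boundary back-off, and the choice to detect headings by the "# " and "## " prefixes are all assumptions made for the example.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

const (
	synopsisHeadBytes = 8 * 1024 // first 8 KiB of the body, kept verbatim
	outlineCapBytes   = 2 * 1024 // cap on the collected outline
	outlineMarker     = "--- Outline ---"
)

// sketchSynopsis is a hypothetical stand-in for buildSynopsis().
// It returns the head of the body followed by an outline of h1/h2 headings.
func sketchSynopsis(body string) string {
	head := body
	if len(head) > synopsisHeadBytes {
		cut := synopsisHeadBytes
		// Assumption: back up so a multi-byte rune is never split.
		for cut > 0 && !utf8.RuneStart(head[cut]) {
			cut--
		}
		head = head[:cut]
	}

	var outline strings.Builder
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimRight(line, "\r")
		// Only ATX-style h1 and h2 headings are collected.
		if !strings.HasPrefix(line, "# ") && !strings.HasPrefix(line, "## ") {
			continue
		}
		// Stop once the next heading would push the outline past its cap.
		if outline.Len()+len(line)+1 > outlineCapBytes {
			break
		}
		outline.WriteString(line)
		outline.WriteByte('\n')
	}

	return head + "\n" + outlineMarker + "\n" + outline.String()
}
```

Because the outline is collected from the full body rather than from the 8 KiB head, headings deep in a large document still reach memories.content. Note that this sketch would also pick up lines starting with "# " inside fenced code blocks.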
Tier-0
size <= 500 KB
memories.content = full body
documents table not touched
… LazyChunkThreshold)
… memory_store_document path; no new code
Tier-1
500 KB – 8 MB (default)
memories.content = synopsis (8 KiB + outline)
RawBody carries full body in-memory only
… RawBody; recall stays grounded
documents table not touched
RawBody tagged json:"-"; never persisted
Tier-2
8 MB – 50 MB (default)
memories.content = synopsis
documents.body = full body (durable)
memories.document_id FK set
… memory_query_document
Tier-3
size > 50 MB
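The size-based routing can be sketched as a single classification function. The names Tier, classifyTier, and the constants are hypothetical. Treating KB and MB as 1024-based and treating each upper bound as inclusive are assumptions; the text gives only the ranges. What happens to a Tier-3 document is not covered here beyond 50 MB being described as a hard limit.

```go
package main

// Tier is a hypothetical enum for the four size classes.
type Tier int

const (
	Tier0 Tier = iota // full body in memories.content
	Tier1             // synopsis stored, full body chunked from RawBody
	Tier2             // synopsis stored, full body in documents table
	Tier3             // above the 50 MB limit
)

// Assumption: binary units. The defaults mirror the ranges in the text.
const (
	kb       = 1024
	mb       = 1024 * kb
	tier0Max = 500 * kb
	tier1Max = 8 * mb
	tier2Max = 50 * mb
)

// classifyTier maps a document size in bytes to its tier.
// It is meant to be called once, when the document arrives.
func classifyTier(size int64) Tier {
	switch {
	case size <= tier0Max:
		return Tier0
	case size <= tier1Max:
		return Tier1
	case size <= tier2Max:
		return Tier2
	default:
		return Tier3
	}
}
```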
Retrieval
Every stored memory participates in both retrieval paths. The paths differ in cost, depth, and what they return.
Semantic search for finding memories. Embeds the query, runs vector ANN against the
chunks table via HNSW index, and merges with BM25 full-text scores.
Composite score: 0.50 x cosine + 0.35 x bm25_norm + 0.15 x recency_decay.
An importance multiplier is applied on top. The best-matching chunk per memory is returned as context.
Applies to T0, T1, and T2: all tiers participate via the chunks table. Tier-2 chunks are built from the synopsis only, so ranking accuracy depends on synopsis quality.
RecallWithOpts() — internal/search/engine.go
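The scoring arithmetic can be written out directly. This is a sketch, not the code in internal/search/engine.go: the function names are hypothetical, the weights come from the formula above, the decay term uses the exp(-0.01 * hours_since_access) expression given under Design Decisions, and applying importance as a plain multiplication is an assumption.

```go
package main

import "math"

// recencyDecay returns a value in (0, 1] that shrinks as the memory ages.
func recencyDecay(hoursSinceAccess float64) float64 {
	return math.Exp(-0.01 * hoursSinceAccess)
}

// compositeScore blends the three signals with the 0.50/0.35/0.15 weights
// and then scales the result by the importance multiplier.
// cosine and bm25Norm are expected to be normalized to [0, 1].
func compositeScore(cosine, bm25Norm, hoursSinceAccess, importance float64) float64 {
	base := 0.50*cosine + 0.35*bm25Norm + 0.15*recencyDecay(hoursSinceAccess)
	return base * importance
}
```

With a decay constant of 0.01 per hour, the recency term halves roughly every 69 hours (ln 2 / 0.01), and it can never contribute more than 0.15 to the base score.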
Deep query against a specific memory's full content. Content source depends on tier:
T0
Full body from memories.content — handed directly to
QueryDocument().
T1
Synopsis from memories.content plus semantic chunk lookup via
RecallWithinMemory() for targeted queries.
T2
Full body fetched from documents table via document_id FK.
QueryDocument() applies regex or substring window extraction,
enforces a token budget, then calls Claude for span-grounded answer generation.
query_document.go + claude/document_query.go
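Substring window extraction, one of the two extraction modes mentioned above, can be sketched like this. The span type, the substringWindows name, and the radius parameter are hypothetical; the real QueryDocument() may shape its windows differently, and this sketch leaves out the token budget.

```go
package main

import (
	"strings"
	"unicode/utf8"
)

// span is a hypothetical type: a byte range of the body plus its text.
type span struct {
	Start, End int
	Text       string
}

// substringWindows returns one window of up to radius bytes on each side of
// every non-overlapping occurrence of needle in body.
func substringWindows(body, needle string, radius int) []span {
	var out []span
	if needle == "" {
		return out
	}
	offset := 0
	for {
		i := strings.Index(body[offset:], needle)
		if i < 0 {
			break
		}
		matchStart := offset + i
		matchEnd := matchStart + len(needle)

		start := matchStart - radius
		if start < 0 {
			start = 0
		}
		end := matchEnd + radius
		if end > len(body) {
			end = len(body)
		}
		// Shrink both edges inward until they sit on rune boundaries.
		for start < matchStart && !utf8.RuneStart(body[start]) {
			start++
		}
		for end > matchEnd && end < len(body) && !utf8.RuneStart(body[end]) {
			end--
		}

		out = append(out, span{Start: start, End: end, Text: body[start:end]})
		offset = matchEnd
	}
	return out
}
```

Windows around nearby matches may overlap in this sketch; merging them before the budget is applied would avoid sending the same text twice.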
Design Decisions
Each decision in the tiered design was made to solve a specific failure mode.
"Synopsis in memories.content for Tier-1 and Tier-2"
Recall reads memories.content to populate result summaries. If Tier-2 put the raw body there,
a single recall call returning 10 results could transfer hundreds of megabytes.
The synopsis gives recall enough signal to rank without punishing every search query.
"RawBody is ephemeral — carries the full body for chunking then disappears"
Without RawBody, Tier-1 chunks would be built from the synopsis, making recall
no better than Tier-2. RawBody is tagged json:"-" so it cannot
accidentally enter any serialization path. It is set by execStoreDocument(),
consumed by storeChunksForMemory(), and gone.
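The effect of the json:"-" tag can be shown with a pared-down struct. The memory type and its other fields here are hypothetical; only the RawBody field name and its tag come from the text.

```go
package main

import "encoding/json"

// memory is a hypothetical, reduced version of the stored record.
type memory struct {
	ID      string `json:"id"`
	Content string `json:"content"`
	// RawBody is skipped by encoding/json in both directions.
	RawBody string `json:"-"`
}

// marshalMemory serializes m and returns the JSON as a string.
func marshalMemory(m memory) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}
```

Marshalling a value with RawBody set produces JSON with only the id and content keys, so the full body cannot leak through any path that goes via encoding/json. Other serializers would need their own exclusion.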
"documents table for Tier-2, not a blob store or object storage"
A foreign key from memories.document_id to documents.id means
document bodies are deleted when memories are deleted, and they are visible within
the same PostgreSQL transaction as the memory write. No S3 bucket, no presigned URLs,
no orphan cleanup job. At the 50 MB hard limit, PostgreSQL handles this scale comfortably.
"Composite score: 0.50 cosine + 0.35 BM25 + 0.15 recency"
Recency decays exponentially with time since last access (exp(-0.01 * hours_since_access)). The 50/35/15 split was tuned against
the engram benchmark suite in docs/benchmarks/2026-04-summarize-llm-comparison.md.
The recency weight is intentionally small — it breaks ties without overriding relevance.
"HNSW index via pgvector, not in-process cosine scan"
The HNSW index (m=16, ef_construction=64) keeps approximate recall quality above 0.97
while reducing the recall path from an O(n) transfer to a top-K index scan.
IVFFlat was rejected because it requires periodic reindexing.
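For reference, an index with these parameters would be created with a pgvector statement along the following lines. The helper hnswIndexDDL and the index and column names passed to it are hypothetical, and choosing the vector_cosine_ops operator class is an assumption based on the cosine term in the composite score.

```go
package main

import "fmt"

// hnswIndexDDL builds a pgvector CREATE INDEX statement for an HNSW index.
// Identifiers are interpolated without escaping, so pass trusted constants only.
func hnswIndexDDL(index, table, column string, m, efConstruction int) string {
	return fmt.Sprintf(
		"CREATE INDEX IF NOT EXISTS %s ON %s USING hnsw (%s vector_cosine_ops) WITH (m = %d, ef_construction = %d)",
		index, table, column, m, efConstruction,
	)
}
```

Calling hnswIndexDDL("chunks_embedding_hnsw", "chunks", "embedding", 16, 64) yields a statement that could be placed in a migration.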
"charCap is a hard limit on QueryDocument span budget"
Previously, the code checked whether the next span would exceed TokenBudget * 4 chars, but if it did, it appended nothing
and set Truncated=true. The last span was silently dropped. Fix: truncate
the span at a valid UTF-8 rune boundary to use the remaining budget, then set
Truncated=true. The span appears in the result; the caller knows it is cut.
A second guard was added after the walk-back loop to prevent appending a zero-length span
when the boundary lands at position 0.
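The fix described above can be sketched as one helper. The name appendSpan and its signature are hypothetical, and measuring the budget in bytes is an assumption; the sketch only illustrates the walk-back to a rune boundary and the zero-length guard.

```go
package main

import "unicode/utf8"

// appendSpan adds s to spans without letting the running total exceed charCap.
// If s does not fit, it is cut at the last rune boundary inside the remaining
// budget and truncated is reported as true.
func appendSpan(spans []string, used, charCap int, s string) (out []string, newUsed int, truncated bool) {
	remaining := charCap - used
	if len(s) <= remaining {
		return append(spans, s), used + len(s), false
	}

	cut := remaining
	if cut < 0 {
		cut = 0
	}
	// Walk back until the cut lands on the first byte of a rune.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	// Guard: never append a zero-length span.
	if cut == 0 {
		return spans, used, true
	}
	return append(spans, s[:cut]), used + cut, true
}
```

The caller always learns about the cut through the truncated flag, and a partial span is kept whenever at least one whole rune fits.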