Architecture Reference

Four tiers. One document.
Zero wasted context.

How engram-go decides where a document lives, what gets indexed, and what gets handed to the language model when you ask a question about it. Every byte from the first write to the last recall, traced end-to-end.

Architecture Diagram

[Diagram: engram-go document storage and chunking strategy. Ingest path (top), retrieval path (bottom).]

Diagram notes:

  • Env overrides: ENGRAM_MAX_DOCUMENT_BYTES, ENGRAM_RAW_DOCUMENT_MAX_BYTES
  • Ingest flow: Document In, then Size Classifier (classifyDocumentSize() in ingest_document.go:43), then one of Tier-0 Small, Tier-1 Stream Synopsis, Tier-2 Raw Document, or Tier-3 Reject
  • Storage for Tier-0 through Tier-2 goes through execStoreDocument(); Tier-3 returns HTTP 413 with no storage, no chunking, and no retrieval path
  • Chunking source: full content (Tier-0), RawBody (Tier-1), synopsis only (Tier-2)
  • Chunking modes, all tiers: document mode (ChunkDocument) is semantic, 2000-char; focused mode (ChunkText) is 512-tok, 50 overlap
  • LazyChunkThreshold = 8 KB: content below 8 KB becomes a single chunk
  • Retrieval: memory_recall via RecallWithOpts() in search/engine.go; memory_query_document via QueryDocument() in query_document.go + claude/document_query.go
  • Synopsis = buildSynopsis(): first 8 KiB verbatim + h1/h2 heading outline (<= 2 KiB)

The Four Tiers

Size drives routing. Every document is classified once on arrival and never reclassified. The classification determines which tables hold the data and what the chunker sees.

Synopsis formula (Tier-1 and Tier-2): buildSynopsis() in internal/mcp/ingest_document.go:72 produces the first 8 KiB of the document verbatim, then appends a --- Outline --- section containing every Markdown h1/h2 heading found in the full body, capped at 2 KiB. This is what lands in memories.content for Tier-1 and Tier-2.
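The synopsis formula can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in ingest_document.go: the name sketchSynopsis, the rune-boundary walk-back on the 8 KiB cut, the decision to skip the outline when no headings exist, and the naive heading detection (which does not exclude headings inside code fences) are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

const (
	synopsisHeadBytes    = 8 * 1024 // first 8 KiB kept verbatim
	synopsisOutlineBytes = 2 * 1024 // cap on the outline section
	outlineMarker        = "--- Outline ---"
)

// sketchSynopsis is a hypothetical stand-in for buildSynopsis().
func sketchSynopsis(body string) string {
	head := body
	if len(head) > synopsisHeadBytes {
		cut := synopsisHeadBytes
		// Assumption: back up so a multi-byte rune is never split.
		for cut > 0 && !utf8.RuneStart(head[cut]) {
			cut--
		}
		head = head[:cut]
	}

	// The outline scans the full body, not just the retained head.
	var outline strings.Builder
	for _, line := range strings.Split(body, "\n") {
		line = strings.TrimRight(line, "\r")
		// Only Markdown h1 ("# ") and h2 ("## ") headings are collected.
		if !strings.HasPrefix(line, "# ") && !strings.HasPrefix(line, "## ") {
			continue
		}
		if outline.Len()+len(line)+1 > synopsisOutlineBytes {
			break // outline cap reached
		}
		outline.WriteString(line)
		outline.WriteByte('\n')
	}

	if outline.Len() == 0 {
		return head // assumption: no outline section without headings
	}
	return head + "\n" + outlineMarker + "\n" + outline.String()
}

func main() {
	doc := "# Title\n\nIntro text.\n\n## Setup\n\nSteps.\n\n### Detail\n"
	fmt.Println(sketchSynopsis(doc))
}
```

Because the outline is drawn from the whole body, headings that fall beyond the first 8 KiB still appear in the synopsis.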

Tier-0

Small

size <= 500 KB

  • memories.content = full body
  • documents table not touched
  • Chunks built from full content
  • Below 8 KB: single chunk (LazyChunkThreshold)
  • Existing memory_store_document path; no new code

Tier-1

Stream Synopsis

> 500 KB and <= 8 MB (default)

  • memories.content = synopsis (8 KiB + outline)
  • RawBody carries full body in-memory only
  • Chunks built from RawBody — recall stays grounded
  • documents table not touched
  • RawBody tagged json:"-"; never persisted

Tier-2

Raw Document

> 8 MB and <= 50 MB (default)

  • memories.content = synopsis
  • documents.body = full body (durable)
  • memories.document_id FK set
  • Chunks built from synopsis only
  • Full body fetched on memory_query_document

Tier-3

Reject

size > 50 MB

  • HTTP 413 returned immediately
  • No database writes
  • No chunking, no embedding
  • Threshold not configurable
  • Caller must split or summarize before storing
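The routing described by the four tiers reduces to a threshold ladder. The sketch below is illustrative only: the Tier type, the constant names, and the classify function are hypothetical (the real classifyDocumentSize() signature is not shown in this document), the thresholds are hard-coded rather than read from the environment, and binary units (KiB, MiB) are assumed.

```go
package main

import "fmt"

// Tier is a hypothetical enum for the four size classes.
type Tier int

const (
	TierSmall          Tier = iota // Tier-0: full body in memories.content
	TierStreamSynopsis             // Tier-1: synopsis stored, chunks from RawBody
	TierRawDocument                // Tier-2: synopsis stored, body in documents table
	TierReject                     // Tier-3: HTTP 413
)

// Default boundaries from the tier cards, assuming binary units.
const (
	smallMax  int64 = 500 * 1024
	streamMax int64 = 8 * 1024 * 1024
	rawMax    int64 = 50 * 1024 * 1024
)

// classify maps a document size in bytes to its tier.
// Each upper bound is inclusive, matching "size <= 500 KB" and "size > 50 MB".
func classify(size int64) Tier {
	switch {
	case size <= smallMax:
		return TierSmall
	case size <= streamMax:
		return TierStreamSynopsis
	case size <= rawMax:
		return TierRawDocument
	default:
		return TierReject
	}
}

func main() {
	for _, size := range []int64{4 * 1024, 2 * 1024 * 1024, 20 * 1024 * 1024, 60 * 1024 * 1024} {
		fmt.Printf("%d bytes -> tier %d\n", size, classify(size))
	}
}
```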

Retrieval

Two Paths Out

Every stored memory participates in both retrieval paths. The paths differ in cost, depth, and what they return.

memory_recall

Semantic search for finding memories. Embeds the query, runs vector ANN against the chunks table via HNSW index, and merges with BM25 full-text scores.

Composite score: 0.50 x cosine + 0.35 x bm25_norm + 0.15 x recency_decay. Importance multiplier applied on top. Best matching chunk per memory is returned as context.
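The "best matching chunk per memory" step can be pictured as a group-by over scored chunks. The following is a sketch under assumptions: the scoredChunk type, its fields, and bestChunkPerMemory are hypothetical names, and the real RecallWithOpts() may well perform this grouping in SQL rather than in Go.

```go
package main

import (
	"fmt"
	"sort"
)

// scoredChunk is a hypothetical row: one chunk with its composite score.
type scoredChunk struct {
	MemoryID string
	Text     string
	Score    float64
}

// bestChunkPerMemory keeps the highest-scoring chunk for each memory,
// sorts the survivors by score descending, and returns at most topN.
func bestChunkPerMemory(chunks []scoredChunk, topN int) []scoredChunk {
	best := make(map[string]scoredChunk)
	for _, c := range chunks {
		if cur, ok := best[c.MemoryID]; !ok || c.Score > cur.Score {
			best[c.MemoryID] = c
		}
	}
	out := make([]scoredChunk, 0, len(best))
	for _, c := range best {
		out = append(out, c)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].MemoryID < out[j].MemoryID // deterministic tie-break
	})
	if topN >= 0 && len(out) > topN {
		out = out[:topN]
	}
	return out
}

func main() {
	chunks := []scoredChunk{
		{"a", "a1", 0.2}, {"b", "b1", 0.9}, {"a", "a2", 0.7}, {"c", "c1", 0.5},
	}
	for _, c := range bestChunkPerMemory(chunks, 2) {
		fmt.Printf("%s %s %.2f\n", c.MemoryID, c.Text, c.Score)
	}
}
```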

Tiers: T0, T1, T2. All tiers participate via the chunks table. Tier-2 chunks are built from the synopsis only, so Tier-2 ranking can only be as good as what the synopsis captures.

RecallWithOpts() — internal/search/engine.go

memory_query_document

Deep query against a specific memory's full content. Content source depends on tier:

T0: Full body from memories.content, handed directly to QueryDocument().

T1: Synopsis from memories.content, plus semantic chunk lookup via RecallWithinMemory() for targeted queries.

T2: Full body fetched from the documents table via the document_id FK.

All tiers: QueryDocument() applies regex or substring window extraction, enforces a token budget, then calls Claude for span-grounded answer generation.
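The substring flavor of window extraction might look like the sketch below. It is an assumption-laden illustration, not the QueryDocument() implementation: the span type, substringWindows, and the radius parameter are hypothetical, offsets are in bytes (so a window edge may fall inside a multi-byte rune), and budget enforcement is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// span is a hypothetical half-open byte range [Start, End) into the body.
type span struct{ Start, End int }

// substringWindows finds every occurrence of needle in body and returns a
// window of radius bytes on each side, merging windows that touch or overlap.
func substringWindows(body, needle string, radius int) []span {
	var spans []span
	if needle == "" {
		return spans
	}
	for from := 0; from < len(body); {
		i := strings.Index(body[from:], needle)
		if i < 0 {
			break
		}
		start := from + i
		end := start + len(needle)
		lo, hi := start-radius, end+radius
		if lo < 0 {
			lo = 0
		}
		if hi > len(body) {
			hi = len(body)
		}
		if n := len(spans); n > 0 && lo <= spans[n-1].End {
			spans[n-1].End = hi // extend the previous window instead of duplicating text
		} else {
			spans = append(spans, span{lo, hi})
		}
		from = end
	}
	return spans
}

func main() {
	body := "The cache is warm. Later on, the cache is flushed and rebuilt."
	for _, s := range substringWindows(body, "cache", 8) {
		fmt.Printf("[%d,%d) %q\n", s.Start, s.End, body[s.Start:s.End])
	}
}
```

Merging adjacent windows keeps the same text from being sent to the model twice, which matters once a token budget is applied to the extracted spans.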

query_document.go + claude/document_query.go

Design Decisions

Why It Works This Way

Each decision in the tiered design was made to solve a specific failure mode.

"Synopsis in memories.content for Tier-1 and Tier-2"

Keeps memory_recall fast for all tiers. The recall path reads memories.content to populate result summaries. If Tier-2 put the raw body there, a single recall call returning 10 results could transfer hundreds of megabytes. The synopsis gives recall enough signal to rank without punishing every search query.

"RawBody is ephemeral — carries the full body for chunking then disappears"

Tier-1 avoids a documents table row but still grounds chunks in the full body. Without RawBody, Tier-1 chunks would be built from the synopsis, making recall no better than Tier-2. RawBody is tagged json:"-" so it cannot accidentally enter any serialization path. It is set by execStoreDocument(), consumed by storeChunksForMemory(), and gone.
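The effect of the json:"-" tag is easy to demonstrate with the standard library. In this sketch the memory struct is a hypothetical, trimmed-down stand-in (the real struct's other fields and tags are not shown in this document); only the RawBody tag is taken from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// memory is a hypothetical stand-in for the engram-go memory struct.
type memory struct {
	ID      string `json:"id"`
	Content string `json:"content"` // synopsis for Tier-1
	RawBody string `json:"-"`       // in-memory only; skipped by encoding/json
}

// marshalMemory returns the JSON form of m, or "" on error.
func marshalMemory(m memory) string {
	b, err := json.Marshal(m)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	m := memory{ID: "m1", Content: "synopsis", RawBody: "full body"}
	fmt.Println(marshalMemory(m)) // {"id":"m1","content":"synopsis"}
}
```

The tag covers encoding/json only. Any other persistence path (for example, a SQL insert that lists columns explicitly) has to leave the field out on its own.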

"documents table for Tier-2, not a blob store or object storage"

Transactional integrity without infrastructure complexity. The FK from memories.document_id to documents.id means document bodies are deleted when memories are deleted, and they are visible within the same PostgreSQL transaction as the memory write. No S3 bucket, no presigned URLs, no orphan cleanup job. At the 50 MB hard limit, PostgreSQL handles this scale comfortably.

"Composite score: 0.50 cosine + 0.35 BM25 + 0.15 recency"

Semantic search misses exact matches; lexical search misses paraphrase. The composite score is a weighted fusion of HNSW approximate nearest-neighbor cosine similarity, BM25 full-text rank, and an exponential recency decay (exp(-0.01 * hours_since_access)). The 50/35/15 split was tuned against the engram benchmark suite in docs/benchmarks/2026-04-summarize-llm-comparison.md. The recency weight is intentionally small — it breaks ties without overriding relevance.
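As a worked form of the fusion, the sketch below combines the three signals with the stated weights and decay. The function names are hypothetical, inputs are assumed to be normalized to [0, 1], and the importance multiplier is assumed to be a plain multiplication on the weighted sum; the actual code in internal/search/engine.go may differ on all three points.

```go
package main

import (
	"fmt"
	"math"
)

const (
	wCosine   = 0.50
	wBM25     = 0.35
	wRecency  = 0.15
	decayRate = 0.01 // per hour since last access
)

// recencyDecay implements exp(-0.01 * hours_since_access).
func recencyDecay(hoursSinceAccess float64) float64 {
	return math.Exp(-decayRate * hoursSinceAccess)
}

// compositeScore fuses cosine similarity, normalized BM25, and recency,
// then scales by importance (assumed to be a simple multiplier).
func compositeScore(cosine, bm25Norm, hoursSinceAccess, importance float64) float64 {
	base := wCosine*cosine + wBM25*bm25Norm + wRecency*recencyDecay(hoursSinceAccess)
	return base * importance
}

func main() {
	fresh := compositeScore(0.80, 0.40, 1, 1.0)
	stale := compositeScore(0.80, 0.40, 24*30, 1.0)
	fmt.Printf("fresh=%.4f stale=%.4f\n", fresh, stale)
}
```

With a decay rate of 0.01 per hour, the recency term halves roughly every 69 hours (ln 2 / 0.01), and it can move the total by at most 0.15 before the importance multiplier is applied.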

"HNSW index via pgvector, not in-process cosine scan"

The original design fetched up to 10,000 chunk embeddings into Go for cosine scoring. At 10k+ memories this becomes a latency and memory bottleneck. The pgvector HNSW index (m=16, ef_construction=64) keeps approximate recall quality above 0.97 while reducing the recall path from an O(n) transfer to a top-K index scan. IVFFlat was rejected because it requires periodic reindexing.
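For orientation, the index definition and top-K query implied here can be written with pgvector's documented syntax. The sketch only builds SQL strings; the index name, the embedding and memory_id column names, and the helper function are assumptions, and the actual migration may be worded differently.

```go
package main

import "fmt"

// hnswIndexDDL builds a pgvector HNSW index statement using cosine distance.
// Identifiers are interpolated directly, so pass trusted constants only.
func hnswIndexDDL(indexName, table, column string, m, efConstruction int) string {
	return fmt.Sprintf(
		"CREATE INDEX IF NOT EXISTS %s ON %s USING hnsw (%s vector_cosine_ops) WITH (m = %d, ef_construction = %d);",
		indexName, table, column, m, efConstruction,
	)
}

// topKQuery orders by the <=> cosine distance operator so the planner can
// satisfy the LIMIT from the HNSW index. Column names are hypothetical.
const topKQuery = "SELECT memory_id, 1 - (embedding <=> $1) AS cosine " +
	"FROM chunks ORDER BY embedding <=> $1 LIMIT $2;"

func main() {
	fmt.Println(hnswIndexDDL("chunks_embedding_hnsw", "chunks", "embedding", 16, 64))
	fmt.Println(topKQuery)
}
```

The <=> operator returns cosine distance, so similarity is recovered as 1 minus the distance.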

"charCap is a hard limit on QueryDocument span budget"

Issue #195: the original implementation checked whether adding a span would exceed TokenBudget * 4 chars, but if it did, it appended nothing and set Truncated=true. The last span was silently dropped. Fix: truncate the span at a valid UTF-8 rune boundary to use the remaining budget, then set Truncated=true. The span appears in the result; the caller knows it is cut. A second guard was added after the walk-back loop to prevent appending a zero-length span when the boundary lands at position 0.
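The fixed behavior can be sketched as below. This is an illustration of the technique, not the patched code: appendSpan and its signature are hypothetical, and the cap is assumed to be measured in bytes of UTF-8 text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// appendSpan appends span to out without letting the running total exceed
// charCap bytes. An oversized span is cut at a rune boundary, not dropped.
func appendSpan(out []string, used int, span string, charCap int) ([]string, int, bool) {
	remaining := charCap - used
	if len(span) <= remaining {
		return append(out, span), used + len(span), false
	}
	if remaining <= 0 {
		return out, used, true // budget already spent
	}
	cut := remaining
	// Walk back until cut sits on the first byte of a rune.
	for cut > 0 && !utf8.RuneStart(span[cut]) {
		cut--
	}
	if cut == 0 {
		// Second guard: never append a zero-length span.
		return out, used, true
	}
	return append(out, span[:cut]), used + cut, true
}

func main() {
	out, used, truncated := appendSpan(nil, 0, "héllo wörld", 8)
	fmt.Printf("%q used=%d truncated=%v\n", out, used, truncated)
}
```

The walk-back loop guarantees the cut never splits a multi-byte character, and the final check covers the case where the remaining budget is smaller than the first rune.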

Sources and References

The Shoulders It Stands On

Internal Design Documents

engram-go Phases 3-7 Design Spec. Approved 2026-04-10. Covers embedding client, search engine, MCP server, background workers, and CLI entry point for the full Go port.
docs/superpowers/specs/2026-04-10-engram-go-phases-3-7-design.md

pgvector ANN Recall + MinHash/LSH Consolidation Design Spec. Draft 2026-04-10. Details HNSW index migration from BYTEA to vector(768), RecallWithOpts rewrite, and O(n) MinHash candidate generation for Consolidate.
docs/superpowers/specs/2026-04-10-pgvector-ann-minhash-design.md

Tiered Document Ingestion: Phase A4 Commit. Peter Simmons, 2026-04-17. Commit cbf5b07. Introduces classifyDocumentSize, buildSynopsis, execStoreDocument, RawBody field, and documents table migration 010. Documents the exact tier boundaries and the RawBody ephemeral-field pattern.
git show cbf5b07

memory_query_document + QueryDocument Fix: Commits 23e7d57, 584deae, cafd6a4. Peter Simmons, 2026-04-17. Phase A5 (memory_query_document + RecallWithinMemory) and issue #195 charCap hard-limit fix including UTF-8 rune-boundary walk-back.
git log --oneline

Academic and Technical Foundations

Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs. Yu. A. Malkov, D. A. Yashunin. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020. Foundation for the HNSW index used in pgvector for approximate vector recall. Establishes the m and ef_construction parameters used in the HNSW index definition.

Mining of Massive Datasets, Chapter 3: Finding Similar Items. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. 3rd edition, 2020. Canonical reference for MinHash signature computation and LSH banding. Directly informs the NumHashes=128, NumBands=16, RowsPerBand=8 parameters in internal/minhash.

The Probabilistic Relevance Framework: BM25 and Beyond. Stephen Robertson, Hugo Zaragoza. Foundations and Trends in Information Retrieval, vol. 3, no. 4, 2009. Foundation for the BM25 scoring used in the FTS search path. The 0.35 weight in the composite score reflects BM25's strength on exact-term and short-document recall.
doi:10.1561/1500000019

pgvector: Open-source vector similarity search for Postgres. Andrew Kane. 2021–present. Provides the vector data type, HNSW and IVFFlat indexes, and the <=> cosine distance operator used in VectorSearch().

nomic-embed-text. Nomic AI, 2024. 768-dimensional text embedding model served via Ollama. Default embedding model for engram-go; dimensions drive the vector(768) column type.

Model Context Protocol Specification. Anthropic, 2024. Defines the SSE transport, tool call/result schema, and resource model that all 17 engram-go MCP tools conform to.

Prior Art and Inspiration

petersimmons1972/engram: Python service (predecessor). Peter Simmons. The original Python MCP memory server that engram-go replaces. Wire-compatible: same tool names, same SSE transport, same PostgreSQL schema up to migration 009. The tiered document ingestion (A4) and memory_query_document (A5) are Go-only extensions with no Python equivalent.
github.com/petersimmons1972/engram

mcp-go: Go SDK for the Model Context Protocol. mark3labs. v0.45.0. The MCP server library that handles SSE transport, tool registration, and request/response encoding for all 17 tools.

Note: the original design session for tiered ingestion included additional web references that are no longer accessible (the conversation context was compacted). The academic foundations above represent the complete known bibliography for the techniques implemented.