Honest dimensional comparison of local-first / sovereign memory substrates
This is a head-to-head against other sovereign / local-first memory substrates — projects competing on data residency, integrity, and offline operation. If you're choosing between M3 and a developer-tool memory layer like Mem0, Letta, Zep, or LangChain Memory, see the developer-facing decision guide instead.
| Dimension | m3-memory (2026.4.24.12) | agentmemory (V4.0.2) | Chronos (High/Res) | Hindsight (v0.5.4) | Mastra OM (v1.26.0) | Mem0 (v3.0.0) | Memento (v1.0.0) | MemPalace (v3.3.0) |
|---|---|---|---|---|---|---|---|---|
| ▸ Sovereignty & Integrity (M3 strengths) | ||||||||
| Sovereignty (Main) | 🛡️ Full Sovereign | 🛡️ Full Sovereign | ⚖️ On-Prem | ⚖️ High Local | ⚖️ Hybrid | 🔻 Cloud-Tied | ⚖️ Config-Local | 🛡️ Full Verbatim |
| └ Data Residency | 🏆 Local SQLite | ✅ Local SQLite | ✅ Local Files | ✅ Local Files | ⚖️ Postgres / Container | 🔻 Cloud DB | ✅ Local SQLite | ✅ Local SQLite |
| └ Extraction Compute | 🏆 Local SLM | ✅ Deterministic | ✅ ISO-Temporal | ✅ Neural / Local | 🔻 Cloud Reflector | 🔻 Cloud LLM | ⚖️ User-Defined | ❌ Verbatim only |
| └ Telemetry / Audit | 🏆 Zero / Bitemporal | ✅ Zero / Merkle | ✅ Event Logs | ✅ Internal | ⚖️ Usage Logs | 🔻 SaaS Metrics | ✅ Zero / Merkle | 🛡️ Total Dark |
| └ Infrastructure | 🏆 Native Python + SQLite | ✅ Native Python | ⚖️ Linux / Python | ⚖️ Py / Services | 🔻 Docker stack | ✅ SDK / API | ✅ Native Python | ✅ Native Python |
| Data Integrity | 🏆 Bitemporal Logic + Undo | 🏆 Merkle Tree | ✅ Event Logs | ✅ Traceable | ⚖️ DB-Level only | 🔻 Managed only | 🏆 Merkle-Audit | 🔻 JSON Desync Risk |
| Bitemporal & Undo | 🏆 Full bitemp + Undo | ⚖️ Temporal sig. | ✅ Audit log | ✅ Traceable | ✅ 3-Date Anchor | 🔻 No Undo | 🏆 Merkle-Audit | ❌ Verbatim only |
| Privacy / GDPR | 🏆 Native GDPR tools | ✅ Local-only | ✅ On-Prem | ✅ Local-only | ⚖️ Hybrid | 🔻 No native | ✅ Local-only | 🛡️ Total sovereignty |
| ▸ Multi-Agent & Concurrency | ||||||||
| Multi-agent Writes | 🏆 Atomic (WAL) | 🏆 Durable Objects | ✅ Turn-based | 🏆 Shared Banks | ⚖️ Adapter-based | ✅ ID-Scoped | ✅ Transactional | 🔻 Silent failures |
| Multi-agent Orchestration | 🏆 Native MCP handoffs | 🏆 Orchestrated | ✅ Sequential | 🏆 Bank-Scoped | 🏆 Supervisor | ✅ ID-Scoped | 🏆 Native (MCP) | ✅ Agent Diaries |
| Native OS Support | 🍎 🐧 🪟 | 🍎 🐧 | 🐧 🍎 | 🐧 🍎 | 🔻 (Docker only) | 🍎 🐧 🪟 | 🍎 🐧 🪟 | 🍎 🐧 🪟 |
| Multi-Computer Sync | 🏆 Bi-dir Delta Sync | ✅ Managed API | ✅ Web Server | 🏆 Local Server | 🏆 Cloud / EKS | 🏆 Cloud Native | ⚖️ Local Sync | 🔻 Manual Sync |
| ▸ Retrieval & Extraction (M3 is solid; not the recall leader) | ||||||||
| LME-S Score | 89.0% | 96.2% (🏆 #1) | 95.6% | 91.4% | 94.9% | 89.1% | 90.8% | 96.6% (R@5) |
| Search Strategy | ✅ 3-Pillar Hybrid | 🏆 6-Signal Hybrid | ⚖️ Dual-Index | 🏆 4-Stream Neural | ✅ Reflective | ✅ Vector-only | ✅ Compositional | ⚖️ Spatial / AAAK |
| Local Fact Extraction | 🏆 Local SLM | ✅ Deterministic | ✅ ISO-Temporal | ✅ Entity-centric | 🏆 Reflector | ✅ LLM-Powered | ✅ Entity-Res. | ❌ Verbatim only |
| Token Efficiency | 🏆 Working Memory | ✅ Signal Filter | ✅ Event-Pruned | 🔻 Heavy Rerank | ✅ Cache-Stable | 🏆 ~90% Savings | ✅ Verbatim Fall. | ⚖️ AAAK Dialect |
| ▸ Architecture (for context) | ||||||||
| Architecture | 3-Tier (Short / Working / Long-Term) | 6-Signal Hybrid | Event Calendar | 4-Stream | 3-Tier (Obs / Ref) | Dual-Store | Bitemporal KG | Loci Hierarchy |
M3 is not the right answer for every workload. Pick from the table based on what matters most to you:
If sovereignty, bitemporal correctness, and a small auditable codebase matter more than raw recall, M3 is built for that combination — and the table above is the receipt.
For the developer-tool decision (Mem0, Letta, Zep, LangChain Memory), see the developer-facing comparison guide.
Hover any underlined acronym for its full meaning. SOTA = state-of-the-art.
What it means: Independence from cloud services, telemetry, and external dependencies for normal operation.
Why it matters: If your data can't leave the machine — for legal, contractual, or personal reasons — every external dependency is a compliance risk and an attack surface.
M3 standing: Full Sovereign. Local SQLite, local SLM extraction, zero telemetry, native Python — runs on a laptop or in an air-gapped enclave with the same code path.
Sub-dimensions:
pip install — no Docker, no services, no daemons.Cohort context: M3 is tied with agentmemory and MemPalace at the top. Cloud-tied systems (Mem0) can't reach this tier without significant rework. Mastra OM's Docker stack is more dependent than M3's plain-Python install.
What it means: Mechanisms that keep data accurate, consistent, and tamper-evident across time and across agents.
Why it matters: Silent corruption destroys trust slowly. By the time you notice the memory is wrong, the bad fact has already propagated through dozens of decisions.
M3 standing: Bitemporal logic with native undo. Every write is durable (WAL), every fact is bounded by valid-time and transaction-time, and supersedes relationships record exactly which old fact was replaced and when.
Cohort context: Merkle-tree systems (agentmemory, Memento) provide cryptographic audit but no native undo; bitemporal gives undo but isn't cryptographic. JSON-store systems (MemPalace) carry silent-desync risk.
What it means: Tracking facts along two independent time axes — valid time (when the fact was true in the world) and transaction time (when M3 learned it) — with the ability to undo writes.
Why it matters: Agents make mistakes. Without bitemporal logic and undo, every error becomes permanent or requires destructive overwrites that lose context.
M3 standing: SOTA — full bitemporal model + native undo via supersedes relationships.
Cohort context: Memento offers Merkle-style audit (different shape, also strong). Mem0 has no undo; mistakes there are sticky.
What it means: Built-in primitives for the right to erasure (Article 17) and data portability (Article 20).
Why it matters: Many regulated workloads require these capabilities to be operational, not theoretical. Implementing them retroactively on top of a memory layer is expensive and error-prone.
M3 standing: SOTA — gdpr_forget (hard delete) and gdpr_export (portable JSON) ship as MCP tools. See also the FISMA and CMMC alignment notes.
Cohort context: Local-only systems (agentmemory, Memento) inherit privacy by deployment but require custom GDPR tooling. Mem0 has no native GDPR primitives.
What it means: Safe handling of concurrent writes when multiple agents update memory simultaneously.
Why it matters: In real agent swarms, race conditions silently destroy knowledge — and the corruption usually surfaces hours or days later.
M3 standing: Atomic via SQLite WAL. Writes are durable, ordered, and crash-safe.
Cohort context: agentmemory (durable objects) and Hindsight (shared banks) are at parity. MemPalace's "silent failures" mode is the worst category here — writes can drop without surfacing an error.
What it means: Built-in primitives for task handoff, context sharing, and coordinated agent lifecycles.
Why it matters: A single agent is useful. Multiple agents that can hand off work mid-task are the actual value of "agentic" systems.
M3 standing: Native MCP handoffs, agent registry, notifications, tasks — all via the same MCP tool surface, no extra runtime required.
Cohort context: Memento also goes native MCP. Mastra OM uses a supervisor pattern. Mem0 scopes by ID but doesn't ship handoff primitives.
What it means: Runs natively on macOS, Linux, and Windows without Docker or platform-specific dependencies.
Why it matters: Most developers' machines are macOS or Windows; most production servers are Linux. A memory layer that works in all three avoids forcing operational compromises.
M3 standing: Full native support — same install command everywhere.
Cohort context: Mastra OM's Docker-only deployment is the outlier; the rest of the cohort cover at least two OSes.
What it means: Synchronizing memory across multiple physical machines without a central cloud.
Why it matters: A laptop, a desktop, and a server should be able to share memory without giving the data to a SaaS provider.
M3 standing: Bi-directional delta sync via PostgreSQL or ChromaDB — set one env var, your memories follow you across devices.
Cohort context: Cloud-native systems (Mem0) deliver sync trivially but at the cost of sovereignty. MemPalace requires manual sync.
What it means: LongMemEval-S, a 500-question benchmark for long-horizon conversational memory retrieval. The single number captures recall accuracy averaged across question types.
Why it matters: A retrieval layer that can't find what's there is a liability. But — and this is the part vendor pages skip — the benchmark doesn't measure sovereignty, integrity, undo, or compliance. A 96% score sourced from a cloud LLM tells you nothing about whether your data left the machine.
M3 standing: 89.0% — solid mid-pack. We chose to optimize correctness, sovereignty, and a small codebase over chasing the last 7 percentage points of recall.
Cohort context: agentmemory (96.2%), MemPalace R@5 (96.6%), Chronos (95.6%), and Mastra OM (94.9%) lead on raw recall. If recall is the only dimension that matters, those are the better picks. M3 (89.0%) is closer to Mem0 (89.1%), Memento (90.8%), and Hindsight (91.4%).
What it means: The retrieval architecture under the hood — how the system blends keyword, vector, and structural signals.
Why it matters: Vector-only search hallucinates synonyms. Keyword-only misses paraphrases. The blend is what matters.
M3 standing: 3-Pillar Hybrid — FTS5 (BM25) + vector cosine + MMR diversity reranking. Explainable per-result scores via memory_suggest.
Cohort context: agentmemory's 6-signal hybrid and Hindsight's 4-stream neural model push further. Vector-only systems (Mem0) are simpler but lose precision on terminology-heavy queries.
What it means: Distilling structured facts from raw conversational text — entirely on-device.
Why it matters: Raw text is noisy. Extracted facts are dense, queryable, and easier to refresh. Doing this locally preserves sovereignty.
M3 standing: SOTA — dedicated local SLM pipeline (LM Studio / Ollama / vLLM compatible).
Cohort context: Mastra OM's reflector matches M3 in capability but runs in the cloud. Mem0's LLM-powered extraction is strong on quality but breaks sovereignty. MemPalace's verbatim-only design skips extraction entirely.
What it means: The high-level system design and internal memory organization.
Why it matters: Architecture determines how the system scales, what kinds of queries it can answer, and how easy it is to extend.
M3 standing: 3-Tier (short / working / long-term) optimized for real agent lifecycles, with bitemporal logic threaded through every tier.
Cohort context: agentmemory's 6-signal hybrid is richer; MemPalace's spatial loci hierarchy is novel. Trade-off: more complex architectures cost more to maintain.
What it means: How effectively the system reduces context-window usage and downstream LLM costs.
Why it matters: Every token saved is dollars saved and latency reduced. At scale, the difference is order-of-magnitude.
M3 standing: Strong — working-memory optimization plus 3-pillar retrieval keeps context tight without sacrificing recall coverage.
Cohort context: Mem0's reported ~90% savings is the leader; M3 is mid-pack but balances token efficiency against bitemporal richness. Heavy reranking (Hindsight) wastes tokens.