Persistent memory for Vapi voice agents. The assistant knows who's calling — names, preferences, past interactions — before saying a word. Apache 2.0, free tier, self-host or hosted.
Vapi assistants are stateless across calls. The caller phones again next week and starts from scratch: "Hi, who am I speaking with? What's your phone number?" The Vapi team is explicit: memory across calls "is not currently supported" and you're told to build it yourself with webhooks and a database.
Sign up at mengram.io (free, no card). Your key is shown once on the Dashboard → Keys page — copy it. You'll paste it into the YOUR_MENGRAM_KEY placeholders below.
Drop this into your assistant's Tools section in the Vapi dashboard. The recall_caller tool hits Mengram's voice webhook at call start.
Returns a string the assistant can verbalize. Example:
In your Vapi assistant's system prompt:
In the Vapi assistant config, set the Server URL to:
Vapi's Server URL receives all assistant events (status-update, partial transcript, end-of-call-report, etc.). Mengram filters internally — only end-of-call-report triggers extraction, everything else returns a benign 200. Safe to wire one Server URL for the whole assistant.
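The filtering described above can be pictured roughly as follows. This is a minimal sketch, not Mengram's actual handler: the function name `handle_server_event`, the response fields, and the assumption that the event arrives as `{"message": {"type": ..., "customer": {"number": ...}, "transcript": ...}}` are all illustrative.

```python
def handle_server_event(payload: dict) -> tuple[int, dict]:
    """Return (http_status, response_body) for one Server URL POST.

    Assumed payload shape (illustrative): {"message": {"type": "...", ...}}.
    """
    message = payload.get("message")
    # Anything that is not an end-of-call report is acknowledged and ignored.
    if not isinstance(message, dict) or message.get("type") != "end-of-call-report":
        return 200, {"ignored": True}

    customer = message.get("customer") or {}
    # A real handler would pass the transcript to extraction at this point.
    return 200, {
        "ignored": False,
        "phone": customer.get("number"),
        "transcript": message.get("transcript", ""),
    }
```

Because every other event type takes the early-return branch, pointing a single Server URL at a handler like this does not produce errors for status updates or partial transcripts.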
Mengram extracts entities, facts, and a summary from the final transcript — keyed to the caller's phone. Next call, recall_caller returns it.
Each phone number gets its own memory namespace. One Mengram account can power thousands of caller memories across your white-label clients.
Vector (pgvector) + BM25 + Reciprocal Rank Fusion. Beats pure cosine search on keyword-heavy queries like "what was the policy number".
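For readers unfamiliar with Reciprocal Rank Fusion, here is a small sketch of the general technique. It is not Mengram's code: the constant `k=60` is the value commonly used in the RRF literature and is an assumption here, and the `fact_*` identifiers are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumed).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers.
vector_hits = ["fact_a", "fact_b", "fact_c"]
bm25_hits = ["fact_c", "fact_a", "fact_d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['fact_a', 'fact_c', 'fact_b', 'fact_d']
```

Items that both retrievers rank highly rise to the top, while an exact keyword match found only by BM25 (`fact_d` here) still makes the fused list instead of being dropped.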
Ebbinghaus forgetting curve on facts (e^(-0.03 · days)). Old facts naturally fade. New ones surface first. No manual cleanup.
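To get a feel for the decay rate quoted above, the sketch below evaluates e^(-0.03 · days) and the half-life it implies. Only the formula and the 0.03 constant come from the text; the function names are illustrative, and how this weight is combined with retrieval scores is not specified here.

```python
import math

DECAY_RATE = 0.03  # per day, as quoted in the text


def retention(days: float, rate: float = DECAY_RATE) -> float:
    """Weight of a fact that is `days` old: e^(-rate * days)."""
    return math.exp(-rate * days)


def half_life_days(rate: float = DECAY_RATE) -> float:
    """Age at which a fact's weight has dropped to one half."""
    return math.log(2) / rate


print(round(half_life_days(), 1))  # 23.1
print(round(retention(30), 2))     # 0.41
```

So at this rate a fact loses half its weight in roughly 23 days and sits at about 41% after a month.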
Daily reflection cron synthesizes patterns across episodes — "this caller prefers morning slots", "anxiety about novocaine" — so the next call gets context, not raw transcripts.
How is this different from mem0?
Mem0 is text-agent-shaped. You glue it to Vapi with n8n + custom code. Mengram's webhook adapter is Vapi-native: paste the JSON, you're done. Same hybrid retrieval underneath, just less wiring.
Will this work with Retell / Pipecat / LiveKit?
The recall/save endpoints are HTTP webhooks — anything that can POST JSON works. Native Pipecat processor and LiveKit agent helper aren't built yet; if you need one, email ali@mengram.io and I'll prioritize based on demand.
What about HIPAA?
Self-host gives you full data residency (Apache 2.0, your Postgres, your OpenAI key). Hosted-cloud BAA isn't yet available — for now, healthcare voice agents should self-host.
How much does it cost?
Free tier = 40 saves + 200 recalls/month, which is roughly 40 inbound calls/month (1 call = 1 recall + 1 save). Enough to validate the integration. Paid tiers from $5/mo. Each recall_caller consumes 1 search quota; each end-of-call save consumes 1 add quota. Full pricing →
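The call estimate above follows from whichever quota runs out first. A quick sketch of that arithmetic, with a hypothetical helper name and the per-call usage taken from the text (1 recall + 1 save):

```python
def calls_per_month(saves: int, recalls: int,
                    saves_per_call: int = 1, recalls_per_call: int = 1) -> int:
    """Number of whole calls covered before either quota is exhausted."""
    return min(saves // saves_per_call, recalls // recalls_per_call)


print(calls_per_month(saves=40, recalls=200))  # 40
```

On the free tier the save quota is the limit: the 200 recalls would cover more calls, but each call also needs one save.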
What about latency?
Single recall: ~500–900ms. Under concurrent load (10–20 simultaneous calls): p50 ≈ 1200ms, p95 ≈ 1300ms. The agent calls recall_caller during the natural greeting pause, so callers don't notice. If you have a hard sub-1s SLA, though, benchmark in your own setup first. These figures were measured against mengram.io production with 1186-word transcripts indexed.
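If you want to run that benchmark yourself, a sketch along these lines may help. The helper names are illustrative, and `fn` stands for whatever zero-argument function issues one recall request in your setup (no endpoint is assumed here). It times calls one after another, so it approximates the single-recall figure, not the concurrent-load one.

```python
import statistics
import time


def measure_ms(fn, runs: int = 20) -> list[float]:
    """Call fn() `runs` times sequentially and return each duration in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def p50_p95(samples_ms: list[float]) -> tuple[float, float]:
    """Median and 95th percentile using linear interpolation."""
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return cuts[49], cuts[94]  # the 50th and 95th of the 99 cut points
```

Usage would look like `p50, p95 = p50_p95(measure_ms(my_recall_request))`, where `my_recall_request` is your own function.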
Can I test without a phone number using Vapi's web "Talk to Assistant" button?
Yes, but recall won't have a phone to key on (web calls don't have customer.number). You'll see "Web caller — no phone number yet" until you switch to a phone-attached assistant. For full end-to-end testing, buy a $1/mo Vapi phone number and call yourself.
What if my assistant config uses the OpenAI-style nested toolCalls shape?
Both work. Vapi sends toolCalls (nested function.name / function.arguments) and toolCallList (flat), and Mengram parses either. arguments can also arrive as a JSON string; that case is handled too.
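To illustrate what parsing both shapes involves, here is a sketch of a normalizer. It is not Mengram's parser: the function name, the output format, and the `phone` argument in the usage lines are assumptions, and only the two field layouts come from the description above.

```python
import json


def normalize_tool_calls(message: dict) -> list[dict]:
    """Flatten nested (toolCalls) or flat (toolCallList) entries into
    {"id", "name", "arguments"} dicts, decoding string arguments."""
    calls = []
    for item in message.get("toolCalls") or message.get("toolCallList") or []:
        fn = item.get("function")
        # Nested shape keeps name/arguments under "function"; flat shape does not.
        source = fn if isinstance(fn, dict) else item
        args = source.get("arguments") or {}
        if isinstance(args, str):
            # Arguments may arrive JSON-encoded as a string.
            args = json.loads(args) if args.strip() else {}
        calls.append({"id": item.get("id"), "name": source.get("name"), "arguments": args})
    return calls


# Hypothetical payloads: the same call in each shape.
nested = {"toolCalls": [{"id": "c1", "function": {
    "name": "recall_caller", "arguments": "{\"phone\": \"+15550100\"}"}}]}
flat = {"toolCallList": [{"id": "c1", "name": "recall_caller",
                          "arguments": {"phone": "+15550100"}}]}
print(normalize_tool_calls(nested) == normalize_tool_calls(flat))  # True
```

When both keys are present, this sketch reads only `toolCalls`, so the same call is not processed twice.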
40 saves + 200 recalls per month on the free tier. No credit card.
Get your Mengram key →