Mengram × Vapi

Stop making your callers repeat themselves.

Persistent memory for Vapi voice agents. The assistant knows who's calling — names, preferences, past interactions — before saying a word. Apache 2.0, free tier, self-host or hosted.


The problem

Vapi assistants are stateless across calls. The caller phones again next week and starts from scratch: "Hi, who am I speaking with? What's your phone number?" The Vapi team is explicit — memory across calls "is not currently supported" and you're told to build it yourself with webhooks and a database.

Without memory
Caller: "Hi, calling back about the consult."
Agent: "Sure! Can I get your name and phone number?"
Caller: "...you should have it. I just called last week."

With Mengram
Caller: "Hi, calling back about the consult."
Agent: "Hi Sarah! How did the cleaning on the 14th go? Want me to book that wisdom teeth consult we discussed?"
Caller: 🤯

Setup in 4 steps

Get your Mengram API key

Sign up at mengram.io (free, no card). Your key is shown once on the Dashboard → Keys page — copy it. You'll paste it into the YOUR_MENGRAM_KEY placeholders below.

Add Mengram as a Vapi tool

Drop this into your assistant's Tools section in the Vapi dashboard. The recall_caller tool hits Mengram's voice webhook at call start.

{
  "type": "function",
  "function": {
    "name": "recall_caller",
    "description": "Get what we know about this caller",
    "parameters": {
      "type": "object",
      "properties": {
        "phone": { "type": "string" }
      }
    }
  },
  "server": {
    "url": "{{BASE_URL}}/v1/voice/vapi/recall",
    "headers": { "Authorization": "Bearer YOUR_MENGRAM_KEY" }
  }
}

Returns a string the assistant can verbalize. Example:

{
  "results": [
    {
      "toolCallId": "call_abc",
      "result": "Known about caller (Sarah Johnson): Sarah Johnson: prefers morning slots before 11 AM | Sarah Johnson: gets anxiety with novocaine | Sarah Johnson: booked cleaning May 14"
    }
  ]
}
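The assistant can read that string out as-is, but if you want to inspect individual facts in a test or a log, a minimal Python sketch follows. It assumes the response shape shown in the example above and that facts are separated by " | " after a "Known about caller (...):" header. That separator and header are inferred from the single example, not from a documented contract, and the helper name split_recall_result is hypothetical.

```python
import json

HEADER_PREFIX = "Known about caller"  # assumed from the example response


def split_recall_result(body: str) -> dict[str, list[str]]:
    """Map each toolCallId to the list of facts found in its result string."""
    payload = json.loads(body)
    facts_by_call: dict[str, list[str]] = {}
    for item in payload.get("results", []):
        text = item.get("result", "")
        # Strip the leading "Known about caller (Name): " header when present.
        _, sep, rest = text.partition("): ")
        if text.startswith(HEADER_PREFIX) and sep:
            text = rest
        # Assumption: facts are joined with " | ".
        facts = [fact.strip() for fact in text.split(" | ") if fact.strip()]
        facts_by_call[item["toolCallId"]] = facts
    return facts_by_call


if __name__ == "__main__":
    sample = json.dumps({
        "results": [{
            "toolCallId": "call_abc",
            "result": "Known about caller (Sarah Johnson): "
                      "Sarah Johnson: prefers morning slots before 11 AM | "
                      "Sarah Johnson: booked cleaning May 14",
        }]
    })
    for call_id, facts in split_recall_result(sample).items():
        print(call_id, facts)
```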

Instruct the assistant to call it first

In your Vapi assistant's system prompt:

// system prompt
At the start of every call, call recall_caller with the caller's phone number. Use what's returned to greet them naturally. Don't ask for info you already have.

Add the end-of-call save webhook

In the Vapi assistant config, set the Server URL to:

{{BASE_URL}}/v1/voice/vapi/save
Header: Authorization: Bearer YOUR_MENGRAM_KEY

Vapi's Server URL receives all assistant events (status-update, partial transcript, end-of-call-report, etc.). Mengram filters internally: only end-of-call-report triggers extraction, and everything else returns a benign 200. It is safe to wire one Server URL for the whole assistant.
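If you self-host or put your own proxy in front of the save endpoint, the filtering described above can be pictured as a small dispatch on the event type. This is a sketch, not Mengram's actual handler: it assumes the event type arrives at message.type in the POST body, and handle_server_event and the extract callback are hypothetical names.

```python
from typing import Any, Callable

Event = dict[str, Any]


def handle_server_event(
    body: Event, extract: Callable[[Event], None]
) -> tuple[int, Event]:
    """Return (status_code, response_body) for one server event."""
    message = body.get("message") or {}
    if message.get("type") != "end-of-call-report":
        # status-update, transcript and similar events: acknowledge, do nothing.
        return 200, {"ok": True, "ignored": True}
    # Only the final report carries the full transcript worth extracting from.
    extract(message)
    return 200, {"ok": True, "ignored": False}


if __name__ == "__main__":
    seen: list[Event] = []
    print(handle_server_event({"message": {"type": "status-update"}}, seen.append))
    print(handle_server_event({"message": {"type": "end-of-call-report"}}, seen.append))
    print(len(seen), "event(s) sent to extraction")
```

Returning 200 for ignored events matters because the same URL receives every event type; an error status there would make routine events look like failures.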

Mengram extracts entities, facts, and a summary from the final transcript and keys them to the caller's phone number. On the next call, recall_caller returns them.

What's under the hood

Per-caller isolation

Each phone number gets its own memory namespace. One Mengram account can power thousands of caller memories across your white-label clients.

Hybrid retrieval

Vector (pgvector) + BM25 + Reciprocal Rank Fusion. Beats pure cosine search on keyword-heavy queries like "what was the policy number".
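For readers unfamiliar with Reciprocal Rank Fusion, here is a minimal Python sketch of the general technique: each result list contributes 1 / (k + rank) per item, and the sums decide the final order. The constant k = 60 is the conventional default from the RRF literature and is an assumption here; Mengram's actual constant, function names, and fact IDs are not given in this page.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists into one, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Items near the top of any list get the largest contribution.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    vector_hits = ["f_pref", "f_policy", "f_visit"]  # hypothetical fact IDs
    bm25_hits = ["f_policy", "f_claim", "f_pref"]
    print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
    # -> ['f_policy', 'f_pref', 'f_claim', 'f_visit']
```

Because only ranks are used, the vector and BM25 scores never need to be put on a common scale, which is the usual reason for choosing RRF.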

Temporal decay

Ebbinghaus forgetting curve on facts (e^(-0.03 · days)). Old facts naturally fade. New ones surface first. No manual cleanup.
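To get a feel for that rate, the short Python sketch below evaluates the weight e^(-0.03 · days) at a few ages and derives the half-life, ln 2 / 0.03, which is about 23 days. How Mengram combines this weight with retrieval scores is not stated here, so the sketch only computes the weight; the function names are illustrative.

```python
import math

DECAY_RATE = 0.03  # per day, taken from the formula above


def decay_weight(days_old: float, rate: float = DECAY_RATE) -> float:
    """Weight of a fact that is `days_old` days old: e^(-rate * days)."""
    return math.exp(-rate * days_old)


def half_life_days(rate: float = DECAY_RATE) -> float:
    """Age at which the weight drops to 0.5."""
    return math.log(2) / rate


if __name__ == "__main__":
    for days in (0, 7, 30, 90):
        print(f"{days:>3} days -> weight {decay_weight(days):.3f}")
    print(f"half-life: {half_life_days():.1f} days")
```

At this rate a fact from a month ago keeps roughly 40% of its weight, and one from three months ago under 7%.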

Background consolidation

Daily reflection cron synthesizes patterns across episodes — "this caller prefers morning slots", "anxiety about novocaine" — so the next call gets context, not raw transcripts.

FAQ

How is this different from mem0?

Mem0 is built around text agents, so you have to glue it to Vapi with n8n and custom code. Mengram's webhook adapter is Vapi-native: paste the JSON and you're done. Same hybrid retrieval underneath, just less wiring.

Will this work with Retell / Pipecat / LiveKit?

The recall/save endpoints are HTTP webhooks, so anything that can POST JSON works. A native Pipecat processor and a LiveKit agent helper aren't built yet; if you need one, email ali@mengram.io and I'll prioritize based on demand.

What about HIPAA?

Self-hosting gives you full data residency (Apache 2.0, your Postgres, your OpenAI key). A BAA for the hosted cloud isn't available yet, so for now healthcare voice agents should self-host.

How much does it cost?

The free tier includes 40 saves and 200 recalls per month. Since 1 call = 1 recall + 1 save, saves run out first, which works out to roughly 40 inbound calls/month. That is enough to validate the integration. Paid tiers start at $5/mo. Each recall_caller call consumes 1 unit of search quota; each end-of-call save consumes 1 unit of add quota.
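The arithmetic behind the "roughly 40 calls" figure is simply the tighter of the two quotas. A small Python sketch, using the free-tier numbers quoted above and a hypothetical helper name, makes it easy to plug in your own call pattern, for example an assistant that recalls more than once per call:

```python
def calls_covered(
    saves: int, recalls: int, saves_per_call: int = 1, recalls_per_call: int = 1
) -> int:
    """Number of whole calls a monthly quota supports."""
    by_saves = saves // saves_per_call
    by_recalls = recalls // recalls_per_call
    # Whichever quota runs out first sets the limit.
    return min(by_saves, by_recalls)


if __name__ == "__main__":
    print(calls_covered(saves=40, recalls=200))                      # -> 40
    print(calls_covered(saves=40, recalls=200, recalls_per_call=8))  # -> 25
```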

What about latency?

Single recall: ~500–900ms. Under concurrent load (10–20 simultaneous calls): p50 ≈ 1200ms, p95 ≈ 1300ms. The agent calls recall_caller during the natural greeting pause, so callers don't notice, but if you have a hard sub-1s SLA, benchmark in your own setup first. These figures were measured against mengram.io production with 1186-word transcripts indexed.

Can I test without a phone number using Vapi's web "Talk to Assistant" button?

Yes, but recall won't have a phone to key on (web calls don't have customer.number). You'll see "Web caller — no phone number yet" until you switch to a phone-attached assistant. For full end-to-end testing, buy a $1/mo Vapi phone number and call yourself.

What if my assistant config uses the OpenAI-style nested toolCalls shape?

Both work. Vapi sends toolCalls (nested function.name / function.arguments) and toolCallList (flat), and Mengram parses either. arguments can also arrive as a JSON string; that case is handled too.
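If you write your own webhook in front of Mengram and need the same tolerance, the idea can be sketched in a few lines of Python. The exact payload shapes are assumptions based on the description in this answer (a nested function object versus flat name and arguments fields), and normalize_tool_calls is a hypothetical helper, not part of any Mengram or Vapi SDK.

```python
import json
from typing import Any


def normalize_tool_calls(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return [{"id", "name", "arguments"}] from either payload shape."""
    raw_calls = message.get("toolCallList") or message.get("toolCalls") or []
    normalized = []
    for call in raw_calls:
        fn = call.get("function") or {}  # present only in the nested shape
        name = fn.get("name") or call.get("name")
        args = fn.get("arguments", call.get("arguments", {}))
        if isinstance(args, str):
            # Arguments may arrive JSON-encoded; decode them to a dict.
            args = json.loads(args) if args.strip() else {}
        normalized.append({"id": call.get("id"), "name": name, "arguments": args or {}})
    return normalized


if __name__ == "__main__":
    nested = {"toolCalls": [{"id": "call_1", "function": {
        "name": "recall_caller", "arguments": "{\"phone\": \"+15550100\"}"}}]}
    flat = {"toolCallList": [{"id": "call_2", "name": "recall_caller",
                              "arguments": {"phone": "+15550100"}}]}
    print(normalize_tool_calls(nested))
    print(normalize_tool_calls(flat))
```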

Try it free

40 saves + 200 recalls per month on the free tier. No credit card.

Get your Mengram key →