Memory & Cache¶
Agora has a persistent memory system that lets the model remember information across conversations. Combined with automatic embedding-based caching, it provides a knowledge base that grows with your usage.
Memory Types¶
Active Memory¶
A single, always-on memory context that is included with every API call to the model. Think of it as a sticky note the model always sees.
Use Active Memory for: - Your name, preferences, and background - Project context the model should always know - Standing instructions that apply to all conversations - Facts you're tired of repeating
Example Active Memory content:
User: Newo Ether
Preferences: Prefers Chinese for casual chat, English for technical topics.
Project: Building Agora — a BYOK Android LLM client.
Coding style: Kotlin, Jetpack Compose, MVVM architecture.
Editing Active Memory¶
- Go to Settings → Memory
- Scroll to Active Memory
- Tap Edit Active Memory
- Enter your content
- Tap Save
The model can also update active memory via tool calls if Access Active Memory is enabled.
Saved Memories¶
A collection of named memory files the model can search, read, create, edit, and delete. Unlike Active Memory (always sent), saved memories are retrieved on demand.
Use Saved Memories for: - Reference material (API docs, config details, commands) - Project-specific notes - Learnings and insights from past conversations - Anything you want the model to recall when relevant
Creating Memories Manually¶
- Go to Settings → Memory
- Tap Add Memory
- Enter:
- Title — descriptive name
- Description — brief summary (used for search matching)
- Content — the full memory content
- Tap Create
Model-Created Memories¶
When Access Saved Memories is enabled, the model can create, read, update, and delete memory files via tool calls. This lets the model:
- Remember facts you tell it
- Save useful code snippets or configurations
- Build a knowledge base over time
- Clean up outdated information
Memory Permissions¶
Control what the model can access:
| Setting | Location | When to Enable |
|---|---|---|
| Access Saved Memories | Settings → Memory | You want the model to read/write memory files |
| Access Active Memory | Settings → Memory | You want the model to update the persistent context |
| Access Past Conversations | Settings → Conversation Search | You want the model to search chat history |
All three default to off. Enable only what you need.
Auto-Cache¶
Auto-caching automatically generates embeddings for new messages as they arrive. This keeps your conversation search index up to date without manual intervention.
Enable Auto-Cache¶
- Go to Settings → Conversation Search
- Choose an embedding model (if you haven't already — see Embedding / RAG)
- Under Caching, toggle Auto-cache new messages
When enabled, every new message (user and model) is automatically embedded and indexed for semantic search.
Manual Caching¶
If auto-cache is off, you can manually cache messages:
- Go to Settings → Conversation Search
- Tap Cache — computes embeddings for all uncached messages
- Progress is shown as a circular indicator
Tap Re-cache to rebuild the entire index from scratch. This deletes all cached embeddings and re-processes every message. Use when: - You've changed embedding models - The cache seems corrupted or outdated - Search results are unexpectedly poor
Warning
Re-caching is irreversible and may take a while depending on your message count and embedding model speed.
Cache Status¶
The embedding model settings show how many messages are cached vs. uncached: - "All N messages cached" — up to date - "X of Y messages not cached" — backlog to process
Memory Tool Calls in Chat¶
When the model uses memory tools, you'll see inline cards:
| Tool | Card Text |
|---|---|
| Look up | "Looked through N saved memories" |
| Read | "Read [memory name]" |
| Save | "Saved [memory name]" |
| Edit | "Updated [memory name]" |
| Delete | "Removed [memory name]" |
| Update Active | "Updated active memory" |
Tap any card to see the full content that was read or written.
Best Practices¶
- Keep Active Memory concise — it's included in every API call, so verbose content wastes tokens
- Use descriptive titles for Saved Memories — titles help the model find the right memory
- Enable auto-cache if you use conversation search regularly
- Re-cache after switching embedding models — different models produce incompatible embeddings