SPEAKER NOTES — title slide. Welcome back to Session 3. In Session 1 we built the foundation. In Session 2 we gave the agent hands. Today we give it personality, a safe place to live on disk, and a roadmap for long-term memory. We also ship two new dev-tools that you'll want on your taskbar. Big session — let's go.
SPEAKER NOTES — recap. Quick recap so latecomers can catch up. Session 1 = a working chat app over Aspire/Blazor with five providers. Session 2 = the agent loop, two tool surfaces, one approval gate, and the security primitives every tool framework needs. Both sessions are recorded and the code is on the repo. If you missed them, the slides and demos are at docs/sessions.
SPEAKER NOTES — scope. Six chapters. The first two are "what shipped since Session 2 — quality of life and tool-calling polish". Chapters three and four are the meat — skills and storage. Chapter five is forward-looking: where memory is going. Chapter six is hands-on. We'll move fast on the recap-y bits and slow down on the design choices.
SPEAKER NOTES — mental model. The whole session pivots on this. Skills = behavior, storage = boundaries. Both are about giving the agent more autonomy SAFELY. You can't ship skills if the agent can write anywhere on disk; you can't trust storage hardening if anyone can drop a malicious skill. They land together.
SPEAKER NOTES — Part 1 divider. Two new dev-tools we shipped between Session 2 and Session 3. They live in the system tray and they make day-to-day OpenClawNet development much less painful. Ten minutes total.
SPEAKER NOTES — pain points. This is what every local-LLM developer experiences. You forget to start Ollama, the agent times out, you debug for ten minutes before realizing the model server isn't even up. Or your Aspire app is running but you closed the dashboard tab. We built two tray apps that put the answers in your system tray.
SPEAKER NOTES — Ollama Monitor. First tool: Ollama Monitor. Distributed as a global dotnet tool — one command to install, lives in your system tray. Green = Ollama is up and serving. Yellow = up but slow. Red = down. Click the icon and you get model details, GPU stats, recent requests. The toasts are the killer feature — you find out before your demo does.
SPEAKER NOTES — features detail. Every five seconds we hit /api/tags and /api/ps. The sparkline shows tokens/sec aggregated across active requests. Quick stop/start uses the Ollama CLI under the hood. The status colors are deliberate — yellow doesn't mean broken, it means "your laptop is on battery and the model is paged out". Triple-fail before we toast so we don't spam you on a flaky network.
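Backup sketch for Q&A: the triple-fail rule as a tiny state machine, in Python for brevity (the monitor itself is a .NET tool). The class name `DownDetector` and its API are hypothetical; only the "three consecutive failures before a toast" behavior comes from the notes above.

```python
class DownDetector:
    """Raises one alert after N consecutive failed polls (hypothetical helper)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0
        self.alerted = False

    def record(self, ok: bool) -> bool:
        """Record one poll result; return True only when a toast should fire."""
        if ok:
            # Any successful poll resets the streak and re-arms the alert.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            return True
        return False
```

A single dropped poll on a flaky network never reaches the threshold, and a long outage produces one toast rather than one per poll.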
SPEAKER NOTES — install. Three lines to install and run. Auto-start adds a Windows scheduled task at logon — survives reboots, no startup-folder shortcut to manage. Settings are JSON, you can sync them across machines if you like. Log directory uses the standard "local app data" pattern — same place every other modern Windows app puts logs.
SPEAKER NOTES — demo. Live demo if Ollama is running. Right-click the tray icon. Show the loaded model, the GPU layer count — for a quantized 3B that should be 33/33 meaning fully on GPU. Sparkline shows the last 60 seconds. Active reqs goes up when you fire a chat request. The dashboard hotkey opens a more detailed window with per-request timeline.
SPEAKER NOTES — Aspire Monitor. Second tool. Aspire Monitor solves a different pain point: you're working on multiple Aspire apps, you forget which one is running, the dashboard URL changes every restart. This pins itself to a folder, knows which AppHost lives there, and gives you start/stop without going back to the terminal.
SPEAKER NOTES — Aspire features. The pinned window is the feature people fall in love with. You pin "Gateway", "Ollama", and "Open Dashboard" — those three sit in a tiny floating window in the corner of your screen. One click and you're at any of them. Behind the scenes we're hitting the Aspire dashboard's resource API, so we get health for free.
SPEAKER NOTES — Aspire install. Same install pattern as Ollama Monitor — dotnet tool, global, no admin needed. The --auto flag is for CI / demo recording: it starts the AppHost the moment the monitor opens. You can run multiple copies pointing at different folders and they'll show up as separate tray icons.
SPEAKER NOTES — demo. Right-click the tray icon. Three resources, all green, with their endpoints. Pinned section at the bottom — the three URLs I open ten times a day. Start/Stop/Restart applies to the whole AppHost. Logs opens the dashboard's logs page directly, not the dashboard root, so you skip a click.
SPEAKER NOTES — Part 2 divider. Now the under-the-hood work. Between Session 2 and 3 we did a chunk of plumbing on tool calling — alignment with OpenAI's format, a refactor of FileSystemTool, and three new sanitizers. None of it is glamorous; all of it makes the agent more reliable.
SPEAKER NOTES — why. Honest origin story. We tested mostly on Ollama in Session 2. As soon as we hit Azure OpenAI and Foundry with the same agent, we saw subtle differences in how tool calls were serialized — argument order, JSON quoting, error shapes. Same with the FileSystemTool: it had become a kitchen sink. This part of the session shows what we changed and why.
SPEAKER NOTES — format. The OpenAI tool-call shape is the de-facto standard. Arguments-as-string is the historical wart that everyone supports because the original API shipped that way. Microsoft.Extensions.AI gives us the right abstraction (AIFunction) but providers can drift. We canonicalized everything internally so we never have to special-case "is this Ollama or Azure?" in the runtime.
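Backup sketch: what "canonicalize internally" could look like, in Python for brevity. The function names and the choice of sorted keys with compact separators are assumptions for illustration; the notes only say that providers differ in argument order and JSON quoting and that the runtime normalizes them.

```python
import json


def canonical_arguments(raw):
    """Accept arguments as a JSON string or an already-parsed object."""
    if isinstance(raw, str):
        # Arguments-as-string: parse it; treat an empty string as "no arguments".
        raw = json.loads(raw) if raw.strip() else {}
    if not isinstance(raw, dict):
        raise ValueError("tool arguments must be a JSON object")
    return raw


def canonical_json(raw) -> str:
    """Serialize with a stable key order so provider differences disappear."""
    return json.dumps(canonical_arguments(raw), sort_keys=True, separators=(",", ":"))
```

After this step the runtime sees one shape regardless of which provider produced the call.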
SPEAKER NOTES — diff table. What actually changed. The public ITool interface is identical — your custom tools from Session 2 still compile and run. What we cleaned up is the inside: a record-based ToolResult so callers can pattern-match, schema-driven argument parsing, structured error codes, sanitizers behind an interface. The FileSystemTool refactor is the one that matters most to the next slide.
SPEAKER NOTES — refactor. This is the structure we'll repeat for the other tools over the next sessions. One orchestrator that picks the operation, one file per operation, validators in their own folder. The win is testability — instead of mocking the entire tool, you test ReadOperation against a fake IFileSystem. The PathValidator is a thin wrapper that delegates to the new ISafePathResolver, which is the bridge into Part 4.
SPEAKER NOTES — sanitizer contract. Sanitizers are now first-class. Each tool declares which sanitizers it needs and they run in order before the operation body sees the input. This is the pattern we want for any future tool: never trust LLM-supplied strings, always sanitize, always have a structured rejection reason. The rejection reason flows back to the model so it can correct itself on the next turn.
SPEAKER NOTES — three sanitizers. PathSanitizer is the bridge to Part 4 — it's the user-facing layer over ISafePathResolver. UrlSanitizer keeps the Session 2 SSRF defenses but as a reusable component. JsonArgumentSanitizer is the one that pays for itself fastest: every time the model invents a property name or sends a number as a string, the sanitizer either coerces it correctly or rejects with a clear message. Tokens saved on retries pay for the engineering effort within a week.
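Backup sketch of the coerce-or-reject idea behind JsonArgumentSanitizer, in Python for brevity. The schema format, the reason strings, and the rule that every declared property is required are all assumptions; the real sanitizer is a .NET component whose contract is not shown in these notes.

```python
def sanitize(args: dict, schema: dict):
    """Return (clean_args, None) on success or (None, reason) on rejection.

    schema maps property name -> expected Python type (hypothetical format).
    """
    unknown = sorted(set(args) - set(schema))
    if unknown:
        # The model invented a property name: reject with a clear reason.
        return None, f"unknown_property:{unknown[0]}"
    clean = {}
    for name, expected in schema.items():
        if name not in args:
            return None, f"missing_property:{name}"
        value = args[name]
        if expected is int and isinstance(value, str):
            # Number sent as a string: coerce when it parses cleanly.
            try:
                value = int(value)
            except ValueError:
                return None, f"not_an_integer:{name}"
        if type(value) is not expected:
            return None, f"wrong_type:{name}"
        clean[name] = value
    return clean, None


READ_SCHEMA = {"path": str, "max_lines": int}
```

The reason string is what flows back to the model so it can correct itself on the next turn.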
SPEAKER NOTES — pipeline. This is the pipeline now. Four explicit gates, each independently testable. JsonArgumentSanitizer first because it's free and rejects most garbage. Then any path-typed argument goes through PathSanitizer. Then the approval policy. Only then does the operation body run. Every gate emits an audit record on accept and reject — H-8 in Part 4.
SPEAKER NOTES — diff. Same operation, two different worlds. The before is fine for a demo and a foot-gun in production. The after delegates everything dangerous to a single, audited resolver. No raw Path.GetFullPath on LLM input. No string-prefix check that breaks at C:\openclawnet vs C:\openclawnet-evil. No silent rewrite. This is the pattern we'll repeat for every path-taking tool.
SPEAKER NOTES — metadata unchanged. Important reassurance. If you wrote a custom tool in Session 2, NOTHING CHANGES for you at the API level. Same ITool, same metadata, same approval gate. You DO get the new sanitizers for free if you opt in. Backwards compatibility was a hard requirement of this refactor.
SPEAKER NOTES — NDJSON. One new event type — ToolSanitizationFailed — emitted when a sanitizer rejects an input before approval. The UI shows it as a yellow inline note in the conversation so the user can see "the model tried to read C:\Windows\System32 and the sanitizer blocked it". That transparency is gold for debugging prompt-injection attempts.
SPEAKER NOTES — payoff. This is the foundation slide. None of skills, storage hardening, or memory could land cleanly on top of Session 2's tool internals as they were. We had to do this refactor first. The provider-portability win is the big external benefit; the internal benefit is that the next sessions can ADD without rewriting.
SPEAKER NOTES — Part 3 divider. The biggest part of the session — about thirty slides. Skills are the feature most users will notice first. Markdown files that change agent behavior. Hot-reload. Per-agent enablement. Let's go.
SPEAKER NOTES — what is a skill. Read it on screen. Three lines of YAML, a paragraph of Markdown, and the agent now behaves like a senior .NET architect. No C#. No deploy. No restart. The point we're making all session: skills are CONTENT, not CODE. Anyone on the team can write one — your PM, your QA engineer, your security lead.
SPEAKER NOTES — vs tools. Lots of people see skills and ask "isn't that just a system prompt?" Yes — but with structure, lifecycle, and audit. The crucial column is "authored by". Tools require an engineer; skills don't. That single fact changes who in the company can shape agent behavior. Risk surface is also genuinely different — skills can't run code in v1 (S-8) but they can prompt-inject the model into doing bad things, which is why approval gates and per-agent enablement matter.
SPEAKER NOTES — headline bug. Painful one to admit but worth showing. We had two skill subsystems that both worked, neither knew about the other. Click disable, agent keeps using the skill. Click install, file lands in a folder the agent doesn't scan. Hot reload reloads the loader the agent doesn't use. This was the catalyst for the entire skills proposal.
SPEAKER NOTES — fix. We deleted the parallel loader. MAF — Microsoft Agent Framework — already implements the full agentskills.io spec, including progressive disclosure, YAML parsing, and resource tools. Our job becomes a thin scoped decorator that adds three things: layer attribution, per-agent enablement filtering, and structured logging. The UI calls our registry; the registry feeds MAF; MAF feeds the agent. One pipeline.
SPEAKER NOTES — 3-layer. Three layers, one precedence rule. System ships with the app and is read-only. Installed is shared by all agents and lives behind the import pipeline. Agents-slash-name is per-agent overrides — both for enablement (enabled.json) and for genuine custom skills. Quarantine is where imports land before approval. The most specific layer wins on name collisions, so an agent can shadow a system skill with its own.
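Backup sketch of the precedence rule, in Python for brevity. The notes confirm that an agent-level skill can shadow a system skill; placing the installed layer between the two is an assumption, as are the function and layer key names.

```python
# Lowest to highest precedence (the middle position of "installed" is assumed).
PRECEDENCE = ["system", "installed", "agent"]


def effective_skills(layers: dict) -> dict:
    """Merge layers into name -> (winning_layer, body).

    layers maps a layer name to {skill_name: body}.
    """
    result = {}
    for layer in PRECEDENCE:
        for name, body in layers.get(layer, {}).items():
            # Later (higher) layers overwrite earlier ones on a name collision.
            result[name] = (layer, body)
    return result
```

Keeping the winning layer next to the body is what makes layer attribution possible in the UI and the logs.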
SPEAKER NOTES — shared storage. Drummond's call from the security review. The threat is what enters the system prompt at runtime, not what sits on disk. If we copied every installed skill into every agent's folder, we'd have N copies to update on every CVE. Users would skip the approval gate just to keep up. Shared storage with per-agent enablement is both safer and saner.
SPEAKER NOTES — frontmatter. agentskills.io is the open spec we're aligning with. It defines the core fields — name, description, license — and reserves a metadata namespace for vendor extensions. Our extras (tags, category, examples) move into metadata.openclawnet.* so we're forward-compatible with any other host that speaks the spec. MAF parses YAML the right way, including quoted multi-line strings — our hand-rolled parser was choking on those.
SPEAKER NOTES — old fields. The big change is dropping enabled-true from the frontmatter itself. Why: enablement is per-agent, not per-skill. Putting it in the file makes it look global. Defaults for "which built-ins are on by default for new agents" move to a SystemSkillsDefaults.json in the gateway content root. Cleaner separation of "what the skill is" from "who has it on".
SPEAKER NOTES — watcher. FileSystemWatcher is notoriously noisy — VS Code saves a file by writing a temp, deleting the original, and renaming. Three events for one save. We coalesce with a 500ms debounce: any number of events inside the window collapse to one rebuild. Per-layer watchers because the layers can live on different drives or volumes and we want independent failure domains.
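Backup sketch of the debounce, written as a pure function over event timestamps so it is easy to reason about (Python for brevity). It assumes a trailing-edge debounce, meaning a rebuild fires once 500 ms pass with no further events; the notes give the window but not the exact variant, and `coalesce` is a hypothetical name.

```python
def coalesce(event_times_ms, window_ms: int = 500):
    """Return the times at which a rebuild would fire for the given events."""
    rebuilds = []
    last = None
    for t in sorted(event_times_ms):
        if last is not None and t - last >= window_ms:
            # The quiet period after the previous burst elapsed: one rebuild.
            rebuilds.append(last + window_ms)
        last = t
    if last is not None:
        # The final burst also ends in exactly one rebuild.
        rebuilds.append(last + window_ms)
    return rebuilds
```

The three events of a temp-write, delete, rename save land within a few milliseconds of each other, so they produce a single rebuild.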
SPEAKER NOTES — snapshot. Important design call. When the watcher fires we don't reach into running conversations and patch them. We rebuild a new immutable snapshot. Active turns finish on the old snapshot; the next turn picks the new one. This is the answer to Bruno's open-question Q2 in the proposal — "auto-reload mid-conversation? no — snapshot-per-turn." It avoids torn reads and keeps a single conversation deterministic.
SPEAKER NOTES — enabled.json. Three-valued logic. enabled = explicit on. disabled = explicit off. use-default = "ask the registry what the default is" — useful for new skills the agent hasn't been told about yet. The default for newly imported external skills is disabled. The default for built-ins is enabled. Authoritative state lives in SQLite so we can query "show me every agent that has skill X enabled" without reading every JSON file. The on-disk JSON is the human-friendly projection.
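Backup sketch of the three-valued lookup, in Python for brevity. The function name is hypothetical, and treating a missing entry the same as use-default is an assumption; the two defaults (built-ins on, imported skills off) come from the notes above.

```python
from typing import Optional


def is_enabled(state: Optional[str], is_builtin: bool) -> bool:
    """Resolve an agent's per-skill setting to an effective on/off value."""
    if state == "enabled":
        return True
    if state == "disabled":
        return False
    if state is None or state == "use-default":
        # Built-ins default to on; newly imported external skills default to off.
        return is_builtin
    raise ValueError(f"unknown enablement state: {state!r}")
```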
SPEAKER NOTES — fail closed. This is S-7 from the proposal — fail closed. It's deliberate friction. We don't want a user to accept an import dialog and have skill content slip into every agent's system prompt. Two gestures: import and enable. The enable step makes you choose which agents get it. If you imported by accident, no agent is affected.
SPEAKER NOTES — Skills page. We promoted skills out of the Settings sub-menu to a top-level nav item. Two halves: Browse (left) and Act (right). The "Enabled in" column is the killer — at a glance you see which agents have which skills. The per-agent assignment modal is where you flip enablement; we'll see it next.
SPEAKER NOTES — assignment modal. Single dialog. Default toggle at the top — what new agents inherit. Per-agent dropdowns below — explicit on, explicit off, or "use default". Save persists to SQLite, file projection updates on the next snapshot rebuild. The "effective on the next chat turn" copy is important — sets expectations about hot-reload semantics.
SPEAKER NOTES — import wizard. Four steps, deliberately. Entry collects the URL. Preview is the security-critical step: full file list, sizes, SHA-256 per file, rendered Markdown so you see what the model will see, and a diff against any previously installed version. The "this content enters the system prompt" warning is non-dismissable — that single sentence is the difference between an informed user and a click-through.
SPEAKER NOTES — allowlist. S-12 from the proposal. v1 ships with one trusted source. The reasoning: allowing arbitrary GitHub URLs turns import into a remote prompt injection primitive. Pinning to a commit SHA defeats time-of-check/time-of-use attacks where the upstream changes between preview and confirm. To add a new source, an admin edits appsettings — deliberately high friction.
SPEAKER NOTES — manual authoring. Critical escape hatch. The import pipeline is the safest path for sharing skills. But day-to-day, you and your team will hand-author skills in your editor, save them under agents/{name}/skills/, and the watcher picks them up automatically. No restart, no API call. This is also how you iterate while writing a skill — save, test, save, test.
SPEAKER NOTES — anatomy. Concrete example we can copy-paste. Frontmatter at the top — fenced by triple-dash. Body below — pure Markdown. The body becomes part of the system prompt verbatim. The examples array is metadata only — it doesn't get injected, but the UI uses it to show "try asking…" hints. Total file is usually under 2 KB; max we allow is 256 KB (S-11) so a single skill can't blow your token budget by accident.
SPEAKER NOTES — bounded. Token budgets are real money. We cap each skill file size and the total contribution to the system prompt. If three big skills are enabled and they exceed the budget, the oldest by load order is dropped — but we WARN-audit so you can find out. Default budget is 8 KB which fits comfortably in any modern context window. Configurable per agent.
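Backup sketch of the budget rule, in Python for brevity. It assumes the budget is measured in UTF-8 bytes and that the input list is already in load order; `fit_to_budget` is a hypothetical name, and the real code would also emit the WARN audit for each dropped skill.

```python
def fit_to_budget(skills, budget_bytes: int = 8 * 1024):
    """Drop the oldest-loaded skills until the total body size fits the budget.

    skills is a list of (name, body) pairs in load order, oldest first.
    Returns (kept_pairs, dropped_names).
    """
    kept = list(skills)
    dropped = []

    def total() -> int:
        return sum(len(body.encode("utf-8")) for _, body in kept)

    while kept and total() > budget_bytes:
        # Oldest by load order goes first; the caller should audit each drop.
        dropped.append(kept.pop(0)[0])
    return kept, dropped
```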
SPEAKER NOTES — S-1..S-6. First half of the hardening list. S-1 is the manifest. S-2 is the file-type allowlist — no .py, .ps1, .dll, no executable bit, nothing starting with MZ or shebang. S-3 hands path resolution to the storage layer. S-4 stops a malicious skill from claiming to be the built-in shell-exec. S-5 is two-step preview/confirm. S-6 means upstream changes never apply silently — you re-run the gate.
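Backup sketch of the S-2 file check, in Python for brevity. The set of allowed extensions is an assumption (the notes only list what is forbidden), the reason strings are hypothetical, and the executable-bit check is left out because it needs real file metadata.

```python
import os

# Assumed allowlist for illustration; the real list is not given in the notes.
ALLOWED_EXTENSIONS = {".md", ".txt", ".json", ".yaml", ".yml"}


def check_file(name: str, head: bytes):
    """Return None if the file passes, otherwise a rejection reason."""
    ext = os.path.splitext(name)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return "extension_not_allowed"
    if head.startswith(b"MZ"):
        # Windows PE executables begin with the two bytes "MZ".
        return "pe_executable_header"
    if head.startswith(b"#!"):
        # A shebang marks a script meant to be executed directly.
        return "shebang"
    return None
```

Checking the first bytes as well as the extension catches an executable that was simply renamed to .md.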
SPEAKER NOTES — S-7..S-12. Second half. S-7 we covered. S-8 is "no executable content yet" — that's a future proposal with its own sandbox. S-9 audit trail covers every lifecycle event so you can answer "who installed what when" forensically. S-10 — disable takes effect on the next chat turn, not next process restart. S-11 token budgets. S-12 source allowlist. Twelve invariants, every PR is reviewed against them, no exceptions for v1.
SPEAKER NOTES — logging. Fourteen structured events covering the full lifecycle. LoggerMessage source-generators give us zero-allocation logging on hot paths. Eight correlation fields mean a single SQL query can answer "show me everything that happened during this user's last chat turn including which skills loaded, which functions invoked, which import attempts there were". OTel spans go straight into the Aspire dashboard you saw in Part 1.
SPEAKER NOTES — what we don't log. Critical sensitivity rule. Parameter values can contain anything — API keys, passwords, PII. Return values too. SKILL.md bodies are attacker-controlled, so logging them amplifies log-injection attacks. We log SCHEMA — types and shapes — plus size and a partial hash. That's enough for forensics, not enough to leak. Dylan's recommendation in the proposal.
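Backup sketch of "log the schema, not the values", in Python for brevity. The field names and the 12-character hash prefix are assumptions; the notes only specify types and shapes, a size, and a partial hash.

```python
import hashlib
import json


def describe_for_log(args: dict) -> dict:
    """Build a log-safe description of tool arguments without their values."""
    payload = json.dumps(args, sort_keys=True).encode("utf-8")
    return {
        # Property names and value types only; never the values themselves.
        "schema": {key: type(value).__name__ for key, value in sorted(args.items())},
        "size_bytes": len(payload),
        # A partial hash lets you correlate identical payloads without storing them.
        "hash_prefix": hashlib.sha256(payload).hexdigest()[:12],
    }
```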
SPEAKER NOTES — E2E. Six end-to-end tests are the acceptance criteria for K-1 (the foundational wave). Each one is a real HTTP request against a running gateway. Hot-reload test drops a file and waits one turn. Per-agent test asserts isolation. Import test exercises the full preview-confirm-install pipeline including SHA verification and quarantine cleanup. We run these in CI on every PR.
SPEAKER NOTES — waves. Four waves. K-1 is the foundation — delete the parallel loader, get one registry. K-2 is observability. K-3 is the import pipeline. K-4 is UX polish. The dependency arrow at the bottom is the punchline of this whole session: skills depend on storage. We can't safely write user-supplied content to disk until the storage layer enforces containment. Which is exactly Part 4.
SPEAKER NOTES — Part 4 divider. Twenty slides on storage. This is the load-bearing part of the session. Without H-1..H-8, skills can't ship safely; without a sane default root, users can't find their files. Both problems, one design.
SPEAKER NOTES — Bruno's question. Direct quote from the issue that started this. Bruno wants ONE root, discoverable, predictable. Today's default is buried in bin/Debug/net10.0/ — useless for end users. We need to fix the default AND harden every code path that takes a path from the LLM.
SPEAKER NOTES — what was wrong. Six concrete problems. Five about discoverability. One about safety. The last bullet is the dangerous one: even after we fixed the default to C:\openclawnet, the agent could STILL write anywhere on disk by emitting an absolute path. Redirection isn't restriction. The hardening review made that explicit.
SPEAKER NOTES — defaults. This is what your C:\openclawnet looks like after Session 3. Seven well-known subfolders, each with a clear purpose. Agents get their own folder per agent name — basis for future per-agent isolation. Models is shared. Workspaces are user-named scratch areas. Uploads and exports separate inbound from outbound user files. Skills you saw. Dataprotection-keys we'll cover in the ACL slide.
SPEAKER NOTES — config. Three sources. Env var wins so containers and CI can override without touching JSON. Appsettings is the everyday answer. Built-in default kicks in for first-run UX. AdditionalWritablePaths is the explicit allowlist for "yes, I want the agent to also be able to write here" — used carefully. The startup INFO log is the hardening recommendation: misconfiguration becomes visible in the dashboard, not silent.
SPEAKER NOTES — single name. Subtle threat model from Drummond. If you respect both OPENCLAWNET_STORAGE_ROOT and OPENCLAW_STORAGE_DIR, an attacker who can set one but not the other in a misconfigured container redirects all your writes. Pick one name, document it loudly, ignore everything else. The startup log includes the SOURCE of the value — env var, appsettings, or default — so misconfig is one Aspire dashboard glance away.
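Backup sketch of the three-source lookup with source attribution, in Python for brevity. The notes mention two candidate variable names without saying which one is honored, so using OPENCLAWNET_STORAGE_ROOT here is an assumption; the function names are hypothetical. The Storage:RootPath key and the log wording come from elsewhere in these notes.

```python
DEFAULT_ROOT = r"C:\openclawnet"
ENV_VAR = "OPENCLAWNET_STORAGE_ROOT"  # assumed to be the single honored name


def resolve_storage_root(env: dict, appsettings: dict):
    """Return (root, source) where source is 'env', 'config', or 'default'."""
    value = env.get(ENV_VAR)
    if value:
        return value, "env"
    value = appsettings.get("Storage", {}).get("RootPath")
    if value:
        return value, "config"
    return DEFAULT_ROOT, "default"


def startup_log_line(root: str, source: str) -> str:
    """Format the startup message that makes misconfiguration visible."""
    return f"Storage root resolved to {root} (source: {source})"
```

Any other variable name is ignored, which is the point of the single-name rule.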
SPEAKER NOTES — resolver. The single chokepoint. Every tool that takes a path delegates to this resolver. Inside it, all the hardening invariants live in ONE testable class — not five copies across five tools. The scope parameter is the seam for future per-agent isolation: today it defaults to RootPath; tomorrow we can pass agents/{name}/ without an API break.
SPEAKER NOTES — H-1. Most important invariant. Every write — every single one — has to land under the storage root or under an explicit additional-paths allowlist. Reads can be broader because reads are lower-risk and you sometimes legitimately need to look at a sibling project. The crucial design decision: REJECT, not REWRITE. If the LLM emits C:\Windows\System32, we say "no" — we don't say "I'll quietly redirect that to C:\openclawnet\Windows\System32".
SPEAKER NOTES — H-2. The "stop sprawling implementations" rule. The most reliable way to make sure every path resolution is hardened is to have only ONE place that does it. We have a code-review checklist item: any new tool that takes a path string must inject ISafePathResolver and delegate. No exceptions. Adversarial unit tests live next to it.
SPEAKER NOTES — H-3. Subtle one. Path.GetFullPath does NOT follow reparse points — it resolves .. and redundant slashes, but a directory junction inside the storage root pointing at C:\Windows passes the prefix check. We use ResolveLinkTarget on the final path AND on every parent directory. Yes, it's expensive on cold paths; we cache. Symlinks created by the agent are forbidden outright — too easy to use as a stash.
SPEAKER NOTES — H-4. String prefix bug that bites every path-handling library at some point. C:\openclawnet vs C:\openclawnet-evil — same prefix, different directory. The fix is to require either equality OR startswith of root+separator. There's a regression test for this exact pair so a future refactor can't reintroduce the bug.
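Backup sketch of the containment check with reject-not-rewrite semantics, in Python for brevity. It uses the standard `ntpath` module so Windows-style paths behave the same on any OS. The names `resolve_for_write` and `PathRejected` are hypothetical, and the sketch deliberately skips the reparse-point checks from H-3, which need a real file system.

```python
import ntpath


class PathRejected(Exception):
    """Raised instead of silently rewriting an out-of-root path."""


def resolve_for_write(root: str, requested: str) -> str:
    """Return the normalized target if it is inside root, else raise."""
    root_norm = ntpath.normcase(ntpath.normpath(root))
    # join() lets an absolute request replace the root; normpath collapses "..".
    candidate = ntpath.normpath(ntpath.join(root, requested))
    cand_norm = ntpath.normcase(candidate)
    # Equality OR prefix of root + separator; a bare prefix check would
    # wrongly accept C:\openclawnet-evil.
    inside = cand_norm == root_norm or cand_norm.startswith(root_norm + "\\")
    if not inside:
        raise PathRejected(f"outside storage root: {requested}")
    return candidate
```

The caller gets either a path it can trust or an exception it can turn into a structured rejection; there is no third outcome where the write quietly lands somewhere else.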
SPEAKER NOTES — H-5. Goodbye to the old denylist of three substrings. Allowlist beats denylist every time. Sixty-four character cap because nested long names quickly hit the 260-character Windows MAX_PATH limit. Reserved device names — CON, PRN, AUX, NUL, COM1-9, LPT1-9 — are special on Windows and would create a directory you can't delete. Trailing dots and spaces too: Windows silently strips them, so "foo." and "foo" collide in surprising ways.
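Backup sketch of the name rules, in Python for brevity. The length cap, reserved device names, and trailing dot/space rule come from the notes; the character allowlist (letters, digits, dot, underscore, hyphen) and the reason strings are assumptions for illustration.

```python
import re

RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)
# Assumed character allowlist; the real one is not given in the notes.
ALLOWED = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
MAX_LENGTH = 64


def validate_name(name: str):
    """Return None if the name is acceptable, otherwise a rejection reason."""
    if not name:
        return "empty"
    if len(name) > MAX_LENGTH:
        return "too_long"
    if name != name.rstrip(". "):
        # Windows strips trailing dots and spaces, so "foo." collides with "foo".
        return "trailing_dot_or_space"
    if not ALLOWED.fullmatch(name):
        return "illegal_character"
    if name.split(".")[0].upper() in RESERVED:
        # Device names stay reserved even with an extension, e.g. COM1.txt.
        return "reserved_device_name"
    return None
```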
SPEAKER NOTES — H-6. Forward-looking. Today the agent runtime passes scope=null and the resolver uses RootPath. Tomorrow when we ship multi-agent isolation — Slack agent vs Telegram agent vs research agent — the runtime can pass agents/SlackAgent/ and that single agent invocation can ONLY write into its own subtree. Cross-agent leakage becomes impossible. Today: just the parameter. Tomorrow: the policy.
SPEAKER NOTES — H-7. ACL hardening for the directories that hold secrets. By default C:\openclawnet inherits from the volume root, which on most Windows installs grants Users(OI)(CI)M — every local user can read it. That's the wrong default for a directory holding ASP.NET DataProtection keys, OAuth tokens, and future API key vaults. We set an explicit DACL on the credential subdirs at startup: current user + SYSTEM, no inheritance. POSIX gets chmod 0700. If the check fails, we refuse to start the credential-bearing services with a clear remediation message.
SPEAKER NOTES — H-8. Every write to disk leaves a trace. Successful writes get the resolved path, byte count, SHA-256 of contents, source attribution (was this LLM-suggested or user-explicit), correlation ids. Failed writes — blocked traversal, ACL denied, name allowlist failure — also audited at WARN with the original unresolved input string for forensics. Combined with skill audit (S-9 from Part 3) you can tell exactly what happened during any chat turn.
SPEAKER NOTES — recap. Eight invariants on one slide. Memorize these — they're the contract any path-taking code must satisfy. Same as the skills S-1..S-12 list, every PR is reviewed against them. We have unit tests covering each one with adversarial cases.
SPEAKER NOTES — wiring. Three extension methods, in this order. AddOpenClawStorage binds StorageOptions, ensures the directory tree, and runs the ACL hardening. AddSafePathResolver registers the singleton resolver. AddOpenClawTools wires every built-in tool to use the resolver. Custom tools just inject ISafePathResolver and they're done.
SPEAKER NOTES — settings UI. Up until now you had to know about Storage:RootPath in appsettings to even check where files were going. The Settings page now has a Storage card showing the current root, the SOURCE (so you know if it came from env or config), and free space. Move-root requires a restart by design — we don't want to migrate live writes. ACL status surfaces H-7 violations as red dots.
SPEAKER NOTES — migration. We dropped the /storage suffix between releases. Existing installs would lose track of their files unless we handle migration explicitly. On first boot we detect the old layout, ask the user, and either move atomically or keep the old root pinned via config. The CLI flag exists for unattended deployments where prompting isn't possible.
SPEAKER NOTES — Part 5 divider. The forward-looking part. Memory is mostly designed, partly built, and the production-grade version is what Session 4 will pick up. Eight slides on what the problem is, what the strategy is, and what's coming next.
SPEAKER NOTES — context window problem. Why memory matters. Even with a 128K context window, every turn re-sends the entire history. After 100 messages your prompts are huge, your latency is high, your GPU is hot, and the model starts losing the middle of the context anyway (the U-shape attention problem). Naive truncation — drop the oldest N — means the agent forgets the user's name. Both are bad.
SPEAKER NOTES — strategy. Three-tier strategy. Recent messages stay verbatim because the model needs them word-for-word for coherence. Older messages collapse into a paragraph summary. Very old messages are gone from the active prompt entirely but stored in a vector index — the agent can retrieve them by semantic similarity when relevant. This is the standard pattern across every modern agent framework.
SPEAKER NOTES — entity. One small EF Core entity. SessionId is the foreign key, Summary is the actual prose, CoveredMessageCount tells the prompt composer "the first N messages of this session are summarized — start verbatim from message N+1". Each summary is immutable; new summaries are added rather than updated, so we can rebuild any historical view of the conversation.
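Backup sketch of how the prompt composer could use CoveredMessageCount, in Python for brevity. The function name and the summary label text are hypothetical; the rule that verbatim history starts at message N+1 comes from the notes above.

```python
def compose_history(messages, summary=None, covered_count: int = 0):
    """Return the history lines to send: summary first, then verbatim tail.

    messages is the full ordered list for the session; covered_count is the
    number of leading messages already represented by the summary.
    """
    if not summary:
        # No summary yet: everything goes in verbatim.
        return list(messages)
    # messages[covered_count:] starts at message N+1 (counting from 1).
    return [f"[Summary of earlier conversation] {summary}"] + list(messages[covered_count:])
```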
SPEAKER NOTES — interface. Three methods. Get the latest summary so the composer can inject it. Store a new summary when the threshold fires. GetStatsAsync feeds the UI memory panel. The factory pattern is the right one for async services — singleton service, scoped DbContext per call, no thread-safety pitfalls. The 20-message threshold is configurable per session.
SPEAKER NOTES — embeddings. Embeddings are the primitive that powers semantic search. Microsoft offers managed embedding APIs but for local-first development we use Elbruno.LocalEmbeddings which wraps ONNX. 384 dimensions is the all-MiniLM size — tiny, fast, good enough for conversational retrieval. The example shows the win: "dependency injection" and "IoC container" embed to vectors with a cosine similarity of 0.82 even though they share no surface words.
SPEAKER NOTES — semantic search. The eventual third tier. Every message gets embedded once and cached. New question comes in, we embed it, run a cosine-similarity scan over past message embeddings, take the top K, inject those snippets as additional context. SQLite's not a vector DB but at conversation-history scale (thousands of messages) it's perfectly adequate — we use a serialized BLOB column and a hand-rolled top-K. If you grow past that, swap in DuckDB or a real vector store with no API change.
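Backup sketch of the hand-rolled top-K scan, in Python for brevity. The function names and the in-memory list of (id, vector) pairs are assumptions; the real code reads vectors from a BLOB column. The toy two-dimensional vectors in the usage note stand in for 384-dimensional embeddings.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, stored, k: int = 3):
    """Return the ids of the k stored vectors most similar to the query.

    stored is a list of (message_id, vector) pairs.
    """
    scored = [(cosine(query, vector), message_id) for message_id, vector in stored]
    # Highest similarity first; a linear scan is fine at conversation scale.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [message_id for _, message_id in scored[:k]]
```

For example, with a query of [1, 0] and stored vectors a = [1, 0], b = [0, 1], c = [1, 1], the two best matches are a and then c.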
SPEAKER NOTES — dashboard. Transparency is a feature. Users panic when an AI claims to "remember" things — they want to know how. The dashboard shows total messages, how many are summarized (and into how many summaries), how many are still verbatim, when the last summary fired, and the token impact. The "was 26 K" number is the most powerful: it shows, in concrete terms, what the prompt would have cost without summarization.
SPEAKER NOTES — what's coming. Where we are: the design is done, the entity exists, the interfaces are stable. What's left: a robust summarizer that handles model failures gracefully, a real vector index, per-agent memory isolation, and import-export. All of that lands in Session 4 along with cloud providers and the scheduler. By the end of Session 4 the agent will remember across restarts and across sessions.
SPEAKER NOTES — Part 6 divider. Three demos, eight minutes total. We'll start the stack, hit the skills API, and drop a hand-authored skill in to prove hot-reload.
SPEAKER NOTES — demo 1. Live demo. aspire run on the repo, watch Aspire Monitor in the tray go green as resources come up. Open the dashboard, scroll the log to find the storage line we shipped this session — "Storage root resolved to C:\openclawnet (source: default)". Source is the value the hardening review asked for. If env var is set, it shows "source: env". If appsettings, "source: config". Visible at a glance.
SPEAKER NOTES — demo 2. Curl the skills endpoint. Two skills, both system layer, both enabled for GptAgent. Note the per-agent shape: enabledFor is an array, not a global boolean. POST disable, then send a chat message — the agent responds without the memory skill in its prompt. We verify by checking the prompt audit. This is the unified API the headline-bug slide promised.
SPEAKER NOTES — demo 3. The "wow" demo. Author a skill in real time. Save the file. Watch the gateway log emit SkillDiscovered and SkillLoaded — that's the FileSystemWatcher and the registry rebuild firing. Send a chat message in the UI and the response is suddenly two sentences and lacks "leverage". No restart. No deploy. No code. That's the whole pitch in 60 seconds.
SPEAKER NOTES — closing divider. Two slides plus a question slide.
SPEAKER NOTES — insights. Five takeaways. Skills are content not code — that's the most user-visible win. Storage is fail-closed — that's the safety win. Memory is transparent — that's the trust win. Tool-calling portability is the engineering win. Tray monitors are the dev-experience win. Each item maps to one of the first five parts of the session; Part 6 was the hands-on demos.
SPEAKER NOTES — what we built. Twelve checkmarks. Half are user-visible (monitors, skills page, settings card). Half are under-the-hood quality (refactors, sanitizers, hardening). The two halves go together: the user-visible features are only safe BECAUSE the under-the-hood work landed first. That's the lesson of Session 3.
SPEAKER NOTES — Session 4 preview. Where we go next. Cloud providers means the same agent runs against Azure OpenAI without code changes — the tool-calling alignment work in Part 2 is what makes that possible. Scheduling means cron-driven jobs that the agent runs autonomously. Memory at scale finishes the work we sketched in Part 5. Tests + health checks turn the demo into a deployment. Session 4 is the finale.
SPEAKER NOTES — closing. Thanks everyone. Repo is github.com/elbruno/openclawnet, MIT licensed. Everything from today — slides, speaker script, copilot prompts, the proposal documents — lives under docs/sessions/session-3/. The two new tools install with one dotnet tool install -g command and live in your tray. If you want to extend something, the manual skill drop-in is the most rewarding starting point: write a SKILL.md, save it, see the agent change. Questions?