Reference sync ledger
=====================

Reference repo:
- path: `references/mempalace`
- pulled on: `2026-04-27`
- fast-forward: `87102fb` -> `94f1689`

Upstream areas changed in this pull:
- `mempalace/project_scanner.py`
- `mempalace/convo_scanner.py`
- `mempalace/llm_client.py`
- `mempalace/llm_refine.py`
- `mempalace/entity_detector.py`
- `mempalace/miner.py`
- `mempalace/config.py`
- `mempalace/mcp_server.py`
- `mempalace/palace_graph.py`
- `mempalace/cli.py`
- `hooks/*`
- tests and docs around the above

Ported into Rust in this session
--------------------------------

1. `crates/core/src/project_scanner.rs` added
- Manifest-first project discovery for `package.json`, `pyproject.toml`, `Cargo.toml`, `go.mod`
- Git-author based people discovery with bot filtering and identity dedupe
- Claude Code conversations-root project discovery (`.claude/projects/*/*.jsonl`) using `cwd`
- Merge that ranks real signals (manifests, git authors, conversation roots) ahead of prose/entity detection

2. `crates/core/src/cli.rs`
- `mpr init --yes` now uses `project_scanner::discover_entities(...)` instead of raw text-only scan
- `mpr init --yes` now writes `<dir>/entities.json`
- Detected people/projects now merge into the global registry path from `Config::registry_file_path()`

3. `crates/core/src/entity_registry.rs`
- Added `merge_detected_entities(...)` for init-time registry enrichment

4. `crates/core/src/entity_detector.rs`
- Synced upstream-style false-positive reduction for bare `NAME:` metadata lines
- Added boilerplate filename skips (`LICENSE`, `NOTICE`, etc.) during detection scans
- Added versioned candidate extraction (`MemPalace_v2` style)
- Added stronger pronoun-only promotion logic
- Preserved existing Rust behavior for strong single-signal person detection (dialogue/action/direct-address signals repeated often enough)
- Added regression tests for the new cases

5. `crates/core/src/miner.rs`
- `entities.json` is now skipped during project mining, matching upstream intent

6. `crates/core/src/config.rs`
- `MEMPALACE_PALACE_PATH` / `MEMPAL_PALACE_PATH` are now normalized to an expanded absolute path

7. `crates/core/src/mcp_server.rs`
- `mempalace_diary_write` accepts optional `wing`
- `mempalace_diary_read` accepts optional `wing`; when omitted it reads diary entries across all wings for that agent
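
The palace-path normalization from item 6 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `config.rs`: the name `normalize_palace_path` and the explicit `home` / `cwd` parameters are hypothetical and were chosen so the behavior is deterministic; the real implementation presumably reads both from the process environment.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: expand a leading `~` and make the result absolute.
/// `home` and `cwd` are passed in (an assumption of this sketch) instead of
/// being read from the environment.
fn normalize_palace_path(raw: &str, home: &Path, cwd: &Path) -> PathBuf {
    // Step 1: tilde expansion for `~` and `~/...`.
    let expanded = if raw == "~" {
        home.to_path_buf()
    } else if let Some(rest) = raw.strip_prefix("~/") {
        home.join(rest)
    } else {
        PathBuf::from(raw)
    };

    // Step 2: anchor relative paths at the working directory.
    if expanded.is_absolute() {
        expanded
    } else {
        cwd.join(expanded)
    }
}
```

With this shape, a value such as `~/palace` and a relative value such as `data/palace` both resolve to one absolute location, so every process that reads the variable agrees on the same directory.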

Remaining gaps after this sync
------------------------------

Still not ported from upstream `94f1689`:

1. LLM-assisted init/entity refinement
- Python: `mempalace/llm_client.py`, `mempalace/llm_refine.py`
- Rust status: no `mpr init --llm`, no provider abstraction, no refine pass

2. CLI surface for LLM init flags
- Python added: `--llm`, `--llm-provider`, `--llm-model`, `--llm-endpoint`, `--llm-api-key`
- Rust status: absent

3. Palace graph cache invalidation
- Python added TTL cache + invalidation hooks in `palace_graph.py`
- Rust status: graph rebuilt from DB each time; no cache/invalidation layer added in this sync

4. Hook/doc sync
- Python hook docs/scripts changed around transcript auto-mine UX and interpreter resolution
- Rust status: wrapper scripts remain intentionally Rust-delegating; docs were not fully reworded to mirror the new Python hook narrative

5. i18n/doc site sync
- Python repo added/expanded several `i18n/*.json` files and docs polish
- Rust status: not mirrored

6. Broader test parity for the new upstream modules
- Rust now has targeted regression tests for scanner/entity/diary changes
- Full parity suite for the newly introduced upstream modules is still open

Beads / parity tracking
-----------------------

These remaining gaps map onto the existing parity program:
- `mr-nrt.17` — MCP, registry, config, hook/instructions parity tests
- `mr-nrt.19` — final parity gate and approved-deviation ledger
- `mr-nrt.20` — docs sync and final parity acceptance

This file is the current gap ledger for the `94f1689` reference state.


Sync update — upstream HEAD `68319dc` (2026-05-11)
==================================================

Followed `.devin/skills/sync-upstream-mempalace/SKILL.md`. Diffed upstream
`94f1689..68319dc` (335 commits) and ported the high-priority correctness
fixes below. Schema-changing and feature-flag work is deferred and listed in
the updated "Remaining gaps" section.

Ported in this session
----------------------

1. `mempalace/mcp_server.py` `e9222b4` (#1243) — `tool_diary_write` /
   `tool_diary_read` now lowercase `agent_name` so reads are
   case-insensitive ("Claude" / "claude" / "CLAUDE" all resolve to the
   same agent identity).
   - Rust: `crates/core/src/mcp_server.rs::tool_diary_write` and
     `::tool_diary_read`, regression test
     `mcp_server::tests::test_diary_read_case_insensitive_agent`.

2. `mempalace/knowledge_graph.py` `0b8c2c1` (#1214) — reject inverted
   intervals (`valid_to < valid_from`) at write time; they would otherwise
   be invisible to every KG query.
   - Rust: `crates/core/src/knowledge_graph.rs::add_triple`, regression
     tests `test_add_triple_rejects_inverted_interval` and
     `test_add_triple_allows_point_in_time_and_open_intervals`.

3. `mempalace/entity_registry.py` `4f36145` + `2e441d1` (#1215) — atomic
   save: write to sibling `.tmp`, fsync, rename over target, fsync the
   parent directory on Unix for ext4-class durability.
   - Rust: `crates/core/src/entity_registry.rs::save`, regression tests
     `test_save_leaves_no_temp_file_on_success`,
     `test_save_overwrites_previous_content_atomically`,
     `test_save_sets_owner_only_permissions`.

4. `mempalace/config.py` + `mempalace/mcp_server.py` `4d98b05` (#1164) —
   validate ISO-8601 date / canonical UTC datetime at the MCP boundary
   for `as_of`, `valid_from`, `valid_to`, and `ended`. Malformed dates now
   produce a clear MCP error instead of silently returning empty result
   sets.
   - Rust: new `config::sanitize_iso_temporal`, used by `tool_kg_query`,
     `tool_kg_add`, and `tool_kg_invalidate`. Tests in `config::tests`
     (`test_sanitize_iso_temporal_*`) plus MCP-level
     `test_kg_add_rejects_invalid_iso_date` and
     `test_kg_query_rejects_invalid_iso_date`.

5. `mempalace/mcp_server.py` `e4e25ed` (#1314) — `tool_kg_add` forwards
   `valid_to` and `source_file` (so callers can backfill an already-ended
   historical fact in a single call and preserve adapter provenance);
   `tool_kg_invalidate` resolves an omitted `ended` to today's date in
   the response instead of returning the literal sentinel `"today"`.
   - Rust: `crates/core/src/mcp_server.rs::tool_kg_add` (schema +
     handler), `tool_kg_invalidate` (handler), regression tests
     `test_kg_add_forwards_valid_to` and
     `test_kg_invalidate_resolves_default_ended_to_today`.
   - Note: `source_drawer_id` from upstream is **not** ported here because
     it requires a `triples` table migration. See "Remaining gaps".
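
The atomic-save sequence from item 3 (write a sibling `.tmp`, fsync, rename over the target, fsync the parent directory on Unix) can be sketched with the standard library alone. This is an illustration, not the body of `entity_registry.rs::save`: the names `atomic_save`, `tmp_sibling`, and `sync_parent_dir` are hypothetical, and the owner-only permission step covered by `test_save_sets_owner_only_permissions` is left out for brevity.

```rust
use std::ffi::OsString;
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: `registry.json` -> `registry.json.tmp`.
fn tmp_sibling(target: &Path) -> PathBuf {
    let mut name: OsString = target.as_os_str().to_owned();
    name.push(".tmp");
    PathBuf::from(name)
}

/// On Unix, fsync the directory so the rename itself is durable.
#[cfg(unix)]
fn sync_parent_dir(target: &Path) -> io::Result<()> {
    let parent = match target.parent() {
        Some(p) if !p.as_os_str().is_empty() => p,
        _ => Path::new("."),
    };
    File::open(parent)?.sync_all()
}

/// Other platforms: nothing to do in this sketch.
#[cfg(not(unix))]
fn sync_parent_dir(_target: &Path) -> io::Result<()> {
    Ok(())
}

/// Write `bytes` to `target` so readers see either the old or the new
/// content, never a partial file.
fn atomic_save(target: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = tmp_sibling(target);
    let mut file = File::create(&tmp)?;
    file.write_all(bytes)?;
    file.sync_all()?; // flush file data before it becomes visible
    drop(file);
    fs::rename(&tmp, target)?; // atomic replace on the same filesystem
    sync_parent_dir(target)
}
```

The temp file lives next to the target on purpose: a rename is only atomic when source and destination are on the same filesystem.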

Remaining gaps after this sync
------------------------------

1. Gitignore-aware drawer prune (`1d3eecb` feat(sync))
   - Python added `mempalace/sync.py` to remove drawers whose source
     files are deleted or `.gitignore`-d.
   - Rust status: no equivalent module; opportunity for a follow-up PR.

2. (Carried forward) LLM-assisted init/entity refinement
   - Python: `mempalace/llm_client.py`, `mempalace/llm_refine.py`
   - Rust status: no `mpr init --llm`, no provider abstraction, no refine
     pass.

3. (Carried forward) CLI surface for LLM init flags
   - Python: `--llm`, `--llm-provider`, `--llm-model`, `--llm-endpoint`,
     `--llm-api-key`.
   - Rust status: absent.

4. (Carried forward) Palace graph cache invalidation
   - Python: TTL cache + invalidation hooks in `palace_graph.py`.
   - Rust status: graph rebuilt from DB each time; no cache layer.

5. (Carried forward) Hook/doc sync, i18n/doc site sync.

Sync update — upstream develop `68319dc` (2026-05-11, follow-up)
================================================================

Follow-up PR to PR #5. Ports the schema-changing `source_drawer_id`
provenance from upstream `e4e25ed` (#1314 / RFC 002 §5.5) that was
explicitly deferred in the previous sync.

Ported in this session
----------------------

1. `mempalace/knowledge_graph.py` schema + `mempalace/mcp_server.py`
   `e4e25ed` (#1314, RFC 002 §5.5) — add adapter provenance columns and
   wire them through the MCP boundary.
   - Schema: `triples` table now declares `source_drawer_id TEXT` and
     `adapter_name TEXT` directly in the canonical `CREATE TABLE`. Legacy
     palaces are migrated in-place by a new `KnowledgeGraph::migrate_schema`
     that introspects `PRAGMA table_info(triples)` and only issues the
     `ALTER TABLE ADD COLUMN` when the column is missing — mirrors the
     upstream `_migrate_schema` no-op-on-new-installs contract.
   - API: `KnowledgeGraph::add_triple` gains `source_drawer_id` and
     `adapter_name` parameters of type `Option<&str>` (callers with no
     provenance pass `None`); `Triple` and `EntityQueryResult` expose
     `source_drawer_id` (and `Triple` exposes `adapter_name`) so callers
     can navigate back to the drawer that produced the fact.
   - MCP: `tool_kg_add` accepts a `source_drawer_id` input and forwards
     it to the KG layer; the tool schema advertises the new property.
     `adapter_name` is intentionally not exposed at the MCP boundary,
     matching upstream (only adapters set it).
   - Tests: `knowledge_graph::tests::test_add_triple_persists_source_drawer_and_adapter`,
     `test_query_entity_exposes_source_drawer_id`,
     `test_migrate_schema_adds_missing_provenance_columns`, and
     `mcp_server::tests::test_kg_add_forwards_source_drawer_id`.
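
The "only alter when the column is missing" contract described for `migrate_schema` can be sketched as a pure decision function. This is an assumption-laden illustration: `pending_migrations` is a hypothetical name, and the column list stands in for whatever `PRAGMA table_info(triples)` returns; the real method runs against a live SQLite connection.

```rust
/// Provenance columns named in the ledger entry above.
const PROVENANCE_COLUMNS: [&str; 2] = ["source_drawer_id", "adapter_name"];

/// Hypothetical helper: given the column names currently present on
/// `triples`, return the `ALTER TABLE` statements still required.
/// An empty result means the schema is already current (new installs).
fn pending_migrations(existing_columns: &[&str]) -> Vec<String> {
    PROVENANCE_COLUMNS
        .iter()
        .copied()
        .filter(|col| !existing_columns.contains(col))
        .map(|col| format!("ALTER TABLE triples ADD COLUMN {col} TEXT"))
        .collect()
}
```

Keeping the decision separate from the execution makes the migration idempotent by construction: running it twice yields an empty statement list the second time.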

The remaining gaps are unchanged by this sync (gitignore-aware drawer
prune, LLM init, palace-graph cache, hook/doc/i18n sync). No new gaps
were introduced by this PR.

This file is the current gap ledger for the `68319dc` reference state.


Sync update — upstream develop `bb31396` (2026-05-15)
=====================================================

Followed `.devin/skills/sync-upstream-mempalace/SKILL.md`. Diffed upstream
`68319dc..bb31396` (11 non-merge commits touching `mempalace/`) and ported
the two highest-priority upstream fixes: a CLI-ergonomics diagnostic for
silently-skipped symlinks during mining and a structured MCP error for
unknown parameter names. Chunk-size configuration plumbing (#1024, #1410,
#1519, plus the `342cfa6`/`bea7cb1` follow-ups) is feature-flag work and
is deferred to a follow-up sync.

Ported in this session
----------------------

1. `mempalace/miner.py` `becf561` + `d7d9604` (#1462) — `mine_project` and
   `mine_conversations` used to silently skip symlinks, so a directory full
   of symlinked files surfaced only as a baffling `Files: 0` line with no
   indication why. Emit a `  SKIP: <relative-path> (symlink)` diagnostic to
   `stderr` (stdout stays clean for callers parsing the `Files:` /
   `Drawers filed:` markers). The diagnostic reports the path relative to
   the scan root, in POSIX style, so nested symlinks remain unambiguous.
   - Rust: `crates/core/src/miner.rs::scan_project` and
     `crates/core/src/convo_miner.rs::scan_convos`. Regression tests
     `miner::tests::test_scan_project_skips_symlinks`,
     `test_scan_project_skips_nested_symlinks`,
     `test_scan_project_skips_dangling_symlinks`,
     `convo_miner::tests::test_scan_convos_skips_symlinks`, and
     `test_scan_convos_skips_dangling_symlinks` (`#[cfg(unix)]`-gated since
     the Windows symlink API requires admin privileges).

2. `mempalace/mcp_server.py` `51c03a0` (#1512) — unknown kwargs at the MCP
   dispatch layer (a typo like `text=` instead of `content=`) used to be
   silently dropped by the schema filter, only resurfacing indirectly as a
   later "Missing required 'X'". Emit `-32602` naming the offending kwarg
   instead, symmetric with the missing-required path. Gated by an
   `accepts_var_keyword` equivalent (`TOOLS_ACCEPTING_EXTRAS` — currently
   `mempalace_add_drawer`, whose Input uses `#[serde(flatten)]` for custom
   metadata), and skips the internal `wait_for_previous` transport kwarg.
   - Rust: `crates/core/src/mcp_server.rs::validate_known_params`, wired
     into `invoke_with_wal`. Tool-name → allowed-property lookup is cached
     once via `OnceLock` from `make_tools()`'s JSON schema so the schema
     stays the single source of truth. Regression tests
     `mcp_server::tests::test_unknown_param_returns_invalid_params_for_wrong_kwarg_name`,
     `test_two_unknown_params_list_both_names`,
     `test_wait_for_previous_not_flagged_as_unknown`, and
     `test_add_drawer_accepts_unknown_custom_metadata_keys`.
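
The unknown-parameter check from item 2 can be sketched as follows. This is a simplified stand-in for `validate_known_params`, under several assumptions: the `RpcError` struct, the error message wording, and passing the allowed names as a slice are all hypothetical (the ledger says the real lookup is derived from the tool JSON schema and cached in a `OnceLock`). Only the `-32602` code, the `TOOLS_ACCEPTING_EXTRAS` exemption, and the `wait_for_previous` skip come from the text above.

```rust
use std::collections::BTreeSet;

/// JSON-RPC 2.0 "Invalid params" error code.
const INVALID_PARAMS: i32 = -32602;
/// Tools whose input accepts arbitrary extra keys.
const TOOLS_ACCEPTING_EXTRAS: [&str; 1] = ["mempalace_add_drawer"];
/// Internal transport parameters that are never reported as unknown.
const TRANSPORT_PARAMS: [&str; 1] = ["wait_for_previous"];

/// Hypothetical error shape for this sketch.
#[derive(Debug, PartialEq)]
struct RpcError {
    code: i32,
    message: String,
}

/// Reject parameter names that the tool schema does not declare.
fn validate_known_params(
    tool: &str,
    allowed: &[&str],
    supplied: &[&str],
) -> Result<(), RpcError> {
    if TOOLS_ACCEPTING_EXTRAS.contains(&tool) {
        return Ok(());
    }
    // A sorted set gives a stable, de-duplicated list of offenders.
    let unknown: BTreeSet<&str> = supplied
        .iter()
        .copied()
        .filter(|name| !allowed.contains(name) && !TRANSPORT_PARAMS.contains(name))
        .collect();
    if unknown.is_empty() {
        return Ok(());
    }
    let names: Vec<&str> = unknown.into_iter().collect();
    Err(RpcError {
        code: INVALID_PARAMS,
        message: format!("Unknown parameter(s): {}", names.join(", ")),
    })
}
```

Naming every offending key in one error lets a caller fix a typo such as `text` for `content` in a single round trip.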

Remaining gaps after this sync
------------------------------

Carried forward unchanged: gitignore-aware drawer prune, LLM-assisted
init / CLI flags, palace-graph cache invalidation, hook/doc/i18n sync,
broader test parity for upstream-added modules.

New deferred work (not ported here):

1. Configurable chunk sizes (`1cc2876` feat (#1024), `b1d75b3` /
   `fd63703` follow-ups (#1410, #1519), `bea7cb1` / `342cfa6` validation
   fixes). Upstream wires `--chunk-size`/`--chunk-overlap` through CLI →
   config → `miner` / `convo_miner`. Touches non-trivial config plumbing
   and is not yet user-requested in the Rust port.

2. CHANGELOG / docs polish around the same chunk-size feature (`68dc1bf`).


Sync update — upstream develop `1b94f4e` (2026-05-19)
=====================================================

Followed `.agents/skills/sync-upstream-mempalace/SKILL.md`. Diffed upstream
`bb31396..1b94f4e` (~60 non-merge commits) and ported the highest-priority
correctness fixes below. Also: cleaned up leftover benchmark/plan files,
moved the sync skill from `.devin/skills/` to `.agents/skills/`, and
refreshed the VitePress website to the upstream Crystal-Lattice landing page.

Ported in this session
----------------------

1. `mempalace/entity_registry.py` `e6f7398` (#1373) — clean up the `.tmp`
   sidecar file on any write/rename failure during atomic save. Previously
   a disk-full or perms-flip error left stale `.tmp` debris in the palace
   directory.
   - Rust: `crates/core/src/entity_registry.rs::save` — the write+chmod+
     rename block is wrapped in a closure; on error the `.tmp` is unlinked
     before propagating. Regression test
     `test_save_cleans_tmp_on_rename_failure`.

2. `mempalace/convo_miner.py` `92de001` (#1505) — scope `file_already_mined`
   and drawer IDs by `extract_mode` so mining the same transcript with
   `--extract general` after `exchange` (or vice versa) is no longer silently
   skipped. Legacy drawers without `extract_mode` are treated as exchange-mode
   for back-compat.
   - Rust: `crates/core/src/palace_db.rs::file_already_mined_with_mode`,
     `crates/core/src/convo_miner.rs::generate_drawer_id` (now folds
     `extract_mode` into hash input).

3. `mempalace/palace_graph.py` `fba6372` (#1504) — preserve hyphenated wing
   slugs in `create_tunnel` write-path. `list_tunnels` now normalizes both
   the stored and queried wing names at read-time so legacy underscore data
   and post-fix verbatim data both resolve via either query form.
   `compute_topic_tunnels` canonicalizes wing keys to prevent duplicate
   tunnels.
   - Rust: new `config::normalize_wing_name` helper, new
     `palace_graph::_normalize_wing`. `list_tunnels` and
     `compute_topic_tunnels` updated.

4. `mempalace/convo_miner.py` `3cac26f` (#1534) — enforce `CHUNK_SIZE` in
   `chunk_by_paragraph` and its line-group fallback. A single oversized
   paragraph (e.g. a 135 KB Claude Code paste) was emitted as one giant drawer,
   far exceeding the input size the embedding model can handle.
   - Rust: new `emit_bounded` helper (char-boundary-safe slicing), wired
     into `chunk_by_paragraph` and `chunk_by_exchange`. Constants
     `CHUNK_SIZE`, `LINE_GROUP_SIZE`, `LINE_FALLBACK_MIN_NEWLINES` added.

5. `mempalace/palace.py` + `mempalace/cli.py` `c933095` (#1498) — split the
   single "No palace found / run init" message into three actionable states
   so users no longer see the same wrong hint whether the palace dir is
   missing, init-pending, or empty. Backend gains a typed
   `CollectionNotInitializedError` (subclass of `PalaceNotFoundError`).
   - Rust: new `palace_db::PalaceState` (`Missing` / `NotInitialized` /
     `Empty` / `Ready`) + `classify_palace` + `print_palace_state_hint`.
     `cmd_compress` and `cmd_status` route through the classifier;
     `searcher::SearchError` gains `NotInitialized` and `Empty` variants
     so the search error message matches the state. Regression tests
     `test_classify_palace_*` (palace_db) and
     `test_search_memories_palace_*` (searcher).
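
The char-boundary-safe slicing that item 4 attributes to `emit_bounded` can be sketched as below. This is an illustration rather than the ported helper: the signature is assumed, and the bound is counted in characters here, which may differ from how the real code measures `CHUNK_SIZE`.

```rust
/// Split `text` into pieces of at most `max_chars` characters.
/// Slices are taken at indices from `char_indices`, so a multi-byte
/// UTF-8 character is never cut in half (which would panic).
fn emit_bounded(text: &str, max_chars: usize) -> Vec<&str> {
    assert!(max_chars > 0, "max_chars must be positive");
    let mut chunks = Vec::new();
    let mut start = 0; // byte offset where the current chunk begins
    let mut count = 0; // characters accumulated in the current chunk

    for (idx, _) in text.char_indices() {
        if count == max_chars {
            chunks.push(&text[start..idx]);
            start = idx;
            count = 0;
        }
        count += 1;
    }
    if start < text.len() {
        chunks.push(&text[start..]);
    }
    chunks
}
```

Because the pieces are borrowed slices of the input, the sketch bounds chunk size without copying the oversized paragraph.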

Not ported (Rust N/A)
---------------------

Each entry below is a commit in `bb31396..1b94f4e` (~36 non-merge commits)
that does not apply to the Rust port. Rationale recorded so the next sync
can skip the same triage.

Python-only fixes that don't apply to Rust's typed implementation:

- `#1426` (`afaedc6`) — tolerate `None` metadata cells. Rust's
  `HashMap<String, serde_json::Value>` is statically typed; entries are
  either present (and a concrete `Value` variant) or absent. No `None`
  cell to tolerate.
- `#1473` (`3e92719`) — surface `tool_create_tunnel` `ValueError`. Rust's
  MCP layer returns typed `Result` errors and has no `ValueError` masking
  path; the equivalent error is already surfaced via `ErrorData`.
- `02c05aa` — refactor merging the two `ValueError` guards. Follow-up to
  `#1473`. N/A for the same reason.
- `2c75c6f` + `c49c10f` — UTF-8 lock holder encoding. Rust `String` is
  UTF-8 by definition and `mine_palace_lock` writes via `writeln!`, which
  emits UTF-8 bytes; no `bytes vs str` mismatch exists.
- `2bf3621` — refactor `_metadata_matches_extract_mode` helper. The Rust
  port already inlines the equivalent check in `file_already_mined_with_mode`
  (#1505 port above); extracting a one-liner helper has no payoff in Rust.

Python ChromaDB / FTS5 / HNSW-specific fixes (Rust uses embedvec):

- `3da27d3` (#1516 / #1517) — VACUUM + FTS5 rebuild after repair to
  reclaim SQLite space. Rust's drawer storage is `palace.json`
  (HashMap-backed) plus embedvec vector files; no ChromaDB SQLite
  metadata table and no FTS5 index to rebuild. Repair already truncates
  and rewrites the JSON.
- `e811b6d` — avoid false HNSW metadata quarantine. Chroma-specific
  quarantine code path; embedvec stores HNSW directly without the
  metadata-collection sidecar that triggers the false positive.
- `cf96e68` (#1495) — cold-start embedder diagnostics + opt-in warmup.
  Targets `chromadb.utils.embedding_functions`; Rust embeds via
  `edgebert` / `onnxruntime` and has its own warmup path
  (`OnnxModel::warmup`).
- `cb13036` — close SQLite connection + clean temp palace on exception
  during migrate. Rust's `migrate.rs` uses RAII: the `Connection`
  auto-closes on scope exit and the function does not currently
  `mkdtemp`-swap (the embedvec upsert path is a TODO). Will revisit when
  the swap step lands.

Python KG-cache-specific fixes (Rust has no in-process KG handle cache):

- `#1372` (`6f9fe80`) — canonicalize KG cache key via realpath +
  normcase. Rust opens a fresh `KnowledgeGraph` per MCP call (see
  `mcp_server.rs::kg_path`); there is no `_kg_by_path` dict to keep
  consistent.
- `51eb279` + `4dc3ccf` + `3ee158e` — capture canonical KG path /
  thread it through `_get_kg` / tighten `_canonicalize_kg_path` tests.
  Follow-ups to #1372. N/A for the same reason.

Python hook-script + test-isolation fixes (Rust delegates to upstream
Python hooks unchanged):

- `e241aa3` + `2bdb1e8` + `c736a57` (#1440) — `mapfile` → `sed` for bash
  3.2 compat, post-review hardenings, force `umask 022`. The Rust port
  carries the upstream `hooks/*.sh` files verbatim; bash-script-level
  fixes are inherited at next vendor.
- `eee5f32` + `b201255` (#1443) — write placeholder PID when claiming a
  mine slot. Rust's `mine_palace_lock` already writes the live PID via
  `writeln!` at lock creation. The Python-style separate "claim slot"
  helper does not exist in `hooks_cli.rs`.
- `90fece7` + `426d164` (#1510) — isolate `_MINE_PID_DIR` / `test_hooks_cli`
  from `test_cli`. Pytest-fixture-specific.
- `c736a57` (also listed under #1440 above) — force `umask 022` in hooks
  tests. Pytest-fixture-specific.

Python style / CI fixes (no Rust counterpart):

- `2cab6f5` + `83e4711` + `3c74ba6` + `419a34c` — ruff 0.15.9 reformats.
  Rust port uses `cargo fmt` + `clippy -D warnings` and already passes.
- `283ef03` + `96a3272` — pin ruff in pre-commit + uv.lock. Rust port's
  CI pins toolchain via `rust-toolchain.toml`; no ruff equivalent.

Specific upstream review polishes:

- `990b155` — address Gemini review on #1532. Targets a Python-side
  refactor that does not exist in Rust.

Remaining gaps after this sync
------------------------------

Carried forward unchanged: gitignore-aware drawer prune, LLM-assisted
init / CLI flags, hook/doc/i18n sync, broader test parity.

Closed by this sync: palace-graph cache invalidation. `list_tunnels` now
compares wing names through `_normalize_wing`, and `compute_topic_tunnels`
canonicalizes wing keys.

Deferred work (carried forward unchanged):

1. Configurable chunk sizes CLI plumbing.
2. CHANGELOG / docs polish.

This file is the current gap ledger for the `1b94f4e` reference state.


Reference sync
==============

Reference repo:
- path: `references/mempalace`
- pulled on: `2026-05-21`
- fast-forward: `1b94f4e` -> `95caf80`

Upstream areas changed in this pull:
- `mempalace/miner.py` (chunk-cap config, format-miner integration)
- `mempalace/format_miner.py` (new — virtual line numbers + format coverage)
- `mempalace/hallways.py` (within-wing entity connectors)
- `mempalace/mine_palace_lock.py` + `mempalace/mcp_server.py` (stale-PID
  timeout + idle exit + structured errors, #1552)
- tests around the above

Ported into Rust in this session
--------------------------------

1. `crates/core/src/miner.rs`
- Raised `MAX_CHUNKS_PER_FILE` default from 500 -> 50_000 (upstream #1455
  `5decc0c`). At `CHUNK_SIZE` (800 chars) the previous cap dropped
  legitimate long-form content (~400 KB hand-written prose);
  50,000 still gives two orders of magnitude of safety margin against
  the original lockfile / generated-artifact case that motivated the
  cap in #1296.
- Added `resolve_max_chunks_per_file(override) -> usize` helper that
  implements the upstream precedence: explicit override > env var
  `MEMPALACE_MAX_CHUNKS_PER_FILE` > module default. Sentinel value `0`
  (from any source) disables the cap entirely. Negative / non-numeric
  env values emit a stderr warning and fall back to default rather than
  silently disabling the cap.
- Added `Miner::with_max_chunks_per_file(Option<usize>)` builder for the
  per-instance override, and a new public `mine_with_options(...)`
  entry point that threads the override through `scan_and_mine`.
- `Miner::mine_file` now returns `(usize, Option<SkipReason>)` so
  callers can count chunk-cap drops separately from "already filed /
  unreadable / too short" skips. New `SkipReason::ChunkCap` variant
  mirrors upstream's `skip_reason = "chunk_cap"` tag.
- The `[skip]` notice now goes to stderr instead of stdout, matching
  upstream so `mpr mine ... > out.log 2> err.log` keeps progress on
  stdout and degraded outcomes on stderr.
- Regression tests: existing `test_mine_file_skips_when_chunks_exceed_cap`
  rewritten to use an explicit low cap (so the fixture stays small under
  the raised default); added `test_mine_file_zero_cap_disables_check`,
  `test_resolve_max_chunks_per_file_default`,
  `test_resolve_max_chunks_per_file_override_wins`.

2. `crates/core/src/cli.rs`
- New `--max-chunks-per-file <N>` flag on `mpr mine` that wins over the
  env var (#1455 parity).
- `print_mining_result` now emits a separate
  `Files skipped (chunk cap): N (raise via ...)` line whenever the
  counter is non-zero, mirroring upstream's summary breakdown.
- Threaded the override through `cmd_mine` -> `miner::mine_with_options`;
  the legacy `miner::mine(...)` entry point delegates to it with
  `max_chunks_per_file = None`.

3. `README.md`
- Bumped test count (`432 -> 435`, `362 -> 435`).
- Added new row in the Upstream PRs table for #1455 (configurable
  per-file chunk cap).
- Added an Environment Variables table covering `MEMPALACE_PALACE_PATH`,
  `MEMPALACE_NONINTERACTIVE`, `MEMPALACE_READONLY`,
  `MEMPALACE_EMBED_MODEL`, `MEMPALACE_MAX_CHUNKS_PER_FILE`,
  `MEMPAL_VERBOSE`.
- Refreshed Project Structure to reflect the `crates/{core,cli,bench}/`
  workspace layout (was still showing the old top-level `src/`).
- Refreshed Requirements: bumped Rust MSRV to 1.85 (edition 2024 via
  rmcp-macros) and dropped the obsolete "ChromaDB server or compatible"
  line (Rust port has shipped embedvec since the initial port).
- Added a chunk-cap example to the CLI command list.
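
The precedence described for `resolve_max_chunks_per_file` in item 1 (explicit override, then `MEMPALACE_MAX_CHUNKS_PER_FILE`, then the module default, with `0` disabling the cap) can be sketched as follows. The two-argument signature is an assumption of this sketch: the environment value is passed in as a string so the logic is testable, whereas the real helper presumably reads the variable itself. `exceeds_cap` and the warning text are likewise hypothetical.

```rust
/// Module default after the upstream #1455 raise.
const DEFAULT_MAX_CHUNKS_PER_FILE: usize = 50_000;

/// Resolve the effective cap. A result of `0` means "cap disabled".
fn resolve_max_chunks_per_file(explicit: Option<usize>, env_value: Option<&str>) -> usize {
    // 1. An explicit override (e.g. from the CLI flag) always wins.
    if let Some(n) = explicit {
        return n;
    }
    // 2. Otherwise consult the environment value, if any.
    match env_value.map(str::trim) {
        None => DEFAULT_MAX_CHUNKS_PER_FILE,
        Some(raw) => match raw.parse::<usize>() {
            Ok(n) => n,
            // Negative or non-numeric input: warn and keep the default
            // instead of silently disabling the cap.
            Err(_) => {
                eprintln!(
                    "warning: invalid MEMPALACE_MAX_CHUNKS_PER_FILE={:?}; using default {}",
                    raw, DEFAULT_MAX_CHUNKS_PER_FILE
                );
                DEFAULT_MAX_CHUNKS_PER_FILE
            }
        },
    }
}

/// Hypothetical check used at the call site: `0` never trips the cap.
fn exceeds_cap(chunk_count: usize, cap: usize) -> bool {
    cap != 0 && chunk_count > cap
}
```

Falling back to the default on malformed input is the conservative choice: a typo in the environment cannot remove the protection against huge generated files.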

Not ported (Rust N/A)
---------------------

Each entry below is a commit in `1b94f4e..95caf80` that does not apply
to the Rust port. Rationale recorded so the next sync can skip the same
triage.

Python compat / refactor follow-ups to #1455:

- `22cec56` — `Optional[int]` for Python 3.9 compat. Rust uses
  `Option<usize>` already; PEP 604 syntax is irrelevant.
- `81f3e32` — consolidate skip accounting under `--dry-run`. Rust's
  `--dry-run` short-circuits before per-file accounting, so the same
  dual-counter problem does not exist; chunk-cap skips are counted
  uniformly via `MiningResult::files_skipped_chunk_cap`.
- `8d8a8c9` — multi-agent review polish on #1455 (assorted wording /
  doc tweaks). No code path Rust needs.

Upstream fixes not yet ported, carried forward to the next sync:

- `7bddb65` (#1552) — stale-PID timeout + MCP idle exit + structured
  errors. Substantial Python refactor across `mine_palace_lock.py`,
  `mcp_server.py` and the hook scripts. Rust's `mine_pid_guard.rs`
  already stores a PID + RFC3339 timestamp, so the stale-PID half is
  partially in place; the MCP idle watchdog + structured `error_class`
  enrichment is the next item to port. Deferred to its own sync PR.
- `1c818d5` — review feedback on `7bddb65`. Follow-up to the above; will
  port together with #1552.
- `0b9c562` (#1455 fix) — case-insensitive entity matching in
  `extract_entities_for_metadata` to mirror init-time behavior. Rust's
  `crates/core/src/diary_ingest.rs` has the same exact-match path;
  worth porting but isolated enough to defer to a dedicated diff.

Format coverage feature track (Python-only feature flag):

- `5e3d3dd` (3.3.6) — virtual line numbering + `--mode extract` for
  format coverage. New feature behind the `format_miner.py` module.
  Rust port does not yet have a `format_miner.rs`; treat as net-new
  feature work, not a sync fix.
- `e362dfa`, `5c40cc3` — review parity fixes against `miner.py` for the
  new `format_miner.py`. Follow-ups; port together with the parent
  feature.
- `ffc2ae4` — `MissingDependencyException` surfacing + sub-extras for
  the new format-miner backends. Tied to the format-miner feature.

Hallways feature (within-wing entity-to-entity connectors):

- `07807ab` (`hallways.py`) — new primitive that connects entities
  inside a single wing. Net-new feature; Rust already has cross-wing
  tunnels via `compute_topic_tunnels`. Treat as feature work, not sync.
- `fc13be5` — integrate `compute_hallways_for_wing` into the post-mine
  flow. Tied to the new module.

Cross-wing entity tunnels feature (`c18879b`):

- Net-new feature derived from the hallways primitive. Port together
  with `07807ab`.

Remaining gaps after this sync
------------------------------

Closed by this sync: per-file chunk cap (#1455) is now configurable in
Rust with raised default, separate accounting, and stderr routing.

Carried forward unchanged: LLM-assisted init / CLI flags, hook/doc/i18n
sync, broader test parity.

New deferred work:

1. Port #1552 (stale-PID timeout finalization + MCP idle exit watchdog
   + structured `error_class` enrichment). Rust already has half of
   the stale-PID story via `mine_pid_guard.rs`; finish the idle exit +
   error enrichment in a focused PR.
2. Port #1455 case-insensitive entity matching fix `0b9c562` into
   `diary_ingest.rs::extract_entities_for_metadata`.
3. Port format-miner feature track (`format_miner.py` + virtual line
   numbers + `--mode extract`) when the Rust port is ready to add a new
   ingest mode.
4. Port within-wing hallways + cross-wing entity tunnels as a feature
   pair on top of the existing tunnels code.

This file is the current gap ledger for the `95caf80` reference state.


Sync update — upstream HEAD `d0d011e` (2026-05-25)
==================================================

Followed `.agents/skills/sync-upstream-mempalace/SKILL.md`. Diffed
upstream `95caf80..d0d011e` (33 commits touching `mempalace/`) against
the Rust port and ported the single highest-priority correctness fix
below. Larger / data-shipping / feature-flag work is deferred and
listed in the updated "Remaining gaps" section.

Ported in this session
----------------------

1. `mempalace/palace_graph.py` + `mempalace/config.py` `3171a1b`
   (#1467) — tunnel file now follows `palace_path` config instead of
   the hardcoded `~/.mempalace/tunnels.json`. Without this, drawers
   landed in the configured palace while tunnels silently landed in a
   different file invisible to other processes touching the same
   palace (subagent profiles, sandboxes, multi-tenant hosts, container
   mounts moving the palace to `/srv/`).
   - Rust: new `Config::tunnel_file()` returning
     `dirname(palace_path)/tunnels.json`;
     `palace_graph::_tunnel_file` split into `_get_tunnel_file`
     (configured) + `_legacy_tunnel_file` (legacy detection).
     `_load_tunnels` warns via `tracing::warn!` if the configured
     tunnel file is missing but the pre-fix hardcoded path has one —
     no auto-migration, matching upstream's intentional choice.
   - Regression tests:
     `config::tests::test_tunnel_file_is_sibling_of_palace_path`,
     `palace_graph::tests::test_tunnel_file_follows_palace_path_config`,
     `palace_graph::tests::test_load_tunnels_returns_empty_when_configured_file_missing`.
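
The `dirname(palace_path)/tunnels.json` rule and the warn-only legacy check can be sketched as below. These are free functions written for illustration; in the port the first is a method on `Config`, and `should_warn_about_legacy` is a hypothetical name for the condition under which `_load_tunnels` emits its warning.

```rust
use std::path::{Path, PathBuf};

/// Tunnel file lives next to the palace directory, not in a fixed
/// home-directory location.
fn tunnel_file(palace_path: &Path) -> PathBuf {
    palace_path
        .parent()
        .unwrap_or_else(|| Path::new("."))
        .join("tunnels.json")
}

/// Hypothetical predicate: warn (but do not migrate) when the configured
/// file is absent while a file at the pre-fix hardcoded path exists.
fn should_warn_about_legacy(configured: &Path, legacy: &Path) -> bool {
    !configured.exists() && legacy.exists()
}
```

For a palace at `/srv/mempalace/palace` this yields `/srv/mempalace/tunnels.json`, so drawers and tunnels move together when the palace is relocated.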

Not ported (Rust N/A)
---------------------

- `f620809` / `997ec1b` / `3e00192` (#1235) — replace lone UTF-16
  surrogates with U+FFFD before writing to ChromaDB. Rust's `String`
  type forbids invalid UTF-8 by construction; lone surrogates cannot
  exist in a Rust `String` reaching the storage layer. The closest
  surface (JSON decode) returns a parse error before any sanitizer
  runs. This is stricter than upstream, but it never silently corrupts
  data.
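
The standard library shows the same contrast directly. The sketch below uses only `std`; the wrapper names `decode_strict` and `decode_lossy` are hypothetical. Strict decoding refuses a lone surrogate, while lossy decoding performs the U+FFFD substitution that the upstream Python fix applies by hand.

```rust
/// Strict decode: any unpaired surrogate makes the whole input invalid.
fn decode_strict(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Lossy decode: unpaired surrogates become U+FFFD REPLACEMENT CHARACTER.
fn decode_lossy(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}
```

Either way the resulting `String` is valid UTF-8, which is why no sanitizer is needed after the decode step.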

Carried forward to a focused PR (too large to bundle here)
----------------------------------------------------------

- `0ecf0e5` (#1537) — validate FTS5 at end of mine. Wires
  `_validate_palace_fts5_after_mine` into the three mine entry points
  and surfaces `MineValidationError` as exit 1. Rust's
  `crates/core/src/repair.rs` does not yet have a
  `sqlite_integrity_errors` primitive; porting the validation hook
  requires landing that helper first. Tracked for the next sync.
- `6658a4d` (#1539 / #1540) — chunk content before embedding in
  `tool_add_drawer`, `tool_diary_write`, and two other write sites to
  avoid silent data loss when content exceeds the embedder context
  window. Substantial diff that touches MCP request handling,
  chunk-id derivation, and parent-drawer metadata across multiple
  call sites. Rust's `Miner::chunk_text` is the canonical chunker
  already; lifting it into a shared helper and threading it through
  the MCP write surfaces is a dedicated piece of work.
- `db4f6cd` (#1538) — paragraph chunker bug where the splitter can
  emit oversized chunks. Lands together with #1539/#1540 (same chunk
  pipeline).
- `3171a1b` (#1468) — second half of the tunnels fix: explicit-tunnel
  endpoint validation against ChromaDB. Requires plumbing
  `_check_room_exists(wing, room, col)` through `create_tunnel`'s
  call graph. Rust's `palace_graph::create_tunnel` is invoked from
  contexts that don't currently hold a `PalaceDb` handle; adding it
  here would balloon the diff and risk regressions in topic-tunnel
  pathways that intentionally skip validation. Deferred to land with
  a small refactor of the tunnel-creation API.
- `0b9c562` (#1455 fix) — carried forward from the previous ledger.
  Rust's `extract_entities_for_metadata` is wired through
  `entity_detector::detect_from_content` (which is already
  case-insensitive), so this may already be structurally N/A; needs a
  parity test before closing the gap definitively.
- `7bddb65` / `1c818d5` (#1552) — stale-PID timeout + MCP idle exit +
  structured errors. Carried forward unchanged from the previous
  ledger.

Feature-flag / data-file work (not sync, treat as feature requests)
-------------------------------------------------------------------

- `3189407` (#1605) — COCA-derived common-English filter for entity
  candidates. Ships a wordlist data file plus filter wiring in
  `entity_detector.py`. Net-new feature.
- `49a8c86` (#494) — `hooks.auto_save` config toggle + env var. New
  config knob + env var.
- `1c5f15c` / `d0d011e` — graph layout features (force-directed
  layout). New visualization feature work.
- `5e3d3dd` / `e362dfa` / `5c40cc3` / `ffc2ae4` — format-miner feature
  track. Carried forward unchanged from the previous ledger.
- `07807ab` / `fc13be5` / `c18879b` — hallways + cross-wing entity
  tunnels feature pair. Carried forward unchanged.

Remaining gaps after this sync
------------------------------

Closed by this sync: tunnel-file location no longer ignores configured
`palace_path` (#1467).

Carried forward (Python-only or large): #1537 FTS5 validation, #1538 /
#1539 / #1540 chunk-before-embed pipeline, #1468 tunnel endpoint
validation, #1552 stale-PID + MCP idle, #1455 case-insensitive entity
match parity check, LLM-assisted init / CLI flags, hook/doc/i18n sync,
broader test parity, format-miner feature track, hallways + entity
tunnels feature pair, COCA filter feature, hooks.auto_save toggle,
graph layout features.

This file is the current gap ledger for the `d0d011e` reference state.
