# Ralph Progress Log
Started: Mon Apr 20 2026
Project: rivetkit-napi-receive-loop-adapter

## Codebase Patterns
- If bare `actor-conn-hibernation` wake/preserve tests fail while `closing connection during hibernation` still passes, the regression is probably in the hibernatable websocket restore/message-buffer path (`actor-conn.ts` / `envoy-client`), not the TS save-state bookkeeping in `registry/native.ts`.
- For `US-015`-style hibernation-removal changes, `pnpm test tests/native-save-state.test.ts` is the fast TS gate for `queueHibernationRemoval(...)` / `takePendingHibernationChanges()` plumbing; if that passes while the wake-path driver cases still fail, chase the preserved-socket wake stack instead of `registry/native.ts`.
- `NativeActorContext.takePendingHibernationChanges()` is a read-only snapshot of core's pending hibernation removals; the actual consume/restore cycle happens inside `rivetkit-core` `ActorContext::save_state(...)`, so TS can poll it for save gating without clearing the removal set.
- Inspector wire-version negotiation is core-owned now: use `ActorContext.decodeInspectorRequest(...)` / `encodeInspectorResponse(...)` backed by `rivetkit-core`, and do not reintroduce TS-side v1-v4 converter glue.
- Query-backed inspector routes can each hit their own transient `guard/actor_ready_timeout` during startup, so active-workflow inspector tests should poll the exact endpoint they assert on instead of waiting on one route and doing a single fetch against another.
- Before cutting a `workflow-engine` fix for an `actor-workflow` driver failure, rerun the targeted repro plus the full `tests/driver/actor-workflow.test.ts` file; earlier runtime fixes can already have flipped the case green, and guessing at workflow-engine changes is wasted motion.
- Completed `workflow()` runs follow the normal actor `run` contract: after the workflow returns, the actor idles into sleep unless user code explicitly calls `ctx.destroy()`.
- For inspector replay coverage, prove "workflow in flight" with the inspector's overall `workflowState` (`pending`/`running`), not `entryMetadata.status` or `runHandlerActive`; those can lag or disagree across encodings even when replay should still be blocked.
- For active-workflow inspector tests, use a test-controlled deferred block plus an explicit `release()` action instead of step timing; fixed sleeps make replay/history assertions flaky.
- For `actor-inspector` active-workflow regressions, rerun both the full bare `tests/driver/actor-inspector.test.ts` file and the isolated `workflow-history` / `summary` tests; this branch can fail only under full-file load while the single-test rerun comes back green.
- For full bare `actor-inspector` driver runs on this branch, keep a per-test timeout override for the active-workflow `/inspector/workflow-history` and `/inspector/summary` polls; the endpoint polling is correct, but 30s can still be too tight once the run falls back through `guard/actor_ready_timeout` retries.
- Process-global `rivetkit-core` `ActorTask` test hooks (`install_shutdown_cleanup_hook`, lifecycle-event/reply hooks) need actor-id filtering plus a shared async test lock, or parallel `cargo test` runs will happily cross-wire unrelated actors and make you chase ghosts.
- In `rivetkit-core` shutdown-race tests, install `actor::task::install_shutdown_cleanup_hook(...)` to inject assertions immediately after `teardown_sleep_controller()`; trying to catch that window with plain `yield_now()` timing is flaky because the stop reply can complete in the same tick.
- In `rivetkit-core` inspector BARE codecs, schema `uint` fields must serialize through `serde_bare::Uint` and schema `data` fields through `serde_bytes`; raw Rust `u64` / `Vec<u8>` serde encoding does not match the generated TypeScript BARE wire format.
- `rivetkit-typescript/packages/rivetkit/tests/driver/shared-harness.ts` mirrors runtime stderr lines containing `[DBG]`; strip temporary debug instrumentation before timing-sensitive driver reruns or hibernation tests; otherwise the log spam can fake timeout regressions.
- `POST /inspector/workflow/replay` can legitimately return an empty history snapshot when replaying from the beginning, because the replay endpoint clears persisted workflow history before restarting the workflow.
- During isolated driver reruns, a one-off workflow actor start failure with `no_envoys` can be a runner-registration flake; rerun the exact test once, and only file a product bug if the immediate rerun fails again.
- In `rivetkit-typescript/packages/rivetkit/src/registry/native.ts`, late `registerTask(...)` calls during sleep/finalize teardown can hit `actor task registration is closed` / `not configured`; swallow that specific bridge error (and only that one); otherwise bare workflow sleep/wake cleanup can crash the runtime and masquerade as `no_envoys`.
- In `rivetkit-typescript/packages/rivetkit/src/registry/native.ts`, keep direct HTTP `/action/*` requests wired to the same `onStateChange` callback path as receive-loop actions; otherwise lifecycle hook behavior diverges between direct fetches and mailbox dispatch.
- In `rivetkit-typescript/packages/rivetkit/src/common/utils.ts::deconstructError`, only pass through canonical structured errors (`instanceof RivetError` or tagged `__type: "RivetError"` with full fields); plain-object lookalikes must still be classified and sanitized.
- Native inspector queue-size reads should come from `ctx.inspectorSnapshot().queueSize` in `rivetkit-core`, not TS-side caches or hardcoded HTTP fallbacks.
- In `rivetkit-core` `ActorTask::run`, bind channel `recv()`s as raw `Option`s and log closure explicitly; `Some(...) = recv()` plus `else => break` swallows which inbox died.
- When `envoy-client` mirrors live actor state into `SharedContext.actors` for sync handle lookups, wrap inserts/removals in `EnvoyContext` helpers so stop-event cleanup updates the async map and the shared mirror in lockstep.
- Once `SleepController::teardown()` starts, `track_shutdown_task(...)` must refuse new work under the same `shutdown_tasks` lock; reopening a fresh `JoinSet` after teardown just leaks late `wait_until(...)` tasks.
- `rivetkit-napi` caches `ActorContextShared` by `actor_id`, so every fresh `run_adapter_loop(...)` must clear per-instance runtime state (`end_reason`, ready/started flags, abort/task hooks) before a wake; otherwise sleep→wake can inherit stale shutdown state and drop post-wake events.
- `rivetkit-napi` `JsActorConfig` is narrower than `rivetkit-core` `FlatActorConfig`; when deleting JS-exposed config fields, keep the Rust conversion explicit and set any wider core-only fields to `None`.
- When native action timeouts originate in Rust (`rivetkit-napi` / `rivetkit-core`), `rivetkit-rust/packages/rivetkit-core/src/registry.rs::inspector_error_status` must map `actor/action_timed_out` to HTTP 408 or clients get the right payload behind the wrong status code.
- On this branch, `vitest -t` can still skip `tests/driver/action-features.test.ts` even with the nested suite path; if that happens, run the full file and grep `/tmp/driver-test-current.log` for the `encoding (bare) > Action Timeouts` pass lines instead of trusting the skipped run.
- Raw `onRequest` HTTP fetches should bypass `maxIncomingMessageSize` / `maxOutgoingMessageSize`; keep those message-size guards on `/action/*` and `/queue/*` HTTP message routes in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts`, not generic `rivetkit-core/src/registry.rs::handle_fetch`.
- Primitive JS<->Rust cancellation bridges in `rivetkit-napi` should pass monotonic token IDs through TSF payloads and poll a shared `scc::HashMap` via a sync N-API function; do not try to smuggle `#[napi]` class instances like `CancellationToken` through callback payload objects.
- `rivetkit-napi` tests that assert on the process-global cancel-token registry should serialize themselves with a test-only guard, or parallel async tests will contaminate the size/cancellation assertions.
- `Queue::wait_for_names(...)` can bridge JS `AbortSignal` through registered native cancel-token IDs, but plain actor queue receives still need the `ActorContext` abort token wired into `Queue::new(...)` so `c.queue.next()` aborts during destroy.
- `SleepController` event-driven drains should wake off `AsyncCounter` zero-transition notifies plus `Notify::notified().enable()` arm-before-check waiters; reintroducing scheduler polling there only adds latency.
- Sleep-driven actor shutdown is two-phase now: `SleepGrace` keeps dispatch/save ticks live after an immediate `BeginSleep`, and `SleepFinalize` is the only phase that gates dispatch and sends `FinalizeSleep` teardown work into the adapter.
- For detached `rivetkit-core` lifecycle signals like `ctx.sleep()` / `ctx.destroy()`, rely on the spawned task itself (or an explicit `yield_now()`) for decoupling; adding a fake `sleep(1ms)` only injects jitter.
- For `rivetkit-core` shutdown-side `JoinSet` work, construct the `CountGuard` before `spawn(...)`; teardown can abort before first poll, and a guard created inside the async body will leak the counter.
- Keep `SleepController` region APIs as raw `RegionGuard` counters and put sleep-timer resets, activity notifications, and websocket task metrics in `ActorContext` guard wrappers so RAII migrations do not smuggle side effects into `WorkRegistry`.
- For staged `rivetkit-core` drain migrations, add future-facing counters/guards alongside the legacy `SleepController` fields first, and suppress scaffold-only dead-code locally until the follow-up story wires real call sites.
- Shared Rust async primitives that need to be reused by both `engine/sdks/rust/envoy-client` and `rivetkit-core` should live in `engine/packages/util`; paused-time tests there also need a crate-local `tokio` dev-dependency with `features = ["test-util"]`.
- In `engine/sdks/rust/envoy-client`, sync `EnvoyHandle` accessors for live actor state should read the shared `SharedContext.actors` mirror keyed by actor id/generation; blocking back through the envoy task can panic on current-thread Tokio runtimes.
- Package-local CI guard scripts with non-Rust extensions need to be included in `.github/workflows/rust.yml`'s paths filter or Rust CI will never notice the script changed.
- When filtering a single `rivetkit-typescript/packages/rivetkit/tests/driver/*.test.ts` file with `vitest -t`, include the outer `describeDriverMatrix(...)` suite name before `static registry > encoding (...)` or the whole file gets skipped.
- Driver `vitest -t` filters must also use the exact inner `describe(...)` text from the file, not the progress-template label; examples on this branch include `Action Features`, `Actor onStateChange Tests`, `Actor Database (Raw) Tests`, `Actor Inspector HTTP API`, `Gateway Query URLs`, and `Actor Database PRAGMA Migration Tests`.
- Hot-path shared registries and waiter maps in `rivetkit-napi` / `rivetkit-core` should use `scc::HashMap`, not `Mutex<HashMap<...>>` or `RwLock<HashMap<...>>`; the async entry/remove APIs map cleanly onto the bridge and queue call sites.
- In `rivetkit-core`, shutdown-only immediate persistence should chain through `ActorState` and be awaited via `wait_for_pending_state_writes()`; schedule/state helpers must not fire-and-forget extra save tasks during teardown.
- Reply-bearing TSF dispatches in `rivetkit-typescript/packages/rivetkit-napi/src/napi_actor_events.rs` should go through a timed spawn helper, not raw `spawn_reply(...)`, or a hung JS promise can sit in the adapter `JoinSet` until shutdown.
- When porting callback-era Rust actors to typed `rivetkit`, keep runtime-only data that used to live in `ctx.vars()` in an actor-id keyed map initialized from `run(Start<A>)` and removed on exit so helper methods can migrate without signature explosion.
- In `rivetkit-rust/packages/rivetkit/src/context.rs`, hand-write `Clone` for generic typed wrappers like `Ctx<A>` and `ConnCtx<A>`; `#[derive(Clone)]` can accidentally impose `A: Clone` just because the wrapper carries `PhantomData`.
- In `rivetkit-rust/packages/rivetkit/src/event.rs`, keep typed event-wrapper drop-guard tests inline with the module instead of external integration tests when the wrappers or bridge helpers still rely on `pub(crate)` fields like `Reply<T>` slots or `wrap_start::<A>(...)`.
- In `rivetkit-rust/packages/rivetkit`, canned tests that need `wrap_start(...)` or other `pub(crate)` helpers should live under `tests/` and be re-included through a `src/` `#[cfg(test)] #[path = "..."]` shim instead of widening the public API.
- `rivetkit-rust/packages/rivetkit` is not currently listed in the repo-root Rust workspace members, so a literal repo-root `cargo build -p rivetkit` fails before compile; for isolated validation, use a throwaway copied workspace root that adds the crate as a temporary member instead of editing forbidden root manifests.
- When validating `rivetkit` from a throwaway workspace, `librocksdb-sys` can reuse an existing build by pointing `ROCKSDB_LIB_DIR` and `SNAPPY_LIB_DIR` at a repo `target/debug/build/librocksdb-sys-*/out` directory; otherwise the temporary build may die on disk space before it ever reaches your example code.
- When temp-building `rivetkit` against a reused `librocksdb-sys` archive, add `RUSTFLAGS="-C link-arg=-lstdc++"` or the example binary can fail to link with missing C++ stdlib symbols.
- `rivetkit::prelude` is intentionally tiny (`Actor`, `Ctx`, `ConnCtx`, `Event`, `Start`, `Registry`, `anyhow::{Result, anyhow}`); pull richer typed wrappers like `Action`, `Sleep`, or `SerializeState` from top-level `rivetkit::...` exports instead of bloating the prelude again.
- In `rivetkit-rust/packages/rivetkit/src/registry.rs`, keep the typed-to-core bridge in one helper (`build_factory(...)`) and have both `register_with(...)` and tests use it, so `wrap_start::<A>(...)` only has one runtime path to drift.
- In `rivetkit-rust/packages/rivetkit/src/event.rs`, wrappers that hand off replies after moving owned request data should split the `Reply<T>` into a tiny helper wrapper (like `HttpReply`) so deferred responders keep the dropped-reply warning path instead of silently falling through `Reply` drop.
- In `rivetkit-rust/packages/rivetkit`, typed actor-state `StateDelta` builders belong in `src/persist.rs`; `SerializeState`/`Sleep`/`Destroy` wrappers in `src/event.rs` should stay thin and reuse those helpers instead of re-encoding state ad hoc.
- In `rivetkit-rust/packages/rivetkit/src/event.rs`, keep `Action::decode()` errors flat (`anyhow!("...: {error}")`) instead of hiding the serde cause behind `with_context(...)`; callers and tests need the top-level string to preserve messages like `unknown action variant: ...`.
- Typed event wrapper structs in `rivetkit-rust/packages/rivetkit/src/event.rs` should store reply handles as `Option<Reply<T>>`; once a wrapper implements `Drop`, later `ok()` / `err()` helpers need `take()` to move the reply out without fighting Rust's move-out-of-Drop rules.
- During staged Rust API rewrites, stale examples can be parked behind `required-features` in `Cargo.toml` so `cargo test` stays green until the dedicated example-migration story lands.
- `rivetkit-rust/packages/rivetkit/src/context.rs` should stay a stateless typed wrapper over `rivetkit-core::ActorContext`: keep actor state in the user receive loop, avoid typed vars/state caches on `Ctx<A>`, and do CBOR encode/decode only at wrapper boundaries like `broadcast` and `ConnCtx`.
- `rivetkit-rust/packages/rivetkit/src/start.rs` should write each `ActorStart.hibernated` state blob back onto the `ConnHandle` before wrapping it as `Hibernated<A>`, so `conn.state()` matches the wake snapshot instead of stale handle state.
- In `rivetkit-rust/packages/rivetkit/src/event.rs`, typed connection-event helpers should reuse `ConnCtx<A>` for CBOR state writes and keep `Reply<()>` handles as `Option` so helper methods can `take()` the reply without breaking the existing drop-warning path.
- Adapter-facing startup helpers should live on `rivetkit-core::ActorContext` and be shared by `ActorTask` plus the NAPI preamble; do not fork alarm-resync or overdue-schedule drain logic into NAPI-only shims.
- On this branch, the native TypeScript actor/connection persistence glue still lives in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts`; story docs that mention split `state-manager.ts` or `connection-manager.ts` files are stale unless those modules get restored first.
- Public TS actor `onWake` currently maps to the adapter's `onBeforeActorStart` callback in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts`; the raw NAPI `onWake` hook is wake-only preamble plumbing.
- Static actor `state` literals in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts` must be `structuredClone(...)`d per actor instance or keyed actors will share mutations.
- Every `NativeConnAdapter` construction path in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts` needs both the `CONN_STATE_MANAGER_SYMBOL` hookup and a `ctx.requestSave(false)` callback, or hibernatable conn mutations/removals stop reaching persistence.
- Durable native actor saves in `rivetkit-typescript/packages/rivetkit/src/registry/native.ts` must use `ctx.saveState(StateDeltaPayload)` and a wired `serializeState` callback; the legacy boolean `ctx.saveState(true)` path only requests a save and returns before the durable commit finishes.
- `rivetkit-napi` Rust-side regressions should be validated with `cargo check -p rivetkit-napi --tests` plus `pnpm --filter @rivetkit/rivetkit-napi build:force`; plain `cargo test -p rivetkit-napi` tries to link a standalone N-API test binary and fails without a live Node N-API runtime.
- `rivetkit-core` receive-loop surface changes need a three-point sweep: `src/actor/callbacks.rs` for the public enum, `src/actor/task.rs` for the runtime emitter, and `tests/modules/task.rs` plus `examples/counter.rs` for direct API coverage.
- `rivetkit-core` receive-loop shutdown persistence is explicit now: `Sleep`/`Destroy` only acknowledge with `Reply<()>`, so adapters/examples/tests must call `ctx.save_state(...)` themselves when they want a final flush, and scheduled actions should arrive as `conn: None` instead of a fake `ConnHandle`.
- `ActorContext::conns()` now returns a guard-backed iterator instead of a `Vec`; use it directly for synchronous scans, but `collect::<Vec<_>>()` before any loop body that hits `.await`.
- `ActorContext::disconnect_conns(...)` is best-effort transport teardown: attempt every matching connection, remove the successful disconnects, run connection/sleep bookkeeping, and only then bubble up an aggregated error for any failures.
- Live receive-loop inspector state now comes from `ctx.inspector_attach()` / `ctx.inspector_detach()` + `ctx.subscribe_inspector()`: `ActorTask` debounces `SerializeStateReason::Inspector` via request-save hooks, and websocket handlers should consume the overlay broadcast instead of relying on `InspectorSignal::StateUpdated` for fresh bytes.
- In `rivetkit-typescript/packages/rivetkit-napi/src/napi_actor_events.rs`, inspector `SerializeState` is read-only for the adapter dirty bit; only persisting paths (`Save` or shutdown saves) are allowed to consume and clear pending dirty state.
- NAPI callback payloads build a fresh `ActorContext` wrapper every time, so adapter-owned state like abort tokens, restart hooks, and end reasons must live in shared storage outside `ActorContext::new(...)` or later callbacks lose that state.
- `rivetkit-typescript/packages/rivetkit-napi/src/actor_factory.rs` is now the single receive-loop callback-binding registry: keep TSF slots, payload builders, and `callback_error` / `call_*` bridge helpers there instead of re-creating ad hoc JS conversion code in later adapter stories.
- `rivetkit-typescript/packages/rivetkit-napi/src/napi_actor_events.rs` is the receive-loop execution boundary now; keep `actor_factory.rs` on binding/bridge setup and land event-loop control flow in the dedicated module.
- Receive-loop `SerializeState` handling should stay inline in `napi_actor_events.rs`, reuse `state_deltas_from_payload(...)` from `actor_context.rs`, and only cancel the adapter abort token on `Destroy` or final adapter teardown, not on `Sleep`.
- Adapter-owned long-lived handles like `run` should stay in `napi_actor_events.rs` and be exposed to JS through sync hooks stored on shared `ActorContext` state; use a plain `std::sync::Mutex` for those slots because `restartRunHandler()` is synchronous and must not await or `blocking_lock()` inside Tokio.
- Graceful adapter drains in `napi_actor_events.rs` should use `while let Some(result) = tasks.join_next().await`; `JoinSet::shutdown()` aborts in-flight work and breaks the `Sleep`/`Destroy` ordering guarantees.
- `Sleep` and `Destroy` must set the adapter `end_reason` on both success and error replies; otherwise the outer receive loop keeps consuming queued mailbox events after shutdown has already failed.
- Long-lived NAPI callback bridges that only forward lifecycle signals should `unref()` their `ThreadsafeFunction`, or a waiting Rust task can keep Node alive after user code is done.
- Bare JS-constructed `ActorContext` wrappers are missing the runtime actor inbox wiring; methods like `connectConn()` only work once the context comes from a real runtime-backed actor instance.
- Adapter-only lifecycle timeouts belong on the NAPI boundary: add them to `JsActorConfig` plus `index.d.ts`, but do not thread them into `rivetkit-core::FlatActorConfig` when core does not own that callback.
- Some receive-loop startup helpers in `actor_context.rs` are intentionally adapter-facing shims or no-ops because core already restored alarms/connections before the adapter starts; the adapter's real job is to preserve callback order before it drains the mailbox.
- In `napi_actor_events.rs`, missing action handlers should fail fast before spawning, but once a reply task is spawned its abort branch must send `ActorLifecycle::Stopping` explicitly so the `Reply<T>` drop guard does not paper over shutdown with `dropped_reply`.
- Optional NAPI receive-loop callbacks should keep the TS runtime defaults: missing `onBeforeSubscribe` allows, missing workflow callbacks return `None`, and missing connection lifecycle hooks still accept the connection without inventing conn state.
- `rivetkit-core` private `ActorTask` helpers should be regression-tested in `tests/modules/task.rs` through the existing `#[cfg(test)] #[path = "../../tests/modules/task.rs"]` shim instead of widening visibility or adding test-only public hooks.
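
The note above about hand-writing `Clone` for `Ctx<A>` / `ConnCtx<A>` can be sketched with the standard library alone. This is a minimal illustration, not the real `context.rs`: `CoreContext`, the `actor_id` field, and `CounterActor` are hypothetical stand-ins, and the wrapper is assumed to hold its core context behind an `Arc`.

```rust
use std::marker::PhantomData;
use std::sync::Arc;

/// Hypothetical stand-in for the core context the typed wrapper delegates to.
struct CoreContext {
    actor_id: String,
}

/// Typed wrapper: the actor type `A` is carried only as a marker.
struct Ctx<A> {
    inner: Arc<CoreContext>,
    _actor: PhantomData<fn() -> A>,
}

impl<A> Ctx<A> {
    fn new(actor_id: &str) -> Self {
        Ctx {
            inner: Arc::new(CoreContext { actor_id: actor_id.to_string() }),
            _actor: PhantomData,
        }
    }

    fn actor_id(&self) -> &str {
        &self.inner.actor_id
    }
}

// Hand-written so there is no `A: Clone` bound. `#[derive(Clone)]` adds a
// `Clone` bound for every type parameter, even ones used only in `PhantomData`.
impl<A> Clone for Ctx<A> {
    fn clone(&self) -> Self {
        Ctx { inner: Arc::clone(&self.inner), _actor: PhantomData }
    }
}

/// An actor type that deliberately does not implement `Clone`.
struct CounterActor;

/// Clones a context whose actor type is not `Clone`, then reports the actor id
/// seen through the clone and how many handles share the core context.
fn clone_ctx_for_non_clone_actor() -> (String, usize) {
    let ctx: Ctx<CounterActor> = Ctx::new("actor-1");
    let copy = ctx.clone();
    (copy.actor_id().to_string(), Arc::strong_count(&ctx.inner))
}
```

With `#[derive(Clone)]` on `Ctx<A>` instead, the `ctx.clone()` call for `Ctx<CounterActor>` would not resolve to the wrapper's own `Clone` impl, because the derived impl only exists when `A: Clone`.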

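The `Option<Reply<T>>` pattern from the typed event-wrapper notes above can be sketched as follows. Everything here is an assumption for illustration: `Reply`, `Action`, and `new_action` are simplified hypothetical types built on a `std::sync::mpsc` channel, not the real `event.rs` definitions, and the `dropped_reply` string only mimics the drop-warning path mentioned in the log.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical reply handle: sends one result back to the caller.
struct Reply<T> {
    tx: Sender<Result<T, String>>,
}

impl<T> Reply<T> {
    fn send(self, result: Result<T, String>) {
        // A closed receiver means the caller went away; nothing to report.
        let _ = self.tx.send(result);
    }
}

/// Typed event wrapper with a drop guard for unanswered replies.
struct Action<T> {
    name: String,
    reply: Option<Reply<T>>,
}

impl<T> Action<T> {
    fn ok(mut self, value: T) {
        // `take()` moves the reply out through `&mut self`. Moving the field
        // out directly is rejected because `Action` implements `Drop`.
        if let Some(reply) = self.reply.take() {
            reply.send(Ok(value));
        }
    }
}

impl<T> Drop for Action<T> {
    fn drop(&mut self) {
        // Only fires when no helper consumed the reply first.
        if let Some(reply) = self.reply.take() {
            reply.send(Err(format!("dropped_reply: {}", self.name)));
        }
    }
}

/// Builds an action plus the receiving end a caller would wait on.
fn new_action<T>(name: &str) -> (Action<T>, Receiver<Result<T, String>>) {
    let (tx, rx) = channel();
    let action = Action { name: name.to_string(), reply: Some(Reply { tx }) };
    (action, rx)
}
```

Calling `ok(...)` leaves `reply` as `None`, so the `Drop` impl that runs afterwards sends nothing; dropping the wrapper without answering takes the drop-guard branch instead.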

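The rule that `track_shutdown_task(...)` must refuse new work under the same `shutdown_tasks` lock once teardown starts can be sketched like this. It is a stand-in under stated assumptions: the real code uses a Tokio `JoinSet`, while this version swaps in `std::thread` handles so it runs without extra crates, and `ShutdownTasks`, `track`, and `teardown` are hypothetical names.

```rust
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

/// Tracked shutdown work. The slot becomes `None` once teardown has started,
/// and that check happens under the same lock that guards inserts.
struct ShutdownTasks {
    tasks: Mutex<Option<Vec<JoinHandle<()>>>>,
}

impl ShutdownTasks {
    fn new() -> Self {
        Self { tasks: Mutex::new(Some(Vec::new())) }
    }

    /// Spawns and tracks `work`, or returns `false` without spawning if
    /// teardown has already begun.
    fn track<F: FnOnce() + Send + 'static>(&self, work: F) -> bool {
        let mut guard = self.tasks.lock().unwrap();
        match guard.as_mut() {
            Some(tasks) => {
                tasks.push(thread::spawn(work));
                true
            }
            None => false,
        }
    }

    /// Closes the set, then joins everything tracked so far. Returns the
    /// number of tasks that were drained.
    fn teardown(&self) -> usize {
        // The lock guard is a temporary, so it is released before joining.
        let drained = self.tasks.lock().unwrap().take().unwrap_or_default();
        let count = drained.len();
        for handle in drained {
            let _ = handle.join();
        }
        count
    }
}
```

Because the closed state and the task list live behind one mutex, a late `track(...)` call either lands before teardown drains the list or is refused; there is no window in which a fresh, untracked set gets reopened.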