# Ralph Progress Log
Started: Thu 21 May 2026 11:38:05 AEST
---

## Codebase Patterns
- `bun test` only discovers files matching `.test`/`.spec`/`_test_`/`_spec_` naming. To run `.it.ts` files, use a shell glob with a leading `./`: `bun test ./apps/studio/integration/*.it.ts` — without the leading `./`, bun rejects the glob as a filter rather than treating it as a file path.
- To capture full stdout/stderr from a `Bun.spawn` subprocess, prefer `await new Response(proc.stdout).text()` (and same for stderr) — concise, reads to completion, and matches bun idioms. Use `Promise.all([…, …, proc.exited])` to wait on both streams + exit in parallel.
- Integration test support helpers live in `apps/studio/integration/support/`. Each helper resolves paths via `resolve(import.meta.dir, '../..')` to find `apps/studio/` so tests work from any cwd.
- Studio state directory is controlled by env var **`SEEFLOW_WORKSPACE`** (NOT `SEEFLOW_HOME`). When set, `seeflowHome()` returns `${SEEFLOW_WORKSPACE}/.seeflow`. Integration tests inject `SEEFLOW_WORKSPACE=<mkdtempSync(...)>` for hermetic isolation. See `apps/studio/src/paths.ts`.
- Studio CLI defaults to **background daemon** mode. Pass `--foreground` so SIGTERM/SIGKILL reach the actual server process (otherwise the parent forks and the handle's `pid` points to the short-lived launcher). Setting `SEEFLOW_DAEMON=1` is the child-detection signal; the wrapper uses it internally so do not set it from tests.
- `/healthz` (Kubernetes-style readiness probe) returns `{ status: 'ok' }` — distinct from legacy `/health` which returns `{ ok: true }`. Use `/healthz` for the harness health-poll.
- Biome enforces import ordering (organizeImports). After adding imports run `bun run format` then `bun run lint` from the repo root.
- The Hono server uses `hono/bun` and `serve()` from `apps/studio/src/server.ts`. `app.get('/healthz', ...)` is defined directly in `createApp` at server.ts:93.
- Existing test pattern: `import { describe, expect, it } from 'bun:test'` + `mkdtempSync` for tmp dirs.
- MCP stdio integration: spawn `bun apps/studio/src/mcp-shim.ts` (NOT `mcp.ts`, which is a factory module, NOT `bin/seeflow-mcp` which adds a node→bun spawnSync layer). The shim is a JSON-RPC proxy to the studio's HTTP `/mcp` — pass `SEEFLOW_STUDIO_URL=${studio.baseURL}/mcp` via env.
- MCP tool names use `seeflow_` prefix (e.g. `seeflow_list_flows`, `seeflow_add_node`). PRD references to legacy `listDemos` are outdated — always grep `apps/studio/src/mcp.ts` for the actual `name:` strings before asserting.
- `@modelcontextprotocol/sdk` Client.close() invokes transport.close() which handles SIGTERM→SIGKILL of the child. Capture `transport.pid` BEFORE close() because it's nulled by the transport once close runs.
- `StdioClientTransport.env` expects `Record<string, string>`; `process.env` has `string | undefined` values, so filter undefined before spreading.
- SSE route is `/api/events?flowId=<id>`; the route REQUIRES `flowId` and a registered flow (400 / 404 otherwise). Initial frame is always `event: hello\ndata: {flowId, ts}`. Status emits land as `node:running` / `node:done` / `node:error` with `data: {ts, nodeId, runId?, ...payload}`. Source: `apps/studio/src/api.ts` (api.get('/events') + EMIT_STATUS_TO_EVENT map).
- Bun has no built-in EventSource — SSE clients must hand-roll `fetch` + `response.body.getReader()` + a frame parser. Normalize `\r\n` → `\n` before splitting on `\n\n` so the parser doesn't care which line ending the server uses. SSE field rules: `event:` (default 'message'), `data:` (multi-line joined with '\n'), `id:`, `:` line is a comment, single leading space after the colon is trimmed.
- POST `/api/emit` body shape is `{ flowId, nodeId, status: 'running'|'done'|'error', runId?, payload? }` — `nodeId` does NOT need to exist on the flow; the broadcaster just maps `status` → event type and forwards the payload. See `EmitBodySchema` + `EMIT_STATUS_TO_EVENT` in `api.ts`.
- POST `/api/projects` body is `{ name }` and returns `{ id, slug, scaffolded }`. Use the `id` for any flow-scoped route (`/api/events?flowId=<id>`, `/api/emit`, `/api/flows/:id/*`).
- Project scaffolds land at `${studio.workspace}/<slug>/.seeflow/flow.json` (where `studio.workspace = ${SEEFLOW_WORKSPACE}/.seeflow`). The scaffold body is `{ version: 2, name: <input-name>, nodes: [], connectors: [] }` — empty arrays validate cleanly against `ResolvedFlowSchema` / `FlowSchema`.
- Two PRD route names that are misleading: it says `/api/flows/:id/register` and `/api/flows/:id/validate`, but the actual routes are `/api/flows/register` (body `{ repoPath, flowPath, name? }` per `RegisterBodySchema`) and `/api/flows/validate` (body `{ demo, tier? }` per `ValidateRequestSchema` in `diagram.ts`). Always grep `apps/studio/src/api.ts` for `api.post`/`api.get` lines before authoring assertions — the PRD's path suggestions are sometimes stale.
- `deleteFlowImpl` removes the **registry** entry (and unwatches) but does NOT delete `flow.json` on disk. The on-disk file is preserved — tests should assert registry-side state (404 on GET, missing from list) and that the file still exists.
- For integration tests that need a flow registered without going through POST `/api/projects` (e.g. testing `/api/flows/register` directly), write a flow.json under `${studio.home}/<sub-dir>/.seeflow/flow.json` (a sibling of `studio.workspace`, NOT inside it), then POST `{ repoPath: ${studio.home}/<sub-dir>, flowPath: '.seeflow/flow.json' }`. This avoids the project ending up inside the workspace where /api/projects also writes.
- Disk-state assertions must know whether a field lives in flow.json or style.json. The split is governed by `apps/studio/src/merge.ts`: NODE_DATA_FLOW_KEYS (name, kind, stateSource, handlerModule, icon, description, detail, playAction, statusAction, shape, path, alt, html) and NODE_STYLE_KEYS (width, height, borderColor, backgroundColor, borderSize, borderStyle, fontSize, textColor, cornerRadius, borderWidth, color, strokeWidth, autoSize) — plus `position` always to style. CONNECTOR_FLOW_KEYS (id, source, target, kind, label, method, url, eventName, queueName) and CONNECTOR_STYLE_KEYS (sourceHandle, targetHandle, sourceHandleAutoPicked, targetHandleAutoPicked, sourcePin, targetPin, style, color, direction, borderSize, path, fontSize). Grep merge.ts BEFORE writing assertions that check disk state.
- The smallest valid node seed for boilerplate-heavy tests is `{ type: 'shapeNode', data: { shape: 'rectangle' } }` — no playAction, stateSource, kind, or name required (shapeNode is decoration-only). Use this whenever a test needs to seed nodes that aren't the focus of the assertion. Default position is auto-defaulted to `{x:0,y:0}` by `addNodeImpl`.
- PATCH endpoints to remember:
  - `PATCH /api/flows/:id/nodes/:nodeId` — partial merge of NodePatchBodySchema (top-level `position`, every other key under `data`). Returns `{ ok: true }`.
  - `PATCH /api/flows/:id/nodes/:nodeId/position` — body `{x, y}`. Returns `{ ok: true, position: {x, y} }`. Persists to style.json.
  - `PATCH /api/flows/:id/nodes/:nodeId/order` — body is the discriminated `ReorderBodySchema` on `op` (`forward|backward|toFront|toBack|toIndex`). Returns `{ ok: true }`. Persists by mutating flow.nodes[] array order.
  - `PATCH /api/flows/:id/connectors/:connId` — partial merge of ConnectorPatchBodySchema. Returns `{ ok: true }`.
- PRD acceptance-criteria verbs are NOT always trustworthy — US-006 said "POST nodes/:nodeId/position" / "POST nodes/:nodeId/reorder" but both are PATCH (and the reorder route is named `/order`, not `/reorder`). Always grep api.ts for the actual `api.patch`/`api.post`/`api.delete` lines before authoring assertions.
- All mutating routes funnel through `mutateMergedFlowAndBroadcast` → `withFlowWriteLock` → `mutateMergedFlow` → `writeFileAtomic`. By the time the HTTP response returns 200, the on-disk write is complete (no need for sleeps or polling in tests).
- Two distinct schemas govern node shape: `ResolvedFlowSchema` is what the in-memory merged flow uses (includes top-level `position`, visual/style fields on `data`), while `FlowSchema` (and the Flow*NodeDataSchema variants) is what's serialized to flow.json on disk — **every Flow*NodeDataSchema is `.strict()`** and `position` is stripped (lives in style.json). Tests that write flow.json directly (external-edit, register-flow, fixtures) MUST omit `position` and any style fields per `NODE_STYLE_KEYS`/`CONNECTOR_STYLE_KEYS` or the watcher's reparse will broadcast `valid: false`.
- Watcher event name is `flow:reload` (NOT `reloaded` — the PRD's wording is loose). Payload shape (server side, post-`JSON.stringify({ts, ...payload})`) on success: `{ ts, valid: true, flow: <ResolvedFlow> }`; on schema failure: `{ ts, valid: false, error: <string> }`. The watcher uses a 100ms debounce + a SHA-256 ring of recent self-write hashes to suppress own-write echoes — external writes (not in the ring) always fire. See `apps/studio/src/watcher.ts` `broadcastReload` + `notifyWritten`.
- runPlay (proxy.ts) anchors scripts under `<cwd>/.seeflow/nodes/<nodeId>/` and verifies the realpath stays inside that root. `data.playAction.scriptPath` is relative TO THAT ANCHOR (so `'scripts/play.ts'`, NOT `'nodes/<id>/scripts/play.ts'`). The node directory is created as a side effect of `addNodeImpl`'s `detail` externalization, so callers can: (1) POST the playNode, then (2) write the script file under the existing node dir. Script stdout is JSON.parse'd by runPlay; print a single JSON line + `process.exit(0)` and runPlay returns `{ runId, status: 200, body: <parsed> }`.
- CLI subcommand testing pattern: spawn ONE shared studio per file (beforeAll/afterAll) and pass both `SEEFLOW_STUDIO_URL=${studio.baseURL}` and `SEEFLOW_WORKSPACE=${studio.home}` via `runCli({ env })`. `SEEFLOW_STUDIO_URL` short-circuits `studioUrlOrDie` out of the daemon-spawn path (healthOk hits `/health` and returns true). `SEEFLOW_WORKSPACE` aligns `seeflowHome()` with the studio's state dir so any subcommand that reads `config.json` / `seeflow.pid` / project files sees the same files.
- The `stop` subcommand MUST be tested with its OWN per-test studio — using the shared one would kill every later test in the file. The harness writes `seeflow.pid` at `${workspace}/seeflow.pid` even in --foreground mode (writePid runs unconditionally in runStart after the foreground branch).
- HTTP-passthrough CLI subcommands print a single trailing `{"ok":true,...}` JSON envelope via `printOk`. Some commands also log human-readable lines (e.g. `register` prints `Registered "name" → url/d/slug` BEFORE printOk). Use a `parseOkLine(stdout)` helper that splits stdout into non-empty lines and parses the LAST one — that's always the JSON envelope.
- Stop pre-existing bug surfaced by integration tests: `apps/studio/src/cli-e2e.ts` was reading `demoData.demo` but the GET `/api/flows/:id` response key is `flow` (per `FlowGetResponse` in operations.ts). The bug made `seeflow e2e <id>` always exit 1 with "demo not valid". Fixed by renaming `DemoShape` → `FlowBody` and `demo?` → `flow?` in cli-e2e.ts. Lesson: integration tests catch field-name drift between cli helpers and the API; in-process parity tests don't because they wire directly to impl functions.
- `seeflow e2e <flowId>` against a flow with NO playNodes and NO statusNodes is a vacuous pass: both arrays are empty and `allPlaysOk && allStatusesOk` are both true. Useful smoke test for the subcommand wiring (arg parsing, SSE channel open, hard-ceiling, printOk) without needing to seed any nodes.
- The CLI's `runFlowsLayout` requires the flow.json on disk to be valid (FlowSchema.safeParse on the file contents). An empty body via `--json '{}'` exercises the "use defaults" path. The `loadBody` helper in `cli-helpers.ts` requires EXACTLY one of `--json`, `--file`, `--stdin` — even an empty payload still needs the flag.
- MCP tool result envelope: every CallToolResult on success is `{ content: [{ type: 'text', text: '<JSON-stringified payload>' }] }` per `okResult` in mcp.ts. On failure it's `{ isError: true, content: [{ type: 'text', text: '<message>' }] }`. Test helpers should `JSON.parse(result.content[0].text)` and assert `result.isError !== true` BEFORE parsing — a small `okJson<T>(result)` helper keeps each test small.
- MCP `validate_seeflow` wraps `validateImpl` DIRECTLY — it returns `{ ok: true }` on success or `{ ok: false, issues: [...] }` on failure. The REST `/api/flows/validate` route wraps validateImpl with an EXTRA envelope (`{ ok, stats, issues, warnings }`); the MCP tool does NOT. Don't reuse the REST `ValidateReport` type for MCP assertions.
- MCP tool count + names (16 tools total, sorted): `seeflow_add_connector`, `seeflow_add_connectors`, `seeflow_add_node`, `seeflow_add_nodes`, `seeflow_create_project`, `seeflow_delete_connector`, `seeflow_delete_flow`, `seeflow_delete_node`, `seeflow_get_flow`, `seeflow_list_flows`, `seeflow_move_node`, `seeflow_patch_connector`, `seeflow_patch_node`, `seeflow_register_flow`, `seeflow_reorder_node`, `validate_seeflow`. The single non-`seeflow_*` name is `validate_seeflow` (intentional — it's a stateless cross-cutting validator, not a flow-scoped operation). Snapshot is asserted in `apps/studio/integration/mcp.it.ts`.
- For MCP integration tests, use REST to seed preconditions (flow creation, node/connector setup) — POST `/api/projects` is already battle-tested in rest.it.ts and keeps each MCP test focused on ONE tool. For tools that ARE the seed mechanism (seeflow_create_project, seeflow_register_flow), exercise them directly as the test's seed step.
- The harness's `stop()` does SIGTERM → 5s grace → SIGKILL → `rmSync(home, {recursive,force})`. The rmSync is UNCONDITIONAL — if a test needs to inspect on-disk state AFTER the process dies (atomicity tests, stale-pid recovery), DON'T call `studio.stop()` to kill the process. Instead, send the signal externally via `process.kill(studio.pid, 'SIGTERM'|'SIGKILL')` and poll `process.kill(pid, 0)` (throws ESRCH = dead) until the process is gone. Then read the workspace. Then call `studio.stop()` afterwards to clean up file handles + tmp dir (it sees `proc.exitCode !== null` and skips the kill, just runs the rmSync). `rmSync` with `force:true` is also safe to call twice on the same path — second call no-ops on missing dir.
- The harness accepts `home: opts.home ?? mkdtempSync(...)` so tests can share a workspace across multiple `spawnStudio()` calls (e.g. stale-PID recovery: spawn A, SIGKILL externally, spawn B with same home → B sees A's stale pid file and overwrites it). When sharing home across spawns, do NOT call A's `stop()` until B is also done — A's stop would rmSync the home mid-B-run. Final cleanup is `await B.stop()`; A's deferred stop runs no-op on the already-gone home.
- `Bun.serve({port})` throws synchronously on EADDRINUSE (port already bound). In `runStart`, the throw becomes an unhandled rejection at the top-level `await runStart()` and Bun exits with code 1, surfacing an error message containing "in use" / "EADDRINUSE" / similar in stderr. ONLY foreground mode triggers this — the default daemon mode probes /health first via `spawnDaemon`/`healthOk` and exits 0 with "Studio already running" if it sees a healthy studio, regardless of which process is actually serving. Double-start integration tests must use `--foreground` to exercise the real port-collision path.
- writeFileAtomic (write-temp + rename) makes flow.json + style.json torn-write proof under any signal. SIGTERM mid-write yields either the OLD bytes (rename never happened) or the NEW bytes (rename completed) — never partial JSON. Combined with `withFlowWriteLock` serializing concurrent mutations per flowId, parallel PATCH from N callers must converge on exactly one input value (last-writer-wins).
- The 6 canvas node types (playNode/stateNode/shapeNode/imageNode/htmlNode/iconNode) and the single editable edge ALL live in `packages/canvas/src/nodes/` and `packages/canvas/src/edges/`. `apps/web` is a thin host — `demo-view.tsx` imports `SeeflowCanvas` from `@seeflow/canvas`. The PRD's hint to grep `apps/web/src` for node components is misleading; always grep `packages/canvas/src/nodes/` for the actual sources.
- Stable test attributes on the React Flow edge wrapper `<g class="react-flow__edge">` can't be set via JSX inside the edge component — xyflow owns that wrapper in a sibling render path. The established pattern in `editable-edge.tsx` is a useEffect that does `document.querySelector('.react-flow__edge[data-id=<id>]')` + `setAttribute()` (see `data-handoff` and `data-edge-kind`). Use this pattern for any new stable attribute on edges.
- For one-shot canvas-ready signals (`data-canvas-ready`, etc.) set the attribute imperatively in `<ReactFlow onInit>` via the existing `wrapperRef` — do NOT add a new useState slot. Per `packages/canvas/CLAUDE.md`, the SeeflowCanvas hook-shim test runner uses `useStateOverrides[N]` indexed by declaration order, and a new useState anywhere except the END of the body shifts every test's slot reference. Imperative setAttribute via the existing wrapperRef avoids this entirely.
- The seeflow.pid file is NOT a startup blocker. `runStart --foreground` writes it unconditionally AFTER `serve()` succeeds (no pre-flight check), so a stale pid from a SIGKILL'd predecessor is silently overwritten. The pid is only consulted by `runStop` (to find the target to kill) and by `ensureStudioRunning` (to skip-spawning when a healthy studio is already up). Tests asserting "stale PID recovery" should assert (a) pid file still exists after SIGKILL, (b) new spawn writes its own pid into the same file, (c) /healthz responds — not anything about the stale entry being "cleaned" before startup.
- Single-file fixtures with positions MUST be authored as ResolvedFlows (not Flows) — `FlowSchema` is `.strict()` and rejects `position`, while `ResolvedFlowSchema` accepts it. When the disk-write target needs FlowSchema-compliance (e.g. registerFlowImpl, watcher reparse), use `splitFlow(resolved)` to produce a `{flow, style}` pair and write each to flow.json/style.json. Validate the on-disk pair against FlowSchema + StyleSchema. See `apps/studio/integration/fixtures/kitchen-sink.flow.json` + `fixtures.it.ts` + `e2e/support/studio-fixture.ts` for the canonical pattern.
- Playwright's `test.extend<TestArgs, WorkerArgs>` requires worker-scoped fixtures in the SECOND generic — putting them in the first with `{ scope: 'worker' }` typechecks `scope` as `'test'` only. The required empty TestArgs spelling is `type EmptyTestArgs = Record<never, never>` — NOT `Record<string, never>` (which has an index signature that widens to `{[K]: never}` and intersects with WorkerArgs to make worker fixture values `never`, producing cryptic "X not assignable to never" errors at the `use()` call site) and NOT `{}` (banned by biome's `noBannedTypes`).
- `apps/studio/tsconfig.json`'s `include` covers `src/**/*`, `test/**/*`, `integration/**/*`, `e2e/**/*`. Earlier stories left integration/ out so pre-existing TS errors in test helpers went undetected — US-012 widened the include and surfaced two. When adding new test directories under apps/studio, extend the include or they won't be type-checked.
- `bunx playwright --version` resolves to whichever `playwright` binary the cwd's bun finds first. From the REPO ROOT it may pick up a hoisted/older version (e.g. 1.58.0); from `apps/studio/` it picks the package-local devDep (1.60.0). **`--config=…` does NOT make the resolver win** — it just hands the path to whichever binary bunx already chose. For deterministic CI behavior, invoke playwright via the absolute binary path: `bun apps/studio/node_modules/.bin/playwright test --config=apps/studio/e2e/playwright.config.ts`. This works from any cwd.
- Playwright specs live under `apps/studio/e2e/` with the **`.e2e.ts`** suffix — NOT `.spec.ts`. `bun test`'s default discovery matches `*.spec.ts` everywhere (no `testIgnorePatterns` in Bun) and would try to execute Playwright specs through bun's runner, which fails on the missing `test`/`expect` globals. The `.e2e.ts` suffix dodges bun's matcher; Playwright's config sets `testMatch: '**/*.e2e.ts'` to scope its matcher to the new suffix. Same applies to any future Playwright spec.
- Invoke Playwright with **`bun apps/studio/node_modules/.bin/playwright test --config=apps/studio/e2e/playwright.config.ts`** (NOT `bunx --bun playwright`). Using `bun <path-to-binary>` runs the package-local devDep under bun's runtime so the harness's bun-only APIs (`Bun.spawn`, `import.meta.dir`, `Bun.file`) work. `bunx --bun playwright` resolves against `~/.bun/install/cache/` and from REPO_ROOT picks up older cached versions (e.g. 1.58.0) whose CLI surface lacks the `test` subcommand and errors with `unknown command 'test'`. The explicit binary path is cwd-independent and deterministic — `bun run test:it` from anywhere uses the right version.
- Use **`bun apps/studio/scripts/test-integration.ts`** (or `bun run test:it` from REPO_ROOT) to run the full integration tier end-to-end: SPA freshness check → rebuild if stale → `bun run test:it:bun` → playwright e2e → `run.json` summary → exit `max(stepCodes)`. Per-run artifacts land under `apps/studio/integration/.artifacts/<runId>/` and the orchestrator exports `SEEFLOW_IT_ARTIFACT_DIR=<that path>` for any future harness consumer to honor. The orchestrator uses `Bun.spawn` with `stdout/stderr: 'inherit'` so live test output streams to the caller (matters for local UX and for CI log scraping).
- `apps/studio/scripts/test-integration.ts`'s SPA freshness check recursively walks `apps/web/src/` for the newest mtime and compares to `apps/studio/dist/web/index.html`'s mtime. If src is newer (or dist is missing), it rebuilds via `bun run --filter @seeflow/web build`. The PRD wording "newer than apps/web/src" is loose — checking only the directory mtime would miss file edits in subdirs, so the recursive walk is required.
- Playwright fixture API REQUIRES the first parameter to be an object destructuring pattern (it introspects the function source to discover fixture deps). `async (_args, use) =>` throws "First argument must use the object destructuring pattern". For fixtures with no deps, use `async ({}, use) =>` with a `// biome-ignore lint/correctness/noEmptyPattern: required by Playwright fixture API` comment — Biome's `noEmptyPattern` rule otherwise blocks the empty pattern.
- SPA URL for a flow is `${studio.baseURL}/d/<slug>` (path-based), NOT `?flow=<slug>` (query-based). See `apps/web/src/App.tsx` `matchDemoSlug`. Any test that opens a flow page must use the `/d/<slug>` route.
- Studio mode is inferred from `apps/studio/dist/web/index.html` existing (per `inferMode` in `server.ts`). Tests / e2e expect the **built** SPA — if `dist/web/` is stale or missing, the studio falls back to dev mode and proxies to Vite on `:5173` (which the harness doesn't run). Always rebuild via `bun run --filter @seeflow/web build` before running Playwright e2e tests. US-014's orchestrator script automates this; for local one-shot runs, build manually first.
- Visual-baseline snapshots are per-OS by default (Playwright suffixes `-chromium-<platform>.png`). Local macOS baselines (`<type>-chromium-darwin.png`) won't pass on a Linux CI runner — font hinting and antialiasing diverge enough that even `maxDiffPixelRatio: 0.02` won't paper over it. US-015 wired the CI workflow but DID NOT generate Linux baselines (out of US-015 scope; codebase pattern carries over). Before US-017 cuts a release, either (a) generate Linux baselines via `mcr.microsoft.com/playwright:v1.60.0-jammy` running `bunx playwright test --update-snapshots`, (b) commit both per-OS sets, or (c) restrict the visual baseline test to Linux-only. PRD US-013's local-verify criterion is satisfied with darwin baselines; cross-OS is the LAST remaining gap before US-017 can land.
- Reusable GitHub Actions workflows live under `.github/workflows/_<name>.yml` (leading underscore) and use `on: workflow_call:` with NO other top-level triggers. Callers reference them as `jobs.<id>.uses: ./.github/workflows/_<name>.yml`. The reusable workflow declares its own `permissions:` block (the caller's permissions do NOT cascade to called workflows for `id-token: write` or similar elevated scopes). Cache, setup-bun, and install-deps should live INSIDE the reusable workflow so it's self-contained. See `.github/workflows/_integration.yml` for the canonical pattern.
- YAML syntax validation for workflows: `bunx --bun js-yaml < .github/workflows/<file>.yml > /dev/null` parses without error on valid syntax. This is the lightweight check called out in PRD US-015/US-016 acceptance criteria — it catches structural issues without requiring `actionlint` or schema validation.
- Reusable workflow callers are one-liners: a caller integrates a `workflow_call`-gated reusable workflow by adding a top-level job with `uses: ./.github/workflows/<file>.yml` — no `runs-on`, no `steps`. Downstream jobs gate via `needs: <job-id>`. publish.yml + docker.yml both consume `_integration.yml` this way; the integration job appears in each workflow's run independently. `needs:` ordering does NOT cross workflows, so a `v*.*.*` tag push fires both workflows in parallel and they can succeed/fail independently — watch both via `gh run watch` when releasing.

## 2026-05-21 - US-001
- Implemented studio harness + integration dir + plumbing test (US-001).
- Files created:
  - `apps/studio/integration/support/studio-harness.ts` — `getFreePort()`, `waitForHealthz()`, `spawnStudio()` returning `StudioHandle { port, baseURL, home, workspace, pid, logs, stop() }`. Captures stdout/stderr to disk under `integration/.artifacts/<runId>/` AND in-memory `handle.logs`. `stop()` sends SIGTERM, waits 5s, falls back to SIGKILL, then `rmSync` the tmp home. Process-level exit guard (`process.on('exit'/'SIGINT'/'SIGTERM', killAll)`) kills orphaned children on runner crash.
  - `apps/studio/integration/support/ids.ts` — `uniqueFlowId(testName)` returning `it-${slug}-${nanoid6}` (rolled inline; no nanoid dep).
  - `apps/studio/integration/healthz.it.ts` — boots studio via `spawnStudio()`, GETs `/healthz`, asserts 200 + `{ status: 'ok' }`, calls `stop()`.
- Files modified:
  - `.gitignore` — added `apps/studio/integration/.artifacts/`.
  - `package.json` — added `test:it:bun` script.
- Verified: `bun run typecheck` (apps/studio) passes; `bun run lint` (root biome) passes; `bun run test:it:bun` passes; no orphan `cli.ts` processes after exit (`pgrep -f cli.ts` count unchanged); full `bun test` suite still passes (1576 pass / 0 fail).
- **Learnings for future iterations:**
  - **Important:** the test script in PRD is `bun test apps/studio/integration/`, but bun's test discovery does not match `.it.ts` in a directory. Implemented as `bun test ./apps/studio/integration/*.it.ts` so future tests using the `.it.ts` suffix are picked up. If you add nested directories under `integration/`, expand the glob accordingly.
  - PRD refers to `$SEEFLOW_HOME` semantically; the actual env var the codebase reads is `SEEFLOW_WORKSPACE`. The harness sets `SEEFLOW_WORKSPACE=<tmpDir>` and exposes both `handle.home` (= tmp dir) and `handle.workspace` (= `${tmpDir}/.seeflow`, where the studio actually writes).
  - The harness uses `--foreground` flag — without it, the CLI forks a daemon child and the spawned process exits immediately, so the harness loses its handle to the real server PID. This is critical for clean SIGTERM/SIGKILL in `stop()`.
  - On macOS, `Bun.spawn`'s `proc.kill('SIGTERM')` correctly hits the foreground child. No need for `process.kill(proc.pid, ...)` workarounds.
  - `bun test` produces a "JSON Parse error" log line during the full suite — this is a thrown-and-caught error from `registry.test.ts` testing parse-failure handling, not a real failure (1576 pass / 0 fail).
---

## 2026-05-21 - US-002
- Implemented cli-runner support helper (US-002).
- Files created:
  - `apps/studio/integration/support/cli-runner.ts` — exports `CliResult { code, stdout, stderr, durationMs }` and `runCli(args, opts?)`. Spawns `bun apps/studio/src/cli.ts <args>` via `Bun.spawn`, defaults to 30s timeout, SIGKILLs on timeout and throws an error naming the args + tail of captured output. Captures full stdout/stderr via `new Response(proc.stdout/stderr).text()`. Optional `stdin` is fed via the proc's stdin (handles both WritableStream and legacy writer shapes).
  - `apps/studio/integration/cli-runner.it.ts` — calls `runCli(['--help'])` and asserts `code === 0`, `stdout` contains `seeflow`, and `durationMs > 0`.
- Verified: `bun run typecheck` (apps/studio) passes; `bun run format` + `bun run lint` (biome) pass; `bun run test:it:bun` passes (2 files, 2 tests, 5 expect calls). Full `bun test` shows 541 pass / 2 fail / 1 skip — the 2 failures (`server.test.ts` US-012 vendored tailwind CDN traversal regex) are pre-existing on this branch (confirmed by stashing my changes and re-running) and unrelated to this story.
- **Learnings for future iterations:**
  - The cli.ts top-level matches sub-command on `argv[0]`; `--help` falls through to the default help path. Using `runCli(['--help'])` is the canonical smoke test — short (~100ms) and exercises full module-load.
  - `Bun.spawn`'s `stdin: 'pipe'` returns a `proc.stdin` that has been a `FileSink` on older bun and a `WritableStream` on newer bun. The helper duck-types both so it survives bun upgrades.
  - Biome's formatter splits long union type annotations onto multiple lines automatically — don't fight it; let format own line breaks for type annotations.
  - The pre-existing 2 failures in `server.test.ts` are unrelated to integration test work. If future stories add tests in that file, fix the regex match (`%2F` URL-decoded inside Hono routes) — but it's out-of-scope for US-002.
---

## 2026-05-21 - US-003
- Implemented MCP stdio client support helper (US-003).
- Files created:
  - `apps/studio/integration/support/mcp-client.ts` — exports `McpClient { listTools(), callTool(name, args), close() }` and `spawnMcpClient(env?)`. Wraps `@modelcontextprotocol/sdk` Client + StdioClientTransport. Spawns `bun apps/studio/src/mcp-shim.ts` (not `mcp.ts`, not `bin/seeflow-mcp` — see Codebase Patterns). Filters undefined out of process.env before passing to the transport. `close()` calls `client.close()` under a 2s race, then SIGKILLs the captured child pid if it overshot.
  - `apps/studio/integration/mcp-client.it.ts` — boots studio via `spawnStudio()`, spawns mcp client with `SEEFLOW_STUDIO_URL=${baseURL}/mcp`, calls `listTools()`, asserts non-empty + contains `seeflow_list_flows`.
- Verified: `bun run format` + `bun run lint` (biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (3 files, 3 tests, 8 expects, 2.55s); full `bun test` suite still passes (1580 pass / 1 skip / 0 fail). No orphan `mcp-shim.ts` processes after the run (verified with `ps auxww | grep mcp-shim.ts`).
- **Learnings for future iterations:**
  - The PRD listed `apps/studio/src/mcp.ts` as a candidate child-process entry; that file is the `createMcpServer()` factory, not a runnable process. The real stdio entry is `apps/studio/src/mcp-shim.ts`. Using the shim directly (rather than `bin/seeflow-mcp`) skips an extra node→bun `spawnSync` layer and yields a single child pid.
  - The mcp-shim is a *proxy* to the studio's HTTP `/mcp` endpoint — it does not run tools in-process. Tests MUST spawn a studio first and pass `SEEFLOW_STUDIO_URL=${baseURL}/mcp` via env, otherwise the shim defaults to `http://127.0.0.1:4321/mcp` and the test would hit any other studio running on that port (or fail with STUDIO_NOT_RUNNING).
  - PRD said to assert the listTools array contains `listDemos`. Actual MCP tool names are prefixed `seeflow_*` (e.g. `seeflow_list_flows`). Always grep `apps/studio/src/mcp.ts` for `name:` strings before authoring assertions — the PRD's tool-name suggestions are from an older API.
  - StdioClientTransport's `env: Record<string, string>` doesn't accept `process.env` directly (its values are `string | undefined`). Filter out undefined entries before spreading.
  - `transport.pid` is reset to null inside `close()` (the transport sets `this._process = undefined` first). Capture the pid up-front if you need it for a fallback SIGKILL.
  - `Client.close()` cascades to `transport.close()` which already does end-stdin → SIGTERM → SIGKILL with internal 2s+2s waits. The 2s ceiling in this helper is a belt-and-braces guard, not the primary cleanup path.
---

## 2026-05-21 - US-004
- Implemented SSE client support helper (US-004).
- Files created:
  - `apps/studio/integration/support/sse-client.ts` — exports `SseEvent { event, data, id? }`, `SseClient { events, waitFor, close }`, and `connectSse(baseURL, path = '/api/events')`. Wraps `fetch` with `Accept: text/event-stream`, drains `response.body` via getReader+TextDecoder in the background, normalizes `\r\n` → `\n`, splits on `\n\n` for frame boundaries, and parses SSE fields per spec (event/data/id, comment lines starting with `:`, single leading space trimmed). `waitFor` polls events.find every 25ms with 5s default; on timeout throws with the last 3 `event:data` pairs for fast triage. `close()` aborts the fetch via AbortController.
  - `apps/studio/integration/sse-client.it.ts` — boots studio via `spawnStudio()`, creates a flow via POST `/api/projects` (since `/api/events` rejects unknown flowIds with 404), connects SSE to `/api/events?flowId=<id>`, asserts the initial `hello` frame parses (flowId + ts), POSTs to `/api/emit` with `{flowId, nodeId, status:'running', runId}`, and asserts `node:running` arrives within 2s with the right nodeId+runId.
- Verified: `bun run format` + `bun run lint` (biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (4 files, 4 tests, 13 expects, ~2.8s); full `bun test` suite passes (1580 pass / 1 skip / 0 fail — the registry "JSON Parse error" log is the documented thrown-and-caught case). No orphan `cli.ts` processes after the run.
- **Learnings for future iterations:**
  - The PRD signature `connectSse(baseURL, path?)` is intentionally generic — the helper defaults `path` to `/api/events`, but every real test needs to append `?flowId=<id>` because the route is flow-scoped (returns 400 missing / 404 unknown). Don't try to push flowId into the helper's API; let callers compose the path.
  - Always wait for the initial `hello` frame before POSTing to `/api/emit`. The route writes hello immediately when the subscription registers; if you emit too early, the broadcast can land before `events.subscribe(flowId, …)` is actually wired and you miss it. Waiting on `hello` is a cheap handshake.
  - `data:` payloads from studio SSE are JSON-encoded — the parser intentionally leaves `data` as a raw string. Tests do `JSON.parse(evt.data)`; this keeps the helper agnostic to payload shape and lets it be reused for non-JSON SSE streams later.
  - When draining via `body.getReader()`, wrap the read loop in try/catch — the loop will throw with "BodyStreamBuffer was aborted" or similar when `close()` aborts the fetch. That's not a failure, it's the documented exit path.
  - Bun's `Bun.sleep(25)` is the right sleep primitive in test helpers (it's a real timer, not a busy-wait). Don't reach for `setTimeout` + Promise wrappers inside a hot polling loop.
  - The `/api/emit` body fields `payload` (record) get spread onto the broadcast event payload under `nodeId`. `runId` lands as its own key. So the SSE frame's `data` parses as `{ ts, nodeId, runId?, ...payload }` — match assertions against that exact shape.
---

## 2026-05-21 - US-005
- Implemented REST integration tests for flow-lifecycle routes (US-005).
- Files created:
  - `apps/studio/integration/rest.it.ts` — one shared studio per file (beforeAll/afterAll). Seven happy-path tests, one per route from the acceptance criteria. Each mutating test uses `uniqueFlowId(testName)` for parallel safety.
    - GET `/healthz` → 200 + `{ status: 'ok' }`.
    - POST `/api/projects` → 200 + `{ id, slug, scaffolded: true }`; asserts on-disk `${studio.workspace}/<slug>/.seeflow/flow.json` exists and contains `{ version: 2, name, nodes: [], connectors: [] }`.
    - GET `/api/flows` → 200; list includes the just-created flow with matching slug/name and `valid: true`.
    - GET `/api/flows/:id` → 200 + `{ id, slug, name, filePath, flow, valid, error }`; asserts `valid: true`, `error: null`, `flow.name === name`, and `filePath` ends with `flow.json`.
    - POST `/api/flows/register` → 200 + `{ id, slug, sdk }`. Writes a flow.json under `${studio.home}/<slug>/.seeflow/flow.json` (sibling of workspace), registers via absolute `repoPath` + relative `flowPath`. Verifies via GET `/api/flows`.
    - POST `/api/flows/validate` → 200 + `{ ok: true, issues: [], stats: { nodeCount: 0, connectorCount: 0 } }`.
    - DELETE `/api/flows/:id` → 200 + `{ ok: true }`. Asserts: removed from list, GET returns 404, and (per actual deleteFlowImpl behavior) flow.json on disk is preserved.
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (5 files, 11 tests, 60 expects, ~2.9s); full `bun test` suite passes (1580 pass / 1 skip / 0 fail — the registry "JSON Parse error" log is the documented thrown-and-caught case). No new orphan `cli.ts` processes after the run (the 3 existing `bun --hot run` processes are the user's `bun run dev`, NOT artifacts of integration tests).
- **Learnings for future iterations:**
  - The PRD's route names `POST /api/flows/:id/register` and `POST /api/flows/:id/validate` don't exist — the real routes are `/api/flows/register` and `/api/flows/validate` (both without `:id`). Always grep `apps/studio/src/api.ts` for the actual `api.post`/`api.get` lines before writing assertions.
  - `deleteFlowImpl` is registry-only — it does NOT delete the on-disk flow.json. The PRD's "removes from disk + registry" wording is loose; the test should assert registry-side state and file preservation.
  - Empty `{ version: 2, name: '...', nodes: [], connectors: [] }` validates clean against both `FlowSchema` (used by POST `/api/projects` scaffold) and `ResolvedFlowSchema` (used by POST `/api/flows/validate`'s validateDemo). Useful as a smoke-test fixture for any new lifecycle integration test.
  - For tests that need to register a flow WITHOUT going through `/api/projects`, write the flow.json under `${studio.home}/<slug>/.seeflow/` (sibling of `studio.workspace`) so the project file isn't inside the registry's auto-watched workspace path. Then call `/api/flows/register` with the absolute `repoPath`.
  - The PRD permits the "shared harness" pattern at the file level — every mutating test creates its own flow via `uniqueFlowId`, which already prefixes with `it-` and suffixes with a nanoid for parallel-safety. No need for a per-test harness in this file.
  - `pgrep -fl 'cli.ts'` will return any long-running studios on the dev machine (e.g. `bun run dev`). To check for actual orphans from integration tests, filter for processes matching the harness invocation: `pgrep -fl 'cli.ts start --port'` (the harness uses `--port`, dev uses `--hot run`).
---

## 2026-05-21 - US-006
- Implemented REST integration tests for node + connector mutations (US-006).
- Files modified:
  - `apps/studio/integration/rest.it.ts` — added two new top-level describe blocks (`integration: REST — nodes`, `integration: REST — connectors`) with 10 happy-path tests. Added small helpers: `readFlowJson`, `readStyleJson`, `postJson`, `patchJson`, `seedShapeNodes`. Each test creates its own flow via `uniqueFlowId(testName)` and reuses the existing file-level shared studio harness.
    - Node tests: POST `/nodes` (single add), POST `/nodes/bulk` (3 nodes), PATCH `/nodes/:nodeId` (label change → flow.json), PATCH `/nodes/:nodeId/position` (x/y → style.json), PATCH `/nodes/:nodeId/order` (toFront on 3-node seed), DELETE `/nodes/:nodeId` (cascade: removes 2 adjacent connectors).
    - Connector tests: POST `/connectors` (single default), POST `/connectors/bulk` (default + event), PATCH `/connectors/:connId` (label → flow.json), DELETE `/connectors/:connId` (nodes preserved).
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (5 files, 21 tests, 122 expects, ~3.0s); full `bun test` suite passes (1580 pass / 1 skip / 0 fail — the registry "JSON Parse error" log is the documented thrown-and-caught case). No orphan `cli.ts start --port` processes.
- **Learnings for future iterations:**
  - PRD said "POST nodes/:nodeId/position" / "POST nodes/:nodeId/reorder", but the actual routes are PATCH and `/order` (not `/reorder`). The reorder body is a discriminated union on `op` — `{op:'toFront'}` is the smallest happy-path payload. Always grep `apps/studio/src/api.ts` for the actual `api.patch`/`api.post` lines and `apps/studio/src/operations.ts` for the Zod schema before authoring assertions.
  - `position` lands in style.json (not flow.json) via `splitFlow` in `merge.ts`. The full routing tables (NODE_DATA_FLOW_KEYS, NODE_STYLE_KEYS, CONNECTOR_FLOW_KEYS, CONNECTOR_STYLE_KEYS) decide where every field goes — see the Codebase Patterns block at the top of this file. Without checking this, the on-disk assertion would silently fail (file exists, but the field isn't where you'd guess).
  - Empty default position (`{x:0,y:0}`) creates a non-empty style.json on the very first node add — splitFlow writes any non-empty styleEntry. So `readStyleJson(slug)` works without seeding extra style fields.
  - The shapeNode is the cheapest seed shape — `{type:'shapeNode',data:{shape:'rectangle'}}` is the entire payload. Useful for tests where the node itself isn't the assertion target (e.g. reorder, connector-edit tests need 2+ nodes for valid source/target refs).
  - `withFlowWriteLock` serializes all mutations per flowId; by the time the HTTP response lands in the client, `writeFileAtomic` has completed. Tests can `await fetch(...)` then immediately read the file — no sleeps, no polling.
  - The connector cascade on node delete is enforced at the `deleteNodeImpl` layer, not the schema — it strips every connector with `source === id` OR `target === id` BEFORE the post-mutation ResolvedFlowSchema parse. Test asserts that two connectors both referencing the deleted node disappear together.
  - `Bun.file(path).text()` is the idiomatic read in bun-test integration code (matches `connector-helpers.ts`/`mcp-shim.ts` style). Don't reach for `readFileSync` unless you specifically need sync semantics.
---

## 2026-05-21 - US-007
- Implemented REST integration tests for runtime surfaces — play, emit, SSE (US-007).
- Files modified:
  - `apps/studio/integration/rest.it.ts` — added `integration: REST — runtime (play / emit / SSE)` top-level describe with 4 tests:
    - POST `/api/flows/:id/play/:nodeId` — seeds a playNode with `playAction.scriptPath = 'scripts/play.ts'`, writes a tiny `console.log(JSON.stringify({hello:'play'})); process.exit(0);` script under `${workspace}/${slug}/.seeflow/nodes/${nodeId}/scripts/play.ts`, opens SSE, POSTs /play, asserts the response has `{runId, status: 200, body: {hello:'play'}}` and the matching `node:done` SSE event arrives with `nodeId`+`runId`+`status:200`+parsed body.
    - POST `/api/emit` — calls 3× with the same body, asserts 200 + `{ok:true}` each time (idempotent stateless broadcast).
    - GET `/api/events` (SSE) — opens via the sse-client helper, POSTs to /api/emit with `status:'done'`+`payload:{status:201}`, asserts `node:done` arrives within 2s with the right `nodeId`/`runId`/`status` (`payload` keys are spread onto the broadcast event payload).
    - External flow.json edit — writes a modified flow.json directly under `${workspace}/${slug}/.seeflow/flow.json` with a single shapeNode (NO `position` — strict schema rejects it on disk), asserts `flow:reload` SSE event arrives within 3s with `valid:true` and the new `flow.nodes` includes `ext-1`.
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (5 files, 25 tests, 149 expects, ~3.3s); full `bun test` suite passes (1580 pass / 1 skip / 0 fail — registry JSON Parse error log is the documented thrown-and-caught case). No orphan `cli.ts start --port` processes.
- **Learnings for future iterations:**
  - The watcher's broadcast event name is `flow:reload` (NOT `flow:reloaded` as the PRD acceptance criteria hint). Always grep `apps/studio/src/watcher.ts` for the actual `events.broadcast({ type: ... })` lines.
  - flow.json on disk uses the **strict** FlowSchema — `position` is split to style.json and Flow*NodeDataSchema variants all use `.strict()`. When writing flow.json directly (external-edit test, register-flow test, fixtures), omit `position` and any style fields, or the watcher's reparse will broadcast `valid: false`. The error path of `broadcastReload` puts the message in `payload.error` (helpful for debugging similar future tests).
  - playNode requires `name`, `kind`, `stateSource`, AND `playAction`. The smallest valid playNode payload for runtime tests is `{ id, type:'playNode', data:{ name:'Play', kind:'http', stateSource:{kind:'request'}, playAction:{kind:'script', interpreter:'bun', scriptPath:'scripts/play.ts'} } }`.
  - `playAction.scriptPath` is relative to `<repoPath>/.seeflow/nodes/<nodeId>/`, NOT to `<repoPath>/.seeflow/`. So `scriptPath: 'scripts/play.ts'` (not `'nodes/play-it-1/scripts/play.ts'`). The node directory is created as a side effect of `addNodeImpl`'s `detail` externalization, so test ordering: (1) POST node, (2) `mkdirSync` + `writeFileSync` the script.
  - When `script` exits 0, runPlay calls `JSON.parse(stdout)` — print a single JSON line + `process.exit(0)` and the response body comes back as the parsed object. Useful for round-tripping a sentinel value through both the HTTP response and the `node:done` SSE event in one assertion pass.
  - Open SSE BEFORE POSTing /api/flows/:id/play — the broadcast registration happens in the SSE route handler under `events.subscribe(flowId, ...)`. The route writes a `hello` frame immediately on subscribe; wait for `hello` before triggering work to avoid a TOCTOU on subscription registration.
  - /api/emit requires `flowId` to be in the registry (404 otherwise). Each runtime test creates its own project first via /api/projects to satisfy this. `nodeId` does NOT need to exist on the flow — emit is a pure broadcast.
  - /api/emit's `payload` field gets spread onto the broadcast event payload alongside `nodeId` + optional `runId`. So sending `payload: {status: 201}` yields an SSE frame data of `{ts, nodeId, runId, status: 201}`.
  - The watcher's own-write echo dedupe is content-hash-based (SHA-256 of `flow.json + '\0' + style.json` bytes). External writes from outside the studio's mutation paths (i.e. our test's `writeFileSync` directly to flow.json) won't have a matching hash in the ring, so the broadcast fires after the 100ms debounce. createProjectImpl uses `writeFileSync` directly (not through `mutateMergedFlowAndBroadcast`/`notifyWritten`), so no scaffold hash is in the ring when the test starts.
---

## 2026-05-21 - US-008
- Implemented CLI integration tests — one per subcommand (US-008).
- Files created:
  - `apps/studio/integration/cli.it.ts` — one shared studio per file (beforeAll/afterAll). 30 tests across 6 describe blocks covering every subcommand currently dispatched in `apps/studio/src/cli.ts`:
    - **meta**: `--help` (asserts all 24 subcommand names appear in the banner), `help` subcommand form, `-h`, `--version`, `-v`, `version` subcommand form, unknown-subcommand exit-1.
    - **projects + flows**: `projects:create --name`, `flows:list`, `flows:get`, `flows:delete`, `flows:layout --json '{}'`, `register --path`, `flows:register --path` (alias of register), `flows:play` (seeds a playNode + scripts/play.ts and asserts the body round-trip).
    - **nodes**: `nodes:add --json`, `nodes:add-bulk --json`, `nodes:patch --json`, `nodes:move --x --y`, `nodes:reorder --op toFront`, `nodes:delete`.
    - **connectors**: `connectors:add`, `connectors:add-bulk`, `connectors:patch`, `connectors:delete`.
    - **validate**: `validate --file <flow.json>`.
    - **e2e**: `e2e <flowId>` against an empty flow (vacuous pass).
    - **lifecycle**: start (verified in --help banner), stop (per-test studio).
  - CLI calls pass `SEEFLOW_STUDIO_URL=${studio.baseURL}` + `SEEFLOW_WORKSPACE=${studio.home}` so `studioUrlOrDie` short-circuits out of daemon-spawn and `seeflowHome()` resolves to the studio's state dir. `parseOkLine` helper picks the last non-empty stdout line to parse the `{"ok":true,...}` JSON envelope.
- Files modified:
  - `apps/studio/src/cli-e2e.ts` — fixed a pre-existing bug: read `demoData.demo` but the API response key is `flow` (per FlowGetResponse in operations.ts). Renamed `DemoShape` → `FlowBody` and `demo?` → `flow?` so `seeflow e2e <id>` actually works. The bug was surfaced by the integration test (it always exited 1 with "demo not valid"). No other consumers of the old shape — `validateEndToEnd` is only imported from cli.ts:797.
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (6 files, 54 tests, 292 expects, ~6.7s); full `bun test` suite still passes (1580 pass / 1 skip / 0 fail — registry JSON Parse error log is the documented thrown-and-caught case). No orphan `cli.ts start --port` processes after the run.
- **Learnings for future iterations:**
  - The CLI's `healthOk` hits `/health` (legacy endpoint, returns `{ok: true}`), NOT `/healthz` (k8s-style, returns `{status: 'ok'}`). Both exist (see `server.ts:87` + `server.ts:93`), but tests that need the CLI's `studioUrlOrDie` to recognize a running studio must ensure `/health` responds.
  - The CLI's `studioUrlOrDie` doesn't refuse to dial out — even with `SEEFLOW_STUDIO_URL` set, `readConfig()` is still called and `config.port` is still passed to `ensureStudioRunning`. The port matters ONLY if the URL is unhealthy. Setting just `SEEFLOW_STUDIO_URL=${studio.baseURL}` is enough.
  - `runStart` calls `writePid(process.pid)` unconditionally AFTER the `wantsForeground` branch — so even `--foreground` mode (used by the test harness) writes a pid file at `${seeflowHome()}/seeflow.pid`. This is what makes the `stop` subcommand work against a harness-spawned studio.
  - The PRD's `validateEndToEnd` response key was always `demo` — that was the historical name. After the rename to `flow`, the cli-e2e code wasn't updated. Lesson: when renaming API response fields, also grep for all `.demo` access patterns across cli/skill/UI consumers; in-process parity tests don't catch this because they wire directly to impl functions, but integration tests over the wire do.
  - `seeflow e2e <id>` against a flow with no playNodes/statusNodes is a vacuous pass — both arrays are empty so `allPlaysOk && allStatusesOk` are both true. Sufficient for the subcommand-wiring smoke test (arg parsing, GET /api/flows/:id, SSE channel open, printOk).
  - `loadBody()` requires EXACTLY one of `--json`, `--file`, `--stdin` — passing empty `--json '{}'` satisfies that and exercises the "use defaults" path on subcommands like `flows:layout`.
  - The CLI's `stop` subcommand prints both "Stopped studio (pid N)." (success) AND clears the pid file. Assert on both: stdout contains the message + pid, and `existsSync(pidPath)` is false after.
---

## 2026-05-21 - US-009
- Implemented MCP integration tests — list_tools + one call per tool (US-009).
- Files created:
  - `apps/studio/integration/mcp.it.ts` — one shared studio + one MCP stdio client per file. 18 tests across 5 describe blocks: tools/list (2 — sorted-name snapshot + per-tool schema sanity), read-only tools (3 — `seeflow_list_flows`, `seeflow_get_flow`, `validate_seeflow`), project + flow lifecycle (3 — `seeflow_create_project`, `seeflow_register_flow`, `seeflow_delete_flow`), node tools (6 — `seeflow_add_node`, `seeflow_add_nodes`, `seeflow_patch_node`, `seeflow_move_node`, `seeflow_reorder_node`, `seeflow_delete_node`), connector tools (4 — `seeflow_add_connector`, `seeflow_add_connectors`, `seeflow_patch_connector`, `seeflow_delete_connector`). Includes a small `okJson<T>(result: CallToolResult)` helper that asserts non-error + parses the text content.
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (7 files, 72 tests, 444 expects, ~9.1s); full `bun test` from repo root still passes (1580 pass / 1 skip / 0 fail — the registry "JSON Parse error" log is the documented thrown-and-caught case). No orphan `cli.ts start --port` or `mcp-shim.ts` processes after the run.
- **Learnings for future iterations:**
  - The MCP `validate_seeflow` tool wraps `validateImpl` DIRECTLY (no REST envelope). On success it returns just `{ ok: true }`; on failure `{ ok: false, issues: [...] }`. Do NOT assume the same shape as the REST `/api/flows/validate` response (which has extra `stats` + `warnings` fields) — that envelope is added by the REST handler in diagram.ts, not by validateImpl.
  - MCP tool result envelope: every CallToolResult on success is `{ content: [{ type: 'text', text: '<JSON-stringified payload>' }] }` per `okResult` in mcp.ts. On failure it's `{ isError: true, content: [{ type: 'text', text: '<message>' }] }`. A tiny `okJson<T>` helper that asserts not-error and parses the text is the cleanest abstraction — keeps per-test code focused on shape assertions.
  - For MCP integration tests, seed preconditions via REST (POST `/api/projects` is battle-tested in rest.it.ts and `seedShapeNodesViaRest` uses POST `/nodes/bulk`). Then call the MCP tool with minimal valid args. For tools that ARE the seed mechanism (`seeflow_create_project`, `seeflow_register_flow`), the MCP tool itself IS the test's seed step.
  - Snapshot the sorted tool name list with `expect(names).toEqual([...])` so future tool additions/removals fail loudly. The expected list is intentionally maintained alongside the per-tool tests — diffs in both places signal a coordinated add/remove.
  - The mcp-shim child process is reused across all tests in one file via a single beforeAll/afterAll. This is fine because every tool is stateless at the shim level (each `tools/call` builds a fresh studio-side MCP server in stateless mode per the /mcp Streamable HTTP transport).
  - Reorder tool args: `{ flowId, nodeId, op, index? }` — `index` is required only for `op:'toIndex'`. The discriminated union in ReorderNodeInputSchema lets the agent introspect a oneOf JSON Schema, but tests just pass `{op:'toFront'}` for the smallest happy-path.
  - addConnectorImpl returns `{ ok, id }` (just the id, no full connector); addConnectorsBulkImpl returns `{ ok, connectors: [{id, connector}, …] }` (mirrors addNodesBulkImpl shape). When asserting MCP bulk response, expect the `nodes`/`connectors` array of `{id, ...}` entries — NOT just an array of ids.
---

## 2026-05-21 - US-010
- Implemented cross-boundary edge tests (US-010).
- Files created:
  - `apps/studio/integration/edges.it.ts` — 5 tests, each spawning its own studio (each scenario manipulates process lifecycle so a shared harness would interfere). 37 expects, ~1.3s.
    - **SIGTERM during in-flight nodes:add-bulk** — POST `/api/flows/:id/nodes/bulk` for 30 nodes WITHOUT awaiting. Send SIGTERM via `process.kill(studio.pid, 'SIGTERM')` externally (NOT `studio.stop()`, because the harness's stop rmSync's the home before we can inspect). Poll `process.kill(pid, 0)` until ESRCH (max 2s). Re-read flow.json and assert `nodes.length` is either 0 or 30 — `writeFileAtomic` (write-temp + rename) is torn-write proof, never partial. Finally `await studio.stop()` to clean up file handles + tmp dir (sees `proc.exitCode !== null` and skips kill).
    - **Double-start guard** — Use foreground mode (default daemon mode would just probe /health first and exit 0 with "Studio already running"). `runCli(['start', '--port', port, '--host', '127.0.0.1', '--foreground'], {env:{SEEFLOW_WORKSPACE: separateHome}, timeoutMs:5000})`. `Bun.serve` throws synchronously on EADDRINUSE, propagating as a non-zero exit. Assert `code !== 0` AND combined stdout+stderr matches `/EADDRINUSE|in use|already.*running|address.*in use/i`.
    - **External edit triggers SSE flow:reload within 1s** — same shape as US-007's external-edit test but tighter (1s vs 3s). Watcher debounce is 100ms so 1s is plenty. Don't include `position` on the on-disk node — strict FlowSchema rejects it (lives in style.json after splitFlow).
    - **Parallel PATCH /position** — Seed 1 shape node, fire 10 concurrent PATCHes with distinct {x:i*11, y:i*23} via Promise.all. `withFlowWriteLock` serializes the mutations per flowId; all 10 return 200. Read style.json (position lives there per NODE_STYLE_KEYS), assert persisted `{x,y}` equals exactly one input verbatim (last-writer-wins). Also re-read flow.json to confirm no torn JSON.
    - **Stale PID recovery** — Pass `home: mkdtempSync(...)` to BOTH spawns so studio B sees A's stale pid file. SIGKILL A directly (NOT `studio.stop()`, which would rmSync the home A and B share). Poll waitForPidDead. Assert `existsSync(pidPath)` AND `Number(readFileSync(pidPath))` still equals A.pid (A had no chance to clear it). Then `spawnStudio({home})` again → B comes up healthy AND overwrites the pid file with B.pid. Teardown: `await studioB.stop()` rmSyncs home; `await studioA.stop().catch(() => undefined)` no-ops on the already-gone home.
- Files modified: none (pure additive test).
- Verified: `bun run format` + `bun run lint` (root biome) pass; `cd apps/studio && bun run typecheck` passes; `bun run test:it:bun` passes (8 files, 77 tests, 481 expects, ~10.4s); full `bun test` suite passes (1580 pass / 1 skip / 0 fail — the registry/events thrown-and-caught logs are documented thrown-and-caught cases). No orphan `cli.ts start --port` processes after the run.
- **Learnings for future iterations:**
  - The harness's `stop()` unconditionally rmSync's home AFTER process exit. Tests that need to read on-disk state AFTER the process dies must bypass `stop()` for the kill — send the signal externally via `process.kill(studio.pid, 'SIGTERM'|'SIGKILL')`, poll `process.kill(pid, 0)` for ESRCH, THEN inspect the workspace, THEN call `studio.stop()` (which sees `proc.exitCode !== null` and skips the kill, runs rmSync).
  - `Bun.serve` throws synchronously on EADDRINUSE in `runStart` — propagates as an unhandled rejection → exit code 1 with stderr containing "EADDRINUSE" / "in use" / similar. ONLY foreground mode hits this — default daemon mode probes `/health` first via `spawnDaemon`/`healthOk` and exits 0 with "Studio already running". For port-collision integration tests, use `--foreground`.
  - `runStart --foreground` writes pid UNCONDITIONALLY after `serve()` succeeds — no pre-flight check on existing pid file, so a stale pid from a SIGKILL'd predecessor is silently overwritten. Stale-pid integration tests should assert (a) pid file still exists after SIGKILL, (b) new spawn overwrites it with its own pid, (c) /healthz responds — NOT that the stale entry was "cleaned" before the new startup.
  - The harness's `home: opts.home ?? mkdtempSync(...)` option lets multiple spawns share a workspace across a single test (e.g. stale-PID recovery). When sharing, do NOT call A's `stop()` until B is also done — A's stop rmSync's the home mid-B-run. Sequence: SIGKILL A externally → spawn B → run assertions → `await B.stop()` (rmSync's home) → `await A.stop().catch(()=>undefined)` (no-op on gone home; `rmSync force:true` is idempotent).
  - Parallel PATCH last-write-wins is enforced by `withFlowWriteLock` (per-flowId serialization) → `writeFileAtomic` (write-temp + rename = torn-write proof). 10 concurrent PATCHes can be asserted with `expect(inputKeys).toContain(persistedKey)` where keys are `${x},${y}` strings — exact equality to one input verbatim, never a torn mix.
  - For SIGTERM-mid-flight atomicity tests, fire the request via `fetch(...).catch(() => null)` so the test's `finally` block can `await` it without throwing when the server's connection drops mid-write. The actual atomicity assertion happens AFTER `waitForPidDead` (not after the fetch settles) so the read is always against the on-disk state at the moment of process death.
---

## 2026-05-21 - US-011
- Implemented canvas data attributes for Playwright selectors (US-011).
- Files modified:
  - `packages/canvas/src/nodes/play-node.tsx` — added `data-node-type="playNode"` to root <div>.
  - `packages/canvas/src/nodes/state-node.tsx` — added `data-node-type="stateNode"` to root <div>.
  - `packages/canvas/src/nodes/shape-node.tsx` — added `data-node-type="shapeNode"` to root <div>.
  - `packages/canvas/src/nodes/image-node.tsx` — added `data-node-type="imageNode"` to root <div>.
  - `packages/canvas/src/nodes/html-node.tsx` — added `data-node-type="htmlNode"` to root <div>.
  - `packages/canvas/src/nodes/icon-node.tsx` — added `data-node-type="iconNode"` to root <div>.
  - `packages/canvas/src/edges/editable-edge.tsx` — added a useEffect (deps: id, edgeKind) that imperatively `setAttribute('data-edge-kind', kind)` on the React Flow edge wrapper `.react-flow__edge[data-id=<id>]`. Mirrors the existing `data-handoff` imperative pattern. `data.kind` narrowed via `typeof data?.kind === 'string'` because EditableEdgeData extends `Record<string, unknown>`.
  - `packages/canvas/src/components/seeflow-canvas.tsx` — added `wrapper.setAttribute('data-canvas-ready', 'true')` inside the existing `<ReactFlow onInit>` callback (right after the existing `--rf-zoom` style set, BEFORE the mount-fit branch). Canvas root is the outer `<div className="seeflow-canvas-root">` referenced via the existing `wrapperRef`.
- Verified: `bun run typecheck` (all 4 workspaces) passes; `bun run format` + `bun run lint` (root biome) pass; `bun test` (full suite) passes (1580 pass / 1 skip / 0 fail); `bun run test:it:bun` (integration) passes (77 tests / 481 expects).
- **Learnings for future iterations:**
  - The 6 node types referenced in the PRD (playNode/stateNode/shapeNode/imageNode/htmlNode/iconNode) all live in `packages/canvas/src/nodes/` — `apps/web` does NOT have a `components/nodes/` directory. `apps/web/src/pages/demo-view.tsx` imports `SeeflowCanvas` from `@seeflow/canvas` and the package owns every node + edge renderer. The PRD's "grep apps/web/src for the existing node component files" hint is misleading; always grep `packages/canvas/src/nodes/` for the actual sources.
  - Edge data attributes can't be added via JSX in EditableEdge — the React Flow `.react-flow__edge` `<g>` wrapper is a SIBLING render path owned by xyflow. The existing solution for `data-handoff` (and now `data-edge-kind`) is a useEffect that does `document.querySelector('.react-flow__edge[data-id=<id>]')` + `setAttribute()`. Same pattern for any future stable-test-attribute on edges.
  - `EditableEdgeData` extends `Record<string, unknown>` so `data?.kind` types as `unknown`. Narrow via `typeof data?.kind === 'string' ? data.kind : undefined` rather than casting — keeps the access type-safe and silently no-ops if the kind ever gets dropped from connector-to-edge.ts.
  - Per the `packages/canvas/CLAUDE.md` hook-shim note: useState calls in `SeeflowCanvas` MUST stay in order or every downstream test's `useStateOverrides[N]` index shifts (12 documented slots). For one-shot canvas-ready signaling, set the attribute imperatively in `<ReactFlow onInit>` via the existing `wrapperRef` — no new useState needed, no test impact.
  - The `seeflow-canvas-root` wrapper (line 3942) is the right hook for `data-canvas-ready`. `onInit` fires inside `<ReactFlow>` after first layout — exactly the readiness signal Playwright needs. Setting the attribute imperatively also keeps the data attribute on the same DOM node Playwright's existing `data-testid="seeflow-canvas"` selector already finds.
  - All three attributes are passive metadata — no production behavior change. The diff is +30 / -0 across 8 files. Existing tests catch nothing because no test assertions specifically check for these attributes (yet) — US-013 adds Playwright spec assertions that DO check `[data-node-type=...]` / `[data-edge-kind=...]` / `[data-canvas-ready="true"]`. So adding these attributes WITHOUT corresponding spec changes is the right shape: foundation first, then assertions.
  - Browser verification not run in this autonomous loop (no headed dev-browser tool wired into the Ralph harness here). The dev server (`bun run dev`) is already running on the user's box (ports 5173 + 4321 per the integration-test learnings) — manual visual confirmation in DevTools that each `.react-flow__node` has `data-node-type` and each `.react-flow__edge` has `data-edge-kind` after layout settles is recommended before merging US-013 (which writes the Playwright specs that depend on these attributes).
---

## 2026-05-21 - US-012
- Implemented kitchen-sink fixture + Playwright config + e2e scaffold (US-012).
- Files created:
  - `apps/studio/integration/fixtures/kitchen-sink.flow.json` — 6 nodes (one of each type: playNode/stateNode/shapeNode/imageNode/htmlNode/iconNode), 4 connectors covering all kinds (http/event/queue/default), deterministic ids n1..n6 + c1..c4, positions on a 2-column grid (x ∈ {100,500}, y ∈ {100,300,500}). Authored as a ResolvedFlow (positions inline).
  - `apps/studio/integration/fixtures/scripts/noop.ts` — `process.exit(0);` one-liner so the playNode's playAction.scriptPath resolves.
  - `apps/studio/integration/fixtures.it.ts` — 3 tests: (1) ResolvedFlowSchema.parse + type/kind counts, (2) splitFlow → FlowSchema + StyleSchema parse, (3) script file exists + playNode references it.
  - `apps/studio/e2e/playwright.config.ts` — CI-aware (retries 0/2, workers undefined/1, traces+video retain-on-failure, github+html reporter on CI), single chromium project, viewport 1280×800, outputDir relative-up to `../integration/.artifacts/playwright`.
  - `apps/studio/e2e/support/studio-fixture.ts` — worker-scoped `studio` fixture. Wraps the existing `spawnStudio()` harness; splits the kitchen-sink fixture; writes flow.json + style.json + nodes/n1/scripts/noop.ts under `${studio.home}/kitchen-sink/.seeflow/`; POSTs /api/flows/register; returns `{ studio, flow: { id, slug, repoPath } }`. Teardown stops the studio (which rmSyncs the tmp home).
- Files modified:
  - `apps/studio/package.json` — added `@playwright/test@^1.60.0` to devDependencies.
  - `apps/studio/tsconfig.json` — extended `include` to cover `integration/**/*` and `e2e/**/*` so typecheck actually verifies the new code.
  - `apps/studio/integration/mcp.it.ts` — fixed pre-existing TS18048 (`tool.description` possibly undefined). Used `(tool.description ?? '').length`.
  - `apps/studio/integration/support/cli-runner.ts` — fixed pre-existing TS2322 (process.env spread typing). Replaced spread with a typed copy loop that filters out undefined values.
- Verified: `bun run format` + `bun run lint` (root biome) pass; `bun run typecheck` passes for all 4 workspaces; `bun run test:it:bun` passes (9 files, 80 tests, 490 expects, ~10s); full `bun test` from repo root still passes (1580 pass / 1 skip / 0 fail); `bunx playwright --version` returns 1.60.0; `bunx playwright test --config=apps/studio/e2e/playwright.config.ts --list` loads the config cleanly (0 tests found, expected since US-013 adds the spec file). No orphan `cli.ts start --port` processes after the run.
- **Learnings for future iterations:**
  - PRD's "FlowSchema.parse(...)" wording for the fixture validation step is contradictory with the PRD's own "deterministic ids and {x,y} positions" criterion — `FlowSchema` is `.strict()` and rejects `position` (position lives in style.json per NODE_STYLE_KEYS). Single-file fixtures with positions MUST be parsed with `ResolvedFlowSchema`. The fixture validation test uses BOTH: ResolvedFlowSchema for the file-as-authored, and FlowSchema + StyleSchema for the splitFlow output so the fixture stays disk-writable through `splitFlow`.
  - PRD's "POST /api/projects + POST /api/flows/:id/register" pattern is impossible to follow literally — `/api/flows/:id/register` doesn't exist (actual route is `/api/flows/register`, no :id, per RegisterBodySchema). For a pre-existing on-disk fixture, `/api/flows/register` alone is sufficient (matches rest.it.ts's register pattern). Don't use POST /api/projects to scaffold + overwrite — that introduces a watcher-debounce race; direct register is race-free.
  - imageNode `path` MUST start with `nodes/<id>/` (schema's superRefine). The fixture's imageNode uses `nodes/n4/cover.png` — the actual file doesn't need to exist for ResolvedFlowSchema.parse to succeed, but the renderer would 404. For US-013 visual baselines, expect a broken-image placeholder unless we ship a real file.
  - Playwright's `test.extend<TestArgs, WorkerArgs>` needs worker-scoped fixtures in the SECOND generic — putting them in the first generic with `{ scope: 'worker' }` typechecks as `scope: 'test'` only. The required empty TestArgs spelling is awkward because biome's `noBannedTypes` forbids `{}`. Use `type EmptyTestArgs = Record<never, never>` (NOT `Record<string, never>` — the latter's index signature widens to `{[K]: never}` and intersects with WorkerArgs to make worker fixture values `never` too, producing cryptic "KitchenSinkStudio not assignable to never" errors at the `use()` call site).
  - Adding `integration/` and `e2e/` to `apps/studio/tsconfig.json`'s `include` is a worthwhile catch — previously these dirs were only "type-checked" at runtime by bun's transpiler. Surfaced two real pre-existing errors that the prior story typechecks missed (mcp.it.ts `tool.description` could be undefined; cli-runner.ts `process.env` spread allows undefined values into a `Record<string, string>`).
  - Worker-scoped Playwright fixtures via `[fn, { scope: 'worker' }]` tuple syntax mean the studio + kitchen-sink registration boot cost is paid once per Playwright worker, not per test. Cheap re-boot via the existing studio harness (~1.5s) but still — for read-only canvas tests (US-013), worker scope is correct.
  - `bunx playwright --version` from the REPO ROOT may resolve to a different hoisted version (e.g. 1.58.0 from another workspace) than `bunx playwright --version` from `apps/studio/` (1.60.0). For deterministic behavior, run playwright commands FROM `apps/studio/` or pass `--config=apps/studio/e2e/playwright.config.ts` so the package-local devDep is preferred.
---

## 2026-05-21 - US-013
- Implemented canvas.e2e.ts — DOM + per-node screenshot baselines (US-013).
- Files created:
  - `apps/studio/e2e/canvas.e2e.ts` — 3 describe-scoped tests under `canvas — kitchen-sink fixture`:
    1. **every node type renders** — asserts `.react-flow__node` count=6, `.react-flow__edge` count=4, and for each of the 6 node types (playNode/stateNode/shapeNode/imageNode/htmlNode/iconNode): locator count=1, toBeVisible, and (for the 5 text-bearing types) not.toHaveText(''). imageNode is in the `TEXTLESS_NODE_TYPES` set because it renders just an `<img>` with no descendant text.
    2. **per-node visual baseline** — for each of the 6 types, `toHaveScreenshot('${type}.png', {maxDiffPixelRatio: 0.02})`. Baselines auto-committed via `--update-snapshots`.
    3. **connector kinds render distinctly** — for each of [http, event, queue, default]: `[data-edge-kind="<kind>"]` count=1.
  - `apps/studio/e2e/canvas.e2e.ts-snapshots/{playNode,stateNode,shapeNode,imageNode,htmlNode,iconNode}-chromium-darwin.png` — 6 baseline PNGs.
- Files modified:
  - `apps/studio/e2e/playwright.config.ts` — added `testMatch: '**/*.e2e.ts'` so Playwright picks up `.e2e.ts` files (and `bun test`'s default `*.spec.ts` discovery skips them).
  - `apps/studio/e2e/support/studio-fixture.ts` — changed the worker-fixture function signature from `async (_args, use) =>` to `async ({}, use) =>` (with `biome-ignore lint/correctness/noEmptyPattern`). Playwright introspects the first arg's source text to discover fixture deps and REQUIRES a destructuring pattern; the previous form threw "First argument must use the object destructuring pattern" at runtime.
  - `apps/studio/dist/web/` — rebuilt SPA bundle. The committed bundle predated US-011 and lacked the data-node-type/data-edge-kind/data-canvas-ready attributes the spec selects on. Rebuilt assets: `index-y9xYmXhn.js`, `index.es-BlI4Nwy0.js`, `jspdf.es.min-CqUzwjsK.js`, `index.html`. Stale assets removed: `index-DYYHV4dv.js`, `index.es-DUzGNrif.js`, `jspdf.es.min-GZtf5JYE.js`.
- Verified:
  - `bun run format` + `bun run lint` (root biome): clean.
  - `bun run typecheck`: all 4 workspaces pass.
  - `bun test` (repo root): 1580 pass / 1 skip / 0 fail / 4129 expects (baseline — the registry "JSON Parse error" log is the documented thrown-and-caught case).
  - `bun run test:it:bun`: 9 files / 80 tests / 490 expects / ~10.5s.
  - `bunx --bun playwright test --config=e2e/playwright.config.ts --update-snapshots` (from `apps/studio/`): 3 passed (~41s — first run, generates baselines).
  - `bunx --bun playwright test --config=e2e/playwright.config.ts` (clean re-run): 3 passed (~39s — verifies stability against committed baselines).
- **Learnings for future iterations:**
  - **File naming pitfall:** `bun test` default discovery picks up `*.spec.ts` files everywhere in the repo, with no `testIgnorePatterns` option in Bun. Naming a Playwright spec `canvas.spec.ts` would cause `bun test` to load it as a bun test, immediately failing on missing `test`/`expect` globals. The fix is `.e2e.ts` suffix + `testMatch: '**/*.e2e.ts'` in `playwright.config.ts`. Future Playwright specs MUST use this convention.
  - **Bun runtime flag for Playwright:** `bunx playwright test ...` runs Playwright under Node, which fails immediately because the harness uses `Bun.spawn` (and the harness/fixture files use `import.meta.dir`, a bun-specific). The correct invocation is `bunx --bun playwright test ...` — the `--bun` flag forces bun as the runtime for the test runner AND its worker processes. Without `--bun`, the first call to `spawnStudio()` throws "Bun is not defined" / "import.meta.dir is undefined".
  - **Playwright fixture deps require destructuring:** the test runner introspects the function source string to find which fixtures the function depends on. `async (_args, use) =>` (named single arg) throws "First argument must use the object destructuring pattern" at fixture-resolve time. For empty-deps worker fixtures, the right form is `async ({}, use) =>` — needs a `biome-ignore lint/correctness/noEmptyPattern: required by Playwright fixture API` comment since Biome's `noEmptyPattern` rule otherwise blocks it. Same rule applies to test-scoped fixtures.
  - **SPA URL is `/d/<slug>`, not `?flow=<slug>`:** the PRD's `${studio.baseURL}/?flow=<slug>` URL is from an older routing draft. The actual SPA route is `/d/<slug>` per `apps/web/src/App.tsx` `matchDemoSlug`. The fixture's `slug` field is the registration slug returned from `/api/flows/register`.
  - **Studio mode hinges on `dist/web/index.html`:** `apps/studio/src/server.ts` `inferMode()` returns `'prod'` if the file exists, else `'dev'` (which proxies to Vite on `:5173`). The integration harness doesn't run Vite, so e2e tests REQUIRE a built SPA. Stale bundles silently fail in a worse way: the page loads but the new data attributes don't render, and the `every node type renders` test fails with count=0 on the locator. Always rebuild via `bun run --filter @seeflow/web build` before running e2e — US-014's orchestrator will automate this.
  - **dist/web bundle convention:** historically dist/web rebuilds land in their own `build: rebuild dist/web [skip ci]` commits, not bundled with feature commits. US-011 added source changes but didn't rebuild — the bundle on this branch was stale. US-013 bundled the rebuild into its commit because the spec requires US-011's data attributes to be present in the rendered DOM. Future feature stories that touch packages/canvas should either rebuild dist/web themselves OR rely on a later "build: rebuild dist/web" commit before any e2e tests are added.
  - **`addStyleTag` after `data-canvas-ready` + `networkidle`:** the disable-transitions CSS is injected AFTER `[data-canvas-ready="true"]` attaches (so React Flow's first layout has run), then `waitForLoadState('networkidle')` waits for any in-flight asset/network calls to settle. Combined, these give Playwright a stable DOM + zero animations to screenshot against. Don't inject CSS before canvas-ready — React Flow's onInit hasn't fired and the layout/positions aren't computed yet.
  - **imageNode has no descendant text:** the kitchen-sink fixture's imageNode renders just `<img>` (which has empty src because `data.projectId` isn't set). The `not.toHaveText('')` assertion in test 1 has to skip imageNode (via `TEXTLESS_NODE_TYPES`); the screenshot baseline still works (the image is just a blank box of default node dimensions).
  - **iconNode label is absolutely positioned below the node:** `data.name` ('Icon') renders in a `<span position:absolute top:full>` below the iconNode root div. The span is still a descendant of the root div, so `toHaveText('Icon')` works for the text assertion. BUT Playwright screenshots the locator's bounding box only — the label is OUTSIDE the iconNode's box and won't appear in the visual baseline. This is fine: the baseline captures the icon itself, which is the meaningful render target.
  - **Per-OS baselines:** snapshots are `<type>-chromium-darwin.png` (macOS). CI runs on Ubuntu and will need `<type>-chromium-linux.png`. Options for US-015/016: (a) generate Linux baselines via the Playwright Docker image as a one-time bootstrap, (b) commit both sets, (c) run visual baselines only on Linux. PRD's US-013 local-verify criterion is met with darwin baselines; cross-OS is deferred to US-015.
---

## 2026-05-21 - US-014
- Implemented orchestrator script + npm scripts + Makefile targets for the integration test tier (US-014).
- Files created:
  - `apps/studio/scripts/test-integration.ts` — orchestrator entry point. Steps: (a) SPA freshness check (recursive newest-mtime walk of `apps/web/src/` vs `dist/web/index.html`, rebuilds via `bun run --filter @seeflow/web build` if stale or missing), (b) generates `runId = new Date().toISOString().replace(/[:.]/g,'-')` + creates `apps/studio/integration/.artifacts/<runId>/` + exports `SEEFLOW_IT_ARTIFACT_DIR=<that>`, (c) runs `bun run test:it:bun` via Bun.spawn (stdio inherited so output streams live), (d) runs `bun apps/studio/node_modules/.bin/playwright test --config=apps/studio/e2e/playwright.config.ts`, (e) writes `run.json` summary `{runId, startedAt, finishedAt, artifactDir, steps:[…], exitCode}`, (f) `process.exit(Math.max(...stepCodes))`.
- Files modified:
  - `package.json` (root): added scripts `test:it` (orchestrator), `test:it:e2e` (playwright only), `test:it:update-snapshots` (regenerate baselines).
  - `Makefile`: added `test.it` + `test.it.update-snapshots` targets, both registered in `.PHONY`. Help line matches the existing `## comment` style so `make help` formats them with the others.
- Verified:
  - `bun run test:it` from REPO_ROOT: 80 integration tests / 490 expects (~12s) + 3 e2e tests (~39s) — orchestrator exits 0, run.json written.
  - `bun run format` + `bun run lint`: clean.
  - `bun run typecheck`: all 4 workspaces pass (sdk, web, canvas, studio).
  - Full unit suite `bun test`: 1583 pass / 1 skip / 0 fail (the registry.test.ts JSON parse log is a thrown/caught test, not a real failure).
- **Learnings for future iterations:**
  - **PRD's `bunx --bun playwright` invocation is broken from REPO_ROOT.** Progress.txt's prior pattern note was incorrect — `bunx` resolves against `~/.bun/install/cache/` and picks up older cached versions (1.58.0 in this checkout) whose CLI surface lacks the `test` subcommand and errors with `unknown command 'test'`. The fix: use `bun apps/studio/node_modules/.bin/playwright test --config=…` (absolute path to the package-local devDep binary, executed under bun's runtime). This is cwd-independent and deterministic. Updated both the orchestrator and the npm scripts to use this pattern, and fixed the corresponding pattern entries in progress.txt's `## Codebase Patterns` section.
  - **PRD step 3 says `bun test apps/studio/integration/` — that doesn't work.** Per the existing codebase pattern (top of progress.txt), bun's directory discovery doesn't pick up `*.it.ts`. The orchestrator delegates to the canonical `test:it:bun` script (`bun test ./apps/studio/integration/*.it.ts`) which uses the glob form that actually works.
  - **SPA freshness check needs a recursive walk.** Comparing `apps/web/src` directory's own mtime would miss file edits in nested subdirs (`components/`, `pages/`, `lib/`, `hooks/`). The orchestrator's `newestMtimeMs(dir)` does a recursive `readdir` + `stat` to find the genuine newest source mtime, then compares to `dist/web/index.html`'s mtime.
  - **Bun.spawn with `stdout/stderr: 'inherit'`** is the right choice for an orchestrator: live output streams (no buffering) so CI logs show progress and test failures land in the right place visually. Capturing the streams into the run.json was rejected — duplicating the (long) test output is noise; the artifact dir already has per-spawn studio logs from the harness.
  - **The `SEEFLOW_IT_ARTIFACT_DIR` env var is set but currently un-consumed by the harness.** Future stories that want all sub-process logs under one runId directory can read it from the harness's `spawnStudio` and pass it through; it's a future hook. For now it's a useful CI hint (one runId → one artifact upload).
  - **Makefile target naming convention is `name.subname`** (dot-separated, e.g. `docker.build`, `docker.run`, `gh.deploy`) — matched that for `test.it` and `test.it.update-snapshots`. Don't use hyphens (`test-it`) which would clash with the docker.* style and the `make help` awk grep regex `[a-zA-Z._-]+:.*## `.
---

## 2026-05-21 - US-015
- Implemented reusable integration workflow + wired into publish.yml (US-015).
- Files created:
  - `.github/workflows/_integration.yml` — reusable workflow on `workflow_call:` (no top-level triggers). Single `integration` job on ubuntu-latest, timeout-minutes 15. Steps: checkout → setup-bun@v2 (bun-version: latest) → `bun install --frozen-lockfile` → actions/cache@v4 (`~/.cache/ms-playwright` keyed on `hashFiles('bun.lock')` + restore-keys fallback) → `bunx playwright install --with-deps chromium` → `bun run --filter @seeflow/web build` → `bun test` (unit) → `bun run test:it` (integration + e2e) with `CI: 'true'` env → `if: failure()` upload-artifact@v4 of `apps/studio/integration/.artifacts/` (retention-days 7, if-no-files-found ignore). Declares its own `permissions: contents: read`.
- Files modified:
  - `.github/workflows/publish.yml` — added a new top-level `integration` job (uses `./.github/workflows/_integration.yml`) above the existing `publish` job; `publish` now declares `needs: integration`. Removed the old `- name: Test\n  run: bun test` step (now covered by `_integration.yml`'s unit-test step). Kept the existing typecheck + lint steps in the publish job with a one-line comment documenting the choice (PRD allowed either-or — chose "keep in publish job" because they're cheap publish-local concerns and don't depend on the integration tier's build artifacts).
- Verified:
  - YAML syntax: `bunx --bun js-yaml < .github/workflows/_integration.yml > /dev/null` → OK; same for publish.yml → OK.
  - `bun run typecheck` (all 4 workspaces: @seeflow/sdk, @seeflow/web, @tuongaz/seeflow, @seeflow/canvas) → all pass (exit 0).
  - `bun run lint` (biome check on 259 files) → No fixes applied (pass).
  - No unintended file changes (`git status --porcelain` shows only the two workflow files).
- **Learnings for future iterations:**
  - **Linux baseline gap is NOT solved by US-015** — the integration workflow will FAIL when it actually runs on CI because `apps/studio/e2e/canvas.e2e.ts-snapshots/` only contains `-chromium-darwin.png` baselines (committed by US-013 locally on macOS). US-017 (final release verification) requires the workflow to exit 0. Before US-017, generate Linux baselines via the Playwright Docker image (`mcr.microsoft.com/playwright:v1.60.0-jammy` running `bun run test:it:update-snapshots`) and commit them, OR restrict the visual-baseline test to Linux-only, OR commit both per-OS sets. Tracked in Codebase Patterns block at the top.
  - **Reusable workflows live under `.github/workflows/_<name>.yml`** (leading underscore convention). `on: workflow_call:` with NO other triggers — adding `on: push:` would make the file ALSO run independently which is not what we want. Caller syntax: `jobs.integration.uses: ./.github/workflows/_integration.yml` (relative path from repo root, NOT `.github/workflows/_integration.yml@<ref>`).
  - **Caller `permissions` do NOT cascade to called workflows for elevated scopes** like `id-token: write`. The reusable workflow declares its own `permissions:` block (`contents: read` here). For `id-token: write` (needed for npm provenance), keep that scope on the calling publish job, not on the reusable workflow.
  - **YAML validation via `bunx --bun js-yaml < <file> > /dev/null`** is the lightweight syntax check called out in PRD US-015 acceptance criteria. It runs in <2s after a one-time bunx download. For full GitHub Actions schema validation, `actionlint` would be the next step up but is NOT required by the PRD.
  - **`bunx playwright install --with-deps chromium`** is safe on fresh CI runners despite the codebase pattern warning about `bunx playwright test` resolving to stale cached versions — the bun cache is empty on a fresh runner, and the `install` subcommand is stable across versions. Followed PRD verbatim.
  - **Caching strategy**: cache key is `playwright-${{ runner.os }}-${{ hashFiles('bun.lock') }}` with a `playwright-${{ runner.os }}-` restore-key fallback so a partial cache hit avoids redownloading every browser when bun.lock changes minor versions. Path is `~/.cache/ms-playwright` (the Linux/Mac default; on Windows the path is `~/AppData/Local/ms-playwright` but the workflow only runs on ubuntu-latest).
  - **Removing `bun test` from publish.yml is intentional per US-015** — `bun run test:it` (called by `_integration.yml`) does NOT call `bun test`; it only runs `./apps/studio/integration/*.it.ts` + Playwright e2e. The `bun test` step in `_integration.yml` (before `bun run test:it`) is what restores unit-test gating. PRD's wording about "covered by the integration job which already runs `bun run test:it` — which doesn't run `bun test`" was prescriptive about this exact addition.
  - **Workflow file edits trigger the local security_reminder_hook.py** — that's a pre-tool hook reminding about untrusted-input injection (issue titles, PR bodies). The hook's warning does NOT block the edit and our changes don't reference any untrusted inputs, so it's safe to proceed. The existing `Verify tag matches package version` step already used the safe pattern (`env: GIT_REF: ${{ github.ref }}` + `"${GIT_REF#refs/tags/v}"`), preserved as-is.
---

## 2026-05-21 - US-016
- Gated `.github/workflows/docker.yml` on the reusable `_integration.yml` workflow created in US-015. Both deploy paths (npm publish + Docker Hub push) now run the identical unit + integration + Playwright suite before shipping.
- Files modified:
  - `.github/workflows/docker.yml` — added a new `integration` job at the top of `jobs:` that does `uses: ./.github/workflows/_integration.yml`. Changed the existing `docker` job to add `needs: integration`. All other steps (Verify tag, QEMU, Buildx, login, metadata, build-push) are unchanged.
  - `ralph/prd.json` — flipped US-016 `passes: false → true` and filled `notes`.
- Verified:
  - `bunx --bun js-yaml < .github/workflows/docker.yml > /dev/null` parses without error (also re-validated `_integration.yml` and `publish.yml` for good measure).
  - `bun run typecheck` (all 4 workspaces) passes.
  - `bun run lint` (biome) passes.
- **Learnings for future iterations:**
  - **Reusable workflow callers are one-liners.** A caller workflow integrates a `workflow_call`-gated reusable workflow by adding a top-level job with `uses: ./.github/workflows/<file>.yml`. No `runs-on`, no `steps`. Downstream jobs gate on it via `needs: <job-id>`. This is the pattern publish.yml already used (US-015) — docker.yml mirrors it verbatim.
  - **Sharing one reusable workflow across multiple deploy paths is the goal of `workflow_call`.** Both publish.yml and docker.yml now fire on the same `v*.*.*` tag push (push triggers both workflows independently), and both gate on `_integration.yml`. This means a tagged release effectively runs the integration suite twice (once for each deploy path). Acceptable tradeoff per PRD; if cost becomes a concern in the future, consider consolidating into a single `release.yml` that fans out to both publish + docker jobs after integration passes.
  - **`needs:` only enforces ordering within ONE workflow run** — it does NOT cross-link publish.yml's `publish` job to docker.yml's `docker` job. If the npm publish job fails and the docker job succeeds (or vice versa), you'll have an asymmetric release. Mitigate by watching both via `gh run watch` during US-017's release verification.
  - **The local security_reminder_hook.py fires on `.github/workflows/*.yml` edits** — it's an informational reminder about untrusted-input injection (issue titles, PR bodies), but on the first invocation in this session it blocked the Edit tool. Retrying the same Edit immediately works. None of US-016's changes touch untrusted inputs (no `github.event.issue.*`, `github.event.pull_request.*`, etc.), and the pre-existing `Verify tag matches package version` step already uses the safe `env:` + `"$VAR"` pattern.
  - **Cross-OS Playwright baselines are still the blocker for US-017.** Per pattern line 68 and US-015's notes, the `per-node visual baseline` test will fail on ubuntu-latest because baselines under `apps/studio/e2e/canvas.e2e.ts-snapshots/` are `<type>-chromium-darwin.png` only. US-016 wires the gate but doesn't close this gap — US-017 must generate Linux baselines (e.g. via `mcr.microsoft.com/playwright:v1.60.0-jammy` running `--update-snapshots`) before tagging a release, otherwise both publish.yml and docker.yml will fail at the integration step.
---
