Local-First Automerge Migration Plan
Migrate the notebook from RPC-with-optimistic-UI to true local-first Automerge ownership.
Status
| Phase | Status | Notes |
|---|---|---|
| 0: Optimistic mutations | ✅ Done | PR #542 merged |
| 1.1–1.3: Eliminate NotebookState dual-write | ✅ Done | PR #544 merged |
| 1.4: Delegate save-to-disk to daemon | ✅ Done | PR #545 merged |
| 2.0-pre: runtimed-wasm bindings | ✅ Done | PR #552 merged |
| 2: Frontend Automerge doc | ✅ Done | PR #553 merged — useAutomergeNotebook via runtimed-wasm, old useNotebook removed |
| 2D: Dead code removal | ✅ Done | PR #554 merged — dead commands removed, notebook:updated emissions cleaned up |
| 3: Authority boundary hardening | ⬜ Not started | Formalize writer roles per field |
| 4: Optimize Tauri sync relay | ⬜ Not started | Binary IPC, reduce overhead |
🎉 The migration is complete. The notebook is local-first. Cells mutate instantly in WASM, sync flows to the daemon via Automerge, execution works, widgets work. Everything below is post-migration cleanup and future improvements.
Problem
The current architecture maintains four copies of the notebook:
```
React useState ←→ invoke() ←→ NotebookState ←→ SyncClient AutoCommit ←→ Daemon AutoCommit
      (1)                          (2)                   (3)                    (4)
```
- React `cells` state — `useState<NotebookCell[]>` in `useNotebook.ts:286`
- `NotebookState` — `nbformat::v4::Notebook` struct in `crates/notebook/src/notebook_state.rs:152`
- Sync client's `AutoCommit` — local Automerge replica in `NotebookSyncClient` (`crates/runtimed/src/notebook_sync_client.rs:433`)
- Daemon's `NotebookDoc` — canonical Automerge doc in `NotebookRoom` (`crates/runtimed/src/notebook_sync_server.rs:452`)
Cell mutations like addCell and deleteCell use blocking RPC — the frontend awaits invoke(), which round-trips through copies (2), (3), and (4) before the frontend updates copy (1). The user pays full IPC + sync latency for operations that don't need backend involvement.
What's already right
- Outputs flow through Automerge, not RPC. Frontend `onOutput` is intentionally `() => {}` (`App.tsx:271-276`). Outputs arrive via `notebook:updated` Automerge sync.
- Daemon reads source from Automerge for execution. `ExecuteCell` reads from the doc (`notebook_sync_server.rs:1850`), not from whatever the frontend sent.
- Source uses the Automerge Text CRDT. Character-level merging via `update_text()` (Myers diff).
- Python agents are full Automerge peers. The `runtimed` Python package (`crates/runtimed-py`) connects via the same Unix socket, holds its own `AutoCommit` doc, and syncs bidirectionally. Agents can `create_cell`, `set_source`, `append_source`, `delete_cell` — all as CRDT operations that merge with concurrent user edits.
What's wrong
- `NotebookState` is redundant. It shadows the sync client's Automerge doc. The receiver loop at `lib.rs:420-424` overwrites it on every peer update.
- Frontend has no Automerge doc. It receives materialized `CellSnapshot[]` and can't do local CRDT mutations.
- `addCell` and `deleteCell` block on RPC for things the frontend can determine locally (UUID generation, "last cell" validation).
- Full-state replacement on sync. `notebook:updated` does `setCells(newCells)`, clobbering any in-flight optimistic state, which forces the blocking RPC pattern.
Peers to account for
| Peer | Connection | Automerge doc? | Writes |
|---|---|---|---|
| Frontend (React) | Tauri IPC | ❌ No — receives `CellSnapshot[]` | Cells, source, metadata via `invoke()` |
| Tauri process | In-process | ✅ Via `NotebookSyncClient` | Relays frontend writes |
| Daemon (runtimed) | Unix socket server | ✅ Canonical `NotebookDoc` | Outputs, execution_count, kernel status |
| Python agent (runtimed package) | Unix socket client | ✅ Own `AutoCommit` | Cells, source (create/edit/delete) |
| Additional windows | Unix socket client | ✅ Via `NotebookSyncClient` | Same as frontend |
Python agents are first-class peers. An agent calling session.append_source(cell_id, text) while a user types in the same cell produces concurrent Text CRDT operations that Automerge merges automatically. The migration must preserve this — agents should never need to change their API.
Target architecture
```
┌──────────────────────────────┐
│ Frontend                     │
│ Local AutoCommit doc         │
│ React state derived from doc │
│ All cell CRUD is local       │
└──────────┬───────────────────┘
           │ Binary Automerge sync messages
           │ (Tauri events or custom protocol)
┌──────────▼───────────────────┐
│ Tauri Process (thin relay)   │
│ Forwards sync messages       │
│ Handles: save, format, OS    │
└──────────┬───────────────────┘
           │ Unix socket (v2 typed frames)
┌──────────▼───────────────────┐
│ Daemon                       │
│ Canonical AutoCommit doc     │
│ Writes: outputs, exec_count  │
│ Kernel, env management       │
└──────────┬───────────────────┘
           │ Unix socket (v2 typed frames)
┌──────────▼───────────────────┐
│ Python Agent                 │
│ Own AutoCommit doc           │
│ Writes: cells, source        │
│ Reads: outputs, status       │
└──────────────────────────────┘
```
All peers hold Automerge docs. Mutations are local-first. Sync is automatic and bidirectional.
Phase 0: Make existing mutations optimistic ✅
Goal: Eliminate user-visible latency for addCell and deleteCell without touching the sync layer.
Effort: Small (days). Risk: Low — notebook:updated full-replace acts as convergence.
0.1 — Optimistic deleteCell
Current (`useNotebook.ts:477-485`):

```ts
const deleteCell = useCallback(async (cellId: string) => {
  try {
    await invoke("delete_cell", { cellId });
    setCells((prev) => prev.filter((c) => c.id !== cellId));
    setDirty(true);
  } catch (e) { ... }
}, []);
```

Change to:

```ts
const deleteCell = useCallback((cellId: string) => {
  // Validate locally — don't delete the last cell
  setCells((prev) => {
    if (prev.length <= 1) return prev;
    return prev.filter((c) => c.id !== cellId);
  });
  setDirty(true);
  // Fire-and-forget sync to backend
  invoke("delete_cell", { cellId }).catch((e) =>
    logger.error("[notebook] delete_cell sync failed:", e)
  );
}, []);
```

- Move "last cell" check to frontend
- `setCells()` runs before `invoke()`
- `invoke()` is fire-and-forget (catch-only)
- Backend `delete_cell` command becomes infallible from the frontend's perspective
0.2 — Optimistic addCell
Current (`useNotebook.ts:451-475`):

```ts
const addCell = useCallback(async (cellType, afterCellId?) => {
  const newCell = await invoke<NotebookCell>("add_cell", { cellType, afterCellId });
  setCells((prev) => { /* insert newCell */ });
  ...
}, []);
```

Change to:

```ts
const addCell = useCallback((cellType, afterCellId?) => {
  const cellId = crypto.randomUUID();
  const newCell: NotebookCell = {
    id: cellId,
    cell_type: cellType,
    source: "",
    outputs: [],
    execution_count: null,
  };
  setCells((prev) => { /* insert newCell at position */ });
  setFocusedCellId(cellId);
  setDirty(true);
  // Fire-and-forget — backend uses provided cellId
  invoke("add_cell", { cellId, cellType, afterCellId }).catch((e) =>
    logger.error("[notebook] add_cell sync failed:", e)
  );
  return newCell;
}, []);
```

- Generate UUID on frontend via `crypto.randomUUID()`
- Construct `NotebookCell` locally
- `setCells()` runs before `invoke()`
- Update the `add_cell` Tauri command signature to accept a `cell_id: String` parameter
- Backend uses the provided ID instead of generating one
0.3 — Verify convergence
- Confirm the `notebook:updated` event still reconciles any divergence
- Test: add cell while daemon is disconnected → cell persists after reconnect
- Test: delete cell while daemon is disconnected → deletion syncs on reconnect
- Test: two windows delete the same cell concurrently → both converge
Phase 1.1–1.3: Eliminate NotebookState dual-write ✅
Goal: Remove the redundant `nbformat::Notebook` struct from the Tauri process as a dual-write target. The `NotebookSyncHandle` becomes the single source of truth for all cell and metadata operations.
Effort: Medium (1-2 weeks). Risk: Medium — many Tauri commands reference NotebookState.
What was done
All ~25 call sites in crates/notebook/src/lib.rs that dual-wrote to both NotebookState and the sync handle were migrated:
- Cell mutations (`update_cell_source`, `add_cell`, `delete_cell`) — sync handle only, no more `NotebookState` write.
- Cell reads (`load_notebook`) — reads from `handle.get_cells()`, falls back to `NotebookState` when the daemon is disconnected.
- Path reads (`has_notebook_path`, `get_notebook_path`, `detect_pyproject`, `detect_pixi_toml`, `detect_environment_yml`, `detect_deno_config`) — read from the new `context.path` field.
- All 16 metadata commands — read/write via sync handle `get_metadata`/`set_metadata`. Read commands fall back to `NotebookState` when disconnected.
- Format cell — reads source from the sync handle, writes the formatted result back.
- Receiver loop sync-back removed — the block that overwrote `NotebookState` cells/metadata from Automerge on every peer update is gone.
- `WindowNotebookContext` gained `path: Arc<Mutex<Option<PathBuf>>>` and `working_dir: Option<PathBuf>`. `notebook_state` retained for save (Phase 1.4).
Additional fixes discovered during QA
- Trust signature round-trip — `approve_notebook_trust`/`verify_notebook_trust` now use raw JSON read/write (`get_raw_metadata_additional`, `set_raw_trust_in_metadata`) to preserve `trust_signature`/`trust_timestamp`, which aren't modeled in the typed `RuntMetadata` struct.
- Runtime detection — `get_runtime_from_sync` now matches `NotebookState::get_runtime()` semantics: `ks.language == "python"` check and `language_info` fallback added.
- Initial metadata in handshake — new `initial_metadata` field on `Handshake::NotebookSync` sends the kernelspec with the connection handshake so the daemon has it before auto-launching. Fixes Deno notebooks getting Python kernels on File → New Notebook As → Deno.
- `save_notebook_as` stale cells — refreshes `NotebookState` from Automerge before serializing (same pattern as `save_notebook`'s local fallback).
- `push_metadata_to_sync` removed — was clobbering dependency changes by pushing stale `NotebookState` metadata to the sync handle on save.
- `add_cell` with disconnected daemon — now returns `Err("Not connected to daemon")` instead of a ghost cell.
Remaining NotebookState usage (Phase 1.4 scope)
| Consumer | Why it still uses NotebookState |
|---|---|
| `save_notebook` | Serializes to `.ipynb` (refreshes from Automerge first) |
| `save_notebook_as` | Same, plus updates path |
| `clone_notebook_to_path` | Clones notebook struct |
| `initialize_notebook_sync` | First-peer population reads cells from `NotebookState` (loaded from disk) |
| `reconnect_to_daemon` | Passes `NotebookState` to `initialize_notebook_sync` |
| Disconnected fallbacks | ~8 metadata read commands fall back to `NotebookState` when daemon is down |
- Route all cell reads through `handle.get_cells()` instead of `state.lock()`
- Route all cell writes through sync handle commands
- Route all metadata reads/writes through the sync handle
- Remove receiver loop sync-back
- Add `path` and `working_dir` to `WindowNotebookContext`
Phase 1.4: Delegate save-to-disk to daemon ✅
Goal: Move notebook save-to-disk from the Tauri process to the daemon, eliminating the last major NotebookState consumer.
What was done
- `SaveNotebook` request now accepts `path: Option<String>` for save-as support and returns `NotebookSaved { path: String }` with the daemon-normalized absolute path (`.ipynb` appended if needed).
- `save_notebook_to_disk()` refactored to accept an optional target path. Relative paths are rejected (daemon CWD is unpredictable as a launchd service).
- Format-on-save moved to daemon. New `format_notebook_cells()` function in `notebook_sync_server.rs` runs ruff (Python) or deno fmt (Deno) on all code cells, writes formatted source back to Automerge, and broadcasts changes to all peers. Client-side formatting loop removed from both `save_notebook` and `save_notebook_as`.
- Local `NotebookState::serialize()` fallback removed. Daemon save is now required — a disconnected daemon returns a clear error.
- `save_notebook_as` uses the daemon-returned path as canonical for the window title, `NotebookState.path`, `context.path`, and room reconnection.
- Python bindings: `session.save(path=None)` and `await async_session.save(path=None)` added to both `Session` and `AsyncSession`.
Known issue: save-as loses outputs
When saving an untitled notebook via Save As, outputs from the current session are lost in the saved file. This happens because save-as creates a new room (new notebook_id derived from the new path), and the daemon's save_notebook_to_disk merges with the existing file (which doesn't exist yet for a new path). The outputs exist in the old room's Automerge doc but not the new one. This may be acceptable since save-as triggers a new kernel anyway, but worth evaluating.
Remaining NotebookState usage
`NotebookState` is no longer used for persistence. The remaining 13 call sites are:
- Initial loading — `create_window_context`, `initialize_notebook_sync` (first-peer population from disk)
- Path/dirty tracking — `save_notebook` (path check), `save_notebook_as` (path + dirty update)
- Clone — `clone_notebook_to_path` (local serialization for "Make a Copy")
- Reconnect — `reconnect_to_daemon` (passes state to re-initialize sync)
- Disconnected fallbacks — 8 metadata read commands fall back to `NotebookState` when the daemon is down
Full removal of the NotebookState struct is deferred — it still serves as the disconnected-daemon fallback and the initial notebook loading path. These will naturally go away as Phase 2 gives the frontend its own Automerge doc.
- Add `NotebookRequest::SaveToDisk` variant to the protocol
- Handle in `handle_notebook_request`: call existing `save_notebook_to_disk`, return success/failure
- Frontend's save command sends the request to the daemon via the sync handle
- Move format-on-save (ruff/deno fmt) to the daemon save path
- For `save_notebook_as`: frontend handles the file dialog, sends the new path to the daemon, the daemon writes, and Tauri uses the daemon-returned path
- Remove `NotebookState` serialization from `crates/notebook/src/lib.rs`
- Python bindings for `save(path=None)`
Phase 2.0-pre: runtimed-wasm bindings ✅
Goal: Ship the WASM bindings that solve the string→Text CRDT type mismatch. Purely additive — no changes to existing code.
- `crates/runtimed-wasm` crate with `NotebookHandle` wasm-bindgen exports
- `wasm-pack build` output at `apps/notebook/src/wasm/runtimed-wasm/` (345KB gzip)
- 18 Rust unit tests
- 15 Deno smoke tests (cell CRUD, sync, concurrent merges, Text CRDT merge)
- CI: wasm-pack build + Deno test step in `build-linux`
- biome configured to exclude generated WASM output
- CI green, merged to main
Phase 2: Frontend Automerge doc — PR #553
Draft PR #547 was abandoned — it accumulated layers of fixes for the JS string→Text mismatch we didn't understand yet. PR #553 is a clean rewrite using `runtimed-wasm`.
Goal: The frontend owns a local Automerge document. All document mutations happen instantly on the local doc. React state is derived from it. The Tauri process becomes a sync relay.
Effort: Medium (1-2 weeks). Risk: Low — the WASM eliminates the type mismatch that blocked us. Widgets confirmed working.
Strategy: Feature-flag toggle. Build `useAutomergeNotebook` alongside `useNotebook`, controlled by localStorage or a `?automerge=true` URL param. Fresh hook using `NotebookHandle` from `runtimed-wasm` — no `@automerge/automerge` JS dependency.
Status: ✅ Complete. `useAutomergeNotebook` is the only path. `useNotebook.ts` and `useNotebookDispatch.ts` removed. Widgets working.
End state vs pragmatic step: The WASM is the pragmatic unblock. The destination is the frontend owning a proper JS Automerge doc with presence, cursors, and ecosystem plugins (@automerge/codemirror). Once the WASM path is stable and the Tauri relay is simplified, switching back to @automerge/automerge JS with ImmutableString for non-text fields is a well-scoped change. We'll be off the broken path and into the paved one.
Hard-won lessons (from QA and debugging)
| Lesson | Detail |
|---|---|
| 🔴 JS Automerge string→Text CRDT mismatch is the root cause of phantom cells | When JS does d.cells.push({ id: "cell-1", ... }) inside Automerge.change(), ALL string fields become Object(Text) CRDTs. But Rust NotebookDoc::add_cell() creates id, cell_type, execution_count as scalar Str (via doc.put()) and only source as ObjType::Text. The Rust read_str() helper sees Object(Text) where it expects ScalarValue::Str and returns None. The cell IS in the doc, sync worked, but get_cells() can't read it. This is not a bug in Automerge — it's a JS API design choice. The fix: use runtimed-wasm which calls the same Rust NotebookDoc code, so all field types match. |
| Sync needs multiple roundtrips | The Automerge sync protocol is not one-shot. generateSyncMessage / receiveSyncMessage must be called in a loop until both sides return null messages. |
| Frontend peer state must not exist before GetDocBytes | If frontend_peer_state is initialized at task startup, daemon sync acks during cell population buffer stale messages for a peer that doesn't exist yet. Fix: start as None, only init inside GetDocBytes. |
| `@automerge/automerge` JS gotchas (for future reference if we return to pure JS) | v2 requires `import { next as Automerge }` for `updateText`/`splice`. List mutations use proxy methods (`(d.cells as any).insertAt()`). Scalar strings return as `RawString` objects (use `String()` for comparison). Non-text fields need `new Automerge.ImmutableString()` to avoid the Text CRDT issue. |
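The multi-roundtrip requirement can be illustrated with a toy model. Everything below — `ToyPeer`, the message shape, the string change sets — is invented for illustration (real Automerge sync state and messages are opaque binary), but the loop shape is the same: pump `generateSyncMessage`/`receiveSyncMessage` until both sides return null.

```typescript
// Toy peers that converge by repeatedly announcing their change sets.
// Mimics only the *shape* of the Automerge sync loop, not its protocol.
type Message = { changes: string[] };

class ToyPeer {
  changes: Set<string>;
  private announced = false; // toy sync state: is the remote up to date with us?

  constructor(initial: string[]) {
    this.changes = new Set(initial);
  }

  // Returns null once there is nothing new to tell the remote peer.
  generateSyncMessage(): Message | null {
    if (this.announced) return null;
    this.announced = true;
    return { changes: [...this.changes] };
  }

  receiveSyncMessage(msg: Message): void {
    const before = this.changes.size;
    msg.changes.forEach((c) => this.changes.add(c));
    // Our doc changed, so the remote needs a fresh announcement.
    if (this.changes.size !== before) this.announced = false;
  }
}

// The loop: keep pumping until BOTH sides return null.
// A single request/response round trip is not enough to converge.
function syncLoop(a: ToyPeer, b: ToyPeer): number {
  let rounds = 0;
  for (;;) {
    const ma = a.generateSyncMessage();
    const mb = b.generateSyncMessage();
    if (!ma && !mb) return rounds;
    if (ma) b.receiveSyncMessage(ma);
    if (mb) a.receiveSyncMessage(mb);
    rounds++;
  }
}
```

Running `syncLoop` on two peers with disjoint changes takes more than one round: the first exchange transfers the changes, and a second is needed to confirm both sides are caught up — which is why a one-shot relay implementation stalls.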
What to reuse from PR #547
The Rust-side relay infrastructure is sound and reusable. The JS hook needs a clean rewrite.
Keep (from PR #547's Rust changes):
- `raw_sync_tx` / `connect_split_with_raw_sync()` — daemon sync message forwarding
- `get_automerge_doc_bytes` command — exports the Tauri-side doc as bytes
- `send_automerge_sync` command — receives frontend sync messages
- `frontend_peer_state` — separate sync state for the frontend peer (deferred init on `GetDocBytes`)
- `automerge:from-daemon` Tauri event — relay to frontend
Rewrite (fresh hook using runtimed-wasm) ✅:
- `useAutomergeNotebook` — loads `NotebookHandle` from WASM, not `@automerge/automerge`
- Cell mutations: `handle.add_cell()`, `handle.delete_cell()`, `handle.update_source()` — no `Automerge.change()`, no proxy methods, no `RawString`
- Sync: `handle.generate_sync_message()` / `handle.receive_sync_message()` — same relay, different WASM
- Materialization: `handle.get_cells_json()` → parse → `cellSnapshotsToNotebookCells()` → React state
- Shared `CellSnapshot` type extracted to `materialize-cells.ts` (used by both old and new hooks during migration)
- Widgets working — ipywidgets comm messages flow correctly through the new path
Dropped (bandaid ripped):
- `useNotebook.ts` — deleted, `useAutomergeNotebook` is the only hook
- `useNotebookDispatch.ts` — deleted, no more feature flag toggle
- `@automerge/automerge` npm dependency — no longer needed
- All `RawString` / `ImmutableString` / proxy-method workarounds
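The materialization step can be sketched as a pure function. The `CellSnapshot` field names below are assumptions inferred from this plan, not the actual `materialize-cells.ts` definitions, and the real `NotebookCell` carries more state:

```typescript
// Hypothetical shapes — illustrative only.
interface CellSnapshot {
  id: string;
  cell_type: string;
  source: string;
  outputs: unknown[];
  execution_count: number | null;
}

type NotebookCell = CellSnapshot; // simplified for the sketch

// handle.get_cells_json() returns a JSON string; parse and normalize it so
// React state always holds fully-populated cells, even if the doc omits
// optional fields.
function materializeCells(cellsJson: string): NotebookCell[] {
  const snapshots = JSON.parse(cellsJson) as Partial<CellSnapshot>[];
  return snapshots.map((s) => ({
    id: s.id ?? "",
    cell_type: s.cell_type ?? "code",
    source: s.source ?? "",
    outputs: s.outputs ?? [],
    execution_count: s.execution_count ?? null,
  }));
}
```

The point of keeping this a pure string→array function is that the hook can call it after every applied sync message and hand the result straight to `setCells`.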
✅ Resolved: Phantom cell bug — root cause: JS string→Text CRDT mismatch
Symptom: Frontend loads 1 cell from doc bytes. After daemon sync, a phantom cell appears that doesn't exist in any Rust-side doc. Execution fails with "Cell not found."
Root cause: When JS Automerge creates cells via d.cells.push({ id: "cell-1", ... }) inside Automerge.change(), ALL string fields become Object(Text) CRDTs. But Rust NotebookDoc::add_cell() creates id, cell_type, execution_count as scalar Str. The Rust read_str() helper sees Object(Text) where it expects ScalarValue::Str → returns None. The cell IS in the doc but can't be read.
Not a bug in Automerge — it's a JS API design choice. The fix: runtimed-wasm uses the same Rust code, so field types match.
What we ruled out along the way: race conditions, stale peer state, JS v3 vs v2, initial syncToBackend corruption, sync message decode failures, version mismatch. All were symptoms of the type mismatch.
| Approach | String types | Result |
|---|---|---|
| Rust↔Rust (Python bindings) | Both scalar `Str` | ✅ Works |
| JS↔JS (compat test) | Both `Text` | ✅ Works (consistent) |
| JS↔Rust (frontend↔relay) | JS `Text`, Rust expects `Str` | ❌ Phantom cells |
| WASM↔Rust (Spike C) | Both scalar `Str` | ✅ Works |
Spikes:
- A (Python bindings) ✅ — confirmed relay architecture is sound
- B (Deno FFI) — skipped
- C (Custom WASM) ✅ — `crates/runtimed-wasm` built, 18 Rust + 15 Deno tests pass
- D (Tauri test window) — planned as verification step
- E (Minimal repro) — confirmed the string→Text root cause
Sub-PR 2D — Post-migration cleanup — PR #554
Done in PR #554:
- Remove `useNotebook.ts` — deleted (PR #553)
- Remove `useNotebookDispatch.ts` — deleted (PR #553)
- Remove feature flag infrastructure (PR #553)
- Remove dead Tauri commands: `load_notebook`, `update_cell_source`, `add_cell`, `delete_cell`, `refresh_from_automerge` (−217 lines Rust)
- Remove `cell_snapshot_to_frontend` helper
- Remove `notebook:updated` cell emissions from init and receiver loop (metadata emissions kept — 3 active listeners)
Still to do (next PR):
| Task | Priority | Effort | Notes |
|---|---|---|---|
| Remove `NotebookState` cell mutation methods | 🟡 Short-term | Small | `update_cell_source`, `add_cell`, `delete_cell`, `find_cell_index` on the struct — no longer called |
| Remove `cell_snapshot_to_nbformat` helper | 🟡 Short-term | Small | Only used by the removed `notebook:updated` emission path — verify no other callers |
| Reduce `NotebookSyncClient` to pure relay | 🟠 Medium-term | Medium | Remove its local `AutoCommit` doc — it only needs to forward frames between daemon and frontend |
| Remove `NotebookState` struct entirely | 🟠 Medium-term | Medium | After relay simplification — `save_notebook` and session restore need alternative paths |
| Extract shared notebook-doc crate | 🟠 Medium-term | Medium | Eliminates WASM copy drift — both `runtimed` and `runtimed-wasm` depend on one source |
Output streaming: onOutput stays no-opped
`onOutput: () => {}` — wiring `appendOutput` causes duplicate outputs. The broadcast appends an output, then Automerge sync replaces all cells with doc state that already includes that output. Without a stable output ID to deduplicate, both appear.
Future fix: Add a sequence number or content hash to broadcast output messages so the frontend can skip outputs already present in Automerge state. This is a new feature, not a migration task.
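A minimal sketch of that future dedup, assuming broadcast outputs carry a per-cell sequence number (the `seq` field and `BroadcastOutput` shape are hypothetical — nothing like this exists in the protocol yet):

```typescript
// Hypothetical broadcast payload — `seq` would be assigned by the daemon.
interface BroadcastOutput {
  cellId: string;
  seq: number;
  data: string;
}

class OutputDeduper {
  private seen = new Map<string, Set<number>>();

  // Returns true if this broadcast output is new and should be appended
  // optimistically; false if it was already applied (or already present
  // in Automerge state).
  accept(out: BroadcastOutput): boolean {
    let keys = this.seen.get(out.cellId);
    if (!keys) {
      keys = new Set();
      this.seen.set(out.cellId, keys);
    }
    if (keys.has(out.seq)) return false;
    keys.add(out.seq);
    return true;
  }

  // Called when Automerge sync materializes outputs, so a late broadcast
  // of the same output is ignored instead of duplicated.
  markFromDoc(cellId: string, seqs: number[]): void {
    const keys = this.seen.get(cellId) ?? new Set<number>();
    seqs.forEach((s) => keys.add(s));
    this.seen.set(cellId, keys);
  }
}
```

With this in place, `onOutput` could append eagerly and the later full-state sync would be a no-op for outputs already seen.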
Build pipeline note
crates/runtimed-wasm/src/notebook_doc.rs is currently a copy of crates/runtimed/src/notebook_doc.rs with daemon-only methods removed. When notebook_doc.rs changes (rare — usually daemon-side output operations), the WASM copy must be updated and wasm-pack build re-run. Future cleanup: extract a shared notebook-doc crate that both depend on.
Future improvements (post-migration)
| Item | Category | Notes |
|---|---|---|
| Output dedup with broadcast IDs | Performance | Fix onOutput latency without duplicates |
| `splice_source` WASM export | Editor integration | Character-level CodeMirror edits instead of full-string `update_source` |
| `@automerge/codemirror` evaluation | Editor integration | Needs either JS Automerge with `ImmutableString` or a custom CM extension calling our WASM splice |
| Presence / collaborative cursors | Collaboration | Separate spike — evaluate @automerge/automerge JS with ImmutableString for non-text fields |
| Profile materialization on large notebooks | Performance | get_cells_json() + parse on 100 / 500 / 1000 cells |
| Selective re-materialization | Performance | Only re-read changed cells instead of full get_cells_json() |
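Selective re-materialization could be driven by a cheap per-cell change marker. The `rev` field below is hypothetical — the real doc would need to expose per-cell heads or a change counter — but the diff itself is straightforward:

```typescript
// Hypothetical per-cell metadata: id plus a monotonically increasing
// revision bumped whenever that cell changes in the doc.
interface CellMeta {
  id: string;
  rev: number;
}

// Returns the ids whose revision changed (or that are new), so only those
// cells need a fresh read instead of a full get_cells_json() materialization.
function changedCellIds(prev: CellMeta[], next: CellMeta[]): string[] {
  const prevRev = new Map(prev.map((c) => [c.id, c.rev]));
  return next.filter((c) => prevRev.get(c.id) !== c.rev).map((c) => c.id);
}
```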
Phase 3: Authority boundary hardening
Goal: Formalize which fields each peer can write, so CRDT convergence matches user expectations.
Effort: Small (days). Risk: Low.
Writer roles
| Field | Writer(s) | Rationale |
|---|---|---|
| `cells` list (add/delete/reorder) | Frontend, Python agents | User/agent intent, instant feedback |
| `cells[i].source` (Text CRDT) | Frontend, Python agents | User types, agent streams code |
| `cells[i].cell_type` | Frontend, Python agents | Toggle code↔markdown |
| `cells[i].outputs` | Daemon only | Kernel produces these |
| `cells[i].execution_count` | Daemon only | Kernel assigns these |
| `metadata.notebook_metadata` | Frontend, Python agents | Dependency management |
| `metadata.runtime` | Frontend | User selects runtime |
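Formalizing the table could be as simple as a lookup that the relay or daemon consults to reject (or just log) out-of-role writes. This encoding is illustrative — the names and the enforcement point are ours, nothing like it exists in the codebase yet:

```typescript
// Illustrative encoding of the writer-role table above.
type Peer = "frontend" | "agent" | "daemon";
type Field =
  | "cells"
  | "source"
  | "cell_type"
  | "outputs"
  | "execution_count"
  | "notebook_metadata"
  | "runtime";

const WRITERS: Record<Field, Peer[]> = {
  cells: ["frontend", "agent"],
  source: ["frontend", "agent"],
  cell_type: ["frontend", "agent"],
  outputs: ["daemon"],
  execution_count: ["daemon"],
  notebook_metadata: ["frontend", "agent"],
  runtime: ["frontend"],
};

// True if the peer is an authorized writer for the field.
function mayWrite(peer: Peer, field: Field): boolean {
  return WRITERS[field].includes(peer);
}
```

Since Automerge itself will happily merge any write, enforcement is advisory: the value is catching role violations early (in review or logs) rather than blocking CRDT convergence.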
Conflict scenarios
| Scenario | Resolution |
|---|---|
| Two frontends add cell at same position | Automerge list CRDT — both cells appear, deterministic order |
| User and agent edit same cell source concurrently | Automerge Text CRDT — character-level merge |
| Two windows delete same cell | Automerge — double-delete is idempotent |
| Agent appends source while user edits beginning | Text CRDT merges cleanly — append is position-independent |
| Daemon writes outputs while frontend edits source | Different fields — no conflict |
clear_outputs ownership
Today the frontend clears outputs locally (`clearCellOutputs` in `useNotebook.ts:441`) AND the daemon clears via Automerge. In the new model:
- Daemon owns output clearing — it clears when execution starts
- Frontend stops rendering stale outputs when it sees the `execution_started` broadcast
- Remove the frontend-side `clearCellOutputs` mutation on the Automerge doc
- Frontend can show a visual indicator (dimming) while waiting for new outputs
Phase 4: Optimize Tauri sync relay (optional, future)
Goal: Make the Tauri relay as thin and fast as possible. Tauri stays in the path — the frontend should never speak directly to the Unix socket for security reasons.
Effort: Small. Risk: Low.
Why Tauri stays in the loop
The daemon Unix socket is unauthenticated — any process that can reach it can read/write any notebook. In production this is fine because only local processes connect. But letting the webview open a raw socket (even localhost WebSocket) would let a compromised renderer talk to the daemon without Tauri's mediation. Tauri should own the connection lifecycle: auth, handshake, connection teardown on window close.
Binary IPC precedent
We already pass binary data through Tauri events for widget buffers (useDaemonKernel.ts:316-321) — number[][] round-tripped via JSON. This works but is not ideal for Automerge sync messages which are opaque binary blobs that don't benefit from JSON encoding.
Optimization options
| Approach | Tradeoff |
|---|---|
| Tauri `Channel<Vec<u8>>` | Tauri v2 channels support streaming binary data. Use a Rust-side channel to push sync messages to the frontend as raw bytes, and a command to send bytes back. Avoids JSON base64/array encoding overhead. |
| Tauri custom protocol | Register an automerge:// protocol handler. Frontend fetch()es sync messages as binary responses. Good for large payloads but adds HTTP framing overhead for small, frequent sync messages. |
| Base64 in events | Simplest. Encode sync messages as base64 strings in Tauri events. ~33% overhead but Automerge sync messages are typically small (sub-KB for incremental changes). Good enough to start. |
Recommendation: start with base64-in-events in Phase 2 (simplest, unblocks everything), then migrate to Channel<Vec<u8>> if profiling shows encoding overhead matters.
The blob store already has its own HTTP server (http://127.0.0.1:{blobPort}/blob/{hash}) for large output payloads, so sync messages only carry manifest hashes, not raw output data. This keeps sync message sizes small.
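The base64-in-events framing needs only two small helpers on the frontend side. These are standard `btoa`/`atob` wrappers, nothing project-specific; how they get wired to the `automerge:from-daemon` event and `send_automerge_sync` command is sketched in the comments as an assumption:

```typescript
// Encode Automerge sync bytes for transport in a JSON Tauri event payload.
function bytesToBase64(bytes: Uint8Array): string {
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin); // btoa handles code points 0–255
}

// Decode a base64 event payload back into sync-message bytes.
function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

// Assumed wiring (names from this plan, signatures not verified):
//   listen("automerge:from-daemon", (e) => applySync(base64ToBytes(e.payload)));
//   invoke("send_automerge_sync", { msg: bytesToBase64(syncMsg) });
```

The ~33% size overhead only matters if profiling says so; for sub-KB incremental sync messages this is the simplest correct transport.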
- Benchmark sync message sizes in practice (expect sub-KB for source edits, larger for cell add with initial content)
- If base64 overhead is measurable: migrate to Tauri `Channel<Vec<u8>>` for binary streaming
- Profile end-to-end latency: frontend Automerge change → Tauri relay → daemon apply → broadcast → other peer receives
Key files reference
Frontend
| File | Role |
|---|---|
| `apps/notebook/src/hooks/useNotebook.ts` | Current notebook state hook (to be replaced) |
| `apps/notebook/src/hooks/useDaemonKernel.ts` | Daemon kernel execution, broadcasts |
| `apps/notebook/src/App.tsx` | Top-level component, wires hooks together |
Tauri process (crates/notebook)
| File | Role |
|---|---|
| `crates/notebook/src/lib.rs` | Tauri commands, sync initialization |
| `crates/notebook/src/notebook_state.rs` | `NotebookState` — the struct to eliminate |
Daemon (crates/runtimed)
| File | Role |
|---|---|
| `crates/runtimed/src/notebook_doc.rs` | `NotebookDoc` — Automerge doc wrapper, schema |
| `crates/runtimed/src/notebook_sync_client.rs` | `NotebookSyncClient` — local Automerge peer |
| `crates/runtimed/src/notebook_sync_server.rs` | `NotebookRoom`, daemon-side sync loop |
| `crates/runtimed/src/connection.rs` | v2 typed frame protocol |
Python agent
| File | Role |
|---|---|
| `crates/runtimed-py/src/session.rs` | `Session` — Python API, full Automerge peer |
| `crates/runtimed-py/src/async_session.rs` | `AsyncSession` — async variant |
| `python/runtimed/src/runtimed/_mcp_server.py` | MCP server — AI agent tools |
Non-goals
- Frontend direct access to the daemon socket. The daemon's Unix socket is unauthenticated — any process that connects can read/write any notebook room. Tauri must mediate all daemon communication so the webview renderer never holds a raw socket. This is a security boundary, not a performance optimization to remove later.
- Replacing the daemon's Automerge with `automerge-repo`. The daemon's custom sync protocol (v2 typed frames multiplexing sync + requests + broadcasts) is well-suited to the single-doc-per-room model. `automerge-repo` is designed for multi-doc repos with discovery — unnecessary complexity here.
- Moving Automerge to the Python agent. The Python agent already has a full Automerge peer via the Rust `NotebookSyncClient`. No JS Automerge needed on the Python side.
- Real-time collaborative cursors. Desirable but a separate concern. Can be implemented later via Automerge ephemeral messages or the existing broadcast channel.
- Operational transform for CodeMirror. The `@automerge/codemirror` plugin exists but is a Phase 2+ consideration. The initial implementation can use `updateText` on every change event.