refactor(runtimed): reduce relay to pure byte pipe — two-peer Automerge sync (#600) by rgbkrk · Pull Request #608 · nteract/desktop

rgbkrk · 2026-03-08T06:24:16Z

Closes #600.

The Tauri relay (NotebookSyncClient) was a full Automerge peer — it maintained its own AutoCommit doc, merged sync messages from both frontend and daemon, and generated new sync messages for each peer. This created a three-peer topology (frontend ↔ relay ↔ daemon) that was the root cause of 6+ correctness findings from the protocol audit.

What changed

Step 1: Daemon requests for doc bytes and metadata

New NotebookRequest variants — GetDocBytes, GetRawMetadata { key }, SetRawMetadata { key, value } — let the Tauri app read/write the daemon's canonical Automerge doc directly via the request/response protocol, without going through the relay's local replica.
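The three variants and their responses can be sketched roughly as below. The variant names come from this PR and its commit messages; the field types (`String`, `Option<String>`, `Vec<u8>`), the `ToyDaemonDoc` struct, and the `handle_request` function are illustrative assumptions, not the daemon's actual code.

```rust
use std::collections::HashMap;

/// Request variants named in this PR. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum NotebookRequest {
    GetDocBytes,
    GetRawMetadata { key: String },
    SetRawMetadata { key: String, value: String },
}

/// Response variants named in the commit message. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum NotebookResponse {
    DocBytes { bytes: Vec<u8> },
    RawMetadata { value: Option<String> },
    MetadataSet,
}

/// Hypothetical stand-in for the daemon's canonical Automerge doc:
/// a blob of saved bytes plus a key -> JSON-string metadata map.
#[derive(Debug, Default)]
pub struct ToyDaemonDoc {
    pub saved_bytes: Vec<u8>,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical daemon-side dispatch: every read and write hits the
/// single canonical doc, so there is no relay replica to keep in step.
pub fn handle_request(doc: &mut ToyDaemonDoc, req: NotebookRequest) -> NotebookResponse {
    match req {
        NotebookRequest::GetDocBytes => NotebookResponse::DocBytes {
            bytes: doc.saved_bytes.clone(),
        },
        NotebookRequest::GetRawMetadata { key } => NotebookResponse::RawMetadata {
            value: doc.metadata.get(&key).cloned(),
        },
        NotebookRequest::SetRawMetadata { key, value } => {
            doc.metadata.insert(key, value);
            NotebookResponse::MetadataSet
        }
    }
}
```

In the real daemon, `SetRawMetadata` also notifies peers and persists the doc; the sketch leaves both out.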

Step 2: Reroute Tauri callers

get_automerge_doc_bytes, get_metadata_snapshot, set_metadata_snapshot, get_raw_metadata_additional, set_raw_trust_in_metadata — all now use send_request() instead of relay-local get_metadata()/set_metadata()/get_doc_bytes().
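A caller-side sketch of the rerouting, under stated assumptions: the `DaemonHandle` trait, the synchronous `send_request` signature, the `String` error type, and `FakeDaemon` are all hypothetical stand-ins (the real handle is presumably async). Only the function and variant names are taken from this PR.

```rust
use std::collections::HashMap;

// Minimal copies of the request/response variants; field types are assumed.
pub enum NotebookRequest {
    GetRawMetadata { key: String },
    SetRawMetadata { key: String, value: String },
}

pub enum NotebookResponse {
    RawMetadata { value: Option<String> },
    MetadataSet,
}

/// Hypothetical abstraction over the handle's `send_request()`.
pub trait DaemonHandle {
    fn send_request(&mut self, req: NotebookRequest) -> Result<NotebookResponse, String>;
}

/// Reads metadata by asking the daemon, not a relay-local replica.
pub fn get_metadata_snapshot<H: DaemonHandle>(
    handle: &mut H,
    key: &str,
) -> Result<Option<String>, String> {
    match handle.send_request(NotebookRequest::GetRawMetadata { key: key.to_string() })? {
        NotebookResponse::RawMetadata { value } => Ok(value),
        _ => Err("unexpected response to GetRawMetadata".to_string()),
    }
}

/// Writes metadata through the daemon, which owns the canonical doc.
pub fn set_metadata_snapshot<H: DaemonHandle>(
    handle: &mut H,
    key: &str,
    value: &str,
) -> Result<(), String> {
    let req = NotebookRequest::SetRawMetadata {
        key: key.to_string(),
        value: value.to_string(),
    };
    match handle.send_request(req)? {
        NotebookResponse::MetadataSet => Ok(()),
        _ => Err("unexpected response to SetRawMetadata".to_string()),
    }
}

/// In-memory fake daemon, used only to exercise the callers above.
#[derive(Default)]
pub struct FakeDaemon {
    pub metadata: HashMap<String, String>,
}

impl DaemonHandle for FakeDaemon {
    fn send_request(&mut self, req: NotebookRequest) -> Result<NotebookResponse, String> {
        Ok(match req {
            NotebookRequest::GetRawMetadata { key } => NotebookResponse::RawMetadata {
                value: self.metadata.get(&key).cloned(),
            },
            NotebookRequest::SetRawMetadata { key, value } => {
                self.metadata.insert(key, value);
                NotebookResponse::MetadataSet
            }
        })
    }
}
```

Putting the request path behind a trait keeps the callers testable without a running daemon, which is the only reason the trait exists in this sketch.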

Step 3: Byte pipe in `run_sync_task`

When raw_sync_tx is present (Tauri mode), the relay is now a transparent byte forwarder:

| Operation | Before (3-peer) | After (2-peer pipe) |
| --- | --- | --- |
| Frontend → daemon sync | decode → merge into relay doc → `sync_to_daemon()` → re-encode | forward raw bytes as `AutomergeSync` frame |
| Daemon → frontend sync | merge into relay doc → `generate_sync_message(fe_state)` | forward raw `AutomergeSync` frame payload |
| `GetDocBytes` | serialize relay's `AutoCommit` + virtual sync handshake | skipped in pipe mode (frontend uses daemon request) |
| Metadata diffing | read from relay doc, diff, emit `SyncUpdate` | skipped (frontend WASM drives metadata via `useSyncExternalStore`) |

runtimed-py retains the full peer mode (no raw_sync_tx) — unchanged behavior.
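The mode switch for daemon → frontend sync frames can be sketched as follows. This is a simplified model, not the code in `run_sync_task`: the `Relay` struct, the `Routed` enum, and the use of `std::sync::mpsc` are assumptions, and a plain `Vec` stands in for the relay's local `AutoCommit` doc.

```rust
use std::sync::mpsc::Sender;

/// What the relay did with one daemon -> frontend sync payload.
#[derive(Debug, PartialEq)]
pub enum Routed {
    /// Pipe mode: bytes handed to the frontend untouched.
    ForwardedRaw,
    /// Pipe mode, but the frontend receiver is gone.
    FrontendGone,
    /// Full peer mode: payload merged into the relay's own doc.
    MergedLocally,
}

/// Hypothetical, heavily simplified relay state.
pub struct Relay {
    /// Present in Tauri mode, absent in full peer mode (runtimed-py).
    pub raw_sync_tx: Option<Sender<Vec<u8>>>,
    /// Stand-in for the local AutoCommit doc: records merged payloads.
    pub merged: Vec<Vec<u8>>,
}

impl Relay {
    pub fn on_daemon_sync(&mut self, payload: Vec<u8>) -> Routed {
        match &self.raw_sync_tx {
            // Pipe mode: no decode, no merge, no per-peer sync state.
            Some(tx) => match tx.send(payload) {
                Ok(()) => Routed::ForwardedRaw,
                Err(_) => Routed::FrontendGone,
            },
            // Full peer mode: the relay remains an Automerge peer.
            None => {
                self.merged.push(payload);
                Routed::MergedLocally
            }
        }
    }
}
```

Because pipe mode never touches `merged`, the relay holds no document state that could diverge from the frontend or the daemon.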

Audit findings resolved

For the Tauri path, these protocol correctness findings are eliminated:

- `sync_to_daemon()` drops broadcast frames during ack wait: relay no longer calls `sync_to_daemon()`
- Triple-merge divergence risk: two peers (frontend ↔ daemon) instead of three
- Virtual sync handshake 10-iteration limit: no virtual handshake in pipe mode
- `try_send` drops sync updates silently: no `SyncUpdate`/`changes_tx` in pipe mode
- `receive_and_relay_sync_message` uses wrong peer state: no relay peer state in pipe mode
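To illustrate the `try_send` finding in isolation: a non-blocking send into a full bounded channel fails immediately, and if the caller ignores the error the update is lost. The sketch below uses `std::sync::mpsc::sync_channel` as a stand-in; the channel type behind `changes_tx` is not shown in this PR, and `try_send_all` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Pushes `updates` into a bounded channel with `try_send` while nobody
/// is draining it, then reports which values arrived and which were
/// rejected. A sender that ignores the `Err` loses the rejected values.
pub fn try_send_all(capacity: usize, updates: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let mut dropped = Vec::new();
    for &update in updates {
        match tx.try_send(update) {
            Ok(()) => {}
            // Buffer full: the value is handed back, not queued.
            Err(TrySendError::Full(v)) => dropped.push(v),
            // Receiver gone: also handed back.
            Err(TrySendError::Disconnected(v)) => dropped.push(v),
        }
    }
    // Close the channel so the receiver's iterator terminates.
    drop(tx);
    let delivered: Vec<u32> = rx.iter().collect();
    (delivered, dropped)
}
```

With a capacity of 2 and four updates, only the first two are delivered. Pipe mode sidesteps the problem for the Tauri path by not producing `SyncUpdate` messages at all.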

Handle methods still used from Tauri

| Method | Purpose |
| --- | --- |
| `send_request()` | Daemon RPC (save, execute, kernel lifecycle, etc.) |
| `receive_frontend_sync_message()` | Raw byte forward to daemon |
| `notebook_id()` | ID accessor |

No relay-local doc methods (get_metadata, set_metadata, get_doc_bytes, get_cells, etc.) are called from lib.rs.

Test plan

- `cargo test -p runtimed --lib`: 234 passed
- `cargo test -p runtimed --test '*'`: 15 integration tests passed
- `cargo test -p notebook --lib`: 137 passed
- Open existing .ipynb: cells load, kernel auto-launches
- Cmd-N new notebook: kernel launches, execute works
- Save-as: reconnects to new room
- Session restore: windows restored
- Trust approval: prompt appears for untrusted notebooks
- Metadata writes (add dependency): syncs to daemon and persists

…n requests

New NotebookRequest/NotebookResponse variants for reading the daemon's canonical Automerge doc without going through the relay's local replica:

- GetDocBytes → DocBytes { bytes }: full doc for WASM bootstrap
- GetRawMetadata { key } → RawMetadata { value }: metadata JSON read
- SetRawMetadata { key, value } → MetadataSet: metadata JSON write + sync

SetRawMetadata notifies peers via changed_tx and persists via persist_tx.

These enable the Tauri relay to stop maintaining its own AutoCommit doc for metadata and bootstrap operations (#600).

All metadata operations (get_metadata_snapshot, set_metadata_snapshot, get_raw_metadata_additional, set_raw_trust_in_metadata) now use send_request(GetRawMetadata/SetRawMetadata) instead of the relay's local handle.get_metadata/set_metadata.

get_automerge_doc_bytes now uses send_request(GetDocBytes) to fetch from the daemon's canonical doc instead of the relay's local replica.

The relay's local AutoCommit is no longer read or written for metadata or doc bootstrap from the Tauri app.

Remaining relay doc usage:

- receive_frontend_sync_message (merge + forward)
- SyncUpdate metadata diffing (changes_tx receiver)

These are addressed in subsequent commits.

When raw_sync_tx is present (Tauri app), the relay now acts as a transparent byte pipe between frontend WASM and daemon:

- ReceiveFrontendSyncMessage: forwards raw sync bytes to daemon socket instead of decode → merge → re-encode → sync_to_daemon. The daemon processes the sync message and sends back a response frame, which arrives in the socket read branch.
- Daemon→frontend AutomergeSync frames: forwarded raw via raw_sync_tx instead of merge → generate_sync_message(fe_state). No local doc merge, no metadata diffing, no SyncUpdate.
- GetDocBytes: skips virtual sync handshake in pipe mode. The frontend gets doc bytes from the daemon via send_request(GetDocBytes), not from this command. Handshake preserved for runtimed-py.

This changes the sync topology from three Automerge peers (frontend ↔ relay ↔ daemon) to two (frontend ↔ daemon) with the relay as a transparent forwarder.

Eliminates these audit findings for the Tauri path:

- sync_to_daemon() broadcast dropping (relay doesn't call it)
- Triple-merge divergence risk (two peers now)
- Virtual sync handshake 10-iteration limit
- try_send drops on changes_tx (no SyncUpdate in pipe mode)
- biased select starvation (no merge delays)
- receive_and_relay_sync_message wrong peer state

runtimed-py retains the full peer mode unchanged.

…ing frames (#613) (#616)

* fix(runtimed): prevent stream corruption in pipe mode (#613)

In pipe mode, ReceiveFrontendSyncMessage was writing sync frames directly to client.stream inside the select! command handler. If the daemon was sending data at the same time, the select! would drop the pending socket read future, then the command handler would write to the stream, corrupting the framing. The daemon would then read payload bytes as a length prefix, producing bogus frame sizes (observed: 1.15 GB).

Fix: buffer outgoing pipe frames in a VecDeque and flush them at the top of the loop BEFORE entering select!. This ensures writes only happen when no read is pending on the socket. The queue is drained synchronously before the next select! iteration.

Full peer mode (runtimed-py) is unaffected: its writes go through sync_to_daemon(), which owns the read/write sequence.

* fix(notebook): re-enable broadcast-driven output rendering for pipe mode

In pipe mode (#608), the Automerge sync path doesn't deliver output changes: the daemon's sync state tracks the relay peer, not the WASM peer, so all sync frames arrive with changed=false. materializeCells never runs after execution, and outputs never render.

Re-enable the broadcast-driven output path (appendOutput via onOutput callback). The broadcast pipeline works correctly: outputs arrive, blob manifests resolve, and the external store updates.

No duplicate risk: since sync frames have changed=false, materializeCells doesn't run after execution, so there's only one source of output updates (broadcasts).

The proper fix is to align the sync states so the daemon talks directly to the WASM through the pipe (skip do_initial_sync in pipe mode). Tracked as a follow-up.
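The queue-then-flush fix from #613 can be sketched with a synchronous writer. Several things here are assumptions made for illustration: the 4-byte big-endian length prefix, the `PipeQueue` type, and the helper names. The real loop is async and built around `select!`, which this sketch does not model.

```rust
use std::collections::VecDeque;
use std::io::{self, Write};

/// Encodes one frame as a length prefix followed by the payload.
/// Prefix width (4 bytes) and endianness (big) are assumptions.
pub fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    frame.extend_from_slice(payload);
    frame
}

/// Interprets the 4 bytes at `offset` as a length prefix, the way a
/// reader would if it believed a frame started there.
pub fn read_len_at(stream: &[u8], offset: usize) -> Option<u32> {
    let b = stream.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical outgoing queue: command handlers only enqueue, and the
/// loop flushes at one well-defined point before waiting again.
#[derive(Default)]
pub struct PipeQueue {
    pending: VecDeque<Vec<u8>>,
}

impl PipeQueue {
    pub fn enqueue(&mut self, payload: Vec<u8>) {
        self.pending.push_back(payload);
    }

    /// Writes every queued payload as one whole frame, in order.
    /// Returns the number of frames written.
    pub fn flush<W: Write>(&mut self, out: &mut W) -> io::Result<usize> {
        let mut written = 0;
        while let Some(payload) = self.pending.front() {
            let frame = encode_frame(payload);
            out.write_all(&frame)?;
            // Dequeue only after the whole frame has been written.
            self.pending.pop_front();
            written += 1;
        }
        out.flush()?;
        Ok(written)
    }
}
```

Reading a length at a frame boundary returns the payload size, while reading at a misaligned offset interprets payload bytes as a size and yields an arbitrary, possibly enormous value. That is the failure shape the commit message describes.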

rgbkrk added 3 commits March 7, 2026 22:19

rgbkrk marked this pull request as ready for review March 8, 2026 06:38

rgbkrk enabled auto-merge (squash) March 8, 2026 06:44

rgbkrk merged commit 1973188 into main Mar 8, 2026
14 checks passed

rgbkrk deleted the pure-relay branch March 8, 2026 06:46

This was referenced Mar 8, 2026

fix(sync): skip do_initial_sync in pipe mode #619

Merged

test: Deno integration tests for WASM sync protocol #620

Closed

feat(sync): sync-only bootstrap — eliminate GetDocBytes from frontend #621

Closed
