refactor(runtimed): reduce relay to pure byte pipe — two-peer Automerge sync (#600) #608
Merged
…n requests
New NotebookRequest/NotebookResponse variants for reading and writing the daemon's
canonical Automerge doc without going through the relay's local replica:
- GetDocBytes → DocBytes { bytes } — full doc for WASM bootstrap
- GetRawMetadata { key } → RawMetadata { value } — metadata JSON read
- SetRawMetadata { key, value } → MetadataSet — metadata JSON write + sync
SetRawMetadata notifies peers via changed_tx and persists via persist_tx.
These enable the Tauri relay to stop maintaining its own AutoCommit doc
for metadata and bootstrap operations (#600).
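A minimal sketch of how a daemon-side handler might serve these three requests. It assumes simplified enum shapes and uses a plain BTreeMap inside a hypothetical CanonicalDoc in place of the real Automerge document; std mpsc channels stand in for changed_tx and persist_tx. It only illustrates the described behavior: reads answer from the canonical doc, and a metadata write both notifies peers and triggers persistence.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::Sender;

/// Hypothetical shapes for the three request variants named above.
#[derive(Debug, Clone, PartialEq)]
pub enum NotebookRequest {
    GetDocBytes,
    GetRawMetadata { key: String },
    SetRawMetadata { key: String, value: String },
}

/// Hypothetical response shapes; `value` carries the raw metadata JSON string.
#[derive(Debug, Clone, PartialEq)]
pub enum NotebookResponse {
    DocBytes { bytes: Vec<u8> },
    RawMetadata { value: Option<String> },
    MetadataSet,
}

/// Stand-in for the daemon's canonical Automerge doc (not an Automerge type).
#[derive(Default)]
pub struct CanonicalDoc {
    metadata: BTreeMap<String, String>,
}

impl CanonicalDoc {
    /// Placeholder for Automerge's save(): one `key=value` line per metadata entry.
    pub fn save(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (k, v) in &self.metadata {
            out.extend_from_slice(format!("{k}={v}\n").as_bytes());
        }
        out
    }
}

/// Answers a request against the canonical doc. Writes wake the peer sync
/// tasks (changed_tx) and hand a snapshot to the persistence task (persist_tx).
pub fn handle_request(
    doc: &mut CanonicalDoc,
    req: NotebookRequest,
    changed_tx: &Sender<()>,
    persist_tx: &Sender<Vec<u8>>,
) -> NotebookResponse {
    match req {
        NotebookRequest::GetDocBytes => NotebookResponse::DocBytes { bytes: doc.save() },
        NotebookRequest::GetRawMetadata { key } => NotebookResponse::RawMetadata {
            value: doc.metadata.get(&key).cloned(),
        },
        NotebookRequest::SetRawMetadata { key, value } => {
            doc.metadata.insert(key, value);
            // A closed receiver should not fail the write itself.
            let _ = changed_tx.send(());
            let _ = persist_tx.send(doc.save());
            NotebookResponse::MetadataSet
        }
    }
}
```

Only SetRawMetadata touches the notification and persistence channels; the two reads are side-effect free, so any number of Tauri callers can query the daemon without triggering sync traffic.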
All metadata operations (get_metadata_snapshot, set_metadata_snapshot, get_raw_metadata_additional, set_raw_trust_in_metadata) now use send_request(GetRawMetadata/SetRawMetadata) instead of the relay's local handle.get_metadata/set_metadata. get_automerge_doc_bytes now uses send_request(GetDocBytes) to fetch the daemon's canonical doc instead of the relay's local replica.

The relay's local AutoCommit is no longer read or written for metadata or doc bootstrap from the Tauri app. Remaining relay doc usage:
- receive_frontend_sync_message (merge + forward)
- SyncUpdate metadata diffing (changes_tx receiver)

These are addressed in subsequent commits.
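The caller side of the same change, sketched under assumptions: Handle, spawn_fake_daemon, the per-request reply channel, and the "metadata" key are all hypothetical, and a thread stands in for the daemon. The point is that get_metadata_snapshot and set_metadata_snapshot become thin wrappers over send_request, so there is no relay-local copy that can drift from the daemon's doc.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Hypothetical request/response subset for metadata access.
#[derive(Debug)]
pub enum Request {
    GetRawMetadata { key: String },
    SetRawMetadata { key: String, value: String },
}

#[derive(Debug, PartialEq)]
pub enum Response {
    RawMetadata { value: Option<String> },
    MetadataSet,
}

/// Hypothetical Tauri-side handle: every request carries its own reply
/// channel, so callers wait for the daemon instead of reading a local replica.
pub struct Handle {
    tx: Sender<(Request, Sender<Response>)>,
}

impl Handle {
    pub fn send_request(&self, req: Request) -> Result<Response, String> {
        let (reply_tx, reply_rx) = channel();
        self.tx
            .send((req, reply_tx))
            .map_err(|_| "sync task has shut down".to_string())?;
        reply_rx.recv().map_err(|_| "request dropped without a reply".to_string())
    }

    /// Reads metadata from the daemon (the key name "metadata" is an assumption).
    pub fn get_metadata_snapshot(&self) -> Result<Option<String>, String> {
        match self.send_request(Request::GetRawMetadata { key: "metadata".into() })? {
            Response::RawMetadata { value } => Ok(value),
            other => Err(format!("unexpected response: {other:?}")),
        }
    }

    /// Writes metadata through the daemon, which then syncs it to peers.
    pub fn set_metadata_snapshot(&self, json: &str) -> Result<(), String> {
        let req = Request::SetRawMetadata { key: "metadata".into(), value: json.into() };
        match self.send_request(req)? {
            Response::MetadataSet => Ok(()),
            other => Err(format!("unexpected response: {other:?}")),
        }
    }
}

/// Toy daemon thread that owns the only copy of the metadata.
pub fn spawn_fake_daemon() -> Handle {
    let (tx, rx) = channel::<(Request, Sender<Response>)>();
    thread::spawn(move || {
        let mut store: HashMap<String, String> = HashMap::new();
        for (req, reply) in rx {
            let resp = match req {
                Request::GetRawMetadata { key } => {
                    Response::RawMetadata { value: store.get(&key).cloned() }
                }
                Request::SetRawMetadata { key, value } => {
                    store.insert(key, value);
                    Response::MetadataSet
                }
            };
            let _ = reply.send(resp);
        }
    });
    Handle { tx }
}
```

A write followed by a read always observes the daemon's value, because both go through the same ordered request queue.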
When raw_sync_tx is present (Tauri app), the relay now acts as a transparent byte pipe between the frontend WASM and the daemon:
- ReceiveFrontendSyncMessage: forwards raw sync bytes to the daemon socket instead of decode → merge → re-encode → sync_to_daemon. The daemon processes the sync message and sends back a response frame, which arrives in the socket read branch.
- Daemon→frontend AutomergeSync frames: forwarded raw via raw_sync_tx instead of merge → generate_sync_message(fe_state). No local doc merge, no metadata diffing, no SyncUpdate.
- GetDocBytes: skips the virtual sync handshake in pipe mode. The frontend gets doc bytes from the daemon via send_request(GetDocBytes), not from this command. The handshake is preserved for runtimed-py.

This changes the sync topology from three Automerge peers (frontend ↔ relay ↔ daemon) to two (frontend ↔ daemon), with the relay as a transparent forwarder. It eliminates these audit findings for the Tauri path:
- sync_to_daemon() broadcast dropping (the relay doesn't call it)
- Triple-merge divergence risk (two peers now)
- Virtual sync handshake 10-iteration limit
- try_send drops on changes_tx (no SyncUpdate in pipe mode)
- Biased select starvation (no merge delays)
- receive_and_relay_sync_message wrong peer state

runtimed-py retains the full peer mode unchanged.
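A rough model of the mode split, assuming raw_sync_tx behaves like an ordinary channel of byte vectors. RelayMode and the two handler functions are hypothetical names, and the full-peer merge work is reduced to a placeholder that just records payloads. In pipe mode neither direction touches an Automerge doc, which is what leaves exactly two sync peers.

```rust
use std::sync::mpsc::Sender;

/// Hypothetical model of the relay's two modes; the real run_sync_task differs.
pub enum RelayMode {
    /// Tauri: raw_sync_tx is present, so sync bytes pass through untouched.
    Pipe { raw_sync_tx: Sender<Vec<u8>> },
    /// runtimed-py: the relay is a full Automerge peer. The decode/merge/
    /// re-encode work is elided; payloads are just recorded here.
    FullPeer { merged: Vec<Vec<u8>> },
}

/// Frontend -> daemon. In pipe mode the bytes are queued verbatim for the
/// daemon socket; the daemon answers with its own AutomergeSync frame.
pub fn on_frontend_sync_message(mode: &mut RelayMode, bytes: Vec<u8>, to_daemon: &mut Vec<Vec<u8>>) {
    match mode {
        RelayMode::Pipe { .. } => to_daemon.push(bytes),
        // Placeholder for decode -> merge -> re-encode -> sync_to_daemon().
        RelayMode::FullPeer { merged } => merged.push(bytes),
    }
}

/// Daemon -> frontend. Returns true if the payload was forwarded verbatim.
pub fn on_daemon_sync_frame(mode: &mut RelayMode, payload: Vec<u8>) -> bool {
    match mode {
        // No local merge, no metadata diffing, no SyncUpdate.
        RelayMode::Pipe { raw_sync_tx } => raw_sync_tx.send(payload).is_ok(),
        RelayMode::FullPeer { merged } => {
            // Placeholder for merge -> generate_sync_message(fe_state).
            merged.push(payload);
            false
        }
    }
}
```

Because the pipe branch never decodes the bytes, the relay holds no sync state of its own, so there is no relay peer state that could be the wrong one.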
This was referenced Mar 8, 2026
rgbkrk added a commit that referenced this pull request on Mar 8, 2026
In pipe mode (#608), the Automerge sync path doesn't deliver output changes: the daemon's sync state tracks the relay peer, not the WASM peer, so all sync frames arrive with changed=false. materializeCells never runs after execution, and outputs never render.

Re-enable the broadcast-driven output path (appendOutput via onOutput callback). The broadcast pipeline works correctly: outputs arrive, blob manifests resolve, and the external store updates.

No duplicate risk: since sync frames have changed=false, materializeCells doesn't run after execution, so broadcasts are the only source of output updates.

The proper fix is to align the sync states so the daemon talks directly to the WASM through the pipe (skip do_initial_sync in pipe mode). Tracked as a follow-up.
rgbkrk added a commit that referenced this pull request on Mar 8, 2026
…ing frames (#613) (#616)

* fix(runtimed): prevent stream corruption in pipe mode (#613)

In pipe mode, ReceiveFrontendSyncMessage was writing sync frames directly to client.stream inside the select! command handler. If the daemon was sending data at the same time, the select! would drop the pending socket read future, then the command handler would write to the stream, corrupting the framing. The daemon would then read payload bytes as a length prefix, producing bogus frame sizes (observed: 1.15 GB).

Fix: buffer outgoing pipe frames in a VecDeque and flush them at the top of the loop BEFORE entering select!. This ensures writes only happen when no read is pending on the socket. The queue is drained synchronously before the next select! iteration.

Full peer mode (runtimed-py) is unaffected: its writes go through sync_to_daemon(), which owns the read/write sequence.

* fix(notebook): re-enable broadcast-driven output rendering for pipe mode

In pipe mode (#608), the Automerge sync path doesn't deliver output changes: the daemon's sync state tracks the relay peer, not the WASM peer, so all sync frames arrive with changed=false. materializeCells never runs after execution, and outputs never render.

Re-enable the broadcast-driven output path (appendOutput via onOutput callback). The broadcast pipeline works correctly: outputs arrive, blob manifests resolve, and the external store updates.

No duplicate risk: since sync frames have changed=false, materializeCells doesn't run after execution, so broadcasts are the only source of output updates.

The proper fix is to align the sync states so the daemon talks directly to the WASM through the pipe (skip do_initial_sync in pipe mode). Tracked as a follow-up.
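A synchronous sketch of the ordering this fix relies on. It assumes a 4-byte big-endian length prefix (the actual frame layout is not stated here), and Outbox, write_frame, read_frame, and flush_to are hypothetical names. The real loop is async and uses select!; this model only shows that handlers enqueue, and a single flush writes whole frames before the next read begins.

```rust
use std::collections::VecDeque;
use std::io::{self, Read, Write};

/// Writes one frame: a 4-byte big-endian length, then the payload.
/// (The exact length-prefix layout is an assumption for this sketch.)
pub fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    w.write_all(&(payload.len() as u32).to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame. If the reader were positioned mid-payload, these 4 bytes
/// would be payload data misread as a length: the corruption described above.
pub fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len = [0u8; 4];
    r.read_exact(&mut len)?;
    let mut buf = vec![0u8; u32::from_be_bytes(len) as usize];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Hypothetical outbox for pipe-mode frames. Command handlers only enqueue;
/// the socket is written from one place, outside select!.
#[derive(Default)]
pub struct Outbox {
    pending: VecDeque<Vec<u8>>,
}

impl Outbox {
    /// Called from the ReceiveFrontendSyncMessage handler instead of writing.
    pub fn enqueue(&mut self, payload: Vec<u8>) {
        self.pending.push_back(payload);
    }

    /// Called at the top of the loop, before select!, so each frame is written
    /// whole while no socket read is in flight. Returns the number flushed.
    pub fn flush_to<W: Write>(&mut self, w: &mut W) -> io::Result<usize> {
        let mut n = 0;
        while let Some(frame) = self.pending.pop_front() {
            write_frame(w, &frame)?;
            n += 1;
        }
        w.flush()?;
        Ok(n)
    }
}
```

The design choice is a single writer: because nothing inside the select! branches writes to the socket, a cancelled read can never interleave with a half-written frame.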
Closes #600.
The Tauri relay (NotebookSyncClient) was a full Automerge peer: it maintained its own AutoCommit doc, merged sync messages from both the frontend and the daemon, and generated new sync messages for each peer. This created a three-peer topology (frontend ↔ relay ↔ daemon) that was the root cause of 6+ correctness findings from the protocol audit.

What changed
Step 1: Daemon requests for doc bytes and metadata
New NotebookRequest variants (GetDocBytes, GetRawMetadata { key }, SetRawMetadata { key, value }) let the Tauri app read and write the daemon's canonical Automerge doc directly via the request/response protocol, without going through the relay's local replica.

Step 2: Reroute Tauri callers
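get_automerge_doc_bytes, get_metadata_snapshot, set_metadata_snapshot, get_raw_metadata_additional, and set_raw_trust_in_metadata all now use send_request() instead of the relay-local get_metadata()/set_metadata()/get_doc_bytes().

Step 3: Byte pipe in run_sync_task

When raw_sync_tx is present (Tauri mode), the relay is now a transparent byte forwarder:
- Frontend → daemon: raw sync bytes are forwarded to the daemon as an AutomergeSync frame, instead of decode → merge → re-encode → sync_to_daemon().
- Daemon → frontend: the AutomergeSync frame payload is forwarded raw, instead of merge → generate_sync_message(fe_state).
- GetDocBytes: no relay-local AutoCommit + virtual sync handshake; doc bytes come from the daemon.
- SyncUpdate: not produced in pipe mode; the frontend reads state from its own WASM doc (useSyncExternalStore).

runtimed-py retains the full peer mode (no raw_sync_tx), with unchanged behavior.

Audit findings resolved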
For the Tauri path, these protocol correctness findings are eliminated:
- sync_to_daemon() drops broadcast frames during ack wait: the relay no longer calls sync_to_daemon()
- try_send drops sync updates silently: no SyncUpdate/changes_tx in pipe mode
- receive_and_relay_sync_message uses wrong peer state: no relay peer state in pipe mode

Handle methods still used from Tauri
- send_request()
- receive_frontend_sync_message()
- notebook_id()

No relay-local doc methods (get_metadata, set_metadata, get_doc_bytes, get_cells, etc.) are called from lib.rs.

Test plan
- cargo test -p runtimed --lib: 234 passed
- cargo test -p runtimed --test '*': 15 integration tests passed
- cargo test -p notebook --lib: 137 passed