fix(engine): move bootstrap file I/O off the event loop #403

Merged
jalehman merged 1 commit into Martian-Engineering:main from jetd1:fix/bootstrap-sync-io on Apr 11, 2026

@jetd1 jetd1 commented Apr 11, 2026

Contributor

Summary

  • Convert readFileSegment, readLastJsonlEntryBeforeOffset, and the statSync calls in bootstrap() / refreshBootstrapState() to fs/promises, so a multi-MB session JSONL can no longer block the Node.js event loop for minutes during the append-only bootstrap path (freezing every gateway session, Control UI, CLI, and WebSocket connection).
  • In readLastJsonlEntryBeforeOffset, only read a new chunk when the current carry has no more newlines to peel. The original code read a new chunk on every iteration, even while lines from the current carry were still being processed, which wasted I/O and amplified the implicit O(n^2) cost of repeatedly prepending chunks to the carry.
  • Short-circuit the append-only fast path before the backward scan when latestDbHash !== bootstrapState.lastProcessedEntryHash. That is the common case during active sessions — the DB frontier advances past the checkpoint between bootstraps, and the matcher can never find a matching tail entry in that state, so we skip straight to the async full-read slow path.
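
For reviewers, the "only read when the carry has no more newlines" rule from the second bullet can be sketched as below. This is a minimal illustration, not the code in this PR: `lastLineBefore`, its parameters, and the 64 KiB default chunk size are hypothetical, and it assumes `offset` does not exceed the file size and that reads on a regular file are not short.

```typescript
import { open } from "node:fs/promises";

// Hypothetical helper (not the PR's readLastJsonlEntryBeforeOffset):
// returns the last non-empty line found in bytes [0, offset), or null.
export async function lastLineBefore(
  path: string,
  offset: number,
  chunkSize = 64 * 1024,
): Promise<string | null> {
  const handle = await open(path, "r");
  try {
    let position = offset; // bytes in [position, offset) are already in carry
    let carry: Buffer = Buffer.alloc(0);
    while (true) {
      // Peel lines from the carry first. 0x0a never occurs inside a
      // multi-byte UTF-8 sequence, so splitting on raw bytes is safe.
      const newline = carry.lastIndexOf(0x0a);
      if (newline !== -1) {
        const line = carry.subarray(newline + 1).toString("utf8").trim();
        carry = carry.subarray(0, newline);
        if (line.length > 0) return line;
        continue; // the carry may hold more newlines: no disk read yet
      }
      if (position === 0) {
        // Start of file reached: whatever is left is the first line.
        const line = carry.toString("utf8").trim();
        return line.length > 0 ? line : null;
      }
      // Only now, with no newline left in the carry, read one more chunk.
      const length = Math.min(chunkSize, position);
      position -= length;
      const chunk = Buffer.alloc(length);
      const { bytesRead } = await handle.read(chunk, 0, length, position);
      carry = Buffer.concat([chunk.subarray(0, bytesRead), carry]);
    }
  } finally {
    await handle.close();
  }
}
```

Each `await handle.read(...)` yields to the event loop, so timers and socket callbacks can run between chunks. The `Buffer.concat` prepend is still linear in the carry size, which is why reducing the number of reads also reduces the quadratic copying.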

A patch-level changeset is included per RELEASING.md.

Test plan

  • npm test — full vitest suite passes (695 tests, 40 files) on this branch
  • npx tsc --noEmit — no new type errors introduced in touched files (unrelated pre-existing errors elsewhere are unchanged)
  • bootstrap-message-only.test.ts updated to await the now-async readLastJsonlEntryBeforeOffset; all 12 cases still pass
  • Soak test on a real large session: bootstrap no longer freezes the gateway; a setTimeout callback fires within ~100 ms while bootstrap is reconciling
  • Maintainer review of the fast-path short-circuit to confirm no correctness regression when checkpoint hashes legitimately match
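
The soak-test observation about the setTimeout callback can be reproduced with a small lag probe. The sketch below is an assumption about how such a check might look, not the harness used for this PR; `maxTimerLagMs` is a hypothetical name.

```typescript
// Hypothetical probe: reports the worst delay (in ms) between when a
// repeating timer was due and when it actually ran while `work` executes.
export async function maxTimerLagMs(
  work: () => Promise<void>,
  intervalMs = 10,
): Promise<number> {
  let maxLag = 0;
  let expected = Date.now() + intervalMs;
  const timer = setInterval(() => {
    const now = Date.now();
    maxLag = Math.max(maxLag, now - expected);
    expected = now + intervalMs;
  }, intervalMs);
  try {
    await work();
  } finally {
    clearInterval(timer);
    // If `work` blocked the loop for its whole duration, the timer never
    // fired, so take one last measurement against the pending deadline.
    maxLag = Math.max(maxLag, Date.now() - expected);
  }
  return maxLag;
}
```

Wrapping the bootstrap call in `maxTimerLagMs` should give a small value when the I/O is asynchronous, and a value close to the full bootstrap duration when synchronous reads hold the loop.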

Co-Authored-By: WoCha <wocha@jetd.one> via Claude Code

readFileSegment, readLastJsonlEntryBeforeOffset, and the statSync calls
in bootstrap() / refreshBootstrapState() all used synchronous Node.js fs
APIs. On multi-MB session JSONL files the backward scan in
readLastJsonlEntryBeforeOffset could block the event loop for minutes,
freezing every gateway session, the Control UI, CLI, and WebSocket
connections.

Convert those functions to fs/promises (open / FileHandle.read /
FileHandle.stat / stat). readAppendedLeafPathMessages becomes async
transitively. The backward scan now also only reads a new chunk when the
current carry has no more newlines to peel, instead of re-reading on
every iteration (which both wasted I/O and amplified the implicit O(n^2)
prepended-carry pattern).
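
As an illustration of the open / FileHandle.stat / FileHandle.read combination named above, here is a minimal sketch of reading the bytes appended after a known offset. `readAppendedText` and its return shape are hypothetical and do not mirror the signatures of readFileSegment or readAppendedLeafPathMessages.

```typescript
import { open } from "node:fs/promises";

// Hypothetical helper: reads everything from `fromOffset` to the current
// end of file without using any synchronous fs call.
export async function readAppendedText(
  path: string,
  fromOffset: number,
): Promise<{ text: string; size: number }> {
  const handle = await open(path, "r");
  try {
    // FileHandle.stat replaces statSync for the size check.
    const { size } = await handle.stat();
    const length = Math.max(0, size - fromOffset);
    if (length === 0) return { text: "", size };
    const buffer = Buffer.alloc(length);
    // Positional read: (buffer, offset in buffer, length, file position).
    const { bytesRead } = await handle.read(buffer, 0, length, fromOffset);
    return { text: buffer.subarray(0, bytesRead).toString("utf8"), size };
  } finally {
    await handle.close();
  }
}
```

Because the function returns a promise, every caller has to await it, which is the same transitive effect described for readAppendedLeafPathMessages.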

The bootstrap append-only fast path additionally short-circuits before
the expensive backward scan when latestDbHash !== lastProcessedEntryHash.
That is the common case during active sessions (the DB frontier advances
past the checkpoint between bootstraps), and the matcher can never find a
matching tail entry in that state, so we skip straight to the async
full-read slow path.
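
The guard described above can be sketched as follows. The types, the `chooseBootstrapPath` name, and the injected `findTailEntryHash` callback are assumptions made for illustration; the real matcher in bootstrap() may compare more than a single hash.

```typescript
// Hypothetical shapes, named after the fields mentioned in this PR.
export interface BootstrapCheckpoint {
  lastProcessedEntryHash: string | null;
}
export type BootstrapPath = "append-only" | "full-read";

export async function chooseBootstrapPath(
  latestDbHash: string | null,
  checkpoint: BootstrapCheckpoint,
  findTailEntryHash: () => Promise<string | null>, // the expensive backward scan
): Promise<BootstrapPath> {
  // Cheap check first: if the DB frontier has moved past the checkpoint,
  // no tail entry can match, so the backward scan is skipped entirely.
  if (latestDbHash !== checkpoint.lastProcessedEntryHash) {
    return "full-read";
  }
  // Hashes agree: only now pay for the scan and confirm the tail entry.
  const tailHash = await findTailEntryHash();
  return tailHash !== null && tailHash === latestDbHash
    ? "append-only"
    : "full-read";
}
```

The case that still needs maintainer review is the second branch, where the checkpoint hashes legitimately match and the fast path must behave exactly as before.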

Tests in bootstrap-message-only.test.ts are updated to await the
now-async function; full vitest suite (695 tests, 40 files) stays green.

Co-Authored-By: WoCha <wocha@jetd.one> via Claude Code
@jalehman

Contributor

Thank you!

@jalehman jalehman merged commit ea7d532 into Martian-Engineering:main Apr 11, 2026
1 check passed
@jetd1 jetd1 deleted the fix/bootstrap-sync-io branch April 12, 2026 06:57
2 participants