Skip to content

[bootstrap] restore bulk-insert fast path + TTL stale diagnostics (H/8)#578

Open
100yenadmin wants to merge 4 commits into
Martian-Engineering:mainfrom
electricsheephq:pr8/bootstrap-perf-and-diagnostics-ttl
Open

[bootstrap] restore bulk-insert fast path + TTL stale diagnostics (H/8)#578
100yenadmin wants to merge 4 commits into
Martian-Engineering:mainfrom
electricsheephq:pr8/bootstrap-perf-and-diagnostics-ttl

Conversation

@100yenadmin

Copy link
Copy Markdown
Collaborator

Summary

Two bootstrap-side followups to v0.9.3 (closes #568, refs #489 / #510 / #512 / #495):

D1 — bootstrap per-message ingest perf regression (#510)

PR #510 replaced bulk createMessagesBulk + appendContextMessages with N×ingestSingle inside a single withTransaction so the interceptor pipeline (large files, inline images, oversized raw payloads, native image blocks, large tool results) runs on first import. Correct for sessions whose transcripts contain interceptor-triggering content; slow for the common case where thousands of plain text messages each pay summary-store + conversation-store round-trips inside one mega-transaction.

Fix. A cheap pre-scan (bootstrapMessageWouldTriggerInterceptor) inspects each bootstrap message:

  • image blocks (any block with type: "image")
  • inline-image markers ([media attached:) or long bare-base64 runs
  • file blocks (<file ...>) — interceptor decides on size
  • approximate byte size ≥ largeFileTokenThreshold (upper-bound on token count)

If none of the bootstrap messages would trigger any interceptor, restore the v0.9.2 bulk-insert fast path. Otherwise, retain per-message ingest with a Promise.resolve() yield every K=100 messages so the worker doesn't monopolise the loop. False positives on the pre-scan are acceptable (we just take the slower path); false negatives would skip externalization, so the heuristic errs conservative.

F10 — recentBootstrapImportsByConversation diagnostics TTL (#512)

PR #512's recentBootstrapImportsByConversation Map records {importedMessages, reason, observedAt} per conversation, evicted only by an LRU cap at 100 conversations. So a stale "7 imports" tag could trail a long-tail conversation indefinitely — operator log lines kept showing import counts hours/days after the fact.

Fix. Add BOOTSTRAP_IMPORT_OBSERVATION_TTL_MS = 30 * 60_000 and gate the surfaced fields in formatOverflowDiagnosticsForLog. When the entry is stale, both recentBootstrapImportCount and recentBootstrapImportReason are emitted as null instead of the misleading old value.

Test plan

  • npm test — 833 → 837 passing (+4 new tests)
  • npm run build clean
  • D1: existing test renamed to "uses the bulk-insert fast path when no bootstrap message would trigger an interceptor" (asserts bulkSpy called, singleSpy not). New "falls through to per-message ingest" test confirms the slow path still runs when a <file ...> block is present.
  • D1 perf guard: 400 plain messages bootstrap completes in <15s on the test runner, asserts createMessagesBulk called exactly once. Bound is wildly generous (typical local <1s) — point is to catch regression to per-message ingest, which scales O(N) summary-store appends.
  • F10: stale entry (31m old) test asserts recentBootstrapImportCount and recentBootstrapImportReason both surface as null.
  • F10: fresh entry (1m old) test asserts both surface unchanged.

Copilot AI review requested due to automatic review settings May 3, 2026 14:23

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Restores a bootstrap bulk-insert fast path for transcripts that don’t require first-import interceptor processing, and adds a TTL so recentBootstrapImportsByConversation overflow diagnostics don’t report stale import counts long after the fact.

Changes:

  • Add a bootstrap pre-scan to choose between bulk insert vs per-message ingestSingle, plus a periodic yield during per-message ingest.
  • Gate recentBootstrapImport* overflow diagnostics behind a 30-minute observation TTL.
  • Expand/adjust engine tests to cover fast/slow bootstrap paths, TTL behavior, and a perf regression guard.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/engine.ts Adds bootstrap pre-scan + bulk insert fast path, per-message yield, and TTL-gated overflow diagnostics fields.
test/engine.test.ts Updates bootstrap ingest-path tests; adds TTL stale/fresh tests and a perf regression guard for bulk insert.
.changeset/pr8-bootstrap-perf-and-diagnostics-ttl.md Adds release notes describing the bootstrap perf + diagnostics TTL changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/engine.ts
Comment thread .changeset/pr8-bootstrap-perf-and-diagnostics-ttl.md Outdated
Comment thread src/engine.ts Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/engine.ts
Comment thread src/engine.ts Outdated
Comment thread .changeset/pr8-bootstrap-perf-and-diagnostics-ttl.md Outdated
Comment thread src/engine.ts

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/engine.ts
Comment thread src/engine.ts Outdated
Comment thread src/engine.ts Outdated
Comment thread .changeset/pr8-bootstrap-perf-and-diagnostics-ttl.md Outdated

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread .changeset/pr8-bootstrap-perf-and-diagnostics-ttl.md Outdated
Comment thread src/engine.ts
Comment thread src/engine.ts Outdated
Comment thread src/engine.ts Outdated
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 3, 2026
…geset

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: also force per-message
  ingest when (a) the message carries any structural block from
  STRUCTURAL_BLOCK_TYPES (tool_use / tool_result / function_call /
  thinking / reasoning / image / etc.) — bulk-insert only writes
  the `messages` row and leaves `message_parts` unpopulated, which
  would silently drop those blocks on replay; (b) role is `tool` /
  `toolResult` (separate inline-image-in-tool / tool-parts pipeline);
  (c) content has a `<file>` / `<file ` / `<file\n` open-tag (the
  prior `includes("<file ")` missed `<file>` and `<file\n…>`); and
  the byte/code-unit comment is corrected (string.length is UTF-16
  code units, not bytes — still a safe upper bound on tokens).

* per-message yield: `await Promise.resolve()` was a microtask and
  could still starve timer/IO callbacks on long bootstraps.  Replaced
  with `await new Promise(setImmediate)` which yields a macrotask
  boundary every K=100 messages so other event-loop work runs between
  batches.

* changeset: rewrote the bullet to match the actual implementation —
  per-message ingest still runs inside one `withTransaction`, with a
  setImmediate-yield every K=100 messages.  No "K=100 transactions"
  claim.

Tests: new `forces per-message ingest when bootstrap messages carry
tool_use / tool_result / reasoning blocks` regression in
`test/engine.test.ts` — verifies the bulk path is NOT taken for
structural-block content even with a high `largeFileTokenThreshold`,
and that `message_parts` rows get populated through the per-message
path.  838 tests pass.
@100yenadmin 100yenadmin requested a review from Copilot May 3, 2026 17:43

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread test/engine.test.ts
Comment thread src/engine.ts Outdated
@100yenadmin 100yenadmin changed the title [bootstrap] restore bulk-insert fast path + TTL stale diagnostics [bootstrap] restore bulk-insert fast path + TTL stale diagnostics (H/8) May 3, 2026
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 3, 2026
…ral-block test

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: replace `content.length`
  (UTF-16 code units) with `estimateTokens` for the size gate.
  `estimateTokens` weights CJK characters at ~1.5 tokens per code
  point, so long CJK text can have MORE tokens than code units —
  meaning the previous code-unit 'upper bound' was actually a LOWER
  bound for CJK content and could produce false negatives that let
  oversized CJK messages slip into the bulk-insert fast path.  Now
  the pre-scan matches the production token-count surface.

* `forces per-message ingest when bootstrap messages carry tool_use /
  reasoning blocks` regression test: dropped the `role: "tool"`
  message from the test fixture.  Tool/toolResult roles are forced
  to per-message ingest unconditionally by an earlier branch in the
  pre-scan, so a tool-role message would mask a regression in the
  STRUCTURAL_BLOCK_TYPES detection for assistant/user roles.  All
  three messages in the fixture are now `role: "assistant"`
  carrying tool_use, thinking, and function_call blocks — so the
  per-message path is forced ONLY by the structural-block detection.
  Renamed the test accordingly.

838 tests pass.
@100yenadmin 100yenadmin requested a review from Copilot May 4, 2026 09:58

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/engine.ts Outdated
Comment on lines +1123 to +1138
const STRUCTURAL_BLOCK_TYPES = new Set([
"tool_use",
"toolUse",
"toolCall",
"tool-use",
"tool_result",
"toolResult",
"tool-result",
"function_call",
"functionCall",
"function_call_output",
"thinking",
"redacted_thinking",
"reasoning",
"image",
]);
Comment thread src/engine.ts
Comment on lines +5208 to +5216
const threshold = this.config.largeFileTokenThreshold;
const requiresPerMessageIngest = bootstrapMessages.some((message) =>
bootstrapMessageWouldTriggerInterceptor(message, threshold),
);

let importedMessages = 0;
for (const message of bootstrapMessages) {
const result = await this.ingestSingle({
sessionId: params.sessionId,
sessionKey: params.sessionKey,
message,
if (!requiresPerMessageIngest) {
const nextSeq =
(await this.conversationStore.getMaxSeq(conversationId)) + 1;
@jalehman jalehman force-pushed the pr8/bootstrap-perf-and-diagnostics-ttl branch from 11ad1ba to 50e0274 Compare May 18, 2026 18:55
jalehman pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 18, 2026
…geset

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: also force per-message
  ingest when (a) the message carries any structural block from
  STRUCTURAL_BLOCK_TYPES (tool_use / tool_result / function_call /
  thinking / reasoning / image / etc.) — bulk-insert only writes
  the `messages` row and leaves `message_parts` unpopulated, which
  would silently drop those blocks on replay; (b) role is `tool` /
  `toolResult` (separate inline-image-in-tool / tool-parts pipeline);
  (c) content has a `<file>` / `<file ` / `<file\n` open-tag (the
  prior `includes("<file ")` missed `<file>` and `<file\n…>`); and
  the byte/code-unit comment is corrected (string.length is UTF-16
  code units, not bytes — still a safe upper bound on tokens).

* per-message yield: `await Promise.resolve()` was a microtask and
  could still starve timer/IO callbacks on long bootstraps.  Replaced
  with `await new Promise(setImmediate)` which yields a macrotask
  boundary every K=100 messages so other event-loop work runs between
  batches.

* changeset: rewrote the bullet to match the actual implementation —
  per-message ingest still runs inside one `withTransaction`, with a
  setImmediate-yield every K=100 messages.  No "K=100 transactions"
  claim.

Tests: new `forces per-message ingest when bootstrap messages carry
tool_use / tool_result / reasoning blocks` regression in
`test/engine.test.ts` — verifies the bulk path is NOT taken for
structural-block content even with a high `largeFileTokenThreshold`,
and that `message_parts` rows get populated through the per-message
path.  838 tests pass.
jalehman pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request May 18, 2026
…ral-block test

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: replace `content.length`
  (UTF-16 code units) with `estimateTokens` for the size gate.
  `estimateTokens` weights CJK characters at ~1.5 tokens per code
  point, so long CJK text can have MORE tokens than code units —
  meaning the previous code-unit 'upper bound' was actually a LOWER
  bound for CJK content and could produce false negatives that let
  oversized CJK messages slip into the bulk-insert fast path.  Now
  the pre-scan matches the production token-count surface.

* `forces per-message ingest when bootstrap messages carry tool_use /
  reasoning blocks` regression test: dropped the `role: "tool"`
  message from the test fixture.  Tool/toolResult roles are forced
  to per-message ingest unconditionally by an earlier branch in the
  pre-scan, so a tool-role message would mask a regression in the
  STRUCTURAL_BLOCK_TYPES detection for assistant/user roles.  All
  three messages in the fixture are now `role: "assistant"`
  carrying tool_use, thinking, and function_call blocks — so the
  per-message path is forced ONLY by the structural-block detection.
  Renamed the test accordingly.

838 tests pass.
@jalehman jalehman force-pushed the pr8/bootstrap-perf-and-diagnostics-ttl branch from 50e0274 to be7e008 Compare May 18, 2026 19:14
Eva and others added 4 commits May 18, 2026 12:15
…vations

Two bootstrap-side followups to v0.9.3:

* D1: PR Martian-Engineering#510 replaced bulk createMessagesBulk + appendContextMessages
  with N×ingestSingle inside a single withTransaction so the interceptor
  pipeline runs on first import.  Correct for sessions whose transcripts
  contain interceptor-triggering content (oversized files/images/tool-
  results); slow for the common case.  Pre-scan now restores the bulk-
  insert fast path when no message would trigger an interceptor;
  otherwise chunked transactions of K=100.

* F10: PR Martian-Engineering#512's recentBootstrapImportsByConversation Map had no time-
  based expiry — stale "7 imports" tags trailed conversations
  indefinitely (LRU only evicts at 100 conversations).  Add 30-minute
  TTL gated in formatOverflowDiagnosticsForLog so operator log lines
  reflect recent activity only.

Closes Martian-Engineering#568.  Refs Martian-Engineering#489, Martian-Engineering#510, Martian-Engineering#512, Martian-Engineering#495.
…geset

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: also force per-message
  ingest when (a) the message carries any structural block from
  STRUCTURAL_BLOCK_TYPES (tool_use / tool_result / function_call /
  thinking / reasoning / image / etc.) — bulk-insert only writes
  the `messages` row and leaves `message_parts` unpopulated, which
  would silently drop those blocks on replay; (b) role is `tool` /
  `toolResult` (separate inline-image-in-tool / tool-parts pipeline);
  (c) content has a `<file>` / `<file ` / `<file\n` open-tag (the
  prior `includes("<file ")` missed `<file>` and `<file\n…>`); and
  the byte/code-unit comment is corrected (string.length is UTF-16
  code units, not bytes — still a safe upper bound on tokens).

* per-message yield: `await Promise.resolve()` was a microtask and
  could still starve timer/IO callbacks on long bootstraps.  Replaced
  with `await new Promise(setImmediate)` which yields a macrotask
  boundary every K=100 messages so other event-loop work runs between
  batches.

* changeset: rewrote the bullet to match the actual implementation —
  per-message ingest still runs inside one `withTransaction`, with a
  setImmediate-yield every K=100 messages.  No "K=100 transactions"
  claim.

Tests: new `forces per-message ingest when bootstrap messages carry
tool_use / tool_result / reasoning blocks` regression in
`test/engine.test.ts` — verifies the bulk path is NOT taken for
structural-block content even with a high `largeFileTokenThreshold`,
and that `message_parts` rows get populated through the per-message
path.  838 tests pass.
…ral-block test

Address review on Martian-Engineering#578:

* `bootstrapMessageWouldTriggerInterceptor`: replace `content.length`
  (UTF-16 code units) with `estimateTokens` for the size gate.
  `estimateTokens` weights CJK characters at ~1.5 tokens per code
  point, so long CJK text can have MORE tokens than code units —
  meaning the previous code-unit 'upper bound' was actually a LOWER
  bound for CJK content and could produce false negatives that let
  oversized CJK messages slip into the bulk-insert fast path.  Now
  the pre-scan matches the production token-count surface.

* `forces per-message ingest when bootstrap messages carry tool_use /
  reasoning blocks` regression test: dropped the `role: "tool"`
  message from the test fixture.  Tool/toolResult roles are forced
  to per-message ingest unconditionally by an earlier branch in the
  pre-scan, so a tool-role message would mask a regression in the
  STRUCTURAL_BLOCK_TYPES detection for assistant/user roles.  All
  three messages in the fixture are now `role: "assistant"`
  carrying tool_use, thinking, and function_call blocks — so the
  per-message path is forced ONLY by the structural-block detection.
  Renamed the test accordingly.

838 tests pass.
@jalehman jalehman force-pushed the pr8/bootstrap-perf-and-diagnostics-ttl branch from be7e008 to 1b33a4d Compare May 18, 2026 19:28
@100yenadmin 100yenadmin added bug Something isn't working priority:P2 Important bug, compatibility, performance, or reliability issue labels May 31, 2026
@100yenadmin

Copy link
Copy Markdown
Collaborator Author

Maintainer review: needs rebase before deeper review.

Summary
@100yenadmin, this PR is still potentially useful, but the current head 1b33a4d09596acc59f68b688ba45a90f0c0ef0ce is CONFLICTING / DIRTY against main, so it cannot be reviewed for merge-readiness yet.

Next step
Please rebase or replace this branch against current main, then refresh CI. After it is clean, we can do the actual bug/priority review and decide whether it should stay in the P1/P2 lane.

Evidence checked
  • PR title: [bootstrap] restore bulk-insert fast path + TTL stale diagnostics (H/8)
  • Live state: open, mergeable=CONFLICTING, mergeStateStatus=DIRTY.
  • Current labels: bug, priority:P2
  • No existing maintainer-mode rebase marker was present before this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working priority:P2 Important bug, compatibility, performance, or reliability issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bootstrap] perf regression risk + diagnostics observability followups

3 participants