fix: cap first bootstrap parent import #255

Merged: jalehman merged 1 commit into main from fix/bootstrap-context-budget on Apr 3, 2026

@jalehman jalehman commented Apr 3, 2026

Contributor

What

This PR caps the amount of raw parent history imported during first-time LCM bootstrap. When a new conversation is seeded from a large parent session, lossless-claw now keeps only the newest messages that fit within a bootstrap token budget instead of importing the full raw parent transcript.
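
The trimming described above can be illustrated with a minimal sketch. It is not the lossless-claw implementation: the `ParentMessage` shape, the `trimToNewestWithinBudget` name, and the precomputed `tokens` field are all assumptions made for illustration. The sketch walks the parent transcript from newest to oldest and stops at the first message that no longer fits.

```typescript
// Hypothetical message shape; the real lossless-claw types are not shown in this PR.
interface ParentMessage {
  id: string;
  tokens: number; // assumed to be a precomputed token estimate
}

// Walk the transcript from newest to oldest and keep messages while they fit
// within the budget. The result is returned in chronological order.
function trimToNewestWithinBudget(
  messages: ParentMessage[],
  budget: number,
): ParentMessage[] {
  const kept: ParentMessage[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    if (used + message.tokens > budget) break; // first message that does not fit
    kept.push(message);
    used += message.tokens;
  }
  return kept.reverse();
}

const parent: ParentMessage[] = [
  { id: "m1", tokens: 4000 },
  { id: "m2", tokens: 3000 },
  { id: "m3", tokens: 2000 },
  { id: "m4", tokens: 1000 },
];

// With a 6000-token budget the three newest messages fit exactly; m1 is left out.
const seeded = trimToNewestWithinBudget(parent, 6000);
console.log(seeded.map((m) => m.id).join(",")); // prints "m2,m3,m4"
```

Stopping at the first message that does not fit keeps the imported slice contiguous, so the child conversation never sees a gap in the middle of the parent history.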

Why

Issue #249 was caused by first-time bootstrap bulk-importing all recoverable parent messages into a brand-new conversation. On large unsummarized parent sessions, that could immediately bloat the child conversation and overflow the model's context window before compaction had a chance to help.

Changes

  • Add bootstrapMaxTokens config support
  • Derive sensible default from leafChunkTokens
  • Trim bootstrap imports to newest budgeted messages
  • Expose schema in openclaw.plugin.json
  • Add config and engine regression coverage
  • Add patch changeset for release notes

Testing

  • pnpm vitest run --exclude '.worktrees/**' test/config.test.ts test/engine.test.ts
  • Expected: 2 passed, including the bootstrap token-cap regression in test/engine.test.ts

Add a bootstrap token budget for first-time conversation imports so a newly
forked session only seeds LCM with the newest slice of raw parent history.
Expose the budget in runtime config and plugin schema, derive a sensible
default from leaf chunk size, and cover the behavior with config and engine
regression tests. Add a patch changeset because this changes user-visible
runtime behavior for forked session bootstraps.

Regeneration-Prompt: |
  User wanted the lossless-claw side of issue #249 cleaned up and made
  shippable after earlier investigation. The existing local patch already
  added a bootstrapMaxTokens setting and a regression for first-time bootstrap
  importing too much parent history, but it was still only a working-tree
  change and the branch had drifted behind origin/main.

  Rebase the fix onto current origin/main, keeping newer plugin schema updates.
  Preserve the intended behavior: when a brand-new conversation bootstraps from
  a large parent transcript, do not bulk-import the full raw history into LCM.
  Instead keep only the newest messages that fit within a bootstrap token
  budget, with a default derived from leafChunkTokens and an explicit env/plugin
  override. Update plugin schema/config tests and add a changeset because this
  is a user-visible bugfix in fork/bootstrap behavior.
jalehman merged commit a1bda9b into main on Apr 3, 2026
2 checks passed
github-actions (bot) mentioned this pull request on Apr 3, 2026
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 3, 2026
Two bugs in the bootstrap budget cap introduced in Martian-Engineering#255:

1. A single oversized tail message bypasses the budget entirely.
   The trim loop condition 'if (kept.length > 0 && ...)' means the
   first message (newest) is always kept regardless of size. A 50K-token
   tool result as the last message will bypass a 6K budget. Fix: after
   the loop, check if the single kept message exceeds budget and return
   empty instead of silently bypassing.

2. NaN propagates through all numeric env config parsing.
   parseInt('oops', 10) returns NaN, which is not nullish, so
   ?? fallback never fires. Invalid env like LCM_LEAF_CHUNK_TOKENS=oops
   propagates NaN through leafChunkTokens, bootstrapMaxTokens, and every
   derived config value — effectively disabling all token budgets.

   Fix: add parseFiniteInt/parseFiniteNumber helpers that return undefined
   for non-finite results. Replace all 16 raw parseInt/parseFloat calls
   in resolveLcmConfig() with the safe helpers.
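
   A minimal sketch of the NaN problem and a finite-only parser follows. The
   commit names parseFiniteInt but does not show its signature, so the version
   here is an assumption, and DEFAULT_LEAF_CHUNK_TOKENS is a placeholder value
   rather than the real default.

```typescript
// Hypothetical helper modelled on the parseFiniteInt named in the commit message.
// Returns undefined for missing or non-numeric input so that `??` can fall back.
function parseFiniteInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) ? parsed : undefined;
}

// Placeholder default for illustration only; not the value used by lossless-claw.
const DEFAULT_LEAF_CHUNK_TOKENS = 20000;

// Stand-in for process.env so the sketch has no Node-specific dependencies.
const env: Record<string, string | undefined> = {
  LCM_LEAF_CHUNK_TOKENS: "oops",
};

// Raw parsing: NaN is not nullish, so the `??` fallback never fires.
const rawParsed: number | undefined = Number.parseInt(env["LCM_LEAF_CHUNK_TOKENS"] ?? "", 10);
const unsafeValue = rawParsed ?? DEFAULT_LEAF_CHUNK_TOKENS;

// Finite-only parsing: invalid input becomes undefined and the default applies.
const safeValue = parseFiniteInt(env["LCM_LEAF_CHUNK_TOKENS"]) ?? DEFAULT_LEAF_CHUNK_TOKENS;

console.log(Number.isNaN(unsafeValue)); // prints true
console.log(safeValue); // prints 20000
```

   Any comparison against NaN is false, which is why a NaN budget behaves
   like no budget at all.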

Both bugs were found and reproduced with minimal scripts during
adversarial review of a production incident.
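
The oversized-tail case from item 1 can be reproduced with a small sketch. This is not the actual engine code: the `Msg` shape and `trimBootstrap` name are hypothetical, and the half of the loop condition that the commit message elides is assumed here to be a running-total check.

```typescript
// Hypothetical message shape for illustration.
interface Msg {
  id: string;
  tokens: number;
}

function trimBootstrap(messages: Msg[], budget: number): Msg[] {
  const kept: Msg[] = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const message = messages[i];
    // The budget check only applies once something has been kept, so the
    // newest message is always accepted here, whatever its size.
    if (kept.length > 0 && used + message.tokens > budget) break;
    kept.push(message);
    used += message.tokens;
  }
  // Hardening described in the commit: if the only kept message is itself
  // over budget, return nothing instead of silently bypassing the cap.
  if (kept.length === 1 && used > budget) return [];
  return kept.reverse();
}

// A 50K-token tool result as the newest message against a 6K budget.
const oversizedTail: Msg[] = [
  { id: "user-1", tokens: 100 },
  { id: "tool-result", tokens: 50000 },
];
console.log(trimBootstrap(oversizedTail, 6000).length); // prints 0

// A normal transcript still keeps the newest messages that fit.
const normal: Msg[] = [
  { id: "a", tokens: 5000 },
  { id: "b", tokens: 2000 },
];
console.log(trimBootstrap(normal, 6000).map((m) => m.id).join(",")); // prints "b"
```

Without the post-loop check, the first call would return the 50K-token message on its own, which is the bypass the commit describes.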
jalehman added a commit that referenced this pull request Apr 3, 2026
#258)

* fix: harden bootstrap budget against oversized messages and NaN config

* test: cover bootstrap and env fallback regressions

Add focused regression tests for the oversized singleton bootstrap tail case and invalid numeric env parsing fallback behavior. Add a patch changeset because this PR changes runtime behavior and should be reflected in release notes.

Regeneration-Prompt: |
  The open PR fixed two production regressions but still lacked the release and test follow-through needed to merge. Add targeted regression coverage instead of broad refactors: one config test that proves invalid numeric env values like LCM_LEAF_CHUNK_TOKENS=oops fall back through plugin/default resolution, and one bootstrap test that proves a single oversized tail message is dropped instead of bypassing bootstrapMaxTokens. Also add a patch changeset because the PR changes runtime behavior visible to users and maintainers expect release notes coverage for that.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>