fix: cap first bootstrap parent import #255
Merged
Add a bootstrap token budget for first-time conversation imports so a newly forked session only seeds LCM with the newest slice of raw parent history.

Expose the budget in runtime config and plugin schema, derive a sensible default from leaf chunk size, and cover the behavior with config and engine regression tests.

Add a patch changeset because this changes user-visible runtime behavior for forked session bootstraps.

Regeneration-Prompt: |
  User wanted the lossless-claw side of issue #249 cleaned up and made shippable after earlier investigation. The existing local patch already added a bootstrapMaxTokens setting and a regression for first-time bootstrap importing too much parent history, but it was still only a working-tree change and the branch had drifted behind origin/main.

  Rebase the fix onto current origin/main, keeping newer plugin schema updates. Preserve the intended behavior: when a brand-new conversation bootstraps from a large parent transcript, do not bulk-import the full raw history into LCM. Instead keep only the newest messages that fit within a bootstrap token budget, with a default derived from leafChunkTokens and an explicit env/plugin override.

  Update plugin schema/config tests and add a changeset because this is a user-visible bugfix in fork/bootstrap behavior.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request on Apr 3, 2026
Two bugs in the bootstrap budget cap introduced in Martian-Engineering#255:

1. A single oversized tail message bypasses the budget entirely. The trim loop condition 'if (kept.length > 0 && ...)' means the first message (newest) is always kept regardless of size. A 50K-token tool result as the last message will bypass a 6K budget.
   Fix: after the loop, check if the single kept message exceeds budget and return empty instead of silently bypassing.

2. NaN propagates through all numeric env config parsing. parseInt('oops', 10) returns NaN, which is not nullish, so the ?? fallback never fires. An invalid env value like LCM_LEAF_CHUNK_TOKENS=oops propagates NaN through leafChunkTokens, bootstrapMaxTokens, and every derived config value, effectively disabling all token budgets.
   Fix: add parseFiniteInt/parseFiniteNumber helpers that return undefined for non-finite results. Replace all 16 raw parseInt/parseFloat calls in resolveLcmConfig() with the safe helpers.

Both bugs were found and reproduced with minimal scripts during adversarial review of a production incident.
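The NaN fallthrough in item 2 can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper names come from the commit message, but their signatures, the `resolveLeafChunkTokens` function, and the default value are assumptions made for the example.

```typescript
// Hypothetical signatures; the commit only names the helpers.
function parseFiniteInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const parsed = Number.parseInt(raw, 10);
  // NaN is not nullish, so it must be mapped to undefined for ?? to fall through.
  return Number.isFinite(parsed) ? parsed : undefined;
}

function parseFiniteNumber(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const parsed = Number.parseFloat(raw);
  // Rejects NaN and +/-Infinity alike.
  return Number.isFinite(parsed) ? parsed : undefined;
}

// Illustrative value only; the real default is not stated in this PR.
const ASSUMED_DEFAULT_LEAF_CHUNK_TOKENS = 20_000;

// The unsafe pattern described above: NaN survives the ?? chain.
function unsafeLeafChunkTokens(
  env: Record<string, string | undefined>,
  pluginValue?: number,
): number {
  const raw = env["LCM_LEAF_CHUNK_TOKENS"];
  const parsed = raw !== undefined ? Number.parseInt(raw, 10) : undefined;
  return parsed ?? pluginValue ?? ASSUMED_DEFAULT_LEAF_CHUNK_TOKENS;
}

// Hypothetical resolver showing env -> plugin -> default precedence.
function resolveLeafChunkTokens(
  env: Record<string, string | undefined>,
  pluginValue?: number,
): number {
  return (
    parseFiniteInt(env["LCM_LEAF_CHUNK_TOKENS"]) ??
    pluginValue ??
    ASSUMED_DEFAULT_LEAF_CHUNK_TOKENS
  );
}
```

With `LCM_LEAF_CHUNK_TOKENS=oops`, `unsafeLeafChunkTokens` returns NaN, while `resolveLeafChunkTokens` falls back to the plugin value or the default. Note that `parseInt` is lenient about trailing characters (`'12abc'` parses as 12), so this sketch guards only against non-finite results, which is all the commit message claims.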
jalehman added a commit that referenced this pull request on Apr 3, 2026
#258)

* fix: harden bootstrap budget against oversized messages and NaN config

Two bugs in the bootstrap budget cap introduced in #255:

1. A single oversized tail message bypasses the budget entirely. The trim loop condition 'if (kept.length > 0 && ...)' means the first message (newest) is always kept regardless of size. A 50K-token tool result as the last message will bypass a 6K budget.
   Fix: after the loop, check if the single kept message exceeds budget and return empty instead of silently bypassing.

2. NaN propagates through all numeric env config parsing. parseInt('oops', 10) returns NaN, which is not nullish, so the ?? fallback never fires. An invalid env value like LCM_LEAF_CHUNK_TOKENS=oops propagates NaN through leafChunkTokens, bootstrapMaxTokens, and every derived config value, effectively disabling all token budgets.
   Fix: add parseFiniteInt/parseFiniteNumber helpers that return undefined for non-finite results. Replace all 16 raw parseInt/parseFloat calls in resolveLcmConfig() with the safe helpers.

Both bugs were found and reproduced with minimal scripts during adversarial review of a production incident.

* test: cover bootstrap and env fallback regressions

Add focused regression tests for the oversized singleton bootstrap tail case and invalid numeric env parsing fallback behavior.

Add a patch changeset because this PR changes runtime behavior and should be reflected in release notes.

Regeneration-Prompt: |
  The open PR fixed two production regressions but still lacked the release and test follow-through needed to merge. Add targeted regression coverage instead of broad refactors: one config test that proves invalid numeric env values like LCM_LEAF_CHUNK_TOKENS=oops fall back through plugin/default resolution, and one bootstrap test that proves a single oversized tail message is dropped instead of bypassing bootstrapMaxTokens. Also add a patch changeset because the PR changes runtime behavior visible to users and maintainers expect release notes coverage for that.

---------

Co-authored-by: Eva <eva@100yen.org>
Co-authored-by: Josh Lehman <josh@martian.engineering>
What
This PR caps the amount of raw parent history imported during first-time LCM bootstrap. When a new conversation is seeded from a large parent session, lossless-claw now keeps only the newest messages that fit within a bootstrap token budget instead of importing the full raw parent transcript.
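A rough sketch of the capping behavior, for illustration only. The function name, the message shape, and the per-message `tokens` field are assumptions, and stopping at the first message that does not fit (keeping a contiguous newest tail) is one reading of "newest messages that fit". The final guard reflects the follow-up hardening described in the referencing commits rather than this PR's original loop.

```typescript
// Hypothetical message shape; the real engine types are not shown in this PR.
interface BootstrapMessage {
  id: string;
  tokens: number; // assumed to be a precomputed token estimate
}

// Keep the newest contiguous run of messages whose total fits maxTokens.
function trimToBootstrapBudget<T extends { tokens: number }>(
  messages: readonly T[], // ordered oldest -> newest
  maxTokens: number,
): T[] {
  const kept: T[] = [];
  let total = 0;

  // Walk from newest to oldest.
  for (const message of [...messages].reverse()) {
    // The kept.length > 0 check means the newest message is always admitted
    // by the loop itself, whatever its size.
    if (kept.length > 0 && total + message.tokens > maxTokens) break;
    kept.push(message);
    total += message.tokens;
  }

  // Guard from the follow-up fix: a lone oversized message must not
  // bypass the budget, so return nothing instead.
  if (kept.length === 1 && total > maxTokens) return [];

  // Restore chronological order for import.
  return kept.reverse();
}

const history: BootstrapMessage[] = [
  { id: "a", tokens: 3000 },
  { id: "b", tokens: 2000 },
  { id: "c", tokens: 1500 },
  { id: "d", tokens: 2500 },
];

// With a 6000-token budget, the newest three messages fit exactly.
const seeded = trimToBootstrapBudget(history, 6000).map((m) => m.id);
```

Here `seeded` is `["b", "c", "d"]`: the three newest messages total 6000 tokens, and adding `a` would exceed the budget. A transcript ending in a single 50K-token message yields an empty import under a 6K budget.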
Why
Issue #249 was caused by first-time bootstrap bulk-importing all recoverable parent messages into a brand-new conversation. On large unsummarized parent sessions, that could immediately bloat the child conversation and blow the model context before compaction had a chance to help.
Changes
- bootstrapMaxTokens config support
- default derived from leafChunkTokens
- plugin schema update in openclaw.plugin.json
Testing
- pnpm vitest run --exclude '.worktrees/**' test/config.test.ts test/engine.test.ts
- 2 passed, including the bootstrap token-cap regression in test/engine.test.ts