fix: bound compactFullSweep so a compaction sweep cannot hang the turn #712
Merged
jalehman merged 3 commits on May 19, 2026
A threshold full sweep ran an unbounded leaf-pass loop (one pass per raw chunk) synchronously on the turn-critical path: under deferred compaction it is drained inside the next turn's assemble(). With a slow or rate-limited summarizer each pass burns its full summaryTimeoutMs, so a large conversation (observed: 16 passes on 308K tokens) left the agent unresponsive for tens of minutes.

Bound compactFullSweep with both a hard per-pass iteration cap (maxSweepIterations, default 12) and a wall-clock deadline (sweepDeadlineMs, default 120000), shared across Phase 1 and Phase 2 so the total sweep is bounded. On hitting either limit the sweep stops before starting another pass and returns the consistent partial result, logging a clear warning. Remaining context pressure is picked up by the next sweep. The deadline also time-boxes the inline assemble() deferred-debt drain.

Yield the Node event loop (setImmediate) between the synchronous node:sqlite scans of consecutive passes so a long sweep cannot freeze the gateway for its whole duration.

Both limits are configurable via plugin config or the LCM_MAX_SWEEP_ITERATIONS / LCM_SWEEP_DEADLINE_MS env vars.
added 2 commits
May 19, 2026 20:51
PR Martian-Engineering#712 bounded a single compactFullSweep with a per-pass iteration cap and a wall-clock deadline (sweepDeadlineMs), but compactUntilUnder wraps a `for round = 1..maxRounds` loop around sweeps. Each compactFullSweep invocation re-initializes its own sweepDeadlineAt, so Martian-Engineering#712's deadline resets every round. Worst case for the codex automatic compaction path (force:false, compactionTarget:"budget") is maxRounds × sweepDeadlineMs ≈ 10 × 120s ≈ 20 minutes, a real stall that Martian-Engineering#712 does not catch.

Bound the whole compactUntilUnder operation with its own wall-clock budget. _compactUntilUnderImpl computes one operationDeadlineAt at the start and (a) threads it into every round's compactFullSweep via a new optional operationDeadlineAt input, so a sweep stops at whichever is sooner (its own sweepDeadlineMs or the operation deadline), and (b) checks it in the `for round` loop, bailing before the next round with the consistent partial result and a `compactUntilUnder stopped at …` warning.

The total budget is a separate knob, compactUntilUnderDeadlineMs (default 300000), not a reuse of sweepDeadlineMs: a single legitimate round's sweep can use most of sweepDeadlineMs, so reusing it as the operation-wide budget would let the first round alone exhaust it and break multi-round compaction. 5 minutes leaves room for a few full-deadline sweeps while capping the worst case well below 20 minutes. Configurable via plugin config or LCM_COMPACT_UNTIL_UNDER_DEADLINE_MS.
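The round loop described in this commit can be sketched as follows. This is a simplified, synchronous model and not the actual `_compactUntilUnderImpl`: `compactUntilUnderSketch`, `runSweep`, the result shape, and the injected `now` clock are hypothetical names for illustration, and a real sweep is asynchronous.

```typescript
// Simplified model; every name except the config fields is hypothetical.
interface RoundInput {
  operationDeadlineAt: number; // absolute timestamp shared by every round
}

interface OperationResult {
  rounds: number;
  underTarget: boolean;
  stoppedAtDeadline: boolean;
}

function compactUntilUnderSketch(
  runSweep: (input: RoundInput) => boolean, // returns true once under target
  opts: { maxRounds: number; compactUntilUnderDeadlineMs: number },
  now: () => number = Date.now,
): OperationResult {
  // Computed once, so it does not reset per round the way sweepDeadlineAt does.
  const operationDeadlineAt = now() + opts.compactUntilUnderDeadlineMs;

  let rounds = 0;
  for (let round = 1; round <= opts.maxRounds; round++) {
    // (b) Bail before starting a round once the operation budget is spent.
    if (now() >= operationDeadlineAt) {
      return { rounds, underTarget: false, stoppedAtDeadline: true };
    }
    // (a) Thread the same deadline into this round's sweep.
    const underTarget = runSweep({ operationDeadlineAt });
    rounds = round;
    if (underTarget) {
      return { rounds, underTarget: true, stoppedAtDeadline: false };
    }
  }
  return { rounds, underTarget: false, stoppedAtDeadline: false };
}

// Simulated clock: each sweep would use its full 120 s, never reaches the
// target, and stops early if the operation deadline comes first.
let clock = 0;
const result = compactUntilUnderSketch(
  ({ operationDeadlineAt }) => {
    clock = Math.min(clock + 120_000, operationDeadlineAt);
    return false;
  },
  { maxRounds: 10, compactUntilUnderDeadlineMs: 300_000 },
  () => clock,
);
console.log(result, clock);
```

In this simulation the operation ends after three rounds at 300 s of simulated time, where the same loop without the shared deadline would run all ten rounds (10 × 120 s).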
Problem
On OpenClaw 2026.5.18 + lossless-claw 0.10.0, a context compaction can leave the
agent unresponsive for tens of minutes. Three compounding LCM-side causes:
1. `compactFullSweep` was unbounded. The Phase 1 leaf-pass loop in `src/compaction.ts` (`while (true)`) had no max-iteration cap and no wall-clock deadline: one leaf pass per raw-message chunk. Large conversations produce many passes (observed: 16 passes on a 308K-token conversation).
2. The sweep ran on the turn-critical path. With `proactiveThresholdCompactionMode: "deferred"`, recorded compaction debt is drained inside the next turn's `assemble()`, which runs `compactFullSweep` before returning the assembled prompt. Each leaf pass also does a synchronous `node:sqlite` scan that blocks the Node event loop.
3. `compactUntilUnder` was unbounded across rounds. Bounding one sweep is not enough: `compactUntilUnder` (`src/compaction.ts`) wraps a `for (round = 1; round <= maxRounds; round++)` loop around sweeps, calling `compact()` → `compactFullSweep` once per round with `maxRounds = 10`. Each `compactFullSweep` invocation re-initializes its own `sweepStartedAt`/`sweepDeadlineAt`, so the per-sweep `sweepDeadlineMs` resets every round. Worst case is `maxRounds × sweepDeadlineMs ≈ 10 × 120s ≈ 20 minutes`. This is the codex automatic compaction path (`force:false, compactionTarget:"budget"`), the common one, so the per-sweep bound alone still leaves a real ~20-minute stall.
When the summarizer is slow or rate-limited, each pass burns its full `summaryTimeoutMs` (default 60s, often configured 180s) before falling back; 16 passes × that = 16–48 minutes of a hung turn.
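The stall estimates in this section are plain multiplication. A small sketch of the arithmetic, using only the numbers quoted above (the helper `worstCaseMinutes` is illustrative and not part of the codebase):

```typescript
// Illustrative arithmetic only; `worstCaseMinutes` is a hypothetical helper.
function worstCaseMinutes(count: number, perUnitMs: number): number {
  return (count * perUnitMs) / 60_000;
}

// Unbounded sweep: 16 observed passes, each burning a full summaryTimeoutMs.
const atDefaultTimeout = worstCaseMinutes(16, 60_000); // 16 minutes
const atTypicalTimeout = worstCaseMinutes(16, 180_000); // 48 minutes

// Per-sweep bound only: sweepDeadlineMs resets on each of maxRounds rounds.
const acrossRounds = worstCaseMinutes(10, 120_000); // 20 minutes

console.log({ atDefaultTimeout, atTypicalTimeout, acrossRounds });
```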
Fix
- `compactFullSweep` now enforces both a hard per-pass iteration cap (`maxSweepIterations`, default 12) and a wall-clock deadline (`sweepDeadlineMs`, default 120000). The counter and deadline are shared across Phase 1 (leaf) and Phase 2 (condensed), so the total sweep stays bounded. On hitting either limit the sweep stops before starting another pass and returns the consistent partial result, logging a clear `compactFullSweep stopped at …` warning. Remaining context pressure is picked up by the next sweep.
- Bound the whole `compactUntilUnder` operation. `compactUntilUnder` now computes one operation-wide wall-clock deadline at its start and (a) threads it into every round's `compactFullSweep` via a new optional `operationDeadlineAt` input, so a sweep stops at whichever is sooner (its own `sweepDeadlineMs` or the operation deadline), and (b) checks it in the `for round` loop, bailing before the next round with the consistent partial result and a `compactUntilUnder stopped at …` warning. The total budget is a separate knob, `compactUntilUnderDeadlineMs` (default 300000), not a reuse of `sweepDeadlineMs`: a single legitimate round's sweep can use most of `sweepDeadlineMs`, so reusing it as the operation-wide budget would let the first round alone exhaust it and break multi-round compaction. 5 minutes leaves room for a few full-deadline sweeps while capping the worst case well below 20 minutes.
- The sweep awaits a macrotask (`setImmediate`) between the synchronous `node:sqlite` scans of consecutive passes, so a long sweep cannot freeze the gateway for its whole duration.
- The `assemble()` deferred-debt drain reaches `compactFullSweep` through `compact()`, so the deadline above also bounds what `assemble()` can do inline; the turn is no longer held hostage by a full sweep.
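A minimal sketch of how the two sweep limits, the optional operation deadline, and the event-loop yield could fit together. This is not the code in `src/compaction.ts`: `runBoundedSweep`, `SweepLimits`, `runPass`, and the `stoppedBy` field are hypothetical names chosen for illustration. Only `maxSweepIterations`, `sweepDeadlineMs`, and `operationDeadlineAt` come from the description above.

```typescript
// Hypothetical sketch; names other than the three limit fields are invented.
interface SweepLimits {
  maxSweepIterations: number; // hard cap on passes
  sweepDeadlineMs: number; // per-sweep wall-clock budget
  operationDeadlineAt?: number; // optional absolute deadline from the caller
}

interface SweepResult {
  passes: number; // passes that actually compacted something
  stoppedBy: "done" | "iteration-cap" | "deadline";
}

// Give pending I/O callbacks a chance to run between synchronous scans.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setImmediate(resolve));

async function runBoundedSweep(
  runPass: () => Promise<boolean>, // resolves false when nothing is left
  limits: SweepLimits,
  now: () => number = Date.now,
): Promise<SweepResult> {
  const sweepDeadlineAt = now() + limits.sweepDeadlineMs;
  // Stop at whichever deadline is sooner.
  const deadlineAt = Math.min(
    sweepDeadlineAt,
    limits.operationDeadlineAt ?? Number.POSITIVE_INFINITY,
  );

  let passes = 0;
  while (true) {
    // Limits are checked before a pass starts, so no pass is cut off midway
    // and the caller always gets a consistent partial result.
    if (passes >= limits.maxSweepIterations) {
      return { passes, stoppedBy: "iteration-cap" };
    }
    if (now() >= deadlineAt) {
      return { passes, stoppedBy: "deadline" };
    }
    const didWork = await runPass();
    if (!didWork) return { passes, stoppedBy: "done" };
    passes += 1;
    await yieldToEventLoop();
  }
}
```

Checking the limits only at pass boundaries means a single slow pass can still overrun the deadline by up to its own duration, which matches the "small multiple of the deadline" wording used for the tests.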
All three limits are configurable via plugin config (`maxSweepIterations`, `sweepDeadlineMs`, `compactUntilUnderDeadlineMs`) or the `LCM_MAX_SWEEP_ITERATIONS` / `LCM_SWEEP_DEADLINE_MS` / `LCM_COMPACT_UNTIL_UNDER_DEADLINE_MS` environment variables, following the existing config conventions.
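To illustrate the three knobs and their defaults, here is a hedged sketch of resolving them. The defaults, config keys, and variable names come from this description. The resolver itself (`resolveSweepConfig`), the env-over-plugin-config precedence, and the fallback on invalid values are assumptions, not the plugin's actual config code.

```typescript
// Hypothetical resolver; precedence (env > plugin config > default) is assumed.
interface SweepConfig {
  maxSweepIterations: number;
  sweepDeadlineMs: number;
  compactUntilUnderDeadlineMs: number;
}

const DEFAULTS: SweepConfig = {
  maxSweepIterations: 12,
  sweepDeadlineMs: 120_000,
  compactUntilUnderDeadlineMs: 300_000,
};

const ENV_NAMES: Record<keyof SweepConfig, string> = {
  maxSweepIterations: "LCM_MAX_SWEEP_ITERATIONS",
  sweepDeadlineMs: "LCM_SWEEP_DEADLINE_MS",
  compactUntilUnderDeadlineMs: "LCM_COMPACT_UNTIL_UNDER_DEADLINE_MS",
};

// Accept only positive integers; anything else falls through to the next source.
function positiveInt(value: unknown): number | undefined {
  const n = typeof value === "string" ? Number(value) : value;
  return typeof n === "number" && Number.isInteger(n) && n > 0 ? n : undefined;
}

function resolveSweepConfig(
  pluginConfig: Partial<SweepConfig>,
  env: Record<string, string | undefined>,
): SweepConfig {
  const resolved: SweepConfig = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS) as (keyof SweepConfig)[]) {
    resolved[key] =
      positiveInt(env[ENV_NAMES[key]]) ??
      positiveInt(pluginConfig[key]) ??
      DEFAULTS[key];
  }
  return resolved;
}

// One value from env, one from plugin config, one left at its default.
const config = resolveSweepConfig(
  { sweepDeadlineMs: 90_000 },
  { LCM_MAX_SWEEP_ITERATIONS: "8" },
);
console.log(config);
```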
This is complementary to the `interceptCompaction` handoff work (#665) and the threshold hard floor (#619): it bounds the sweep itself, orthogonal to those.
The host-side half of this fix, giving the OpenClaw host a safety timeout so it does not `await` a slow plugin-owned compaction forever, is in openclaw/openclaw#84083. The two PRs together close the stall: this one keeps an individual sweep bounded; that one bounds the host's wait on the plugin regardless.
Related: #584 (`withTimeout` doesn't abort the underlying summarizer call). This PR does not change `withTimeout`; bounding the sweep already caps total time even when a single pass burns its full timeout. Aborting the underlying call is a separate, larger change.
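To see why #584 is a separate concern, consider a typical race-based timeout wrapper. The version below is an assumed, generic shape (`withTimeoutSketch`) and may not match the project's `withTimeout`. It only illustrates that rejecting on timeout does not stop work that is already in flight.

```typescript
// Generic race-based timeout; an assumption about the shape, not project code.
function withTimeoutSketch<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

let summarizerFinished = false;

// Stand-in for a slow summarizer call that nothing cancels.
const slowSummarizer = new Promise<string>((resolve) => {
  setTimeout(() => {
    summarizerFinished = true;
    resolve("summary");
  }, 50);
});

const demo = withTimeoutSketch(slowSummarizer, 10)
  .catch((err: Error) => err.message)
  .then(
    (outcome) =>
      // Wait past the summarizer's own duration, then report both facts.
      new Promise<{ outcome: string; summarizerFinished: boolean }>((resolve) =>
        setTimeout(() => resolve({ outcome, summarizerFinished }), 100),
      ),
  );

demo.then((report) => console.log(report));
```

The caller sees the timeout after 10 ms, yet the stand-in summarizer still runs to completion. Stopping it would need a cancellation mechanism such as an `AbortSignal` passed to the underlying call, which is the larger change left to #584.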
Tests
- `compactFullSweep bounds` suite in `test/lcm-integration.test.ts`: a sweep that would exceed the iteration cap stops cleanly with a consistent partial result; raising the cap genuinely runs more passes (probe-verified: 14 passes uncapped vs 2 capped, which directly reproduces the 16-pass bug); a sweep that exceeds the wall-clock deadline stops cleanly; a bounded sweep returns within a small multiple of the deadline.
- `compactUntilUnder bounds` suite in `test/lcm-integration.test.ts`: a multi-round `compactUntilUnder` that would otherwise run `maxRounds × sweepDeadlineMs` stops at the operation deadline instead (total wall-clock bounded to a small multiple of `compactUntilUnderDeadlineMs`, not `maxRounds × sweepDeadlineMs`); it returns a consistent partial result on the deadline; and a generous deadline does not cut a legitimate fast multi-round run short.
- `test/config.test.ts`: default, plugin-config, and env-override coverage for all three new settings.
- `test/circuit-breaker.test.ts`: the cooldown test now fakes only `Date` (its cooldown is `Date.now()`-based) so the new in-sweep `setImmediate` yield still runs under that test.
- `npm run build` (the CI build gate) passes; the local vitest suite passes for all touched and adjacent files; `tsc --noEmit` introduces zero new errors versus the prior branch state.
Fixes #711
Takeover hardening update
During takeover review, an additional edge case was found and fixed in eb53bcf: if the sweep deadline expired while selecting a leaf chunk or condensation candidate, `compactFullSweep` could still start one more summarizer pass after the budget was gone. The branch now rechecks the sweep budget after selection and before starting either the leaf or condensed summarizer pass. Two regression tests cover deadline expiry during selection for both paths.
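The hardening amounts to checking the budget twice per pass: once before selection and once after it. A hedged sketch with an injected clock follows. All names here (`runOnePass`, `selectCandidate`, `summarize`, `PassOutcome`) are hypothetical, and the real selection and summarizer steps are not shown in this description.

```typescript
// Hypothetical model of one pass; not the code in eb53bcf.
type PassOutcome = "summarized" | "nothing-to-do" | "budget-exhausted";

function runOnePass<C>(
  selectCandidate: () => C | undefined, // may itself take measurable time
  summarize: (candidate: C) => void,
  deadlineAt: number,
  now: () => number,
): PassOutcome {
  if (now() >= deadlineAt) return "budget-exhausted";

  const candidate = selectCandidate();
  if (candidate === undefined) return "nothing-to-do";

  // Recheck: the deadline may have expired while selecting.
  if (now() >= deadlineAt) return "budget-exhausted";

  summarize(candidate);
  return "summarized";
}

// Simulated clock where selection alone overruns the deadline.
let clockMs = 0;
let summarizerCalls = 0;
const outcome = runOnePass(
  () => {
    clockMs += 150; // selection takes 150 ms of simulated time
    return "chunk-1";
  },
  () => {
    summarizerCalls += 1;
  },
  100, // deadline at t = 100 ms
  () => clockMs,
);
console.log(outcome, summarizerCalls); // budget-exhausted 0
```

Without the second check, this pass would call the summarizer once even though the budget ran out during selection.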
Validation run from `/Volumes/LEXAR/repos/lossless-claw-fix-sweep`:
- `./node_modules/.bin/vitest run test/lcm-integration.test.ts test/config.test.ts test/circuit-breaker.test.ts --maxWorkers=1` -> 160/160 passing
- `npm run build` -> passing