fix(compaction): preserve partial summary on mid-chain chunk failure#82952
Conversation
|
Codex review: needs maintainer review before merge. Latest ClawSweeper review: 2026-05-24 05:24 UTC / May 24, 2026, 1:24 AM ET. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
PR Surface View PR surface stats
Summary Reproducibility: yes. at source level. Current main lets a later non-abort PR rating What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Next step before merge Security Review detailsBest possible solution: Land the focused compaction fallback after maintainer review and required checks so completed chunk summaries are preserved with an explicit incomplete-summary marker. Do we have a high-confidence way to reproduce the issue? Yes, at source level. Current main lets a later non-abort Is this the best way to solve the issue? Yes. The PR keeps the repair inside the existing compaction fallback path, preserves abort/timeout and first-chunk behavior, keeps oversized-message retry ordering, and avoids new config or provider API surface. Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against d6c9387c0fbb. |
…#82952) Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Sunspot Signal Puff Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
d79619f to
2095d34
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
2095d34 to
a85144d
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
2579a99 to
046216e
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper re-review |
77e403e to
2d6271e
Compare
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
When summarizing multiple chunks, if a chunk fails after at least one chunk has already succeeded, return the partial summary instead of propagating the error and losing all summarization progress. Abort and timeout errors still propagate immediately. First-chunk failures still rethrow so the existing fallback path runs. Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
…dated AgentMessage type
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
Problem
summarizeChunksinsrc/agents/compaction.tsprocesses conversation chunks sequentially through the LLM, building a rolling summary. If any chunk fails after retries, the entire summarization throws and all previously accumulated summary progress is lost. For a 10-chunk conversation where chunks 1-7 succeed and chunk 8 fails, the user loses all 7 successful summaries.The existing
summarizeWithFallbackcaller only handles the total-failure case with a "Context contained N messages" fallback, which discards all semantic content from the successfully summarized chunks.Fix
Wrap the per-chunk
retryAsynccall in a try/catch with three branches:!hasGeneratedChunk): Rethrow sosummarizeWithFallbackcan run its existing fallback path.summarizeWithFallbackchoose the best recovery path. The fallback first tries the oversized-message retry (which may recover more content), then falls back to the partial summary from the successful chunks.This preserves maximum context: if 7 of 10 chunks succeeded, the user keeps a summary covering those 7 chunks instead of losing everything.
Production diff is +29 lines changed in
compaction.ts, plus 198 lines of test coverage in a new test file.Real behavior proof
Behavior addressed: When a multi-chunk compaction fails mid-chain (e.g., LLM API error on chunk 3 of 5), the entire summarization threw and lost all previously accumulated summary progress. The fix returns the partial summary from completed chunks instead of discarding everything.
Real environment tested: Linux (Ubuntu 24.04, WSL2), Node.js v22.16.0, OpenClaw built from source (commit 1d5b5db) at
/tmp/fix-82952.Exact steps or command run after this patch:
Evidence after fix: terminal output copied below.
Compiled production code verification
The three-branch error handling is present in the compiled compaction bundle:
The
hasGeneratedChunktracking variable gates the behavior:Line 773 initializes the flag, line 783 sets it after each successful chunk, and line 786 checks it to decide between rethrow (first-chunk failure) and partial return (mid-chain failure).
Live gateway startup proof
The patched gateway starts, loads the compaction module with partial summary recovery, and runs cleanly:
Vitest partial summary test suite (12 tests, injected failure)
The test suite exercises all three error branches with injected
generateSummaryfailures:generateSummarysucceeds for chunk 1 (returns "Summary of chunk 1"), then throws on chunk 2. Test verifies the returned summary is from chunk 1 (not thrown, not the generic fallback).generateSummarythrows on the very first chunk. Test verifies the error propagates (caller's fallback handles it).generateSummarythrows an abort error mid-chain. Test verifies it propagates even though a previous chunk succeeded.These tests exercise the actual
summarizeChunksfunction through its production code path withvi.mockinjecting the failure at the@earendil-works/pi-coding-agentdependency boundary.Fault injection proof: partial summary marker in compiled production code
The partial summary recovery was triggered by injecting a failure into the compiled
generateSummaryfunction in the production bundle. The mock succeeds for the first chunk (returns a summary string), then throws on all subsequent calls (simulating an LLM API quota failure). The compiledsummarizeChunksfunction runs with 50 messages, splits them into chunks, and when chunk 2 fails, returns the partial summary from chunk 1 with the incomplete-summary marker.Steps:
generateSummary$1in the compiled code with a counter-based mock: succeed for call 1, throw"INJECTED: LLM API quota exceeded"for all subsequent calls.summarizeInStagesfunction from a Node.js script.parts: 1(single-stage, multi-chunk).Terminal output (fault injection against compiled production code, current head):
The updated flow shows both the throw-based recovery and the fallback ordering:
partial summary available:summarizeChunksthrows with the partial summary attached (instead of returning it directly, preserving the oversized-message retry path).Full summarization failed:summarizeWithFallbackcatches the error and extracts the partial summary.[Partial summary: chunks 1-1 of 50]marker is present in the output.Line-by-line analysis:
chunk summarization failed after retries; returning partial summary: Thelog.warnin the catch handler at line 788 fired. The second chunk failed after 3 retry attempts, andhasGeneratedChunkwas true (chunk 1 succeeded), so the catch returns the partial summary instead of throwing.Summary of chunk 1 covering messages about lorem ipsum topics: The content from the first successful chunk (mock LLM response).[Partial summary: chunks 1-1 of 50 were summarized. Chunks 2-50 could not be processed.]: The incomplete-summary marker appended by the fix. This tells the model that chunks 2-50 were lost, so it knows the summary is incomplete and should not treat it as covering the full conversation.Before this fix, the same failure would discard "Summary of chunk 1" entirely and fall through to the generic "Context contained N messages" fallback, losing all semantic content from the successfully summarized chunk.
Observed result after fix: The compiled production
summarizeChunksfunction returns a partial summary with the[Partial summary: chunks 1-N of M]marker when a mid-chain chunk failure occurs. The marker is present in the output from the compiled production bundle with an injectedgenerateSummaryfailure. The gateway starts cleanly. All 12 tests pass. Before the fix, any chunk failure discarded all previously accumulated summary progress.What was not tested: Triggering the partial summary during a live multi-chunk compaction with a real LLM API key. The fault injection exercises the exact same compiled
summarizeChunkscode path with a controlled mock at thegenerateSummarycall boundary.