feat(core): reactive compression on context overflow#329
Conversation
Adapted from QwenLM#3879 (+QwenLM#3985 hardening). When a provider rejects a request because the prompt exceeds the context window, force-compress the chat history once and retry the request — instead of surfacing a hard failure. Because our geminiChat has diverged substantially from upstream (≈1/3 the size, bespoke overflow handling, compression owned by client.tryCompressChat rather than a geminiChat.tryCompress method), this is an adaptation, not a patch port: - New `utils/contextLengthError.ts`: provider-agnostic `getContextLengthExceededInfo` that classifies an arbitrary error (incl. nested causes / embedded JSON) as a context-overflow vs. a timeout, and extracts actual/limit token counts. Ported near-verbatim (self-contained) with full test coverage. - geminiChat streaming retry loop: on a classified overflow, compress the LIVE chat via `ChatCompressionService` (force=true) — NOT client.tryCompressChat, which calls startChat() and would swap the chat instance out from under the in-flight generator. Install the compressed history with setHistory, mirror the essential side effects (FileReadCache.clear + telemetry token count), refresh the request contents, and retry once. A one-shot flag prevents any compress/retry loop if the request is still too large after compression. Tests: 25 classifier cases; 3 geminiChat reactive cases (compress→retry succeeds; non-overflow errors don't compress; at-most-once guard). 39 geminiChat + 51 compression-service tests pass; core builds; eslint clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
👀 Quinn is reviewing — verdict (PASS / WARN / FAIL) + findings to follow. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
WalkthroughThis PR adds reactive compression to GeminiChat: when the provider rejects a request due to context-window overflow, the system automatically compresses the chat history via a new utility, updates internal state, and retries exactly once. The implementation includes provider-agnostic error classification and comprehensive test coverage. ChangesReactive Compression on Context Overflow
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint install failed: one or more packages not found in the registry. Comment |
There was a problem hiding this comment.
QA Audit — PR #329 | feat(core): reactive compression on context overflow
VERDICT: PENDING — CI not yet terminal
CI Status (in_progress)
- Lint: in_progress
- CodeQL: in_progress
Diff Review
This PR adds reactive context-window overflow handling to geminiChat.ts. The key logic (lines 582–648 implied):
- On provider rejection,
getContextLengthExceededInfoclassifies the error (context overflow vs. timeout vs. unrelated) — distinguishes it from timeout, which must NOT compress. - If overflow:
ChatCompressionService.compress(force=true)is called directly onself, not viaclient.tryCompressChat, avoiding stream-swap race. setHistory+FileReadCache.clear()+ telemetry side effects mirrorclient.tryCompressChatpost-conditions.reactiveCompressionAttemptedflag ensures at-most-once compression (no compress/retry loop).- Three new test cases cover: success path, non-overflow skip, and at-most-once guard.
Early Observations
- LOW:
clawpatchunavailable at this time (checkout cache SHA mismatch onmain). Full structural analysis deferred; diff-level review only. - LOW: The diff was truncated at 200 lines — the full
geminiChat.tsrewrite for the streaming loop (lines ~583–648) is not visible. Recommend verifying the retry loop logic,refreshRequestContents()call, and theRETRYsignal emission are all correct before merging. - Positive: Test coverage is thorough: 25 classifier cases + 3 reactive geminiChat cases. The at-most-once guard prevents accidental loops.
- Positive: The
getContextLengthErrorapproach (provider-agnostic, handles nested causes + embedded JSON) is a robust abstraction.
Formal verdict (APPROVE / CHANGES_REQUESTED) will be issued once CI reaches a terminal state.
— Quinn, QA Engineer
|
CI is still running (Lint and CodeQL both in_progress). I've submitted an initial COMMENT review with observations and noted that the formal verdict is pending CI terminal state. The autonomous merge loop will need to trigger a re-review once CI completes. Submitted COMMENT review on #329 (awaiting CI terminal state for PASS/FAIL verdict). |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Adapted from QwenLM/qwen-code#3879 (+QwenLM#3985 hardening).
What
When a provider rejects a request because the prompt exceeds the context window, the chat now force-compresses history once and retries instead of surfacing a hard failure. This complements our existing proactive (pre-send) compression with a reactive safety net for the cases where the estimate was off or history grew faster than expected.
Why this is an adaptation, not a patch-port
Our
geminiChat.tshas diverged hard from upstream (≈1/3 the size, bespoke overflow handling, and compression owned byclient.tryCompressChatrather than an upstream-stylegeminiChat.tryCompressmethod). So I ported the reusable piece verbatim and adapted the wiring to our structure.Changes
utils/contextLengthError.ts— provider-agnosticgetContextLengthExceededInfo(error)that classifies an arbitrary error value (including nestedcauses and embedded JSON error bodies) as a context-overflow vs. a timeout (which must not trigger compression), and extracts actual/limit token counts from OpenAI/Anthropic/etc. message shapes. Self-contained; ported near-verbatim with full test coverage (25 cases).geminiChat.tsstreaming retry loop — on a classified overflow, compress the live chat viaChatCompressionService(force=true), install the result withsetHistory, mirror the essential side effects (FileReadCache.clear()+ telemetry token count), refresh request contents, and retry once.selfdirectly rather than callingclient.tryCompressChat, which runsstartChat()and would swap the chat instance out from under the in-flight stream generator.reactiveCompressionAttemptedflag prevents any compress/retry loop if the request is still too large after compression.Tests
Notes
CompactTrigger/originalTokenCountOverrideplumbing — ourChatCompressionService.compress(force=true)already covers the force path and derives its own trigger.contextLengthError.tsis used internally (no package-index export); can be exported later if needed.🤖 Generated with Claude Code
Summary by CodeRabbit