feat(core): add reactive compression on context overflow #3879
Pull request overview
Adds a reactive recovery path for provider context-window overflow errors in the Gemini chat flow: when a request fails due to context length, the chat attempts a forced compression of the current conversation and retries the failed turn once with the compressed context (with a single-attempt guard).
Changes:
- Introduces a context-length overflow classifier (`getContextLengthExceededInfo`) that extracts nested messages (including embedded JSON) and parses token counts when available.
- Extends chat compression to accept an explicit trigger (`manual` vs `auto`) so forced compressions can still be reported as automatic when appropriate.
- Adds reactive compression + retry-once behavior in `GeminiChat.sendMessageStream`, with tests covering success, NOOP, throw, and single-retry guard behavior.
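To make the classifier idea concrete, here is a minimal sketch of how an overflow classifier of this kind could work. It is not the PR's `getContextLengthExceededInfo`: the names `classifyOverflow`, `collectMessages`, and `OverflowInfo`, the regular expressions, and the `N tokens > M` count format are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch; names and patterns are assumptions, not the PR's code.
interface OverflowInfo {
  actualTokens?: number;
  limitTokens?: number;
}

// Assumed wording for overflow and timeout messages.
const OVERFLOW = /context_length_exceeded|context (?:length|window)|prompt is too long|maximum context|input length/i;
const TIMEOUT = /timed? ?out|deadline/i;
const COUNTS = /(\d+)\s*tokens?\s*>\s*(\d+)/i;

// Collect every string reachable through `message`, `code`, `error`, and
// `cause`, and also parse JSON embedded inside a string.
function collectMessages(value: unknown, out: string[] = [], depth = 0): string[] {
  if (depth > 5 || value == null) return out;
  if (typeof value === "string") {
    out.push(value);
    const start = value.indexOf("{");
    const end = value.lastIndexOf("}");
    if (start !== -1 && end > start) {
      try {
        collectMessages(JSON.parse(value.slice(start, end + 1)), out, depth + 1);
      } catch {
        // Not valid JSON; keep only the raw string.
      }
    }
    return out;
  }
  if (typeof value === "object") {
    const record = value as Record<string, unknown>;
    for (const key of ["message", "code", "error", "cause"]) {
      collectMessages(record[key], out, depth + 1);
    }
  }
  return out;
}

// Returns null when the error is not an overflow (timeouts are excluded),
// otherwise whatever token counts could be parsed (possibly none).
function classifyOverflow(error: unknown): OverflowInfo | null {
  const messages = collectMessages(error);
  const isOverflow = messages.some((m) => OVERFLOW.test(m) && !TIMEOUT.test(m));
  if (!isOverflow) return null;
  for (const m of messages) {
    const counts = COUNTS.exec(m);
    if (counts) {
      return { actualTokens: Number(counts[1]), limitTokens: Number(counts[2]) };
    }
  }
  return {};
}
```

Returning `null` for "not an overflow" and an object with optional counts for "overflow" lets a caller branch on the classification even when the provider message carries no numbers.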
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| packages/core/src/utils/contextLengthError.ts | New utility to classify context-length overflow errors and extract token counts from provider messages. |
| packages/core/src/utils/contextLengthError.test.ts | Unit tests for overflow classification, timeout exclusions, token parsing, and nested JSON/object extraction. |
| packages/core/src/services/chatCompressionService.ts | Adds CompactTrigger and plumbs an explicit trigger into Pre/Post compact hook reporting. |
| packages/core/src/core/geminiChat.ts | Adds reactive compression-on-overflow and one-time retry behavior during streaming sends. |
| packages/core/src/core/geminiChat.test.ts | Integration tests validating reactive compression behavior and send-lock release on compression failure. |
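The `CompactTrigger` change listed for chatCompressionService.ts can be pictured with a small sketch. Only the name `CompactTrigger` and the `manual`/`auto` values come from the PR overview; `CompactHookEvent`, `resolveTrigger`, and the default of treating a forced compression as `manual` unless told otherwise are assumptions made for illustration.

```typescript
// Hypothetical sketch; only `CompactTrigger` and its two values are from the PR.
type CompactTrigger = "manual" | "auto";

// Assumed shape of what a Pre/Post compact hook would be told.
interface CompactHookEvent {
  phase: "pre" | "post";
  trigger: CompactTrigger;
}

// An explicit trigger wins. Without one, this sketch assumes a forced
// compression is reported as "manual" and an unforced one as "auto".
function resolveTrigger(force: boolean, explicit?: CompactTrigger): CompactTrigger {
  return explicit ?? (force ? "manual" : "auto");
}

function buildHookEvent(
  phase: "pre" | "post",
  force: boolean,
  explicit?: CompactTrigger,
): CompactHookEvent {
  return { phase, trigger: resolveTrigger(force, explicit) };
}
```

Under these assumptions, a reactive compression would call `buildHookEvent("pre", true, "auto")`, so hooks see an automatic compaction even though the compression itself was forced.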
Force-compress chat history after a provider rejects a request for exceeding the context window, then retry the request once with refreshed history. Add provider-agnostic context overflow classification and focused retry coverage.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
834b032 to
031d52a
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao left a comment
No issues found. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
… auto-compaction redesign
- OOM reproduction report: root cause confirmed as structuredClone() positive feedback loop during auto-compaction (#3735, #3879), with real debug log evidence from crash session.
- Runtime diagnostics benchmark: process-tree RSS sampling results comparing installed CLI vs local rebuilt bundle.
- Auto-compaction threshold redesign: proposal for replacing the fixed 70% token threshold with RSS-aware graduated strategy.
Adapted from QwenLM#3879 (+QwenLM#3985 hardening).
When a provider rejects a request because the prompt exceeds the context window, force-compress the chat history once and retry the request, instead of surfacing a hard failure.
Because our geminiChat has diverged substantially from upstream (≈1/3 the size, bespoke overflow handling, compression owned by client.tryCompressChat rather than a geminiChat.tryCompress method), this is an adaptation, not a patch port:
- New `utils/contextLengthError.ts`: provider-agnostic `getContextLengthExceededInfo` that classifies an arbitrary error (incl. nested causes / embedded JSON) as a context overflow vs. a timeout, and extracts actual/limit token counts. Ported near-verbatim (self-contained) with full test coverage.
- geminiChat streaming retry loop: on a classified overflow, compress the LIVE chat via `ChatCompressionService` (force=true), NOT client.tryCompressChat, which calls startChat() and would swap the chat instance out from under the in-flight generator. Install the compressed history with setHistory, mirror the essential side effects (FileReadCache.clear + telemetry token count), refresh the request contents, and retry once. A one-shot flag prevents any compress/retry loop if the request is still too large after compression.
Tests: 25 classifier cases; 3 geminiChat reactive cases (compress→retry succeeds; non-overflow errors don't compress; at-most-once guard). 39 geminiChat + 51 compression-service tests pass; core builds; eslint clean.
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
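The retry loop described in the adaptation commit above can be sketched as follows. This is an illustration under stated assumptions, not the code from either repository: `LiveChat`, `ReactiveDeps`, and `sendWithReactiveCompression` are hypothetical names, history is reduced to an array of strings, and the choice to rethrow the original error when compression is a NOOP is an assumption. Side effects such as cache clearing and telemetry are left out.

```typescript
// Hypothetical sketch of compress-in-place plus retry-once; not the real geminiChat.
type History = string[];

interface ReactiveDeps {
  send: (history: History) => Promise<string>;
  isOverflow: (error: unknown) => boolean;
  // Resolves to the compressed history, or null when compression is a NOOP.
  forceCompress: (history: History) => Promise<History | null>;
}

class LiveChat {
  private history: History;
  private readonly deps: ReactiveDeps;

  constructor(initial: History, deps: ReactiveDeps) {
    this.history = [...initial];
    this.deps = deps;
  }

  getHistory(): History {
    return [...this.history];
  }

  setHistory(next: History): void {
    this.history = [...next];
  }

  async sendWithReactiveCompression(): Promise<string> {
    let compressedOnce = false; // one-shot guard against a compress/retry loop
    for (;;) {
      try {
        return await this.deps.send(this.getHistory());
      } catch (error) {
        // Non-overflow errors, and a second overflow, surface unchanged.
        if (compressedOnce || !this.deps.isOverflow(error)) throw error;
        compressedOnce = true;
        const compressed = await this.deps.forceCompress(this.getHistory());
        if (compressed === null) throw error; // assumption: NOOP rethrows the original error
        // Install the result on this same instance instead of creating a new chat.
        this.setHistory(compressed);
      }
    }
  }
}
```

Because the compressed history is installed with `setHistory` on the existing object, any caller holding a reference to the chat keeps working with the same instance, which is the property the commit message calls out when it avoids `client.tryCompressChat`.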
…sion, SDK canUseTool timeout) (#331)
Documents behavior shipped in 0.51.x–0.52.0 that lacked coverage:
- guides/goal.md: the evaluator's three verdicts (met / not met / impossible), the conservative impossible→abandoned terminal state, and how `/goal` status reports an abandoned goal. (QwenLM#4230)
- explanation/agent-harness.md: new "Reactive compression" section covering proactive vs. reactive compression and the one-shot force-compress-and-retry on a provider context-overflow rejection. (QwenLM#3879)
- reference/sdk-api.md + contributing/sdk-typescript.md: document the `timeout.canUseTool` option and that the SDK now forwards it to the CLI so the control-plane timeout matches the callback's. (QwenLM#4491)
(The `--max-tool-calls` / `--max-wall-time` budgets were already documented in guides/run-headless.md when those features shipped.)
Co-authored-by: Automaker <automaker@localhost>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Validation
reactive-compression-verify-211430: `context_length_exceeded`, prompt-too-long token-count errors, DashScope input-length errors, timeout/deadline negatives, compression success, compression NOOP, compression exceptions, repeated overflow after retry, and avoiding duplicate retry events after reactive compression.
Interactive tmux verification
`context_length_exceeded` for the first streaming chat completion, returns a non-stream compression/utility summary for non-stream requests, then returns a streamed assistant answer for the retried request.
Manual verification
qwen-code screenshots


server screenshot
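The provider behavior described under the interactive tmux verification (reject the first streaming completion, answer non-stream requests with a summary, then stream an answer for the retry) can be modelled as a small state machine. This is a sketch only: `createMockProvider`, the request and response shapes, the HTTP-like status values, and the placeholder texts are assumptions, and no real HTTP server or provider API is involved.

```typescript
// Hypothetical in-memory model of the verification server; all shapes are assumed.
interface MockRequest {
  stream: boolean;
}

type MockResponse =
  | { status: 400; error: { code: "context_length_exceeded" } }
  | { status: 200; kind: "summary"; text: string }
  | { status: 200; kind: "stream"; chunks: string[] };

function createMockProvider(): (req: MockRequest) => MockResponse {
  let rejectedFirstStream = false;
  return (req) => {
    if (!req.stream) {
      // Non-stream requests stand in for the compression/utility call.
      return { status: 200, kind: "summary", text: "summary of earlier turns" };
    }
    if (!rejectedFirstStream) {
      // The first streaming completion is rejected as an overflow.
      rejectedFirstStream = true;
      return { status: 400, error: { code: "context_length_exceeded" } };
    }
    // The retried streaming request gets a normal streamed answer.
    return { status: 200, kind: "stream", chunks: ["retried", " answer"] };
  };
}
```

Driving a client against a handler like this exercises the whole path in order: overflow on the first stream, a non-stream compression call, then a successful streamed retry.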
Scope / Risk
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs