fix: bound chat context in pi's compaction layer (#3852) + clear-cache label (#3589) #3854
Merged
…window (#3852)

Chat injected the last 40 turns into every prompt (the #3636 always-inject contract) with no size budget, and never clamped individual turn text. A long chat, a huge pasted message, or a big tool result could push the assembled prompt past the model's context window and hard-fail with `413 ... prompt is too long: 206134 tokens > 200000 maximum` instead of degrading gracefully.

Extract the (previously duplicated) history-block assembly into a single, unit-tested helper that budgets the injected history in characters against the preset's existing `maxContextChars` setting: it drops the oldest turns first and clamps any single oversized turn, keeping the most recent context. Both the queued path and the main send path now route through it.

The #3636 contract is preserved (recent history is always injected when present); only its size is now bounded. Verified by 13 unit tests including a repro of the ~800k-char history that produced the 413.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The Clear Cache card, its description, and the confirm dialog all say "clear cache", but the button itself said "scan" — the lone inconsistency the issue reports. The button is the sole entry point to the clear flow (it scans to build the deletion preview, then the dialog confirms and deletes), so "clear" matches the standard pattern and the surrounding copy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…end (#3852)

Moves the #3852 fix out of the frontend and into pi's own context/compaction layer, where it can be token-accurate and where context management belongs.

Root cause: pi's built-in compaction summarizes ACROSS messages but cuts at message boundaries, so it can never shrink a SINGLE message that is itself larger than the context window. The chat re-injects recent history as one big `<conversation_history>` user message every send (#3636); on a long chat that single message overflows and hard-fails with `413 prompt is too long`, and the overflow-retry cannot recover.

The context-pruning extension (already installed in every chat + pipe session) now clamps any single oversized message to <=50% of the model's real `ctx.model.contextWindow`, inside its `context` hook (pi's `transformContext`, run before every LLM call; returning `{messages}` replaces what is sent). For the injected history block it drops the oldest turns and always preserves the user's actual message; generic huge payloads keep head + tail. This complements built-in compaction instead of duplicating it.

Reverts the earlier frontend char-budget (`standalone-chat.tsx` back to baseline; deletes `lib/chat-context-budget.ts` and its test) so context management lives in one place.

Tests: 22 cases drive the real shipped extension handlers (imported via a new `@screenpipe-ext` vitest alias, so no ported copy can drift) across edge cases: the ~800k-char #3852 repro bounded under the window with the question intact, small-window models, old-tool-result pruning, and `tool_result` feedback. The wdio e2e suite mocks pi (no subprocess) so it cannot exercise the extension; these handler tests are the coverage. `next build` compiles cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
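A minimal sketch of the clamping idea in this commit message, not the shipped extension code. The function names, the truncation marker, and the 4-characters-per-token estimate are illustrative assumptions; the source states only that an oversized message is clamped to at most 50% of the context window, that the history block drops its oldest turns while keeping the user's message, and that other payloads keep head and tail.

```typescript
// Hypothetical sketch: every name below is an assumption for illustration,
// not the API of pi or of the context-pruning extension.
const CHARS_PER_TOKEN = 4; // rough estimate, assumed here
const MARKER = "\n[... truncated ...]\n"; // assumed placeholder text

/** Character budget for one message: 50% of the model's context window. */
function budgetChars(contextWindowTokens: number): number {
  return Math.floor(contextWindowTokens * 0.5) * CHARS_PER_TOKEN;
}

/** Generic payloads: keep the head and the tail, drop the middle. */
function clampHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  // Too small to fit the marker: fall back to a plain head cut.
  if (maxChars <= MARKER.length) return text.slice(0, maxChars);
  const keep = maxChars - MARKER.length;
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  return text.slice(0, head) + MARKER + text.slice(text.length - tail);
}

/** History block: drop the oldest turns first, always keep the question. */
function trimHistory(turns: string[], question: string, maxChars: number): string[] {
  // The question is never dropped, even if it alone exceeds the budget.
  let remaining = maxChars - question.length;
  const kept: string[] = [];
  // Walk from newest to oldest and stop at the first turn that does not fit.
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = turns[i].length + 1; // +1 for a joining newline
    if (cost > remaining) break;
    kept.unshift(turns[i]);
    remaining -= cost;
  }
  return [...kept, question];
}
```

Walking newest-to-oldest keeps the most recent context contiguous, which matches the stated goal of preserving recent turns plus the question. A token-accurate version would replace the character estimate with a real tokenizer count.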
@Anshgrover23 @divanshu-go anyone can test the compaction works and is as good UX as codex/claude across pipes and chat?
louis030195 pushed a commit that referenced this pull request Jun 7, 2026
Strengthens the unit suites for the three Tier-1 fixes just merged:

- media-file-path (#3845): malformed percent-escapes, no-extension / empty / backtick inputs, case-insensitive ext, Windows forward-slash path in text, Unix audio-chunk path with spaces+parens; audio/media classifier casing; already-wrapped markdown link, > escaping, Windows path in a markdown link.
- context-pruning (#3854): malformed `<conversation_history>` blocks (unterminated, close-before-open) fall back to generic head+tail; multiple oversized text blocks clamped while non-text blocks untouched; non-string/non-array message content ignored; zero window from `getContextUsage` falls back to default.
- notification-toggle (#3794): notify rule detected across a blank line in the deny block; enable keeps permissions+allow children when only the notify deny rule existed; deny block retained when a non-notify rule remains.

Test-only. Full vitest suite green (486 tests). These functions are pure modules; the wdio e2e suite mocks pi and drives whole-app UI, so the deterministic edge-case layer for them is vitest (the context-pruning suite is already documented as the extension's real e2e coverage).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
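The context-pruning fallbacks listed in this commit can be illustrated with a small sketch. `findHistoryBlock`, `resolveWindow`, the exact closing tag, and the `DEFAULT_WINDOW` value are hypothetical; the source says only that unterminated or close-before-open blocks fall back to generic head+tail and that a zero window falls back to a default.

```typescript
// Hypothetical sketch: names, the closing-tag spelling, and the default
// window size are assumptions, not taken from the extension.
const OPEN = "<conversation_history>";
const CLOSE = "</conversation_history>";
const DEFAULT_WINDOW = 200_000; // assumed default, in tokens

/**
 * Returns the inner span of a well-formed history block, or null so the
 * caller can fall back to the generic head+tail clamp.
 */
function findHistoryBlock(text: string): { start: number; end: number } | null {
  const open = text.indexOf(OPEN);
  if (open === -1) return null; // no block at all
  const start = open + OPEN.length;
  // Search for the close only after the open, so a close-before-open
  // with no later close is treated as malformed.
  const end = text.indexOf(CLOSE, start);
  if (end === -1) return null; // unterminated
  return { start, end };
}

/** Uses the reported window only when it is a positive, finite number. */
function resolveWindow(reported: unknown): number {
  return typeof reported === "number" && Number.isFinite(reported) && reported > 0
    ? reported
    : DEFAULT_WINDOW;
}
```

Returning `null` rather than throwing keeps the malformed case on the same code path as any other oversized payload, which is the behavior the tests above describe.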
What
Two fixes from the user-feedback triage.
… `413 prompt is too long` on long conversations. Now fixed in pi's own context/compaction layer.

1. Bound oversized messages in pi's compaction layer (closes #3852)

Root cause. pi's built-in compaction summarizes across messages but cuts at message boundaries, so it can never shrink a single message that is itself larger than the context window. The chat re-injects recent history as one big `<conversation_history>` user message on every send (the #3636 contract). On a long chat (or with a huge pasted/tool turn) that single message exceeds the window, the request 413s, and pi's overflow-retry cannot recover because no amount of summarizing other turns shrinks that one message.

Fix. The `context-pruning.ts` extension already runs in every chat and pipe session. Its `context` hook is pi's `transformContext` slot: it runs before every LLM call and the array it returns is what actually gets sent. The hook now clamps any single message whose text exceeds 50% of the model's real `ctx.model.contextWindow`. For the injected history block it drops the oldest turns and always preserves the user's actual message; other oversized payloads keep head and tail. This complements built-in compaction (which then handles cross-message totals) rather than duplicating it.

    flowchart LR
      A["chat injects last 40 turns as one user message (#3636)"] --> C["pi context hook, before every LLM call"]
      C --> D{"any single message > 50% of window?"}
      D -->|"before #3852"| E["built-in compaction can't split one message"] --> F["413 prompt too long"]
      D -->|"after"| G["clamp it: keep recent turns + the question"] --> H["compaction handles the rest, no 413"]

This replaces an earlier attempt that bounded the block on the frontend with a character budget. Per review that was the wrong layer (it bypassed pi's compaction and was a blunt char clamp), so it is reverted: `standalone-chat.tsx` is back to baseline and the `chat-context-budget.ts` helper is removed.

2. Clear-cache button label (closes #3589)
The card title, description, and confirm dialog all say "clear cache"; the button said "scan". Changed it to "clear".
Verification
- `bunx vitest run` (22 tests, all green) drives the real shipped extension handlers, imported via a new `@screenpipe-ext` vitest alias so there is no ported copy to drift: the pure helpers (including a throwing `getContextUsage`), the `context` handler end to end (empty/short no-op, the ~800k-char #3852 repro bounded under the window with the question preserved, small-window models, old-tool-result pruning still intact, combined cases), and the `tool_result` handler.
- `next build` compiles and type-checks clean (the one warning is a pre-existing `unpdf` dependency note, unrelated).
- … `<conversation_history>` injection contract (#3636), which the revert leaves unchanged.

Scope
Other live issues from the triage (transcription backlog / proxy resets #3850, model 404s #3786, timezone scheduling #3851) need staging or manual verification and are tracked separately.
🤖 Generated with Claude Code