fix: bound chat context in pi's compaction layer (#3852) + clear-cache label (#3589) by louis030195 · Pull Request #3854 · screenpipe/screenpipe

louis030195 · 2026-06-05T16:54:03Z

What

Two fixes from the user-feedback triage.

#3852 ([bug] Chat fails with 413 "prompt is too long" (>200k tokens) instead of trimming context to the model window): chat hard-fails with 413 prompt is too long on long conversations. Now fixed in pi's own context/compaction layer.
#3589 ([bug] [cosmetic] Clear cache button says "Scan"): the clear-cache button said "scan" instead of "clear".

1. Bound oversized messages in pi's compaction layer (closes #3852)

Root cause. pi's built-in compaction summarizes across messages but cuts at message boundaries, so it can never shrink a single message that is itself larger than the context window. The chat re-injects recent history as one big <conversation_history> user message on every send (the #3636 contract). On a long chat (or with a huge pasted/tool turn) that single message exceeds the window, the request 413s, and pi's overflow-retry cannot recover because no amount of summarizing other turns shrinks that one message.
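The root cause can be reproduced with a toy model. The sketch below is not pi's real compaction: `compactAtBoundaries`, the `Msg` shape, the character-based sizes, and the summary stub are all hypothetical stand-ins. It only illustrates why a compactor that works at message boundaries recovers from many small messages but not from one oversized message.

```typescript
// Toy model, NOT pi's real compaction: sizes are in characters and
// "summarizing" is simulated by replacing old messages with one short stub.
interface Msg {
  role: "user" | "assistant";
  text: string;
}

const size = (msgs: Msg[]): number =>
  msgs.reduce((sum, m) => sum + m.text.length, 0);

// Compacts only at message boundaries: older messages are folded into a
// summary stub, but the newest message is always sent unchanged.
function compactAtBoundaries(msgs: Msg[], budget: number): Msg[] {
  if (msgs.length === 0 || size(msgs) <= budget) return msgs;
  const newest = msgs[msgs.length - 1]!;
  const summary: Msg = { role: "assistant", text: "[summary of earlier turns]" };
  let kept = msgs.slice(0, -1);
  // Fold the oldest remaining message into the summary until the total fits
  // or there is nothing left to fold.
  while (kept.length > 0 && size([summary, ...kept, newest]) > budget) {
    kept = kept.slice(1);
  }
  return [summary, ...kept, newest];
}

const budget = 1_000;

// Case 1: fifty small messages, 5,000 chars in total.
const manySmall: Msg[] = Array.from({ length: 50 }, (_, i): Msg => ({
  role: i % 2 === 0 ? "user" : "assistant",
  text: "x".repeat(100),
}));

// Case 2: one message that is itself larger than the budget.
const oneHuge: Msg[] = [{ role: "user", text: "x".repeat(5_000) }];

const smallFits = size(compactAtBoundaries(manySmall, budget)) <= budget;
const hugeFits = size(compactAtBoundaries(oneHuge, budget)) <= budget;
console.log({ smallFits, hugeFits });
```

In this model the first case fits after folding old turns, while the second stays over budget because there is nothing other than the oversized message to fold.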

Fix. The context-pruning.ts extension already runs in every chat and pipe session. Its context hook is pi's transformContext slot: it runs before every LLM call and the array it returns is what actually gets sent. The hook now clamps any single message whose text exceeds 50% of the model's real ctx.model.contextWindow. For the injected history block it drops the oldest turns and always preserves the user's actual message; other oversized payloads keep head and tail. This complements built-in compaction (which then handles cross-message totals) rather than duplicating it.

flowchart LR
  A["chat injects last 40 turns as one user message (#3636)"] --> C["pi context hook, before every LLM call"]
  C --> D{"any single message > 50% of window?"}
  D -->|"before #3852"| E["built-in compaction can't split one message"] --> F["413 prompt too long"]
  D -->|"after"| G["clamp it: keep recent turns + the question"] --> H["compaction handles the rest, no 413"]
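The "drop the oldest turns, always keep the question" behavior could look roughly like the sketch below. It is an illustration under assumptions, not the shipped extension: the turn format, the `assemble` and `clampHistory` helpers, and the character budget are hypothetical, and the real hook works against the model's context window rather than a fixed character count.

```typescript
// Hypothetical shapes: the real extension's turn format is not shown in this PR.
const OPEN = "<conversation_history>";
const CLOSE = "</conversation_history>";

// Rebuild the single user message: history block first, then the question.
function assemble(turns: string[], question: string): string {
  return `${OPEN}\n${turns.join("\n")}\n${CLOSE}\n\n${question}`;
}

// Drop the oldest turns until the assembled message fits maxChars.
// The question is never trimmed, even if it alone exceeds the budget.
function clampHistory(turns: string[], question: string, maxChars: number): string {
  let kept = turns; // ordered oldest first
  while (kept.length > 0 && assemble(kept, question).length > maxChars) {
    kept = kept.slice(1);
  }
  return assemble(kept, question);
}

// 40 turns of roughly 1,000 chars each, clamped to a 5,000-char budget.
const turns = Array.from({ length: 40 }, (_, i) => `turn ${i}: ${"x".repeat(1_000)}`);
const question = "what did I work on yesterday?";
const clamped = clampHistory(turns, question, 5_000);
console.log(clamped.length, clamped.endsWith(question));
```

With these inputs only the most recent turns survive and the question is left intact at the end of the message.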

This replaces an earlier attempt that bounded the block on the frontend with a character budget. Per review that was the wrong layer (it bypassed pi's compaction and was a blunt char clamp), so it is reverted: standalone-chat.tsx is back to baseline and the chat-context-budget.ts helper is removed.

2. Clear-cache button label (closes #3589)

The card title, description, and confirm dialog all say "clear cache"; the button said "scan". Changed it to "clear".

Verification

bunx vitest run (22 tests, all green) drives the real shipped extension handlers, imported via a new @screenpipe-ext vitest alias so there is no ported copy to drift. Covered: the pure helpers (including a throwing getContextUsage), the context handler end to end (empty/short no-op, the ~800k-char #3852 repro bounded under the window with the question preserved, small-window models, old-tool-result pruning still intact, combined cases), and the tool_result handler.
next build compiles and type-checks clean (the one warning is a pre-existing unpdf dependency note, unrelated).
Honest limitation. The wdio e2e suite mocks pi (no real subprocess), so it cannot exercise this extension. The handler-level tests above are the coverage. e2e still covers the frontend <conversation_history> injection contract (#3636, "Bug: Chat Suddenly Loses Conversation Context"), which the revert leaves unchanged.

Scope

Other live issues from the triage (transcription backlog / proxy resets #3850, model 404s #3786, timezone scheduling #3851) need staging or manual verification and are tracked separately.

🤖 Generated with Claude Code

…window (#3852)

Chat injected the last 40 turns into every prompt (the #3636 always-inject contract) with no size budget, and never clamped individual turn text. A long chat, a huge pasted message, or a big tool result could push the assembled prompt past the model's context window and hard-fail with `413 ... prompt is too long: 206134 tokens > 200000 maximum` instead of degrading gracefully.

Extract the (previously duplicated) history-block assembly into a single, unit-tested helper that budgets the injected history in characters against the preset's existing `maxContextChars` setting: it drops the oldest turns first and clamps any single oversized turn, keeping the most recent context. Both the queued path and the main send path now route through it.

The #3636 contract is preserved (recent history is always injected when present); only its size is now bounded. Verified by 13 unit tests including a repro of the ~800k-char history that produced the 413.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Clear Cache card, its description, and the confirm dialog all say "clear cache", but the button itself said "scan", the lone inconsistency the issue reports. The button is the sole entry point to the clear flow (it scans to build the deletion preview, then the dialog confirms and deletes), so "clear" matches the standard pattern and the surrounding copy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…end (#3852)

Moves the #3852 fix out of the frontend and into pi's own context/compaction layer, where it can be token-accurate and where context management belongs.

Root cause: pi's built-in compaction summarizes ACROSS messages but cuts at message boundaries, so it can never shrink a SINGLE message that is itself larger than the context window. The chat re-injects recent history as one big <conversation_history> user message every send (#3636); on a long chat that single message overflows and hard-fails with `413 prompt is too long`, and the overflow-retry cannot recover.

The context-pruning extension (already installed in every chat + pipe session) now clamps any single oversized message to <=50% of the model's real ctx.model.contextWindow, inside its `context` hook (pi's transformContext, run before every LLM call; returning {messages} replaces what is sent). For the injected history block it drops the oldest turns and always preserves the user's actual message; generic huge payloads keep head + tail. This complements built-in compaction instead of duplicating it.

Reverts the earlier frontend char-budget (standalone-chat.tsx back to baseline; deletes lib/chat-context-budget.ts and its test) so context management lives in one place.

Tests: 22 cases drive the real shipped extension handlers (imported via a new @screenpipe-ext vitest alias, so no ported copy can drift) across edge cases: the ~800k-char #3852 repro bounded under the window with the question intact, small-window models, old-tool-result pruning, and tool_result feedback. The wdio e2e suite mocks pi (no subprocess) so it cannot exercise the extension; these handler tests are the coverage. next build compiles cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
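For generic oversized payloads, a head + tail clamp bounded at 50% of the window might be sketched as follows. Everything here is an assumption for illustration: the roughly-4-characters-per-token estimate, the marker text, and the `maxCharsFor` and `clampHeadTail` names do not come from the extension, which reads the real window from ctx.model.contextWindow.

```typescript
// Assumptions for illustration only: ~4 characters per token and a marker
// string of our own choosing. Only the 50% share comes from the PR text.
const CHARS_PER_TOKEN = 4;
const MAX_SHARE = 0.5;
const MARKER = "\n[... truncated ...]\n";

// Convert a context window in tokens into a per-message character budget.
function maxCharsFor(contextWindowTokens: number): number {
  return Math.floor(contextWindowTokens * MAX_SHARE * CHARS_PER_TOKEN);
}

// Keep the head and tail of an oversized payload and mark the elision.
function clampHeadTail(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  // Budget too small to hold the marker: fall back to a plain head cut.
  if (maxChars <= MARKER.length) return text.slice(0, maxChars);
  const room = maxChars - MARKER.length;
  const head = Math.ceil(room / 2);
  const tail = room - head;
  // slice(-0) would return the whole string, so guard the zero-tail case.
  return text.slice(0, head) + MARKER + (tail > 0 ? text.slice(-tail) : "");
}

// An ~800k-char payload against a 200k-token window.
const limit = maxCharsFor(200_000);
const payload = "A".repeat(400_000) + "B".repeat(400_000);
const bounded = clampHeadTail(payload, limit);
console.log(limit, bounded.length);
```

Keeping both ends reflects the idea that the start of a payload usually carries its framing and the end its most recent content; a token-accurate version would replace the character estimate with a real tokenizer count.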

louis030195 · 2026-06-05T18:02:53Z

@Anshgrover23 @divanshu-go can anyone test that the compaction works and that the UX is as good as codex/claude across pipes and chat?

Strengthens the unit suites for the three Tier-1 fixes just merged:

- media-file-path (#3845): malformed percent-escapes, no-extension / empty / backtick inputs, case-insensitive ext, Windows forward-slash path in text, Unix audio-chunk path with spaces+parens; audio/media classifier casing; already-wrapped markdown link, > escaping, Windows path in a markdown link.
- context-pruning (#3854): malformed <conversation_history> blocks (unterminated, close-before-open) fall back to generic head+tail; multiple oversized text blocks clamped while non-text blocks untouched; non-string/non-array message content ignored; zero window from getContextUsage falls back to default.
- notification-toggle (#3794): notify rule detected across a blank line in the deny block; enable keeps permissions+allow children when only the notify deny rule existed; deny block retained when a non-notify rule remains.

Test-only. Full vitest suite green (486 tests). These functions are pure modules; the wdio e2e suite mocks pi and drives whole-app UI, so the deterministic edge-case layer for them is vitest (the context-pruning suite is already documented as the extension's real e2e coverage).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
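The malformed-block cases named above (unterminated, close-before-open) suggest a parser that refuses to guess and lets the caller fall back to the generic clamp. A minimal sketch of that check, assuming a hypothetical `findHistoryBlock` helper and `HistoryBlock` shape that are not taken from the extension:

```typescript
const OPEN = "<conversation_history>";
const CLOSE = "</conversation_history>";

interface HistoryBlock {
  before: string; // text preceding the opening tag
  body: string;   // text between the tags
  after: string;  // text following the closing tag (e.g. the user's question)
}

// Returns null for malformed input (no open tag, unterminated block, or a
// close tag that appears before the open tag) so callers can fall back to a
// generic head + tail clamp instead of guessing at the structure.
function findHistoryBlock(text: string): HistoryBlock | null {
  const open = text.indexOf(OPEN);
  const close = text.indexOf(CLOSE);
  if (open === -1 || close === -1 || close < open) return null;
  return {
    before: text.slice(0, open),
    body: text.slice(open + OPEN.length, close),
    after: text.slice(close + CLOSE.length),
  };
}

const wellFormed = findHistoryBlock(`hi ${OPEN}a${CLOSE} q`);
const unterminated = findHistoryBlock(`${OPEN}a`);
const closeFirst = findHistoryBlock(`${CLOSE}a${OPEN}`);
console.log(wellFormed, unterminated, closeFirst);
```

Only the first input parses into a block; the other two yield null, which is the signal to treat the message as an opaque payload.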

Louis Beaumont and others added 3 commits June 5, 2026 09:53

louis030195 changed the title ~~fix(chat): budget injected conversation history to the context window (#3852) + clear-cache label (#3589)~~ fix: bound chat context in pi's compaction layer (#3852) + clear-cache label (#3589) Jun 5, 2026

louis030195 merged commit e128ba1 into main Jun 7, 2026
18 checks passed

louis030195 merged 3 commits into main from claude/sweet-varahamihira-cf7819
