
fix: bound chat context in pi's compaction layer (#3852) + clear-cache label (#3589) #3854

Merged
louis030195 merged 3 commits into main from claude/sweet-varahamihira-cf7819 on Jun 7, 2026


louis030195 (Collaborator) commented Jun 5, 2026

What

Two fixes from the user-feedback triage.

  1. #3852: chat hard-fails with 413 "prompt is too long" (>200k tokens) on long conversations instead of trimming context to the model window. Now fixed in pi's own context/compaction layer.
  2. #3589: the clear-cache button said "scan" instead of "clear".

1. Bound oversized messages in pi's compaction layer (closes #3852)

Root cause. pi's built-in compaction summarizes across messages but cuts at message boundaries, so it can never shrink a single message that is itself larger than the context window. The chat re-injects recent history as one big <conversation_history> user message on every send (the #3636 contract). On a long chat (or with a huge pasted/tool turn) that single message exceeds the window, the request 413s, and pi's overflow-retry cannot recover because no amount of summarizing other turns shrinks that one message.
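To see why boundary-only compaction cannot help here, the toy model below (not pi's code) replaces every older message with a fixed-size summary but never looks inside a message. Character counts stand in for tokens, and the names `Msg`, `compactAtBoundaries`, and `fits` are hypothetical.

```typescript
// Toy model of boundary-only compaction; NOT pi's implementation.
// Character counts stand in for tokens to keep the example self-contained.
interface Msg {
  role: "user" | "assistant";
  text: string;
}

const totalSize = (msgs: Msg[]): number =>
  msgs.reduce((n, m) => n + m.text.length, 0);

const fits = (msgs: Msg[], windowSize: number): boolean =>
  totalSize(msgs) <= windowSize;

// Summarize everything except the newest message into one fixed-size stub.
// It cuts only at message boundaries, so the newest message is kept whole.
function compactAtBoundaries(msgs: Msg[], windowSize: number, summaryChars = 200): Msg[] {
  if (fits(msgs, windowSize) || msgs.length <= 1) return msgs;
  const newest = msgs[msgs.length - 1];
  const summary: Msg = {
    role: "user",
    text: "[summary of earlier turns]".padEnd(summaryChars, "."),
  };
  return [summary, newest];
}

// Many mid-sized turns overflow: compaction recovers.
const manySmall: Msg[] = [
  { role: "user", text: "a".repeat(150_000) },
  { role: "assistant", text: "b".repeat(150_000) },
  { role: "user", text: "what changed?" },
];
console.log(fits(compactAtBoundaries(manySmall, 200_000), 200_000)); // true

// One oversized injected-history message: compaction cannot shrink it.
const oneHuge: Msg[] = [
  { role: "assistant", text: "earlier reply" },
  { role: "user", text: "<conversation_history>" + "x".repeat(800_000) },
];
console.log(fits(compactAtBoundaries(oneHuge, 200_000), 200_000)); // false
```

The second case is the #3852 shape: however aggressively the other turns are summarized, the single newest message alone is larger than the window.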

Fix. The context-pruning.ts extension already runs in every chat and pipe session. Its context hook is pi's transformContext slot: it runs before every LLM call and the array it returns is what actually gets sent. The hook now clamps any single message whose text exceeds 50% of the model's real ctx.model.contextWindow. For the injected history block it drops the oldest turns and always preserves the user's actual message; other oversized payloads keep head and tail. This complements built-in compaction (which then handles cross-message totals) rather than duplicating it.
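Here is a minimal sketch of the per-message clamp under stated assumptions. Only the 50% share comes from the PR. The 4-characters-per-token estimate, the 128k fallback window (mirroring the "zero window falls back to default" test case, with an invented value), the truncation marker, and every name (`messageBudgetChars`, `clampHeadTail`, `clampMessage`, `contextHook`, the `Message`/`Block` types) are hypothetical; pi's real hook signature and message types are not reproduced. Clamping each text block independently is a simplification.

```typescript
// Sketch only: pi's real hook, message types, and constants are not shown here.
const MAX_SHARE = 0.5; // from the PR: clamp anything above 50% of the window
const CHARS_PER_TOKEN = 4; // assumption: rough chars-per-token estimate
const DEFAULT_CONTEXT_WINDOW = 128_000; // assumption: fallback when window is 0/unknown
const MARKER = "\n[... truncated ...]\n";

type Block = { type: "text"; text: string } | { type: "image"; data: string };
interface Message {
  role: string;
  content: string | Block[];
}

// Convert the model's token window into a per-message character budget.
function messageBudgetChars(contextWindow: number | undefined): number {
  const windowTokens =
    contextWindow !== undefined && contextWindow > 0 ? contextWindow : DEFAULT_CONTEXT_WINDOW;
  return Math.floor(windowTokens * MAX_SHARE * CHARS_PER_TOKEN);
}

// Generic payloads keep head and tail; the middle is replaced by a marker.
// The result is exactly `budget` characters long when clamping happens.
function clampHeadTail(text: string, budget: number): string {
  if (text.length <= budget) return text;
  if (budget <= MARKER.length) return text.slice(0, budget);
  const keep = budget - MARKER.length;
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  return text.slice(0, head) + MARKER + text.slice(text.length - tail);
}

// Clamp text content; non-text blocks (e.g. images) pass through untouched.
function clampMessage(msg: Message, budget: number): Message {
  if (typeof msg.content === "string") {
    return { ...msg, content: clampHeadTail(msg.content, budget) };
  }
  return {
    ...msg,
    content: msg.content.map((b): Block =>
      b.type === "text" ? { ...b, text: clampHeadTail(b.text, budget) } : b,
    ),
  };
}

// Hypothetical hook shape: the returned array is what would be sent to the model.
function contextHook(
  messages: Message[],
  contextWindow: number | undefined,
): { messages: Message[] } {
  const budget = messageBudgetChars(contextWindow);
  return { messages: messages.map((m) => clampMessage(m, budget)) };
}
```

Keeping both ends of a generic payload preserves the start of a pasted document or tool output and its final lines, which are usually the most informative parts; cross-message totals are still left to built-in compaction.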

```mermaid
flowchart LR
  A["chat injects last 40 turns as one user message (#3636)"] --> C["pi context hook, before every LLM call"]
  C --> D{"any single message > 50% of window?"}
  D -->|"before #3852"| E["built-in compaction can't split one message"] --> F["413 prompt too long"]
  D -->|"after"| G["clamp it: keep recent turns + the question"] --> H["compaction handles the rest, no 413"]
```
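For the injected history block, the fix drops the oldest turns and always preserves the user's actual message. The sketch below assumes a layout the PR does not spell out: turns separated by a blank line inside `<conversation_history>...</conversation_history>`, with the user's message after the closing tag. `TURN_SEP`, `headTail`, and `clampHistoryMessage` are hypothetical names, and the budget is in characters. Malformed blocks (unterminated, or close tag before open tag) fall back to a generic head+tail clamp, as the follow-up tests describe.

```typescript
// Sketch under an assumed layout: "<conversation_history>" + turns joined by
// TURN_SEP + "</conversation_history>" + the user's actual message.
const OPEN = "<conversation_history>";
const CLOSE = "</conversation_history>";
const TURN_SEP = "\n\n"; // assumption: how turns are separated inside the block

// Generic fallback: keep head and tail, drop the middle.
function headTail(text: string, budget: number): string {
  const marker = "\n[... truncated ...]\n";
  if (text.length <= budget) return text;
  if (budget <= marker.length) return text.slice(0, budget);
  const keep = budget - marker.length;
  const head = Math.ceil(keep / 2);
  return text.slice(0, head) + marker + text.slice(text.length - (keep - head));
}

// Drop the oldest turns until the message fits; never touch the user's message.
function clampHistoryMessage(text: string, budget: number): string {
  if (text.length <= budget) return text;
  const open = text.indexOf(OPEN);
  const close = text.indexOf(CLOSE);
  // Malformed block (missing tag, or close before open): generic clamp.
  if (open === -1 || close === -1 || close < open) return headTail(text, budget);

  const before = text.slice(0, open);
  const question = text.slice(close + CLOSE.length);
  const turns = text.slice(open + OPEN.length, close).split(TURN_SEP);

  // Walk from newest to oldest, keeping turns while the total stays in budget.
  let used = before.length + OPEN.length + CLOSE.length + question.length;
  const kept: string[] = [];
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = turns[i].length + (kept.length > 0 ? TURN_SEP.length : 0);
    if (used + cost > budget) break;
    kept.unshift(turns[i]);
    used += cost;
  }
  // If the question alone exceeds the budget, it is still kept whole.
  return before + OPEN + kept.join(TURN_SEP) + CLOSE + question;
}
```

Dropping whole turns from the oldest end keeps the retained history readable and in order, and because the question sits outside the block it is never truncated by this path.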

This replaces an earlier attempt that bounded the block on the frontend with a character budget. Per review that was the wrong layer (it bypassed pi's compaction and was a blunt char clamp), so it is reverted: standalone-chat.tsx is back to baseline and the chat-context-budget.ts helper is removed.

2. Clear-cache button label (closes #3589)

The card title, description, and confirm dialog all say "clear cache"; the button said "scan". Changed it to "clear".

Verification

  • bunx vitest run (22 tests, all green): the tests drive the real shipped extension handlers, imported via a new @screenpipe-ext vitest alias so there is no ported copy to drift. They cover the pure helpers (including a throwing getContextUsage), the context handler end to end (empty/short no-op, the ~800k-char #3852 repro bounded under the window with the question preserved, small-window models, old-tool-result pruning still intact, combined cases), and the tool_result handler.
  • next build compiles and type-checks clean (the one warning is a pre-existing unpdf dependency note, unrelated).
  • Honest limitation. The wdio e2e suite mocks pi (no real subprocess), so it cannot exercise this extension; the handler-level tests above are the coverage. e2e still covers the frontend <conversation_history> injection contract (#3636), which the revert leaves unchanged.

Scope

Other live issues from the triage (transcription backlog / proxy resets #3850, model 404s #3786, timezone scheduling #3851) need staging or manual verification and are tracked separately.

🤖 Generated with Claude Code

Louis Beaumont and others added 3 commits June 5, 2026 09:53
…window (#3852)

Chat injected the last 40 turns into every prompt (the #3636 always-inject
contract) with no size budget, and never clamped individual turn text. A long
chat, a huge pasted message, or a big tool result could push the assembled
prompt past the model's context window and hard-fail with
`413 ... prompt is too long: 206134 tokens > 200000 maximum` instead of
degrading gracefully.

Extract the (previously duplicated) history-block assembly into a single,
unit-tested helper that budgets the injected history in characters against the
preset's existing `maxContextChars` setting: it drops the oldest turns first
and clamps any single oversized turn, keeping the most recent context. Both
the queued path and the main send path now route through it.

The #3636 contract is preserved (recent history is always injected when
present); only its size is now bounded. Verified by 13 unit tests including a
repro of the ~800k-char history that produced the 413.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The Clear Cache card, its description, and the confirm dialog all say
"clear cache", but the button itself said "scan" — the lone inconsistency
the issue reports. The button is the sole entry point to the clear flow
(it scans to build the deletion preview, then the dialog confirms and
deletes), so "clear" matches the standard pattern and the surrounding copy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…end (#3852)

Moves the #3852 fix out of the frontend and into pi's own context/compaction
layer, where it can be token-accurate and where context management belongs.

Root cause: pi's built-in compaction summarizes ACROSS messages but cuts at
message boundaries, so it can never shrink a SINGLE message that is itself
larger than the context window. The chat re-injects recent history as one big
<conversation_history> user message every send (#3636); on a long chat that
single message overflows and hard-fails with `413 prompt is too long`, and the
overflow-retry cannot recover.

The context-pruning extension (already installed in every chat + pipe session)
now clamps any single oversized message to <=50% of the model's real
ctx.model.contextWindow, inside its `context` hook (pi's transformContext, run
before every LLM call; returning {messages} replaces what is sent). For the
injected history block it drops the oldest turns and always preserves the
user's actual message; generic huge payloads keep head + tail. This complements
built-in compaction instead of duplicating it.

Reverts the earlier frontend char-budget (standalone-chat.tsx back to baseline;
deletes lib/chat-context-budget.ts and its test) so context management lives in
one place.

Tests: 22 cases drive the real shipped extension handlers (imported via a new
@screenpipe-ext vitest alias, so no ported copy can drift) across edge cases —
the ~800k-char #3852 repro bounded under the window with the question intact,
small-window models, old-tool-result pruning, and tool_result feedback. The
wdio e2e suite mocks pi (no subprocess) so it cannot exercise the extension;
these handler tests are the coverage. next build compiles cleanly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@louis030195 louis030195 changed the title fix(chat): budget injected conversation history to the context window (#3852) + clear-cache label (#3589) fix: bound chat context in pi's compaction layer (#3852) + clear-cache label (#3589) Jun 5, 2026
@louis030195

Copy link
Copy Markdown
Collaborator Author

@Anshgrover23 @divanshu-go can anyone test that the compaction works and gives as good a UX as codex/claude across pipes and chat?

louis030195 merged commit e128ba1 into main Jun 7, 2026
18 checks passed
louis030195 pushed a commit that referenced this pull request Jun 7, 2026
Strengthens the unit suites for the three Tier-1 fixes just merged:

- media-file-path (#3845): malformed percent-escapes, no-extension /
  empty / backtick inputs, case-insensitive ext, Windows forward-slash
  path in text, Unix audio-chunk path with spaces+parens; audio/media
  classifier casing; already-wrapped markdown link, > escaping, Windows
  path in a markdown link.
- context-pruning (#3854): malformed <conversation_history> blocks
  (unterminated, close-before-open) fall back to generic head+tail;
  multiple oversized text blocks clamped while non-text blocks untouched;
  non-string/non-array message content ignored; zero window from
  getContextUsage falls back to default.
- notification-toggle (#3794): notify rule detected across a blank line
  in the deny block; enable keeps permissions+allow children when only
  the notify deny rule existed; deny block retained when a non-notify
  rule remains.

Test-only. Full vitest suite green (486 tests). These functions are pure
modules; the wdio e2e suite mocks pi and drives whole-app UI, so the
deterministic edge-case layer for them is vitest (the context-pruning
suite is already documented as the extension's real e2e coverage).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>