fix: preserve context on compression failures#26051
Draft
Enragedsaturday wants to merge 173 commits into
Draft
Conversation
(cherry picked from commit f139fd79bb1ab40cdbbfd84586451619f9e67512)
15 tasks
added 8 commits
May 15, 2026 19:11
(cherry picked from commit f139fd79bb1ab40cdbbfd84586451619f9e67512) (cherry picked from commit d803138)
Keep Galt gateway coordination fixes, Discord progress helpers, FTD/Hindsight planning artifacts, and regression coverage before upstream integration.
…26-05-21 # Conflicts: # agent/codex_responses_adapter.py
(cherry picked from commit f139fd79bb1ab40cdbbfd84586451619f9e67512) (cherry picked from commit d803138)
(cherry picked from commit 9b5136b)
Keep Galt gateway coordination fixes, Discord progress helpers, FTD/Hindsight planning artifacts, and regression coverage before upstream integration. (cherry picked from commit ea4489e)
(cherry picked from commit ccd2b84)
(cherry picked from commit 80445a1)
(cherry picked from commit a7e85db)
(cherry picked from commit f3c70a5)
Add an official, production-grade WhatsApp integration via Meta's Business Cloud API as a complement to the existing Baileys bridge. No bridge subprocess, no QR codes, no account-ban risk — at the cost of a Meta Business account and a public HTTPS webhook URL. Setup is fully wizard-driven: 'hermes whatsapp-cloud' walks through every credential with paste-time validation (catches the NousResearch#1 trap of pasting a phone number into the Phone Number ID field), generates a verify token, and ends with copy-paste instructions for the cloudflared / Meta-dashboard / Business Manager pieces that can't be automated. The wizard also points users at Meta's Business Manager for setting the bot's display name and profile picture. Feature set: - Inbound: text, images (with native-vision routing), voice notes (STT), documents (small text inlined, larger cached), reply context. - Outbound: text with WhatsApp-flavored markdown conversion, images, videos, documents, opus voice notes via ffmpeg with MP3 fallback. - Native interactive buttons for clarify, dangerous-command approval, and slash-command confirmation flows — matches the Telegram / Discord UX, graceful degrades to plain text. - Read receipts (blue double-checkmarks) and typing indicator, using Meta's combined endpoint so they fire in a single API call. - Webhook security: X-Hub-Signature-256 HMAC verification (raw body, constant-time), wamid deduplication, group-shaped-message refusal (groups deferred to v2 — Baileys still covers them). - Full integration with the gateway's session, cron, display-tier, prompt-hint, and auth-allowlist systems. Cloud and Baileys can run side-by-side against different phone numbers. Also wires STT (speech-to-text) through Nous's managed audio gateway for Nous subscribers — previously the default stt.provider=local required a separate faster-whisper install. New subscribers now get voice-note transcription out of the box. Docs: 418-line user guide at website/docs/user-guide/messaging/ whatsapp-cloud.md, sidebar entry, environment-variables reference, ADDING_A_PLATFORM.md updated with the optional interactive-UX contract for future adapter authors. Tests: 100 dedicated tests for the adapter, 32 for the setup wizard, 20 for the Nous subscription STT wiring, plus regression coverage across display_config, prompt_builder, and the cron scheduler. Known limitations (deferred until clear demand signal): - Group chats — use the Baileys bridge if you need them. - Message templates for 24-hour-window outside-conversation sends — reactive chat is unaffected; cron / delegate_task with gaps > 24h will fail with a clear error. The agent's system prompt warns the model about this so it knows to mention it when scheduling delayed messages.
…dalone summaries regardless of role When the compression summary lands as an assistant-role message (head ends with user), the end marker was not appended. Models may regurgitate the summary text as their own visible output when there's no clear boundary signal (NousResearch#33256). The end marker was already appended for user-role summaries (NousResearch#11475, NousResearch#14521) but the assistant-role path was missed in the original fix. This ensures ALL standalone summary messages carry the boundary marker, preventing summary text from leaking into user-visible chat output.
…ip it on rehydration Follow-up to the NousResearch#33346 cherry-pick: - the marker string was duplicated at both insertion sites (standalone + merged-into-tail); hoist to a module constant - _strip_summary_prefix now also strips a trailing end marker so a rehydrated handoff body doesn't leak the boundary directive into the iterative-update summarizer prompt (it is re-appended on insertion)
…-clone-all (NousResearch#45246) --clone-all copied the source profile's state.db, sessions/, backups/, state-snapshots/, and checkpoints/ into the new profile. These are per-profile history: a 49GB copy in practice (15GB snapshots + 11GB backup archives + 16GB state.db + 6.4GB sessions), and restoring a copied backup inside the clone would resurrect the SOURCE profile's state. A clone is a fresh workspace; history stays with the source. New _CLONE_ALL_HISTORY_EXCLUDE_ROOT set, applied at root level for ANY source profile (named profiles accumulate the same artifacts), unlike the default-gated infrastructure excludes. Nested same-name dirs still copy. Docs and the post-create CLI message updated to match; profile export / hermes backup remain the full-history paths.
…summary + label handoffs in WebUI (NousResearch#29824) Two-pronged fix for the WebUI "context compaction block in place of last assistant response" regression. Agent layer (the real fix). ``_find_tail_cut_by_tokens`` already had ``_ensure_last_user_message_in_tail`` to keep the most recent user request out of the compressed middle (NousResearch#10896), but no symmetric anchor for the assistant side. When the conversation has an oversized recent tool result or a long stretch of tool-call/result pairs *after* the assistant's last visible reply, the token-budget walk can stop with the previously-visible reply on the wrong side of ``cut_idx``. The summariser then rolls it into the single ``[CONTEXT COMPACTION — REFERENCE ONLY]`` block persisted as ``role="user"`` or ``role="assistant"``, and from the operator's perspective the WebUI session viewer (``web/src/pages/SessionsPage.tsx``) and the TUI chat panel both suddenly show the opaque "Context compaction" block in the slot where they were just reading the actual answer: User: "i cant see the output of the last message you sent, i did see it previously, however now see 'context compaction'" Added ``_ensure_last_assistant_message_in_tail`` mirror of the user-side anchor. It looks for the most recent assistant message with non-empty text content (skipping tool-call-only assistant "stubs" which the UI renders as small "calling tool X" indicators rather than a readable bubble) and walks ``cut_idx`` back through the standard ``_align_boundary_backward`` so we don't split a tool_call/result group that immediately precedes it. The two anchors are chained — each only walks ``cut_idx`` backward, so the tail can only grow. Falls back to "most recent assistant of any kind" only when no content-bearing reply exists in the compressible region (fresh multi-step tool sequence with no prior reply) — in that case the agent-side fix is effectively a no-op and the existing user-message anchor carries the load. WebUI layer (clarity). Added ``isCompactionMessage`` detector that recognises the ``[CONTEXT COMPACTION — REFERENCE ONLY]`` (current) and ``[CONTEXT SUMMARY]:`` (legacy) prefixes from ``agent/context_compressor.py``, and a new ``compaction`` entry in ``MessageBubble``'s ``ROLE_STYLES`` map. Compaction blocks now render as muted, italicised system-style rows labelled ``Context handoff`` — clearly metadata, not the assistant's actual reply — so an operator scrolling back through a long session can't mistake the summary for a real answer. Keeping the detected prefixes inline (rather than importing them) because the WebUI bundle has no Python interop. A guardrail comment points readers at the source-of-truth constants in ``agent/context_compressor.py``.
…own bubble (NousResearch#29824) The compressor has a "double-collision" fallback path: when the chosen ``summary_role`` collides with the first tail message AND the flipped role would collide with the last head message, it can't emit a standalone summary turn (consecutive same-role messages break Anthropic and friends). It instead prepends the summary + end-of-summary marker to the first tail message's content via ``_merge_summary_into_tail``. With the matching anchor from the previous commit, that first tail message is now usually the user's previously-visible assistant reply — so the persisted assistant turn ends up shaped as ``[CONTEXT COMPACTION ...] ... --- END OF CONTEXT SUMMARY --- ... THE ACTUAL REPLY``. Without splitting it, the session viewer renders one big "Context handoff" bubble and the reply text is buried inside the metadata blob — which is exactly the "can't see the last reply" experience NousResearch#29824 reports, just one layer deeper. Added ``splitCompactionContent`` that detects the merge marker (kept in sync with ``--- END OF CONTEXT SUMMARY — respond to the message below, not the summary above ---`` in ``agent/context_compressor.py``) and ``MessageBubble`` now recurses on the two halves: the prefix half renders as the muted "Context handoff" row, the remainder half renders with the original assistant styling. Pure (non-merged) summary messages hit the no-remainder branch and still render as a single "Context handoff" row, preserving the original behaviour.
…paction rollup (NousResearch#29824) 21 cases pinning the new ``_ensure_last_assistant_message_in_tail`` anchor and its interaction with the existing tail-cut path: * ``TestFindLastAssistantMessageIdx`` — helper contract: prefers a content-bearing assistant message, skips ``tool_calls``-only stubs, multimodal text-block content counts, falls back to "any assistant" when no content-bearing reply exists, honours ``head_end``, returns -1 when there's none. * ``TestEnsureLastAssistantMessageInTail`` — direct: no-op when already in the tail, walks ``cut_idx`` back when the reply is in the compressed middle, never crosses into the head region, re-aligns through a preceding ``tool_call`` / ``tool_result`` group instead of orphaning it. * ``TestFindTailCutByTokensAnchorsAssistant`` — integration: reporter repro (long tool-output run after the visible reply) now preserves the reply; user and assistant anchors compose in a single tail-cut call; a soft-ceiling-overrunning oversized tool result no longer strands the prior reply. * ``TestCompactionRollupReproduction`` — end-to-end through ``compress()`` with a stubbed ``_generate_summary``: the visible reply text survives either as its own standalone assistant message (normal path) or concatenated onto the merged summary tail (double-collision path the WebUI then re-splits). The standalone-summary case is asserted strictly (exactly one summary row, exactly one separate assistant row carrying the reply) — that's the dominant path and any drift there reintroduces the original bug. * ``TestSourceGuardrail`` — static asserts on ``agent/context_compressor.py``: the helper exists, the anchor is wired into ``_find_tail_cut_by_tokens`` AFTER the user-message anchor (so chaining is monotonic), the content-bearing preference is preserved, and the issue number is referenced so future bisects can find this fix.
…te (NousResearch#45247) Profiles created before NousResearch#44792 have no .env. Now that the Channels/Keys endpoints are profile-scoped (no os.environ fallback), those profiles would show everything as unconfigured. hermes update now copies the default install's .env into each named profile that lacks one (0600, never overwrites, placeholder fallback when the root has no .env), so existing users keep the credentials they were effectively running with.
…ton (NousResearch#45263) Strict sticky-bottom autoscroll for the chat thread: while the viewport is parked at the bottom, the tail follows content growth (streaming tokens, late measurement, Shiki re-highlight) via a useLayoutEffect keyed on the virtualizer's own size signal, pinned in the same pre-paint pass as its scrollToFn so the two never rubber-band. The gate is a single boolean — one upward pixel (scroll/wheel/touch) disarms follow until the user returns to the bottom. Adds a floating jump-to-bottom control that appears once scrolled ~10px away (above the dim threshold so a sub-pixel settle never flashes it), positioned above the composer with respect to the status stack, with a subtle scale + slide in/out animation that honours prefers-reduced-motion. The button bridges to the virtualizer's re-arm + pin path through a small nanostore emitter. Supersedes NousResearch#43624.
…fixes Group recents as parent-repo → worktree → sessions using local git metadata (probed over IPC, with a path-name heuristic fallback for remote backends). Single-worktree repos collapse to one level. Sessions order by creation time and never reshuffle on new messages. Also: fuse the status stack to the composer border, restore icon actions in the queue panel, fix sidebar label truncation and drag styling, hide sticky-message attachments while pinned, and bump the terminal font.
Mirror the session row: the repo/worktree header's leading glyph (repo mark, or a new git-branch mark for worktrees) swaps to a grabber on hover/drag instead of carrying a separate handle on the right — freeing header width for the label and + button.
…he cursor Follow-up to the NousResearch#44837 clamp: a min() clamp only fixes cursor overshoot past the new end of the list. When repair_message_sequence drops/merges messages at indexes below the cursor, the clamp leaves the cursor pointing past unflushed rows and the turn-end flush silently skips them. Extract repair_message_sequence_with_cursor(): snapshot the flushed prefix by object identity before repair, then recompute the cursor as the count of surviving flushed messages. Falls back to the clamp when no snapshot is available. Keeps the safety guard in _flush_messages_to_session_db. Adds targeted tests for overshoot, before-cursor compaction, no-repair, bare-agent, and the flush guard.
… rows The repo and worktree header rows were ~identical after the handle move. Fold them into one WorkspaceHeader (emphasis flag for the repo level) plus a small WorkspaceAddButton, so the toggle/handle/count/+ wiring lives in one place.
…NousResearch#38389) Summary messages (standalone insertion and merge-into-tail) now carry a metadata flag so frontends (CLI, Desktop, gateway, TUI) can distinguish them from real assistant/user messages without content-prefix heuristics. Re-applied from PR NousResearch#38434 onto current main (conflicted with the _SUMMARY_END_MARKER hoist). Key renamed from the PR's 'is_compressed_summary' to '_compressed_summary': the wire sanitizers strip underscore-prefixed message keys, so the flag stays in-process and can never reach strict gateways (Fireworks/Mistral/Kimi reject unknown keys with 'Extra inputs are not permitted').
…rderableList Every reorderable surface (repos, worktrees, sessions, pins) now drops in a single ReorderableList that owns its own DndContext, so a drag only ever collides with that list's own items — nesting "just works" without leaking into the lists around or inside it. This replaces the shared DndContext + id-prefix dispatch (parent:/group:) whose closestCenter collisions resolved to a different-typed droppable and silently no-op'd worktree/repo drags. - Delete groupDndId/parentDndId/parse* helpers and the monolithic handleAgentDragEnd/handlePinnedDragEnd; each list persists its new id order via a direct typed write (reorderParents/reorderWorktree/reorderSessions/ reorderPinned). - Sessions inside repos/worktrees are date-ordered and static (no drag), matching the "never reorder on new messages" rule. - Add setPinnedSessionOrder; drop now-unused reorderPinnedSession.
The terminal looked soft/heavy on every platform because the xterm Terminal was built with allowTransparency: true, which drops the WebGL renderer's opaque fast-path and bakes glyphs as grayscale-alpha coverage for compositing over a see-through canvas. Our surface (--ui-bg-chrome) is opaque and withSurface already paints it, so transparency was pure blur for no benefit — VS Code keeps it off too. Also drop the Medium (500) base weight for normal/bold (400/700) to match VS Code's metrics, and remove the now-unused JetBrains Mono Medium face + woff2.
…idebar-workspace-dedup
…w user bubble Streaming auto-follow chased content growth while parked at the bottom, which rubber-banded — the tail pin and the virtualizer's own measurement adjustments fought for scrollTop. Drop it; the one-time new-turn jump already lands a fresh message in view and the viewport stays put after. Attachments rendered inside the editable user bubble and were collapsed via an IntersectionObserver + [data-stuck] CSS hack while the bubble was pinned. Render them as a flow sibling BELOW the sticky bubble instead, so they scroll away behind it naturally — no observer, no collapse. Image refs still render as thumbnails, file refs as chips; no border. Removes the now-unused useStuckToTop hook and its CSS.
…rkspace-dedup feat(desktop): worktree-aware sidebar grouping + composer/sidebar UX fixes
During a token stream $messages is replaced ~30x/s. Subscribing the whole chat view to it re-rendered the composer, runtime boundary, and every message on every delta. - Derive coarse facts (empty thread? tail is user?) via nanostores `computed` atoms so per-token flushes don't re-render their consumers. - Move the $messages subscription + runtime wiring into a dedicated ChatRuntimeBoundary; the composer reads $messages imperatively. - Drive message rows off stable useAuiState selectors and a lazy getMessageText getter instead of eagerly materialized text. - Feed ResizeObserver entry sizes into measureClamp / FadeText and dedupe the style writes, killing the read-write-read reflow cascade.
Re-parsing the full message markdown every reveal frame is O(N^2) over a long answer and dominated stream CPU. - Throttle useSmoothReveal commits to ~1 frame (REVEAL_MIN_COMMIT_MS). - Memoize block parsing with an LRU keyed on source text so only changed blocks re-parse. - Replace Streamdown's full-text parseIncompleteMarkdown with a tail-bounded remend: scan to the last top-level boundary outside fences/math and repair only the trailing open block. New remend-tail.ts is proven render-equivalent to full remend at every streaming prefix (remend-tail.test.ts), minus an intentional, documented divergence on cross-block dangling openers.
- Resume: fire the REST transcript prefetch and the session.resume RPC in parallel, and skip the redundant message conversion + reconciliation when the prefetch already hydrated the transcript. - Haptics: web-haptics builds its AudioContext lazily on first trigger, paying the ~850ms CoreAudio spin-up on the first streamStart haptic as the first token paints. Open/close a throwaway context at idle so the real one connects to an already-warm audio service.
Adding remend changed package-lock.json, so the flake's pinned npm deps hash went stale and `nix flake check` failed. Bump it to match.
* perf(desktop): isolate streaming re-renders & cut layout thrash During a token stream $messages is replaced ~30x/s. Subscribing the whole chat view to it re-rendered the composer, runtime boundary, and every message on every delta. - Derive coarse facts (empty thread? tail is user?) via nanostores `computed` atoms so per-token flushes don't re-render their consumers. - Move the $messages subscription + runtime wiring into a dedicated ChatRuntimeBoundary; the composer reads $messages imperatively. - Drive message rows off stable useAuiState selectors and a lazy getMessageText getter instead of eagerly materialized text. - Feed ResizeObserver entry sizes into measureClamp / FadeText and dedupe the style writes, killing the read-write-read reflow cascade. * perf(desktop): incremental markdown rendering during streams Re-parsing the full message markdown every reveal frame is O(N^2) over a long answer and dominated stream CPU. - Throttle useSmoothReveal commits to ~1 frame (REVEAL_MIN_COMMIT_MS). - Memoize block parsing with an LRU keyed on source text so only changed blocks re-parse. - Replace Streamdown's full-text parseIncompleteMarkdown with a tail-bounded remend: scan to the last top-level boundary outside fences/math and repair only the trailing open block. New remend-tail.ts is proven render-equivalent to full remend at every streaming prefix (remend-tail.test.ts), minus an intentional, documented divergence on cross-block dangling openers. * perf(desktop): faster session resume & warm AudioContext at idle - Resume: fire the REST transcript prefetch and the session.resume RPC in parallel, and skip the redundant message conversion + reconciliation when the prefetch already hydrated the transcript. - Haptics: web-haptics builds its AudioContext lazily on first trigger, paying the ~850ms CoreAudio spin-up on the first streamStart haptic as the first token paints. Open/close a throwaway context at idle so the real one connects to an already-warm audio service.
…re (NousResearch#45354) The diffusion placeholder read `--dt-*` tokens via `getComputedStyle().getPropertyValue()`, but those resolve through `var()` chains into `color-mix(in srgb, …)` — returned verbatim and unparseable, so every token fell to a hardcoded light fallback (white card). In dark mode the placeholder rendered as a white square. Resolve each token through a throwaway probe element's `color` so the browser computes it to a concrete color, and teach `parseColor` Chromium's `color(srgb r g b / a)` serialization. Re-resolve on theme repaint via a MutationObserver rather than per animation frame.
# Conflicts: # .github/workflows/tests.yml # agent/context_compressor.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
openai-codexand preserve replayablecompaction_summaryitems.Test plan
python -m pytest tests/agent/test_auxiliary_client.py tests/agent/test_context_compressor.py tests/run_agent/test_run_agent_codex_responses.py -q→ 300 passed locally.Notes