Skip to content

fix(context): live mid-turn context bar + visible auto-compression status#126

Merged
OmarB97 merged 2 commits into
mainfrom
fix/live-context-estimate
Jun 9, 2026
Merged

fix(context): live mid-turn context bar + visible auto-compression status#126
OmarB97 merged 2 commits into
mainfrom
fix/live-context-estimate

Conversation

@OmarB97

@OmarB97 OmarB97 commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Why

Two halves of the "live compression / streaming updates mid-session" bug. (1) The desktop context bar froze during long tool batches: tool.start/complete events carried context_compressor.last_prompt_tokens, a snapshot from the previous API call that never moves while tool results accumulate, so the bar only jumped at the next API call. (2) Auto-compression was invisible in the desktop app: the agent and gateway already emit status.update lifecycle events for preflight compaction, but the desktop event handler dropped them on the floor — sessions just looked hung for the duration.

What changed

  • agent/context_compressor.py: track last_prompt_messages_len alongside last_prompt_tokens; new live_context_tokens(messages) returns the provider-exact base plus a rough estimate of only the messages appended since the snapshot, with max(base, rough) fallback and a post-compression guard that prefers the fresh rough estimate while awaiting_real_usage_after_compression.
  • agent/conversation_loop.py: record the messages-length snapshot at every last_prompt_tokens write site (real-usage update + both preflight seeds); emit a "📦 Compression complete: ~X → ~Y tokens." lifecycle status after the preflight compaction loop.
  • agent/tool_executor.py: the four copy-pasted usage-dict blocks collapse into one _context_usage_for_tool_events() helper that uses the live estimate (net −60 lines).
  • apps/desktop: new $sessionActivityStatus atom + status.update handling in use-message-stream.ts (kinds lifecycle/compressing/process/status; ready and any stream activity clear it); statusbar renders the transient status with a spinner; GatewayEventPayload gains kind.
  • cli.py: stale standalone-CLI compression threshold default 0.50 → 0.85, matching hermes_cli/config.py:1125.

How to review

  1. agent/context_compressor.pylive_context_tokens (the core math) and the update_from_response signature change.
  2. agent/tool_executor.py — confirm the helper is behavior-identical to the four removed blocks apart from the live estimate.
  3. agent/conversation_loop.py — the three snapshot writes and the compression-complete emit.
  4. Desktop: store/session.tsuse-message-stream.tsuse-statusbar-items.tsx in that order; the new tests in use-message-stream.test.tsx document the lifecycle.

Evidence

  • New tests pin the previously-broken behavior: tests/agent/test_live_context_estimate.py::TestLiveContextTokens::test_prices_tail_appended_since_snapshot and test_grows_monotonically_as_results_append fail on main's frozen-snapshot behavior by construction.
  • Desktop: use-message-stream.test.tsx "status.update events" suite (3 tests) covers surface/ignore/clear.

Verification

  • tests/agent/test_live_context_estimate.py — 9 passed.
  • tests/agent/test_turn_context.py test_compression_logging_session_context.py test_context_engine.py — 26 passed; tests/tui_gateway/test_protocol.py — 59 passed; tests/agent/test_tool_dispatch_helpers.py test_tool_guardrails.py test_display_tool_failure.py — 64 passed.
  • Desktop vitest on use-message-stream.test.tsx — 5 passed (3 new). Full desktop suite: 444 passed / 12 failed — the 12 failures and 22 tsc -b errors are byte-identical on clean origin/main (verified via stash baseline; pre-existing yolo-session store exports missing, tracked separately).
  • tests/agent/test_context_compressor.py hangs in this sandbox on a network-bound OpenRouter live-fetch test — reproduced identically on clean origin/main with changes stashed, environmental not regression.

Risks / gaps

  • Rough tail estimate (~4 chars/token) can over/undercount vs provider tokenizers — low risk, it only feeds the display and a real update_from_response corrects it at the next API call.
  • Pre-existing fork-main desktop breakage (missing $desktopYoloDefault store exports → 22 tsc errors, 12 vitest failures) is untouched here; task filed/linked in the mesh backlog (hermes-desktop-yolo-store-exports).
  • codex_runtime.py calls update_from_response without messages_len and keeps fallback behavior — accepted scope, the codex app-server runtime tracks context via its own turn metadata.

Collaborators

  • @OmarB97 (operator)
  • Claude Fable 5 (Claude Code)

… desktop statusbar

The context bar froze during long tool batches: tool.start/complete events
carried context_compressor.last_prompt_tokens, a snapshot from the previous
API call that never moves while tool results pile up mid-turn.

- agent/context_compressor.py: track last_prompt_messages_len alongside
  last_prompt_tokens; new live_context_tokens(messages) prices the tail
  appended since the snapshot on top of the provider-exact base, falls back
  to max(base, rough) and prefers the rough estimate right after compression
- agent/conversation_loop.py: record the snapshot at every
  last_prompt_tokens write (real usage + both preflight seeds); emit a
  'Compression complete' lifecycle status after preflight compaction
- agent/tool_executor.py: extract the four copy-pasted usage blocks into
  _context_usage_for_tool_events() which uses the live estimate
- apps/desktop: handle status.update gateway events (previously dropped on
  the floor) in a new $sessionActivityStatus atom, cleared by the next
  stream activity; statusbar shows the transient status (auto-compression
  progress finally visible mid-session); GatewayEventPayload gains 'kind'
- cli.py: stale standalone-CLI compression threshold default 0.50 -> 0.85,
  matching hermes_cli/config.py
- tests: 9 new Python tests (live estimate math, snapshot bookkeeping,
  tool-event payloads), 3 new desktop tests (status.update lifecycle)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

🔎 Lint report: fix/live-context-estimate vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 10543 on HEAD, 10542 on base (🆕 +1)

🆕 New issues (1):

Rule Count
unresolved-import 1
First entries
tests/agent/test_live_context_estimate.py:11: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`

✅ Fixed issues: none

Unchanged: 5536 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

…s.io in AUTHOR_MAP

Commit 4351acb raised the default compression threshold to 0.85 but left
the two constructor-default tests asserting 0.50, red on main since June 8.
Update them to the live default and add a dedicated small-context case so
the 64K floor stays exercised. Also map omar@kostudios.io -> OmarB97 so
check-attribution passes on direct (non-merge) commits.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@OmarB97 OmarB97 merged commit a8d7e0b into main Jun 9, 2026
16 of 22 checks passed
OmarB97 pushed a commit that referenced this pull request Jun 10, 2026
… fork consolidation; finish fork-feature ports

Per-cluster restoration with the test suite as the oracle, after comparing
the merged tree's failures against a pristine-upstream run in the same
environment (14 file-level deltas, now zero):

- gateway/run.py: upstream wholesale (fork's monolith had undone the mixin
  decomposition; both real fork deltas re-applied — voice_ack_callback
  **kwargs; the custom-providers context-length fix exists upstream).
- agent/conversation_loop.py + turn_context.py: upstream structure with the
  fork features regrafted at their new homes — sender_device attribution
  (#131), preflight token-usage emission + compression-complete status and
  live-estimate snapshots (#126).
- agent/chat_completion_helpers.py: upstream wholesale (brings the second
  partial-stream-stub routing site and the NousResearch#6600 cancellation fix).
- agent/tool_executor.py: usage= kwarg on tool start/complete callbacks now
  falls back to the bare 3-arg form for legacy receivers.
- tools/approval.py: upstream's resolved-HERMES_HOME rewrite + normalize
  steps restored alongside the fork's self-host kill guard (#128).
- hermes_cli/main.py: desktop install-identity stale-build cluster and the
  post-subcommand global-flag hoister ported from fork main.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant