Bugfix rollup (2026-06-10 session): cron, state DB, agent loop, MCP bridge, gateway, desktop, Windows #44061

Open
AIalliAI wants to merge 82 commits into NousResearch:main from AIalliAI:bugfixes/claude-2026-06-10

@AIalliAI AIalliAI commented Jun 11, 2026

What does this PR do?

Rollup of the bugfixes from my 2026-06-10/11 debugging session — ~35 fixes spanning the cron scheduler, state DB, agent loop, MCP bridge, runners, gateway, CLI, Windows support, and the Desktop app.

Most fixes are also submitted as focused standalone PRs (cross-referenced below) so they can be reviewed and landed independently. This branch exists for anyone who wants the whole batch at once; if the standalone PRs land first, the corresponding commits here become no-ops and I'll keep the branch rebased. Merged with current main (post-#43956 per-job-profile revert) and conflict-free.

Related Issue

Addresses #44030, #44035, #44100, #44116, #44117, #44119, #44135, #44150 (each carries a Fixes tag in its standalone PR).

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

Cron / scheduling

  • _jobs_file_lock now covers create_job/update_job/remove_job (lost updates vs scheduler threads); malformed next_run_at no longer crashes every tick; repeat-limit auto-delete no longer leaks output dirs
  • _EnvMutationGate: workdir jobs no longer overlap parallel-pool jobs that read the same process-global state
  • [SILENT] marker only suppresses delivery when leading/trailing, not mid-report; cron hint no longer mislabeled as user instruction in the skills path
  • HERMES_CRON_SESSION is now a per-job contextvar instead of a sticky process-wide env var that flipped approval semantics for later interactive sessions
  • Desktop ticker defers to the gateway scheduler owner (fix(cron): defer desktop ticker to gateway scheduler owner #44049)

State DB / persistence

Agent loop / providers

MCP bridge / runners

  • WAL-mode commits no longer stall the event bridge; incremental tail fetch instead of full-transcript rehydration per tick; mini_swe_runner reasoning capture, max-iterations warning, default model id; batch_runner exit codes; trajectory_compressor connection-pool leak

Gateway / CLI / Desktop / Windows

How to Test

  1. pytest tests/ -q
  2. Targeted suites: pytest tests/cron tests/hermes_cli tests/gateway tests/tools -q
  3. Each standalone PR linked above has its own focused repro/verification steps.

Checklist

Code

  • I've read the Contributing Guide
  • I searched for existing PRs — overlapping standalone PRs are intentionally cross-referenced above
  • My PR contains only changes related to this fix/feature (bugfix-session rollup; no unrelated commits)
  • I've run pytest tests/ -q — 6078 passed; the 34 failures in tests/tools/ fail identically on current main on this machine (environment-specific: PulseAudio/Docker/file-tool platform assumptions)
  • I've added tests for my changes
  • I've tested on my platform: macOS (Apple silicon)

Documentation & Housekeeping

  • Relevant docs/docstrings updated where behavior changed — or N/A
  • No config keys added/changed — N/A
  • No architecture/workflow changes — N/A
  • Cross-platform impact considered (includes Windows console-flash fixes)
  • Tool descriptions/schemas — N/A

Aðalsteinn Helgason and others added 2 commits June 11, 2026 00:11
Correctness:
- cron/scheduler: [SILENT] marker matched anywhere in a report suppressed
  legitimate deliveries; now only leading/trailing markers suppress.
- cron/scheduler: cron hint was mislabeled as user instruction in the
  skills path (dead `if prompt:` condition).
- cron/scheduler + tools/approval + gateway/session_context:
  HERMES_CRON_SESSION was a process-wide env var set by the first cron job
  and never cleared, flipping approval semantics for all subsequent
  interactive gateway sessions in the same process. Now a per-job
  contextvar with env fallback for standalone schedulers.
- cron/jobs: create_job/update_job/remove_job bypassed _jobs_file_lock
  (lost updates against scheduler threads); one malformed next_run_at
  crashed every tick; repeat-limit auto-delete leaked output dirs.
- hermes_state: session_count missed the _branched_from clause
  (pagination never settles); three readers ran unlocked on the shared
  connection (dirty reads); soft-deleted (rewound) rows leaked into
  search context/anchored views; rewind used a SELECT+IN-list that breaks
  past SQLITE_MAX_VARIABLE_NUMBER.
- conversation_loop: genuine Nous 429 skipped the fallback guard entirely
  (retry_count = max_retries made the while condition false), so the
  fallback chain never ran.
- context_compressor: multimodal content was summarized as a Python repr
  full of base64, dropping the user's actual text.
- tool_executor: concurrent-tool heartbeat reported wrong tool names
  whenever an earlier call in the batch was blocked.
- mcp_serve: WAL-mode commits don't touch state.db mtime, so the event
  bridge missed new messages until a checkpoint; sessions.json change
  detection was a dead condition.
- mini_swe_runner: reasoning was never captured into trajectories
  (<think> path was dead code); false max-iterations warning; default
  model id unusable on the default OpenRouter route.
- batch_runner: fatal errors exited 0 under python-fire; mojibake.
- trajectory_compressor: per-request AsyncOpenAI client leaked connection
  pools; max_retries=0 returned None into trajectories; empty input
  crashed sampling.
- providers: a non-ImportError in one legacy module silently aborted
  discovery of all remaining providers.
- hermes_logging: setup_logging(force=True) couldn't change handler
  level/rotation; run_agent: raw print in cross-thread interrupt().
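
The HERMES_CRON_SESSION item in the list above could look roughly like the sketch below. It is a minimal illustration, not the code in this branch: the names `_cron_session`, `is_cron_session`, and `run_cron_job` are hypothetical, and treating the env value "1" as "on" is an assumption.

```python
import contextvars
import os

# Hypothetical names; the PR text does not show the real variable or helpers.
_cron_session = contextvars.ContextVar("hermes_cron_session", default=None)


def is_cron_session() -> bool:
    """Per-job flag first; the env var is only a fallback for standalone schedulers."""
    flag = _cron_session.get()
    if flag is not None:
        return flag
    # Assumption: "1" means enabled. The real accepted values are not shown in the PR.
    return os.environ.get("HERMES_CRON_SESSION", "") == "1"


def run_cron_job(job):
    """Run job() with the flag set, then restore the previous value."""
    token = _cron_session.set(True)
    try:
        return job()
    finally:
        # Resetting keeps the flag from leaking into later interactive sessions.
        _cron_session.reset(token)
```

Because the flag is reset in `finally`, a later interactive session in the same process sees the default again, which is the property the sticky env var lacked.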

Efficiency:
- prompt_caching: full deepcopy of the entire history (incl. base64
  images) per API call -> copy only the <=4 marked messages.
- mcp_serve: incremental tail fetch via get_messages(after_id=...) and
  deque event queue instead of full-transcript rehydration per 200ms tick.
- model_tools: tool-defs cache eviction was FIFO mislabeled as LRU.
- hermes_state: rewind/restore now single indexed UPDATEs.
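
For the FIFO-vs-LRU item above, the difference comes down to whether a cache hit refreshes the entry's position. The PR text does not show how the tool-defs cache is built, so the class below is a generic sketch of LRU eviction, not the model_tools code.

```python
from collections import OrderedDict


class LRUCache:
    """Generic LRU cache sketch (not the model_tools implementation)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # A hit moves the entry to the "most recent" end. A FIFO cache skips
        # this step, so hot entries are evicted purely by insertion order.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```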

Tests: updated EventBridge mocks for after_id, deque assert, and
per-loop async-client caching semantics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Desktop dashboard cron ticker shared only cron/.tick.lock with the
launchd gateway ticker, which gives at-most-once execution but not
deterministic execution provenance: whichever ticker won the lock ran
the job. On macOS, TCC / Full Disk Access depends on process ancestry,
so jobs that won under the dashboard backend lost access to protected
local data.

tick() now accepts defer_to_gateway_owner; the desktop ticker passes
True and skips execution while a gateway holds the per-profile runtime
lock (gateway.status.is_gateway_runtime_lock_active). With no gateway
running, desktop-only setups keep firing jobs as before.

Also includes the env-mutation gate: a reader-writer gate so workdir/
profile jobs (which mutate os.environ / Hermes home) never overlap
parallel-pool jobs reading that same process-global state mid-run.
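
A reader-writer gate of the kind described above can be sketched with a single condition variable. This is an illustration only: the class body, the `reading`/`mutating` method names, and the lack of writer priority are assumptions, not the `_EnvMutationGate` in this branch.

```python
import threading
from contextlib import contextmanager


class EnvMutationGate:
    """Sketch: many readers may run together; a mutating job runs alone."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def reading(self):
        # Parallel-pool jobs: wait only while a mutating job is active.
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def mutating(self):
        # Workdir jobs: wait until no reader and no other writer is active.
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()
```

This simple form can starve a writer under a steady stream of readers; whether the real gate guards against that is not stated in the PR.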

Tested on macOS: scripts/run_tests.sh over tests/cron, gateway status,
and web_server suites (252 tests passing).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@alt-glitch alt-glitch added invalid This doesn't seem right comp/agent Core agent loop, run_agent.py, prompt builder labels Jun 11, 2026
Adalsteinn Helgason and others added 14 commits June 11, 2026 07:53
… probes

The terminal tool's synchronous probe spawns (sudo NOPASSWD check, docker
and apptainer/singularity availability checks) called subprocess.run
without creationflags, so on Windows each one briefly pops a console
window that also steals focus. The main command-execution path already
passes windows_hide_flags() (tools/environments/local.py), but these
probe sites were missed.

Pass creationflags=windows_hide_flags() at each site. The helper returns
0 on non-Windows, so POSIX behavior is unchanged (creationflags=0 is the
subprocess default).
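
As a rough sketch of the pattern (the real `windows_hide_flags()` body is not shown in this PR, and `probe` is a hypothetical wrapper):

```python
import subprocess
import sys


def windows_hide_flags() -> int:
    """CREATE_NO_WINDOW on Windows, 0 elsewhere (sketch of the helper's contract)."""
    if sys.platform == "win32":
        return subprocess.CREATE_NO_WINDOW
    return 0


def probe(cmd):
    """Hypothetical wrapper: run a short availability probe without a console flash."""
    return subprocess.run(
        cmd,
        capture_output=True,
        text=True,
        timeout=10,
        # 0 on POSIX is the subprocess default, so behavior there is unchanged.
        creationflags=windows_hide_flags(),
    )
```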

Fixes NousResearch#43848

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ders

The Desktop model picker gates its Thinking toggle and effort dial on
the capabilities map from build_models_payload(capabilities=True), but
_apply_capabilities() answered a different question — "does this model
reason?" (models.dev) instead of "can Hermes steer its reasoning?".

On the direct DeepSeek route the runtime never sends reasoning controls
(no extra_body.reasoning / reasoning_effort / thinking toggle — see
run_agent._supports_reasoning_extra_body); thinking mode is fixed by
model choice (deepseek-chat vs deepseek-reasoner / deepseek-v4-flash).
models.dev still catalogs those models as reasoning-capable, so the
picker offered a placebo toggle and effort dial.

Report reasoning=False for such routes so the Desktop submenu hides the
controls (its existing gating), and skip the effort meta label on the
current-model row for the same reason.

Fixes NousResearch#44030

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rop from the transcript

Gateway delivered assistant responses to the platform but never persisted
them to the session DB, so the model saw consecutive "unanswered" user
messages and re-answered all of them on the next turn (NousResearch#44100).

Two layers, one invariant — a delivered final_response must end up in the
session transcript:

1. agent: the partial-stream recovery path (final message empty/thinking-
   only but content already streamed to the user) set final_response and
   broke out of the loop WITHOUT appending an assistant message. The
   turn-end _persist_session then wrote no assistant row — only the user
   message (persisted by the turn-start crash-resilience flush) survived.
   Append the recovered text as a real assistant turn before breaking.

2. gateway: state.db is the canonical transcript store (spec 002), so
   append_to_transcript(..., skip_db=True) is a complete no-op — the
   gateway's "fallback" writes could never backfill anything. When a
   turn's new messages contain no assistant text but a response was
   delivered, write the assistant row with skip_db=False. A response
   generated this turn cannot already be in the loaded history, so the
   NousResearch#860/NousResearch#42039 duplicate-write protection (which concerns the user entry
   and agent-flushed messages) is preserved — covered by regression
   tests.

Fixes NousResearch#44100

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…s status

show_status() read platform_registry.plugin_entries() without ever
running plugin discovery on the 'hermes status' code path, so the
registry was always empty and plugin-registered messaging platforms
were silently omitted. Apply the same idempotent discover_plugins()
pattern used by the other registry consumers (hermes_cli/gateway.py,
gateway/config.py, cron/scheduler.py).

Fixes NousResearch#44119

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
_runtime_model_config persisted the live agent's RESOLVED provider into
the session row's model_config JSON. For any named providers:/
custom_providers: entry, agent.provider is the literal string "custom",
so the entry name was lost (and the api_key is deliberately never
persisted). On session.resume or _reset_session_agent the stored
provider="custom" fed resolve_runtime_provider(requested="custom"),
which cannot match a named entry — the rebuild either raised "No LLM
provider configured" or silently resolved placeholder credentials
against the patched-back base_url.

Persist the REQUESTED/entry identity instead: a new reverse lookup
find_custom_provider_identity(base_url) maps the endpoint URL back to
the canonical custom:<name> menu key. _runtime_model_config stores that
key; _make_agent performs the same recovery for rows persisted before
the fix, falling back to passing the stored base_url as
explicit_base_url so the direct-alias branch still targets the
session's endpoint when no entry matches.
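
The reverse lookup can be pictured as below. This is a sketch under assumptions: the real `find_custom_provider_identity` takes only `base_url`, whereas this version receives the provider table as a second argument to stay self-contained, and the `{name: {"base_url": ...}}` config shape is hypothetical.

```python
def find_custom_provider_identity(base_url, custom_providers):
    """Map an endpoint URL back to its custom:<name> key, or None if no entry matches.

    Sketch only: the second parameter and the config shape are assumptions.
    """
    target = base_url.rstrip("/")
    for name, entry in custom_providers.items():
        if entry.get("base_url", "").rstrip("/") == target:
            return f"custom:{name}"
    return None
```

When this returns None, the commit falls back to passing the stored base_url as explicit_base_url, as described above.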

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When the gateway runs without a console on Windows (Desktop app, service),
every console child process (git probes, rg, hooks, pip, taskkill, backend
version probes) opens a visible conhost window that flickers and steals
focus on each agent call.

Pass creationflags=windows_hide_flags() (CREATE_NO_WINDOW; 0 on POSIX) to
the subprocess spawn sites reachable during an agent turn, following the
existing convention in tools/environments/local.py, tools/process_registry.py
and cron/scheduler.py.

Fixes NousResearch#43848

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The desktop sidebar lists every profile's sessions straight from each
profile's on-disk state.db (GET /api/profiles/sessions), but
deleteSession() only passed the owning profile to Electron's backend
router — never to the server. When the serving process is scoped to a
different profile than the desktop believes (the sticky active_profile
file, flipped by "Set as active" on the Profiles page, is honored on a
legacy launch with no --profile flag), the DELETE looks up the session
in the wrong profile's state.db and 404s with "Session not found".

Reads already work because getSessionMessages() passes ?profile=, and
rename works because it puts profile in the PATCH body — delete and
archive were the odd ones out. Align them:

- deleteSession(): append ?profile= to the path (the endpoint already
  accepts it), keeping request.profile for Electron's process routing.
- setSessionArchived(): include profile in the PATCH body, matching
  renameSession() (the endpoint already reads body.profile).
- delete_session_endpoint: resolve the session id via
  resolve_session_id() first, for parity with its GET/PATCH siblings.

Fixes NousResearch#44117

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…on host installs

The managed-files policy treated every non-local dashboard request as a
hosted (Docker) deployment and locked the Files tab to /opt/data. On a
plain host install reached over the LAN, that path doesn't exist (or
isn't readable by the dashboard user), so the Files tab returned a 500:

    {"detail": "Managed files root is unavailable: [Errno 13] Permission denied: '/opt/data'"}

The hosted layout is already detected directly via HERMES_HOME resolving
to /opt/data, so use only that signal to apply the /opt/data lock.
Remote and auth-gated requests on host installs keep a locked root —
the dashboard user's home — preserving the previous confinement posture
without pointing at a Docker-only path.

Fixes NousResearch#44116

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…he composer

A missed compositionend (focus jump, input-source switch, programmatic DOM
swap mid-preedit) left composingRef stuck true, and the stuck flag silently
swallowed every Enter in handleEditorKeyDown and every Send-button submit via
the form onSubmit guard — no error, no RPC, until the composer remounted. For
CJK IME users (where even ASCII typing runs through composition) this read as
"Enter has no effect; messages cannot be sent", degrading composer instance
by composer instance.

Recover in two places, both grounded in invariants Chromium guarantees:

- keydown: every keydown during a genuine composition carries
  isComposing=true, so when the native flag says we're not composing, clear
  the stale ref before the guard reads it.
- blur: a composition never survives focus loss, so clear the flag
  unconditionally — this is what unblocks the Send button path, which has no
  native composition flag to consult.

The genuine-IME protection (NousResearch#37483 class) is untouched: Enter with
isComposing=true is still swallowed.

Fixes NousResearch#44135

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…the subcommand

The stale-dashboard scan in _find_stale_dashboard_pids() matched fixed
substrings ("hermes dashboard", "hermes_cli.main dashboard", ...), so any
invocation with global options between the entrypoint and the subcommand —
e.g. `python -m hermes_cli.main --profile work dashboard --port 9119` —
was invisible to `hermes dashboard --status`, `--stop`, and the
post-update stale-backend cleanup. After `hermes update`, a
profile-scoped dashboard kept serving the old Python backend against the
new JS bundle.

Replace the substring patterns with a tokenized matcher that finds the
hermes entrypoint (binary, -m module, or script path) and then walks
known top-level flags — introspected from the real parser, the same way
hermes_cli.relaunch builds its inherited-flag table — until it hits the
subcommand. Unknown flags and free-text arguments bail out, so cmdlines
that merely mention "dashboard" (`hermes -z "fix my dashboard"`, which
the old substring match would have killed) are never matched.
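
A tokenized matcher along these lines might look like the sketch below. The flag tables are hypothetical placeholders (the commit introspects them from the real parser), the script-path entrypoint form is omitted, and `shlex.split` stands in for reading a process's argv.

```python
import os
import shlex

# Hypothetical flag tables; the commit derives these from the real argument parser.
FLAGS_WITH_VALUE = {"--profile"}
FLAGS_NO_VALUE = {"--verbose"}


def _after_entrypoint(tokens):
    """Return the tokens following the hermes entrypoint, or None if there is none."""
    for i, tok in enumerate(tokens):
        if os.path.basename(tok) == "hermes":
            return tokens[i + 1:]
        if tok == "-m" and tokens[i + 1:i + 2] == ["hermes_cli.main"]:
            return tokens[i + 2:]
    return None


def is_dashboard_cmdline(cmdline: str) -> bool:
    rest = _after_entrypoint(shlex.split(cmdline))
    if rest is None:
        return False
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok == "dashboard":
            return True
        if tok in FLAGS_WITH_VALUE:
            i += 2  # skip the flag and its value
        elif tok.split("=", 1)[0] in FLAGS_WITH_VALUE or tok in FLAGS_NO_VALUE:
            i += 1  # --flag=value form, or a boolean flag
        else:
            return False  # unknown flag or free text: bail out
    return False
```

Bailing out on the first unknown token is what keeps a free-text prompt that merely contains the word "dashboard" from matching.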

Fixes NousResearch#44035

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Arabic/Hebrew/Persian/Urdu text in the desktop chat rendered LTR,
scattering mixed RTL/LTR conversations. Resolve base direction per
paragraph from the first strong character (unicode-bidi: plaintext)
across assistant prose, user bubbles, and both composers, so RTL
paragraphs read and align right-to-left while English ones stay LTR.

Inline code and KaTeX output are pinned as isolated LTR runs so paths,
flags, and commands inside an RTL sentence keep their internal order;
fenced code blocks keep the document's LTR direction untouched.

Fixes NousResearch#44150

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Resolves conflicts in cron/jobs.py and cron/scheduler.py against the
per-job profile revert (NousResearch#43956): keeps the _jobs_file_lock CRUD locking,
the _EnvMutationGate cross-pool exclusion, and the gateway scheduler-owner
deferral from this branch, while dropping all per-job profile support to
match upstream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…cess call

_ensure_docker_available now passes creationflags=windows_hide_flags()
(0 on POSIX) so the Windows console-flash fix is covered by the exact
kwargs assertion instead of failing it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@AIalliAI AIalliAI changed the title from "Bugfixes/claude 2026 06 10" to "Bugfix rollup (2026-06-10 session): cron, state DB, agent loop, MCP bridge, gateway, desktop, Windows" Jun 11, 2026
Adalsteinn Helgason and others added 11 commits June 11, 2026 10:17
The 'uses widthOverride from the store when set' test renders its pane
without the resizable prop, but trackForPane only applies a stored
widthOverride to resizable panes — overrides are written exclusively by
the drag-resize handler, which non-resizable panes never render. The
test has expected 320px but received the declared 240px ever since it
and the gating landed together in the same commit (NousResearch#20059).

Add resizable to the pane so the override applies, and add an inverse
test pinning the intended behavior: a non-resizable pane keeps its
declared width even when a stale override exists in the store.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… in awake time (NousResearch#44183)

threading.Timer's wait elapses in wall-clock time on macOS (no
pthread_condattr_setclock), so a system sleep longer than the 20s
WS-orphan-reap grace made the timer fire at the instant of wake —
before Hermes Desktop's reconnect or session.resume could re-bind a
transport. Every >20s lid-close therefore 404'd the open chat.

Record a time.monotonic() deadline at schedule time (monotonic does not
advance while the host sleeps, so it measures awake time) and, when the
timer fires early relative to it, re-arm for the remainder instead of
reaping. The Desktop now gets the full grace of awake time after wake
to reconnect; genuinely orphaned sessions (browser refresh, NousResearch#38591) are
still reaped after 20s as before.
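
The re-arm logic can be sketched as follows. The function name and the injectable `clock`/`timer_factory` parameters are illustrative assumptions added for testability, not the signature used in this commit.

```python
import threading
import time


def schedule_reap(grace, reap, clock=time.monotonic, timer_factory=threading.Timer):
    """Call reap() only after `grace` seconds have elapsed on `clock`.

    Sketch only: parameter names are hypothetical.
    """
    deadline = clock() + grace

    def fire():
        remaining = deadline - clock()
        if remaining > 0:
            # The timer fired early relative to the deadline (for example at
            # wake after a system sleep): re-arm for the remainder.
            timer = timer_factory(remaining, fire)
            timer.daemon = True
            timer.start()
        else:
            reap()

    timer = timer_factory(grace, fire)
    timer.daemon = True
    timer.start()
    return timer
```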

Fixes NousResearch#44183

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…der base_url

The Anthropic SDK appends /v1/messages to base_url itself, so a custom
provider configured with api_mode: anthropic_messages and a gateway URL
like http://host:3001/v1 sent requests to /v1/v1/messages — a 404 that
surfaced only as an opaque auth/model-detection failure.

The opencode-zen/go and azure-foundry resolvers already normalise this
exact case; this extends the same strip to the three custom-provider
return paths (named custom_providers/providers entries, the custom
credential-pool path, and bare provider: custom with model.api_mode),
so entries sharing a multi-protocol gateway can all use the same /v1
base_url.
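
The strip itself is small. A minimal sketch, with a hypothetical function name:

```python
def normalize_anthropic_base_url(base_url: str) -> str:
    """Drop one trailing /v1 so the SDK's own /v1/messages suffix is not doubled.

    Hypothetical helper name; the resolver code in this PR is not shown.
    """
    url = base_url.rstrip("/")
    if url.endswith("/v1"):
        url = url[: -len("/v1")]
    return url
```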

Fixes NousResearch#44181

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ding

The /resume inline list (HermesCLI._show_recent_sessions) and the
`hermes sessions list` table (cmd_sessions) pad cells with str.format
width specs and truncate with str slices, both of which count
characters rather than terminal columns. CJK characters are
double-width, so any title or preview containing them shifted every
column to its right.

Add _disp_width/_clip_to_width/_pad_to_width helpers to
hermes_cli/main.py — the same wcwidth-based approach (and -1-clamping
convention) agent/markdown_tables.py already uses for this problem —
and use them in both renderers. wcwidth is already a transitive
guarantee via prompt_toolkit.

Output is byte-identical for pure-ASCII rows, and titles in the
/resume table still render unclipped (NousResearch#14082); only the preview is
clipped there, by columns instead of characters.
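
To illustrate column-based clipping and padding, the sketch below uses the standard library's `unicodedata.east_asian_width` in place of wcwidth, so it is an approximation of the approach rather than the helpers added in hermes_cli/main.py.

```python
import unicodedata


def disp_width(text):
    """Approximate terminal columns: wide/fullwidth chars count 2, combining marks 0."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def clip_to_width(text, max_cols):
    """Keep leading characters that fit in max_cols columns."""
    out, used = [], 0
    for ch in text:
        w = disp_width(ch)
        if used + w > max_cols:
            break
        out.append(ch)
        used += w
    return "".join(out)


def pad_to_width(text, cols):
    """Clip to cols columns, then right-pad with spaces to exactly cols columns."""
    clipped = clip_to_width(text, cols)
    return clipped + " " * (cols - disp_width(clipped))
```

A double-width character that would straddle the column limit is dropped and replaced by padding, so every cell ends on the same column.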

Fixes NousResearch#44199

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ions

When a background process started with notify_on_complete=true finishes,
the desktop app now fires a native OS notification (Windows toast /
macOS notification center) so the user doesn't have to keep the chat
window open to learn the result.

Gateway side: the notification poller's status.update (kind=process)
event now carries the structured fields (event_type, command, exit_code)
alongside the agent-facing formatted text, so GUI clients can render
their own surfaces without parsing the [IMPORTANT: ...] prose.

Desktop side: a new process-notifications store mirrors the gateway's
display.background_process_notifications semantics (all/result/error/off,
false → off, unknown → all) and builds the toast content; the message
stream hook routes the event through the existing hermes:notify IPC
bridge. The toast only fires when the user can't see the in-chat
message — window hidden, or the process belongs to a non-active chat.
Watch matches never toast (they can fire many times per process).

Fixes NousResearch#44201

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
before-pack.cjs deliberately wipes appOutDir (release/win-unpacked with
the live Hermes.exe) so packs always stage into a clean tree, but the
Electron download/extract runs after that wipe — so a pack that fails
(corrupt cached zip, blocked download, proxy timeout) destroyed the only
executable the user had, and the desktop shortcut became a dead link.
hermes update triggers exactly this rebuild after git pull, including
from the GUI's Update Now button.

Park the current unpacked app under release/.rebuild-backup/ (cheap
same-volume rename) before invoking the pack, restore it if every retry
fails (or if a zero-exit pack produces no launchable app), and discard
it once a fresh build exists. The holder directory sits outside both the
release/*-unpacked glob that _purge_electron_build_cache clears between
retries and the mac* glob _desktop_packaged_executable uses for
detection, so the parked copy can't be purged mid-retry or mistaken for
a fresh build. Best-effort: if the rename fails, behavior is exactly as
before.

Fixes NousResearch#44225

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…at the user saw

The transform_llm_output hook fired after _persist_session, and its
result was never written back into the assistant message. The user saw
the transformed text for the current turn, but result["messages"], the
JSON session log, and the SQLite session DB all kept the raw model
output — which was then replayed as context on the next turn or after a
session resume.

Move persistence after the transform_llm_output hook and sync the
transformed text into the turn's last assistant text message first.
The sync stops at the turn boundary and never rewrites a tool-call or
non-text message. Persistence stays before post_llm_call, which is
observability-only (its return value is ignored) and whose plugins may
read the session store expecting the completed turn to be present.
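
The sync step can be sketched as a backward walk that stops at the turn boundary. The OpenAI-style message dicts (`role`, `content`, `tool_calls`) and the function name are assumptions for illustration.

```python
def sync_transformed_text(messages, transformed):
    """Write `transformed` into the current turn's last assistant text message.

    Walks backward and stops at the turn boundary (the latest user message).
    Tool-call and non-text assistant messages are left untouched.
    Returns True if a message was updated. Sketch only; message shape is assumed.
    """
    for msg in reversed(messages):
        if msg.get("role") == "user":
            return False  # turn boundary reached without finding a text message
        if (
            msg.get("role") == "assistant"
            and isinstance(msg.get("content"), str)
            and not msg.get("tool_calls")
        ):
            msg["content"] = transformed
            return True
    return False
```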

Fixes NousResearch#44239

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The turn-start crash-resilience persist in build_turn_context calls
_apply_persist_user_message_override on the same messages list the API
request is later built from. When the ACP adapter sends a multimodal
prompt (text + image parts) it passes a text-only persist_user_message,
so the override replaced the content-part list with a plain string
before the first provider call — every image content block was dropped
end-to-end, regardless of model vision support.

Skip the override when the live content is a content-part list. The
synthetic-prefix cleanup the override exists for only applies to text
turns (voice prefix, ACP steering), and image redaction for the session
DB already happens in _flush_messages_to_session_db.
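
In sketch form, with a hypothetical function name and an assumed dict message shape, the guard amounts to:

```python
def apply_persist_override(message, persist_text):
    """Replace a text turn's content with the text to persist.

    Multimodal turns (content is a list of parts) are left untouched so image
    parts still reach the provider. Sketch only; names are hypothetical.
    """
    if persist_text is None:
        return message
    if isinstance(message.get("content"), list):
        return message
    message["content"] = persist_text
    return message
```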

Fixes NousResearch#44242

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Some CDN edges and middleboxes accept TLS 1.2 handshakes but kill TLS 1.3
ClientHellos, surfacing as [SSL: UNEXPECTED_EOF_WHILE_READING] ~15s into
every request while curl (OS TLS stack) works fine. NousResearch#44365 hit this on
Windows desktop against api.deepseek.com: TLS1.2-only connects in 0.33s,
TLS1.3 dies after 15s.

Setting HERMES_TLS_MAX_VERSION=1.2 now caps the handshake on the primary
chat client. The ssl context is applied to the keepalive HTTPTransport
directly (httpx ignores client-level verify when an explicit transport is
passed) and to the Client so internally-built proxy mounts inherit the
same cap. The context honors the existing CA-bundle overrides
(HERMES_CA_BUNDLE > REQUESTS_CA_BUNDLE > SSL_CERT_FILE). 1.0/1.1 are
deliberately rejected; invalid values log a warning and fall back to
defaults.
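
Building the capped context could look like the sketch below. The function name is hypothetical, accepting "1.3" as a valid value is an assumption, and wiring the context into the httpx transport and client is left out.

```python
import logging
import os
import ssl

logger = logging.getLogger(__name__)

# Assumption: "1.3" is accepted alongside "1.2"; 1.0 and 1.1 are rejected.
_ALLOWED = {"1.2": ssl.TLSVersion.TLSv1_2, "1.3": ssl.TLSVersion.TLSv1_3}


def build_ssl_context(env=os.environ):
    """Default verifying context, optionally capped by HERMES_TLS_MAX_VERSION."""
    # CA-bundle precedence as described above.
    cafile = (
        env.get("HERMES_CA_BUNDLE")
        or env.get("REQUESTS_CA_BUNDLE")
        or env.get("SSL_CERT_FILE")
    )
    ctx = ssl.create_default_context(cafile=cafile or None)
    raw = env.get("HERMES_TLS_MAX_VERSION", "").strip()
    if raw:
        version = _ALLOWED.get(raw)
        if version is None:
            # Invalid or rejected value: warn and keep the library defaults.
            logger.warning("Ignoring HERMES_TLS_MAX_VERSION=%r", raw)
        else:
            ctx.maximum_version = version
    return ctx
```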

Fixes NousResearch#44365

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
liuhao1024 and others added 2 commits June 12, 2026 10:22
Cherry-pick of upstream PR NousResearch#44801 (liuhao1024) — fixes the
test_notification_poller_emits_distinct_watch_matches_once CI flake
(our issue NousResearch#44789): poller threads leaked by earlier tests' _sessions.pop
teardowns kept consuming the global completion_queue. Drop when NousResearch#44801
merges upstream.
GitHub forces actions declaring node20 onto the node24 runtime by
default starting 2026-06-16, and removes node20 from runners on
2026-09-16. Every Tests/Lint run currently emits deprecation
annotations for the node20 pins.

This is a rebase of NousResearch#28333 (credit: daelnom-dev) onto current main.
Roughly half of that PR's bumps have since landed on main piecemeal
(checkout v6.0.2, setup-python v6.2.0, docker login v4.1.0,
build-push v7.1.0, upload/download-artifact v7/v8 in tests.yml,
sigstore v3.3.0, osv-scanner v2.3.8); this picks up the remainder,
keeping that PR's exact verified SHAs:

- astral-sh/setup-uv v5 + v6 -> v8.1.0 (node24)
- actions/upload-artifact v4 -> v7.0.1, download-artifact v4 -> v8.0.1
  (remaining lint/docker-publish/skills-index/pypi sites)
- actions/github-script v7 -> v9.0.0
- actions/setup-node v4 -> v6.4.0
- actions/create-github-app-token v1.9.3 -> v3.2.0 (inputs already
  use the v2+ hyphenated names)
- docker/setup-buildx-action v3 -> v4.0.0
- marocchino/sticky-pull-request-comment v2.9.1 -> v3.0.4
- actions/upload-pages-artifact v3 -> v5.0.0 + deploy-pages v4 ->
  v5.0.0 (documented compatible pair)
- cachix/cachix-action v17 re-tag SHA (tree-identical to current pin)
- comment fix: setup-python pin in lint.yml was already v6.2.0 but
  still labeled v5

Every new SHA was verified to match its tagged release commit in the
action's upstream repo, and each major bump's breaking changes were
checked against this repo's actual usage (inputs, outputs,
credential-dependent push flows in nix-lockfile-fix.yml, Pages staging in
deploy-site.yml, artifact name/pattern downloads) - no workflow
behavior changes required.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
AIalliAI and others added 6 commits June 12, 2026 11:16
Improve Telegram adapter rich-message support: switch the rich limit to a 32,768-character cap, add robust handling for missing-endpoint/capability errors (latch off rich sends), separate capability vs content BadRequest classification, and implement retry/flood-control/backoff logic for sendRichMessage. Add _rich_send_disabled flag and honor metadata.expect_edits to keep messages that will be edited on the legacy (editable) MarkdownV2 path. Carry reply parameters as reply_parameters, expose link_preview options, and ensure drafts only latch off on true capability failures (not per-frame parse errors). Propagate expect_edits from stream consumers and runner progress code so preview/status bubbles remain editable. Update tests to cover new behaviors and adjust docs (including i18n) to note the character limit, latching behavior, and flood-control semantics.
…d path

Parity with the legacy send() loop (dc4de14 / NousResearch#35664): an httpx pool
timeout means the request never left the process, so the rich path now
retries it in place and surfaces retryable=True instead of silently
dropping the message. Flagged in the NousResearch#44780 review; the PR branch
predates the legacy pool-timeout fix, so it could not mirror it.
Conflicts: hermes_state.py (took upstream's hoisted _LISTABLE_CHILD_SQL +
delegate filter), apps/desktop chat-messages.ts (union of status.update
payload fields), use-message-stream.ts (kept both upstream's status-stack
refresh and the NousResearch#44201 native toast).
teknium1 pushed a commit that referenced this pull request Jun 12, 2026
… runs

The genuine-rate-limit branch set retry_count = max_retries before
continue, intending the top-of-loop Nous guard to handle fallback or
bail cleanly. But the loop condition is retry_count < max_retries, so
the guard never ran: no fallback activation, no clean rate-limit
message — just the generic retry-exhaustion error.

Set retry_count = max(0, max_retries - 1) so the loop body runs exactly
once more and the guard sees the breaker state recorded moments earlier.
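
A stripped-down model of the loop shows why the off-by-one matters. The function and parameter names are hypothetical, and the real loop in conversation_loop has more branches than this sketch.

```python
def run_with_retries(max_retries, call, try_fallback):
    """Toy model of the retry loop; names are hypothetical."""
    retry_count = 0
    rate_limited = False
    while retry_count < max_retries:
        if rate_limited:
            # Top-of-loop guard: only reachable if the loop body runs again.
            return try_fallback()
        result = call()
        if result == "rate_limited":
            rate_limited = True
            # Setting retry_count = max_retries here would make the while
            # condition false, so the guard above would never run.
            retry_count = max(0, max_retries - 1)
            continue
        return result
    return "retries_exhausted"
```

With `max_retries=3` and a call that is rate limited, the loop re-enters once with `retry_count == 2` and the guard hands off to the fallback.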

Extracted from the #44061 bugfix rollup by @AIalliAI.
jhjaggars-hermes added a commit to jhjaggars/hermes-agent that referenced this pull request Jun 13, 2026
* feat(desktop): composer status stack, live subagent windows, editable prompts (NousResearch#44630)

* feat(desktop): session-scoped status stack + kill new-window theme flash

Stack subagents, background tasks, and the queue into one collapsible
"sink" above the composer, reusing the queue's chrome so every status
reads as one piece. Extracts shared StatusSection / StatusRow /
TerminalOutput primitives and a unified $statusItemsBySession store
(subagents mirrored, background owned here, merged + grouped for render).
Renames BrailleSpinner → GlyphSpinner now that it drives more than braille.

Separately, fix the white flash on every new/cmd-clicked window: macOS
`vibrancy` paints an NSVisualEffectView that follows the OS appearance and
ignores `backgroundColor`, so a dark app on a light-mode Mac flashed white
until the renderer painted over it. Pin `nativeTheme.themeSource` to the
app theme (persisted to userData so cold launches paint right before the
renderer loads), hold windows with `show:false` until `ready-to-show`, and
pre-paint the themed background via an inline script before the bundle runs.

* feat(desktop): dock the slash popover to the composer via one shared fill var

The slash·@ popover (and ? help) now docks onto the composer's edge with the
same chrome as the queue/status stack — rounded outer corners, fused borderless
edge, no shadow — but keeps its own narrow width.

Surface + drawer paint a single --composer-fill var; the state ladder
(rest / scrolled / focused / drawer-open) lives once in styles.css on
[data-slot='composer-root']. The :has() drawer-open rule is last and forces an
opaque fill, since translucent glass sampling different backdrops (thread vs
fade gradient) can never match. This replaces the focus-within !important
override that repainted the surface behind every previous matching attempt.

Also drop the chevron column from the project file tree — the folder open/closed
icon already carries the expand state.

* feat(desktop): base inset for file tree rows (post-chevron alignment)

* feat(desktop): wire the status stack's background tasks to the real process registry

The background group was UI-only (dev-mock seeded). Now it's live e2e:

- tui_gateway: new session-scoped `process.list` (registry snapshot filtered
  by the session's session_key, plus a 4KB output tail for the inline
  terminal viewer) and `process.kill` (single process, ownership-checked —
  unlike process.stop's kill_all).
- Renderer: `reconcileBackgroundProcesses` syncs snapshots into the store
  layout-stably — rows keep their position when state flips (never re-sort),
  new processes append, unchanged rows keep object identity so memoised rows
  skip re-rendering, and a dismissed-set stops the registry's retained
  finished procs from resurrecting X-ed rows.
- Refresh triggers: session open, terminal/process tool.complete,
  status.update(kind=process) from the gateway's notification poller, and a
  5s poll armed only while a running row is visible (catches silent exits).
- Stop = real `process.kill` + optimistic dismiss; Dismiss = client-side
  with resurrection guard.
- Re-keyed the stack to the RUNTIME session id: it was keyed by the stored
  session id, where neither subagent events nor process.list would ever land.
- Deleted dev-status-mocks.ts (__hermesStatusMocks); the stack no longer ships seeded mock data.

Reconcile invariants covered in store/composer-status.test.ts.
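
The reconcile rules listed above can be sketched roughly as follows. This is a minimal Python sketch, not the renderer's TypeScript: the row shape (`id`, `state`) is hypothetical, and dropping rows that vanish from the snapshot is an assumption made for illustration.

```python
def reconcile(current, snapshot, dismissed):
    """Merge a registry snapshot into existing rows without re-sorting them."""
    fresh_by_id = {proc["id"]: proc for proc in snapshot}
    merged = []
    seen = set()
    for row in current:
        fresh = fresh_by_id.get(row["id"])
        if fresh is None or row["id"] in dismissed:
            continue  # assumption: rows gone from the registry are dropped
        seen.add(row["id"])
        # Reuse the old object when nothing changed so memoised rows can skip work.
        merged.append(row if row == fresh else fresh)
    for proc in snapshot:
        if proc["id"] not in seen and proc["id"] not in dismissed:
            merged.append(proc)  # new processes append at the end
    return merged
```

Existing rows keep their order even when the snapshot lists them differently, and a dismissed id never reappears however long the registry retains it.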

* feat(desktop): todos + openable subagents in the status stack, self-healing file tree

- todo lists move out of the inline chat panel into the composer status stack
  (checklist icon, dashed ring = pending, spinner = in progress, check = done),
  fed live from todo tool events and seeded from history on session open
- subagent rows carry the child's real session id end-to-end
  (delegate_tool → gateway → renderer) so clicking one opens ITS session window
- status stack publishes its measured height so the thread's bottom clearance
  grows with it; card paints the shared --composer-fill so focused/scrolled
  states match the composer exactly
- file tree self-heals: ENOENT roots retry on a 3s cadence + Try again button,
  and the main process expands ~ in IPC paths (gateway cwds arrive as ~/...)
- composer drag-drop of tree entries inserts inline refs instead of attachments

* fix(desktop): file tree falls back to the workspace dir when a session's cwd is gone

Sessions record their launch cwd; deleted worktrees leave that path dead,
so opening such a session swapped the tree from the default workspace to a
directory that ENOENTs forever — the 3s retry just spun on it. On a root
read error the tree now asks main to sanitize the cwd (prefers the
configured default project dir), displays that fallback, and quietly
re-probes the original path so it switches back if the dir reappears.

* feat(desktop): working restore-checkpoint button on past user prompts

The discard icon on hover of a past user bubble was decorative — clicking
did nothing. It's now a real control: a confirmation dialog explains that
everything after the prompt is removed, then the session rewinds to that
turn and reruns the same prompt (prompt.submit with
truncate_before_user_ordinal, the same mechanism the edit composer uses).
Failures rethrow into the dialog's inline error instead of toasting.

* fix(desktop): show the restore-checkpoint button on the latest user prompt too

Restoring the most recent prompt is just 'retry this turn' — no reason to
exclude it. Stop still takes the slot while the turn is running.

* fix(desktop): finished todo lists clear themselves out of the status stack

A list whose every item is completed/cancelled lingers ~4s so the final
checkmark is visible, then the todo group drops out of the stack. A fresh
active list arriving within the linger cancels the scheduled clear.

* chore(desktop): drop dead editableCheckpoint copy, terser restore confirm

* fix(desktop): rewind clears the abandoned timeline's todos + background

Restoring to (or editing) an earlier prompt rewinds the conversation, but
the todos and background processes spawned by the now-discarded turns kept
showing in the status stack — and the real background processes kept
running. Both rewind paths now clear the session's todo rows and kill +
drop its background processes before the fresh run repopulates them. Also
drops the click-to-edit clamp transition, which flashed a half-expanded
bubble on the way into the edit composer.

* feat(desktop): user messages are always editable; edit/restore revert mid-stream

The bubble is now always click-to-edit — even while a turn streams — instead
of going inert during a run. Sending an edit acts like restore: it rewinds to
that prompt and re-runs with the new text. Both edit and restore can fire
mid-stream now; the gateway refuses prompt.submit while a turn runs (4009
"session busy"), so they interrupt the live turn first and retry the submit
until the cooperative interrupt winds it down. Restore (re-run as-is) shows on
every prompt except the latest running one, which keeps the Stop button.
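
A minimal sketch of the interrupt-then-retry flow described above, written in Python for illustration. `FakeGateway`, `SessionBusy`, and the `wait()` back-off hook are hypothetical stand-ins; only the 4009 "session busy" refusal comes from the commit message.

```python
class SessionBusy(Exception):
    """Stand-in for the gateway's 4009 'session busy' refusal."""
    code = 4009


class FakeGateway:
    """Hypothetical gateway that stays busy for a few polls after an interrupt."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls
        self.interrupted = False

    def interrupt(self):
        self.interrupted = True

    def wait(self):
        self.busy_polls -= 1  # the cooperative interrupt winds the turn down

    def submit(self, prompt):
        if not self.interrupted or self.busy_polls > 0:
            raise SessionBusy("session busy")
        return {"accepted": prompt}


def submit_after_interrupt(gateway, prompt, max_attempts=20):
    """Interrupt the live turn, then retry the submit until it is accepted."""
    gateway.interrupt()
    for _ in range(max_attempts):
        try:
            return gateway.submit(prompt)
        except SessionBusy:
            gateway.wait()
    raise SessionBusy("turn did not wind down in time")
```

Bounding the retries keeps a wedged turn from spinning forever; the real retry cadence is not stated in the commit.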

* fix(desktop): label preview-pane ⌘L selections with the filename, not "zsh"

The terminal owns a global ⌘/Ctrl+L "send selection to composer" shortcut, so
selecting text in the file preview pane and hitting it fell through to the
terminal handler — which imported the right text but labelled the composer ref
"zsh:N lines" off the shell name. When the selection isn't an xterm selection,
label it with the previewed file instead.

* fix(desktop): ⌘L on a preview line selection inserts the @line ref, like dragging

The source preview lets you select lines in the gutter and drag them into the
composer as an @line:path:start-end ref. ⌘/Ctrl+L now does the same when a line
selection is active — it drops the identical ref instead of falling through to
the terminal's global handler (which grabbed the native text selection and sent
a bogus terminal block). Capture-phase + stopPropagation so it wins; with a line
selection there's no native selection, so the terminal handler stays out of it.

* chore: gitignore apps/desktop/demo/ scratch output

The desktop demo prompt writes demo/*.txt during recorded walkthroughs; it's
throwaway, never part of the app. Ignore it so it stops cluttering git status.

* feat(desktop): subagent watch windows, hard stop, sidebar hygiene

Child-session mirror for live subagent windows, delegate sessions tagged
and excluded from the sidebar, composer focus/stop polish, and WS stall
resilience on the gateway transport.

* refactor: DRY delegate SQL + trim status-stack noise

Extract shared listable-child and delegate-delete helpers in hermes_state,
collapse cancelRun busy release, and cut comment bloat in resume/status paths.

* fix(desktop): hide orphaned subagent sessions in sidebar

Cascade-delete all ephemeral children on parent delete (not just tagged rows),
run v16 backfill to tag legacy orphans, and record new delegates as source=subagent.

* fix: restore orphan contract for untagged children + lazy session eviction

Cascade-delete only _delegate_from-tagged rows (v16 backfill covers legacy),
walk marker chains recursively with FK-safe orphaning, gate lazy watch
sessions out of the still-starting eviction exemption via an explicit flag,
pass session_id to _make_agent only when resuming, and hide source=subagent
from session search.

* fix(gateway): gate child mirror off upgraded sessions + age out stale run entries

Review findings: the mirror could interleave synthetic events with a real
native stream once a watch window upgrades (prompt.submit builds an agent),
and a lost subagent.complete left _active_child_runs pinning running=true
forever. Mirror now stops when the live session owns an agent; liveness
reads ignore entries older than an hour.

* fix(gateway): reject prompt.submit into a watch session while its child runs

A lazy watch session's running flag is False (the run lives in the parent
turn), so typing mid-run sailed past the busy guard and built a second agent
racing the in-flight child on the same stored session. Busy error until the
run completes; afterwards the submit upgrades into a normal conversation.

* refactor(gateway): DRY watch-resume payload + compose listable-child SQL

Fold the duplicated child-run busy overlay into one _reuse_live_payload
helper across both resume reuse paths, collapse the twin mirror early-returns,
and build _LISTABLE_CHILD_SQL from _BRANCH_CHILD_SQL instead of restating it.

* fix(desktop): clip horizontal overflow on sidebar scroll areas

Add overflow-x-hidden alongside overflow-y-auto on session list scrollers
and the shared SidebarContent primitive — vertical scroll unchanged.

* fix(desktop): new chat honours the active profile instead of rubberbanding to default (NousResearch#45057)

The top "New Session" button (and /new, the keyboard shortcut) cleared
$newChatProfile to null, meaning "use the live gateway context". But
createBackendSessionForSend turned a null into an omitted `profile` param on
session.create. In global-remote mode one backend serves every profile, so an
omitted profile silently binds the new chat to the launch (default) profile's
home/state.db — the session "rubberbands back to default" even though the rail
still shows the selected profile. The per-profile "+" worked because it sets
$newChatProfile explicitly.

Resolve a null $newChatProfile to the active gateway profile at the single
session-creation chokepoint so session.create always carries the live profile.
Harmless for single-profile and local-pooled users: a backend resolves its own
launch profile to None (_profile_home), so passing it changes nothing.

* docs(website): redirect old automation-templates URL to automation-blueprints

The Automation Blueprints rebrand (NousResearch#44470) renamed the guide page from
guides/automation-templates to guides/automation-blueprints, leaving the
old URL 404ing. The site deploys to static hosting, so server-side
redirects aren't available.

Add @docusaurus/plugin-client-redirects (pinned 3.9.2, same as the other
Docusaurus packages) and a redirect entry for the old slug. The plugin
emits a static HTML page at the old path that meta-refresh/JS-redirects
to the new page, preserving query string and hash, with a canonical link
for SEO. Localized routes are handled automatically (zh-Hans verified).

* feat(desktop): window translucency slider in Appearance settings (NousResearch#45086)

A see-through-window control (0–100, off by default) that maps to the
native window opacity via setOpacity — the desktop shows through the whole
window, the same effect as the Windows shift-scroll trick. macOS + Windows;
a no-op on Linux (no runtime window opacity).

Renderer owns the value (persisted, nanostore) and mirrors it to the main
process over IPC; main persists it to translucency.json so a cold launch
applies it at window creation before the renderer reports in.

* fix(ci): remove pytest-timeout, use per-file timeout only

fix(ci): write a new cache for test durations every time
change(ci): rip out error 4 retries because we found the real bug

* fix(tests): mock subprocess.Popen in all _handle_update_command tests

* fix(tests): guard against real 'hermes update' subprocess spawns in conftest

Extends _live_system_guard in tests/conftest.py to block any subprocess
call that would run 'hermes update' (or 'python -m hermes_cli.main update')
against the real checkout.

These commands run git fetch origin + git pull, overwriting repo files
like pyproject.toml mid-test-run and corrupting every subsequent
subprocess that reads them. The spawned process uses setsid /
start_new_session=True so it's invisible to pytest's process tree
(PPid=1) — the corruption was essentially undetectable without
explicit inotify/SHA watchdogs.

Root cause of NousResearch#43703 CI failures: tests in TestUpdateCommandPlatformGate
called _handle_update_command() with HERMES_MANAGED='' and no Popen mock,
causing the code to fall through and spawn a real 'hermes update --gateway'
that overwrote pyproject.toml with origin/main's content (which still
had '--timeout=30 --timeout-method=thread' in addopts while the PR had
already removed pytest-timeout).

The guard covers all three invocation patterns:
- 'hermes update' / 'hermes update --gateway' (direct or via setsid bash -c)
- 'python -m hermes_cli.main update --gateway'
- '.venv/bin/hermes update' (absolute path variant)

Does not false-positive on: git update-index, apt-get update,
pip install --upgrade, or any command lacking 'hermes'/'hermes_cli'.
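
One way such a matcher could look, as a hedged sketch: the function name and the regular expression are illustrative assumptions, not the actual `_live_system_guard` code. It accepts either an argv list or a shell string and covers the three invocation patterns listed above.

```python
import re

# "hermes" (optionally "hermes_cli.main") followed by the exact word "update".
# The lookbehind rejects names that merely end in "hermes"; the lookahead
# rejects longer words such as "update-index".
_UPDATE_RE = re.compile(r"(?<![\w.-])hermes(?:_cli\.main)?\s+update(?![\w-])")


def looks_like_hermes_update(cmd):
    """Return True if a subprocess command would run 'hermes update'."""
    text = cmd if isinstance(cmd, str) else " ".join(str(part) for part in cmd)
    return bool(_UPDATE_RE.search(text))
```

Joining argv into one string lets the same pattern catch `setsid bash -c "hermes update --gateway"`, where the command hides inside a single argument.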

* fix(tests): remove no-longer-needed forensics

* fix(ci): only save test durations when tests pass

The save-durations job used `if: always()` which meant it would
run even when the test matrix failed, potentially caching duration
data from a failed/incomplete run. Changed to check
needs.test.result == 'success' so durations are only cached when
all test slices pass cleanly.

* refactor(desktop): use port 0 for ephemeral port discovery instead of PortPool reservation

Replace the PortPool-based port reservation system (9120-9199 range) with OS-assigned ephemeral ports via --port 0.

Before: Desktop probed a hardcoded port range, reserved ports in-process to close TOCTOU races, and passed the chosen port to the dashboard via CLI arg.

After: Desktop spawns dashboard with --port 0, parses the actual port from a stdout announcement line (HERMES_DASHBOARD_READY port=<N>), and uses that for WebSocket connections.

Changes:
- web_server.py: add --port 0 support with SO_REUSEADDR pre-bind + announcement; add EADDRINUSE preflight for explicit ports
- main.cjs: remove PortPool, PORT_FLOOR/CEILING, pickPort(), isPortAvailable(); add waitForDashboardPort() stdout parser
- Delete port-pool.cjs and port-pool.test.cjs (106 lines removed)

Net effect: eliminates the entire TOCTOU-mitigation reservation infrastructure and arbitrary port range constraints. OS handles port allocation natively.
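
The port-0 handshake can be sketched in a few lines of Python. The helper names are hypothetical; the `HERMES_DASHBOARD_READY port=<N>` announcement format is taken from the description above, and the real parser lives in main.cjs rather than Python.

```python
import re
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind to port 0 so the OS picks a free port; return (socket, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, 0))
    sock.listen()
    return sock, sock.getsockname()[1]


def announce(port):
    """Line the server would print on stdout once it is listening."""
    return f"HERMES_DASHBOARD_READY port={port}"


def parse_ready_line(line):
    """Extract the port from an announcement line, or None for other output."""
    match = re.search(r"HERMES_DASHBOARD_READY port=(\d+)", line)
    return int(match.group(1)) if match else None
```

Because the socket is already bound when the port is announced, there is no gap between choosing the port and holding it, which is the race the old reservation pool worked around.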

* Update model correctly when updating from dashboard

* Update implementation to make it cleaner

* Skip redundant model switch

* fix(tui): config.yaml wins over env model seed in per-turn sync

Hosted instances set HERMES_INFERENCE_MODEL as a provision-time seed in
the container env. _config_model_target() previously went through
_resolve_model() (env-first), so on hosted VPS the sync target stayed
pinned to the seed and dashboard model changes never reached an open
chat -- the exact scenario the sync exists to fix. The sync target now
reads config.yaml first and only falls back to the env vars when config
has no model. Startup resolution (_resolve_model) is unchanged.
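
The two resolution orders can be contrasted with a small sketch. The function names mirror the ones mentioned above, but the bodies, the `model` config key, and the use of a single env var are assumptions for illustration.

```python
def resolve_startup_model(config, env):
    """Startup resolution: env seed first, then config (unchanged behaviour)."""
    return env.get("HERMES_INFERENCE_MODEL") or config.get("model")


def config_model_target(config, env):
    """Per-turn sync target: config.yaml first, env only as a fallback."""
    return config.get("model") or env.get("HERMES_INFERENCE_MODEL")
```

With a provision-time seed in the environment and a dashboard change written to config, the sync target follows the config value while startup still honours the seed.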

* Add Telegram Bot API 10.1 rich message support

Introduce opportunistic support for Telegram Bot API 10.1 rich messages: raw agent Markdown is sent via sendRichMessage and streaming previews via sendRichMessageDraft.

- gateway/platforms/telegram.py: rich fast path (RICH_MESSAGE_MAX_BYTES=32768), gated on platforms.telegram.extra.rich_messages and bot capability checks, with routing/thread handling.
- Conservative fallback rules: permanent/capability errors fall back to the legacy MarkdownV2 path; transient/network errors are surfaced without a legacy resend.
- Latch draft capability failures (_rich_draft_disabled) and preserve legacy chunking and draft behavior when needed.
- Update agent prompt hints (Telegram encourages rich Markdown/tables), add a CLI config example option, and update the English and Chinese docs to describe rich messages and fallbacks.
- Add/adjust tests for rich send and draft behavior.

* fix: rich messages follow-ups — reply_parameters, send latch, opt-in default

- Use reply_parameters per the sendRichMessage spec instead of the
  undocumented reply_to_message_id scalar (silently ignored -> reply
  anchor quietly dropped).
- Latch rich sends off after an endpoint-capability failure (old PTB /
  server without sendRichMessage) so every later reply doesn't pay a
  doomed extra roundtrip; per-message BadRequests do NOT latch.
- Default rich_messages to OFF (opt-in) while the day-old Bot API 10.1
  endpoint is validated live; revert the prompt-hint table guidance
  until the default flips on.
- Tests: reply_parameters shape, send-latch behavior, BadRequest
  non-latch; rich tests opt in explicitly via extra.

* fix(send): helpful error when --file gets a binary; document MEDIA: attachments (NousResearch#45116)

A user passing an image to `hermes send --file` got a raw
UnicodeDecodeError ('utf-8 codec can't decode byte 0x89...') with no
hint that media delivery goes through the MEDIA:<path> directive.

- send_cmd: catch UnicodeDecodeError separately and print a usage error
  explaining --file is for text bodies, with copy-pasteable MEDIA: and
  [[as_document]] examples using the user's own path
- --file help text + epilog now mention MEDIA:
- docs: new 'Sending images and other media' section on the hermes send
  reference page
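
A minimal sketch of the separate `UnicodeDecodeError` branch, assuming a hypothetical helper name and error wording; only the `MEDIA:<path>` directive and the "text bodies only" rule come from the commit message.

```python
from pathlib import Path


def read_text_body(path):
    """Return (body, None) for a text file or (None, hint) for a binary one."""
    try:
        return Path(path).read_text(encoding="utf-8"), None
    except UnicodeDecodeError:
        # Hypothetical wording; it points the user at the MEDIA: directive.
        hint = (
            f"--file expects a UTF-8 text body, but {path} looks binary. "
            f"To attach it, put MEDIA:{path} in the message instead."
        )
        return None, hint
```

Echoing the user's own path in the hint makes the suggested directive copy-pasteable.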

* fix: stop Discord typing after replies

* chore: add itsflownium to AUTHOR_MAP

* test: assert typing-stop-before-callback as an invariant, not a call count

The shared _stop_typing_refresh cleanup makes up to two bounded
stop_typing attempts; the old assertion pinned exactly one
typing-stopped event before callback-start.

* fix(agent): re-enter retry loop on genuine Nous 429 so fallback guard runs

The genuine-rate-limit branch set retry_count = max_retries before
continue, intending the top-of-loop Nous guard to handle fallback or
bail cleanly. But the loop condition is retry_count < max_retries, so
the guard never ran: no fallback activation, no clean rate-limit
message — just the generic retry-exhaustion error.

Set retry_count = max(0, max_retries - 1) so the loop body runs exactly
once more and the guard sees the breaker state recorded moments earlier.

Extracted from the NousResearch#44061 bugfix rollup by @AIalliAI.
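
The off-by-one is easy to reproduce in isolation. The loop below is a stripped-down sketch, not the agent's actual retry loop; `breaker_tripped` is a hypothetical stand-in for the recorded breaker state.

```python
def run_turn(max_retries, fixed):
    """Simulate one genuine 429 and report which exit path the loop takes."""
    retry_count = 0
    breaker_tripped = False
    while retry_count < max_retries:
        # Top-of-loop guard: reachable only while the loop condition holds.
        if breaker_tripped:
            return "guard: fallback or clean rate-limit message"
        # Pretend the API call came back with a genuine rate limit.
        breaker_tripped = True
        retry_count = max(0, max_retries - 1) if fixed else max_retries
        continue
    return "generic retry-exhaustion error"
```

With `fixed=False` the assignment makes the loop condition false immediately, so the guard is skipped; with `fixed=True` the body runs once more and the guard fires.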

* test: regression guard for Nous 429 fallback re-entry; AUTHOR_MAP entry

* fix(update): never spawn an interactive polkit prompt when restarting a system-scope gateway (NousResearch#45145)

When hermes update restarts a hermes-gateway system service as a
non-root user, the systemctl reset-failed/start/restart calls trigger
polkit's org.freedesktop.systemd1.manage-units TTY authentication
agent. That prompt runs inside a captured subprocess with a 10-15s
timeout, so it flashes and dies before the user can answer, and the
resulting TimeoutExpired was swallowed silently by the loop's blanket
except — the restart phase just vanished with no output.

- Resolve a manage-units command prefix up front: plain systemctl as
  root, sudo -n systemctl as non-root (with a targeted reset-failed
  probe so least-privilege sudoers entries scoped to hermes-gateway*
  qualify), or None when no non-interactive privilege path exists.
- Add --no-ask-password to every manage-units call in the update
  restart path so polkit can never prompt inside a captured subprocess.
- When unprivileged: after a graceful drain, rely on systemd's own
  RestartSec auto-restart (needs no privileges) with a message about
  the wait; skip the force-restart fallback with clear manual
  instructions instead of racing a doomed polkit prompt.
- Surface TimeoutExpired in the restart loop instead of passing
  silently, and add sudo to the system-scope recovery hints.
- Docs: headless-VM note recommending user service + enable-linger,
  or sudo updates / a scoped NOPASSWD sudoers entry for system
  services.

* fix(delegation): remove the default subagent wall-clock timeout (NousResearch#45149)

Subagents doing legitimate heavy work (deep code reviews, research
fan-outs, slow reasoning models) were routinely killed at the blanket
600s child_timeout_seconds cap while making steady progress (e.g. 36
API calls completed when the axe fell). Failures should come from what
the child is actually doing — API errors, tool errors, iteration
budget — not a delegation-level stopwatch.

- DEFAULT_CHILD_TIMEOUT: 600 -> None; Future.result(timeout=None)
  blocks until the child finishes
- config default delegation.child_timeout_seconds: 600 -> 0
  (0/negative = disabled; positive opts back in, floor 30s unchanged)
- stuck-child protection unchanged: the heartbeat staleness monitor
  still stops refreshing parent activity so the gateway inactivity
  timeout fires on a truly wedged worker; the 0-API-call diagnostic
  dump still works when a cap is configured
- docs updated (EN + zh-Hans)
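
The new timeout semantics could be normalised along these lines. The helper names are hypothetical; the 0/negative = disabled rule and the 30s floor come from the list above.

```python
from concurrent.futures import ThreadPoolExecutor


def effective_child_timeout(configured):
    """Map the config value to a Future.result() timeout (None = no cap)."""
    if configured is None or configured <= 0:
        return None  # disabled: block until the child finishes
    return max(30, configured)  # positive values opt back in, floor 30s


def run_child(work, configured_timeout):
    """Run `work` on a worker thread and wait with the effective timeout."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(work)
        return future.result(timeout=effective_child_timeout(configured_timeout))
```

`Future.result(timeout=None)` waits indefinitely, so stuck-child detection has to come from elsewhere, as the commit notes.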

* fix(dashboard): skill installs from the dashboard silently auto-cancel (NousResearch#45150)

The dashboard's /api/skills/hub/install (and the new-profile hub_skills
path) spawned `hermes skills install <id>` with stdin=DEVNULL but
without --yes. do_install()'s 'Confirm [y/N]' prompt hit EOF, defaulted
to 'n', and printed 'Installation cancelled.' into a background log the
user never sees — every dashboard install no-opped.

Pass --yes on both spawn sites, matching the uninstall endpoint which
already passed --yes. The dashboard install button is the explicit user
consent, same as the TUI/slash-command skip_confirm rationale.

Repro: spawned the exact argv with stdin=DEVNULL against a temp
HERMES_HOME — without --yes it cancels, with --yes the skill installs.
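
Why the prompt auto-cancelled can be shown with a toy confirm function. This is a sketch under assumptions: `confirm` and `install_argv` are hypothetical, and the position of `--yes` in the argv is illustrative.

```python
import io


def confirm(stream, assume_yes=False):
    """Mimic a 'Confirm [y/N]' prompt reading from the given input stream."""
    if assume_yes:
        return True
    answer = stream.readline()  # returns "" at EOF, as with stdin=DEVNULL
    return answer.strip().lower() in ("y", "yes")


def install_argv(skill_id):
    """Argv for a non-interactive install, with consent passed explicitly."""
    return ["hermes", "skills", "install", skill_id, "--yes"]
```

An empty stream stands in for `stdin=DEVNULL`: the read hits EOF, the answer defaults to "no", and only the explicit flag gets past the prompt.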

* fix(compression): always append END OF CONTEXT SUMMARY marker to standalone summaries regardless of role

When the compression summary lands as an assistant-role message (head ends
with user), the end marker was not appended. Models may regurgitate the
summary text as their own visible output when there's no clear boundary
signal (NousResearch#33256).

The end marker was already appended for user-role summaries (NousResearch#11475, NousResearch#14521)
but the assistant-role path was missed in the original fix. This ensures ALL
standalone summary messages carry the boundary marker, preventing summary
text from leaking into user-visible chat output.

* refactor(agent): hoist summary end marker to _SUMMARY_END_MARKER; strip it on rehydration

Follow-up to the NousResearch#33346 cherry-pick:
- the marker string was duplicated at both insertion sites (standalone +
  merged-into-tail); hoist to a module constant
- _strip_summary_prefix now also strips a trailing end marker so a
  rehydrated handoff body doesn't leak the boundary directive into the
  iterative-update summarizer prompt (it is re-appended on insertion)

* fix(profiles): exclude session history, backups, and snapshots from --clone-all (NousResearch#45246)

--clone-all copied the source profile's state.db, sessions/, backups/,
state-snapshots/, and checkpoints/ into the new profile. These are
per-profile history: a 49GB copy in practice (15GB snapshots + 11GB
backup archives + 16GB state.db + 6.4GB sessions), and restoring a
copied backup inside the clone would resurrect the SOURCE profile's
state. A clone is a fresh workspace; history stays with the source.

New _CLONE_ALL_HISTORY_EXCLUDE_ROOT set, applied at root level for ANY
source profile (named profiles accumulate the same artifacts), unlike
the default-gated infrastructure excludes. Nested same-name dirs still
copy. Docs and the post-create CLI message updated to match; profile
export / hermes backup remain the full-history paths.
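
Root-level-only exclusion maps naturally onto `shutil.copytree`'s `ignore` callback. A minimal sketch, assuming a hypothetical `clone_profile` helper; the excluded names are the ones listed above.

```python
import os
import shutil

HISTORY_EXCLUDE_ROOT = {
    "state.db", "sessions", "backups", "state-snapshots", "checkpoints",
}


def clone_profile(src, dst):
    """Copy a profile tree, skipping history artifacts at the root only."""
    src_root = os.path.abspath(src)

    def ignore(directory, names):
        # Only the top level is filtered; nested same-name dirs still copy.
        if os.path.abspath(directory) == src_root:
            return [name for name in names if name in HISTORY_EXCLUDE_ROOT]
        return []

    shutil.copytree(src_root, dst, ignore=ignore)
```

Comparing the visited directory against the source root is what keeps a nested `skills/sessions/` directory in the clone while the root `sessions/` is skipped.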

* fix(compressor): keep last visible assistant reply out of compaction summary + label handoffs in WebUI (NousResearch#29824)

Two-pronged fix for the WebUI "context compaction block in place of
last assistant response" regression.

Agent layer (the real fix). ``_find_tail_cut_by_tokens`` already had
``_ensure_last_user_message_in_tail`` to keep the most recent user
request out of the compressed middle (NousResearch#10896), but no symmetric
anchor for the assistant side. When the conversation has an
oversized recent tool result or a long stretch of tool-call/result
pairs *after* the assistant's last visible reply, the token-budget
walk can stop with the previously-visible reply on the wrong side
of ``cut_idx``. The summariser then rolls it into the single
``[CONTEXT COMPACTION — REFERENCE ONLY]`` block persisted as
``role="user"`` or ``role="assistant"``, and from the operator's
perspective the WebUI session viewer
(``web/src/pages/SessionsPage.tsx``) and the TUI chat panel both
suddenly show the opaque "Context compaction" block in the slot
where they were just reading the actual answer:

    User:  "i cant see the output of the last message you sent,
            i did see it previously, however now see 'context
            compaction'"

Added ``_ensure_last_assistant_message_in_tail`` as a mirror of the
user-side anchor. It looks for the most recent assistant message
with non-empty text content (skipping tool-call-only assistant
"stubs" which the UI renders as small "calling tool X" indicators
rather than a readable bubble) and walks ``cut_idx`` back through
the standard ``_align_boundary_backward`` so we don't split a
tool_call/result group that immediately precedes it. The two
anchors are chained — each only walks ``cut_idx`` backward, so
the tail can only grow.

Falls back to "most recent assistant of any kind" only when no
content-bearing reply exists in the compressible region (fresh
multi-step tool sequence with no prior reply) — in that case the
agent-side fix is effectively a no-op and the existing
user-message anchor carries the load.
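
A simplified sketch of the anchor, under stated assumptions: message content is a plain string (the real helper also counts multimodal text blocks), and the ``_align_boundary_backward`` step is omitted. The function names here are shortened, hypothetical versions of the ones described above.

```python
def find_last_assistant_idx(messages, head_end):
    """Prefer the latest assistant message with text; else any assistant; else -1."""
    fallback = -1
    for i in range(len(messages) - 1, head_end - 1, -1):
        msg = messages[i]
        if msg.get("role") != "assistant":
            continue
        if (msg.get("content") or "").strip():
            return i
        if fallback == -1:
            fallback = i  # tool-call-only stub, used only as a last resort
    return fallback


def ensure_last_assistant_in_tail(messages, cut_idx, head_end):
    """Move cut_idx backward so the last visible reply stays in the tail."""
    idx = find_last_assistant_idx(messages, head_end)
    if idx == -1 or idx >= cut_idx:
        return cut_idx  # nothing to anchor, or already in the tail
    return idx
```

The function only ever returns a value at or below `cut_idx`, which is why chaining it with the user-side anchor can only grow the tail.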

WebUI layer (clarity). Added an ``isCompactionMessage`` detector that
recognises the ``[CONTEXT COMPACTION — REFERENCE ONLY]`` (current)
and ``[CONTEXT SUMMARY]:`` (legacy) prefixes from
``agent/context_compressor.py``, and a new ``compaction`` entry
in ``MessageBubble``'s ``ROLE_STYLES`` map. Compaction blocks
now render as muted, italicised system-style rows labelled
``Context handoff`` — clearly metadata, not the assistant's
actual reply — so an operator scrolling back through a long
session can't mistake the summary for a real answer.

The detected prefixes are kept inline (rather than imported)
because the WebUI bundle has no Python interop. A guardrail comment
points readers at the source-of-truth constants in
``agent/context_compressor.py``.

* fix(webui): split merge-into-tail compaction so reply renders as its own bubble (NousResearch#29824)

The compressor has a "double-collision" fallback path: when the
chosen ``summary_role`` collides with the first tail message AND
the flipped role would collide with the last head message, it can't
emit a standalone summary turn (consecutive same-role messages
break Anthropic and friends). It instead prepends the summary +
end-of-summary marker to the first tail message's content via
``_merge_summary_into_tail``.

With the matching anchor from the previous commit, that first tail
message is now usually the user's previously-visible assistant
reply — so the persisted assistant turn ends up shaped as
``[CONTEXT COMPACTION ...] ... --- END OF CONTEXT SUMMARY --- ...
THE ACTUAL REPLY``. Without splitting it, the session viewer
renders one big "Context handoff" bubble and the reply text is
buried inside the metadata blob — which is exactly the
"can't see the last reply" experience NousResearch#29824 reports, just one
layer deeper.

Added ``splitCompactionContent`` that detects the merge marker
(kept in sync with ``--- END OF CONTEXT SUMMARY — respond to the
message below, not the summary above ---`` in
``agent/context_compressor.py``) and ``MessageBubble`` now
recurses on the two halves: the prefix half renders as the muted
"Context handoff" row, the remainder half renders with the
original assistant styling. Pure (non-merged) summary messages
hit the no-remainder branch and still render as a single
"Context handoff" row, preserving the original behaviour.

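The split itself amounts to partitioning the content at the marker. The sketch below is in Python rather than the WebUI's TypeScript, uses an abbreviated stand-in for the marker text, and assumes the caller has already identified the message as a compaction block.

```python
MERGE_MARKER = "--- END OF CONTEXT SUMMARY ---"  # abbreviated stand-in


def split_compaction_content(content):
    """Return (summary_part, remainder); remainder is "" for pure summaries."""
    head, sep, rest = content.partition(MERGE_MARKER)
    if not sep:
        return content, ""  # no marker: render as a single handoff row
    return head + sep, rest.strip()
```

A merged message yields a non-empty remainder that can be rendered with the original assistant styling, while a standalone summary falls into the no-remainder branch.
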
* test(compressor): regression coverage for assistant-tail anchor + compaction rollup (NousResearch#29824)

21 cases pinning the new ``_ensure_last_assistant_message_in_tail``
anchor and its interaction with the existing tail-cut path:

* ``TestFindLastAssistantMessageIdx`` — helper contract: prefers a
  content-bearing assistant message, skips ``tool_calls``-only
  stubs, multimodal text-block content counts, falls back to
  "any assistant" when no content-bearing reply exists, honours
  ``head_end``, returns -1 when there's none.

* ``TestEnsureLastAssistantMessageInTail`` — direct: no-op when
  already in the tail, walks ``cut_idx`` back when the reply is
  in the compressed middle, never crosses into the head region,
  re-aligns through a preceding ``tool_call`` / ``tool_result``
  group instead of orphaning it.

* ``TestFindTailCutByTokensAnchorsAssistant`` — integration:
  reporter repro (long tool-output run after the visible reply)
  now preserves the reply; user and assistant anchors compose
  in a single tail-cut call; a soft-ceiling-overrunning oversized
  tool result no longer strands the prior reply.

* ``TestCompactionRollupReproduction`` — end-to-end through
  ``compress()`` with a stubbed ``_generate_summary``: the
  visible reply text survives either as its own standalone
  assistant message (normal path) or concatenated onto the
  merged summary tail (double-collision path the WebUI then
  re-splits). The standalone-summary case is asserted strictly
  (exactly one summary row, exactly one separate assistant
  row carrying the reply) — that's the dominant path and any
  drift there reintroduces the original bug.

* ``TestSourceGuardrail`` — static asserts on
  ``agent/context_compressor.py``: the helper exists, the
  anchor is wired into ``_find_tail_cut_by_tokens`` AFTER the
  user-message anchor (so chaining is monotonic), the
  content-bearing preference is preserved, and the issue
  number is referenced so future bisects can find this fix.

* fix(profiles): backfill .env for pre-existing profiles on hermes update (NousResearch#45247)

Profiles created before NousResearch#44792 have no .env. Now that the Channels/Keys
endpoints are profile-scoped (no os.environ fallback), those profiles
would show everything as unconfigured. hermes update now copies the
default install's .env into each named profile that lacks one (0600,
never overwrites, placeholder fallback when the root has no .env), so
existing users keep the credentials they were effectively running with.

* feat(desktop): follow streaming output at bottom + jump-to-bottom button (NousResearch#45263)

Strict sticky-bottom autoscroll for the chat thread: while the viewport is
parked at the bottom, the tail follows content growth (streaming tokens, late
measurement, Shiki re-highlight) via a useLayoutEffect keyed on the
virtualizer's own size signal, pinned in the same pre-paint pass as its
scrollToFn so the two never rubber-band. The gate is a single boolean — one
upward pixel (scroll/wheel/touch) disarms follow until the user returns to the
bottom.

Adds a floating jump-to-bottom control that appears once scrolled ~10px away
(above the dim threshold so a sub-pixel settle never flashes it), positioned
above the composer with respect to the status stack, with a subtle
scale + slide in/out animation that honours prefers-reduced-motion. The button
bridges to the virtualizer's re-arm + pin path through a small nanostore
emitter.

Supersedes NousResearch#43624.

* feat(desktop): worktree-aware sidebar grouping + composer/sidebar UX fixes

Group recents as parent-repo → worktree → sessions using local git
metadata (probed over IPC, with a path-name heuristic fallback for
remote backends). Single-worktree repos collapse to one level. Sessions
order by creation time and never reshuffle on new messages.

Also: fuse the status stack to the composer border, restore icon actions
in the queue panel, fix sidebar label truncation and drag styling, hide
sticky-message attachments while pinned, and bump the terminal font.

* feat(desktop): move workspace/worktree drag handle into the leading icon

Mirror the session row: the repo/worktree header's leading glyph (repo
mark, or a new git-branch mark for worktrees) swaps to a grabber on
hover/drag instead of carrying a separate handle on the right — freeing
header width for the label and + button.

* fix(agent): preserve recent turns during compression

* fix(agent): clamp flush cursor after repair_message_sequence compaction (NousResearch#44837)

* fix(agent): rewind flush cursor exactly when repair compacts before the cursor

Follow-up to the NousResearch#44837 clamp: a min() clamp only fixes cursor overshoot
past the new end of the list. When repair_message_sequence drops/merges
messages at indexes below the cursor, the clamp leaves the cursor pointing
past unflushed rows and the turn-end flush silently skips them.

Extract repair_message_sequence_with_cursor(): snapshot the flushed prefix
by object identity before repair, then recompute the cursor as the count
of surviving flushed messages. Falls back to the clamp when no snapshot is
available. Keeps the safety guard in _flush_messages_to_session_db.

Adds targeted tests for overshoot, before-cursor compaction, no-repair,
bare-agent, and the flush guard.
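
The difference between the clamp and the identity-based rewind can be sketched as follows. This is an illustration under assumptions: the function name, its signature, and the `use_snapshot` switch are hypothetical and do not reproduce the real `repair_message_sequence_with_cursor()`:

```python
from typing import Callable, List, Tuple

Message = dict


def repair_with_cursor_sketch(
    messages: List[Message],
    cursor: int,
    repair: Callable[[List[Message]], List[Message]],
    use_snapshot: bool = True,
) -> Tuple[List[Message], int]:
    """Run `repair` and return (repaired_messages, new_flush_cursor)."""
    # Keep references to the flushed prefix so their ids stay valid.
    flushed = list(messages[:cursor]) if use_snapshot else None
    repaired = repair(messages)
    if flushed is None:
        # Fallback clamp: only fixes overshoot past the new end of the list.
        return repaired, min(cursor, len(repaired))
    flushed_ids = {id(m) for m in flushed}
    # New cursor = number of already-flushed messages that survived repair.
    survivors = sum(1 for m in repaired if id(m) in flushed_ids)
    return repaired, survivors
```

With four messages, a cursor of 3, and a repair that drops the second message, the clamp keeps the cursor at 3 and the last message is never flushed, while the identity count yields 2 and the last message is flushed at turn end.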

* refactor(desktop): extract shared WorkspaceHeader for repo + worktree rows

The repo and worktree header rows were ~identical after the handle move.
Fold them into one WorkspaceHeader (emphasis flag for the repo level) plus
a small WorkspaceAddButton, so the toggle/handle/count/+ wiring lives in
one place.

* fix(skills): run youtube transcript helper through uv

* fix(agent): add metadata flag to context compression summary messages (NousResearch#38389)

Summary messages (standalone insertion and merge-into-tail) now carry a
metadata flag so frontends (CLI, Desktop, gateway, TUI) can distinguish
them from real assistant/user messages without content-prefix heuristics.

Re-applied from PR NousResearch#38434 onto current main (conflicted with the
_SUMMARY_END_MARKER hoist). Key renamed from the PR's
'is_compressed_summary' to '_compressed_summary': the wire sanitizers
strip underscore-prefixed message keys, so the flag stays in-process and
can never reach strict gateways (Fireworks/Mistral/Kimi reject unknown
keys with 'Extra inputs are not permitted').
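
A rough sketch of why the underscore prefix keeps the flag in-process. The helper names here are hypothetical; only the `_compressed_summary` key and the "strip underscore-prefixed keys" rule come from the commit message:

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def mark_compressed_summary(message: Message) -> Message:
    """Return a copy of the message tagged as a compression summary."""
    return {**message, "_compressed_summary": True}


def sanitize_for_wire(messages: List[Message]) -> List[Message]:
    """Drop underscore-prefixed keys before the payload leaves the process."""
    return [
        {key: value for key, value in message.items() if not key.startswith("_")}
        for message in messages
    ]


def is_compressed_summary(message: Message) -> bool:
    """Frontend-side check that needs no content-prefix heuristic."""
    return bool(message.get("_compressed_summary"))
```

A frontend can call `is_compressed_summary()` on the in-process message, while the provider only ever sees the output of `sanitize_for_wire()`.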

* test: compressed-summary metadata flag set in-process, stripped on wire

* chore: add Kimi K2.7 code catalog slug (NousResearch#45283)

* refactor(desktop): collapse sidebar drag-reorder into one generic ReorderableList

Every reorderable surface (repos, worktrees, sessions, pins) now drops in a
single ReorderableList that owns its own DndContext, so a drag only ever
collides with that list's own items — nesting "just works" without leaking
into the lists around or inside it. This replaces the shared DndContext +
id-prefix dispatch (parent:/group:) whose closestCenter collisions resolved
to a different-typed droppable and silently no-op'd worktree/repo drags.

- Delete groupDndId/parentDndId/parse* helpers and the monolithic
  handleAgentDragEnd/handlePinnedDragEnd; each list persists its new id order
  via a direct typed write (reorderParents/reorderWorktree/reorderSessions/
  reorderPinned).
- Sessions inside repos/worktrees are date-ordered and static (no drag),
  matching the "never reorder on new messages" rule.
- Add setPinnedSessionOrder; drop now-unused reorderPinnedSession.

* fix(desktop): crisp terminal text via opaque xterm canvas

The terminal looked soft/heavy on every platform because the xterm
Terminal was built with allowTransparency: true, which drops the WebGL
renderer's opaque fast-path and bakes glyphs as grayscale-alpha coverage
for compositing over a see-through canvas. Our surface (--ui-bg-chrome)
is opaque and withSurface already paints it, so transparency was pure
blur for no benefit — VS Code keeps it off too. Also drop the Medium
(500) base weight for normal/bold (400/700) to match VS Code's metrics,
and remove the now-unused JetBrains Mono Medium face + woff2.

* fix(desktop): stop streaming autoscroll bounce; move attachments below user bubble

Streaming auto-follow chased content growth while parked at the bottom,
which rubber-banded — the tail pin and the virtualizer's own measurement
adjustments fought for scrollTop. Drop it; the one-time new-turn jump
already lands a fresh message in view and the viewport stays put after.

Attachments rendered inside the editable user bubble and were collapsed
via an IntersectionObserver + [data-stuck] CSS hack while the bubble was
pinned. Render them as a flow sibling BELOW the sticky bubble instead, so
they scroll away behind it naturally — no observer, no collapse. Image
refs still render as thumbnails, file refs as chips; no border. Removes
the now-unused useStuckToTop hook and its CSS.

* perf(desktop): isolate streaming re-renders & cut layout thrash

During a token stream $messages is replaced ~30x/s. Subscribing the whole
chat view to it re-rendered the composer, runtime boundary, and every
message on every delta.

- Derive coarse facts (empty thread? tail is user?) via nanostores
  `computed` atoms so per-token flushes don't re-render their consumers.
- Move the $messages subscription + runtime wiring into a dedicated
  ChatRuntimeBoundary; the composer reads $messages imperatively.
- Drive message rows off stable useAuiState selectors and a lazy
  getMessageText getter instead of eagerly materialized text.
- Feed ResizeObserver entry sizes into measureClamp / FadeText and dedupe
  the style writes, killing the read-write-read reflow cascade.

* perf(desktop): incremental markdown rendering during streams

Re-parsing the full message markdown every reveal frame is O(N^2) over a
long answer and dominated stream CPU.

- Throttle useSmoothReveal commits to ~1 frame (REVEAL_MIN_COMMIT_MS).
- Memoize block parsing with an LRU keyed on source text so only changed
  blocks re-parse.
- Replace Streamdown's full-text parseIncompleteMarkdown with a
  tail-bounded remend: scan to the last top-level boundary outside
  fences/math and repair only the trailing open block. New remend-tail.ts
  is proven render-equivalent to full remend at every streaming prefix
  (remend-tail.test.ts), minus an intentional, documented divergence on
  cross-block dangling openers.

* perf(desktop): faster session resume & warm AudioContext at idle

- Resume: fire the REST transcript prefetch and the session.resume RPC in
  parallel, and skip the redundant message conversion + reconciliation
  when the prefetch already hydrated the transcript.
- Haptics: web-haptics builds its AudioContext lazily on first trigger,
  paying the ~850ms CoreAudio spin-up on the first streamStart haptic as
  the first token paints. Open/close a throwaway context at idle so the
  real one connects to an already-warm audio service.

* build(nix): refresh npmDepsHash for the remend dependency

Adding remend changed package-lock.json, so the flake's pinned npm deps
hash went stale and `nix flake check` failed. Bump it to match.

* fix(desktop): theme the image-gen placeholder instead of a white square (NousResearch#45354)

The diffusion placeholder read `--dt-*` tokens via
`getComputedStyle().getPropertyValue()`, but those resolve through `var()`
chains into `color-mix(in srgb, …)` — returned verbatim and unparseable, so
every token fell to a hardcoded light fallback (white card). In dark mode the
placeholder rendered as a white square.

Resolve each token through a throwaway probe element's `color` so the browser
computes it to a concrete color, and teach `parseColor` Chromium's
`color(srgb r g b / a)` serialization. Re-resolve on theme repaint via a
MutationObserver rather than per animation frame.

* fix(desktop): stop stranding queued prompts across backend bounces

A prompt typed mid-turn ("ghost bubble") could stick forever and never
send when the backend restarted/reconnected during the turn. Two fragile
assumptions in the composer queue drain caused it:

1. Drain fired ONLY on an observed busy true→false edge. A remount/
   reconnect resets `previousBusyRef` to the current busy value, so the
   settle edge is swallowed and the queue never drains. Replace
   `shouldAutoDrainOnSettle` with the edge-independent `shouldAutoDrain`
   (idle + non-empty), driven on the settle edge, on mount/reconnect, and
   after a re-key. The drain lock still serializes sends.

2. The queue is keyed by `queueSessionKey || sessionId`. When a backend
   resume mints a new runtime session id for the same conversation, the
   entry strands under the dead key. Pass the *stable* stored id as
   `queueSessionKey` so the composer can tell runtime churn from a real
   session switch, and `migrateQueuedPrompts` re-keys pending entries on a
   runtime-id change only (never on a deliberate switch).

Also make the drain resilient to a thrown/rejected onSubmit (e.g. a stale-
session 404): the entry stays queued and is retried on the next idle, with
a per-entry attempt cap (MAX_AUTO_DRAIN_ATTEMPTS) to avoid spin-loops and a
quiet toast once it gives up. A manual send clears the backoff.

Tests: composer-queue covers edge-free drain + re-key migration;
use-prompt-actions covers rejected-drain-keeps-entry + idle retry sends.
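
The Desktop code is TypeScript; the Python sketch below only illustrates the queue logic described above. The data shapes, the function signatures, and the cap value of 3 are assumptions made for illustration:

```python
from typing import Dict, List

MAX_AUTO_DRAIN_ATTEMPTS = 3  # assumed value; the commit does not state it


def should_auto_drain(busy: bool, queue: List[dict]) -> bool:
    """Edge-independent check: drain whenever idle with a non-empty queue."""
    return not busy and len(queue) > 0


def migrate_queued_prompts(
    queues: Dict[str, List[dict]], old_key: str, new_key: str
) -> None:
    """Re-key pending entries when the same conversation gets a new id."""
    if old_key == new_key or old_key not in queues:
        return
    queues.setdefault(new_key, []).extend(queues.pop(old_key))


def next_drainable(queue: List[dict]) -> "dict | None":
    """Return the first entry that has not exhausted its retry budget."""
    for entry in queue:
        if entry.get("attempts", 0) < MAX_AUTO_DRAIN_ATTEMPTS:
            return entry
    return None
```

Because `should_auto_drain` looks only at current state, it gives the same answer on a settle edge, on mount, and after a reconnect, which is what makes the drain survive a swallowed edge.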

* fix(desktop): keep queued drains quiet on transient "session busy"

A queued drain firing on the settle edge can race a not-yet-wound-down
turn and get a transient 4009 "session busy". Previously that appended a
red "session busy" error bubble (and toast) per attempt. For fromQueue
submits, swallow the busy error: release busy, keep the entry queued, and
let the composer's bounded auto-drain retry on the next idle.

* fix(desktop): never surface "session busy" — retry every submit past it

"Session busy" (4009) is the gateway's concurrency guard, not a user-facing
error. The queue already covers the deliberate "type while busy" case, so
the only leak was a submit racing the settle edge. Generalize the rewind
path's busy-retry into a shared `withSessionBusyRetry` and wrap every
`prompt.submit` (fresh send, session-resume resubmit, and rewind) so a
transient busy is ridden out within a bounded deadline and the call lands
silently. The fromQueue swallow stays as a backstop for the pathological
case where the busy state outlasts the deadline.
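
The bounded busy-retry can be sketched as below. This is a Python illustration of the control flow only, not the TypeScript `withSessionBusyRetry`; the exception class, the deadline, and the polling interval are assumptions:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionBusyError(Exception):
    """Stands in for the gateway's 4009 "session busy" response."""

    code = 4009


def with_session_busy_retry(
    submit: Callable[[], T],
    deadline_s: float = 5.0,    # assumed bound
    interval_s: float = 0.05,   # assumed polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `submit` while it reports busy, until the deadline passes."""
    give_up_at = clock() + deadline_s
    while True:
        try:
            return submit()
        except SessionBusyError:
            if clock() >= give_up_at:
                raise  # caller decides how to handle a persistent busy
            sleep(interval_s)
```

Only the busy error is retried; any other exception propagates immediately, so real failures still surface.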

* chore: uptick

* fix(desktop): keep recents sorted unless manually reordered (NousResearch#45404)

* Sync homelab/main to upstream/main with minimal carried patches

Rebased onto upstream/main (9c50521) and reapplied the minimal homelab
patch set:
- Dockerfile: add iproute2 + GitHub CLI (gh) from official apt repo
- pyproject.toml: add langfuse optional extra
- plugins/observability/langfuse/__init__.py: Responses API serialization
- plugins/platforms/discord/adapter.py: role-mention invocation support
- tests/gateway/test_discord_role_mentions.py: role-mention test coverage
- tests/plugins/test_langfuse_plugin.py: Responses API test coverage
- .github/workflows/build.yml: GHCR image publish workflow

---------

Co-authored-by: brooklyn! <brooklyn.bb.nicholson@gmail.com>
Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
Co-authored-by: ethernet <arilotter@gmail.com>
Co-authored-by: IAvecilla <ignacio.avecilla@lambdaclass.com>
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Co-authored-by: ITheEqualizer <ali.zakaee.1997@gmail.com>
Co-authored-by: Flownium <157689911+itsflownium@users.noreply.github.com>
Co-authored-by: Aðalsteinn Helgason <adalsteinnhelgason@Aalsteinns-MacBook-Pro-3.local>
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: konsisumer <der@konsi.org>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Co-authored-by: Hermes Agent <hermes-agent@users.noreply.github.com>
AIalliAI and others added 5 commits June 13, 2026 09:22
…daccb755)

Upstream consolidated the feature (05b9c84 + 652dd9c, incl. our
pool-timeout retryable fix), so the cherry-picks are superseded. Revert
before merge-sync to take upstream's implementation wholesale.
# Conflicts:
#	agent/context_compressor.py
#	tui_gateway/server.py
…o HISTORICAL_TASK_HEADING

The 06-13 main merge took agent/context_compressor.py from upstream (which
renamed the '## Active Task' summary heading to the HISTORICAL_TASK_HEADING
constant) but left these two test files at their pre-rename bugfixes state,
so they asserted the literal '## Active Task' string and failed on CI test
shards 1 and 3. Both files are now byte-identical to upstream main, matching
the compressor they exercise.
Cherry-pick of upstream PR NousResearch#45525 (hanzckernel). The post-commit
file-length invariant (_check_file_length_invariant) fired on
intermediate checkpoint states in WAL mode, where the main DB file
legitimately lags the header page count while data still lives in the
-wal, raising a spurious 'torn-extend' on a healthy DB under concurrent
worker bursts. Skip the check in WAL mode (crash safety comes from
synchronous=FULL + WAL recovery), keep it for DELETE journal mode, and
restore SQLite's default wal_autocheckpoint=1000 to reduce checkpoint
churn. Verified: negative control shows the 3 WAL-skip/autocheckpoint
tests fail on the pre-fix source and pass after; 217 kanban_db tests green.
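
A hedged sketch of the idea, using only the standard `sqlite3` module and the documented database header layout (page size at byte offset 16, in-header page count at offset 28). The function names are hypothetical and this is not the body of `_check_file_length_invariant`:

```python
import os
import sqlite3
import struct


def journal_mode(conn: sqlite3.Connection) -> str:
    """Return the connection's current journal mode in lowercase."""
    return conn.execute("PRAGMA journal_mode").fetchone()[0].lower()


def file_length_ok(path: str, conn: sqlite3.Connection) -> bool:
    """Check that the main DB file is at least as long as its header claims.

    Skipped in WAL mode, where committed pages may still live in the -wal
    file and the main file can legitimately lag the header page count.
    """
    if journal_mode(conn) == "wal":
        return True
    with open(path, "rb") as handle:
        header = handle.read(100)
    if len(header) < 100:
        return True  # empty or brand-new database file
    page_size = struct.unpack(">H", header[16:18])[0]
    if page_size == 1:
        page_size = 65536  # header encodes 65536 as 1
    page_count = struct.unpack(">I", header[28:32])[0]
    return os.path.getsize(path) >= page_size * page_count
```

In DELETE journal mode a committed transaction has already extended the main file, so a shorter-than-claimed file there is a meaningful signal; in WAL mode the same observation is expected between checkpoints.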

Labels

comp/agent Core agent loop, run_agent.py, prompt builder invalid This doesn't seem right

Development

Successfully merging this pull request may close these issues.