test: hermeticize env-sensitive inference and CLI cases #2
…sResearch#25828) Adds references/template-integrity.md covering safe conversion of the official comfyui-workflow-templates package from editor format to API format — Reroute bypass via link tracing, dotted dynamic-input keys (values.a, resize_type.width) that must NOT be flattened, server-error "patch don't rebuild" loop, Cloud quirks (302 redirect to signed GCS URL, free-tier 1 concurrent job, 1920x1080 OOM on RTX 5090), and a Discord-compatible ffmpeg stitch recipe (yuv420p + xfade/acrossfade). SKILL.md lists the new reference so the agent loads it when starting from an official template. purzbeats added to author list and to scripts/release.py AUTHOR_MAP. Co-authored-by: purzbeats <97489706+purzbeats@users.noreply.github.com>
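Reroute bypass via link tracing, together with the rule that dotted dynamic-input keys stay flat, can be sketched roughly as below. This is a minimal illustration under an assumed, simplified graph shape: a `links` dict mapping link id to `(origin_node, origin_slot)` and a `nodes` dict holding each node's `type` and `inputs`. It is not the exact editor-format JSON layout, and `trace_through_reroutes` / `to_api_inputs` are hypothetical names. The returned `[node_id_as_string, slot]` pair is the API-format way of referencing another node's output.

```python
from typing import Dict, List, Tuple, Union

Links = Dict[int, Tuple[int, int]]  # link_id -> (origin_node_id, origin_slot)
Nodes = Dict[int, dict]             # node_id -> {"type": str, "inputs": {name: link_id}}


def trace_through_reroutes(link_id: int, links: Links, nodes: Nodes) -> List[Union[str, int]]:
    """Follow a link upstream past any Reroute nodes to the real producer."""
    origin, slot = links[link_id]
    seen = set()
    while nodes[origin]["type"] == "Reroute":
        if origin in seen:
            raise ValueError(f"Reroute cycle at node {origin}")
        seen.add(origin)
        # A Reroute has a single input; follow it one hop further upstream.
        upstream = next(iter(nodes[origin]["inputs"].values()), None)
        if upstream is None:
            raise ValueError(f"Reroute node {origin} has no input")
        origin, slot = links[upstream]
    # API format references an output as [node_id_as_string, output_slot].
    return [str(origin), slot]


def to_api_inputs(node: dict, widget_values: Dict[str, object],
                  links: Links, nodes: Nodes) -> Dict[str, object]:
    """Build the API-format inputs dict for one node."""
    inputs: Dict[str, object] = {}
    for name, link_id in node["inputs"].items():
        inputs[name] = trace_through_reroutes(link_id, links, nodes)
    # Dotted dynamic-input keys such as "resize_type.width" are copied verbatim;
    # they are never split into nested {"resize_type": {"width": ...}} dicts.
    inputs.update(widget_values)
    return inputs
```

Loops over chained Reroutes, so a sampler fed through two Reroutes ends up pointing straight at the loader.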
…agent dispatch (NousResearch#25845) WhatsApp pseudo-chats (Status updates / Stories, Channels / Newsletters, broadcast lists) were being routed through the full agent pipeline. A user's gateway.log showed the agent replying to a contact's Story ('status@broadcast') with 345 chars plus title-generation cost, and the reply also showed up in the contact's status feed. Drop these JIDs in _should_process_message() before the policy gate so they're filtered regardless of dm_policy or allowlist state.
Covers:
- status@broadcast (Stories)
- *@newsletter (Channels)
- *@broadcast (broadcast lists, future-proofing)
bridge.js already filters these on the fromMe outbound path, but inbound events in self-chat mode skipped that check.
Tests:
- status@broadcast dropped on open policy
- broadcast filter wins over allowlisted senders
- real DMs still pass through
- helper unit cases (case-insensitive, whitespace-tolerant)
26/26 tests/gateway/test_whatsapp_group_gating.py pass; 59/59 adjacent WhatsApp test suites pass.
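A minimal sketch of the JID filter and where it sits relative to the policy gate. `is_whatsapp_pseudo_chat` and `should_process_message` are hypothetical names, and the allowlist-only check stands in for the real dm_policy logic; the point is only that the pseudo-chat check runs first.

```python
from typing import Iterable, Optional


def is_whatsapp_pseudo_chat(jid: Optional[str]) -> bool:
    """True for Stories, Channels and broadcast-list JIDs (not real conversations)."""
    if not jid:
        return False
    normalized = jid.strip().lower()  # case-insensitive, whitespace-tolerant
    return (
        normalized == "status@broadcast"       # Stories / status updates
        or normalized.endswith("@newsletter")  # Channels
        or normalized.endswith("@broadcast")   # broadcast lists
    )


def should_process_message(chat_jid: str, sender: str, allowlist: Iterable[str]) -> bool:
    # The pseudo-chat filter runs before any policy check, so an
    # allowlisted sender posting a Story is still dropped.
    if is_whatsapp_pseudo_chat(chat_jid):
        return False
    return sender in set(allowlist)
```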
…r-check-unblock fix(ci): unblock shared PR checks
… not skipped When the final streamed text is identical to the last plain-text edit, stream_consumer._send_or_edit short-circuits and never calls adapter.edit_message(finalize=True). For Telegram, this skips the plain-text → MarkdownV2 conversion, leaving raw Markdown syntax visible to the user. Set REQUIRES_EDIT_FINALIZE = True on TelegramAdapter so the finalize edit is always delivered, matching the existing DingTalk pattern. Fixes NousResearch#25710
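The short-circuit and the opt-out flag can be modeled roughly as follows. Only `REQUIRES_EDIT_FINALIZE`, `TelegramAdapter` and `edit_message(finalize=True)` come from the commit; `BaseAdapter` and the free function `send_or_edit` are simplified, hypothetical stand-ins for the adapter base class and `stream_consumer._send_or_edit`.

```python
class BaseAdapter:
    # Adapters that convert formatting on the final edit opt in to always
    # receiving it, even when the visible text has not changed.
    REQUIRES_EDIT_FINALIZE = False

    def __init__(self) -> None:
        self.edits = []  # records (text, finalize) for inspection

    def edit_message(self, text: str, finalize: bool = False) -> None:
        self.edits.append((text, finalize))


class TelegramAdapter(BaseAdapter):
    REQUIRES_EDIT_FINALIZE = True  # plain text -> MarkdownV2 happens on finalize


def send_or_edit(adapter: BaseAdapter, last_text: str, new_text: str, finalize: bool) -> bool:
    """Return True if an edit was actually sent."""
    unchanged = new_text == last_text
    if unchanged and not (finalize and adapter.REQUIRES_EDIT_FINALIZE):
        return False  # short-circuit: nothing new to show
    adapter.edit_message(new_text, finalize=finalize)
    return True
```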
`hermes config set gateway.streaming.*` writes the streaming block nested under a `gateway:` key in config.yaml, but the config loader only checked for a top-level `streaming:` key — silently ignoring the nested variant. Fall back to `yaml_cfg['gateway']['streaming']` when the top-level key is absent, matching the pattern already used for other nested config sections. Closes NousResearch#25676
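A minimal sketch of the lookup order, assuming config.yaml has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `resolve_streaming_config` is a hypothetical name; the real loader may differ in how it treats non-dict values.

```python
from typing import Any, Dict


def resolve_streaming_config(yaml_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer a top-level `streaming:` block, then fall back to `gateway.streaming`."""
    top = yaml_cfg.get("streaming")
    if isinstance(top, dict):
        return top
    gateway = yaml_cfg.get("gateway")
    # This is the shape `hermes config set gateway.streaming.*` writes.
    if isinstance(gateway, dict) and isinstance(gateway.get("streaming"), dict):
        return gateway["streaming"]
    return {}
```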
…iled When the stream consumer's got_done handler successfully delivers the final response content via _send_or_edit but the subsequent edit (e.g. cursor removal) fails, final_response_sent remains False even though the user has already received the final answer. The gateway's fallback send path then re-delivers the same content, causing the user to see the response twice on Telegram. Introduce a new _final_content_delivered flag on the stream consumer, set by the got_done handler when the final content has reached the user. The _run_agent suppression logic now treats this flag as an additional signal (alongside final_response_sent and response_previewed) that final delivery is already complete. This preserves the existing behavior for intermediate-text-only streams (where already_sent=True but no final content has been delivered) — those still receive the gateway's fallback send, matching the test expectation in test_partial_stream_output_does_not_set_already_sent. Adds TestFinalContentDeliveredSuppression with two cases covering both the suppression (content delivered + edit failed) and the non-suppression (intermediate text only) branches.
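The suppression decision reduces to roughly the predicate below. The field names mirror the flags named in the commit, but `StreamConsumerState` and `gateway_should_skip_fallback` are hypothetical simplifications of the consumer object and the `_run_agent` logic.

```python
from dataclasses import dataclass


@dataclass
class StreamConsumerState:
    already_sent: bool = False             # some text (possibly intermediate) went out
    final_response_sent: bool = False      # final edit fully succeeded
    response_previewed: bool = False
    final_content_delivered: bool = False  # final text reached the user, even if a later edit failed


def gateway_should_skip_fallback(state: StreamConsumerState) -> bool:
    # already_sent alone is deliberately NOT a signal: streams that only
    # produced intermediate text still need the gateway's fallback send.
    return (
        state.final_response_sent
        or state.response_previewed
        or state.final_content_delivered
    )
```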
NousResearch#25929) When codex app-server fails outside the OAuth-classified path (non-auth turn/start errors, plain TimeoutErrors, a generic turn-ended status, the subprocess silently exiting, or the hard deadline timeout), the user got a bare 'Internal error' / 'turn/start failed: ...' with no context. Diagnosing config/provider/auth-bridge issues forced a re-run with verbose codex flags.
Add a _format_error_with_stderr helper that appends the last few stderr lines, passed through agent.redact.redact_sensitive_text(force=True), and use it at every catch-all error site:
- ensure_started() failures (codex init / thread/start) now return a TurnResult.error with should_retire=True instead of bubbling up
- non-OAuth turn/start CodexAppServerError / TimeoutError
- subprocess-died branch (previously dumped raw stderr_blob[-300:] with no redaction, a leak risk)
- turn ended with non-completed status
- hard turn-timeout deadline
OAuth-classified failures and the post-tool quiet watchdog already produce clean hints and stay unchanged. The redactor catches sk-*, gh*_*, Authorization: Bearer, query-string tokens, JWTs, private keys, etc., so provider error payloads can't leak into chat output or trajectories. Inspired by openclaw#80718, adapted for our app-server transport.
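The helper's shape can be sketched as below. The `redact` function is a deliberately tiny, hypothetical stand-in for `agent.redact.redact_sensitive_text` (it only handles `sk-*` keys and `Authorization: Bearer` headers), and the tail length and header text are assumptions rather than the actual output format.

```python
import re
from typing import List

# Hypothetical stand-in patterns; the real redactor covers far more formats.
_SK_KEY = re.compile(r"sk-[A-Za-z0-9_-]{8,}")
_BEARER = re.compile(r"(Authorization:\s*Bearer\s+)\S+", re.IGNORECASE)


def redact(text: str) -> str:
    text = _SK_KEY.sub("[REDACTED]", text)
    return _BEARER.sub(r"\1[REDACTED]", text)


def format_error_with_stderr(message: str, stderr_lines: List[str], tail: int = 5) -> str:
    """Append the last few non-empty stderr lines, redacted, to an error message."""
    recent = [line for line in stderr_lines if line.strip()][-tail:]
    if not recent:
        return message  # nothing useful to add
    body = redact("\n".join(recent))
    return f"{message}\n--- codex stderr (last {len(recent)} lines) ---\n{body}"
```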
…#25967) The spinner already shows tool activity visually; the 1.2 kHz tone on every tool.started event was unwanted noise (especially on WSL2, where each beep also triggers Windows Terminal's bell notification). Removed the play_beep call in _on_tool_progress entirely. Record start/stop beeps (gated by voice.beep_enabled) are unaffected.
…ize (NousResearch#25975) When the terminal shrinks, already-printed box-drawing rules (response, reasoning, streaming TTS, background-task Panels) reflow into multiple narrower rows, visible as duplicated horizontal separators / ghost lines in scrollback. Similarly, prompt_toolkit redraws a fresh status bar on SIGWINCH on top of the one the terminal just reflowed, producing double-bar artifacts on column shrink.
Two surgical changes:
1. Decorative scrollback boxes now use a new `HermesCLI._scrollback_box_width()` helper that clamps to `max(32, min(width, 56))`. The live TUI footer is unaffected and still uses the full width. Covers: streaming response box (open + close), reasoning box (open + close, both streaming and post-stream paths), streaming-TTS box close, final-response Rich Panel, and the background-task Rich Panel.
2. `_recover_after_resize()` now also sets a new `_status_bar_suppressed_after_resize` flag so the dynamic status bar and both input separator rules stay hidden until the next user input. The flag is cleared in the process loop the moment the user submits their next prompt, restoring chrome cleanly.
Tests:
- New `test_input_rules_hide_after_resize_until_next_input` covers the flag's effect on rule heights.
- New `test_scrollback_box_width_caps_to_resize_safe_value` covers the helper at floor / cap / mid-range / overflow.
- Existing resize-recovery test extended to assert the flag flips.
Refs: NousResearch#18449 NousResearch#19280 NousResearch#22976
Salvage of NousResearch#24403.
Co-authored-by: Szymonclawd <szymonclawd@mac.home>
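A minimal sketch of both changes: the width clamp uses the exact `max(32, min(width, 56))` formula from the commit, while `ResizeChrome` is a hypothetical model of the suppression flag's lifecycle, not the HermesCLI class itself. In a real CLI the width would typically come from `shutil.get_terminal_size().columns`.

```python
def scrollback_box_width(terminal_width: int, floor: int = 32, cap: int = 56) -> int:
    """Width for decorative scrollback boxes: max(32, min(width, 56))."""
    return max(floor, min(terminal_width, cap))


class ResizeChrome:
    """Hypothetical model of the post-resize status-bar suppression flag."""

    def __init__(self) -> None:
        self.status_bar_suppressed_after_resize = False

    def recover_after_resize(self) -> None:
        # Hide the status bar and input rules so SIGWINCH can't stack a
        # fresh bar on top of a reflowed one.
        self.status_bar_suppressed_after_resize = True

    def on_user_submit(self) -> None:
        self.status_bar_suppressed_after_resize = False  # restore chrome

    def input_rule_height(self) -> int:
        return 0 if self.status_bar_suppressed_after_resize else 1
```

Capping at 56 columns means a box printed on a wide terminal still fits on one row after a moderate shrink, so it has nothing to reflow.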
- Treat same-dimension resize events in alt-screen mode as a repaint signal, because terminal hosts can reflow or restore the physical buffer without changing columns/rows. - Ensure pending resize erases are emitted even when the virtual diff is empty, so stale physical glyphs are still cleared. - Extract alt-screen resize repaint into prepareAltScreenResizeRepaint() for readability. - Add defensive clearTimeout in prepareAltScreenResizeRepaint so rapid resize bursts don't stack redundant delayed repaints. - Add a focused regression test for same-dimension alt-screen resize healing. Addresses NousResearch#18449 Related to NousResearch#17961
…esearch#25969) Adds 'hermes proxy start' — a local HTTP server that lets external apps (OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the user's real OAuth-resolved credentials to each forwarded request, refreshing them automatically; the client can send any bearer (it gets stripped). Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and registry in hermes_cli/proxy/adapters/ are designed for additional OAuth providers to plug in by name without server changes. Commands: hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645] hermes proxy status hermes proxy providers Allowed Portal paths: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Anything else returns 404 with a clear error pointing at the allowed list. aiohttp is gated like gateway/platforms/api_server.py (try-import, clean runtime error if missing). No new core dependency. Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess and verifies the upstream receives the right bearer with the client's header stripped.
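The per-request path check and credential swap can be sketched as a pure function. `build_upstream_headers` and the `get_access_token` callable are hypothetical; the callable stands in for the adapter's OAuth resolution and refresh. The allowed paths are the ones listed above.

```python
from typing import Callable, Dict, Tuple

ALLOWED_PORTAL_PATHS = frozenset({
    "/v1/chat/completions",
    "/v1/completions",
    "/v1/embeddings",
    "/v1/models",
})


def build_upstream_headers(
    path: str,
    client_headers: Dict[str, str],
    get_access_token: Callable[[], str],
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers-or-error) for one proxied request."""
    if path not in ALLOWED_PORTAL_PATHS:
        allowed = ", ".join(sorted(ALLOWED_PORTAL_PATHS))
        return 404, {"error": f"path {path} is not proxied; allowed: {allowed}"}
    # Drop whatever bearer the client sent (header names are case-insensitive).
    headers = {k: v for k, v in client_headers.items() if k.lower() != "authorization"}
    # Attach the real, freshly resolved credential.
    headers["Authorization"] = f"Bearer {get_access_token()}"
    return 200, headers
```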
Adds optional channel-context backfill for Discord shared-channel sessions so the agent can see recent messages it missed between its own turns (typically when require_mention=true filters out most traffic). Previously the agent only saw the @mention message that triggered it, which led to disorienting replies in active multi-user channels where the conversation context was invisible. With backfill enabled, a configurable number of recent messages are fetched per-turn and prepended to the trigger message as a context block, kept separate from sender-prefix logic so attribution remains clean. This re-opens the work from NousResearch#13063 (approved by @OutThisLife on 2026-04-20, closed when I closed the branch to address the simpolism:main head-branch issue plus an ordering bug I caught later in live use). Filing against the freshly-rewritten problem statement in NousResearch#13054 so the design is grounded in the failure mode rather than the implementation shape. The implementation follows the **push-mode last-self-anchored** design from the two options laid out in NousResearch#13054. See the issue for the trade-off discussion vs pull-mode (NousResearch#13120 was an earlier closed PR using that shape). Treating this as a reference implementation — happy to rewrite as last-trigger anchoring or as a hybrid with NousResearch#13120 if maintainers prefer. 
Changes: - gateway/platforms/discord.py: - new `_discord_history_backfill()` / `_discord_history_backfill_limit()` helpers (config.extra > env > default), mirroring the existing `_discord_require_mention()` shape - new `_fetch_channel_context()` that scans `channel.history()` backwards from the trigger to the bot's last message (or limit), formats as `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS, skips system messages - per-channel `_last_self_message_id` cache to narrow the fetch window on hot paths (avoids full history scan when the bot has spoken recently) - **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`. discord.py 2.x silently flips the default to True when `after=` is supplied, which would select the EARLIEST N messages after our last response instead of the LATEST N before the trigger. In high-traffic windows this would return stale tool traces and drop the actual final answer the user is asking about. See regression test below. Caught in live use during a Codex tool-trace burst on May 13 2026. 
- gateway/config.py: discord_history_backfill + discord_history_backfill_limit settings + yaml→env bridge - gateway/platforms/base.py: channel_context field on MessageEvent - gateway/run.py: prepend channel_context after sender-prefix so the [sender name] tag applies to the trigger message alone, not to the backfill - hermes_cli/config.py: defaults for new discord.history_backfill and discord.history_backfill_limit keys - cli-config.yaml.example: documented defaults - tests/gateway/test_discord_free_response.py: 7 new tests covering cold-start backfill, self-message stop boundary, other-bot filtering, cache hot-path narrowing, stale-cache fallback, shared-channel + per-user backfill paths, and the ordering regression test (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`) - tests/gateway/test_config.py: yaml→env bridge tests - tests/gateway/test_session.py: prefix-order edge cases - website/docs/user-guide/messaging/discord.md: env vars + config keys + usage docs Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord research server for the past three weeks. Fixes NousResearch#13054 Supersedes NousResearch#13063 (closed)
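The settings precedence and the window selection in `_fetch_channel_context()` can be sketched as follows. The input is assumed to arrive newest-first, which is what passing `oldest_first=False` to discord.py's `channel.history()` guarantees. `resolve_setting`, `Msg` and `build_channel_context` are hypothetical, and whether the real limit counts fetched or kept messages is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping


def resolve_setting(extra: Mapping[str, object], key: str,
                    env: Mapping[str, str], env_var: str, default: str) -> str:
    """Precedence: platform config.extra > environment variable > default."""
    if key in extra:
        return str(extra[key])
    if env_var in env:
        return env[env_var]
    return default


@dataclass
class Msg:
    author: str
    content: str
    is_bot: bool = False
    is_system: bool = False


def build_channel_context(history_newest_first: Iterable[Msg], bot_name: str,
                          limit: int, allow_bots: bool = False) -> str:
    """Collect the LATEST messages before the trigger, stopping at our own last reply."""
    picked: List[Msg] = []
    for msg in history_newest_first:
        if msg.author == bot_name:
            break  # anything older is already in the session transcript
        if msg.is_system or (msg.is_bot and not allow_bots):
            continue
        picked.append(msg)
        if len(picked) >= limit:
            break
    if not picked:
        return ""
    lines = ["[Recent channel messages]"]
    lines += [f"[{m.author}] {m.content}" for m in reversed(picked)]  # chronological order
    return "\n".join(lines)
```

Walking newest-first and then reversing is what keeps the window anchored at the trigger. With an oldest-first iterator the same loop would keep the earliest messages after the bot's last reply, which is the ordering bug described above.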
Follow-up to snav's PR NousResearch#25463 contribution: flip default to on, broaden scope so backfill fires whenever require_mention gates the bot (not just shared-session channels). Why: - The mention-gate creates a session-transcript gap regardless of whether the channel is shared or per-user. In per-user sessions, Alice's session is still missing other participants' messages and her own pre-mention messages — backfill fills both gaps. - Threads naturally scope to thread-only history because discord.py's channel.history() on a thread returns only that thread's messages. - DMs still skip — every DM triggers the bot, so the session transcript is already complete. Changes: - hermes_cli/config.py: discord.history_backfill default → true - gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL default → 'true' - cli-config.yaml.example + website docs: update defaults and prose; add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were documented in the PR description but missing from the env-var table - tests/gateway/test_discord_free_response.py: - flip test_discord_per_user_channel_does_not_backfill → test_discord_per_user_channel_backfills_too (new behavior) - add test_discord_dm_does_not_backfill (DM skip is invariant) - give FakeThread a no-op history() so existing thread tests don't hit a fake discord.Forbidden when backfill now fires on threads too Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.
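After this follow-up, the gating decision reduces to roughly the logic below. Both function names are hypothetical, and the set of accepted truthy strings is an assumption; the commit only states that `DISCORD_HISTORY_BACKFILL` now defaults to 'true'.

```python
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


def history_backfill_enabled(env: Mapping[str, str]) -> bool:
    # On by default when the variable is unset.
    return env.get("DISCORD_HISTORY_BACKFILL", "true").strip().lower() in _TRUTHY


def should_backfill(is_dm: bool, needed_mention: bool, enabled: bool) -> bool:
    if not enabled or is_dm:
        return False  # every DM triggers the bot, so the transcript is complete
    # Shared or per-user channel, or a thread: backfill whenever the
    # mention gate may have hidden messages from the session.
    return needed_mention
```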
The prebuild step used `rm -rf` and `cp -r`, which fail on Windows (`'rm' is not recognized`). Replace with an inline Node one-liner using fs.rmSync / fs.cpSync so the build works on Windows, macOS, and Linux without adding a dependency.
…Research#25978) Pre-existing diagnostics below an edit point used to surface as 'LSP diagnostics introduced by this edit' whenever the edit deleted or inserted lines. The delta-filter key included the diagnostic's range, so the same logical error reported at a different line in the post-edit snapshot looked like a brand new diagnostic. Concrete case: deleting 14 lines in cli.py caused Pyright errors at lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be reported as introduced by it. Fix: build a piecewise-linear line-shift map (via difflib's SequenceMatcher) from pre and post content, and remap baseline diagnostics into post-edit coordinates before the set-difference. Diagnostics in deleted regions drop out cleanly; diagnostics below the edit shift by the right amount; diagnostics above are untouched. The strict (range-aware) equality key stays — so a genuinely new instance of an identical error class at a different line still surfaces as new. Pieces: - agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range, shift_baseline. Pure functions, no LSP state. - agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an optional line_shift kwarg; baseline is shift_baseline'd before computing the seen-set. _diag_key keeps the strict range key. - tools/file_operations.py — write_file captures pre_content for any LSP-handled extension (not just LINTERS_INPROC) and passes pre/post to _maybe_lsp_diagnostics, which builds the shift map. - New _lsp_handles_extension helper guards the pre_content read. Trade-offs preserved: - Genuinely new same-class errors at different lines still surface (content-only key would have swallowed them). - Pre-existing errors at unshifted positions still get filtered (covered by the strict-key path with no shift). 
- Best-effort: when pre_content can't be captured (file didn't exist, permissions), the unshifted comparison still catches most pre-existing errors; the edge case it misses is a new file with a non-empty baseline, which is structurally impossible.
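The line-shift idea can be sketched with the standard library's `difflib.SequenceMatcher`, whose `get_opcodes()` yields `equal`/`replace`/`delete`/`insert` blocks. Inside each `equal` block the offset is constant, which gives the piecewise-linear map. Diagnostics are reduced here to hypothetical `(line, message)` tuples; the real `build_line_shift` / `shift_baseline` in agent/lsp/range_shift.py operate on LSP ranges and may differ in detail.

```python
import difflib
from typing import Dict, Iterable, List, Optional, Tuple

Diag = Tuple[int, str]  # (0-based line, message); stand-in for an LSP diagnostic


def build_line_shift(pre: str, post: str) -> Dict[int, Optional[int]]:
    """Map each pre-edit line to its post-edit line, or None if deleted/replaced."""
    a, b = pre.splitlines(), post.splitlines()
    mapping: Dict[int, Optional[int]] = {}
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        for offset in range(i2 - i1):
            # Within an "equal" block every line shifts by the same amount.
            mapping[i1 + offset] = j1 + offset if tag == "equal" else None
    return mapping


def shift_baseline(baseline: Iterable[Diag], mapping: Dict[int, Optional[int]]) -> List[Diag]:
    shifted = []
    for line, message in baseline:
        new_line = mapping.get(line)
        if new_line is not None:  # diagnostics in deleted regions drop out
            shifted.append((new_line, message))
    return shifted


def introduced_diagnostics(baseline: Iterable[Diag], post_diags: Iterable[Diag],
                           pre: str, post: str) -> List[Diag]:
    seen = set(shift_baseline(baseline, build_line_shift(pre, post)))
    # Strict (line-aware) key: the same message at a new line still counts as new.
    return [d for d in post_diags if d not in seen]
```

For example, deleting lines `b` and `c` from `a b c d e` maps line 3 to 1 and line 4 to 2, so an error that moved from line 4 to line 2 is recognized as pre-existing.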
Three Windows-only bugs in the web-dashboard build path. Each is small, scoped, and verified end-to-end on Windows 11, including under a stock cmd.exe / PowerShell console with its default cp1252 encoding.
1. `sync-assets` shells out to Unix-only commands
web/package.json hard-codes `rm -rf … && cp -r …`. Neither exists on Windows cmd.exe. `hermes_cli/main.py::_build_web_ui` runs npm via subprocess (which on Windows defaults to cmd.exe), so the prebuild hook crashed before Vite ever ran and the dashboard never built.
Fix: web/scripts/sync-assets.mjs, ~20 lines of Node using fs.rmSync + fs.cpSync (stdlib, Node >= 16.7). No new deps, identical behavior on POSIX and Windows.
2. Build failures were silent
_build_web_ui ran both subprocess calls with capture_output=True and never relayed the captured buffers on failure. Users saw 'Web UI build failed' and nothing else: no stdout, no stderr, no hint that the real problem was 'rm is not recognized'.
Fix: inner _relay() helper that decodes and prints stdout + stderr (utf-8, errors='replace') whenever a step returns non-zero. Replaces the existing stderr_tail-only relay on the build path; success path is unchanged. (stderr_tail is preserved for the stale-dist fallback branch added by NousResearch#23817.)
Salvaged from NousResearch#13368 by @johnisag onto current main. Conflict resolution preserves main's improvements:
- _run_npm_install_deterministic() (replaces bare subprocess.run for npm install)
- npm-build retry-after-sleep for Windows boot-time races (NousResearch#23817)
- stale-dist fallback for non-interactive callers (NousResearch#23817)
Closes NousResearch#25073, NousResearch#13368.
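The relay behavior described in fix 2 can be sketched as a standalone function over a `subprocess.CompletedProcess`. `relay_failure` is a hypothetical name for what the commit calls the inner `_relay()` helper, and the exact message text is an assumption.

```python
import subprocess
import sys
from typing import Optional, TextIO


def relay_failure(step: str, result: subprocess.CompletedProcess,
                  out: Optional[TextIO] = None) -> None:
    """On non-zero exit, print captured stdout and stderr instead of a bare failure line."""
    if result.returncode == 0:
        return  # success path stays quiet
    out = out if out is not None else sys.stdout
    print(f"{step} failed (exit code {result.returncode})", file=out)
    for label, data in (("stdout", result.stdout), ("stderr", result.stderr)):
        if not data:
            continue
        # capture_output=True without text=True yields bytes; decode leniently.
        text = data.decode("utf-8", errors="replace") if isinstance(data, bytes) else data
        print(f"--- {label} ---", file=out)
        print(text.rstrip(), file=out)
```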
Codex review pointed out that even with the sync-assets fix applied,
_build_web_ui still crashes on a stock Windows console before reaching
npm: Python stdout defaults to cp1252 (or similar) and raises
UnicodeEncodeError when print() hits the arrow/check glyphs used for
status messages (→, ✗, ⚠, ✓). Reproduced locally in PowerShell:
PS> $env:PYTHONIOENCODING = "cp1252"
PS> python -c "from pathlib import Path; from hermes_cli.main import _build_web_ui; _build_web_ui(Path('web'), fatal=True)"
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' ...
The previous PR body claimed "end-to-end verified on Windows 11", but
that was under the venv's default (utf-8) stdout. A plain `py` or
PowerShell invocation would still fail before sync-assets ever ran.
Fix: inner _say() helper that falls back to
text.encode(sys.stdout.encoding, errors="replace")
when print() raises UnicodeEncodeError. Glyphs degrade to '?' on
ASCII / cp1252 consoles; utf-8 consoles are unaffected. Verified the
full build pipeline runs to completion with PYTHONIOENCODING=cp1252.
Scoped tightly to _build_web_ui (the function this PR already touches);
other call sites in the codebase with the same risk are out of scope.
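A minimal sketch of the `_say()` fallback described above. `say` is a hypothetical name, and decoding the replaced bytes back to `str` before printing is an assumption about how the encoded fallback text is emitted.

```python
import sys
from typing import Optional, TextIO


def say(text: str, stream: Optional[TextIO] = None) -> None:
    """print() that degrades unencodable glyphs to '?' instead of crashing."""
    stream = stream if stream is not None else sys.stdout
    try:
        print(text, file=stream)
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        # Round-trip through the console's codec, replacing what it can't encode.
        safe = text.encode(encoding, errors="replace").decode(encoding, errors="replace")
        print(safe, file=stream)
```

On a cp1252 console, `say("→ building web UI")` prints `? building web UI`; on a utf-8 console the first `print()` succeeds and nothing changes.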
…gnal_handler The call site at line 246 is already wrapped in try/except NotImplementedError (added in NousResearch#25969). The checker doesn't inspect the surrounding context, so mark the line with the suppression comment to let the blocking check pass.
The 'sessions' command has been registered in the central command registry since NousResearch#20805 (May 2025) and surfaces in /help and tab-completion, but the classic CLI's process_command() never had an elif branch for it. The canonical name fell through and printed 'Unknown command: sessions'. The TUI side was wired up correctly via the SessionPicker overlay; only the legacy CLI was missing the dispatch. Adds _handle_sessions_command() which mirrors /resume's no-arg behavior inline (the CLI has no overlay primitive equivalent to the TUI picker): - /sessions and /sessions list → print the recent-sessions table - /sessions <id_or_title> → delegates to _handle_resume_command Includes regression tests covering the dispatcher wiring (the original bug) plus the three handler branches.
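The dispatch rules of the new handler can be sketched as below. `handle_sessions_command` here is a hypothetical free-function version that takes the table printer and the `/resume` delegate as callables, rather than the actual method on the CLI class.

```python
from typing import Callable


def handle_sessions_command(command: str,
                            print_recent_sessions: Callable[[], None],
                            resume: Callable[[str], None]) -> None:
    """Dispatch `/sessions`, `/sessions list`, or `/sessions <id_or_title>`."""
    parts = command.strip().split(maxsplit=1)
    arg = parts[1].strip() if len(parts) > 1 else ""
    if arg in ("", "list"):
        print_recent_sessions()  # same table /resume shows with no argument
    else:
        resume(arg)  # delegate to the /resume handler (titles may contain spaces)
```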
…NousResearch#11049) (NousResearch#26759) Plugins can now replace a built-in tool by passing override=True to ctx.register_tool(). Without it, the registry rejects any registration that would shadow an existing tool from a different toolset (unchanged default behavior). Unlocks the use case from NousResearch#11049: drop-in replacement of browser/web backends without forking core. Composes with the existing pre_tool_call hook for runtime interception of any implementation. The override is audit-logged at INFO so it surfaces in agent.log.
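A minimal sketch of the shadowing rule, assuming a registry keyed by tool name that records each tool's toolset. `ToolRegistry` here is hypothetical, and raising `ValueError` on a rejected registration is an assumption; the real registry may reject differently (for example by logging and returning).

```python
import logging
from typing import Any, Callable, Dict, Tuple

logger = logging.getLogger("tools.registry")


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tuple[str, Callable[..., Any]]] = {}

    def register_tool(self, name: str, toolset: str, handler: Callable[..., Any],
                      override: bool = False) -> None:
        existing = self._tools.get(name)
        if existing is not None and existing[0] != toolset:
            if not override:
                raise ValueError(
                    f"tool {name!r} is already provided by toolset {existing[0]!r}; "
                    "pass override=True to replace it"
                )
            # Audit trail so replaced built-ins are visible in the log.
            logger.info("tool %r from toolset %r overridden by toolset %r",
                        name, existing[0], toolset)
        self._tools[name] = (toolset, handler)

    def dispatch(self, name: str, *args: Any, **kwargs: Any) -> Any:
        _toolset, handler = self._tools[name]
        return handler(*args, **kwargs)
```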
Document the three protocols already available for driving hermes-agent from external programs — ACP, the TUI gateway JSON-RPC, and the OpenAI-compatible API server — with a 'which one should I use' guide and a Pi-style RPC command mapping table. Sidebar entry under Developer Guide -> Architecture.
Zero-install localhost tunnels over SSH via Pinggy. Covers HTTP/HTTPS, TCP, TLS, access control (basic auth / bearer / IP whitelist), header manipulation (CORS, force-HTTPS), web debugger, Pro token mode, and four composite recipes (webhook receiver, MCP server exposure, local LLM endpoint share, dev-server quick-share with one-shot password). Closes NousResearch#361
…sor stops drifting (NousResearch#26717) * fix(tui): keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting TextInput's fast-echo bypass writes characters directly to stdout to avoid waiting on a React re-render for each keystroke. The hardware cursor advances by text.length cells, but Ink's cached `displayCursor` (the basis for the next frame's relative cursor-move preamble in log-update) stayed unchanged. When ANY unrelated component re-rendered between the fast-echo write and the deferred composer setCur/setParent flush — status bar timer, streaming reasoning, etc. — the next frame's preamble emitted a relative cursor move from a stale parked position and the hardware cursor parked N cells offset from the actual caret. Visible symptom: extra whitespace between the just-typed character and the cursor block, intermittent, worse on long sessions during streaming. Alt-screen was immune because frames begin with absolute CSI H. This adds a small API in @hermes/ink: - `Ink.noteExternalCursorAdvance(dx, dy?)` — bumps displayCursor if set, otherwise seeds from frontFrame.cursor so the next preamble's relative move correctly cancels the external advance. No-op on alt-screen. - `CursorAdvanceContext` + `useCursorAdvance()` hook to expose it. TextInput then calls `noteCursorAdvance(text.length)` after the fast-echo `stdout.write(text)` append, and `noteCursorAdvance(-1)` after the fast-backspace `\b \b` sequence. Tests: 4 new vitest cases pin the API contract (bumps when set, seeds from frontFrame.cursor when null, alt-screen no-op, zero-delta no-op). All 751 ui-tui tests pass; tests/test_tui_gateway_server.py (177) pass. * fix(tui): also advance cursorDeclaration so fast-echo survives deferred React state Copilot review on PR NousResearch#26717 flagged a gap in the original fix: TextInput's fast-echo path defers the React `cur` state update by 16ms (perf optimization that batches re-renders during heavy typing). 
Inside that window, `useDeclaredCursor` still publishes a target computed from the PRE-keystroke `cur` — `cursorLayout(display, cur, columns)`. Advancing only `displayCursor` would let any unrelated re-render in that 16ms window run onRender's cursor-park branch with the stale declaration and visually undo the fast-echo's advance. The fix is symmetric: `noteExternalCursorAdvance` now bumps BOTH `displayCursor` (the log-update relative-move basis) AND, if non-null, `cursorDeclaration.relativeX/Y` (the target the cursor parks at after every frame). When React finally flushes `setCur`, `useDeclaredCursor` publishes a fresh declaration that supersedes our bumped one — exactly what we want. Adds two new vitest cases covering both halves: - active declaration advances in lock-step with displayCursor - null declaration stays null (no spurious bump) All 753 ui-tui tests pass; tests/test_tui_gateway_server.py (177) pass. Closes review threads: PRRT_kwDOPRF1G86ChKtD (textInput.tsx:1016 fast-echo append) PRRT_kwDOPRF1G86ChKtF (textInput.tsx:924 fast-backspace) PRRT_kwDOPRF1G86ChKtG (ink-cursor-advance.test.ts:57 missing coverage) * fix(tui): make fast-echo survive TextInput rerenders + alt-screen (Copilot round 2) Round 2 of PR NousResearch#26717 review. Three real holes Copilot flagged after the initial cursorDeclaration bump: 1. alt-screen early-return skipped BOTH halves of the notifier. But the default TUI wraps the composer in <AlternateScreen> — that IS the production path. CSI H resets log-update's relative-move basis, but the alt-screen park branch uses absolute CUP = `rect.x + decl.relativeX`, so a stale declaration there still parks the cursor at the pre-keystroke caret. Fix: skip ONLY the displayCursor half on alt-screen; still bump cursorDeclaration. 2. TextInput's own rerender could clobber the Ink-level bump. 
The fast- echo path defers setCur by 16ms; if a parent state change rerenders TextInput in that window, the layout effect inside useDeclaredCursor reads the stale React `cur` state and re-publishes a declaration at the OLD column. Fix: `cursorLayout(display, curRef.current, columns)` — read the always- up-to-date ref, not the deferred state. useMemo dropped (compute is cheap, single-line wrap-text in the common case). 3. Tests bypassed the production wiring. Added two structural tests: - `still advances cursorDeclaration on alt-screen` in the Ink-level suite, asserting displayCursor stays put but the declaration advances by the delta. - `textInputCursorSourceOfTruth.test.ts` pins three structural invariants: layout reads curRef.current, never the bare `cur` state, and the fast-echo stdout.write calls remain paired with noteCursorAdvance(±N). Source-grep invariants > flaky Ink mount tests for this kind of regression. 757/757 ui-tui tests pass (+3 over round 1). type-check clean. lint introduces zero new errors on touched files. tests/test_tui_gateway_server.py (177) pass. Closes review threads: PRRT_kwDOPRF1G86ChOG2 (ink.tsx alt-screen guard) PRRT_kwDOPRF1G86ChOG9 (textInput.tsx fast-backspace rerender window) PRRT_kwDOPRF1G86ChOHC (textInput.tsx fast-append rerender window) PRRT_kwDOPRF1G86ChOHJ (alt-screen test asserts wrong invariant) PRRT_kwDOPRF1G86ChOHP (missing integration-style coverage) * fix(tui): reject fast-backspace at soft-wrap boundary (Copilot round 3) PR NousResearch#26717 round 3. Copilot caught two real things: 1. `\b \b` cannot move the terminal cursor onto the previous visual row across a soft-wrap boundary. When the caret sits at visual column 0 of a wrapped row (e.g. value 'hello ' at width 6 → cursorLayout produces (line 1, col 0)), backspace would leave the physical cursor in place while the logical caret moves up to the end of the previous visual line. `noteCursorAdvance(-1)` would then feed Ink a wrong delta. 
Fix: `canFastBackspaceShape` now takes the composer width and rejects when `cursorLayout(value, cursor, columns).column === 0`. The fast path falls through to the normal Ink render, which correctly lays out the new caret position. The PR-description inconsistency about alt-screen is fixed in a separate gh pr edit. Adds 4 new tests in textInputFastEcho.test.ts pinning the rejection at exact-multiple wrap boundaries plus a positive control inside a wrapped line and a back-compat case where `columns` is omitted. 761/761 ui-tui tests pass. type-check / lint clean. 177/177 Python tests/test_tui_gateway_server.py pass. Closes review threads: PRRT_kwDOPRF1G86ChxE5 (textInput.tsx:933 wrap-boundary regression) * fix(tui): polish doc + tests after Copilot round 4 Three polish points Copilot raised: 1. canFastBackspaceShape doc comment overstated the legacy contract — said it conservatively rejects potential wrap boundaries when columns is omitted, but the implementation actually skips the wrap-boundary check entirely. Reworded to make the legacy behavior explicit and warn callers not to rely on protection they don't get. 2. ink-cursor-advance.test.ts rationale comment for the 'advances cursorDeclaration in lock-step' case still referenced the pre-fix `cursorLayout(display, cur, columns)` expression. Now accurately describes the current source of truth — `curRef.current` in textInput.tsx — and explains the window the bump is bridging. 3. Removed the three `__get*ForTest` accessors from Ink. The test file already cast the instance to inspect private state in the couple of tests that needed declaration mutation; the rest now use a small `peek(ink)` helper that does the same cast for reads. No test-only API surface ships in production. 761/761 ui-tui tests pass. type-check clean. lint introduces zero new errors on touched files. 177/177 tests/test_tui_gateway_server.py pass. 
Closes review threads: PRRT_kwDOPRF1G86Ch23W (canFastBackspaceShape doc accuracy) PRRT_kwDOPRF1G86Ch23f (stale test rationale) PRRT_kwDOPRF1G86Ch23p (test-only API surface in production) * fix(tui): tighten doc + add dy test coverage (Copilot round 5) Two polish points from round 5: 1. canFastBackspaceShape doc had two paragraphs that conflicted — the main 'Additionally rejects when the physical cursor sits at visual column 0' was stated unconditionally, then the columns-param paragraph qualified that it only happens when columns is passed. Reworked into clear 'When supplied / When omitted' branches with a concrete example value ('hello ' returns true without columns even though it would be unsafe at width 6). No more inconsistency. 2. Added a test asserting cursorDeclaration.relativeY advances when dy is non-zero. Existing tests exercised dy on displayCursor only. Newlines in fast-echoed text don't currently hit the bypass (canFastAppendShape rejects '\n'), but dy is part of the public notifier contract and must propagate symmetrically with dx so future callers get a fully-implemented contract. 762/762 ui-tui tests pass (+1). type-check / lint / build clean. Closes review threads: PRRT_kwDOPRF1G86Ch6Sz (doc inconsistency) PRRT_kwDOPRF1G86Ch6TE (missing dy coverage on declaration) * fix(tui): doc polish (Copilot round 6) Four small but valid points: 1. textInputCursorSourceOfTruth.test.ts used bare 'fs'/'path'/'url' imports; the rest of ui-tui consistently uses the 'node:' prefix (see src/__tests__/useSessionLifecycle.test.ts, src/lib/editor.test.ts). Switched to node:fs / node:path / node:url to match convention. 2. CursorAdvanceContext.ts type-level doc described only displayCursor. The notifier intentionally also mutates the active cursorDeclaration and that's the only part that matters on alt-screen. Reworked the doc into a two-part 'updates both' summary with the alt-screen asymmetry called out explicitly. 3. 
use-cursor-advance.ts hook doc had the same problem. Same fix — document both pieces of state, both screen modes. 4. App.tsx onCursorAdvance prop comment was incomplete. Same fix — describe both state updates and the screen-mode asymmetry. No behavior change. 762/762 ui-tui tests pass. type-check / lint / build clean. Closes review threads (auto-resolved on PR but valid critiques): PRRT_kwDOPRF1G86Ch926 (node: prefix on built-in imports) PRRT_kwDOPRF1G86Ch92_ (use-cursor-advance.ts doc) PRRT_kwDOPRF1G86Ch93H (CursorAdvanceContext.ts type doc) PRRT_kwDOPRF1G86Ch93J (App.tsx prop comment)
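The final contract of `noteExternalCursorAdvance` and the wrap-boundary rejection can be modeled in a few lines of Python for illustration. This is a hypothetical model, not the TypeScript in @hermes/ink. It assumes a single-line value that soft-wraps every `columns` characters, so `cursor % columns` stands in for `cursorLayout(...).column`, and it omits the other shape checks `canFastBackspaceShape` performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Point:
    x: int
    y: int


@dataclass
class CursorState:
    alt_screen: bool
    display_cursor: Optional[Point]  # basis for log-update's relative cursor moves
    front_frame_cursor: Point        # cursor recorded with the last rendered frame
    declaration: Optional[Point]     # relativeX/relativeY the cursor parks at


def note_external_cursor_advance(state: CursorState, dx: int, dy: int = 0) -> None:
    if dx == 0 and dy == 0:
        return  # zero-delta is a no-op
    # The park target moves in both screen modes (alt-screen parks absolutely).
    if state.declaration is not None:
        state.declaration = Point(state.declaration.x + dx, state.declaration.y + dy)
    if state.alt_screen:
        return  # frames start with CSI H, so the relative-move basis is unused
    # Seed from the last frame's cursor when no display cursor is cached.
    base = state.display_cursor if state.display_cursor is not None else state.front_frame_cursor
    state.display_cursor = Point(base.x + dx, base.y + dy)


def can_fast_backspace(value: str, cursor: int, columns: Optional[int] = None) -> bool:
    """Simplified shape check for the backspace-space-backspace fast path."""
    if cursor <= 0 or cursor > len(value):
        return False
    if columns and cursor % columns == 0:
        return False  # caret at visual column 0: "\b" can't reach the previous row
    return True
```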
…text (NousResearch#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout). The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677, NousResearch#1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation).
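A regex-based sketch of the sanitizer. The tag list, the 2000-char cap and the `[TOOL_ERROR]` prefix come from the commit; the exact patterns, and the choice to unwrap CDATA while keeping its inner text, are assumptions, and `sanitize_tool_error` is a hypothetical stand-in for `_sanitize_tool_error()`.

```python
import re

_ROLE_TAGS = "tool_call|function_call|result|response|output|input|system|assistant|user"
# Opening or closing role tag, with optional attributes; \b avoids matching <user_id>.
_ROLE_TAG_RE = re.compile(rf"</?\s*(?:{_ROLE_TAGS})\b[^>]*>", re.IGNORECASE)
_CDATA_RE = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)
_FENCE_RE = re.compile(r"```[\w-]*")
MAX_ERROR_CHARS = 2000


def sanitize_tool_error(message: str) -> str:
    text = _CDATA_RE.sub(r"\1", message)  # unwrap CDATA, keep its text
    text = _ROLE_TAG_RE.sub("", text)     # drop role/framing tags
    text = _FENCE_RE.sub("", text)        # drop markdown code fences
    text = text.strip()[:MAX_ERROR_CHARS]  # cap the body length
    return f"[TOOL_ERROR] {text}"
```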
…uth (NousResearch#26763) * feat(x_search): gated X (Twitter) search tool with OAuth-or-API-key auth Salvages tools/x_search_tool.py from the closed PR NousResearch#10786 (originally by @Jaaneek) and reworks its credential resolution so the tool registers when EITHER xAI credential path is available: * XAI_API_KEY (paid xAI API key) is set in ~/.hermes/.env or the env, OR * The user is signed in via xAI Grok OAuth — SuperGrok subscription — i.e. hermes auth add xai-oauth has been run Both paths route through xAI's built-in x_search Responses tool at https://api.x.ai/v1/responses. When both credentials exist OAuth wins, matching tools/xai_http.py's existing preference order (uses SuperGrok quota instead of paid API spend). The check_fn calls resolve_xai_http_credentials() which auto-refreshes the OAuth access token if it's within the refresh skew window, so a True return means the bearer is fetchable AND non-empty. Wiring - tools/x_search_tool.py — new tool, ~370 LOC. Schema gated by check_fn, bearer resolved per-call so revoked OAuth surfaces a clean tool_error rather than an HTTP 401. - toolsets.py — "x_search" toolset def. NOT added to _HERMES_CORE_TOOLS; users opt in via hermes tools. - hermes_cli/tools_config.py — CONFIGURABLE_TOOLSETS entry + TOOL_CATEGORIES block with two provider options (OAuth + API key) sharing the existing xai_grok post_setup hook for credential bootstrap. - hermes_cli/config.py — DEFAULT_CONFIG["x_search"] with model / timeout_seconds / retries. Additive nested key; no version bump. - tests/tools/test_x_search_tool.py — 13 tests covering HTTP shape, handle validation, citation extraction, 4xx/5xx/timeout handling, and the full credential-resolution matrix (OAuth-only, API-key-only, both-set, neither-set, resolver-raises, config overrides, registry registration). - website/docs/guides/xai-grok-oauth.md — adds X Search to the direct-to-xAI tools section with off-by-default note. 
- website/docs/user-guide/features/tools.md — new row in the tools table. Off by default — users enable via `hermes tools` → 🐦 X (Twitter) Search. Schema only appears to the model when xAI credentials are configured. Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com> * docs(x_search): add dedicated feature page + reference entries - website/docs/user-guide/features/x-search.md (new) — full feature walkthrough: authentication, enablement, configuration, parameters, returned fields, example, troubleshooting, see-also links. - website/docs/reference/tools-reference.md — new "x_search" toolset section with parameter docs and credential gating note. - website/docs/reference/toolsets-reference.md — new row in the toolset catalog table. - website/sidebars.ts — wires the new feature page under Media & Web, after web-search. --------- Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>
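The credential precedence for x_search (OAuth first, then `XAI_API_KEY`) can be illustrated with the small sketch below. It rests on one simplifying assumption. The real `check_fn` calls `resolve_xai_http_credentials()`, which also refreshes the OAuth token. This sketch instead takes the already-resolved token and key as plain arguments, and `XaiCredential` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XaiCredential:
    source: str  # "oauth" or "api_key"
    bearer: str


def resolve_x_search_credential(
    oauth_access_token: Optional[str], api_key: Optional[str]
) -> Optional[XaiCredential]:
    """Prefer the OAuth bearer (SuperGrok quota) over a paid API key."""
    for source, value in (("oauth", oauth_access_token), ("api_key", api_key)):
        # Whitespace-only values count as missing, so a True check means non-empty.
        if value and value.strip():
            return XaiCredential(source, value.strip())
    return None


def x_search_check_fn(oauth_access_token: Optional[str], api_key: Optional[str]) -> bool:
    """Expose the tool schema only when some usable bearer exists."""
    return resolve_x_search_credential(oauth_access_token, api_key) is not None
```

Because the bearer is resolved per call rather than cached at registration, a revoked token surfaces as a tool error on the next call.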
…NousResearch#26824) Subagent delegation hardcoded api_mode='chat_completions' for any delegation.base_url that didn't match three specific hostnames (chatgpt.com, api.anthropic.com, api.kimi.com/coding), and never read delegation.api_mode from config. Azure AI Foundry's https://foundry.services.ai.azure.com/anthropic endpoint fell through and got chat_completions, causing 404s on every delegate_task call. The main agent already handles this correctly via the shared _detect_api_mode_for_url() helper (anything ending in /anthropic → anthropic_messages); delegation reimplemented its own narrower check. Reuse the shared detector and honor an explicit delegation.api_mode when set so users can also force the transport on non-standard endpoints the URL heuristic can't classify. Fixes NousResearch#10213. Co-authored-by: HiddenPuppy <HiddenPuppy@users.noreply.github.com>
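The fixed resolution order (explicit config first, then the URL heuristic) can be sketched like this. The sketch is hypothetical. `detect_api_mode_for_url` here encodes only the "path ends in /anthropic" rule quoted in the commit, and the real shared helper classifies more endpoints than this.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_api_mode_for_url(base_url: str) -> str:
    # Hypothetical stand-in for the shared detector: only the /anthropic rule.
    path = urlparse(base_url).path.rstrip("/")
    if path.endswith("/anthropic"):
        return "anthropic_messages"
    return "chat_completions"


def resolve_delegation_api_mode(
    base_url: str, configured_api_mode: Optional[str] = None
) -> str:
    # An explicit delegation.api_mode wins, covering endpoints the heuristic
    # cannot classify; otherwise fall back to the shared URL detector.
    if configured_api_mode:
        return configured_api_mode
    return detect_api_mode_for_url(base_url)
```

For example, `resolve_delegation_api_mode("https://foundry.services.ai.azure.com/anthropic")` yields `"anthropic_messages"` instead of the old hardcoded `"chat_completions"`.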
…26825) Port from openai/codex#17667: MCP servers can now opt in to parallel tool execution by setting supports_parallel_tool_calls: true in their config. This allows tools from the same server to run concurrently within a single tool-call batch, matching the behavior already available for built-in tools like web_search and read_file. Previously all MCP tools were forced to run sequentially because they weren't in the _PARALLEL_SAFE_TOOLS set. Now _should_parallelize_tool_batch checks is_mcp_tool_parallel_safe(), which looks up the server's config flag.

Config example:

    mcp_servers:
      docs:
        command: "docs-server"
        supports_parallel_tool_calls: true

Changes:
- tools/mcp_tool.py: Track parallel-safe servers in the _parallel_safe_servers set, populated during register_mcp_servers(). Add the is_mcp_tool_parallel_safe() public API.
- run_agent.py: Add the _is_mcp_tool_parallel_safe() lazy-import wrapper. Update _should_parallelize_tool_batch() to check MCP tools against the server config.
- 11 new tests covering the feature end-to-end.
- Updated MCP docs and config reference.
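The batch check above can be sketched as a server-level opt-in lookup. The sketch is a hypothetical simplification. The registry class, the tool-to-server mapping and the plain tool names are assumptions; only the `supports_parallel_tool_calls` flag and the built-in examples come from the commit.

```python
from typing import Dict, Iterable, List, Set

# Built-in tools already treated as parallel-safe (examples from the commit).
PARALLEL_SAFE_TOOLS: Set[str] = {"web_search", "read_file"}


class McpParallelRegistry:
    """Hypothetical registry: tracks which MCP servers opted in."""

    def __init__(self) -> None:
        self._parallel_safe_servers: Set[str] = set()
        self._tool_to_server: Dict[str, str] = {}

    def register_server(self, name: str, config: dict, tool_names: Iterable[str]) -> None:
        # Only an explicit `true` opts a server in; absent means sequential.
        if config.get("supports_parallel_tool_calls") is True:
            self._parallel_safe_servers.add(name)
        for tool in tool_names:
            self._tool_to_server[tool] = name

    def is_mcp_tool_parallel_safe(self, tool_name: str) -> bool:
        server = self._tool_to_server.get(tool_name)
        return server is not None and server in self._parallel_safe_servers


def should_parallelize_tool_batch(tool_names: List[str], registry: McpParallelRegistry) -> bool:
    # A batch runs concurrently only if every call in it is parallel-safe.
    if not tool_names:
        return False
    return all(
        name in PARALLEL_SAFE_TOOLS or registry.is_mcp_tool_parallel_safe(name)
        for name in tool_names
    )
```

A single non-opted-in tool keeps the whole batch sequential. That is the conservative default for servers that never set the flag.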
…earch#26829) Port three hardening patches from Claude Code 2.1.113's expanded deny rules to hermes' detect_dangerous_command() pattern list.

1. macOS /private/{etc,var,tmp,home} system paths: /etc, /var, /tmp, and /home are symlinks to /private/<name> on macOS. A write to /private/etc/sudoers works identically to /etc/sudoers but bypassed the plain /etc/ pattern check. Extracted a shared _SYSTEM_CONFIG_PATH fragment so /etc/ and the /private/ mirror stay in sync across the redirect / tee / cp / mv / install / sed -i patterns.
2. killall -9 / -KILL / -SIGKILL / -s KILL / -r <regex>: parallel to the existing pkill -9 pattern. killall -9 against non-hermes processes was previously unprotected, and killall -r can sweep unrelated processes matching a regex.
3. find -execdir rm: same destructive effect as find -exec rm, but runs in each match's directory. The previous pattern required a literal '-exec ', so -execdir slipped through.

Guarded by 32 new test cases in 4 test classes:
- TestMacOSPrivateSystemPaths (11 cases)
- TestKillallKillSignals (9 cases)
- TestFindExecdir (4 cases)
- TestEtcPatternsUnaffectedByRefactor (6 regression guards on the existing /etc/ coverage after the _SYSTEM_CONFIG_PATH refactor)

Inspiration: https://github.com/anthropics/claude-code/releases (Claude Code 2.1.113, April 17 2026: "Enhanced deny rules" and "Dangerous path protection")
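The three rule families above can be approximated with the sketch below. It is a hypothetical reduction, not hermes' real pattern list. It uses a single shared path fragment and covers only the redirect and tee writes (the real list also covers cp / mv / install / sed -i). The exact regexes and descriptions are assumptions.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical shared fragment: /etc/ plus the macOS /private/ mirrors.
_SYSTEM_CONFIG_PATH = r"(?:/etc/|/private/(?:etc|var|tmp|home)/)"

_DANGEROUS_PATTERNS: List[Tuple[re.Pattern, str]] = [
    # Shell redirect (> or >>) into a system path.
    (re.compile(r">>?\s*" + _SYSTEM_CONFIG_PATH), "redirect into a system path"),
    # tee, with any leading flags such as -a, writing into a system path.
    (re.compile(r"\btee\b(?:\s+-\S+)*\s+" + _SYSTEM_CONFIG_PATH), "tee into a system path"),
    # killall with an unblockable kill signal or a regex process sweep.
    (re.compile(r"\bkillall\b.*\s(?:-9|-KILL|-SIGKILL|-s\s+KILL|-r)\b"), "killall with SIGKILL or -r"),
    # find -exec rm and the previously missed -execdir rm.
    (re.compile(r"\bfind\b.*\s-exec(?:dir)?\s+(?:\S*/)?rm\b"), "find -exec/-execdir rm"),
]


def detect_dangerous_command(command: str) -> Optional[str]:
    """Return a description of the first matching rule, or None if benign."""
    for pattern, description in _DANGEROUS_PATTERNS:
        if pattern.search(command):
            return description
    return None
```

Keeping the path alternation in one fragment is the point of the refactor. Adding a new mirror path then updates every write-style rule at once.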
…rsions (NousResearch#26830) Closes NousResearch#10695. Picks up the still-vulnerable Python pins on current main: - aiohttp 3.13.3 -> 3.13.4 (messaging, slack, homeassistant, sms extras + lazy_deps platform.slack) — CVE-2026-34513 (DNS cache exhaustion), CVE-2026-34518 (cookie/proxy-auth leak on cross-origin redirect, relevant for the gateway since it handles OAuth tokens), CVE-2026-34519 (response reason injection), CVE-2026-34520 (null bytes in headers), CVE-2026-34525 (multiple Host headers). - anthropic 0.86.0 -> 0.87.0 (anthropic extra + lazy_deps provider.anthropic) — CVE-2026-34450 (memory tool files created mode 0o666), CVE-2026-34452 (path-traversal in async local-filesystem memory tool). Not directly exploitable since hermes-agent doesn't use the SDK's filesystem memory tool, but the SDK is bumped for hygiene. - cryptography pinned explicitly at 46.0.7 in core dependencies — CVE-2026-39892 (buffer overflow on non-contiguous buffers). Previously came in transitively via PyJWT[crypto]; the explicit floor keeps the WeCom/Weixin crypto paths from drifting below the fix. curl-cffi from the original issue is no longer in pyproject.toml or uv.lock, so no action needed there. uv.lock regenerated cleanly; only aiohttp / anthropic / cryptography moved. Credit: original issue + scoping by @shaun0927 (NousResearch#10695, NousResearch#10701). Floor analysis and packaging-surface audit by @gnanirahulnutakki (NousResearch#10784), adapted to current main's exact-pin style. Co-authored-by: shaun0927 <shaun0927@users.noreply.github.com> Co-authored-by: Gnani Rahul Nutakki <gnanirahulnutakki@users.noreply.github.com>
…arch#355) (NousResearch#26729) * feat(skills): add osint-investigation optional skill (closes NousResearch#355) Phase-1 public-records OSINT investigation framework adapted from ShinMegamiBoson/OpenPlanter (MIT). Lives in optional-skills/research/. Six data-source wiki entries (FEC, SEC EDGAR, USAspending, Senate LD, OFAC SDN, ICIJ Offshore Leaks), each following the 9-section template: summary, access, schema, coverage, cross-reference keys, data quality, acquisition, legal, references. Six stdlib-only acquisition scripts that emit normalized CSV, plus three analysis scripts: - entity_resolution.py — three-tier match (exact / fuzzy / token overlap) with explicit confidence per row - timing_analysis.py — permutation test for donation/contract timing correlation, joins through cross-links - build_findings.py — assembles structured findings.json with evidence chains pointing back to source rows Validation: full pipeline runs end-to-end on synthetic fixtures. Entity resolution found 24 cross-matches with 0 false positives on a 5-row / 4-row test set. Timing analysis on 5 donations clustered near 3 awards returned p=0.000, effect size 2.41 SD. Findings JSON correctly tags HIGH-severity timing pattern. All 9 scripts pass --help and py_compile. Docs site page auto-generated by website/scripts/generate-skill-docs.py; sidebar + catalog entries updated by the same generator. * fix(osint-investigation): live API fixes from end-to-end sweep Live-tested the skill on a real public-citizen query and found three bugs the synthetic E2E missed. All three are now fixed and re-verified. 1. FEC fetch hung on contributor name searches. The combination of two_year_transaction_period + sort=date + contributor_name puts the OpenFEC query plan on a slow path that the upstream gateway times out (25s+). Switched to min_date/max_date with no explicit sort. 
Renamed --candidate to --contributor (the original name was misleading: FEC searches by donor, not by candidate; --candidate is kept as a deprecated alias). Added --state filter for narrowing. 2. ICIJ Offshore Leaks reconcile endpoint returns 404. ICIJ removed the Open Refine reconciliation API. Rewrote fetch_icij_offshore.py to download the official bulk CSV ZIP (~70 MB, public, no auth) and search it locally. Cached under $HERMES_OSINT_CACHE/icij/ (default ~/.cache/hermes-osint/icij/) for 30 days, --force-refresh to refetch. Verified live: 'PUTIN' query returns 5 Panama Papers officer matches in 0.5s after first download. 3. SEC EDGAR silently returned 0 when the company-name resolver matched an individual Form 3/4/5 filer (insider trading disclosures). Now surfaces 'Resolved company X → CIK Y (Z)' on stderr, prints a filing-type histogram when the type filter wipes results, and explicitly warns when the matched CIK appears to be an individual filer rather than a corporate registrant. Bonus: _http.py was retrying 429 responses with exponential backoff plus honoring (often-missing) Retry-After headers, which compounded into multi-second hangs per page when the upstream key was over quota. Changed to fail-fast on 429 with a clear, actionable error showing the upstream's quota message. Verified: 0.3s fast-fail vs the previous 60s hang on DEMO_KEY rate-limit exhaustion. Updated SKILL.md, fec.md, and icij-offshore.md to match the new CLI flags and ICIJ bulk-cache flow. Regenerated the docusaurus page via website/scripts/generate-skill-docs.py. 
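The _http.py change (fail fast on 429, keep backoff for transient errors) can be shown with this minimal stdlib sketch. Several details are assumptions, not the skill's actual code: the set of retryable 5xx statuses, the error class name, and the injectable `opener` parameter (added here so the logic is testable offline).

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical policy: back off on transient 5xx, never on 429.
RETRYABLE_STATUSES = {500, 502, 503, 504}


class RateLimitedError(RuntimeError):
    """Raised immediately on HTTP 429 so quota exhaustion surfaces at once."""


def fetch_bytes(
    url: str,
    retries: int = 3,
    backoff_seconds: float = 1.0,
    opener: Callable = urllib.request.urlopen,
) -> bytes:
    for attempt in range(retries + 1):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code == 429:
                # Surface the upstream quota message instead of sleeping on it.
                detail = exc.read().decode("utf-8", "replace").strip()
                raise RateLimitedError(
                    f"HTTP 429 from {url}: {detail or 'rate limit exceeded'}"
                ) from exc
            if exc.code not in RETRYABLE_STATUSES or attempt == retries:
                raise
            time.sleep(backoff_seconds * 2 ** attempt)
    raise ValueError("retries must be >= 0")
```

An exhausted key usually stays exhausted for the rest of the window. Retrying it only adds latency, which is why 429 is excluded from the backoff path.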
Live sweep results across all 6 sources for 'Dillon Rolnick, New York': - OFAC SDN: 0 matches ✓ (correctly not sanctioned) - USAspending: 0 matches ✓ (correctly not a federal contractor) - Senate LDA: 0 matches ✓ (correctly not a lobbying client) - SEC EDGAR: warns it resolved to 'Rolnick Michael' (CIK 0001845264) who is an individual Form 3 filer, not a corporate registrant - ICIJ: 0 matches ✓ (correctly not in any offshore leak) - FEC: rate-limited (DEMO_KEY); fails fast with clear quota message * feat(osint-investigation): expand to 12 sources covering identity, property, courts, archives, news Phase-2 expansion per Teknium feedback that the original 6-source skill (federal financial/regulatory only) wasn't a complete OSINT toolkit. Adds 6 more sources covering the major omissions a real investigation would reach for first. New sources (6 fetch scripts + 6 wiki entries): 1. NYC ACRIS — Real property records (deeds, mortgages, liens) via the city's Socrata API. Search by party name or property address. Joins Parties to Master to populate doc_type, dates, borough, and amount. Coverage: 5 NYC boroughs, ~70M party records, 1966-present. 2. OpenCorporates — Global corporate registry covering 130+ jurisdictions (~200M companies). Free API token at https://opencorporates.com/api_accounts/new raises the rate limit; HTML fallback works without one (limited fields). 3. CourtListener (Free Law Project) — federal + state court opinions (~10M back to colonial era) + PACER dockets via RECAP. Anonymous v4 search works; COURTLISTENER_TOKEN raises rate limits. 4. Wayback Machine CDX — historical web captures (~900B+). Used both for surveillance-of-record (when did this site change?) and as a content-recovery layer when other sources point to dead URLs. 5. Wikipedia + Wikidata — narrative bio + structured facts. Wikipedia OpenSearch for article matching, REST summary for extracts, Wikidata Action API (wbgetentities) for claims. 
Avoids the SPARQL Query Service which is aggressively rate-limited. 6. GDELT 2.0 DOC API — global news monitoring in 100+ languages, ~2015-present. Auto-retries with 6s backoff on the standard 1-req-per-5-sec throttle. Other changes in this commit: - SEC EDGAR no longer raises SystemExit when the company-name resolver finds no CIK; writes an empty CSV with header so the rest of a pipeline can keep moving and the warning is just on stderr. - _http.py User-Agent updated per Wikimedia policy: includes app name, version, and a 'set HERMES_OSINT_UA to identify yourself' instruction. - SKILL.md workflow now groups sources into two clusters (federal financial vs identity/property/courts/archives/news) with bash examples for each. 'When to use this skill' lists the broader set of investigation patterns the expanded sources unlock. Live sweep results on 'Dillon Rolnick, New York' across all 12 sources: ofac ✓ 0 (correctly clean) icij ✓ 0 (correctly not in any leak) usaspending ✓ 0 (correctly not a federal contractor) senate_lda ✓ 0 (correctly not a lobbying client) sec_edgar ✓ 0, warns: resolved to 'Rolnick Michael' (CIK 0001845264), individual Form 3 filer, NOT a corporate registrant fec — rate-limited (DEMO_KEY exhausted), fails fast with clear quota message nyc_acris ✓ 200 records named Rolnick across NYC; 48 records at 571 Hudson (the property the web identifies as his) opencorporates ✓ 0 (no API token configured; HTML fallback) courtlistener ✓ 0 for 'Dillon Rolnick'; 20 for 'Rolnick' generally; 5 for 'Microsoft' sanity check wayback ✓ 30 captures of nousresearch.com from 2011-present wikipedia ✓ 0 (correctly not notable enough); Bill Gates sanity returns full structured facts (occupation, employer, DOB, place of birth, country) gdelt ✓ 0 for 'Dillon Rolnick'; 5 for 'Nous Research' All 17 scripts compile clean and pass --help. Synthetic analysis pipeline regression still passes (entity_resolution 30 matches, timing p=0.000, findings 2). 
* feat(osint-investigation): remove FEC; DEMO_KEY rate-limits make it unreliable The FEC fetcher consistently failed the live sweep because the OpenFEC DEMO_KEY tier (40 calls/hour) exhausts on a single investigation, and the upstream returns slow-path query plans for unindexed contributor-name searches that the gateway times out. Without a real API key it's not usable; with one the user has to sign up at api.data.gov first. That's too much setup friction for a skill that should work out of the box. Removed: - scripts/fetch_fec.py - references/sources/fec.md Updated: - SKILL.md frontmatter description + tags - 'When NOT to use' now points users at https://www.fec.gov/data/ for federal donations - entity_resolution example switched from donor↔contractor to lobbying-client↔contractor (Senate LDA + USAspending pair) - timing_analysis example switched to lobbying-filings vs awards - 8 wiki entries had their 'FEC ↔ ...' cross-reference bullets removed 11 sources remain (5 federal financial + 6 identity/property/courts/ archives/news). All scripts compile, pass --help, and the synthetic analysis pipeline still passes on the new lobbying-shaped regression fixture (30 matches, p=0.000 on tight clustering, 2 findings).
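The three-tier match in entity_resolution.py (exact / fuzzy / token overlap, each with an explicit confidence) can be illustrated as follows. This is a sketch under stated assumptions. The normalization, the suffix list and the 0.9 / 0.5 thresholds are hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy scorer the script actually uses.

```python
import difflib
import re
from typing import Optional, Tuple

# Hypothetical corporate-suffix stoplist used during normalization.
_SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "the"}


def normalize(name: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in _SUFFIXES)


def match_entities(
    a: str, b: str, fuzzy_threshold: float = 0.9, overlap_threshold: float = 0.5
) -> Optional[Tuple[str, float]]:
    """Return (tier, confidence) for the first tier that matches, else None."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return None
    if na == nb:
        return ("exact", 1.0)
    ratio = difflib.SequenceMatcher(None, na, nb).ratio()
    if ratio >= fuzzy_threshold:
        return ("fuzzy", round(ratio, 3))
    # Jaccard overlap of normalized tokens as the weakest tier.
    ta, tb = set(na.split()), set(nb.split())
    overlap = len(ta & tb) / len(ta | tb)
    if overlap >= overlap_threshold:
        return ("token_overlap", round(overlap, 3))
    return None
```

Reporting the tier alongside the score lets a downstream step like build_findings.py weight token-overlap links less heavily than exact ones.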
Fixes NousResearch#26693 `hermes doctor` currently promotes invalid direct API keys into the final summary even when the matching OAuth path is already healthy. That makes the setup look more broken than it really is. This change keeps the failed API Connectivity row visible but stops treating it as a blocking summary issue when a healthy OAuth fallback already exists for the same provider family. Covered cases: - Gemini OAuth + invalid direct Gemini key - MiniMax OAuth + invalid direct MiniMax key Based on NousResearch#26704 by @worlldz.
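The summary rule described above (keep the failed row, but do not count it as blocking when a healthy OAuth path exists for the same provider) can be sketched as follows. `CheckResult` and `blocking_summary_issues` are hypothetical names, not `hermes doctor` internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    provider: str  # provider family, e.g. "gemini"
    kind: str      # "oauth" or "api_key"
    ok: bool


def blocking_summary_issues(results: List[CheckResult]) -> List[str]:
    healthy_oauth = {r.provider for r in results if r.kind == "oauth" and r.ok}
    issues = []
    for r in results:
        if r.ok:
            continue
        # The failed API-key row is still shown in its section; it just does
        # not escalate to the summary when OAuth already works for the family.
        if r.kind == "api_key" and r.provider in healthy_oauth:
            continue
        issues.append(f"{r.provider} {r.kind} check failed")
    return issues
```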
…rs (NousResearch#10648) Address two blocking issues when using GitHub Copilot integrations: 1. ACP mode: detect the gh-copilot CLI deprecation error from stderr and surface an actionable message with alternatives instead of hanging or showing a cryptic error. 2. GitHub Models (Azure) 413: recognize models.inference.ai.azure.com as a known GitHub Models URL, and print a targeted hint explaining the hard 8K token limit that makes this endpoint incompatible with Hermes' system prompt size.
…apping Cover the deprecation pattern matching against real gh-copilot stderr output, verify the GitHub Models Azure URL is in _URL_TO_PROVIDER, and confirm _is_github_models_base_url recognises the Azure endpoint.
…ls 413 hint Follow-up improvements on top of @konsisumer's cherry-picked fix for NousResearch#10648:
1. Deprecation patterns now require BOTH a product fingerprint ('gh-copilot') and a deprecation marker. The previous list included 'copilot-cli' and bare 'deprecation', which would false-positive on stderr from the NEW @github/copilot CLI, whose repo is literally github.com/github/copilot-cli and which legitimately surfaces those substrings in its own messages.
2. Replace the deprecation hint. The user in NousResearch#10648 ran 'gh extension install github/gh-copilot' (the deprecated extension), thinking that's what ACP mode uses, when ACP actually spawns the new 'copilot' binary from '@github/copilot'. The hint now points users at the correct install command ('npm install -g @github/copilot') with the new CLI's repo URL, and demotes provider-switching to a fallback alternative.
3. Change the _URL_TO_PROVIDER value for models.inference.ai.azure.com from the 'github-models' alias to the canonical 'copilot' provider id, matching the convention used by every other entry in the table.
4. Sharpen the 413 hint message. The free tier's ~8K cap is below the system-prompt floor, so this endpoint is fundamentally incompatible with an agentic loop; it is not a 'use a different URL' problem.
Tests:
- New parametrized false-positive coverage for the new CLI's stderr shape.
- Updated assertion to require the canonical 'copilot' provider mapping.
- All 14 deprecation/URL tests pass.
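The "both fingerprint and marker" rule from item 1 is easy to see in a tiny sketch. The marker tuple and the sample stderr strings below are illustrative assumptions, not the real pattern list or real gh-copilot output.

```python
# Hypothetical pattern lists; the real ones live in the ACP adapter.
_PRODUCT_FINGERPRINTS = ("gh-copilot",)
_DEPRECATION_MARKERS = ("deprecated", "no longer supported")


def is_gh_copilot_deprecation(stderr: str) -> bool:
    text = stderr.lower()
    # Require both signals so the new @github/copilot CLI, whose own output
    # may mention "copilot-cli" or "deprecation", does not false-positive.
    has_product = any(f in text for f in _PRODUCT_FINGERPRINTS)
    has_marker = any(m in text for m in _DEPRECATION_MARKERS)
    return has_product and has_marker
```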
…sResearch#4469) (NousResearch#26822) When the agent is running and the user sends multiple TEXT messages in rapid succession, base.py's active-session branch stored the pending event as a single-slot replacement: self._pending_messages[session_key] = event Three rapid messages A, B, C landed as: A (interrupts), B (replaces A before consumer reads), C (replaces B). Only C reached the next turn — A and B were silently dropped. This is the symptom in NousResearch#4469. Route the follow-up through merge_pending_message_event(..., merge_text=True) so TEXT events accumulate into the existing pending event's text instead of clobbering it. Photo and media bursts already merged through the same helper; this just extends the merge_text path (already used by the Telegram bursty-grace branch in gateway/run.py) to all platforms. Test exercises BasePlatformAdapter.handle_message directly with the session marked active and asserts three rapid TEXT events merge to 'part two\\npart three' rather than dropping the middle message. Sanity-checked the test would fail without the fix. Credits @devorun for the original investigation and analysis in NousResearch#4491 that surfaced the underlying queue handling, though their fix targeted GatewayRunner._pending_messages which is now dead state on main.
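The difference between single-slot replacement and text merging can be reduced to a few lines. `queue_follow_up` is a hypothetical simplification that works on plain strings. It is not the real `merge_pending_message_event()`, which operates on event objects and also merges media.

```python
from typing import Dict


def queue_follow_up(pending: Dict[str, str], session_key: str, text: str, merge_text: bool) -> None:
    """Store a follow-up message while the session is busy."""
    if merge_text and session_key in pending:
        # Accumulate rapid TEXT messages instead of clobbering the earlier one.
        pending[session_key] = f"{pending[session_key]}\n{text}"
    else:
        # Old behavior: the latest message silently replaces the previous one.
        pending[session_key] = text
```

With `merge_text=False`, two rapid follow-ups leave only the last one pending. With `merge_text=True`, both reach the next turn.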
The PKCE flow reused the code_verifier as the OAuth state parameter. Per RFC 6749 §10.12 and RFC 7636, these serve different purposes: state is an anti-CSRF token visible in the authorization URL; the code_verifier must remain secret for the token exchange. Generate an independent secrets.token_urlsafe(32) for state and validate it on callback to provide actual CSRF protection. Closes NousResearch#10693
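The separation between `state` and `code_verifier` can be shown with stdlib-only code. The authorize URL, client id and redirect URI are placeholders. The function names are hypothetical and not the hermes login code, but the S256 challenge follows RFC 7636 and `state` uses `secrets.token_urlsafe(32)` as the commit describes.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_authorization_request(
    authorize_url: str, client_id: str, redirect_uri: str
) -> Tuple[str, str, str]:
    """Return (url, state, code_verifier); only the verifier stays secret."""
    code_verifier = secrets.token_urlsafe(64)  # 86 chars, within RFC 7636's 43-128
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    state = secrets.token_urlsafe(32)  # independent anti-CSRF token
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        "state": state,
    }
    return f"{authorize_url}?{urlencode(params)}", state, code_verifier


def state_matches(expected: str, received: Optional[str]) -> bool:
    # Constant-time comparison; a missing or forged state aborts the exchange.
    if received is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

Only the hashed challenge appears in the URL. The verifier is sent later, in the token exchange, so a leaked authorization URL no longer reveals it.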
Group the secrets import with time and webbrowser at the top of run_hermes_oauth_login_pure(), matching the existing pattern. Drop the _secrets alias — no name conflict in this scope.
…tion Two unit tests for run_hermes_oauth_login_pure(): 1. test_authorization_url_state_is_not_pkce_verifier — asserts state in the auth URL is independent from the PKCE code_verifier sent in the token exchange, and that the verifier never appears in the URL. 2. test_callback_state_mismatch_aborts — asserts the flow returns None (no token exchange) when the callback state does not match the value we generated. Negative control verified: reintroducing the b17e5c1 vulnerable pattern (state = verifier, no callback validation) makes both tests fail. Also adds AUTHOR_MAP entry for shaun0927 (contributor of the fix).
The Foundation Release — Hermes installs and runs anywhere now. Highlights: - Native Windows support (early beta) — PowerShell installer, native subprocess/PTY paths, ~40 follow-up Windows-only fixes - pip install hermes-agent — PyPI wheel - Cold-start wave — ~19s off hermes launch, 180x faster browser_console (CDP WS) - Supply-chain advisory checker + lazy-deps + tiered install fallback - OpenAI-compatible local proxy for OAuth providers (Claude Pro, ChatGPT Pro, SuperGrok) - Cross-session 1h Claude prompt cache (Anthropic / OpenRouter / Nous Portal) - 2 new platforms: LINE + SimpleX Chat (22 total) - Microsoft Graph foundation — Teams pipeline + webhook adapter - /handoff actually transfers sessions live - x_search first-class tool, vision_analyze pixel passthrough - LSP semantic diagnostics on every write - Unified video_generate with pluggable backends - computer_use cua-driver backend - 9 new optional skills, OpenRouter Pareto Code router, xAI Grok OAuth - 12 P0 + 50 P1 closures 808 commits · 633 PRs · 1393 files · 165k insertions · 545 issues closed · 215 contributors
…n/load (NousResearch#12285) (NousResearch#26943) Persisted assistant `reasoning_content` / `reasoning` fields are now emitted as ACP `agent_thought_chunk` notifications during `_replay_session_history`, so editor clients (Zed, etc.) rebuild collapsed Thinking panes when the user re-opens a session that used a thinking model. Ordering matches live streaming: thought precedes message text within the same assistant turn, mirroring how `reasoning_callback` deltas arrive before `stream_delta_callback` deltas in `events.py::make_thinking_cb` / `make_message_cb`. Behavior on non-reasoning histories is unchanged; the replay loop's existing text / tool_call / tool_call_update / plan emission is preserved bit-for-bit. Closes NousResearch#12285. Credit: - @Yukipukii1 (NousResearch#14691) — original thought-replay design via `acp.update_agent_thought_text`; the tool-call portion of that PR has since landed via NousResearch#19139, but the reasoning replay is theirs. - @HenkDz (NousResearch#17652 / NousResearch#18578) — established the `_replay_session_history` and `_history_*` helper conventions this builds on. - @D1zzyDwarf (NousResearch#16531) — also closed by this work.
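The ordering rule for a replayed assistant turn (thought chunk before message text) can be sketched as a pure function that emits `(kind, text)` tuples. The tuples stand in for real ACP notifications. The precedence of `reasoning_content` over `reasoning` is an assumption.

```python
from typing import Any, Dict, List, Tuple

Update = Tuple[str, str]  # (ACP session-update kind, text)


def replay_assistant_turn(message: Dict[str, Any]) -> List[Update]:
    updates: List[Update] = []
    # Hypothetical precedence: reasoning_content first, then reasoning.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    if reasoning:
        # Thought goes first, mirroring live streaming order.
        updates.append(("agent_thought_chunk", reasoning))
    content = message.get("content")
    if isinstance(content, str) and content:
        updates.append(("agent_message_chunk", content))
    return updates
```

Turns without reasoning fields produce the same output as before, which matches the "unchanged on non-reasoning histories" guarantee.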
…ousResearch#12285 follow-up) (NousResearch#26957) Switches `_replay_session_history` from `loop.call_soon`-deferred (after the `LoadSessionResponse` is written) to `await`-inline (before the response is constructed) for both `session/load` and `session/resume`. Adds defensive try/except around the awaited call so a replay helper crash still yields a successful load response — partial transcripts are acceptable, total load failure is not. The deferral was added on May 2 in commit 19854c7 with the rationale "Zed only attaches streamed transcript/tool updates once the load/resume response has completed." That justification was incorrect: - Zed's current ACP integration (zed-industries/zed crates/agent_servers/src/acp.rs) explicitly registers the session-update routing entry BEFORE awaiting the loadSession RPC, with the comment: "so that any session/update notifications that arrive during the call (e.g. history replay during session/load) can find the thread." - Every other reference ACP server (Codex, Claude Code, OpenCode, Pi, agentao) replays history BEFORE responding to the load request. - The ACP spec wording ("Stream the entire conversation history back to the client via notifications") and the natural JSON-RPC reading both mean "during the request's lifetime", not "after the response resolves". Empirical reproduction (reported by Biraj on @agentclientprotocol/sdk v0.21.1): the same custom ACP client works correctly against Codex / Claude Code / OpenCode / Pi but receives 0 notifications from Hermes because it measures the per-call notification count at the moment `loadSession` resolves — which on Hermes was before the `call_soon`- scheduled replay coroutine had a chance to run. Changes: - `acp_adapter/server.py`: remove `_schedule_history_replay`; both `load_session` and `resume_session` now `await self._replay_session_history` before returning, wrapped in try/except that logs and continues on helper exceptions. 
- `tests/acp/test_server.py`: replace the single `test_load_session_schedules_history_replay_after_response` (which encoded the now-incorrect post-response ordering) with two tests asserting `events == ["replay", "returned"]` for load and resume. Add two regression tests confirming that a replay helper raising still yields a `LoadSessionResponse` / `ResumeSessionResponse` rather than propagating the exception out as a JSON-RPC error. Result: 240 ACP tests pass (was 238), ruff clean. Verified end-to-end: Biraj's synchronous notification-counter pattern now sees 6 notifications during `loadSession` for a 5-message session, matching all other reference ACP servers. The `_fenced_text` change in `acp_adapter/tools.py` from the same May 2 commit is orthogonal and intentionally left intact; it's a separate, still-valid fix for Zed's pipe-as-table rendering. Refs NousResearch#12285. Follows up NousResearch#26943 (which added thought-chunk replay but kept the deferral).
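The new load path (await the replay inline, then respond, and tolerate replay failures) can be sketched with asyncio. `SketchAcpServer` is hypothetical. It returns a plain dict instead of a `LoadSessionResponse` and records `"returned"` in the same `events` list the tests above assert on.

```python
import logging
from typing import Awaitable, Callable, Dict, List

logger = logging.getLogger("acp_sketch")


class SketchAcpServer:
    """Hypothetical server illustrating replay-before-response ordering."""

    def __init__(self, replay: Callable[[str], Awaitable[None]], events: List[str]):
        self._replay = replay
        self.events = events

    async def load_session(self, session_id: str) -> Dict[str, str]:
        # Await replay inline so its notifications are sent while the
        # session/load request is still in flight.
        try:
            await self._replay(session_id)
        except Exception:
            # A partial transcript is acceptable; a failed load is not.
            logger.exception("history replay failed for session %s", session_id)
        self.events.append("returned")
        return {"sessionId": session_id}
```

A client that counts notifications at the moment `loadSession` resolves now sees the full replay. With the old `call_soon` deferral, the replay had not yet run at that point.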
CodeRabbit: Review skipped, too many files. This PR contains 297 files, which is 147 over the limit of 150. To get a review, narrow the scope.
Superseded by #3, which carries only the minimal hermetic test-fix diff from a clean branch.
… contract Three test classes lock in the NousResearch#30963 fix: 1. TestPartialStreamStubFinishReason — drives _interruptible_streaming_api_call through the two recovery branches and asserts: - text-only partial → finish_reason="length" (the new behaviour), - mid-tool-call partial → finish_reason="stop" (unchanged on purpose). 2. TestLengthContinuationPromptBranching — pure-Python check on the branch that picks the continuation prompt by response.id. Locks the network error wording for partial-stream-stub vs. the output-length wording for everything else. 3. TestConversationLoopPartialStreamContinuation — feeds a stub + continuation pair into run_conversation, verifies the loop makes a second API call (instead of exiting with text_response(stop)), confirms the network-error continuation prompt actually reaches the model on call #2, and that final_response stitches both halves. Refs: NousResearch#30963
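The contract locked in by the first two test classes can be summarized in a short sketch. The stub id value, the dataclass and the prompt wording are hypothetical. Only the `"length"` / `"stop"` split and the branching on `response.id` come from the commit message.

```python
from dataclasses import dataclass

PARTIAL_STREAM_STUB_ID = "partial-stream-stub"  # hypothetical marker id


@dataclass
class StubResponse:
    id: str
    text: str
    finish_reason: str


def build_partial_stream_stub(text: str, has_partial_tool_call: bool) -> StubResponse:
    # Text-only partials report "length" so the loop asks for a continuation;
    # a partial tool call keeps "stop" and is not continued.
    reason = "stop" if has_partial_tool_call else "length"
    return StubResponse(PARTIAL_STREAM_STUB_ID, text, reason)


def continuation_prompt(response_id: str) -> str:
    # Pick the wording by response id: network cut-off vs. output-length cap.
    if response_id == PARTIAL_STREAM_STUB_ID:
        return "Your previous response was cut off by a network error. Continue exactly where it stopped."
    return "Your previous response hit the output length limit. Continue exactly where it stopped."
```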
Summary
Validation
pytest -q tests/providers/test_plugin_discovery.py tests/hermes_cli/test_runtime_provider_resolution.py tests/hermes_cli/test_ollama_cloud_auth.py tests/hermes_cli/test_gateway_service.py tests/hermes_cli/test_web_server.py::TestNewEndpoints::test_profiles_create_creates_wrapper_alias_when_safe
pytest -q tests/providers tests/hermes_cli/test_runtime_provider_resolution.py tests/hermes_cli/test_ollama_cloud_auth.py
pytest -q tests/hermes_cli
Notes
main.