chore: remove Atropos RL environments and tinker-atropos integration by alt-glitch · Pull Request #26106 · NousResearch/hermes-agent

alt-glitch · 2026-05-15T04:35:22Z

Summary

Remove the entire Atropos RL training integration from hermes-agent. This includes RL environments, tool call parsers, the agent loop for RL, benchmarks, the tinker-atropos submodule, and all associated tooling/config/docs/tests.

What's removed

Deleted entirely (~15,700 lines)

environments/ — 43 files (base env, agent loop, tool call parsers, SWE env, terminal test env, benchmarks, web research env, OPD env)
rl_cli.py — standalone RL training CLI
tools/rl_training_tool.py — all 10 rl_* tools
tinker-atropos git submodule + .gitmodules
7 test files (test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_tool_call_parsers, test_managed_server_tool_support, test_rl_training_tool, test_terminalbench2_env_security)
optional-skills/mlops/hermes-atropos-environments/ (skill + 3 reference docs)
3 website doc pages (environments.md, rl-training.md, mlops-hermes-atropos-environments.md)

Modified (~30 files)

Python source: toolsets.py, model_tools.py, hermes_cli/tools_config.py, hermes_cli/config.py, hermes_cli/doctor.py, hermes_cli/setup.py, hermes_cli/status.py, agent/display.py, tools/budget_config.py
Build: pyproject.toml (remove rl + yc-bench extras, rl_cli module, tinker-atropos excludes), uv.lock regenerated
Install scripts: setup-hermes.sh, scripts/install.sh, scripts/install.ps1, nix/hermes-agent.nix
Config: cli-config.yaml.example, .env.example
Docs: README.md, README.zh-CN.md, CONTRIBUTING.md, AGENTS.md
Website: sidebars.ts + 7 reference/integration pages
Tests: conftest.py, test_model_tools.py, test_set_config_value.py, test_setup_hermes_script.py, test_streaming_tool_call_repair.py

Not touched (historical)

RELEASE_v0.2.0.md, RELEASE_v0.11.0.md — left as-is since they're historical release notes

Verification

All imports clean (toolsets, model_tools, hermes_cli.tools_config)
Zero straggler grep hits for atropos/tinker/rl_* tool names across all source files
uv.lock regenerated — no atroposlib/tinker/yc-bench/wandb entries remain
Test suite running (CI will confirm)

…r-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules

- toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference

…ct.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock

- setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line

- Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update

- hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script

github-actions · 2026-05-15T04:35:41Z

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

github-actions · 2026-05-15T04:36:26Z

🔎 Lint report: `hermes/hermes-a899b5e7` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8247 on HEAD, 8374 on base (✅ -127)

🆕 New issues (6):

Rule	Count
`invalid-argument-type`	4
`unresolved-attribute`	1
`unsupported-operator`	1

First entries

tests/hermes_cli/test_tools_config.py:720: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str` in union `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]] | Unknown`
hermes_cli/tools_config.py:2434: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `dict[Unknown, Unknown]` and `str | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 3 union elements`
run_agent.py:13750: [invalid-argument-type] invalid-argument-type: Argument to function `_is_oauth_token` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
tests/hermes_cli/test_tools_config.py:534: [invalid-argument-type] invalid-argument-type: Argument to function `_configure_provider` is incorrect: Expected `dict[Unknown, Unknown]`, found `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]] | Unknown`
run_agent.py:7482: [invalid-argument-type] invalid-argument-type: Argument to function `build_anthropic_client` is incorrect: Expected `str`, found `str | dict[Unknown | str, Unknown | str | dict[str, str]] | Any | ... omitted 3 union elements`
run_agent.py:13753: [invalid-argument-type] invalid-argument-type: Argument to function `len` is incorrect: Expected `Sized`, found `(str & ~AlwaysFalsy) | (dict[Unknown | str, Unknown | str | dict[str, str]] & ~AlwaysFalsy) | (Any & ~AlwaysFalsy) | ... omitted 3 union elements`

✅ Fixed issues (107):

Rule	Count
`unresolved-import`	62
`invalid-argument-type`	14
`unresolved-attribute`	13
`invalid-assignment`	12
`not-subscriptable`	2
`invalid-parameter-default`	2
`unsupported-operator`	1
`unresolved-reference`	1

First entries

tools/rl_training_tool.py:108: [unresolved-attribute] unresolved-attribute: Attribute `keys` is not defined on `list[dict[str, str | int | float]]`, `bool` in union `dict[str, str | int | float] | list[dict[str, str | int | float]] | bool`
tools/rl_training_tool.py:1129: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_token_length"]` on object of type `list[dict[str, str | int | float]]`
tests/tools/test_managed_server_tool_support.py:54: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
tools/rl_training_tool.py:1285: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["step_success_rate"]` and value of type `int | float` on object of type `dict[str, str | list[Unknown] | int]`
environments/tool_call_parsers/longcat_parser.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
tools/rl_training_tool.py:1131: [not-subscriptable] not-subscriptable: Cannot subscript object of type `bool` with no `__getitem__` method
environments/benchmarks/terminalbench_2/terminalbench2_env.py:317: [unresolved-import] unresolved-import: Cannot resolve imported module `datasets`
environments/hermes_base_env.py:61: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.type_definitions`
environments/benchmarks/terminalbench_2/terminalbench2_env.py:58: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
tests/environments/benchmarks/test_terminalbench2_env_security.py:10: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/run_agent/test_agent_loop_vllm.py:80: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/web_research_env.py:51: [unresolved-import] unresolved-import: Cannot resolve imported module `pydantic`
tools/rl_training_tool.py:775: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["wandb_run_name"]` and value of type `str` on object of type `list[dict[str, str | int | float]]`
tests/hermes_cli/test_tools_config.py:720: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `int` in union `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | ... omitted 3 union elements`
tools/rl_training_tool.py:963: [unresolved-import] unresolved-import: Cannot resolve imported module `wandb`
tools/rl_training_tool.py:778: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["wandb_name"]` and value of type `Any & ~AlwaysFalsy` on object of type `list[dict[str, str | int | float]]`
tests/run_agent/test_agent_loop.py:380: [unresolved-attribute] unresolved-attribute: Unresolved attribute `get_state` on type `MockServer`
environments/agentic_opd_env.py:459: [unresolved-import] unresolved-import: Cannot resolve imported module `datasets`
environments/tool_call_parsers/mistral_parser.py:16: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
tests/run_agent/test_agent_loop.py:16: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
environments/benchmarks/yc_bench/yc_bench_env.py:663: [unresolved-import] unresolved-import: Cannot resolve imported module `tqdm`
environments/terminal_test_env/terminal_test_env.py:44: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/agentic_opd_env.py:84: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
rl_cli.py:358: [invalid-assignment] invalid-assignment: Object of type `str | None` is not assignable to `str`
tests/tools/test_rl_training_tool.py:57: [unresolved-attribute] unresolved-attribute: Unresolved attribute `api_log_file` on type `RunState`
... and 82 more

Unchanged: 4302 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

* test(ci): stabilize shared optional dependency baselines * fix(install): preserve pip entry point when re-running on symlinked install setup_path() writes the user-facing hermes shim with `cat >`, which follows existing symlinks. Older installs created `$command_link_dir/hermes` as a symlink to `$HERMES_BIN` (`venv/bin/hermes`), so re-running install.sh stomped the pip entry point with a bash shim that exec'd itself in an infinite loop. `rm -f` the link target before writing so the shim lands at `$command_link_dir/hermes` and the venv entry point is left intact. Adds a regression test that reproduces the symlink-stomp end-to-end (creates the symlink, drives the real shim-write block from setup_path, asserts the venv pip script body survives and the shim is now a regular file). Both new assertions fail on origin/main and pass with the fix. Closes #21454. * feat(discord): render clarify choices as buttons Brings Discord to parity with Telegram on the clarify tool's interactive UX. Overrides BasePlatformAdapter.send_clarify on DiscordAdapter to attach a button view when choices are present. - ClarifyChoiceView: one discord.ui.Button per choice (max 24, Discord's 25-component view cap leaves one slot for Other) plus a final 'Other (type answer)' button. - Numeric click -> tools.clarify_gateway.resolve_gateway_clarify( clarify_id, choice_text) using the canonical choice text from the gateway entry (falls back to the button label if the entry vanished). - Other click -> tools.clarify_gateway.mark_awaiting_text(clarify_id) so the gateway's text-intercept captures the next user message in this session as the response. - Auth via the shared _component_check_auth helper (same OR-semantics as ExecApprovalView / SlashConfirmView / UpdatePromptView / ModelPickerView). - Open-ended (no choices) path renders the prompt as a plain embed and relies on the existing text-intercept resolution. - Single-use: first valid click disables every button and updates the embed footer with who answered and what they chose. No changes to BasePlatformAdapter.send_clarify or the gateway's clarify_callback wiring -- the existing scaffolding already drives all adapters; Discord just inherits the default text fallback today and gains buttons by virtue of this override. Test conftest extended: _FakeEmbed gains add_field() / set_footer() stubs so tests can construct embedded views without monkey-patching per-test. Original PR: #19249 by @LeonSGP43. This is a reshape of the contributor's work onto current main's clarify infrastructure (clarify_id + entry-based resolution shared with Telegram, instead of a parallel on_answer-closure mechanism). The button view structure and UX shape are preserved. Tests: 14 new tests in tests/gateway/test_discord_clarify_buttons.py. 391/391 existing Discord gateway tests still pass. Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com> * fix(cli): allow rotating broken OpenRouter / AI Gateway key in `hermes model` flow (#25750) Before: when `OPENROUTER_API_KEY` (or `AI_GATEWAY_API_KEY`) was already set in ~/.hermes/.env, `hermes model openrouter` / `hermes model ai-gateway` skipped the API-key prompt entirely and jumped straight to the model picker. Users with a broken / expired / wrong key had no way to replace it without editing ~/.hermes/.env by hand or re-running `hermes setup` from scratch. Both flows now route through the existing `_prompt_api_key()` helper, which surfaces [K]eep / [R]eplace / [C]lear when a key is already configured — the same UX the generic API-key providers (z.ai, MiniMax, Gemini, etc.) and the Daytona setup already use. * fix(install.ps1): pin uv sync to venv\, verify baseline imports on Windows (#25755) * fix(cli): allow rotating broken OpenRouter / AI Gateway key in `hermes model` flow Before: when `OPENROUTER_API_KEY` (or `AI_GATEWAY_API_KEY`) was already set in ~/.hermes/.env, `hermes model openrouter` / `hermes model ai-gateway` skipped the API-key prompt entirely and jumped straight to the model picker. Users with a broken / expired / wrong key had no way to replace it without editing ~/.hermes/.env by hand or re-running `hermes setup` from scratch. Both flows now route through the existing `_prompt_api_key()` helper, which surfaces [K]eep / [R]eplace / [C]lear when a key is already configured — the same UX the generic API-key providers (z.ai, MiniMax, Gemini, etc.) and the Daytona setup already use. * fix(install.ps1): pin uv sync target to venv\, verify baseline imports Two related Windows-installer bugs that produce a broken venv with `ModuleNotFoundError: No module named 'dotenv'` on first `hermes` run. ## Bug 1: uv sync ignores VIRTUAL_ENV, syncs into .venv\ instead of venv\ `Install-Dependencies` creates the venv at `venv\` via `uv venv venv`, sets `$env:VIRTUAL_ENV = "$InstallDir\venv"`, then runs `uv sync --extra all --locked`. Modern uv (>=0.5) ignores `VIRTUAL_ENV` for the `sync` subcommand and uses the project default `.venv\` instead. Result: deps land in `$InstallDir\.venv\`, `venv\` stays empty except for the python.exe stub from the earlier `uv venv` call, `hermes.exe` ends up wired to the wrong site-packages. The bash installer (`scripts/install.sh`) already worked around this in `install_deps()` line 1127 by passing `UV_PROJECT_ENVIRONMENT` — that flag tells uv exactly where to put the project env regardless of `VIRTUAL_ENV`. Port the same fix to PowerShell. ## Bug 2: no post-install verification If the sync still misdirects for any other reason (uv version drift, filesystem quirk, user re-run scenarios), the installer reports success and the user only finds out by running `hermes` and getting an unhelpful traceback. Add a baseline-import probe that runs the venv's own python against the four packages every `hermes` invocation needs (`dotenv`, `openai`, `rich`, `prompt_toolkit`). On failure, throw with a recovery command tailored to whether a sibling `.venv\` exists. User report (Windows 11, Python 3.13.5, Hermes v0.13.0): manual repro steps were exactly this — `uv sync` landed in `.venv\`, recovered by junctioning `venv\` → `.venv\` to bridge the path mismatch. * fix(telegram): escape dynamic markdown in callback flows Use MarkdownV2 formatting for Telegram callback follow-ups and interactive prompts where dynamic names or user text can break legacy Markdown parsing. Add regression coverage for reload-mcp, model picker, approval callbacks, and update prompts. * fix(telegram): restore model-switch success path + author map The cherry-picked PR over-indented the edit_message_text block for the mm: (model selected → switch) success path so the confirmation edit lived inside the preceding 'except Exception as exc' branch and only fired when the callback raised. Dedent the try/except back to 12-space indent so it runs after the callback succeeds, restoring the original flow that removes the inline buttons and shows the 'Switched to ...' confirmation. Add a regression test (test_model_selected_edits_message_on_success) that asserts edit_message_text is awaited and the result text is routed through format_message (MARKDOWN_V2 + backtick survival). Add phuongvm to scripts/release.py AUTHOR_MAP. * fix(memory): skip OpenViking upload symlinks * fix(codex-runtime): retire wedged sessions + post-tool watchdog + OAuth refresh classify (#25769) Mirrors openclaw beta.8's app-server resilience fixes so a stuck codex subprocess can't burn the full turn deadline and so users get a `codex login` pointer instead of raw RPC errors when their token expires. - TurnResult.should_retire signals the caller to drop+respawn codex. - Deadline-hit path and dead-subprocess detection set should_retire so the next turn doesn't ride a CPU-spinning or auth-broken process. - Post-tool watchdog (post_tool_quiet_timeout=90s): if a tool item completes and codex goes silent past the threshold without further output or turn/completed, fast-fail instead of waiting the full 600s. Resets on any non-tool activity so normal think-after-tool flows are not affected. - <turn_aborted> and <turn_aborted/> in agent text are treated as terminal — some codex builds tear down a turn that way without emitting turn/completed. - _classify_oauth_failure() inspects RPC error message + stderr tail for invalid_grant / token refresh / 401 / etc. and rewrites user-facing errors to 'run codex login'. Conservative: generic failures still surface verbatim. Fires at turn/start failure, turn/completed failure, and dead-subprocess paths. - thread/start cross-fill: tolerate thread.id, thread.sessionId, top-level sessionId/threadId so future codex schema drift doesn't KeyError us at handshake. - run_agent.py: when run_turn returns should_retire=True OR raises, close + null self._codex_session so the next turn respawns. Tests: +30 cases across session + integration suites. tests/agent/transports/test_codex_app_server_session.py 50/50 pass tests/run_agent/test_codex_app_server_integration.py 27/27 pass Broader codex scope (transports + cli runtime/migration) 376/376 pass * chore(release): add AUTHOR_MAP entries for second new-contributor batch Pre-stages AUTHOR_MAP for 7 new contributors in the upcoming batch: - HxT9 (#25760) - evgyur (#25651) - AsoTora (#25624) - oxngon (#25603) - yifengingit (#25589) - vanthinh6886 (#25562) - Arkmusn (#25559) EthanGuo-coder, wesleysimplicio, and zccyman are already in the map. * fix: read approvals.timeout from config in CLI approval callback The _approval_callback method in HermesCLI hardcoded timeout=60 instead of reading the approvals.timeout config value. This meant the config setting was silently ignored for CLI interactive prompts. Other approval paths (callbacks.py, tools/approval.py) already read the config correctly — only cli.py was missed. * fix: use AUTOINCREMENT id for message ordering instead of timestamp On WSL2 (and similar environments), time.time() is not strictly monotonic due to NTP sync or host clock adjustments. When clock regression occurs during a multi-tool flush, later-inserted rows get earlier timestamps, causing ORDER BY timestamp, id to sort them before rows that were written first. This breaks the tool_calls/tool_response adjacency invariant and triggers HTTP 400 from the API. Use ORDER BY id instead, since id (INTEGER PRIMARY KEY AUTOINCREMENT) always reflects true insertion order regardless of system clock behavior. * docs: clarify media impact on session context * fix: stop retrying initial MCP auth failures * fix(gateway): enable text-intercept for multi-choice clarify fallback (#25567) * fix: restrict .env file permissions to 0600 Set file mode 0600 on ~/.hermes/.env after creation in the installer and after every write via memory_setup._write_env_vars(). This ensures only the file owner can read/write API keys and tokens, matching standard practice for credential files (.netrc, .aws/credentials, .ssh/config). Fixes #25477 * fix(gateway): forward image attachments to background agent tasks When the gateway spawned a background agent (e.g. for delegation), media URLs and types from the originating message weren't forwarded — the bg agent saw the prompt but no attached images. Vision-enabled tasks effectively lost their inputs. Forwards media_urls/media_types through the bg-task spawn path and runs the same vision-enrichment step the main flow uses, so the bg agent gets image descriptions inlined into its prompt. Closes #25614. Salvage of #25603 by @oxngon (manually re-applied — original branch was severely stale against current main). * fix(terminal): prevent safety filter false positives on keywords inside quoted strings The _foreground_background_guidance() function matched background-wrapper keywords (nohup/disown/setsid) anywhere in the command text, including inside quoted strings, Python -c code, commit messages, and PR body text. Two-layer fix: 1. Strip single-quoted, double-quoted, and backtick-quoted content before pattern matching via _strip_quotes() helper. 2. Tighten the regex to only match keywords at command-start positions (after ^, ;, &, &&, ||, or $() — not mid-argument. Both layers are needed: quote stripping handles the common case of keywords in string literals, and the position-aware regex handles unquoted cases like 'export FOO=setsid' (word boundary match, wrong position). Fixes #20064 * chore(release): map oswaldb22 noreply email for AUTHOR_MAP Co-Authored-By: Oswald <oswaldb22@users.noreply.github.com> * test(toolsets): lock web search into default platform coverage Adds regression tests pinning web search into the WhatsApp and api-server default platform-coverage toolsets. Pure test additions, no runtime change. Salvage of the test-addition commit from #25692 by @wesleysimplicio. (The AUTHOR_MAP fixup commit from the same PR landed separately as 529ec85c7.) * fix(update): refresh lazy-installed backends on hermes update (#25766) Pyproject's [all] extra was slimmed down in May 2026 — ~20 optional backends moved to tools/lazy_deps.py and only install on first use. hermes update runs uv pip install -e .[all] which doesn't touch any of them, so pin bumps in LAZY_DEPS (CVE response, transitive fixes) were silently ignored on already-activated backends. Two changes: 1. _is_satisfied() now parses the spec and checks the installed version against the constraint via packaging.specifiers. Previously it returned True the moment the package name was importable, which made ensure() a name-presence gate rather than a version-pin gate. 2. New active_features() / refresh_active_features() pair: lists every feature with at least one of its packages currently installed, then re-runs ensure() on each. Refresh is invoked at the end of _cmd_update_impl, right after the [all] install completes. Cold backends (never activated) stay quiet — no churn for them. Output during update is one summary block: → Refreshing 4 active lazy backend(s)... ↑ 1 refreshed: provider.anthropic ✓ 3 already current or ⚠ memory.honcho failed to refresh: <pip stderr> Failures never raise out of update — backends keep their previously- installed version and we tell the user to rerun once upstream is fixed. security.allow_lazy_installs=false is honored: features get marked "skipped" with the reason shown. Tests: 18 new unit tests covering version-aware satisfaction (exact pin, range, extras blocks, missing package, malformed spec), active feature discovery, and refresh status reporting. All 61 lazy_deps tests pass. * fix(agent/gemini-cloudcode): seed delta defaults for reasoning-only stream chunks _make_stream_chunk built delta_kwargs with only `role`, so a reasoning-only chunk produced a SimpleNamespace without a `.content` attribute. Downstream consumers that read `delta.content` then raised AttributeError on Gemini 2.5 Flash, where the thinking delta arrives before any content delta. Seed `content`, `tool_calls`, `reasoning`, and `reasoning_content` as None up front, matching the pattern already used in gemini_native_adapter.py. Key-present arguments still override the defaults. Fixes #24974 References: Related open PR #24984 (luyao618) applies the same 1-line fix; this PR adds a regression test that #24984 omits Co-Authored-By: Claude <noreply@anthropic.com> * fix(install): support non-sudo service-user installs on apt distros (#25814) The Debian/Ubuntu branch of install_node_deps() ran 'npx playwright install --with-deps chromium' unconditionally. Playwright invokes sudo interactively to apt-install Chromium's system libraries, which blocks the installer for non-sudo users (systemd service accounts, unprivileged operator users) on an unsatisfiable password prompt. Changes: - install.sh: gate --with-deps behind a sudo capability check on the apt branch (matches the existing Arch/pacman branch pattern). Non-sudo users fall back to 'npx playwright install chromium' alone and the installer prints the exact 'sudo npx playwright install-deps chromium' command an administrator can run separately. - install.sh: add --skip-browser (alias --no-playwright) to skip the Playwright step entirely for headless installs that don't need browser automation. Mirrors the existing --no-venv / --skip-setup shape. - installation.md: add a 'Non-Sudo / System Service User Installs' section covering the admin/service-user split, the --skip-browser flag, and the ~/.local/bin PATH gotcha (the root cause of the 'No module named dotenv' error users hit when running the repo source 'hermes' script with system Python instead of the venv launcher). - test_install_sh_browser_install.py: regression coverage for the --skip-browser flag and the sudo-gate on the apt branch. Reported by @ssilver in Discord. * skill(comfyui): add template-integrity reference from @purzbeats (#25828) Adds references/template-integrity.md covering safe conversion of the official comfyui-workflow-templates package from editor format to API format — Reroute bypass via link tracing, dotted dynamic-input keys (values.a, resize_type.width) that must NOT be flattened, server-error "patch don't rebuild" loop, Cloud quirks (302 redirect to signed GCS URL, free-tier 1 concurrent job, 1920x1080 OOM on RTX 5090), and a Discord-compatible ffmpeg stitch recipe (yuv420p + xfade/acrossfade). SKILL.md lists the new reference so the agent loads it when starting from an official template. purzbeats added to author list and to scripts/release.py AUTHOR_MAP. Co-authored-by: purzbeats <97489706+purzbeats@users.noreply.github.com> * fix(whatsapp): drop status broadcasts and channel newsletters before agent dispatch (#25845) WhatsApp pseudo-chats (Status updates / Stories, Channels / Newsletters, broadcast lists) were being routed through the full agent pipeline. A user's gateway.log showed the agent replying to a contact's Story ('status@broadcast') with 345 chars plus title-generation cost, which also shows up in the contact's status feed. Drop these JIDs at _should_process_message() before the policy gate so they're filtered regardless of dm_policy or allowlist state. Covers: - status@broadcast (Stories) - *@newsletter (Channels) - *@broadcast (broadcast lists, future-proofing) The bridge.js already filters these on the fromMe outbound path, but inbound events on self-chat mode skipped that check. Tests: - status@broadcast dropped on open policy - broadcast filter wins over allowlisted senders - real DMs still pass through - helper unit cases (case-insensitive, whitespace-tolerant) 26/26 tests/gateway/test_whatsapp_group_gating.py pass; 59/59 adjacent WhatsApp test suites pass. * fix(ci): stabilize shared test state after 21012 * fix(telegram): set REQUIRES_EDIT_FINALIZE so final MarkdownV2 edit is not skipped When the final streamed text is identical to the last plain-text edit, stream_consumer._send_or_edit short-circuits and never calls adapter.edit_message(finalize=True). For Telegram, this skips the plain-text → MarkdownV2 conversion, leaving raw Markdown syntax visible to the user. Set REQUIRES_EDIT_FINALIZE = True on TelegramAdapter so the finalize edit is always delivered, matching the existing DingTalk pattern. Fixes #25710 * fix(gateway): load streaming config from nested gateway.streaming key `hermes config set gateway.streaming.*` writes the streaming block nested under a `gateway:` key in config.yaml, but the config loader only checked for a top-level `streaming:` key — silently ignoring the nested variant. Fall back to `yaml_cfg['gateway']['streaming']` when the top-level key is absent, matching the pattern already used for other nested config sections. Closes #25676 * fix(gateway): prevent duplicate final send when only cosmetic edit failed When the stream consumer's got_done handler successfully delivers the final response content via _send_or_edit but the subsequent edit (e.g. cursor removal) fails, final_response_sent remains False even though the user has already received the final answer. The gateway's fallback send path then re-delivers the same content, causing the user to see the response twice on Telegram. Introduce a new _final_content_delivered flag on the stream consumer, set by the got_done handler when the final content has reached the user. The _run_agent suppression logic now treats this flag as an additional signal (alongside final_response_sent and response_previewed) that final delivery is already complete. This preserves the existing behavior for intermediate-text-only streams (where already_sent=True but no final content has been delivered) — those still receive the gateway's fallback send, matching the test expectation in test_partial_stream_output_does_not_set_already_sent. Adds TestFinalContentDeliveredSuppression with two cases covering both the suppression (content delivered + edit failed) and the non-suppression (intermediate text only) branches. * fix(agent): keep image tool results from poisoning text-only sessions * fix(codex-app-server): attach redacted stderr tail to generic failures (#25929) When codex app-server fails outside the OAuth-classified path (non-auth turn/start errors, plain TimeoutErrors, generic turn-ended status, subprocess silently exits, hard deadline timeout), the user got a bare 'Internal error' / 'turn/start failed: ...' with no context. Diagnosing config/provider/auth-bridge issues forced a re-run with verbose codex flags. Add a _format_error_with_stderr helper that appends the last few stderr lines via agent.redact.redact_sensitive_text(force=True), and use it at every catch-all error site: - ensure_started() failures (codex init / thread/start) now return a TurnResult.error with should_retire=True instead of bubbling - non-OAuth turn/start CodexAppServerError / TimeoutError - subprocess-died branch (previously dumped raw stderr_blob[-300:] with no redaction — a leak risk) - turn ended with non-completed status - hard turn-timeout deadline OAuth-classified failures and the post-tool quiet watchdog already produce clean hints and stay unchanged. The redactor catches sk-*, gh*_*, Authorization: Bearer, query-string tokens, JWTs, private keys, etc., so provider error payloads can't leak into chat output or trajectories. Inspired by openclaw#80718, adapted for our app-server transport. * fix(cli): batch resize history replay * chore(release): map @1000Delta in AUTHOR_MAP * fix(voice): remove per-tool-call beep in CLI voice mode (#25967) The spinner already shows tool activity visually; the 1.2 kHz tone on every tool.started event was unwanted noise (especially on WSL2, where each beep also triggers Windows Terminal's bell notification). Removed the play_beep call in _on_tool_progress entirely. Record start/stop beeps (gated by voice.beep_enabled) are unaffected. * fix: preserve ansi output history on resize replay * chore(release): map @LeonSGP43 commit email in AUTHOR_MAP * fix(cli): clamp scrollback box widths + suppress status bar after resize (#25975) When the terminal shrinks, already-printed box-drawing rules (response, reasoning, streaming TTS, background-task Panels) reflow into multiple narrower rows — visible as duplicated horizontal separators / ghost lines in scrollback. Similarly, prompt_toolkit redraws a fresh status bar on SIGWINCH on top of one the terminal just reflowed, producing double-bar artifacts on column shrink. Two surgical changes: 1. Decorative scrollback boxes now use a new `HermesCLI._scrollback_box_width()` helper that clamps to `max(32, min(width, 56))`. The live TUI footer is unaffected and still uses the full width. Covers: streaming response box (open + close), reasoning box (open + close, both streaming and post-stream paths), streaming-TTS box close, final-response Rich Panel, and the background-task Rich Panel. 2. `_recover_after_resize()` now also sets a new `_status_bar_suppressed_after_resize` flag so the dynamic status bar and both input separator rules stay hidden until the next user input. The flag is cleared in the process loop the moment the user submits their next prompt, restoring chrome cleanly. Tests: - New `test_input_rules_hide_after_resize_until_next_input` covers the flag's effect on rule heights. - New `test_scrollback_box_width_caps_to_resize_safe_value` covers the helper at floor / cap / mid-range / overflow. - Existing resize-recovery test extended to assert the flag flips. Refs: #18449 #19280 #22976 Salvage of #24403. Co-authored-by: Szymonclawd <szymonclawd@mac.home> * fix(ui-tui): heal same-dimension alt-screen resize drift - Treat same-dimension resize events in alt-screen mode as a repaint signal, because terminal hosts can reflow or restore the physical buffer without changing columns/rows. - Ensure pending resize erases are emitted even when the virtual diff is empty, so stale physical glyphs are still cleared. - Extract alt-screen resize repaint into prepareAltScreenResizeRepaint() for readability. - Add defensive clearTimeout in prepareAltScreenResizeRepaint so rapid resize bursts don't stack redundant delayed repaints. - Add a focused regression test for same-dimension alt-screen resize healing. Addresses #18449 Related to #17961 * chore(release): map @luoyuctl in AUTHOR_MAP * feat(proxy): local OpenAI-compatible proxy for OAuth providers (#25969) Adds 'hermes proxy start' — a local HTTP server that lets external apps (OpenViking, Karakeep, Open WebUI, ...) use a Hermes-managed provider subscription as their LLM endpoint. The proxy attaches the user's real OAuth-resolved credentials to each forwarded request, refreshing them automatically; the client can send any bearer (it gets stripped). Ships with one adapter — Nous Portal. The UpstreamAdapter ABC and registry in hermes_cli/proxy/adapters/ are designed for additional OAuth providers to plug in by name without server changes. Commands: hermes proxy start [--provider nous] [--host 127.0.0.1] [--port 8645] hermes proxy status hermes proxy providers Allowed Portal paths: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models. Anything else returns 404 with a clear error pointing at the allowed list. aiohttp is gated like gateway/platforms/api_server.py (try-import, clean runtime error if missing). No new core dependency. Tests: 24 unit tests + 1 separate E2E that spawns the real subprocess and verifies the upstream receives the right bearer with the client's header stripped. * feat(discord): channel history backfill for multi-user sessions Adds optional channel-context backfill for Discord shared-channel sessions so the agent can see recent messages it missed between its own turns (typically when require_mention=true filters out most traffic). Previously the agent only saw the @mention message that triggered it, which led to disorienting replies in active multi-user channels where the conversation context was invisible. With backfill enabled, a configurable number of recent messages are fetched per-turn and prepended to the trigger message as a context block, kept separate from sender-prefix logic so attribution remains clean. This re-opens the work from #13063 (approved by @OutThisLife on 2026-04-20, closed when I closed the branch to address the simpolism:main head-branch issue plus an ordering bug I caught later in live use). Filing against the freshly-rewritten problem statement in #13054 so the design is grounded in the failure mode rather than the implementation shape. The implementation follows the **push-mode last-self-anchored** design from the two options laid out in #13054. See the issue for the trade-off discussion vs pull-mode (#13120 was an earlier closed PR using that shape). Treating this as a reference implementation — happy to rewrite as last-trigger anchoring or as a hybrid with #13120 if maintainers prefer. Changes: - gateway/platforms/discord.py: - new `_discord_history_backfill()` / `_discord_history_backfill_limit()` helpers (config.extra > env > default), mirroring the existing `_discord_require_mention()` shape - new `_fetch_channel_context()` that scans `channel.history()` backwards from the trigger to the bot's last message (or limit), formats as `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS, skips system messages - per-channel `_last_self_message_id` cache to narrow the fetch window on hot paths (avoids full history scan when the bot has spoken recently) - **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`. discord.py 2.x silently flips the default to True when `after=` is supplied, which would select the EARLIEST N messages after our last response instead of the LATEST N before the trigger. In high-traffic windows this would return stale tool traces and drop the actual final answer the user is asking about. See regression test below. Caught in live use during a Codex tool-trace burst on May 13 2026. - gateway/config.py: discord_history_backfill + discord_history_backfill_limit settings + yaml→env bridge - gateway/platforms/base.py: channel_context field on MessageEvent - gateway/run.py: prepend channel_context after sender-prefix so the [sender name] tag applies to the trigger message alone, not to the backfill - hermes_cli/config.py: defaults for new discord.history_backfill and discord.history_backfill_limit keys - cli-config.yaml.example: documented defaults - tests/gateway/test_discord_free_response.py: 7 new tests covering cold-start backfill, self-message stop boundary, other-bot filtering, cache hot-path narrowing, stale-cache fallback, shared-channel + per-user backfill paths, and the ordering regression test (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`) - tests/gateway/test_config.py: yaml→env bridge tests - tests/gateway/test_session.py: prefix-order edge cases - website/docs/user-guide/messaging/discord.md: env vars + config keys + usage docs Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord research server for the past three weeks. Fixes #13054 Supersedes #13063 (closed) * feat(discord): default history backfill on, expand to per-user + threads Follow-up to snav's PR #25463 contribution: flip default to on, broaden scope so backfill fires whenever require_mention gates the bot (not just shared-session channels). Why: - The mention-gate creates a session-transcript gap regardless of whether the channel is shared or per-user. In per-user sessions, Alice's session is still missing other participants' messages and her own pre-mention messages — backfill fills both gaps. - Threads naturally scope to thread-only history because discord.py's channel.history() on a thread returns only that thread's messages. - DMs still skip — every DM triggers the bot, so the session transcript is already complete. Changes: - hermes_cli/config.py: discord.history_backfill default → true - gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL default → 'true' - cli-config.yaml.example + website docs: update defaults and prose; add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were documented in the PR description but missing from the env-var table - tests/gateway/test_discord_free_response.py: - flip test_discord_per_user_channel_does_not_backfill → test_discord_per_user_channel_backfills_too (new behavior) - add test_discord_dm_does_not_backfill (DM skip is invariant) - give FakeThread a no-op history() so existing thread tests don't hit a fake discord.Forbidden when backfill now fires on threads too Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord. * fix(web): make sync-assets script cross-platform The prebuild step used `rm -rf` and `cp -r`, which fail on Windows (`'rm' is not recognized`). Replace with an inline Node one-liner using fs.rmSync / fs.cpSync so the build works on Windows, macOS, and Linux without adding a dependency. * fix(lsp): shift baseline diagnostics into post-edit coordinates (#25978) Pre-existing diagnostics below an edit point used to surface as 'LSP diagnostics introduced by this edit' whenever the edit deleted or inserted lines. The delta-filter key included the diagnostic's range, so the same logical error reported at a different line in the post-edit snapshot looked like a brand new diagnostic. Concrete case: deleting 14 lines in cli.py caused Pyright errors at lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be reported as introduced by it. Fix: build a piecewise-linear line-shift map (via difflib's SequenceMatcher) from pre and post content, and remap baseline diagnostics into post-edit coordinates before the set-difference. Diagnostics in deleted regions drop out cleanly; diagnostics below the edit shift by the right amount; diagnostics above are untouched. The strict (range-aware) equality key stays — so a genuinely new instance of an identical error class at a different line still surfaces as new. Pieces: - agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range, shift_baseline. Pure functions, no LSP state. - agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an optional line_shift kwarg; baseline is shift_baseline'd before computing the seen-set. _diag_key keeps the strict range key. - tools/file_operations.py — write_file captures pre_content for any LSP-handled extension (not just LINTERS_INPROC) and passes pre/post to _maybe_lsp_diagnostics, which builds the shift map. - New _lsp_handles_extension helper guards the pre_content read. Trade-offs preserved: - Genuinely new same-class errors at different lines still surface (content-only key would have swallowed them). - Pre-existing errors at unshifted positions still get filtered (covered by the strict-key path with no shift). - Best-effort: when pre_content can't be captured (file didn't exist, permissions), the unshifted comparison still catches most pre-existing errors; the edge case it misses is a new file with a non-empty baseline, which is structurally impossible. * fix(web): cross-platform sync-assets + surface build errors on failure Three Windows-only bugs in the web-dashboard build path. Each is small, scoped, and verified end-to-end on Windows 11 — including under a stock cmd.exe / PowerShell console with its default cp1252 encoding. 1. `sync-assets` shells out to Unix-only commands web/package.json hard-codes `rm -rf … && cp -r …`. Neither exists on Windows cmd.exe. `hermes_cli/main.py::_build_web_ui` runs npm via subprocess (which on Windows defaults to cmd.exe), so the prebuild hook crashed before Vite ever ran and the dashboard never built. Fix: web/scripts/sync-assets.mjs — ~20 lines of Node using fs.rmSync + fs.cpSync (stdlib, Node >= 16.7). No new deps, identical behavior on POSIX and Windows. 2. Build failures were silent _build_web_ui ran both subprocess calls with capture_output=True and never relayed the captured buffers on failure. Users saw 'Web UI build failed' and nothing else — no stdout, no stderr, no hint that the real problem was 'rm is not recognized'. Fix: inner _relay() helper that decodes and prints stdout + stderr (utf-8, errors='replace') whenever a step returns non-zero. Replaces the existing stderr_tail-only relay on the build path; success path is unchanged. (stderr_tail is preserved for the stale-dist fallback branch added by #23817.) Salvaged from #13368 by @johnisag onto current main. Conflict resolution preserves main's improvements: - _run_npm_install_deterministic() (replaces bare subprocess.run for npm install) - npm-build retry-after-sleep for Windows boot-time races (#23817) - stale-dist fallback for non-interactive callers (#23817) Closes #25073, #13368. * fix(web): handle non-UTF8 Windows console encodings in _build_web_ui Codex review pointed out that even with the sync-assets fix applied, _build_web_ui still crashes on a stock Windows console before reaching npm: Python stdout defaults to cp1252 (or similar) and raises UnicodeEncodeError when print() hits the arrow/check glyphs used for status messages (→, ✗, ⚠, ✓). Reproduced locally in PowerShell: $ PYTHONIOENCODING=cp1252 python -c "from hermes_cli.main import _build_web_ui; _build_web_ui(Path('web'), fatal=True)" UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' ... The previous PR body claimed "end-to-end verified on Windows 11", but that was under the venv's default (utf-8) stdout. A plain `py` or PowerShell invocation would still fail before sync-assets ever ran. Fix: inner _say() helper that falls back to text.encode(sys.stdout.encoding, errors="replace") when print() raises UnicodeEncodeError. Glyphs degrade to '?' on ASCII / cp1252 consoles; utf-8 consoles are unaffected. Verified the full build pipeline runs to completion with PYTHONIOENCODING=cp1252. Scoped tightly to _build_web_ui (the function this PR already touches); other call sites in the codebase with the same risk are out of scope. * chore(release): map agorgianitisj@hotmail.com -> johnisag * fix(proxy): suppress false-positive windows-footgun on guarded add_signal_handler The call site at line 246 is already wrapped in try/except NotImplementedError (added in #25969). The checker just doesn't peek at surrounding context. Mark with the suppression comment so the blocking check passes. * fix(cli): wire /sessions slash command in the classic CLI The 'sessions' command has been registered in the central command registry since #20805 (May 2025) and surfaces in /help and tab-completion, but the classic CLI's process_command() never had an elif branch for it. The canonical name fell through and printed 'Unknown command: sessions'. The TUI side was wired up correctly via the SessionPicker overlay; only the legacy CLI was missing the dispatch. Adds _handle_sessions_command() which mirrors /resume's no-arg behavior inline (the CLI has no overlay primitive equivalent to the TUI picker): - /sessions and /sessions list → print the recent-sessions table - /sessions <id_or_title> → delegates to _handle_resume_command Includes regression tests covering the dispatcher wiring (the original bug) plus the three handler branches. * chore(release): map phil.thomas@gametime.co -> explainanalyze * chore(release): map phil.thomas@gametime.co -> explainanalyze * fix(browser): use correct env var for --no-sandbox bypass AGENT_BROWSER_CHROME_FLAGS is not read by agent-browser CLI. The correct env var is AGENT_BROWSER_ARGS, with comma-separated values. This fixes Chrome 'No usable sandbox' crash on Ubuntu 23.10+ systems where AppArmor restricts unprivileged user namespaces. The detection logic was correct but the fix used the wrong environment variable name and space-separated instead of comma-separated args. * fix(browser): honor pre-set AGENT_BROWSER_ARGS and document the bypass Follow-up to the sandbox-bypass env-var fix: - Update the opt-out gate so a user-provided AGENT_BROWSER_ARGS is also respected, not just the legacy AGENT_BROWSER_CHROME_FLAGS. Previously the gate only checked the broken legacy var, so a user who pre-set AGENT_BROWSER_ARGS would still get clobbered by Hermes's auto-injection. - Document AGENT_BROWSER_ARGS in .env.example, the browser feature page, and the env var reference, with notes about the auto-injection on AppArmor-restricted systems (Ubuntu 23.10+, DGX Spark, containers). - Add Anadi Jaggia to AUTHOR_MAP. * fix(cli): fall back to SelectSelector when kqueue can't watch stdin On macOS with uv-managed cPython 3.11, the default kqueue selector cannot register fd 0, so prompt_toolkit's loop.add_reader raises OSError(EINVAL) ("[Errno 22] Invalid argument") from kqueue.control() and the agent crashes immediately on startup (#5884, also reported in #6393). Probe KqueueSelector.register(0, EVENT_READ) before launching prompt_toolkit. If it fails, install an event-loop policy that returns a SelectorEventLoop backed by SelectSelector — select() works fine on stdin in this Python build, so add_reader succeeds and the agent launches normally. Also extend the existing #6393 fallback handler to recognize EINVAL / EBADF / "Invalid argument" so that any future selector failure on stdin shows the friendly "reinstall Python via pyenv or Homebrew" guidance instead of an opaque traceback. Verified on macOS (Darwin 24.6.0) with uv-managed cPython 3.11.15: the kqueue probe fails, the policy switch fires, and `hermes` launches cleanly. No effect on platforms where kqueue can register fd 0. * chore(release): add AUTHOR_MAP entry for outdoorsea * fix(aux): surface Nous auth-unavailable warning in auxiliary client When the auxiliary client falls through Nous (e.g. no stored auth, or runtime credential mint failed), users currently see only `debug`-level lines, so the next provider in the fallback chain takes over silently. Promote the no-auth path to a warning that tells operators to run `hermes auth`, and add a debug breadcrumb on the rarer mint-failed-but-stored-auth-still-present fallback path so the existing behavior (use the raw stored token) is preserved while staying investigable. Salvaged from #23881 by @0xharryriddle. The contributor's original patch also short-circuited the second branch with a return, which broke the pool-entry fallback path covered by `test_try_nous_uses_pool_entry` — kept the warning intent, dropped the return so the fallback still works. Dropped the contributor's changes to `hermes_cli/goals.py` because the goal-pause path is unreachable when the auxiliary client is None (`judge_goal` returns `parse_failed=False`, which resets `consecutive_parse_failures`), so the reason string they added never surfaces in the pause message. Refs #23876 * feat: add ACP registry metadata for Zed * chore(release): bump ACP Registry assets in lockstep with pyproject The ACP Registry manifest (acp_registry/agent.json), the npm launcher package.json, and the launcher's HERMES_AGENT_VERSION constant must all match pyproject.toml exactly — tests/acp/test_registry_manifest.py enforces this lockstep. Without a release-script hook, the next weekly version bump fails that test until someone hand-edits four files. Extend update_version_files() to drive the ACP bump alongside __init__.py and pyproject.toml, and add tests covering the lockstep and the missing-files no-op path. Also map adam.manning@gmail.com -> am423 for the salvage commit. * chore: remove Atropos RL environments and tinker-atropos integration (#26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example * feat(acp-registry): switch to uvx distribution, drop npm launcher The ACP Registry schema supports uvx as a first-class distribution method alongside npx and binary. Pointing the registry directly at the existing hermes-agent PyPI release removes: - the @nousresearch npm scope (we don't own it) - a separate npm publish step on every weekly release - 90 lines of Node launcher + tests in packages/hermes-agent-acp/ The Zed registry now installs Hermes via: uvx --from 'hermes-agent[acp]==<version>' hermes-acp This is the same command the npm launcher was shelling out to anyway, so end-user behavior is unchanged. Registry CI validates the PyPI URL + version-pin exact match automatically. Changes: - acp_registry/agent.json: distribution.npx -> distribution.uvx - delete packages/hermes-agent-acp/ entirely - scripts/release.py: drop npm-launcher bump paths, keep manifest lockstep - tests/acp/test_registry_manifest.py: assert uvx shape + version pin - tests/scripts/test_release_acp_registry.py: rewrite for uvx-only shape - docs (user-guide + dev-guide): drop all npm-launcher references - delete docs/plans/acp-registry-zed-integration.md (stale, npm-shaped) Validated against agentclientprotocol/registry agent.schema.json via jsonschema. hermes-agent==0.13.0 is already live on PyPI. * fix(deps): pin brotlicffi so aiohttp can decode Discord's Brotli attachments Discord's CDN serves attachments with Content-Encoding: br. aiohttp's compression_utils tries 'import brotlicffi as brotli' first and falls back to google's Brotli, but Brotli<1.2.0's Decompressor.process() is 1-arg while aiohttp calls it with 2 args (data, max_length). Result: every .txt/.md/.doc uploaded to a Discord-gateway session fails to decode at att.read() with 'Can not decode content-encoding: br' / 'TypeError: process() takes exactly 1 argument (2 given)', the agent never sees the bytes, and falls back to filesystem guessing. Pin brotlicffi==1.2.0.1 in both surfaces: - tools/lazy_deps.py 'platform.discord' tuple: Discord users on the lazy-install path get it on first discord.py import. - pyproject.toml [messaging] extra: users who explicitly install hermes-agent[messaging] (skipping the lazy path) get it eagerly. brotlicffi wins aiohttp's import race regardless of what else is installed (try brotlicffi / except: import brotli), so existing setups that already pulled google's Brotli transitively don't change behavior beyond the bug fix. ~1.5 MB wheel, manylinux/macOS/Windows coverage. E2E verified: round-trip decode of Brotli-compressed payload via aiohttp.compression_utils.brotli succeeds with brotlicffi pinned; same test against Brotli==1.1.0 alone reproduces the reported TypeError. Credit to @Korkyzer for the original diagnosis and fix shape in #15744; the lazy-deps gating layer was added on top to keep brotlicffi out of the install path for users who don't run a Discord gateway. Fixes #12511. Closes #15744. Co-authored-by: Korky <korkyzer@gmail.com> * fix(cli): kill resize scrollback duplication + light-mode visibility Two long-standing prompt_toolkit bugs in the base hermes CLI: 1. Resize duplication. Column-shrink resize used to push 40+ rows of duplicate chrome (status bar, input rules) into terminal scrollback every resize. Same wall as pt issues #29 (open since 2014), #1675, #1933 — aider/xonsh/ipython all use alt-screen to dodge it. Root cause (verified by reading prompt_toolkit/renderer.py): _output_screen_diff (renderer.py L232-242) deliberately moves the cursor to the bottom of the canvas after every paint 'to make sure the terminal scrolls up'. In non-fullscreen mode this scrolls chrome content into terminal scrollback on every render — not just on resize. Fix: monkey-patch prompt_toolkit.renderer._output_screen_diff to bypass the reserve-vertical-space cursor move. When pt's logic checks 'if current_height > previous_screen.height', we inflate the previous screen height so the branch falls through. ~30-line wrapper, no fork of pt, no alt-screen, no DECSTBM scroll region. Verified empirically in real Terminal.app: 10 resizes (mixed shrinks/widens 1300→500→1400) during streaming produced ZERO scrollback delta, full agent response preserved, status bar pinned at bottom, no visible duplicates. pt is pinned to ==3.0.52 so the private-function patch is safe; future pt bumps will need to re-verify the signature matches. 2. Light-mode terminal visibility. Hardcoded skin colors (#FFF8DC cornsilk, #FFD700 gold, #B8860B dark goldenrod) are tuned for dark Terminal.app — invisible on light/cream backgrounds. Port ui-tui/src/theme.ts detectLightMode() to Python so the base CLI adapts. Detection priority: HERMES_LIGHT/HERMES_TUI_LIGHT env → HERMES_TUI_THEME=light|dark → HERMES_TUI_BACKGROUND=#RRGGBB → COLORFGBG env (xterm/Konsole/urxvt) → OSC 11 query (\x1b]11;?\x1b\\) with 100ms timeout → default dark. OSC 11 is tty-gated so gateway/cron/batch/subagent code paths don't pay the timeout cost. When light mode is detected, dark-mode colors auto-remap to readable equivalents (#FFF8DC → #1A1A1A, #FFD700 → #9A6B00, etc). Hooked at three points: - _hex_to_ansi() — auto-remaps any color emitted via the ANSI helper - _build_tui_style_dict() — rewrites pt style strings (chrome bg/fg) - SkinConfig.get_color() — wrapped at module load so Rich Panel borders/body text get the remap too Status-bar foreground colors (#C0C0C0, #888888, etc.) are explicitly skipped because they're paired with a dark navy bg — remapping them would make them invisible in dark mode. 3. Other visibility fixes: [thinking] reasoning preview now uses ANSI dim+italic (\x1b[2;3m) instead of #B8860B so it inherits terminal default fg color. Input/prompt area defaults to terminal default fg (was #FFF8DC cornsilk → invisible on cream). Co-authored-by: Brooklyn Nicholson <brooklyn.bb.nicholson@gmail.com> * test(cli): cover light-mode detection + SkinConfig.get_color remap Adds 16 unit tests covering the light/dark terminal detection path introduced in the previous commit: - Env override priority (HERMES_LIGHT, HERMES_TUI_LIGHT, HERMES_TUI_THEME, HERMES_TUI_BACKGROUND, COLORFGBG) - Detection cache stickiness - _maybe_remap_for_light_mode() no-op in dark mode - Known dark-mode color remap (#FFF8DC -> #1A1A1A etc) - Case-insensitive lookup - Unknown color passthrough - Status-bar paired colors (#C0C0C0, #888888, #555555, #8B8682) are intentionally NOT remapped — regression guard for the patch-11 fix, since remapping them would produce dark-on-dark on the status bar's navy bg - SkinConfig.get_color() wrapper is installed and idempotent - SkinConfig.get_color() does remap in light mode and passes through in dark mode We don't try to fake an OSC 11 reply — that path is exercised end-to-end in real Terminal.app; the env-override path covers the algorithmic logic. * revert(cli): drop scrollback box width clamp (#25975), restore full-width borders (#26163) #25975 (salvaging #24403) clamped decorative scrollback Panels and streaming box rules to `max(32, min(width, 56))` as a defense against terminal-emulator reflow when columns shrink. On any modern wide terminal this made the response/reasoning borders look stubby — 56 cols inside a 200-col viewport. #26137 (salvaging #25981, by @OutThisLife) landed a more fundamental fix: prompt_toolkit's `_output_screen_diff` is monkey-patched so its reserve-vertical-space cursor move no longer pushes chrome into scrollback at all. With that in place, the clamp is no longer load-bearing for the chrome-into-scrollback class of bugs — the remaining risk is purely cosmetic reflow of *already stamped* Panel borders during an aggressive column shrink, which we now accept as a tradeoff for restoring proper full-width rendering. Changes: - `_scrollback_box_width()` returns `max(32, width)` (just the floor, no upper cap). All 10 call sites stay valid. - Updated `test_scrollback_box_width_caps_to_resize_safe_value` to the new `test_scrollback_box_width_returns_viewport_width` asserting full-width passthrough above the 32-col floor. Floor of 32 is kept so `'─' * (w - 2)` math stays positive on tiny terminals. Refs #18449 #19280 #22976 (the original reflow class) and #25975 (the clamp this reverts). * fix(goals): raise judge max_tokens 200 → 4096, make configurable The freeform /goal judge was capped at max_tokens=200, which reliably truncated the JSON verdict on reasoning-heavy models (deepseek-v4-pro, qwq, etc.) — the model burns tokens on hidden reasoning before emitting visible content, and the first /goal turn's prompt is larger than later turns, blowing past 200. Symptom: agent.log shows `judge reply was not JSON: '{"done": true, "reason": "The agent successfully'` followed by repeated `judge returned empty response` lines, then the goal pauses with a misleading 'judge model isn't returning the required JSON verdict' message. Diagnosed live by @helix4u — empirically verified that raising the budget on an unmodified worktree makes the failures go away on the exact configs users were hitting on Nous Plus subscription paths. Changes: - DEFAULT_JUDGE_MAX_TOKENS = 4096 (up from 200) - New auxiliary.goal_judge.max_tokens config knob for tuning in specifically constrained setups - _goal_judge_max_tokens() resolves the value with fail-open semantics (non-int / non-positive / load failure → default). load_config() is mtime-cached so per-turn lookup is cheap. Scoped narrowly to the verified root cause — does not introduce a submit_verdict tool-call schema (see #26162 / #23671 for that direction; they can land separately if we want them). Tests: tests/hermes_cli/test_goals.py + tests/cli/test_cli_goal_interrupt.py + tests/gateway/test_goal_verdict_send.py — 62/62 passing. E2E verified: config override honored (8192), missing/garbage/zero values fall back to 4096, no-auxiliary-section falls back to 4096. Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com> Credits: - @helix4u (Gille) — diagnosed the max_tokens=200 truncation via live testing on an unmodified worktree, drafted the original fix shape in #26162. - @AhmetArif0 — flagged the freeform judge fragility in #23671 from the tool-call angle. - @0xharryriddle (HarryRiddle.eth) — reported the issue from a Nous Plus subscription setup in #23876 with full debug reports. Closes #23876 Supersedes #26162, #23671, #23881 * ci: add PyPI publish workflow (salvaged from #25901) (#26148) * ci(pypi): add publish workflow for automated PyPI releases Triggered by CalVer tag pushes from scripts/release.py (v20* pattern). Three jobs: build (uv build) → publish (OIDC trusted publishing) → sign (Sigstore + attach to existing GitHub Release). - workflow_dispatch as manual escape hatch - skip-existing for safe re-runs - Graceful skip when GitHub Release not found (sign job) - Top-level permissions: contents: read (CodeQL compliant) Requires one-time setup: PyPI trusted publisher + GitHub pypi environment. Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com> * fix(release): address review findings - Stage acp_registry/agent.json in version bump commit (was silently left unstaged) - Add missing return when no previous tags found without --first-release - Fix get_pr_number return type annotation (str -> str | None) - Prefer uv build over python -m build (matches CI workflow), with fallback - Use unit separator (%x1f) in git log format to handle | in author names - Add explicit encoding='utf-8' to .release_notes.md write Workflow hardening: - Gracefully skip signing when GitHub Release not found (env var gate instead of exit 1, so PyPI publish still shows green) * fix(ci): harden PyPI workflow — SHA-pin actions, guard workflow_dispatch, explicit build flags - Pin all actions to commit SHAs (supply-chain hardening for id-token:write) - workflow_dispatch now requires confirm_tag input + checks out that tag - Both uv build paths explicitly pass --sdist --wheel --------- Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com> * feat(yuanbao): add _parse_resource_id and update _extract_text for ybres anchors * feat(yuanbao): add quote_media_refs extraction to QuoteContextMiddleware * feat(yuanbao): prioritize quote media refs over history backfill in DispatchMiddleware * fix(yuanbao): resolve quoted file/image via transcript lookup when quote desc lacks ybres When a user quotes a file message (type=3) and @bot, the quote's desc field only contains the filename without a ybres:// resource reference. The existing QuoteContextMiddleware only extracted media refs from desc using the ybres regex, which always returned empty for file quotes. Fix: add a transcript lookup fallback in QuoteContextMiddleware.handle() — when quote_media_refs is empty but reply_to_message_id is set, search the session transcript for the quoted message_id and extract ybres anchors from its content. Also fix message_type classification: when quote media resolves non-image files, override message_type to DOCUMENT so gateway/run.py's document injection logic properly prepends the file path and content for the agent. * refactor(yuanbao): improve quote media fallback — move to DispatchMiddleware, tighten conditions * feat(skills-hub): add huggingface/skills as trusted default tap (#2549) Adds Hugging Face's official skill catalog to the default GitHub taps and classifies it as a trusted source alongside openai/skills and anthropics/skills. - tools/skills_guard.py: huggingface/skills -> TRUSTED_REPOS - tools/skills_hub.py: GitHubSource.DEFAULT_TAPS += huggingface/skills (skills/) - website/docs: list it under default taps + trusted-source examples Closes #2549. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com> * fix(session): persist auto-reset state across gateway restarts was_auto_reset, auto_reset_reason, and reset_had_activity were not included in SessionEntry.to_dict() / from_dict(), so a gateway restart between session expiry and the user's next message would silently drop the auto-reset notification and context note. Add the three fields to the serialization roundtrip with safe defaults (False / None / False) so existing sessions.json files load cleanly. Add three roundtrip tests to test_session_reset_notify.py. * fix(gateway): isinstance-guard string-form 429 error body When a non-Anthropic provider (e.g. Morpheus proxy) returns a 429 with `{"error": "Too Many Requests"}` instead of the expected `{"error": {"type": ...}}` dict, _err_body.json().get("error", {}) returns the raw string and the next .get("type") line crashes with AttributeError, taking down the message handler. Guard with isinstance(_err_json, dict) so non-dict error bodies fall through to the generic rate-limit hint. Salvaged from PR #2587 by @KiraKatana. The PR's fallback-config `base_url`/`api_key_env` fix was already implemented independently on main (run_agent.py:8759-8780) with additional aliases and Ollama Cloud host handling, so only the gateway guard is cherry-picked. Co-authored-by: KiraKatana <kira.ops@proton.me> * fix: clean stale conversation mappings on response eviction/deletion ResponseStore.put() and .delete() now remove conversations rows that reference evicted or deleted response IDs, preventing 404 errors when a conversation name is reused after its backing response was purged. Adds regression tests for delete, eviction, and handler-level reuse. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore(release): add AUTHOR_MAP entry for CoinTheHat * fix(whatsapp): fail fast when Baileys sendMessage hangs Baileys' sock.sendMessage() can hang indefinitely while uploading media to WhatsApp servers (and, less often, on text sends), pinning the bridge's Express handler until the gateway's aiohttp timeout fires — surfacing to the user as a 120s wait followed by an empty error from the TTS/voice path. Wrap every sock.sendMessage() call inside the bridge in a sendWithTimeout() helper that rejects after WHATSAPP_SEND_TIMEOUT_MS (default 60s) via Promise.race. The four call sites are /send, /edit, and /send-media's primary send. Express handlers catch the rejection in their existing try/catch and return a real 500 to the gateway, which can then surface a retryable error. Salvaged from #2608 — wysie diagnosed the hang and the Promise.race shape; the other two parts of that PR (gateway HTTP session pooling, base.py metadata kwarg removal) already landed on main via separate routes and are no longer needed. Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com> * security(deps): add upper bounds to 5 loose deps + document supply chain policy (#24226) After the Mini Shai-Hulud supply chain campaign (May 2026) and the litellm compromise (March 2026), codify the dependency pinning policy that was established in PRs #2810 and #9801 but never written down for contributors. Changes: - pyproject.toml: Add tight upper bounds to the 5 deps that slipped through as review escapes from external contributor PRs: - hindsight-client>=0.4.22,<0.5 (was >=0.4.22) - aiosqlite>=0.20,<0.23 (was >=0.20) - asyncpg>=0.29,<0.32 (was >=0.29) - alibabacloud-dingtalk>=2.0.0,<3 (was >=2.0.0) - youtube-transcript-api>=1.2.0,<2 (was >=1.2.0) Pre-1.0 packages get <0.(current_minor+2) — tight enough to block hostile minor releases but loose enough to not require bumps every week. - CONTRIBUTING.md: Add 'Dependency pinning policy' section under Security with the full rationale, table of source types + treatments, and examples. - AGENTS.md: Add concise 'Dependency Pinning Policy' section for AI coding agents with the decision table and step-by-step checklist. - supply-chain-audit.yml: Add dep-bounds job that fails PRs introducing PyPI deps without <ceiling upper bounds. Fires on pyproject.toml changes. Posts a PR comment with the specific unbounded specs found. Refs: #2796 #2810 #9801 #24205 * feat(image-gen): actionable setup message when no FAL backend is reachable (#26222) When the in-tree FAL path has no API key (and no managed gateway), the handler used to return a bare 'FAL_KEY environment variable not set' error. Users had no idea where to get a key, that a managed Nous gateway exists, or that plugin-registered providers are an option. Now `image_generate_tool` returns a structured multi-line message: - signup link (https://fal.ai) - managed-gateway status (if Nous tools are enabled) - pointer to `hermes tools` / `hermes plugins list` for alternate backends, so users on a stale `image_gen.provider` know where to look The schema is untouched — `check_fn` still gates the tool out of the schema when no backend is reachable at startup, consistent with every other conditional tool. This patch fixes the call-time failure modes: managed-gateway 5xx, plugin provider disappearing mid-session, etc. Inspired by #2546 / @Mibayy. The PR was ~5700 commits stale against the new plugin-aware image_gen architecture, so this is a forward port of the actionable-error idea rather than a cherry-pick. Closes #2543 Co-authored-by: Mibayy <mibayy@users.noreply.github.com> * docs(cron): worked recipes for the wakeAgent pre-run gat…

…ousResearch#26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example

Remove the tinker-atropos submodule from the PR branch because maintainers intentionally separated it from the main Hermes repo in PR NousResearch#26106.

…ousResearch#26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example

… (#38105) Follow-up to #38089. The merged PR removed --recurse-submodules from the installer, CI, and getting-started docs, but missed the same stale clause in: - CONTRIBUTING.md (Prerequisites table) - website/docs/developer-guide/contributing.md (table + clone command) - zh-Hans mirror of the developer-guide contributing doc git-lfs is kept in the Git requirement rows since it's a separate, real prerequisite. No .gitmodules has existed since the Atropos RL submodule was removed in #26106.

…Research#38089 (NousResearch#38105) Follow-up to NousResearch#38089. The merged PR removed --recurse-submodules from the installer, CI, and getting-started docs, but missed the same stale clause in: - CONTRIBUTING.md (Prerequisites table) - website/docs/developer-guide/contributing.md (table + clone command) - zh-Hans mirror of the developer-guide contributing doc git-lfs is kept in the Git requirement rows since it's a separate, real prerequisite. No .gitmodules has existed since the Atropos RL submodule was removed in NousResearch#26106.

#69) * fix(tui): clear selection on right-click copy + group transcript blocks Two TUI polish fixes. (1) Right-click copy now clears the highlight. The right-click handler copied an active selection via onCopySelectionNoClear (the copy-on-select variant that keeps the highlight during a drag) and never cleared it, so after right-click-to-copy the selection stayed lit with no confirmation and a follow-up right-click re-copied the stale range instead of pasting. A successful right-click copy now clears the selection and notifies; if the copy fails (no clipboard path) the highlight survives and we fall back to the right-click paste handler, exactly as before. (2) Group transcript blocks so boundaries read clearly. Model replies, reasoning/tool trails, and system/error notes rendered with no vertical separation, so distinct block types butted together and were hard to scan. Group adjacent blocks by kind: one blank line opens only where the visual group changes (model prose <-> reasoning/tool trails <-> notes), while a run of same-kind blocks renders flush. The rule lives in domain/blockLayout.ts (messageGroup + hasLeadGap) and is applied intrinsically in MessageLine via a `prev` prop, which fixes the things ad-hoc per-block margins kept breaking: - Streaming stability: the gap is derived from the stable predecessor, never the live block's own changing text, so the actively-streaming reply computes the same gap while it streams as the settled segment does once it flushes. No reflow/jump. - Transparent empty trails: a trail hidden by /details, or one carrying only a token tally (the finalDetails segment message.complete appends), renders nothing and is transparent to grouping (prevRenderedMsg skips it), so there are no floating gaps, no doubled gap after a prompt, and no padded space above the final reply. In the default/collapsed modes content-bearing trails always render, so the grouping is a no-op there. The virtual-height estimator counts the group-boundary line so scroll math stays accurate before Yoga remeasures. ui-tui/src/domain/blockLayout.ts (new), components/messageLine.tsx, components/streamingAssistant.tsx, components/appLayout.tsx, lib/virtualHeights.ts, app/useMainApp.ts. Tests: blockLayout.test.ts (grouping + hidden/empty-trail visibility), virtualHeights leadGap, app-mouse.test.ts copy behavior. Full ui-tui suite green apart from 3 pre-existing local/env failures (cursorDrift, ink-resize, virtualHeights user-prompt-width) unchanged from main. * fix(desktop): stop chat scroll jumping by disabling native scroll anchoring The thread renders virtualized turns in natural document flow with padding spacers, and @tanstack/react-virtual already adjusts scrollTop itself when an off-screen turn is measured and its real height differs from the 220px estimate. With the browser default `overflow-anchor: auto`, native scroll anchoring corrects that SAME size delta too, so the two double-correct and the view lurches — most visibly with Windows mouse wheels, whose coarse notches mount/measure several under-estimated turns per tick (Mac trackpads scroll ~1-3px/frame, keeping it sub-perceptual). Set `overflow-anchor: none` on the thread viewport so only the virtualizer compensates. Also adds `diag-scroll-reset.mjs`, a CDP wheel-up repro that A/B tests the anchor behavior at runtime to confirm the fix. * feat(desktop): clamp sticky human messages to ~2 lines until hover/focus Long user prompts stick to the top of the thread while the response streams beneath them, so a multi-line prompt could eat most of the viewport. Clamp the read-only human bubble's text to ~2 lines with a soft bottom fade; the clamp lifts on hover or keyboard focus, and clicking the bubble still opens the edit composer (which shows the full text). Short messages are untouched — no clamp, no fade. Overflow is measured on an unclamped inner wrapper so the ResizeObserver only fires on real content/width changes, not every frame while the outer max-height animates open; the measured height feeds --human-msg-full so expand/collapse animate to the true height instead of overshooting the cap. * fix(dashboard): trust non-web WS origins on OAuth-gated binds after ticket auth (#37870) Generalises #37747. The WS Origin guard (_ws_host_origin_is_allowed) only trusted the packaged Electron app's non-web origin (file:// / null / app://) when the bind was NOT OAuth-gated. The packaged Hermes Desktop renderer loads over file://, so when it drives a remote OAuth-gated gateway its /api/ws upgrade was rejected with HTTP 403 even though _ws_auth_ok had already validated the single-use ?ticket= one line earlier. This guard runs only AFTER _ws_auth_ok has accepted the WS credential, which is the real auth boundary in every mode: * loopback bind -> legacy dashboard session token * non-loopback --insecure -> legacy session token (Tailscale / LAN, #37747) * OAuth-gated public bind -> single-use, 30s-TTL, identity-bound ?ticket= A non-web origin can only come from a native client; a DNS-rebinding attack always arrives from an http(s) origin and is still match-checked against the bound host. So once the upstream credential check has passed, the Origin guard adds nothing for a non-web origin. Collapsed the loopback/non-gated special cases to 'return True' for non-web origins. http(s) origins keep the strict same-host check, so browser DNS-rebinding defence is unchanged. Tests: gated file:///null/app:// now asserted ALLOWED; cross-site http(s) still rejected on gated and loopback binds; #37747's loopback and non-loopback-insecure cases retained. 37/37 test_dashboard_auth_ws_auth + test_web_server_host_header pass. * fix(desktop): inset sticky human messages with --sticky-human-top Pin user bubbles 0.75rem below the scroll top via a single token instead of flush top-0, so the sticky header doesn't sit hard against the thread edge. * chore: uptick * fix(desktop): drop sticky human clamp max-height transition * fix(desktop): restore sticky human clamp transition at 0.75s * chore: uptick * feat(desktop): custom zoom shortcuts at half default step Replace Electron's built-in zoomIn/zoomOut/resetZoom menu roles with custom implementations that use a 0.1 zoom-level step instead of Chromium's default 0.2. This makes Ctrl/Cmd + +/-0 zoom feel more granular and less jumpy. Also adds installZoomShortcuts() which intercepts the keyboard shortcuts via before-input-event. This is necessary on Linux/Windows where the application menu is set to null, so Chromium's default handler would otherwise apply the full 0.2 step. * fix(docker): seed gateway_state.json from HERMES_GATEWAY_BOOTSTRAP_STATE on first boot (#37896) On a fresh volume there is no gateway_state.json, so the boot reconciler (cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot but leaves it down — it only auto-starts when the last recorded state was "running". A freshly-provisioned container therefore comes up with the gateway down until something starts it (e.g. the dashboard's start button). Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running and no gateway_state.json exists yet, seed {"gateway_state":"running"} so the reconciler brings the supervised slot up on the very first boot. This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the same state file the reconciler already consults, guarded by [ ! -f ] so persisted runtime state always wins on later boots (a deliberately-stopped gateway stays stopped across restarts). Only the literal "running" is honoured (the sole value in the reconciler's _AUTOSTART_STATES). Generic container contract — no host-specific code. Useful to any orchestrator that provisions a blank volume and wants the gateway up from first boot (the supervised gateway/dashboard already work on such hosts; only the first-boot autostart was missing because the CLI lifecycle commands can't drive the s6 layer when container self-detection misses). Adds a shell-level contract test and documents the env var. * fix(desktop): keep in-flight new chats from vanishing on refresh Creating several sessions in a row (Ctrl-N, type, send, repeat) and waiting for one to finish made the other still-running chats disappear from the sidebar. Root cause: a new session's first user message isn't flushed to the SessionDB until its turn is persisted, so the row's message_count stays 0 mid-response. `refreshSessions()` lists with min_messages=1 and then hard-replaces $sessions. Because every message.complete triggers a refresh, the moment one session finished, the others (still at message_count 0) were filtered out of the server page and dropped from the list. Fix: merge instead of replace. `mergeWorkingSessions()` preserves any session that is still in $workingSessionIds but absent from the server page, so concurrent new chats stay visible until their own turn persists. Optimistic deletes/archives already remove the row from the previous list, so a removed session can't be resurrected by the merge. * fix(desktop): label in-flight new chats with the first message The send path created the optimistic sidebar row with a null preview, so a new chat read "Untitled session" until its turn persisted and auto-title ran. With concurrent new chats now preserved across refreshes, several "Untitled session" rows could show at once. Seed the optimistic preview with the user's first message (the branch path already does this) so each in-flight row is labeled immediately. The server's own preview/title supersedes it once the turn persists. * fix(docker): point TUI launcher at prebuilt bundle via HERMES_TUI_DIR (#37923) The embedded dashboard Chat tab dies on hosted images with a 502 / "[session ended]": the PTY child's `hermes --tui` spawn runs a runtime `npm install` that fails. Root cause: the root package-lock.json describes the WHOLE npm monorepo workspace set (root + web + ui-tui + apps/*), but the image only installs root/web/ui-tui — apps/* (the desktop app) is never `npm install`ed here, and its deps hoist into the shared root node_modules. So the actualized node_modules permanently disagrees with the canonical lock, `_tui_need_npm_install()` returns True on every launch, and the runtime `npm install` it triggers (a) can never converge against the partial monorepo and (b) races itself across concurrent /api/pty connections -> ENOTEMPTY -> the launcher `sys.exit(1)`s, the slow install blows past Fly's WS-upgrade window -> 502 -> the browser shows "[session ended]". Fix: set `ENV HERMES_TUI_DIR=/opt/hermes/ui-tui` so `_make_tui_argv` takes the prebuilt-bundle fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`) and never reaches the install check — exactly the nix/packaged-release path the launcher was designed for. The bundle is already built at Layer 8 (`ui-tui && npm run build`); this just tells the launcher to use it. Verified on a freshly-built image: HERMES_TUI_DIR is set, the prebuilt dist/entry.js is present, `_make_tui_argv` resolves to the prebuilt node invocation (no npm), and `docker run ... --tui` no longer prints "npm install failed". New regression guard: tests/docker/test_tui_prebuilt_bundle.py. A separate launcher hardening (make _tui_need_npm_install tolerant of partial-monorepo installs) is tracked independently; this Docker-side fix resolves the hosted-chat symptom on its own. Area: docker (Dockerfile + tests/docker). * fix(desktop): disable GPU acceleration on remote displays to stop flicker Users on remote/forwarded displays (SSH X11 forwarding, VNC, RDP, WSLg) reported the window flickering during scroll/streaming; nobody on native Windows/macOS ever saw it. Root cause: the app shipped with Chromium's default GPU hardware acceleration and no remote-display handling. Over a remote connection the GPU compositor can't present accelerated layers cleanly across the wire, so the surface flashes on repaint. Local sessions composite on the GPU and never hit it. Detect a remote display before app `ready` (detectRemoteDisplay in bootstrap-platform.cjs) and fall back to software rendering via app.disableHardwareAcceleration() + --disable-gpu-compositing. Software compositing is rock-steady over the wire and the CPU cost is negligible next to the connection's latency. HERMES_DESKTOP_DISABLE_GPU overrides detection both ways for VNC/screen-sharing setups we can't sniff or remote hosts that do have working acceleration. * fix(desktop): don't treat WSLg as a remote display WSLg renders Linux GUIs locally through a vGPU surface rather than shipping frames over the wire, so it doesn't show the remote-compositor flicker — confirmed by a WSL user seeing zero flickering. Drop the WSL branch from detectRemoteDisplay so WSLg keeps hardware acceleration; detection now covers only genuinely-remote displays (SSH X11 forwarding, VNC, RDP). The HERMES_DESKTOP_DISABLE_GPU override still works for anyone who does hit it. * fix(desktop): keep slash/@ completion menu navigable and Esc-dismissable The desktop composer's `onKeyUp` handler unconditionally re-ran `refreshTrigger` on every keyup, including the Arrow/Enter/Tab/Escape keys the open-trigger `onKeyDown` branch had already fully handled. Because `refreshTrigger` re-detects the trigger and resets the active index to 0, this produced two bugs in the `/` (and `@`) completion popover: - ArrowDown/ArrowUp moved the highlight on keydown, then keyup snapped it straight back to the top — so the user could never cycle past the first couple of items. - Escape closed the menu on keydown, then keyup re-detected the still-present `/` and immediately reopened it — so Esc appeared to do nothing. Fix: skip the keyup-driven refresh for the navigation/control keys while a trigger menu is open (they never edit text, so refreshing is pointless), and only reset the highlight in `refreshTrigger` when the detected trigger query actually changed. Applied to both the main composer (chat/composer/index.tsx) and the message-edit composer (assistant-ui/thread.tsx), which shared the same bug. New `shouldSkipTriggerRefreshOnKeyUp` helper is unit-tested. * fix(desktop): make Stop button actually interrupt when a turn is queued When a follow-up message is queued during a busy turn, the composer clears and the primary button switches back to the Stop affordance. But clicking Stop ran interruptAndSendNextQueued(), which cancelled the turn and *immediately* re-sent the head of the queue. The auto-drain effect (busy true to false) compounded this: any explicit cancel flipped busy false and re-fired the queue. The net effect was that Stop appeared to never interrupt -- the agent kept running on the queued prompt. Fix: - Stop button (busy + empty composer) now always performs a pure interrupt via onCancel(); it no longer hijacks the queue. - An explicit interrupt latches userInterruptedRef so the busy to false auto-drain skips exactly one drain. Queued turns are preserved and the user resumes them deliberately (Cmd/Ctrl+K, Enter, or the per-row send-now arrow), matching the documented Esc=cancel / Cmd+K=send-next affordances. - Extracted the settle decision into shouldAutoDrainOnSettle() with unit tests covering natural completion vs. explicit interrupt. * fix(desktop): stop background session messages bleeding into the active transcript A still-busy background session (one the user toggled away from) keeps emitting updateSessionState() heartbeats — stream deltas, and especially the 'session busy' prompt-rejection errors from auto-drained queued turns. Each call invoked syncSessionStateToView() unconditionally, staging that session's messages into the shared $messages view. flushPendingViewState() guarded against the wrong session reaching the view, but only one requestAnimationFrame is scheduled per frame and pendingViewStateRef holds just the latest writer. So within a single frame a background write could overwrite an already-pending foreground write, and the stale background transcript (e.g. the red 'session busy' rows) would render on top of whatever session the user switched to — appearing to 'bleed' into every session. Guard at the staging site: a session may only stage into the view when it is the currently-active session. Background sessions still update their own cache entry; they just never touch $messages. Pure render fix, no behavior change to queuing, interrupt, or drain. * fix(dashboard): authenticate server-spawned PTY child WS with a process-internal credential The embedded-TUI PTY child attaches to two server-internal WebSockets: /api/ws (its primary JSON-RPC gateway backend) and /api/pub (the event sidecar). Both URLs are built server-side in web_server.py and handed to the child via its environment. In OAuth-gated mode (auth_required=true, every hosted Fly agent), _ws_auth_ok unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path — a leaked session token must not grant WS access once the gate is engaged. But _build_gateway_ws_url() still only emitted ?token=, with no gated-mode branch (its sibling _build_sidecar_url had been given a ticket branch; the gateway-url builder was missed). So the TUI child's /api/ws upgrade was rejected 4401 -> 'gateway websocket connection failed' -> 'gateway startup timeout', leaving the embedded chat unusable on every gated deployment. A single-use 30s browser ticket is the wrong shape for this link: the child reads its attach URL once at startup and reuses it on every reconnect, and on a slow cold boot it may not dial within the TTL. (_build_sidecar_url's own docstring already flagged this fragility.) Fix: add a process-lifetime, multi-use internal credential to dashboard_auth.ws_tickets (internal_ws_credential / consume_internal_credential), minted once per process and NEVER injected into the SPA — it only leaves the process via a spawned child's env, so browser-side XSS can't read it, and a leak grants no more than a ticket already does. _ws_auth_ok accepts it via ?internal= in gated mode only. Both _build_gateway_ws_url and _build_sidecar_url now use it, so the child can reconnect both sockets. Loopback / --insecure behavior is unchanged (still ?token=). Needs review: touches _ws_auth_ok + dashboard_auth (core auth surface). * test(dashboard): direct unit coverage for internal WS credential + docstring fix Follow-up to Ben's PR #37892. Adds a TestInternalCredential block to test_dashboard_auth_ws_tickets.py exercising the mint-once stability, multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint, and ticket-store-independence branches directly (previously only covered indirectly via _ws_auth_ok, which left the unminted and empty-value branches unexercised). Also corrects the consume_internal_credential docstring: the returned identity dict is discarded by the current _ws_auth_ok caller (which only needs the boolean outcome), so the prior 'carry it into its session log' wording over-promised. * test(desktop): real-DOM regression for slash/@ menu keyboard nav The existing slash-menu fix (PR #37937) shipped a unit test that drove the keydown reducer directly. It did not exercise the actual DOM event path — specifically the keyup-driven `refreshTrigger` that was the root cause — so it would not have caught a regression in that path. This adds a faithful @testing-library reproduction that mounts the real `useLiveCompletionAdapter` plus the index.tsx trigger wiring and fires real `keyDown` + `keyUp` event pairs on a contentEditable. It asserts: - ArrowDown cycles through ALL items (0,1,2,3,4,0,1), not just the first two - Escape closes the menu and keyup does not reopen it Reverting the fix (always-refresh keyup + unconditional setTriggerActive(0)) makes this test fail with the highlight stuck at the top — confirming it guards the real bug. * fix(desktop): stop Esc reopening the slash/@ menu; harden keyup guard Follow-up to #37937. That fix guarded the composer's keyup with `shouldSkipTriggerRefreshOnKeyUp(key, trigger !== null)`. The `trigger !== null` check is timing-fragile for Escape: Escape's *keydown* sets `trigger = null` and closes the menu, but in a real browser the *keyup* fires after a re-render, so the handler closure sees `trigger === null`, the guard returns false, `refreshTrigger` runs, re-detects the still-present `/` in the input, and instantly reopens the menu. (jsdom batches state synchronously so a unit test could not observe this -- only the running app does.) Replace the value-based guard with a `triggerKeyConsumedRef` set synchronously in keydown whenever the open popover consumes a nav/control key (Arrow/Enter/Tab/Escape). keyup consults and clears that ref, so it is immune to the keydown->re-render->keyup timing. Applied to both the main composer (chat/composer/index.tsx) and the message-edit composer (assistant-ui/thread.tsx). Removes the now-unused `shouldSkipTriggerRefreshOnKeyUp` helper and its unit test. The real-DOM regression test now fires keydown+keyup pairs through the ref-based handlers and asserts Esc closes and stays closed. Verified by running a production renderer build (Vite v8) under Electron against a local backend: ArrowDown/ArrowUp cycle the full list and Esc dismisses the menu without reopening. * chore: remove committed RELEASE_v*.md changelogs from repo root (#37855) These per-release changelog files are transient working files used only to feed `gh release create --notes-file` at release time; the GitHub Release itself permanently stores the published notes. They were never a build artifact (no package-data glob, no MANIFEST.in include, no CI reference) and don't belong in the tracked tree. - Delete all 15 (v0.2.0 through v0.15.1) - Add RELEASE_v*.md to .gitignore so an accidental `git add -A` can't recommit them The hermes-release skill is updated separately to write the changelog to /tmp/ for the whole release process and never stage it. * fix(windows): rip out unused submodule support in installer & docker & docs we have no submodules anymore, so #37702 was kinda right, but we can just delete it entirely. * fix(docs): remove remaining stale submodule references missed by #38089 (#38105) Follow-up to #38089. The merged PR removed --recurse-submodules from the installer, CI, and getting-started docs, but missed the same stale clause in: - CONTRIBUTING.md (Prerequisites table) - website/docs/developer-guide/contributing.md (table + clone command) - zh-Hans mirror of the developer-guide contributing doc git-lfs is kept in the Git requirement rows since it's a separate, real prerequisite. No .gitmodules has existed since the Atropos RL submodule was removed in #26106. * docs: explain remote-gateway session token for Hermes Desktop (#38144) The desktop Remote gateway field asks for a session token that Hermes never surfaces — by default web_server.py mints an ephemeral token per boot and injects it into the served HTML, so there is nothing in config.yaml, /gateway, or env to copy. Document that you pin it yourself via HERMES_DASHBOARD_SESSION_TOKEN, run the backend with --insecure (keeps the legacy token auth path instead of engaging the OAuth gate), then paste that value into the desktop app. - web-dashboard.md: new 'Connecting Hermes Desktop to a remote backend' section (backend + desktop steps, --insecure vs OAuth-gate nuance, HERMES_DESKTOP_* env override, Tailscale guidance, troubleshooting). - environment-variables.md: new 'Web Dashboard & Hermes Desktop' env-var table (HERMES_DASHBOARD_SESSION_TOKEN, HERMES_DESKTOP_REMOTE_URL/TOKEN, the OAuth and public-url vars) — none were previously documented. * feat(matrix): support bang command aliases * fix(matrix): make bang-command resolution robust + fix dead skill-command branch Follow-up to the salvaged contributor commit: - Underscore→hyphen tolerance now emits a resolvable token. Previously the detect set accepted the hyphenated variant but emit returned the raw token, so '!set_home' produced '/set_home' which the dispatcher could not resolve. Now emits '/set-home'. Aliases are left as-is — the gateway dispatcher canonicalizes them itself. - Fix dead skill-command branch: skill command keys are stored slash-prefixed (e.g. '/arxiv') in get_skill_commands(), but the check compared the bare token, so '!arxiv' never normalized. Now compares the '/candidate' form, making skill aliases (e.g. !gif-search) work. - Re-run bang normalization after Matrix reply-fallback stripping so a quoted reply whose content is a bang command reaches command parity with the slash form. - Replace silent 'except Exception: pass' with logger.debug(exc_info=True). - Add AUTHOR_MAP entry for @nepenth. Tests: +5 (underscore-alias, skill-command branch, quoted-reply bang + slash parity). 162 Matrix tests pass. * docs: add remote-backend section to the Desktop App page (#38180) The Desktop App page covered install, settings, and chat but not how to connect the app to a backend on another machine — the exact thing @PedjaDrazic asked about. Add a 'Connecting to a remote backend' section that explains the Session token is the dashboard token Hermes never surfaces (pin it via HERMES_DASHBOARD_SESSION_TOKEN + run --insecure), and link to the web-dashboard page for the full backend setup rather than duplicating it. Add a reciprocal link from the web-dashboard remote section back to the Desktop App page. * fix(cli): exclude desktop-managed backend from stale-dashboard kill Fixes #37532 * fix(desktop): pass live backend PID to in-app update so its own dashboard is spared The Python half (#37538) reads HERMES_DESKTOP_CHILD_PID to exclude the desktop-managed backend from _kill_stale_dashboard_processes, but nothing set it. applyUpdatesPosixInApp now passes the live backend PID in the `hermes update` env, completing the #37532 fix end-to-end. * chore: add bbednarski9 to AUTHOR_MAP for #29722 salvage (#38189) Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com> * docs: make the Desktop App remote-backend section self-contained (#38194) The section explained why the Session token is hidden but punted the actual setup steps to the web-dashboard page via a link — a bounce for someone on the Desktop App page trying to connect. Inline the concrete steps instead: backend command block (mint token -> .env -> hermes dashboard --insecure), the in-app Remote gateway steps, the env-var override, Tailscale guidance, and a troubleshooting list. Keep a short pointer to the web-dashboard page for the same setup from that angle. * fix(mcp): banner shows 'disabled' not 'failed' for enabled:false servers (#38204) get_mcp_status() treated every non-connected server as a failure, so a server configured with enabled: false rendered as red '— failed' in the startup banner even though it was intentionally off. Add a 'disabled' field derived from the enabled flag and render disabled servers dim as '— disabled' instead. * feat(debug): include desktop.log in hermes debug share / /debug / hermes logs (#38203) The Electron desktop app writes boot failures, backend spawn output, and Python tracebacks to HERMES_HOME/logs/desktop.log, but debug-share only captured agent/errors/gateway — so desktop boot issues never made it into shared debug reports. - logs.py: register desktop -> desktop.log (enables 'hermes logs desktop') - debug.py: capture desktop snapshot, add to summary report, upload full desktop.log in 'share', update privacy notice - gateway /debug inherits the desktop tail via collect_debug_report() - main.py + docs: help text and log-name table (also adds missing gui row) - tests: desktop seed in fixture, new report test, three_pastes -> four_pastes * fix(desktop): add @testing-library/dom as explicit dev dependency @testing-library/react@16 declares @testing-library/dom as a peerDependency and re-exports waitFor/fireEvent/screen/within from it. Without dom installed as a direct dependency, tsc -b fails with TS2305 in every test file that imports those names — which breaks the apps/desktop build during installer bootstrap (Hermes Setup → "INSTALL DIDN'T FINISH"). * chore: regenerate lockfile + map vladkvlchk for salvaged #36978 - Add @testing-library/dom to apps/desktop devDeps in package-lock.json so npm ci validates against the manifest change (contributor left the lockfile out of the PR intentionally). - Removes stale 'peer: true' flags now that dom is an explicit devDep. - AUTHOR_MAP: prostoandrei9@gmail.com -> vladkvlchk (CI author gate). * fix(nix): bump npmDepsHash for refreshed lockfile Lockfile regeneration invalidated the flake's pinned npm-deps hash. Hash taken from fetchNpmDeps' authoritative 'got:' line (the prefetch-npm-deps Diagnose helper reports a different, wrong value due to a fetcherVersion normalization discrepancy). * fix(desktop): stop chat scroll backward-jump from content-growth interim scrolls (#37997) The thread scroll-anchor hook in apps/desktop/src/components/assistant-ui/ thread-virtualizer.tsx was disarming sticky-bottom whenever scrollTop decreased by >1px between scroll events. That check was too eager: when content height grows mid-frame (virtualizer measurement of a newly visible turn, streaming token, Streamdown/Shiki re-tokenization, composer chip toggle), the browser emits an interim 'scroll' event whose scrollTop is smaller than the previous frame's because scrollHeight just jumped. The rAF-scheduled pinToBottom hasn't run yet, so programmaticScrollPendingRef is 0 and the disarm fired. With sticky-bottom disarmed the scroller stuck ~50px above bottom — the visible at-rest backward jump that #37997 describes (and the same root cause as the wheel-up variant in #37527). Fix: - Track scrollHeight per frame (lastHeightRef). Disarm on scrollTop decrease ONLY when scrollHeight did not grow this frame. Real upward user intent (scrollbar drag, keyboard PgUp, programmatic scrollIntoView) still disarms because it moves scrollTop without growing the content. Wheel-up and touchmove continue to disarm via their own listeners. - Stop observing the scroller element itself in the ResizeObserver; only observe its content child. Viewport-only resizes (window resize, devtools panel toggle) no longer trigger spurious pins, matching the intent of the auto-stick-to-bottom behavior. Verified: - apps/desktop `tsc -b` clean. - apps/desktop `vitest run src/components/assistant-ui/streaming.test.tsx` passes (9/9), including the existing wheel-up disarm regression test that asserts scrollTop stays at 420 after a wheel-up + content growth. * fix(desktop): honor upward wheel scroll in long threads * feat(dashboard): check-before-update flow on the System page (#38205) The dashboard's update button ran 'hermes update' immediately with no preview. Now the System page shows whether an update is available and asks the user to confirm before applying it. - New GET /api/hermes/update/check: reports install method, current version, and commits-behind (via banner.check_for_updates, 6h-cached; ?force=1 busts the cache). Soft-fails to behind=null on network error; marks docker/nix/homebrew as can_apply=false with the out-of-band cmd. - System page: update-status badge on the Hermes version row (latest / N behind), a Check-for-updates button, and an Update-now button that opens a ConfirmDialog showing the commit count before POST /api/hermes/ update fires. Cached status loads with the rest of the page. - Docs + 5 endpoint tests (git/up-to-date/docker/soft-failure + auth gate). * fix(tui): stop persisting full tool output in trail lines (silent OOM death) A heavy --tui session (browser snapshots, large tool outputs) silently OOM-killed the Node parent within minutes — closing the gateway child's stdin, which the user saw only as a bare "gateway exited" / stdin EOF. CLI was immune. Root cause: each completed tool's verbose trail line embedded up to 16KB of result_text, persisted in transcript Msg.tools[] for the whole session and rendered EXPANDED by default, so an Ink render-node tree was built for every one of up to 800 messages at once. That tree blew past Node's heap at a few hundred MB — far below the 2.5GB memory-monitor exit threshold, so the death was never even attributed. - text.ts: persisted verbose tool-trail blocks now cap to a small preview (VERBOSE_TRAIL_MAX_CHARS=800/12 lines), not the 16KB live-render budget. Retained trail strings drop ~17x (12.2MB -> 0.7MB at 800 msgs); the live streaming tail still uses the larger LIVE_RENDER budget. - tui_gateway/server.py: lower the gateway-side verbose text cap to match (1KB/16 lines) so we stop shipping output the TUI no longer renders. - memoryMonitor.ts: derive critical/high thresholds from the real V8 heap ceiling (~88%/70%) instead of the hardcoded 2.5GB that killed the process at 31% of an 8GB ceiling; add a one-shot onWarn early-warning on fast sub-threshold heap growth so the next such death is diagnosable, not silent. - entry.tsx: wire onWarn to a crash-log breadcrumb + stderr line. Full tool output is unchanged in the agent context and SQLite session — this is display/transport only, no behavior or context change. Fixes #34095. Related #27282. Tests: ui-tui text + new memoryMonitor suites (33 pass), python verbose-cap guard (5 pass); full ui-tui suite shows no new failures vs pristine main. E2E repro confirms the retention drop. * fix(kanban): don't permanently block tasks that hit a provider rate limit (#38223) A kanban worker that exhausted its retries purely on a provider rate limit / quota wall (e.g. opencode-go's 5-hour window) exited with code 1. The dispatcher counted that as a crash, and with DEFAULT_FAILURE_LIMIT=2 two quota-wall hits permanently blocked the card. Fanning out many workers against one shared quota made this routine. Now a rate-limited worker exits with EX_TEMPFAIL (75); the dispatcher classifies that as a 'rate_limited' exit, releases the task back to 'ready' WITHOUT incrementing consecutive_failures (the breaker can't trip on a transient throttle), and the respawn guard defers the next attempt on a cooldown (default 5min, HERMES_KANBAN_RATE_LIMIT_COOLDOWN_SECONDS) until the quota window clears. Genuine crashes still count and trip the breaker as before. The 120s Retry-After cap is unchanged — no worker parks for hours holding a slot. - conversation_loop.py: surface failure_reason in the exhaustion return - cli.py: kanban worker picks exit 75 on rate_limit/billing failure - kanban_db.py: rate_limited exit kind, no-count requeue, cooldown guard * feat(observability): observer-grade telemetry hooks + NeMo-Relay plugin Adds backend-neutral observer hooks for plugins: session, turn, API request, tool, approval, and subagent lifecycle events with stable correlation IDs (session_id, task_id, turn_id, api_request_id, tool_call_id, parent/child subagent ids). Extends VALID_HOOKS with api_request_error and subagent_start. Hot path is zero-cost when no plugin subscribes: has_hook()/presence checks gate all payload construction, request payloads are returned by reference when no middleware rewrites, and the sanitized response payload no longer embeds raw response objects. Bundles the optional NeMo-Relay observability plugin (plugins/observability/nemo_relay) as an in-repo consumer of the new hooks, peer to the existing langfuse plugin. Fails open when the optional nemo-relay package is not installed. Authored-by: Bryan Bednarski <bbednarski@nvidia.com> Salvaged from #29722 onto current main. * test: restore unrelated trailing newlines in cwd/tool-search tests The salvaged PR incidentally stripped a trailing blank line from two unrelated test files (test_file_tools_cwd_resolution.py, test_tool_search.py). Restore them to keep the salvage diff scoped to the observability feature. * perf(observability): gate tool-hook emit on has_hook; slim per-tool footprint The salvaged observer contract gated the API-request hot path on has_hook() but left the per-tool emit ungated: every tool call ran result-field derivation + payload dict build + invoke_hook dispatch even with zero plugins registered. - _emit_post_tool_call_hook now short-circuits on has_hook("post_tool_call") and derives status/error fields lazily (after the gate, only when a listener will consume them). status defaults to None -> derived; explicit blocked/cancelled callers still pass status through. - transform_tool_result emit (pre-existing hook) likewise gated on has_hook(); skips _tool_result_observer_fields when no listener. - Removed the now-redundant _tool_result_observer_fields pre-computation at the three ok-path call sites (model_tools, agent_runtime_helpers, tool_executor) — the helper derives them, so the no-listener path costs one dict lookup and the call sites shrink. - Tests: stub has_hook=True where payload correctness is asserted; add a no-listener regression proving post_tool_call/transform_tool_result emit is skipped when nothing is registered. * test: stub has_hook in transform_tool_result hook tests CI slice 3 caught that tests/test_transform_tool_result_hook.py monkeypatches invoke_hook but not has_hook, so the new has_hook("transform_tool_result") gate skipped the emit and the transform never ran. Stub has_hook=True in the shared _run_handle_function_call helper whenever a custom invoke_hook is supplied (the test intends hooks to fire). The no-hook-registered test keeps the real has_hook=False path — that's the gate's intended behavior. * fix(doctor): detect + repair stale HERMES_MAX_ITERATIONS .env ghost shadowing config.yaml (#38222) * fix(doctor): detect + repair stale HERMES_MAX_ITERATIONS .env ghost shadowing config.yaml hermes doctor now flags when ~/.hermes/.env carries a HERMES_MAX_ITERATIONS value that disagrees with agent.max_turns in config.yaml, and 'hermes doctor --fix' removes the stale .env line so config.yaml is authoritative. 'hermes config show' surfaces the same drift inline under Max turns. The setup wizard stopped dual-writing this value, but users who edited only config.yaml from a pre-fix install keep a .env ghost. The gateway bridge normally overrides it at startup, but if the bridge bails on any earlier config-parse error the ghost silently wins — config says 400 while the gateway activity line reads N/90. The detector reads the .env FILE directly (load_env), not get_env_value/ os.environ, since the startup bridge may already have overwritten os.environ with the config value. Closes #17534. * fix(config): stop offering HERMES_MAX_ITERATIONS as an editable env var Removes HERMES_MAX_ITERATIONS from OPTIONAL_ENV_VARS so the dashboard env editor (PUT /api/env) and any env-var prompt no longer let a user write it to .env — which would recreate the stale ghost that shadows config.yaml's agent.max_turns (issue #17534). The iteration budget is configured only via config.yaml; the env var stays a read-only backward-compat fallback in the gateway/CLI, never a promoted write target. Regression test asserts it is absent from OPTIONAL_ENV_VARS. * fix(install.ps1): handle dirty worktree on Windows update (#38239) Git for Windows defaults to core.autocrlf=true, which renormalizes the repo's LF-only text files to CRLF in the working tree. On a managed, never-user-edited clone this makes tracked files (.envrc, AGENTS.md, agent/*.py, workflows) show as locally modified, so the update path's bare git checkout aborts with 'Your local changes would be overwritten by checkout' and the desktop bootstrap fails at stage=repository. The bash installer already autostashes before checkout; the PowerShell path had no dirty-tree handling at all and never pinned autocrlf. Fix: (1) git reset --hard HEAD before fetch/checkout in the update path to discard any pre-existing dirt, and (2) pin core.autocrlf=false on both the update and fresh-clone paths so the dirt is never created again. * fix(install): require Node >=20.19/22.12 for the desktop build The "Build desktop app" install step failed with an opaque "exit code 1" on machines with an old Node, and nothing in the logs explained it. Reproduced: on Node 20.5.1, `npm run pack`'s `vite build` crashes with You are using Node.js 20.5.1. Vite requires Node.js version 20.19+ or 22.12+. SyntaxError: The requested module 'node:util' does not provide an export named 'styleText' Vite 8 (rolldown) imports node:util.styleText, which doesn't exist before Node 20.12, so the build dies before producing the app. The installer's check_node / Test-Node accepted ANY pre-existing Node with no version floor, so a too-old system Node was used for the build instead of the bundled Node 22. Add a version floor (^20.19 || >=22.12) to check_node (install.sh) and Test-Node (install.ps1): a too-old system Node is replaced with the Hermes-managed Node 22 LTS, and the desktop stage re-resolves Node so the build always runs on a satisfying version. Declare the same range in apps/desktop/package.json engines. Verified: build succeeds on Node 22, fails on 20.5.1 with the error above; the floor logic matches Vite's range across boundary versions (20.18/20.19, 21.x, 22.11/22.12). * feat(dashboard): enrich profiles dashboard and de-dupe channel env vars (#37872) * feat(desktop): enrich profiles dashboard and de-dupe channel env vars Add active-profile switching, role descriptions (manual + auto-generate via the auxiliary LLM), per-profile model selection, and gateway-running / distribution badges to the GUI Profiles page. New profile creation gains clone-all, optional description and model assignment. Hide messaging-platform credentials (channel_managed) from the Keys/Env page since the Channels page is the canonical surface for them, and relabel the trimmed "messaging" category as "Gateway". Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): address review feedback on profiles/env changes - ProfilesPage: scope the action-menu outside-click handler to the menu's own container via a ref so opening one card's menu no longer leaves others open. - EnvPage: route the "Gateway" label and hint through i18n (t.common.gateway / gatewayHint) instead of hard-coded English, with an English fallback for untranslated locales. - web_server: only report description_auto=true when auto-generation actually succeeded. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): address second-round review on profiles - ProfilesPage: treat describe-auto success by null-checking the description and trust the response's description_auto flag instead of assuming true; disable the model-editor Save button unless the selected choice resolves to a real /api/model/options entry (avoids silent no-op saves). - tests: cover the new profile endpoints (active get/set + 404, description round-trip + 404, model round-trip + 400 validation, and describe-auto success/failure contracts). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): more profiles review fixes (toggles, races, tests) - ProfilesPage: use the canonical `active` returned by setActiveProfile; make the SOUL/description/model action-menu items toggle their editor closed when already open; guard description save/auto-describe against stale responses via an activeDescRequest ref so a late reply can't clobber a different open editor. - tests: assert /api/env channel_managed classification matches _channel_managed_env_keys(). Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tui): save TUI /save snapshots under Hermes home with system prompt (#38251) * fix(tui): save TUI /save snapshots under Hermes home with system prompt The TUI gateway's session.save RPC wrote hermes_conversation_<ts>.json to the workspace/project CWD via os.path.abspath(...) and only exported model and messages. This diverged from the classic CLI /save (which writes under the Hermes profile home) and from the dashboard save (which includes the system prompt). Write the snapshot under get_hermes_home()/sessions/saved/ and include system_prompt, session_id, and session_start so the TUI export matches the CLI and dashboard behavior. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tui): prefer agent.session_start for /save export; assert it in test Address review feedback: derive session_start from the agent's session_start datetime (matching the classic CLI export) and fall back to the gateway session's created_at only when unavailable. Assert session_start in the regression test. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): self-update rebuilds and relaunches cleanly on macOS The macOS DMG / in-app update could leave Hermes unable to relaunch: the staged updater rebuilt the desktop without managed Node on PATH ("npm not found"), never installed the rebuilt bundle over the running app, and could race itself on `git stash`. Child install scripts also inherited a deleted cwd from the .app bundle replaced during self-update. - update.rs: prepend $HERMES_HOME/node/bin + venv bin to the rebuild PATH; read --branch / --target-app from args; add a macOS "install" stage that dittos the rebuilt bundle over the target app, clears quarantine, and relaunches via `open` (rolling back on a failed swap); guard start_update with an AtomicBool so concurrent startUpdate() calls can't race git stash. - main.cjs: pass --branch <configured> and --target-app <running bundle> to the staged updater, and spawn it with HERMES_HOME + managed Node/venv on PATH and cwd=HERMES_HOME. - bootstrap.rs: launch the desktop via `open <App>.app` on macOS instead of exec'ing Contents/MacOS/Hermes, avoiding cwd/quarantine issues post-rebuild. - powershell.rs: pin child install scripts to a stable cwd so they don't emit getcwd errors when the launching .app is replaced mid-install. - failure.tsx: in update mode show "Update didn't finish" / "Retry update" and retry via startUpdate() instead of re-running the installer bootstrap. * fix(desktop): dedupe clipboard image paste Chromium exposes the same pasted image on both DataTransfer.items and .files as distinct Blob objects, which attached twice. Prefer items and skip the files mirror when items already yielded images. * fix(installer): stop mislabeling stdout-style progress as stderr Both installers (Electron bootstrap-runner + Tauri) hardcoded a literal `stderr: ` prefix onto every line that arrived on fd 2. Tools like uv/pip/git/npm write normal progress to stderr by design, so routine install output showed up tagged as "stderr" (and rendered red in the Tauri progress UI), making a healthy install look like it was erroring. Carry the stream as structured metadata (`stream: 'stdout' | 'stderr'`) on the log event instead of mangling the line text. The UI now styles stderr subtly (dimmed) rather than alarmingly, and the persistent forensic logs keep their stdout/stderr distinction. * fix(dashboard): clamp PTY resize dimensions for WSL2 winsize garbage (#38200) * fix(dashboard): clamp PTY resize dimensions for WSL2 winsize garbage WSL2 reports columns=131072, rows=1 from a broken winsize probe. The dashboard /chat tab forwards xterm.js dimensions through PtyBridge.resize(), which packs them as unsigned short via struct.pack. 131072 > 65535 raised struct.error — uncaught (only OSError was handled) — breaking the resize path and leaving the TUI laid out for a one-row, absurdly-wide screen, which surfaces as blank/disappearing text. Clamp cols/rows to a sane [1, 2000]x[1, 1000] range before packing. Non-finite/non-integer probes fall back to the minimum so nothing can reach struct.pack and raise. * test(dashboard): de-flake pub/events broadcast test test_pub_broadcasts_to_events_subscribers round-tripped a frame through two nested Starlette TestClient WebSocket portals within a 10s wall-clock budget. Under heavy parallel CI load a starved ASGI thread occasionally blew that budget even though the server logic is correct, producing intermittent 'broadcast not received within 10s' failures. Drive _broadcast_event directly under asyncio with fake subscribers instead. Same fan-out contract (verbatim delivery to every subscriber on the channel, nothing to other channels), zero scheduling surface. Runs in ~0.3s, deterministic across 10 consecutive runs. * fix(gateway): decode schtasks output with locale encoding on Windows _exec_schtasks ran schtasks.exe with text=True but no encoding/errors, so localized Windows (e.g. Chinese) output in the console code page raised UnicodeDecodeError tracebacks from subprocess' reader threads during `hermes gateway status`. Decode with the locale's preferred encoding and errors="replace" so non-UTF-8 status output is read cleanly. Fixes #38172 * test(gateway): cover schtasks locale-safe decoding on Windows Assert _exec_schtasks passes an explicit encoding and errors="replace" to subprocess.run, and that _schtasks_encoding falls back to utf-8 when the locale lookup is empty or raises (#38172). * docs: remote desktop connect needs --tui on the backend (#38350) The Desktop App and Web Dashboard remote-connect instructions told users to start the backend with `hermes dashboard --no-open --insecure --host 0.0.0.0`, omitting --tui. Without --tui the embedded-chat WebSockets (/api/ws, /api/pty) are refused, so the desktop passes the /api/status health check and reports the backend "ready" — but chat never works because the socket is closed on connect. - Add --tui to both backend command blocks (with an inline why-comment). - Explain that the desktop chat runs over /api/ws + /api/pty and needs the embedded-chat surface enabled; a plain dashboard/gateway is not enough. - Add a troubleshooting entry for the exact symptom (connects, says ready, chat dead) on both pages. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(installer): pass LogStream to emit_log calls from #38296 PR #38296 added four emit_log() calls using the old 3-arg signature, but main had already changed emit_log to take a `stream: LogStream` argument (#38312, "stop mislabeling stdout-style progress as stderr"). The two PRs touched different lines, so the merge auto-resolved with no conflict and left main unable to compile the bootstrap installer (E0061: 4 args expected, 3 supplied). Supply the missing stream: Stdout for the update/install progress lines and Stderr for the "could not auto-launch desktop" failure, matching the convention from #38312. cargo check passes. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(desktop): persist pins, reconnect after sleep, dedupe session search Four related desktop session-management bugs: - Pins lost until refresh: pinned sessions are joined against the paginated in-memory session list, so a pinned chat that aged off the most-recent page got evicted on the next refresh (every message.complete triggers one) and the Pinned section went empty. mergeWorkingSessions -> mergeSessionPage now also preserves pinned rows (matched by live id or lineage root). Pin id checks in the chat header, command center, and delete/archive are normalized to the durable sessionPinId so pins survive auto-compression. - Stuck on "Starting Hermes" after sleep: macOS sleep drops the renderer WebSocket; nothing reconnected on wake so the composer stayed disabled. The gateway boot hook now auto-reconnects with backoff on close/error and on wake signals (powerMonitor resume/unlock-screen IPC, window online, visibilitychange). connect() gains an open timeout so a hung reconnect can't deadlock in 'connecting'. Composer placeholder distinguishes "Reconnecting to Hermes" from a cold start. - Loses chats from itself: the same hard-replace that dropped pins also dropped loaded sessions; mergeSessionPage keeps them. - Multiple copies/branches in search: /api/sessions/search deduped only by raw session_id, so compression segments and branches surfaced as separate hits. It now dedupes by lineage root and returns the live compression tip, matching the session_search tool's behavior. * fix(desktop): guard reconnect sockets and keep branch search precise Avoid stale WebSocket events from an old reconnect attempt flipping the gateway state after a newer socket opens. Also limit session-search dedupe to compression edges so branch-specific hits still open the branch instead of collapsing to the parent. * fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix (#38383) * fix(packaging): ship locales/ i18n catalogs in wheel, sdist, and Nix locales/ is a bare data dir (no __init__.py), invisible to packages.find and package-data. Sealed installs (pip wheel, Nix store venv) dropped it, so gateway/CLI commands rendered raw i18n keys like gateway.reset.header_default. - pyproject: [tool.setuptools.data-files] locales = ["locales/*.yaml"] (wheel) - MANIFEST.in: graft locales (sdist) - agent/i18n._locales_dir: env override -> source -> sysconfig data scheme - nix/hermes-agent.nix: copy locales into the store + set HERMES_BUNDLED_LOCALES as defense-in-depth. The wheel's data-files already materialize into the uv2nix venv, so resolution works with no env var; the override pins the store path against a future uv2nix change that could drop data-files. - tests: metadata regression, wheel + sdist build-install smoke tests, and a bundled-locales flake check that verifies BOTH the wrapper override and the env-var-less data-files path. Smoke test wired into CI. Closes #23943, #27632, #35374. Supersedes #23966, #27716, #30261, #33841, #35429, #35494, #35735, #36697. * test: cap locale e2e timeout, tighten catalog count guard The two wheel/sdist e2e tests inherit the global --timeout=30 from addopts; a cold-CI run (isolated build env + venv create + network pip install) can plausibly exceed it. Add @pytest.mark.timeout(300) so they don't ride the unit-test budget and flake intermittently. Also assert the shipped catalog count equals len(SUPPORTED_LANGUAGES) instead of a hardcoded >=16 floor, so the guard self-updates and trips on a single dropped catalog (not just a fully-empty graft). * fix(installer): never brick the install when a self-update swap fails The macOS self-update bundle swap (install_macos_app_update, added in #38296) could leave the user with NO app installed. If moving the existing /Applications/Hermes.app aside failed, the code deleted the running app outright and set moved_old=false; if the subsequent move of the freshly built bundle into place then also failed, the rollback was gated on moved_old (now false) and skipped — leaving the target deleted with no replacement. Extract the swap into swap_in_new_bundle() with a strict invariant: on ANY failure path the target is left pointing at a working bundle (either the original, rolled back, or untouched) and is never deleted with no replacement. Also clean up the staged .hermes-update-new copy on the failure paths instead of orphaning it. Add unit tests covering the happy path, the rollback-on-install-failure path, and the catastrophic both-moves-fail path. The catastrophic-path test was verified to FAIL against the old code ("original app must NOT be deleted on failure") and pass against the fix. * test(installer): cover the post-update relaunch/install target derivation The macOS self-update relaunches and installs over the app it derives via resolve_hermes_desktop_app (.../Hermes.app/Contents/MacOS/Hermes -> .../Hermes.app). That derivation is load-bearing for both the ditto install target and the auto-relaunch (open <app>), but had no test. Add unit coverage: - resolve_hermes_desktop_app_finds_built_bundle: a fake built release tree resolves to the .app bundle on macOS (and the exe elsewhere). - resolve_hermes_desktop_app_is_none_without_a_build: no build => None. Verified the positive test FAILS if the .app parent-walk is wrong (e.g. one too few .parent() hops), so it's a real guard against a regression that would break the post-update relaunch target. cargo test -> 17 passed. * feat(cli): make `hermes portal` the human-readable Portal onboarding alias `hermes portal` (no subcommand) now runs the one-shot Nous Portal onboarding — OAuth login, switch provider to Nous, offer Tool Gateway — identical to `hermes setup --portal` and the human-readable alias for `hermes auth add nous --type oauth` (which still works). The prior status default moves to `hermes portal info`; `status` is kept as a hidden back-compat alias. `open`/`tools` subcommands are unchanged. User-facing hints and docs (status.py, conversation_loop 401 guidance, SystemPage, README, website docs + zh-Hans) now point at `hermes portal` / `hermes portal info`. `--manual-paste` references keep the explicit auth command since `hermes portal` does not expose that flag. * fix(setup): point Portal login-failure retry hints at `hermes portal` The two retry hints inside _run_portal_one_shot (shown when the OAuth login fails) still suggested `hermes auth add nous --type oauth`. Since this path backs both `hermes portal` and `hermes setup --portal`, point users at the new human-readable `hermes portal` for consistency. * fix(desktop): prevent IME Enter from splitting messages and viewport resize from disarming scroll anchor (#38333) * fix(desktop): prevent IME Enter from splitting messages and viewport resize from disarming scroll anchor Two fixes for the Hermes Desktop composer: 1. IME composition Enter was treated as message submission. When a Korean/ Japanese/Chinese IME is composing text and the user presses Enter to finalise the preedit, handleEditorKeyDown fired submitDraft() because it did not check event.nativeEvent.isComposing. The assistant-ui hidden textarea already guards this correctly; the custom contentEditable handler was missing it. Added an early return when isComposing is true. 2. Viewport resize (composer expand/collapse, window resize) was disarming the scroll sticky-bottom anchor. When the composer grows, the thread viewport shrinks, the browser adjusts scrollTop down to keep content visible, and the onScroll handler misread this as a user scroll-up. Added lastClientHeightRef tracking so the disarm condition now requires BOTH stable scrollHeight AND stable clientHeight before treating a scrollTop decrease as user intent. Fixes: random mid-message sends during IME typing; scroll jumps when the composer resizes or the window changes size. * fix(desktop): prevent virtualizer measurement adjustments from fighting scroll anchoring The virtualizer's measureElement callbacks trigger scroll adjustments when item sizes differ from estimates. These fight our ResizeObserver + pinToBottom loop, creating visible rubber-banding (view snaps to composer then jumps back up), even during idle. Three changes: 1. React.memo on VirtualizedThread to stop parent re-renders cascading 2. Shared stickyBottomRef so scrollToFn can check bottom state 3. scrollToFn override: skip adjustments when user is at bottom * fix(desktop): use stable useCallback ref instead of inline arrow for onBranchInNewChat The inline arrow `messageId => void branchInNewChat(messageId)` created a new function reference on every render. This cascaded through: desktop-controller → ChatView → Thread → useMemo([...onBranchInNewChat]) → new messageComponents object → VirtualizedThread receives new prop → React.memo overridden → virtualizer recalculates → measurement adjustments trigger scroll jumps at the 15-second useStatusSnapshot interval. Pass the already-useCallback'd branchInNewChat directly. * fix(desktop): use ctrlEnter submitMode on hidden textarea + gate ResizeObserver on isRunning Two root-cause fixes: 1. IME message splitting: The hidden ComposerPrimitive.Input textarea had submitMode='enter' (default), so any Enter keydown it received — even during IME composition — triggered form.requestSubmit(). Changed to submitMode='ctrlEnter' so only the contentEditable div (which correctly checks isComposing) handles plain-Enter submission. 2. Scroll jumps during idle: The ResizeObserver auto-follow loop was active even when the thread wasn't running, causing spurious pinToBottom calls whenever any layout shift occurred (browser reflow, font load, GPU cache eviction). Gated the ResizeObserver on thread.isRunning so auto-scroll only follows during active streaming. User messages still pin via useLayoutEffect, and thread.runStart still calls jumpToBottom. * fix(desktop): keep chat bottom anchor stable through idle layout shifts * fix(desktop): prevent code block shrink scroll bounce * fix(desktop): release bottom height lock on run completion * fix(desktop): keep streaming code blocks rendered * fix(desktop): keep bottom anchored through final render * fix(desktop): render streaming reasoning code blocks * feat(desktop): add subtle streaming block animations * feat(cli): make `hermes portal` run the full quick-setup Nous flow (model picker) `hermes portal` / `hermes setup --portal` previously logged in and set provider=nous but left the model UNSELECTED (blank -> runtime default) and never showed a picker — unlike the first-time quick setup, which runs the model picker. Route `_run_portal_one_shot` through `_model_flow_nous` — the exact same routine quick setup (`_run_first_time_quick_setup`) and `hermes model` -> Nous use. It handles both the logged-out path (device-code OAuth, which picks a model internally) and the logged-in path (curated Nous model picker), then offers the Tool Gateway opt-in and sets provider=nous. Net effect: `hermes portal` now offers a model picker every time and is a true single-command collapse of quick setup's Nous step. Removes the hand-rolled auth_add_command + manual provider write + separate Tool Gateway prompt (now a single source of truth). Re-syncs the in-memory config from disk afterward so a caller's later save_config can't clobber the model/provider written by the login flow. Docs (CLI help, portal_cli docstrings, nous-portal EN + zh-Hans) updated to mention model selection. New regression test asserts `_run_portal_one_shot` delegates to `_model_flow_nous`. Verified live: `hermes portal` now shows the 27-model curated picker, 'Skip (keep current)' preserves prior provider/model. * fix(cli): harden `hermes portal` SystemExit handling + finish model-pick doc sweep Self-review of #38465 surfaced three real items: 1. SystemExit escape (defense): `_login_nous` raises SystemExit(130)/(1) on cancel/failure. The logged-out login path inside `_model_flow_nous` catches it, but the expired-session re-login path (main.py) only catches Exception, so a Ctrl-C during re-auth could propagate past `_run_portal_one_shot` and kill the CLI. Add SystemExit to the portal handler so all cancel/abort cases end with the graceful 'Setup cancelled / retry later' message. 2. Doc sweep: the model-pick step was only added to the bare-`hermes portal` prose. Propagate it to the surfaces describing `hermes setup --portal` behavior that still omitted model selection: - `--portal` argparse help (main.py) - nous-portal.md intro + the numbered 'what it does' step list (EN + zh-Hans) - run-hermes-with-nous-portal.md 'default model after setup --portal' line, which was now contradictory (there's a picker, not a forced default) (EN + zh) 3. Test coverage: add parametrized regression test asserting the portal handler swallows KeyboardInterrupt / EOFError / SystemExit (returns None, no escape). Note on 'Skip (keep current)': delegating to _model_flow_nous means picking Skip preserves the prior provider instead of force-switching to nous — this is intentional and matches quick setup exactly; docs now say 'sets Nous as your provider (when you pick a model)' rather than unconditionally. * fix(packaging): modernize project.license to PEP 639 SPDX string (#38353) * fix(packaging): modernize project.license to PEP 639 SPDX string Drops the SetuptoolsDeprecationWarning ('project.license as a TOML table is deprecated') emitted on every editable build under setuptools>=77 by switching license = { text = "MIT" } to the SPDX string form plus an explicit license-files entry. Bumps build-system requires to setuptools>=77 so an older build backend can't reject the string form. The warning was non-fatal (builds succeed with it) but surfaces prominently in install.ps1 build-failure output, where it gets mistaken for the cause of unrelated Windows build_editable crashes. * fix(packaging): bound setuptools build requirement per supply-chain policy Add the <83 upper bound to setuptools>=77.0 so the dep-bounds supply-chain gate (>=floor,<next_major) passes. * fix(skills): document xurl X Article ingestion * fix(docker): bake hindsight-client into the image (#38128) (#38530) …

…Research#38089 (NousResearch#38105) Follow-up to NousResearch#38089. The merged PR removed --recurse-submodules from the installer, CI, and getting-started docs, but missed the same stale clause in: - CONTRIBUTING.md (Prerequisites table) - website/docs/developer-guide/contributing.md (table + clone command) - zh-Hans mirror of the developer-guide contributing doc git-lfs is kept in the Git requirement rows since it's a separate, real prerequisite. No .gitmodules has existed since the Atropos RL submodule was removed in NousResearch#26106.

…ousResearch#26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example

…26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example

alt-glitch added 9 commits May 15, 2026 04:12

chore: remove RL references from cli-config.yaml.example

2c9d25f

docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md

9be44c3

chore: remove RL training section from .env.example

f5d507d

alt-glitch requested a review from a team May 15, 2026 04:35

teknium1 approved these changes May 15, 2026

View reviewed changes

alt-glitch merged commit 5af672c into main May 15, 2026
20 of 23 checks passed

alt-glitch deleted the hermes/hermes-a899b5e7 branch May 15, 2026 05:06

alt-glitch mentioned this pull request May 15, 2026

ci: add PyPI publish workflow (salvaged from #25901) #26148

Merged

5 tasks

0xarkstar mentioned this pull request May 18, 2026

feat(gateway,skills): unified skill trigger framework with Discord components #16530

Closed

alt-glitch mentioned this pull request May 19, 2026

chore: add tinker atropos submodule #28715

Closed

wkramx mentioned this pull request May 27, 2026

fix(docs): add stub page for removed rl-training to fix broken links #33398

Closed

ethernet8023 mentioned this pull request Jun 3, 2026

fix(windows): remove stale submodule references that break Git-for-Windows clone #38089

Merged

18 tasks

teknium1 mentioned this pull request Jun 3, 2026

fix(docs): remove remaining stale submodule references missed by #38089 #38105

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: remove Atropos RL environments and tinker-atropos integration#26106

chore: remove Atropos RL environments and tinker-atropos integration#26106
alt-glitch merged 9 commits into
mainfrom
hermes/hermes-a899b5e7

alt-glitch commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

github-actions Bot commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alt-glitch commented May 15, 2026

Summary

What's removed

Deleted entirely (~15,700 lines)

Modified (~30 files)

Not touched (historical)

Verification

Uh oh!

github-actions Bot commented May 15, 2026

🚨 CRITICAL Supply Chain Risk Detected

🚨 CRITICAL: Install-hook file added or modified

Uh oh!

github-actions Bot commented May 15, 2026

🔎 Lint report: hermes/hermes-a899b5e7 vs origin/main

ruff

ty (type checker)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🔎 Lint report: `hermes/hermes-a899b5e7` vs `origin/main`