Skip to content

chore: sync with upstream main (2026-05-15)#33

Merged
bot-ted merged 1445 commits into
mainfrom
sync/upstream-20260515
May 15, 2026
Merged

chore: sync with upstream main (2026-05-15)#33
bot-ted merged 1445 commits into
mainfrom
sync/upstream-20260515

Conversation

@bot-ted

@bot-ted bot-ted commented May 15, 2026

Copy link
Copy Markdown
Owner

Daily sync with upstream. Auto-created by cron job.

New commits since last sync (%Y->- (origin/main, origin/HEAD, main)):

db84a78e6 fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342, #22763) (#26320)
f199cd9f8 chore(release): map brian@dralth.com to btorresgil for #22345 salvage (#26319)
77276070f fix(codex-runtime): de-dup [plugins.X] tables and stop leaking HERMES_HOME into config.toml
274217316 fix(codex-runtime): keep migrated root keys top-level
13c72fb48 fix(tools): wrap browser provider network calls with error handling
6af994232 fix(url-safety): allow only http and https schemes
837395685 fix(slack): guard split()[0] against whitespace-only command text
94bdc63ff chore(release): add AUTHOR_MAP entry for nidhi-singh02
eacb398f7 fix(tools): add return_exceptions to asyncio.gather in web_tools
5301cc212 chore(release): add AUTHOR_MAP entry for nidhi-singh02
c4a21d783 fix(cli): log swallowed exception in runtime model auto-detection
59c7cc64f chore(release): add AUTHOR_MAP entry for amethystani
55f3262e7 fix(mcp): pre-compile env-var regex and unify interpolation
5360b5424 fix(providers): set User-Agent on ProviderProfile.fetch_models
647cc0bb0 chore(release): add AUTHOR_MAP entries for InB4DevOps
4f8aaf104 perf(run_agent): accumulate length-continuation prefix via list+join
b6e07417c feat(cli): show YOLO mode warning in banner and status bar
47614dbfc chore: wire simplex docs into sidebar + AUTHOR_MAP
09d9724a0 feat(gateway): add SimpleX Chat platform plugin
85782a4ed feat(acp): hermes acp --setup-browser bootstraps browser tools for registry installs
9f57f2286 chore(release): add AUTHOR_MAP entry for buntingszn
6682f91b8 feat(cron): support name-based lookup for job operations
05d9f641c docs(cron): worked recipes for the wakeAgent pre-run gate (#26229)
9329e0669 feat(image-gen): actionable setup message when no FAL backend is reachable (#26222)
04b1fdaec security(deps): add upper bounds to 5 loose deps + document supply chain policy (#24226)
681778a0b fix(whatsapp): fail fast when Baileys sendMessage hangs
0161d4bb6 chore(release): add AUTHOR_MAP entry for CoinTheHat
814c60092 fix: clean stale conversation mappings on response eviction/deletion
23ac522d3 fix(gateway): isinstance-guard string-form 429 error body
e0e7397c3 fix(session): persist auto-reset state across gateway restarts
...and 1414 more

Total: 1444 commits

teknium1 and others added 30 commits May 12, 2026 16:34
… provider

_resolve_task_provider_model drops cfg_base_url and cfg_api_key when
returning a named provider, causing configured API keys and base URLs
to be lost. Pass them through so named providers can use custom
endpoints while still resolving credentials from provider-specific
env vars.

Closes NousResearch#20139
…s exit

Fixes NousResearch#24127

On headless Linux VPS (no DISPLAY or WAYLAND_DISPLAY), some Python
webbrowser backends register TUI programs such as links, lynx, or
www-browser.  GenericBrowser.open() spawns these without redirecting
stdin/stdout, allowing them to take over the terminal.  This can cause
the process to receive SIGHUP and exit immediately even though uvicorn
bound the port successfully, producing a misleading success message
followed by an empty --status.

Fix: detect headless Linux at startup and skip the auto-open when no
display server is available.  On such systems the URL is still printed
so the user can open it manually or via an SSH tunnel.  The webbrowser
call is also wrapped in a try/except so any unexpected failure on other
platforms is silently absorbed rather than surfacing as an unhandled
exception in the daemon thread.
Replace the hardcoded i18n placeholder "~/.hermes/config.yaml" with the
real config_path returned from api.getStatus(), falling back to the i18n
string while loading or on API failure.

Co-authored-by: aqilaziz <gonzes7@gmail.com>
…ck leakage

When TUI exits, tmux captures some TUI output into its scrollback buffer.
On restart, stale scrollback content appears at the top of screen before
AlternateScreen takes over.

Add ANSI escape sequences at startup:
- ESC[2J  clear visible screen
- ESC[H   cursor home
- ESC[3J  clear scrollback buffer
Replace `len(label)` with `HermesCLI._status_bar_display_width(label)`
in two places where the response box top border is rendered.

`len()` counts characters, not terminal columns. CJK characters like
`测` and `试` each occupy 2 columns, causing the top border
`╭─ 测试 ───╮` to render 2 columns wider than the bottom border
`╰─────────╯`.

The `_status_bar_display_width` helper already exists (line 2881) and
uses `prompt_toolkit.utils.get_cwidth` for proper CJK width calculation.
…-decodable

The fuzzy @-file completer shells out to 'rg --files' via subprocess.run
with text=True. On Windows, Python 3.13 decodes stdout using the system
ANSI codepage (cp1252), so any filename containing bytes like 0x81/0x8f
crashes the background reader thread with UnicodeDecodeError. The
exception is swallowed inside subprocess, leaving proc.stdout=None, and
the next line ('proc.stdout.strip()') blows up with:

  AttributeError: 'NoneType' object has no attribute 'strip'

This takes down the prompt_toolkit event loop and forces 'Press ENTER to
continue' until the user clears the @-query.

Fix:
- Pass encoding='utf-8', errors='replace' so rg's UTF-8 output is decoded
  consistently across platforms and unmappable bytes don't crash.
- Guard 'proc.stdout' with a None check before .strip(), so a future
  reader-thread failure degrades gracefully instead of breaking input.
…usResearch#24628)

When the user runs /stop or a session is interrupted mid-flight, the
👀 in-progress reaction lingered on the user's message indefinitely.
Without another agent run to swap it for 👍/👎, the eyes stayed there
forever — visually misleading (looks like the agent is still working).

Fix: on ProcessingOutcome.CANCELLED, call set_message_reaction with
reaction=None to clear all reactions on the message. Documented Bot API
semantics (equivalent to Bot API 10.0's deleteMessageReaction, but works
on PTB 22.6 already without the version bump).

Test changes:
- Renamed test_on_processing_complete_cancelled_keeps_existing_reaction
  → test_on_processing_complete_cancelled_clears_reaction; updated
  assertion to expect set_message_reaction(reaction=None).
- Added test_on_processing_complete_cancelled_skipped_when_disabled
  (TELEGRAM_REACTIONS=false short-circuits).
- Added test_clear_reactions_handles_api_error_gracefully and
  test_clear_reactions_returns_false_without_bot to cover the new
  _clear_reactions helper.
…ing (NousResearch#24630)

Three follow-ups to PR NousResearch#24168 found during live E2E testing on TS/bash files:

1. typescript-language-server now installs the typescript SDK (tsserver)
   alongside it. Without that sibling install, initialize() failed with
   "Could not find a valid TypeScript installation" and the server was
   marked broken — no diagnostics ever reached the agent. New extra_pkgs
   field on INSTALL_RECIPES makes that explicit and reusable for future
   peer-dep cases.

2. _check_lint now treats "linter command exists on PATH but cannot
   actually run" as skipped instead of error. The motivating case is
   npx tsc when typescript is not in node_modules — npx prints its
   "This is not the tsc command you are looking for" banner and exits
   non-zero, which previously blocked the LSP semantic tier (gated on
   success or skipped). Pattern-matched per base command (npx,
   rustfmt, go) so genuine lint errors still flow through normally.

3. hermes lsp status now surfaces a Backend warnings section when
   bash-language-server is installed but shellcheck is missing. The
   server itself spawns fine but bash-language-server delegates
   diagnostics to shellcheck — without it on PATH the integration
   looks alive but never reports any problems. Same warning is
   logged once at server spawn time.

Validation:

- 12 new tests in tests/agent/lsp/test_install_and_lint_fixes.py:
    * recipe carries typescript SDK
    * _install_npm passes both pkg + extras to npm CLI
    * backwards compat: recipes without extras still work
    * _backend_warnings quiet when bash absent / both present
    * _backend_warnings fires when bash installed without shellcheck
    * status output includes the Backend warnings section
    * _looks_like_linter_unusable catches the npx tsc banner
    * real TS type errors not misclassified as unusable
    * unfamiliar linters fall through normally
    * _check_lint returns skipped on npx tsc unusable
    * _check_lint returns error on real tsc type errors
- Full lsp + file_operations test suite: 245/245 pass
- Live E2E:
    * try_install("typescript-language-server") installs both packages
      into node_modules
    * write_file(bad.ts, ...) returns lint=skipped + lsp_diagnostics
      with two real TS errors (was lint=error, no lsp_diagnostics)
    * hermes lsp status renders the shellcheck warning when bash is
      installed but shellcheck is not on PATH
Salvage of NousResearch#21063 — adds 'Weixin, and more' to module-level docstrings
in gateway/__init__.py, gateway/config.py, gateway/platforms/base.py
and the 'hermes gateway' subparser description.

Co-authored-by: wuwuzhijing <chuang.guo@hopechart.com>
_parse_target_ref() has no handler for XMPP JIDs (user@server or
room@conference.server), so they fall through to the final
`return None, None, False`. This causes send_message to fail when
targeting an XMPP chat by JID, since the JID is not numeric and
doesn't match any other platform pattern.

Add an explicit check for XMPP targets containing '@', matching the
existing Matrix pattern above it.
…rt it

Xiaomi MiMo's /v1/models endpoint returns 401 even with a valid API key,
causing hermes doctor to falsely report 'invalid API key'.

Add a `supports_health_check` field to ProviderProfile (default True).
Providers whose /models endpoint doesn't support auth verification can
set it to False. The doctor's dynamic provider discovery now reads this
field instead of hardcoding True.

The xiaomi provider plugin sets supports_health_check=False.
Tavily's /crawl endpoint requires Authorization: Bearer <key> in the header,
unlike /search and /extract which accept api_key in the JSON body.
Without the header, crawl returns 401 Unauthorized.
Cron jobs using `deliver: whatsapp` were silently dropped because the
resolver's home-channel env var dict in cron/scheduler.py listed every
messaging platform except whatsapp. _resolve_delivery_targets() returned
[] and no message was sent — but jobs.json marked the run successful and
no log line surfaced the failure.

The gateway adapter and the send_message tool path both honored
WHATSAPP_HOME_CHANNEL correctly; only the cron path missed.

Adds 'whatsapp' -> 'WHATSAPP_HOME_CHANNEL' to _HOME_TARGET_ENV_VARS.
Verified end-to-end with multiple cron pings landing in WhatsApp
self-chat after the fix.

Fixes NousResearch#22997
…ousResearch#24702)

PR NousResearch#24151 routed Portal Qwen (qwen3.6-plus) through the prefix_and_2
long-lived cache layout, attaching {"type":"ephemeral","ttl":"1h"}
markers to the tools[-1] entry and the stable system-prefix block.
That layout works for Portal Claude because Anthropic / OpenRouter on
Anthropic routes honour 1h TTL — but Portal Qwen ultimately proxies to
Alibaba DashScope, which documents a single "ephemeral" TTL of 5
minutes on its Context Cache. The ttl="1h" qualifier is silently
dropped upstream, so the two highest-value breakpoints (tools array +
system prefix) never land. Only the rolling-window 5m markers on the
last 2 messages cache, which matches the observed ~25% read rate.

Fix: keep Portal Qwen on cache_control via _anthropic_prompt_cache_policy
returning (True, False), but drop it from _supports_long_lived_anthropic_cache
so it rides the standard system_and_3 5m layout (system + last 3 messages,
all at 5m). Same 4 breakpoints, all in a TTL the upstream actually honours.

Refs: https://www.alibabacloud.com/help/en/model-studio/context-cache
      https://openrouter.ai/docs/features/prompt-caching (Alibaba Qwen
      section: "TTL: 5 minutes")

- _supports_long_lived_anthropic_cache: Portal scope narrowed back to Claude
- tests: flip the two qwen long-lived expectations to False, retitle
  non_claude_non_qwen_rejected -> non_claude_rejected
… path

Closes NousResearch#23064

When Hermes connects to Signal via signal-cli in daemon mode (linked
device setup), group messages sent from the user's phone were silently
dropped. The syncMessage handler only processed events where
destinationNumber equals the bot's own number (Note to Self).

Group messages from linked devices carry a groupInfo.groupId instead of a
destinationNumber. Extend the condition to also pass through sync messages
that have a groupId, so group messages are promoted to dataMessage and
reach the agent.
…empty list but PULSE_SERVER is set

In WSL2, sounddevice.query_devices() returns [] even when the
PulseAudio bridge is functional. The existing code already handled
the case where the query itself raises an exception, but it missed
the empty-list case.

This change treats an empty device list as non-fatal in WSL when
PULSE_SERVER is configured, matching the existing exception-handler
behavior.

Fixes: WSL users seeing 'No audio input/output devices detected'
even though paplay/arecord work fine.
…path

_session_info() used os.getcwd() which reflects the gateway process
working directory, not the user's actual working directory. This caused
the TUI status line to display incorrect paths (e.g. D:\HermesWork
instead of D:\Hermes\HermesWork) after agent turns that changed the
process cwd.

Align with session.create which already correctly reads TERMINAL_CWD
env var set by the CLI launcher.
…arch#24709)

- Note that typescript-language-server pulls in the typescript SDK
  automatically (peer-dep relationship was previously implicit and
  caused initialize failures when the SDK was absent).
- Add a Troubleshooting entry for the new Backend warnings section
  in hermes lsp status, with the shellcheck install commands across
  apt / brew / scoop.

Reflects what shipped in PR NousResearch#24630.
alt-glitch and others added 26 commits May 15, 2026 01:33
…ain policy (NousResearch#24226)

After the Mini Shai-Hulud supply chain campaign (May 2026) and the litellm
compromise (March 2026), codify the dependency pinning policy that was
established in PRs NousResearch#2810 and NousResearch#9801 but never written down for contributors.

Changes:
- pyproject.toml: Add tight upper bounds to the 5 deps that slipped
  through as review escapes from external contributor PRs:
  - hindsight-client>=0.4.22,<0.5 (was >=0.4.22)
  - aiosqlite>=0.20,<0.23 (was >=0.20)
  - asyncpg>=0.29,<0.32 (was >=0.29)
  - alibabacloud-dingtalk>=2.0.0,<3 (was >=2.0.0)
  - youtube-transcript-api>=1.2.0,<2 (was >=1.2.0)

  Pre-1.0 packages get <0.(current_minor+2) — tight enough to block
  hostile minor releases but loose enough to not require bumps every week.

- CONTRIBUTING.md: Add 'Dependency pinning policy' section under Security
  with the full rationale, table of source types + treatments, and examples.

- AGENTS.md: Add concise 'Dependency Pinning Policy' section for AI coding
  agents with the decision table and step-by-step checklist.

- supply-chain-audit.yml: Add dep-bounds job that fails PRs introducing
  PyPI deps without <ceiling upper bounds. Fires on pyproject.toml changes.
  Posts a PR comment with the specific unbounded specs found.

Refs: NousResearch#2796 NousResearch#2810 NousResearch#9801 NousResearch#24205
…hable (NousResearch#26222)

When the in-tree FAL path has no API key (and no managed gateway), the
handler used to return a bare 'FAL_KEY environment variable not set'
error. Users had no idea where to get a key, that a managed Nous
gateway exists, or that plugin-registered providers are an option.

Now `image_generate_tool` returns a structured multi-line message:
  - signup link (https://fal.ai)
  - managed-gateway status (if Nous tools are enabled)
  - pointer to `hermes tools` / `hermes plugins list` for alternate
    backends, so users on a stale `image_gen.provider` know where to look

The schema is untouched — `check_fn` still gates the tool out of the
schema when no backend is reachable at startup, consistent with every
other conditional tool. This patch fixes the call-time failure modes:
managed-gateway 5xx, plugin provider disappearing mid-session, etc.

Inspired by NousResearch#2546 / @Mibayy. The PR was ~5700 commits stale against
the new plugin-aware image_gen architecture, so this is a forward port
of the actionable-error idea rather than a cherry-pick.


Closes NousResearch#2543

Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
…ch#26229)

Adds three pre-run gate recipes to the cron docs:
- file-change gate (stat + mtime + state file)
- external-flag gate (file presence)
- SQL-count gate (user's own database, not state.db)

These are the use cases @iankar8 proposed adding as a parallel
'trigger' subsystem in NousResearch#2654. The existing `script` + `wakeAgent`
gate already covers all three at $0 — this lands the patterns as
documentation so users can find them, instead of adding a second
gating mechanism to the cron subsystem.
Cron mutation operations (run/pause/resume/remove) and 'hermes cron edit'
now accept a job name in addition to the hex ID, with case-insensitive
matching. Before this, 'hermes cron run my_job_name' died with
'Job with ID my_job_name not found' and forced the user to look up the
hex ID first.

The original PR matched by name but silently picked the first match when
two jobs shared a name. This version refuses to act on an ambiguous name
and surfaces every matching job (id, name, schedule, next_run_at) so the
caller can pick a specific ID.

- cron/jobs.py:
  - get_job() stays ID-only (preserves existing call-site semantics for
    web_server/api_server/curator/scheduler/test code that always passes
    real IDs).
  - resolve_job_ref() is the new name-or-ID resolver, used by pause/
    resume/trigger/remove_job. Exact ID match wins over a name match
    even if a different job's name happens to equal that ID. Ambiguous
    name match raises AmbiguousJobReference with all candidate IDs.
- tools/cronjob_tools.py: dispatch site uses resolve_job_ref, surfaces
  ambiguous matches as a structured error with the matching IDs.
- hermes_cli/cron.py: 'cron edit' uses resolve_job_ref so editing by
  name works and ambiguous names are reported with IDs.
- tests/cron/test_jobs.py: new TestResolveJobRef covering ID match,
  case-insensitive name match, ID-wins-over-name, ambiguous refusal,
  and that pause/resume/trigger/remove all refuse on ambiguity.

Closes NousResearch#2627
…gistry installs

The Zed ACP Registry path (uvx --from 'hermes-agent[acp]==X' hermes-acp)
gets a Python-only install. Browser tools depend on the agent-browser npm
package + Chromium, neither of which are in the wheel. Without an
explicit bootstrap, registry users have no path to working browser tools.

Ship a bundled, idempotent bootstrap script (Linux/macOS bash + Windows
PowerShell) inside acp_adapter/bootstrap/ as wheel package-data. New
entry points:

  hermes acp --setup-browser        # interactive; prompts before Chromium download
  hermes acp --setup-browser --yes  # non-interactive
  hermes-acp --setup-browser

The terminal-auth flow (hermes acp --setup) also offers the browser
bootstrap as a follow-up after model selection, so first-run registry
users get the option without knowing the flag exists.

Key design choices:
- npm install -g --prefix $NODE_PREFIX so we never need sudo. System Node
  on PATH is respected; only the install target is redirected to the
  user-writable Hermes-managed Node prefix.
- tools/browser_tool.py::_browser_candidate_path_dirs() already walks
  $HERMES_HOME/node/bin, so installed binaries are discovered with no
  agent-side code change.
- System Chrome/Chromium detection short-circuits the ~400 MB Playwright
  download when a suitable browser already exists.
- Bash + PowerShell live as ONE copy each under acp_adapter/bootstrap/.
  Not duplicated under scripts/. install.sh and install.ps1 keep their
  inline browser blocks for the source-checkout path.

E2E validated end-to-end:
  bash bootstrap_browser_tools.sh --skip-chromium
    → installs agent-browser into ~/.hermes/node/bin/
  tools.browser_tool._find_agent_browser()
    → returns the installed path
  check_browser_requirements()
    → returns True (browser tools register)

Tests:
- tests/acp/test_entry.py: 11 tests covering --setup-browser dispatch
  (linux + windows + --yes forwarding + failure propagation), the
  terminal-auth follow-up prompt path, and a package-data wheel-shipping
  assertion that catches any future pyproject.toml regression.

Docs: website/docs/user-guide/features/acp.md gains a 'Browser tools
(optional)' subsection with the two-line install + what-it-does.
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger
with no persistent user IDs — every contact is identified by an opaque
internal ID generated at connection time. This adds it as a Hermes
gateway platform via the plugin system.

The adapter connects to a local simplex-chat daemon via WebSocket,
listens for inbound messages, and sends replies. Originally proposed in
PR NousResearch#2558 as a core-modifying integration; reshaped here as a self-
contained plugin under plugins/platforms/simplex/ with no edits to any
core file. Discovery is filesystem-based (scanned by gateway.config),
and the platform identity is resolved on demand via Platform("simplex").

Plugin contract:
- check_requirements() requires SIMPLEX_WS_URL AND the websockets package
- validate_config() / is_connected() accept env or config.yaml input
- _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel)
- _standalone_send() supports out-of-process cron delivery
- interactive_setup() provides a stdin wizard for hermes gateway setup
- register() wires the adapter into the registry with required_env,
  install_hint, cron_deliver_env_var, allowed_users_env, and a
  platform_hint for the LLM.

Lazy dependency: the websockets Python package is imported inside the
functions that need it. The plugin is importable and discoverable even
when websockets is missing — check_requirements() simply returns False
until `pip install websockets` is run. No new pyproject extras are
introduced.

Environment variables:
  SIMPLEX_WS_URL             WebSocket URL of the daemon (required)
  SIMPLEX_ALLOWED_USERS      Comma-separated allowed contact IDs
  SIMPLEX_ALLOW_ALL_USERS    Set true to allow all contacts
  SIMPLEX_HOME_CHANNEL       Default contact for cron delivery
  SIMPLEX_HOME_CHANNEL_NAME  Human label for the home channel

Closes NousResearch#2557.
- Adds plugins/platforms/simplex docs page to the messaging sidebar
  between LINE and Open WebUI.
- Maps louismichalot@hotmail.com -> Mibayy in scripts/release.py so the
  attribution check on the salvage PR passes.
When running with --yolo, all dangerous command approvals are bypassed.
Make this state visible so users don't forget:

- Banner: '⚠ YOLO mode — all approval prompts bypassed' line in red, only
  shown when YOLO is active. Default case is silent (no extra line, no
  always-on 'restricted' label).
- Status bar: '⚠ YOLO' fragment appended in red (#FF4444 bold) across all
  three width tiers (<52, <76, ≥76) in both the plain-text fallback and
  the fragments builder.

Closes NousResearch#2663

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
Replace O(n²) string concatenation of truncated_response_prefix in the
length-continuation retry loop with a list + ''.join(). Functionally
equivalent: same partial response on early return, same prepend on
final assembly. The legacy retry path is capped at 3 iterations, so
the practical wall-clock win is small, but the new idiom matches the
rest of the codebase and removes a needless repeated allocation.

Salvaged from PR NousResearch#2717 (the run_conversation portion only — trajectory
refactor dropped because it silently rewrote </tool_response> to </think>).

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Some catalog endpoints (OpenCode Zen, etc.) sit behind a WAF that
returns 403 for the default Python-urllib/<ver> User-Agent.  The
generic profile-based live fetch in providers/base.py was silently
failing for any such provider — falling through to the static catalog
and missing newly-launched models.

Set a generic 'hermes-cli/<version>' UA on the catalog probe so every
api_key provider profile benefits.  Verified live against opencode-zen:
before this change, profile.fetch_models() raised HTTP 403; after, it
returns 42 models including gpt-5.5, gpt-5.5-pro, kimi-k2.6, glm-5.1
and the *-free variants the static catalog doesn't list.

Also strip the now-stale comment in validate_requested_model() claiming
opencode-zen's /models returns 404 against the HTML marketing site —
the API endpoint at /zen/v1/models returns 200 with valid JSON.

Surfaced by NousResearch#2651 (@aashizpoudel) — fixes the same user-facing gap
their PR targeted, applied at the right layer so all api_key provider
profiles get live catalogs through the same code path.

Co-authored-by: Aashish Poudel <mr.aashiz@gmail.com>
Remove redundant inner `import re` and regex recompilation on every call in
_interpolate_env_vars. Add module-level _ENV_VAR_PATTERN compiled once.

Replace the separate _interpolate_value() in mcp_config.py (which used \w+
and would silently fail on env vars containing hyphens or dots) with the
shared _ENV_VAR_PATTERN from mcp_tool.py. Remove now-unused import re.
Replaces bare `except Exception: pass` with debug-level logging
so failures in local endpoint model discovery are diagnosable
instead of silently hidden.
Three asyncio.gather() calls in tools/web_tools.py ran without
return_exceptions=True. A single failing task (e.g. LLM rate limit on
one URL) would raise out of gather() and discard every other
successfully fetched/summarized result.

Pass return_exceptions=True and filter BaseException entries with a
warning log before unpacking. Affects:

- chunk summarization gather (large web_extract pages)
- firecrawl per-result LLM post-processing
- tavily crawl per-result LLM post-processing

Closes NousResearch#2744
PR NousResearch#2751 salvage. CI requires AUTHOR_MAP coverage for all
contributor commit emails.
When a user sends a Slack message like '/hermes   ' (trailing whitespace
after the slash) the legacy subcommand router hit `text.split()[0]` with
a truthy-but-whitespace-only `text`. `'   '.split()` returns `[]` →
IndexError, blowing up the slash handler before fallthrough to `/help`.

Switch to a two-step guard that materializes the parts list first and
indexes only if non-empty.

Salvaged from PR NousResearch#2752 by @nidhi-singh02. The PR's other two hunks
(`tools/file_operations.py`, `agent/anthropic_adapter.py`) are
unreachable in current code — `LINTERS` is a hardcoded constant dict
with no empty values, and the anthropic version-detection site is
already guarded by a `result.stdout.strip()` truthy check — so only the
slack hunk is taken.

Closes NousResearch#2745

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Wrap requests.post() in create_session() for browser_use, browserbase,
and firecrawl providers with requests.RequestException handling.
Connection timeouts and DNS resolution failures now surface as clean
RuntimeError messages instead of raw requests exception tracebacks.

Browser Use managed-gateway mode preserves raw exception propagation
so the existing idempotency-key retry semantics keep working.

Closes NousResearch#2746

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
…_HOME into config.toml

Builds on @steezkelly's Bug A fix (NousResearch#25857, top-level default_permissions
via _insert_managed_block_at_top_level) by addressing the other two
config-corruption bugs described in NousResearch#26250:

Bug B (duplicate [plugins.X] tables)
  - Codex itself writes [plugins."<name>@<marketplace>"] tables to
    config.toml when the user runs `codex plugins enable` directly,
    before hermes-agent's managed block exists. On the next migrate run,
    _query_codex_plugins() re-discovers the same plugins via plugin/list
    and render_codex_toml_section() re-emits them inside the managed
    block. Codex's strict TOML parser then rejects the duplicate table
    header on startup.
  - Add _strip_unmanaged_plugin_tables() that drops [plugins.*] tables
    from the user-content portion of the file. Only run it when
    plugin/list succeeded — if the RPC failed we can't re-emit and
    must preserve the user's tables. plugin/list is the source of
    truth when it answers.

Bug C (HERMES_HOME pytest-tempdir leak into ~/.codex/config.toml)
  - _build_hermes_tools_mcp_entry() read HERMES_HOME directly from
    os.environ, so a sibling pytest's monkeypatch.setenv("HERMES_HOME",
    tmp_path) silently burned a transient pytest tempdir into the
    user's real ~/.codex/config.toml. After pytest reaped the tempdir,
    every codex-routed hermes-tools tool call failed silently.
  - Derive HERMES_HOME from get_hermes_home() (the canonical resolver
    that goes through the profile-aware path) and refuse to emit
    obvious test-tempdir paths via _looks_like_test_tempdir() as
    belt-and-suspenders for any other callsite that forgets to patch
    migrate().
  - test_enable_succeeds_when_codex_present in test_codex_runtime_switch.py
    invoked the real migrate() (no mock), writing to Path.home() / .codex
    using whatever HERMES_HOME the running pytest session had set. Add
    the same migrate patch the other apply() tests already use, so the
    suite stops touching the user's real ~/.codex/config.toml.

E2E verification (replicating the issue's repro):
  - Pre-state config.toml with user [mcp_servers.omx_team_run] +
    codex-installed [plugins."tasks@openai-curated"],
    HERMES_HOME="/private/var/folders/.../pytest-of-.../..."
  - On origin/main: tomllib refuses to load the result with
    "Cannot declare ('plugins', 'tasks@openai-curated') twice" AND
    the pytest-tempdir HERMES_HOME is burned in.
  - On this branch: file parses cleanly, default_permissions is
    top-level, exactly one [plugins."tasks@openai-curated"] table
    inside the managed block, no HERMES_HOME in the MCP env.

7 new regression tests covering all three bugs + the test-leak guard.
`bash scripts/run_tests.sh tests/hermes_cli/test_codex_runtime_*.py` —
95 passed, 0 failed.

Closes NousResearch#26250
…2345 salvage (NousResearch#26319)

PR NousResearch#22345 by @btorresgil authors commits as 'Brian Conklin
<brian@dralth.com>' (git config carries a different name/email than the
GitHub account). GitHub's commit-author mapping correctly attributes these
commits to @btorresgil based on the public-key registration, but Hermes'
release attribution audit reads the raw commit email, not the GitHub
mapping. Without this AUTHOR_MAP entry, salvaging NousResearch#22345 would fail
`scripts/contributor_audit.py` strict mode at release time.

Prerequisite for the langfuse trace fix salvage that cherry-picks
@btorresgil's commits onto current main.
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320)

* fix(langfuse): reject placeholder credentials with one-shot warning

When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY
at a template value like 'placeholder', 'test-key', or 'your-langfuse-key',
the Langfuse SDK silently accepts the credentials at construction time and
drops every trace at flush time. No warning, no error — just an empty
Langfuse dashboard the operator only notices hours later.

Add prefix-based validation in _get_langfuse() against the documented
'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side.
Anything else fires a single warning naming the offending env var(s)
with a log-safe value preview (full string for short placeholders so the
operator knows which template they left in place; truncated for long
values so a real secret pasted into the wrong field never hits the log),
then short-circuits via the existing _INIT_FAILED cache so the warning
fires once per process, not once per hook invocation.

The check sits after the 'Langfuse is None' SDK-installed guard so hosts
without the optional langfuse SDK don't see misleading 'set real keys'
hints when the actionable fix is 'pip install langfuse'. Missing
credentials remains the documented opt-out path and stays silent — no
log noise for unconfigured installs.

Fixes NousResearch#22763
Fixes NousResearch#23823

* fix(langfuse): use actual API request messages for generation input

on_pre_llm_request previously used the messages kwarg alone, which
could be None when Hermes passes the payload via request_messages,
conversation_history, or user_message instead. Add _coerce_request_messages
to pick the first available list across all variants, falling back to a
synthetic user message. Generations now show the real outbound payload
rather than an empty input.

* fix(langfuse): record tool call outputs in traces

Tool observations showed input (arguments) but output was always
undefined. Root cause: when tool_call_id is empty, pre_tool_call stored
observations under a unique time-based key that post_tool_call could
never reconstruct, so every tool span was closed without output by the
_finish_trace sweep.

Fix pre/post matching by routing empty-tool_call_id tools through a
per-name FIFO queue (pending_tools_by_name) instead of the time-based
key. Tools with a tool_call_id continue to use the id-keyed dict.

Also:
 - Preserve OpenAI-style nested function shape in serialized tool calls
   so Langfuse renders name/arguments correctly
 - Keep name + tool_call_id on role:tool messages for proper pairing
 - Backfill tool results onto the matching turn_tool_calls entry so the
   generation's tool-call record carries the result alongside arguments
 - Coerce request messages from whichever field the runtime provides
   (request_messages, messages, conversation_history, user_message)

* fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test

Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues
worth fixing in the same PR rather than as follow-ups:

1. Drop is_first_turn from the pre_api_request hook. The boolean expression
   `not bool(conversation_history)` was wrong: conversation_history is
   reassigned to None mid-run after compression (5 sites in run_agent.py),
   so the value flips False -> True mid-conversation on every post-compression
   API call. The langfuse plugin never consumed it, so the kwarg was both
   misleading AND dead.

2. Replace copy.deepcopy(request_messages) with shallow list() copy. The
   pre_api_request hook contract discards return values (invoke_hook never
   writes back to api_kwargs), and the langfuse plugin's _serialize_messages
   already builds its own snapshot dicts via _safe_value. A deepcopy on every
   API call would walk every tool result and base64 image — significant
   overhead for no real isolation benefit. Shallow copy of the outer list
   protects against later mutations of api_messages without paying for the
   inner-dict walk.

3. Rename test_empty_tool_call_id_concurrent_fifo_order ->
   test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a
   real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8
   threads behind a barrier to actually exercise _STATE_LOCK on the
   pending_tools_by_name queue. The original test was sequential and only
   validated Python list semantics; this one validates the lock discipline.

4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED —
   that function does not exist. Tests reload the module via sys.modules.pop
   + importlib.import_module instead.

Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass.

---------

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: Brian Conklin <brian@dralth.com>
@github-actions

Copy link
Copy Markdown

🚨 CRITICAL Supply Chain Risk Detected

This PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging.

🚨 CRITICAL: Install-hook file added or modified

These files can execute code during package installation or interpreter startup.

Files:

hermes_cli/setup.py

Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting.

@github-actions

Copy link
Copy Markdown

🔎 Lint report: sync/upstream-20260515 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8274 on HEAD, 8328 on base (✅ -54)

🆕 New issues (43):

Rule Count
unresolved-attribute 20
unresolved-import 14
invalid-argument-type 5
unused-type-ignore-comment 2
invalid-assignment 1
unsupported-operator 1
First entries
tests/hermes_cli/test_proxy.py:254: [unresolved-import] unresolved-import: Cannot resolve imported module `aiohttp`
tests/agent/transports/test_codex_app_server_session.py:904: [invalid-argument-type] invalid-argument-type: Argument to function `_has_turn_aborted_marker` is incorrect: Expected `str`, found `None`
tests/hermes_cli/test_bedrock_model_picker.py:39: [unresolved-attribute] unresolved-attribute: Unresolved attribute `session` on type `ModuleType`
hermes_cli/proxy/server.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `aiohttp`
hermes_cli/proxy/server.py:231: [unresolved-attribute] unresolved-attribute: Attribute `TCPSite` is not defined on `None` in union `Unknown | None`
tests/agent/test_bedrock_adapter.py:27: [unresolved-attribute] unresolved-attribute: Unresolved attribute `get_session` on type `ModuleType`
hermes_cli/proxy/server.py:96: [unresolved-attribute] unresolved-attribute: Attribute `AppKey` is not defined on `None` in union `Unknown | None`
tests/hermes_cli/test_tools_config.py:720: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str` in union `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]] | Unknown`
hermes_cli/proxy/server.py:160: [unresolved-attribute] unresolved-attribute: Attribute `ClientTimeout` is not defined on `None` in union `Unknown | None`
hermes_cli/proxy/server.py:185: [unresolved-attribute] unresolved-attribute: Attribute `StreamResponse` is not defined on `None` in union `Unknown | None`
tests/hermes_cli/test_tools_config.py:534: [invalid-argument-type] invalid-argument-type: Argument to function `_configure_provider` is incorrect: Expected `dict[Unknown, Unknown]`, found `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]] | Unknown`
hermes_cli/proxy/server.py:229: [unresolved-attribute] unresolved-attribute: Attribute `AppRunner` is not defined on `None` in union `Unknown | None`
tests/agent/transports/test_codex_app_server_session.py:965: [invalid-argument-type] invalid-argument-type: Argument to function `_classify_oauth_failure` is incorrect: Expected `str`, found `None`
gateway/platforms/discord.py:3730: [unresolved-attribute] unresolved-attribute: Attribute `Forbidden` is not defined on `None` in union `Unknown | None`
acp_adapter/auth.py:40: [unresolved-import] unresolved-import: Cannot resolve imported module `acp.schema`
hermes_cli/proxy/server.py:195: [unresolved-attribute] unresolved-attribute: Attribute `ClientError` is not defined on `None` in union `Unknown | None`
cli.py:1488: [invalid-assignment] invalid-assignment: Object of type `def _wrapped_get_color(self, key, fallback="") -> Unknown` is not assignable to attribute `get_color` of type `def get_color(self, key: str, fallback: str = "") -> str`
gateway/platforms/yuanbao.py:2621: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `MessageType`, found `Literal[MessageType.DOCUMENT] | Any | None`
tests/agent/test_bedrock_adapter.py:28: [unresolved-attribute] unresolved-attribute: Unresolved attribute `session` on type `ModuleType`
tests/gateway/test_whatsapp_group_gating.py:373: [invalid-argument-type] invalid-argument-type: Argument to function `WhatsAppAdapter._is_broadcast_chat` is incorrect: Expected `str`, found `None`
tests/test_install_sh_symlink_stomp.py:23: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
gateway/platforms/discord.py:5508: [unused-type-ignore-comment] unused-type-ignore-comment: Unused blanket `type: ignore` directive
plugins/platforms/simplex/adapter.py:632: [unresolved-import] unresolved-import: Cannot resolve imported module `websockets`
tests/hermes_cli/test_proxy.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_discord_clarify_buttons.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
... and 18 more

✅ Fixed issues (110):

Rule Count
unresolved-import 63
invalid-argument-type 13
unresolved-attribute 13
invalid-assignment 13
invalid-parameter-default 3
not-subscriptable 2
unsupported-operator 1
unresolved-reference 1
invalid-return-type 1
First entries
tools/rl_training_tool.py:1129: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_token_length"]` on object of type `list[dict[str, str | int | float]]`
tests/tools/test_tts_kittentts.py:6: [unresolved-import] unresolved-import: Cannot resolve imported module `numpy`
tests/tools/test_rl_training_tool.py:49: [unresolved-attribute] unresolved-attribute: Object of type `RunState` has no attribute `api_log_file`
environments/benchmarks/yc_bench/yc_bench_env.py:60: [unresolved-import] unresolved-import: Cannot resolve imported module `pydantic`
tests/hermes_cli/test_tools_config.py:715: [not-subscriptable] not-subscriptable: Cannot subscript object of type `int` with no `__getitem__` method
environments/tool_call_parsers/longcat_parser.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
environments/tool_call_parsers/kimi_k2_parser.py:18: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
tools/rl_training_tool.py:231: [unresolved-attribute] unresolved-attribute: Attribute `exec_module` is not defined on `None` in union `Loader | None`
tools/rl_training_tool.py:253: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
environments/hermes_base_env.py:343: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.openai_server`
tools/rl_training_tool.py:401: [unresolved-attribute] unresolved-attribute: Unresolved attribute `env_log_file` on type `RunState`
environments/benchmarks/tblite/tblite_env.py:33: [unresolved-import] unresolved-import: Cannot resolve imported module `pydantic`
tools/rl_training_tool.py:231: [unresolved-attribute] unresolved-attribute: Attribute `loader` is not defined on `None` in union `ModuleSpec | None`
tools/rl_training_tool.py:693: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[dict[str, str | int | float]]`, `bool` in union `dict[str, str | int | float] | list[dict[str, str | int | float]] | bool`
tests/run_agent/test_agent_loop_vllm.py:98: [unresolved-import] unresolved-import: Cannot resolve imported module `transformers`
tests/tools/test_managed_server_tool_support.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib`
tools/rl_training_tool.py:963: [unresolved-import] unresolved-import: Cannot resolve imported module `wandb`
tests/tools/test_rl_training_tool.py:57: [unresolved-attribute] unresolved-attribute: Unresolved attribute `api_log_file` on type `RunState`
rl_cli.py:27: [unresolved-import] unresolved-import: Cannot resolve imported module `fire`
tools/rl_training_tool.py:108: [unresolved-attribute] unresolved-attribute: Attribute `keys` is not defined on `list[dict[str, str | int | float]]`, `bool` in union `dict[str, str | int | float] | list[dict[str, str | int | float]] | bool`
environments/tool_call_parsers/glm45_parser.py:21: [unresolved-import] unresolved-import: Cannot resolve imported module `openai.types.chat.chat_completion_message_tool_call`
tools/rl_training_tool.py:973: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["history"]` and value of type `list[dict[Unknown, Unknown]]` on object of type `dict[str, str]`
tools/rl_training_tool.py:229: [invalid-argument-type] invalid-argument-type: Argument to function `module_from_spec` is incorrect: Expected `ModuleSpec`, found `ModuleSpec | None`
environments/benchmarks/terminalbench_2/terminalbench2_env.py:802: [unresolved-import] unresolved-import: Cannot resolve imported module `tqdm`
tools/rl_training_tool.py:778: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["wandb_name"]` and value of type `Any & ~AlwaysFalsy` on object of type `list[dict[str, str | int | float]]`
... and 85 more

Unchanged: 4279 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@bot-ted bot-ted merged commit a4ac4b5 into main May 15, 2026
15 of 19 checks passed
@bot-ted bot-ted deleted the sync/upstream-20260515 branch May 15, 2026 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.