fix(gemma4): recover raw tool calls when parser import fails by raritytiks · Pull Request #1 · 0xarkstar/hermes-agent

raritytiks · 2026-04-17T11:55:58Z

Summary

Adds the missing runtime fix for Gemma 4 tool-call recovery on top of NousResearch#7449.

NousResearch#7449 adds the gemma4 parser itself, but there was still a real CLI/runtime failure mode where Gemma tool-call markup reached assistant_message.content and Hermes did not recover it into structured tool_calls.

Problem

Gemma 4 can emit raw tool markup like <|tool_call>call:func_name(arg1: "value1", arg2: "value2")<tool_call|>.

Hermes is supposed to recover that through environments.tool_call_parsers, but in the failing runtime path:

raw Gemma tool markup was present in assistant_message.content
importing environments.tool_call_parsers could raise ModuleNotFoundError
that fallback failure was swallowed
no structured tool_calls were recovered
Hermes treated the markup as normal assistant text and streamed it to the user

This is why the parser work in NousResearch#7449 was necessary but not sufficient to fix the full Gemma 4 tool-calling issue in practice.

Solution

Adds a runtime-safe fallback in run_agent.py so Gemma tool calls are still recovered even when the parser module import fails.

Changes:

Try the normal parser registry first
Force-try Gemma parsing when Gemma raw markup is detected
If environments.tool_call_parsers cannot be imported, fall back to inline Gemma parsing
Suppress raw Gemma tool markup from streaming callbacks so it does not leak into user-visible output
Normalize recovered streamed tool-call turns to finish_reason="tool_calls"

This keeps the parser work from NousResearch#7449 intact and adds only the missing runtime recovery path on top.

Code Changes

run_agent.py
- added raw-tool-markup detection for Gemma responses
- added a parser helper that tries the normal registry first, then falls back to inline Gemma parsing on import failure
- updated the streaming path to suppress raw Gemma tool markup from streamed user-visible output
- updated streamed response recovery so recovered tool calls are surfaced as tool_calls with finish_reason="tool_calls"
- updated the non-streaming assistant-message path to recover raw Gemma tool calls from assistant_message.content before normal tool handling
tests/run_agent/test_streaming.py
- added regressions covering post-stream recovery of raw Gemma tool markup
- added regressions ensuring raw Gemma tool markup is not leaked through stream callbacks
- added regressions for the ModuleNotFoundError import-failure path using the inline fallback parser

Testing

Focused regression tests passed:

uv run --extra dev python -m pytest tests/run_agent/test_streaming.py tests/tools/test_tool_call_parsers.py -q

Result:

65 passed, 32 warnings

Regression coverage added for:

Raw Gemma tool markup parsed after streaming completes
Raw Gemma tool markup suppressed from the stream callback
Inline Gemma fallback still working when environments.tool_call_parsers is not importable

Context

This is intentionally a small follow-up PR on top of NousResearch#7449 rather than a rewrite of the parser work.

The parser implementation in NousResearch#7449 is still the correct fix for Gemma 4 format support. This PR adds the missing safeguard for the real-world case where parser-module imports are unavailable in the active CLI/runtime context.

Follow-up to NousResearch#7449
Related to NousResearch#6626

raritytiks · 2026-04-17T12:53:22Z

This is better than nothing - since tool calls actually work with this instead of failing entirely and showing raw tool call tokens - but I'm not sure if gemma-4-e4b-it is just dumb, or if there's still something wrong here. Sometimes it still fails executing tools properly.

raritytiks · 2026-04-17T19:50:13Z

too many issues with this

…r button + reaction tools Addresses architect's commit-9 next-pass note #1: the fetch_channel fallback in both `_send_button_message` and `_resolve_message` masked `discord.Forbidden` (bot lacks VIEW_CHANNEL permission) under a generic "channel not found or bot cannot access it" message. The fetch_message block in `_resolve_message` already differentiated Forbidden/NotFound via isinstance; this commit extends the same pattern to fetch_channel in both tools so users see specific errors: * `Forbidden` → "Bot lacks VIEW_CHANNEL permission for channel <id>." * `NotFound` → "Channel <id> does not exist or bot is not in its guild." * Otherwise → existing generic "not found or bot cannot access" message (preserves backward compat for callers that grep on "not found"). The fall-through path uses defensive `try: import discord` / `except ImportError: pass` so the module remains importable when discord.py is absent. Tests: * `test_add_reaction_channel_forbidden` (reaction tool) — fetch_channel raises discord.Forbidden, asserts "permission" or "VIEW_CHANNEL" substring in the error. * `test_channel_forbidden_returns_perm_error` (button tool) — same shape for the button outbound tool. * Existing `test_channel_not_found_returns_error` tests continue asserting the generic fallback (raise generic Exception). Mock helpers tightened: both test files now use stricter type checks (`isinstance(getattr(mod, "ui", None), SimpleNamespace) and isinstance(<that>.View, type)`) instead of `hasattr`, because `MagicMock()` returns truthy auto-attrs for any name. The previous `hasattr` checks would skip ui/ButtonStyle setup if a bare-MagicMock discord module had been installed by another test, leaving the `SkillButtonView` subclass with a MagicMock parent that produced empty `children`. Now both files agree on the same idempotent helper shape. Test totals (Track 1 + reaction tool combined): * 86 passed (parallel + serial), no xdist races * +2 new Forbidden-path tests * No regressions in adjacent suites (gateway, agent, skill_resolver, discord_interactions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@dlkakbs

Fifth and final slice polish on top of @dlkakbs's docs + skill. Three things ship here: 1. Subscription renewal cron recipe (the #1 operational footgun). Microsoft Graph webhook subscriptions expire at 72 hours max and don't auto-renew. The shipped operator runbook mentioned `maintain-subscriptions --dry-run` as a "daily or periodic check" but never told operators how to actually automate it. Without a scheduled job, any production deployment silently stops ingesting meetings three days after go-live. Adds an "Automating subscription renewal (REQUIRED for production)" section to website/docs/guides/operate-teams-meeting-pipeline.md with three concrete options and copy-pasteable configs: - Option 1: Hermes cron (`hermes cron add --schedule "0 */12 * * *" --script-only --command "hermes teams-pipeline maintain-subscriptions"`) - Option 2: systemd service + timer (12h cadence, Persistent=true so missed runs catch up after reboots) - Option 3: plain crontab with a wrapper that sources .env for credentials Go-Live Checklist gains a bolded mandatory item for the schedule being in place, with a cross-link to the section. website/docs/user-guide/messaging/teams-meetings.md adds a `:::warning:::` admonition right after the manual `subscribe` examples so anyone who creates a subscription manually is told the same day that it will silently expire in 72 hours. 2. Sidebar wiring. Shela's new docs pages (teams-meetings.md and operate-teams-meeting-pipeline.md) weren't in website/sidebars.ts, so they were orphaned URLs — reachable only if someone knew the path. Wired teams-meetings into Messaging Platforms next to the existing teams entry, and operate-teams-meeting-pipeline into Guides & Tutorials next to microsoft-graph-app-registration from PR NousResearch#21922. Adjacent placement keeps the related pages discoverable from each other. 3. SKILL.md rewrite (v1.0.0 → v1.1.0). The original skill had five Turkish-only trigger phrases, which works in a Turkish-speaking session but doesn't match English triggers. Rewrote the skill to: - Describe triggers by intent instead of exact phrases, with explicit "works in any language" framing and example phrases in both English and Turkish. - Add a Decision Tree section covering the three most common user asks (missing summary, setup verification, re-run request) and the specific CLI command sequence for each. - Add a dedicated "Critical pitfall: Graph subscriptions expire in 72 hours" section that tells the agent exactly what to do when a user reports "worked yesterday, nothing today" — the most common operational failure mode. - Expand the command reference into three labeled groups (Status and inspection / Re-running and debugging / Subscription management) so the agent can reach for the right command without scanning. - Add cross-links to all four related docs pages (Azure app registration, webhook listener setup, full pipeline setup, operator runbook). Validation: - npm run build: all new pages route, anchor to #automating-subscription-renewal-required-for-production resolves from both the runbook TOC and the teams-meetings.md admonition. - scripts/run_tests.sh on the relevant test suites (607 tests): all pass.

…olve on first launch Three interrelated bugs from teknium1's first interactive chat on Windows: 1. **Snapshot/cwd file paths unquoted in bash command strings.** The session bootstrap and per-command wrapper interpolated ``self._snapshot_path`` / ``self._cwd_file`` unquoted into bash commands like ``export -p > C:/Users/ryanc/.../hermes-snap-xxx.sh``. Git Bash's MSYS2 layer handles ``C:/...`` paths correctly ONLY when quoted; unquoted, the colon and forward-slash get glob-parsed and the redirect targets a bogus path. Symptom: every terminal command emitted two ``C:/Users/.../hermes-snap-*.sh (No such file or directory)`` lines that bled into stdout (``stderr=STDOUT`` on the local backend) and corrupted file contents when the agent wrote to scratch paths via the terminal tool. Fix: ``shlex.quote()`` every interpolation of ``_snapshot_path`` and ``_cwd_file`` in base.py — no-op on POSIX (the paths contain no shell-metachars), critical on Windows. 2. **Stale PATH on first hermes launch after install.** ``install.ps1`` adds the PortableGit ``cmd`` / ``bin`` / ``usr\bin`` directories to the Windows **User** PATH via ``SetEnvironmentVariable(..., "User")``. That write propagates to newly *spawned* processes only — already-running shells (including the one the user types ``hermes`` into immediately after install) retain their old PATH. So hermes starts with a PATH that doesn't include bash, rg, grep, ssh — and ``search_files`` reports "rg/find not available" when the user clearly just installed them. Fix: new ``_augment_path_with_known_tools()`` helper called from ``configure_windows_stdio()`` on startup. Prepends the Hermes-managed Git directories + the WinGet Links directory (where ripgrep lands) to ``os.environ['PATH']`` if they exist on disk but aren't already in PATH. Subsequent subprocess calls (including bash spawns via ``_find_bash()``) inherit the augmented PATH and find everything. No-op on POSIX and when the directories don't exist. 3. **Root cause of "file content corruption".** #1 was the proximate cause. Errors like ``C:/Users/.../hermes-snap-xxx.sh: No such file or directory`` were emitted on stderr by the failed redirect, captured into stdout via ``stderr=subprocess.STDOUT``, and if the agent used terminal commands like ``cat > file`` the leaked error bytes became part of the file. Fixing #1 eliminates this entirely. ## Tests All 77 Windows-compat tests still pass on Linux (POSIX path is shlex.quote('/tmp/foo.sh') → '/tmp/foo.sh' — unchanged). ## Not addressed here (would need a bigger design) - Python file tools (``write_file``, ``read_file``) and the bash-backed terminal tool see DIFFERENT views of ``/tmp`` on Windows. Python treats ``/tmp`` as ``C:\tmp`` (drive-relative), Git Bash's MSYS2 treats it as a virtual mount to the PortableGit install's ``tmp\``. Would need a translation shim in the Python tools to resolve bash-virtual paths to their native-Windows equivalents. Workaround for users today: use absolute native paths (``C:\Users\you\...``) instead of ``/tmp/...`` when crossing between terminal and Python file tools.

@vmiss33

…search#22920) Found 18 real Hermes-Agent stories from HN, X, and Reddit not yet captured on the page. All URLs HTTP-verified to return 200 with matching titles. Reddit (15): r/hermesagent (Obsidian-as-memory writeup at 794 upvotes, LLM cheatsheet at 635 upvotes, Kanban game-changer post, OpenRouter #1 ranking, AMA from the Nous team, etc.); r/LocalLLaMA, r/Rag, r/openclaw, r/SideProject, r/LocalLLM threads where users describe their actual setups (Qwen3.5-9b on 16gb VRAM, 5060Ti + Telegram, smart routing tiers). X (3): @vmiss33's 'what I use Hermes for' guide, @HeyYanvi's X-to-NotebookLM podcast workflow, @ExileAI_0's spare-laptop Iris running RenPy + ComfyUI, @brucexu_eth's Hermes Inc. Telegram startup sim from the hackathon, Hype's deep-dive blog. HN (1): 'I'm using Hermes — sandbox it like any agent.' No component changes — all new entries fit the existing schema (real URL, real author, real date).

…3456) * feat(goals): /goal checklist + /subgoal user controls Two-phase judge for /goal — Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong. Mechanics: - GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged). - Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails. - Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo. - Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty. - Status line shows M/N done counts. CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries. Tests: 24 new cases — backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly. * fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge: 1. Off-by-one bug — the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip. 2. Conversation dump never happened — _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback. 3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find — burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier. Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.

…registries Both web_search_registry._resolve() and image_gen_registry.get_active_provider() walked their registered providers and returned the first one matching the capability flag — without checking whether that provider was actually usable. On a fresh install with no credentials at all, this meant get_active_search_provider() returned `brave-free` (legacy preference order) even though BRAVE_SEARCH_API_KEY was unset, leading the dispatcher to surface a "BRAVE_SEARCH_API_KEY is not set" error for a provider the user never chose. Same bug shape in image_gen for FAL. Resolution semantics now match tools.web_tools._get_backend(): 1. Explicit config name wins, ignoring is_available() — the dispatcher surfaces a precise "X_API_KEY is not set" error rather than silently switching backends. Matches user expectation: "I configured X, tell me what's wrong with X." 2. Fallback (no explicit config) walks the legacy preference order filtered by is_available() — pick the highest-priority backend the user actually has credentials for. is_available() is wrapped in a try/except so a buggy provider doesn't brick resolution. E2E verified: - No creds + no config: get_active_search_provider() -> None - Explicit brave-free + no key: get_active_search_provider() -> brave-free (and .is_available() correctly reports False) This fix was identified during the spike (NousResearch#25182 finding #1) and is fold-in to the same PR rather than a follow-up.

…sResearch#26672) The #1 confusing cause of the xAI 403 (per Teknium): X Premium+ subscribers see Grok inside the X app and assume API access is included. It is NOT — only standalone SuperGrok subscribers can use xai-oauth with Hermes today. Without calling this out, every Premium+ user hits the 403 with no idea why. PR NousResearch#26666's neutral 4-cause list was correct but buried the most common cause. Lead with the Premium+ gotcha, then list the other possibilities (no subscription, wrong tier, exhausted quota) as fallbacks. Same neutral framing — does not accuse anyone of being unsubscribed.

Three issues flagged by the Copilot review on this PR: 1. Double JSON emit on stage failure (Copilot #1, NousResearch#2). When -Stage <name> ran a worker that threw, Invoke-Stage's finally emitted a JSON result frame AND the entry-point catch emitted a second error frame -- producing two concatenated JSON objects on stdout and breaking the one-line-per-invocation contract that drivers parse against. Same issue applied to -Json mode on a full install (every stage's finally plus a final error frame missing duration_ms/skipped). Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame when it emits a failure frame; the entry-point catch checks the flag and skips its own emit, still exit 1. 2. $prevEAP uninitialized on early try-block throw (Copilot NousResearch#3). In Install-Uv, Test-Python, Test-Node's winget fallback, _Run-NpmInstall, and the playwright block, '$prevEAP = $ErrorActionPreference' lived as the first statement INSIDE the try. If anything between 'try {' and that line threw (Write-Info on an unusual host, the npx-finding loop, etc.), the catch's 'if ($prevEAP) { ... }' restore was a no-op and EAP could remain relaxed. Fix: hoist '$prevEAP = $ErrorActionPreference' to the line immediately before 'try {' in all five sites. Catch's restore is now always meaningful regardless of where in the try the throw originated. No change to Invoke-Stage's success path or to the four lint-clean EAP sites (Test-Node was the only winget-related catch). All 19 metadata smoke tests still pass.

…r button + reaction tools Addresses architect's commit-9 next-pass note #1: the fetch_channel fallback in both `_send_button_message` and `_resolve_message` masked `discord.Forbidden` (bot lacks VIEW_CHANNEL permission) under a generic "channel not found or bot cannot access it" message. The fetch_message block in `_resolve_message` already differentiated Forbidden/NotFound via isinstance; this commit extends the same pattern to fetch_channel in both tools so users see specific errors: * `Forbidden` → "Bot lacks VIEW_CHANNEL permission for channel <id>." * `NotFound` → "Channel <id> does not exist or bot is not in its guild." * Otherwise → existing generic "not found or bot cannot access" message (preserves backward compat for callers that grep on "not found"). The fall-through path uses defensive `try: import discord` / `except ImportError: pass` so the module remains importable when discord.py is absent. Tests: * `test_add_reaction_channel_forbidden` (reaction tool) — fetch_channel raises discord.Forbidden, asserts "permission" or "VIEW_CHANNEL" substring in the error. * `test_channel_forbidden_returns_perm_error` (button tool) — same shape for the button outbound tool. * Existing `test_channel_not_found_returns_error` tests continue asserting the generic fallback (raise generic Exception). Mock helpers tightened: both test files now use stricter type checks (`isinstance(getattr(mod, "ui", None), SimpleNamespace) and isinstance(<that>.View, type)`) instead of `hasattr`, because `MagicMock()` returns truthy auto-attrs for any name. The previous `hasattr` checks would skip ui/ButtonStyle setup if a bare-MagicMock discord module had been installed by another test, leaving the `SkillButtonView` subclass with a MagicMock parent that produced empty `children`. Now both files agree on the same idempotent helper shape. Test totals (Track 1 + reaction tool combined): * 86 passed (parallel + serial), no xdist races * +2 new Forbidden-path tests * No regressions in adjacent suites (gateway, agent, skill_resolver, discord_interactions) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four findings from Copilot's review on PR NousResearch#22891, all in the AX elements-array cap added by 22fa1ed: 1. The truncation note ("response truncated to N of M elements") was appended unconditionally — including in the som/vision multimodal path, whose response carries a screenshot rather than an `elements` array. The note described a payload field that wasn't present. Moved the note into the AX-text branch where the array actually appears. 2. `_format_elements(cap.elements)` ran on the full untrimmed list with its own `max_lines=40` cap, so a caller passing `max_elements=10` would see summary lines referencing `NousResearch#11..NousResearch#40` even though the JSON `elements` array only held #1..NousResearch#10. Format on `visible_elements` instead so the summary indices always exist in the response. 3. `_coerce_max_elements` enforced a lower bound but no upper bound, so `max_elements=10_000_000` silently disabled the safeguard and reintroduced the original context-blow-up. Added a hard cap (`_MAX_ALLOWED_MAX_ELEMENTS = 1000`) that clamps oversized values. 4. The schema string said "Default 100" but the property carried no `default` field, and claimed `max_elements` had no effect on som/ vision while the image-missing fallback path can still return an elements array. Added `"default": 100`, `"maximum": 1000`, and clarified the fallback-path wording. Each finding gets a regression test: - test_capture_ax_clamps_oversized_max_elements_to_hard_cap - test_capture_ax_summary_indices_match_returned_elements - test_capture_multimodal_summary_omits_truncation_note - test_schema_max_elements_documents_default_and_upper_bound Verified with `pytest tests/tools/test_computer_use.py` (53 passed, including the 5 new cases). Confirmed each new test fails on the pre-fix code path before applying the production change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…NousResearch#31416) PR NousResearch#31416 (avoid persisting borrowed credential secrets) added sanitize_borrowed_credential_payload, which strips access_token from any auth.json pool entry whose (provider, source) isn't in the _PERSISTABLE_PROVIDER_SOURCES allowlist. (copilot, gh_cli) is borrowed (not in the allowlist), so the test fixture's pre-seeded access_token now gets stripped at load_pool() time, leaving the pool empty. resolve_target('1') then fails with 'No credential #1. Provider: copilot.' Fix: align the test with the new contract. At runtime, copilot tokens are hydrated by resolve_copilot_token() — mock that path so the pool gets an entry the test can remove. The behavior under test (suppression of gh_cli + env variants on remove) is unchanged. CI repro on origin/main HEAD; reproduced locally with stock checkout.

…s reached After key #1 is marked exhausted the retry still called the API with key #1 due to env-var bias in _get_cached_client / resolve_api_key_provider_credentials. Fix: peek the pool and pass the active entry's key as explicit_api_key. Secondary: api_key_hint in mark_exhausted_and_rotate pins the correct entry under concurrent CLI+gateway calls; _is_payment_error matches GoUsageLimitError; extract_api_error_context parses "Resets in Xhr Ymin".

…ookies Mission-control style deploys reverse-proxy the dashboard at a path prefix (e.g. mission-control.tilos.com/hermes/* -> :9119) and inject X-Forwarded-Prefix: /hermes on every request. The SPA mount already honoured this for asset URLs and the bootstrap __HERMES_BASE_PATH__, but the OAuth gate didn't: 1. The gate's Location: header to /login and the 401 envelope's login_url were built bare ("/login?next=..."). Under a /hermes prefix the browser follows that to mission-control.tilos.com/login which the proxy doesn't route to the dashboard. 2. _redirect_uri (the OAuth callback URL handed to the IDP) used request.url_for() which doesn't honour X-Forwarded-Prefix (Starlette/uvicorn only proxy_headers Host + Proto + For). The IDP redirects back to /auth/callback instead of /hermes/auth/ callback → 404 in the user's browser. 3. Cookies were set with Path=/ which leaks them to other apps on the same origin and won't be sent back on requests under the prefix in the first place. Fix threads the normalised prefix through every boundary: * New hermes_cli/dashboard_auth/prefix.py — single source of truth for X-Forwarded-Prefix parsing. web_server._normalise_prefix becomes a re-export so the SPA mount, the gate, and the cookies helper all agree. * middleware._unauth_response builds login_url = f"{prefix}/login". * routes._redirect_uri splices the prefix into the path component of the IDP-bound URL (with full validation of the header). * cookies.{set,clear}_{session,pkce}_cookie now take prefix="". Path attribute switches to /hermes when set; cookie name switches name variant (see below). Every caller passes the request's normalised prefix. Cookie hardening (Teknium's lesser-note #1 in the PR review): adopt the __Host- / __Secure- cookie name prefixes per draft-west-cookie- prefixes. The variant is selected from (use_https, prefix): * Loopback HTTP → bare "hermes_session_at" (both prefixes require Secure, incompatible with HTTP). * HTTPS, direct deploy (Path=/) → "__Host-hermes_session_at". Strongest spec: bound to exact origin, no Domain attribute, Secure required. * HTTPS, behind a proxy prefix (Path=/hermes) → "__Secure-hermes_session_at". __Host- forbids Path != "/"; the explicit Path=/hermes covers same-origin app isolation. Setter and reader BOTH consult the prefix because the cookie *name* changes — a reader that looked up the bare name when the setter wrote __Secure- would never find the value. The reader falls back across all three variants so a request whose shape changed mid-session (e.g. post-deploy from no-prefix to /hermes) still picks up the existing cookie until it expires. Test coverage: - tests/hermes_cli/test_dashboard_auth_prefix.py — new file. 11 tests pinning: • Location: /hermes/login on the gate's HTML redirect • 401 envelope login_url carries the prefix • Malformed X-Forwarded-Prefix is ignored (header-injection defence; the script-tag value is normalised to empty string) • _redirect_uri splices /hermes into the path (the property that prevents the IDP-returns-to-404 failure) • PKCE cookie uses Path=/hermes + __Secure- when proxied • Session cookies use __Host- when direct, __Secure- when proxied, bare on loopback HTTP • End-to-end round trip with hand-managed PKCE cookie carriage (TestClient can't simulate a Path=/hermes cookie automatically) - tests/hermes_cli/test_dashboard_auth_cookies.py — rewritten to pin each (use_https, prefix) shape produces its expected cookie name, plus reader-side coverage that __Host- and __Secure- variants are both recognised. - Existing tests across middleware / 401-reauth / etc. updated to match the new cookie names (substring contains instead of startswith). Mutation-tested: reverting _unauth_response to build the bare "/login" URL trips exactly the two tests that pin the prefix carriage, confirming the suite discriminates the regression.

Two CI flakes surfaced on PR NousResearch#34572 (both in files this PR doesn't touch; pre-existing host-dependent flakes): 1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL), falling back to proc.kill() only on ProcessLookupError/PermissionError/ OSError. When the fake PID happens to exist on a busy host, os.getpgid succeeds, os.killpg fires against an UNRELATED real process group, and proc.kill() is never reached -> flaky AssertionError (and a real risk of SIGKILLing an innocent process group from a unit test). Patch os.getpgid to raise ProcessLookupError so the fallback path runs deterministically and no real killpg is ever issued. 2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls the blocking conn.receive_bytes() with no exception guard. Once the child prints its winsize and exits, the PTY closes; on a missed-marker run the next recv blocks until the 30s pytest-timeout instead of failing fast. Add a try/except break (matching the working sibling tests) and bump the child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first. Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced (os.getpgid(1) succeeds -> old code skips proc.kill).

Seven Copilot inline review comments on NousResearch#37679, four worth landing in a polish pass before merge: 1. _dispose_unused_adapter signature: 'BasePlatformAdapter' -> 'BasePlatformAdapter | None'. The function explicitly handles None and the reconnect watcher calls it with None in the except arm, so the annotation now matches the actual contract. 2. (duplicate of #1 on a different line) — same fix. 3. except Exception in _dispose_unused_adapter — the reviewer asked about asyncio.CancelledError swallowing. On Python 3.8+ (Hermes requires 3.13, see pyproject.toml), CancelledError inherits from BaseException, NOT Exception, so the existing 'except Exception' does NOT swallow task cancellation. Added an explicit comment explaining the contract so future readers don't repeat the analysis. We don't re-raise because the watcher loop intentionally treats dispose failures as best-effort: a failed dispose on an unowned adapter should not take down the watcher that's keeping the gateway alive. 4. _response_store = None after close in api_server.py — the reviewer flagged this for idempotency. Decided to keep the non-None state intentionally: setting it to None cascades to ~9 callers that access self._response_store without a None check, and 'close() is idempotent on a closed sqlite3 Connection' means the current code is already safe. The type stays stable; LSP doesn't flag a cascade of reportOptionalMemberAccess errors. (This matches the pre-existing pattern in the codebase — e.g. _mark_disconnected doesn't reset state to None either.) 5. _build_adapter_with_store: reviewer worried about disconnect() failing on the self.name property if __init__ wasn't called. Already handled: we set 'adapter.platform = Platform.API_SERVER' so the 'self.platform.value.title()' property returns 'Api_Server' without raising. The exception-swallowing branch in disconnect() does call self.name via the logger.debug format, so this is a real path that needs the platform attribute, and we have it. 6. test_disconnect_closes_response_store: bare 'pytest.raises(Exception)' -> 'pytest.raises(sqlite3.ProgrammingError)'. The bare Exception matcher would silently accept AttributeError, OperationalError, env-related issues, etc. The specific exception type ('Cannot operate on a closed database') is the actual signal we want — proves the SQLite conn is closed, not just that *something* raised. 7. test_nonretryable_failure_disposes_unowned_adapter: assertion tightened from '>= 1' to '== 1' on adapter._disconnect_calls. The docstring said 'exactly once', the assertion now matches. Catches the hypothetical 'watcher disposes the same adapter twice' regression that '>=' would have missed.

…ch#37677) Anthropic enforces two independent ceilings per image: 1. 5 MB encoded byte size 2. 8000 px longest side Hermes only guarded #1. A tall screenshot (e.g. 1200x12000 at 0.06 MB) passes every byte check but fails the pixel check, returning a non-retryable HTTP 400 that permanently bricks the conversation thread. Fixes: - error_classifier: add 'image dimensions exceed' pattern to _IMAGE_TOO_LARGE_PATTERNS so the 400 is classified as image_too_large and triggers the shrink/retry path instead of falling through to non-retryable error. - conversation_compression: check pixel dimensions (via Pillow) even when byte size is under the 4 MB target. If max(dims) > 8000, force shrink. - vision_tools._resize_image_for_vision: add optional max_dimension param. When set, images exceeding the pixel cap are downscaled even if they're under the byte budget. The resize loop now checks both byte AND pixel limits before accepting a candidate. Closes NousResearch#37677

…bes + test-leak fix (NousResearch#40909) * fix(gateway,windows): reliability — supervisor task, JOB breakaway, status --deep Three coordinated fixes for the Windows gateway reliability story: 1. CREATE_BREAKAWAY_FROM_JOB on every detached spawn The 'hermes update' triggered from the Electron Desktop GUI ran inside Electron's job object. Without breakaway, the post-update gateway watcher spawned by update — already DETACHED_PROCESS — was still reaped when Electron's job tore down, so the gateway never came back after a GUI-initiated update. Adds CREATE_BREAKAWAY_FROM_JOB (0x01000000) to: - hermes_cli/_subprocess_compat.py::windows_detach_flags() — used by every helper that calls windows_detach_popen_kwargs(), including launch_detached_profile_gateway_restart() - The watcher subprocess's own respawn snippet in hermes_cli/gateway.py (inlined flags so the watcher's child respawn also breaks away) _spawn_detached() in gateway_windows.py already had the flag; this change brings the rest of the codebase to parity. 2. Per-minute supervisor Scheduled Task — Windows equivalent of systemd Restart=always Introduces hermes_cli/gateway_supervisor.py and registers it as a second Scheduled Task ('Hermes_Gateway_Supervisor', SC MINUTE /MO 1, LIMITED rights) alongside the existing ONLOGON task. Every minute, the supervisor uses the same gateway.status.get_running_pid() probe as 'hermes gateway status' and, if no gateway is alive, calls gateway_windows._spawn_detached() (which now includes BREAKAWAY) to bring one back. Covers every crash mode, not just 'machine rebooted': taskkill, OOM, GUI update SIGTERM, parent job teardown. Cheap — one pythonw startup per minute when down, one PID-existence check per minute when up. Wired into both the schtasks-success and Startup-folder-fallback install paths via _install_supervisor_best_effort(), and removed in uninstall(). Best-effort: a failing supervisor install logs a warning but doesn't roll back the primary install. 3. 'hermes gateway status --deep' shows per-probe PASS/FAIL Replaces the existing terse '--deep' output (which only printed paths) with an actual diagnostic table: [1] PID file present [2] Lock file held by a live process [3] get_running_pid() result [4] _pid_exists(pid) — OS-level liveness [5] gateway_state.json (state + age) [6] Last lifecycle event from gateway-exit-diag.log When the high-level summary disagrees with reality, the user can see exactly which signal is lying. Test-leak fix ------------- tests/hermes_cli/test_gateway_wsl.py::TestGatewayCommandWSLMessages monkey-patched is_linux/is_wsl/supports_systemd_services to simulate WSL but did NOT stub is_windows(). On a Windows host, the dispatcher in _gateway_command_inner takes the is_windows() branch BEFORE the WSL guidance branch, so the test invoked gateway_windows.install() for real. install() writes to %APPDATA%\...\Startup\Hermes_Gateway.cmd — the REAL user Startup folder, never sandboxed by tmp_path — pointing at the test's pytest-of-<user>/pytest-<N>/.../gateway-service/ wrapper. When pytest tore down the tmp_path, every subsequent Windows login flashed a cmd.exe window that failed to find the missing target. Stubs is_windows=False on all four affected tests: test_install_wsl_no_systemd test_start_wsl_no_systemd test_status_wsl_running_manual test_status_wsl_not_running Defense-in-depth: _build_startup_launcher() now prefixes the launcher with 'if not exist <target> exit /b 0', so any future stale Startup entry silently no-ops instead of flashing a console window. Status enhancements ------------------- - status() now reports supervisor task presence alongside the existing schtasks/Startup info, and nudges the user to reinstall if the supervisor isn't registered. - Deep mode dumps both the supervisor task name + script path. * fix(gateway,windows): drop the per-minute supervisor task — keep breakaway + deep probes Earlier in this branch we added a per-minute schtasks-based supervisor to respawn the gateway after crashes / GUI-update SIGTERMs. The implementation flashed a brief console window on every firing, which stole window focus. We tried several variants: - cmd.exe wrapper invoking pythonw -> flashes (cmd.exe is console-subsystem) - schtasks /TR pointing at pythonw -> flashes (uv venv launcher pythonw is actually subsystem=Console, not GUI; it respawns the real pythonw) - schtasks /TR pointing at base uv -> still flashes (Task Scheduler-side conhost preallocation; documented Windows quirk) - XML registration with <Hidden>true> -> still flashes (<Hidden> only hides the task in the Task Scheduler UI, not the spawned window) Researched what leading projects do: - Ollama: GUI-subsystem tray exe + Startup-folder shortcut. No supervisor. - Tailscale: real Windows Service via SCM. Session 0, no console possible. - Syncthing: --no-console flag inside the binary + Startup folder. - openclaw: VBS Run(..., 0, False) wrapper. Suppresses the *window* but Super User Q971162 confirms focus-steal still occurs in some cases. None of these use a per-minute polling scheduled task. The 'auto-restart on crash' responsibility belongs INSIDE the daemon (Tailscale's in-process recovery / Ollama's monitor+worker pair) OR is delegated to the Windows Service Control Manager — not Task Scheduler. So this commit drops the supervisor entirely. The CREATE_BREAKAWAY_FROM_JOB fix in _subprocess_compat.py (from commit c1e5fa4) survives — that is the *real* fix for problem NousResearch#2 (GUI-update kills gateway): the post-update watcher in launch_detached_profile_gateway_restart() now breaks out of Electron's job object, so the gateway respawn watcher survives the GUI quit and successfully respawns the gateway. Surviving from c1e5fa4: * CREATE_BREAKAWAY_FROM_JOB in hermes_cli/_subprocess_compat.py (fixes NousResearch#2) * Inlined breakaway flag in the watcher respawn snippet in gateway.py * hermes gateway status --deep PASS/FAIL probes (fixes #1 — visibility) * 'if not exist <target> exit /b 0' guard in _build_startup_launcher (fixes NousResearch#3 — silent no-op for stale Startup entries) * tests/hermes_cli/test_gateway_wsl.py is_windows=False stubs (root cause of NousResearch#3 — pytest WSL tests no longer leak Startup entries on Win hosts) Removed in this commit: * hermes_cli/gateway_supervisor.py (entire file) * Supervisor section in hermes_cli/gateway_windows.py (~180 lines): get_supervisor_task_name, get_supervisor_script_path, _build_supervisor_cmd_script, _write_supervisor_script, _install_supervisor_task, is_supervisor_task_registered, _install_supervisor_best_effort * _install_supervisor_best_effort() calls in install() (3 spots) * supervisor cleanup block in uninstall() * supervisor display lines in status() / status(deep=True) Future direction (out of scope for this PR): the right place for Windows 'Restart=always' semantics is a real Windows Service installed via pywin32's win32serviceutil.ServiceFramework — session-0 isolation, SCM auto-restart, no console window possible. That's a meaningful next-PR project, not a band-aid. Tests: 51 pass / 2 pre-existing failures in tests/hermes_cli/test_gateway_{windows,wsl}.py (the 2 failures are TestSupportsSystemdServicesWSL cases that fail on origin/main too — unrelated to this PR).

fix(gemma4): recover raw tool calls when parser import fails

7c03944

raritytiks mentioned this pull request Apr 17, 2026

feat: add Gemma 4 tool call parser NousResearch/hermes-agent#7449

Closed

raritytiks closed this Apr 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gemma4): recover raw tool calls when parser import fails#1

fix(gemma4): recover raw tool calls when parser import fails#1
raritytiks wants to merge 1 commit into
0xarkstar:feat/gemma4-tool-call-parserfrom
raritytiks:gemma4-pr-followup

raritytiks commented Apr 17, 2026 •

edited

Loading

Uh oh!

raritytiks commented Apr 17, 2026

Uh oh!

raritytiks commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

raritytiks commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Code Changes

Testing

Context

Uh oh!

raritytiks commented Apr 17, 2026

Uh oh!

raritytiks commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

raritytiks commented Apr 17, 2026 •

edited

Loading