fix(memory): handle unescaped apostrophes in tool call JSON from llama.cpp#12093
Closed
truenorth-lj wants to merge 1 commit into
Closed
fix(memory): handle unescaped apostrophes in tool call JSON from llama.cpp#12093truenorth-lj wants to merge 1 commit into
truenorth-lj wants to merge 1 commit into
Conversation
…a.cpp When running against llama-server (llama.cpp), tool call arguments sometimes contain unescaped apostrophes, literal control characters, or single-quoted strings that cause json.loads to fail. This adds a repair_tool_call_json() utility that applies best-effort JSON sanitization before parsing, and replaces all raw json.loads(tool_call.function.arguments) call sites across run_agent.py, mini_swe_runner.py, and tui_gateway/server.py. Closes NousResearch#12068 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 tasks
teknium1
added a commit
that referenced
this pull request
Apr 24, 2026
…#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from #12068 / PR #12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
Contributor
|
Your core fix (strict=False + control-char escaping for JSON string values) was ported onto current main via #15356 (commit 2d444fc). The port folds your two repair passes into the existing |
donovan-yohan
added a commit
to donovan-yohan/hermes-agent
that referenced
this pull request
Apr 27, 2026
* fix: repair malformed tool call args in streaming assembly before flagging as truncated
When the streaming path (chat completions) assembled tool call deltas and
detected malformed JSON arguments, it set has_truncated_tool_args=True but
passed the broken args through unchanged. This triggered the truncation
handler which returned a partial result and killed the session (/new required).
_many_ malformations are repairable: trailing commas, unclosed brackets,
Python None, empty strings. _repair_tool_call_arguments() already existed
for the pre-API-request path but wasn't called during streaming assembly.
Now when JSON parsing fails during streaming assembly, we attempt repair
via _repair_tool_call_arguments() before flagging as truncated. If repair
succeeds (returns valid JSON), the tool call proceeds normally. Only truly
unrepairable args fall through to the truncation handler.
This prevents the most common session-killing failure mode for models like
GLM-5.1 that produce trailing commas or unclosed brackets.
Tests: 12 new streaming assembly repair tests, all 29 existing repair
tests still passing.
* chore(release): map q19dcp@gmail.com -> aj-nt in AUTHOR_MAP
* fix(run_agent): handle unescaped control chars in tool_call arguments (#15356)
Extends _repair_tool_call_arguments() to cover the most common local-model
JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and
newlines inside JSON string values (memory save summaries, file contents,
etc.). Previously fell through to '{}' replacement, losing the call.
Adds two repair passes:
- Pass 0: json.loads(strict=False) + re-serialise to canonical wire form
- Pass 4: escape 0x00-0x1F control chars inside string values, then retry
Ports the core utility from #12068 / PR #12093 without the larger plumbing
change (that PR also replaced json.loads at 8 call sites; current main's
_repair_tool_call_arguments is already the single chokepoint, so the
upgrade happens transparently for every existing caller).
Credit: @truenorth-lj for the original utility design.
4 new regression tests covering literal newlines, tabs, re-serialisation
to strict=True-valid output, and the trailing-comma + control-char
combination case.
* docs(faq): Update docs on backups
- update faq answer with new `backup` command in release 0.9.0
- move profile export section together with backup section so related information can be read more easily
- add table comparison between `profile export` and `backup` to assist users if understanding the nuances between both
* fix(skills): apply inline shell in skill_view
* fix(skills): drop raw_content to avoid doubling skill payload
skill_view response went to the model verbatim; duplicating the SKILL.md
body as raw_content on every tool call added token cost with no agent-facing
benefit. Remove the field and update tests to assert on content only.
The slash/preload caller (agent/skill_commands.py) already falls back to
content when raw_content is absent, and it calls skill_view(preprocess=False)
anyway, so content is already unrendered on that path.
* feat: add slash command for busy input mode
* fix: Ctrl+D deletes char under cursor, only exits on empty input (bash/zsh behaviour)
* fix(cli): keep Ctrl+D no-op when only attachments pending
Follow-up to @iRonin's Ctrl+D EOF fix. If the input text is empty but
the user has pending attached images, do nothing rather than exiting —
otherwise a stray Ctrl+D silently discards the attachments.
* chore(release): map iRonin personal email to GitHub login
* chore(release): map julia@alexland.us -> alexg0bot in AUTHOR_MAP (#15384)
* fix(gateway): honor queue mode in runner PRIORITY interrupt path
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.
Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.
Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
* fix(gateway/config): coerce quoted boolean values in config parsing
* feat(cli): wrap /compress in _busy_command to block input during compression
Before this, typing during /compress was accepted by the classic CLI
prompt and landed in the next prompt after compression finished,
effectively consuming a keystroke for a prompt that was about to be
replaced. Wrapping the body in self._busy_command('Compressing
context...') blocks input rendering for the duration, matching the
pattern /skills install and other slow commands already use.
Salvages the useful part of #10303 (@iRonin). The `_compressing` flag
added to run_agent.py in the original PR was dead code (set in 3 spots,
read nowhere — not by cli.py, not by run_agent.py, not by the Ink TUI
which doesn't use _busy_command at all) and was dropped.
* fix(web_server): hold _oauth_sessions_lock during PKCE session state writes
_submit_anthropic_pkce() retrieved sess under _oauth_sessions_lock but
wrote back to sess["status"] and sess["error_message"] outside the lock.
A concurrent session GC or cancel could race with these writes, producing
inconsistent session state.
Wrap all 4 sess write sites in _oauth_sessions_lock:
- network exception path (Token exchange failed)
- missing access_token path
- credential save failure path
- success path (approved)
* fix(api-server): persist response snapshot on client disconnect when store=True
* fix(api-server): persist incomplete snapshot on asyncio.CancelledError too
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.
Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
* chore(release): map ebukau84@gmail.com -> UgwujaGeorge in AUTHOR_MAP
* fix(auth): preserve corrupt auth.json and warn instead of silently resetting
_load_auth_store() caught all parse/read exceptions and silently
returned an empty store, making corruption look like a logout with
no diagnostic information and no way to recover the original file.
Now copies the corrupt file to auth.json.corrupt before resetting,
and logs a warning with the exception and backup path.
* fix(env): safely quote ~/ subpaths in wrapped cd commands
* fix(memory): skip external-provider sync on interrupted turns (#15218)
``run_conversation`` was calling ``memory_manager.sync_all(
original_user_message, final_response)`` at the end of every turn
where both args were present. That gate didn't consider the
``interrupted`` local flag, so an external memory backend received
partial assistant output, aborted tool chains, or mid-stream resets as
durable conversational truth. Downstream recall then treated the
not-yet-real state as if the user had seen it complete, poisoning the
trust boundary between "what the user took away from the turn" and
"what Hermes was in the middle of producing when the interrupt hit".
Extracted the inline sync block into a new private method
``AIAgent._sync_external_memory_for_turn(original_user_message,
final_response, interrupted)`` so the interrupt guard is a single
visible check at the top of the method instead of hidden in a
boolean-and at the call site. That also gives tests a clean seam to
assert on — the pre-fix layout buried the logic inside the 3,000-line
``run_conversation`` function where no focused test could reach it.
The new method encodes three independent skip conditions:
1. ``interrupted`` → skip entirely (the #15218 fix). Applies even
when ``final_response`` and ``original_user_message`` happen to
be populated — an interrupt may have landed between a streamed
reply and the next tool call, so the strings on disk are not
actually the turn the user took away.
2. No memory manager / no final_response / no user message →
preserve existing skip behaviour (nothing new for providerless
sessions, system-initiated refreshes, tool-only turns that never
resolved, etc.).
3. Sync_all / queue_prefetch_all exceptions → swallow. External
memory providers are strictly best-effort; a misconfigured or
offline backend must never block the user from seeing their
response.
The prefetch side-effect is gated on the same interrupt flag: the
user's next message is almost certainly a retry of the same intent,
and a prefetch keyed on the interrupted turn would fire against stale
context.
### Tests (16 new, all passing on py3.11 venv)
``tests/run_agent/test_memory_sync_interrupted.py`` exercises the
helper directly on a bare ``AIAgent`` (``__new__`` pattern that the
interrupt-propagation tests already use). Coverage:
- Interrupted turn with full-looking response → no sync (the fix)
- Interrupted turn with long assistant output → no sync (the interrupt
could have landed mid-stream; strings-on-disk lie)
- Normal completed turn → sync_all + queue_prefetch_all both called
with the right args (regression guard for the positive path)
- No final_response / no user_message / no memory manager → existing
pre-fix skip paths still apply
- sync_all raises → exception swallowed, prefetch still attempted
- queue_prefetch_all raises → exception swallowed after sync succeeded
- 8-case parametrised matrix across (interrupted × final_response ×
original_user_message) asserts sync fires iff interrupted=False AND
both strings are non-empty
Closes #15218
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(gateway/bluebubbles): align iMessage delivery with non-editable UX
* chore(release): map benjaminsehl noreply email in AUTHOR_MAP
* fix: add DeepSeek reasoning_content echo for tool-call messages
DeepSeek V4 thinking mode requires reasoning_content on every
assistant message that includes tool_calls. When this field is
missing from persisted history, replaying the session causes
HTTP 400: 'The reasoning_content in the thinking mode must be
passed back to the API.'
Two-part fix (refs #15250):
1. _copy_reasoning_content_for_api: Merge the Kimi-only and
DeepSeek detection into a single needs_tool_reasoning_echo
check. This handles already-poisoned persisted sessions by
injecting an empty reasoning_content on replay.
2. _build_assistant_message: Store reasoning_content='' on new
DeepSeek tool-call messages at creation time, preventing
future session poisoning at the source.
Additional fix:
3. _handle_max_iterations: Add missing call to
_copy_reasoning_content_for_api in the max-iterations flush
path (previously only main loop and flush_memories had it).
Detection covers:
- provider == 'deepseek'
- model name containing 'deepseek' (case-insensitive)
- base URL matching api.deepseek.com (for custom provider)
* chore(release): map chenzeshi@live.com -> chen1749144759 in AUTHOR_MAP
* refactor(deepseek-reasoning): consolidate detection into helpers + regression tests
Extracts _needs_kimi_tool_reasoning() for symmetry with the existing
_needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api
uses the same detection logic as _build_assistant_message. Future changes
to either provider's signals now only touch one function.
Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering:
- All 3 DeepSeek detection signals (provider, model, host)
- Poisoned history replay (empty string fallback)
- Plain assistant turns NOT padded
- Explicit reasoning_content preserved
- Reasoning field promoted to reasoning_content
- Existing Kimi/Moonshot detection intact
- Non-thinking providers left alone
21 tests, all pass.
* fix(gateway): follow compression continuations during /resume
* chore(release): map simbamax99@gmail.com to @simbam99
* fix(skills): factor HERMES_HOME resolution into shared _hermes_home helper
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:
- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)
Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles
All three scripts now import from _hermes_home instead of duplicating.
7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.
Closes #12722
* chore(release): map jerome.benoit@sap.com to jerome-benoit
* fix(skills): ship google-workspace deps as [google] extra; make setup.py 3.9-parseable
Closes #13626.
Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:
1. Declare a [google] optional extra in pyproject.toml
(google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
default, so `setup.py --check` does not need to shell out to pip at
runtime — the imports succeed and install_deps() is never reached.
This fixes the Nix breakage where pip/ensurepip are stripped.
2. Add `from __future__ import annotations` to setup.py so the PEP 604
`str | None` annotation parses on Python 3.9 (macOS system python).
Previously system python3 SyntaxError'd before any code ran.
install_deps() error message now also points users at the extra instead of
just the raw pip command.
* fix(/model): show provider-enforced context length, not raw models.dev (#15438)
/model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because
the display block used ModelInfo.context_window directly from models.dev.
Codex OAuth actually enforces 272K for the same slug, and the agent's
compressor already runs at 272K via get_model_context_length() — so the
banner + real context budget said 272K while /model lied with 1M.
Route the display context through a new resolve_display_context_length()
helper that always prefers agent.model_metadata.get_model_context_length
(which knows about Codex OAuth, Copilot, Nous caps) and only falls back
to models.dev when that returns nothing.
Fix applied to all 3 /model display sites:
cli.py _handle_model_switch
gateway/run.py picker on_model_selected callback
gateway/run.py text-fallback confirmation
Reported by @emilstridell (Telegram, April 2026).
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths (#15444)
* fix(nix): use --rebuild in fix-lockfiles to bypass cached FOD store paths
fix-lockfiles checked npm lockfile hashes by running
`nix build .#<attr>.npmDeps`, but fetchNpmDeps is a fixed-output
derivation — if the old store path exists locally, Nix returns it from
cache without re-fetching. This caused the script to report "ok" even
when hashes were stale, while CI (with no cache) failed with a hash
mismatch.
Adding --rebuild forces Nix to re-derive and verify the output hash
against the declared one, catching staleness regardless of local cache
state. Also updates the tui and web npm deps hashes that were stale.
* fix(nix): regenerate ui-tui lockfile to add missing @emnapi entries
npm ci was failing because @emnapi/core and @emnapi/runtime were
missing from ui-tui/package-lock.json despite being required as peer
deps by @napi-rs/wasm-runtime (via @rolldown/binding-wasm32-wasi).
Running npm install --package-lock-only adds the missing entries.
The npmDepsHash reverts to its previous value since fetchNpmDeps was
already fetching these packages as transitive dependencies.
* fix(matrix): bind PgCryptoStore device_id so fresh E2EE installs work
PgCryptoStore.__init__ defaults _device_id to "" and put_account writes
that blank value into crypto_account. The UPSERT's ON CONFLICT DO UPDATE
clause deliberately does not touch device_id, so once the row is written
blank it stays blank forever — breaking every downstream device-scoped
olm operation. Peers' to-device olm ciphertext can't match our identity
key, no megolm sessions ever land, and the user sees "hermes is in the
room but never responds to encrypted messages".
Fix: call put_device_id(client.device_id) immediately after
crypto_store.open() and before olm.load(). This sets the store's
in-memory _device_id so the first put_account INSERT writes the correct
value from the start.
Observable symptoms without the fix, on a fresh crypto.db:
- crypto_account.device_id = ""
- crypto_tracked_user: 0 rows
- crypto_device: 0 rows
- crypto_olm_session: 0 rows
- crypto_megolm_inbound_session: 0 rows
- "No one-time keys nor device keys got when trying to share keys"
warning on every startup
- "olm event doesn't contain ciphertext for this device" DecryptionError
on any inbound to-device event
- Encrypted room messages arrive but never decrypt
After the fix (wiped crypto.db + restart):
- device_id populated with actual runtime device (e.g. CZIKTRFLOV)
- all counts populate from sync as expected
- encrypted DMs flow normally
Who hits this: anyone with a fresh crypto.db — includes first-time matrix
E2EE setup, nio→mautrix migrations (since matrix.py removes the legacy
pickle on startup, creating a fresh SQLite store), and anyone who wipes
crypto.db to start over. Existing installs that somehow already have a
non-blank device_id would be unaffected, but no prior code path writes
it correctly, so that set is likely empty.
* fix(matrix): drop needless DeviceID import + mock put_device_id in tests
Two adjustments to make CI pass:
- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
so passing `client.device_id` directly (already a str) works identically
at runtime. The explicit import was cosmetic and tripped CI environments
where `mautrix.types` doesn't re-export DeviceID at the expected path
("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").
- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
`PgCryptoStore` fake so the three encryption-path tests
(test_connect_with_access_token_and_encryption,
test_connect_uses_configured_device_id_over_whoami,
test_connect_registers_encrypted_event_handler_when_encryption_on) can
exercise the new crypto-store binding without AttributeError.
* fix(tui): proactive mouse disable on ConPTY + /mouse toggle command
On Windows WSL2, ConPTY implicitly enables mouse event injection when
the alternate screen buffer (DEC 1049) is entered, causing raw escape
sequences to appear in the transcript as ghost characters.
Fix (two parts):
1. ConPTY fix: send DISABLE_MOUSE_TRACKING immediately after entering
alt screen when mouse tracking is off (AlternateScreen.tsx)
2. Runtime toggle: add /mouse [on|off|toggle] slash command with config
persistence (display.tui_mouse) so users can manage this at runtime
The env var HERMES_TUI_DISABLE_MOUSE continues to work as the initial
default, but can now be overridden via /mouse and persisted to config.
Closes: upstream ConPTY mouse injection issue
Credits: OutThisLife / PR #13716 for the toggle concept
* docs(delegate): document max_concurrent_children and max_spawn_depth + cost warning
* fix(cli-config): keep delegation overrides commented in example
* fix(delegate): resolve subagent approval prompts without deadlocking parent TUI (#15491)
Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval
callback lives in tools/terminal_tool.py's threading.local(), which worker
threads do not inherit. When a subagent hits a dangerous-command guard,
prompt_dangerous_approval() falls back to input() from the worker thread,
deadlocking against the parent's prompt_toolkit TUI that owns stdin.
Fix: install a non-interactive callback into every subagent worker thread
via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)).
The callback is config-gated by delegation.subagent_auto_approve:
false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist)
true -> _subagent_auto_approve (opt-in YOLO for cron/batch)
Both emit a logger.warning audit line. Gateway sessions are unaffected
because they resolve approvals via tools/approval.py's per-session queue,
not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685).
- hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False
- cli-config.yaml.example: documented, commented (default)
- tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve,
_get_subagent_approval_callback, wired into the child timeout executor
- tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion,
and TLS scoping in the worker thread
* docs: consolidate dashboard themes and plugins into Extending the Dashboard (#15530)
The web-dashboard.md and dashboard-plugins.md pages had overlapping,
partial coverage of the theme and plugin systems. Themes were split
across two pages; the plugin docs had a minimal manifest reference but
no step-by-step guide, no slot catalog, and no theme+plugin demo.
New: user-guide/features/extending-the-dashboard.md — single navigable
reference for all three extension layers (themes, UI plugins, backend
plugins). Includes:
- Theme quick-start + full schema (palette, typography, layout, layout
variants, assets, componentStyles, colorOverrides, customCSS)
- Plugin quick-start + full schema (manifest, SDK, slots, tab.override,
tab.hidden, backend routes, custom CSS)
- 10-slot shell catalog with locations
- Plugin discovery + load lifecycle
- Combined theme+plugin walkthrough (Strike Freedom cockpit demo)
- API reference + troubleshooting
web-dashboard.md: trimmed to core tool docs (pages, REST API, CORS,
development). Theme/plugin content now points to the new page with a
built-in themes summary table.
dashboard-plugins.md: deleted (merged into extending-the-dashboard.md).
sidebars.ts: swap 'dashboard-plugins' → 'extending-the-dashboard' under
the Management group.
No user-facing behavior change; docs-only.
* fix: recalculate token budgets on model switch in ContextCompressor
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.
Adds budget recalculation in update_model() and 2 regression tests.
* fix(tools): normalize numeric entries and clear stale no_mcp in _save_platform_tools
YAML parses bare numeric toolset names (e.g. 12306:) as int, causing
TypeError in sorted() since the read path normalizes to str but the
save path did not.
The no_mcp sentinel was preserved in existing entries even when the
user re-enabled MCP servers, causing MCP to stay silently disabled.
* fix the reset of model change by /model.
* chore(release): map Readon's git email to GitHub login
* feat(installer): FHS layout for root installs on Linux (#15608)
Root installs on Linux now put the code at /usr/local/lib/hermes-agent and
the hermes command at /usr/local/bin/hermes. HERMES_HOME (~/.hermes) stays
state-only. Matches Claude Code / Codex CLI / OpenClaw, keeps Docker
bind-mounted /root/ volumes lean, and puts the command on every shell's
default PATH without touching shell RC files.
- Non-root users and macOS root: unchanged
- Existing root installs at $HERMES_HOME/hermes-agent: preserved in-place
(detected via .git dir) — no auto-migration, no breakage
- Explicit --dir / $HERMES_INSTALL_DIR: always wins, never overridden
- Termux: unchanged (package manager manages /data/data/...)
Requested by @souly9999 (Discord). Our own Dockerfile already uses this
split (code at /opt/hermes, data at /opt/data volume); the user-install
path now matches.
* feat(cron): add context_from field for cron job output chaining
* test(cron): add PermissionError coverage for context_from
* fix(cron): silent skip when context_from job has no output yet
* fix(cron): wire context_from through the update action
The tool schema promised 'On update, pass an empty array to clear' but the
update branch ignored the context_from kwarg entirely — users could set
the field at create time and never modify or clear it afterward.
- tools/cronjob_tools.py: handle context_from in the update branch the
same way script/enabled_toolsets/workdir are handled: normalize str/list
to refs, validate each referenced job exists (same check the create
branch does), store as list-or-None to match create_job()'s shape.
Empty string or empty list clears the field.
- tests/cron/test_cron_context_from.py: 6 new tests covering add/change/
clear (both shapes)/bad-ref/preserve-across-unrelated-update.
* fix(tools): recover non-configurable toolsets from composite resolution
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
* feat(discord): split discord_server into discord + discord_admin tools
Split the monolithic discord_server tool (14 actions) into two:
- discord: core actions (fetch_messages, search_members, create_thread)
that are useful for the agent's normal operation. Auto-enabled on
the discord platform via the pipeline fix.
- discord_admin: server management actions (list channels/roles, pins,
role assignment) that require explicit opt-in via hermes tools.
Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
* feat(feishu): wire feishu doc/drive tools into hermes-feishu composite
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
* feat(session): add guild_id/parent_chat_id/message_id to SessionSource
Groundwork for injecting raw platform identifiers into the agent's
system prompt. Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.
Adds three optional fields:
- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)
Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them. Behaviourally a
no-op: nothing reads the fields yet and they default to None.
* feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message. Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.
For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.
Behavioural: still a no-op — nothing consumes these fields yet.
* fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.
Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``. Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
* feat(session): inject Discord IDs block when discord tool is loaded
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.
Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string). The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.
The block adapts to context:
- Thread: guild / parent channel / thread / message
- Channel: guild / channel / message
- (DM has no guild/channel IDs worth listing; only message)
Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
* feat(tools): make discord/discord_admin opt-in, Discord-only
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.
Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
- _prompt_toolset_checklist: checklist filter
- _get_platform_tools: resolution filter (both branches)
- _save_platform_tools: save-time filter (covers 'Configure all
platforms' and hand-edited config.yaml)
- tools_disable_enable_command: rejects `hermes tools enable discord`
on non-Discord platforms with a clear error
build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
* fix(flush_memories): strip temperature from codex_responses fallback (#15620)
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:
- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models
The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.
Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
* fix(auxiliary): retry without temperature when any provider rejects it
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.
The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.
call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.
Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
wire into both sync and async call_llm paths before the existing
max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
detector phrasings, sync + async retry, no-retry-without-temperature,
and non-temperature 400s not triggering the retry
Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.
Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.
Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
* chore(release): map ash@users.noreply.github.com to ash
* fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:
new_threshold = aux_context
This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.
Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.
This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.
Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
* fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter — — not just temperature.
- agent/auxiliary_client.py:
* New _is_unsupported_parameter_error(exc, param): matches the same six
phrasings the old temperature detector did plus 'unrecognized parameter'
and 'invalid parameter', against any named param.
* _is_unsupported_temperature_error is now a thin back-compat wrapper so
existing imports and tests keep working.
* The max_tokens → max_completion_tokens retry branch in call_llm and
async_call_llm now (a) gates on 'max_tokens is not None' so we do not
pop a key that was never set and silently substitute a None value on
the retry, and (b) also matches the generic helper in addition to the
legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
up phrasings like 'Unknown parameter: max_tokens' that previously slipped
through.
- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
the generic detector across params, the back-compat wrapper, and the two
hardenings to the max_tokens retry branch (None gate + generic phrasing).
Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
* fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634)
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.
Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
* fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639)
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call. The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later. Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.
Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s. Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.
Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
* fix(terminal): three-layer defense against watch_patterns notification spam (#15642)
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
* feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam
Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.
Three layered defenses, each sufficient on its own:
1. Mutual exclusion (terminal_tool.py): When both flags are set on a
background process, drop watch_patterns with a warning. notify_on_complete
wins because 'let me know when it's done' is the more useful signal and
fires exactly once. Extracted as _resolve_notification_flag_conflict() so
the rule is testable in isolation.
2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
bails the moment session.exited is True. Post-exit chunks (buffered reads
draining after the process is gone) no longer produce notifications. This
is the fix flagged as future work in session 20260418_020302_79881c.
3. Global circuit breaker (process_registry.py): Per-session rate limits don't
catch the sibling-flood case — N concurrent processes can each stay under
8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
trips a 30-second cooldown across ALL sessions, emits a single
watch_overflow_tripped event, silently counts dropped events, and emits a
watch_overflow_released summary when the cooldown ends.
Also updates the tool schema + docstring to document the new behavior.
Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.
Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.
* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion
Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.
## New rule — per session
- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
for the session and the session is auto-promoted to notify_on_complete
semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
strike counter — healthy cadence is forgiven.
## Schema + docstring rewritten
The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
description so the model sees the contract every turn
## Removed
- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
/ _watch_strike_candidate / _watch_consecutive_strikes.
## Kept
- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
secondary safety net for concurrent siblings. Still valuable when 20
short-lived processes each fire once — none individually violates the
per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.
## Tests
- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
second in cooldown suppressed, multi-drop = single strike, 3 strikes
disables + promotes, clean window resets counter, suppressed count
carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
* feat(dashboard): page-scoped plugin slots for built-in pages
Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.
Previously, plugins could only:
1. Add new tabs (tab.path)
2. Replace whole built-in pages (tab.override)
3. Inject into global shell slots (header-*, footer-*, pre-main, ...)
None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.
Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
(sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
<PluginSlot name="<page>:top" />
as the first child of its outer wrapper and
<PluginSlot name="<page>:bottom" />
as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
sessions:top via registerSlot(), with matching slots entry in
the manifest -- so freshly-setup users can see what page-scoped
slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
colon-bearing slot names (sessions:top, analytics:bottom, ...)
Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
* docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.
Changes
- extending-the-dashboard.md:
- Frontmatter description + intro bullet now mention page-scoped slots
- New TOC entry "Augmenting built-in pages (page-scoped slots)"
- New dedicated subsection after "Replacing built-in pages"
explaining the heavy-vs-light tradeoff, listing the pages that
expose slots, and showing a worked manifest + IIFE example with
tab.hidden: true
- Cross-link from the tab.override section pointing readers to the
lighter augmentation option
- web-dashboard.md:
- Bullet mentioning "page-scoped slots (inject widgets into
built-in pages without overriding them)"
Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
link) reproduce on bare main -- not introduced here
* fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.
Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').
Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.
Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
* refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
* feat: add `hermes -z <prompt>` one-shot mode (#15702)
* feat: add `hermes -z <prompt>` one-shot mode
Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.
Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().
* feat(oneshot): handle interactive-callback gaps explicitly
Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:
- clarify — inject a callback that tells the agent to pick the
best default and continue (previously returned a
generic 'not available in this execution context'
error that wastes a tool call)
- sudo password — terminal_tool already gates on HERMES_INTERACTIVE
(we don't set it); sudo fails gracefully
- shell hooks — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
back to deny on non-tty stdin
- dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
- secret capture— tool returns gracefully when no callback wired
Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
* feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704)
Makes hermes -z usable by sweeper without mutating user config.
- Top-level -m/--model and --provider flags that apply to -z/--oneshot
(mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
given without --provider, detect_provider_for_model() auto-selects the
provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
model across to a different provider is usually wrong, and silently
picking the provider's catalog default hides the mismatch.
Config defaults still used when both flags are omitted (existing behavior).
Validation (all live against OpenRouter):
-z 'x' ....................... uses config default (opus-4.7)
-z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
-z 'x' --model ... --provider pair as given
HERMES_INFERENCE_MODEL=... -z haiku-4.5 via env var
-z 'x' --provider anthropic .. exits 2 with error to stderr
* fix(update): honor RestartSec when polling for gateway respawn (#15707)
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed
⚠ hermes-gateway drained but didn't relaunch — forcing restart
followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).
Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.
Observed timeline from journalctl before fix:
08:56:22.262 old PID exits 75
08:56:32.707 systemd logs Stopped -> Started (10.4s gap, > 10s budget)
After fix the poll covers 40s — comfortably inside RestartSec + slack.
Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
* fix: /stop now immediately aborts streaming retry loop
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.
On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.
Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.
Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.
Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
* fix: use output_text for assistant message content in Codex Responses API (#15690)
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.
_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().
Fixes #15687
* fix(agent): ordering fix in _copy_reasoning_content_for_api — cross-provider reasoning isolation
Fix logic-ordering bug where normalized_reasoning promotion returns
before the DeepSeek/Kimi needs_empty_reasoning guard, causing
cross-provider reasoning content (MiniMax → DeepSeek) to leak into
reasoning_content and trigger HTTP 400.
Changes:
- Reorder branching: existing reasoning_content check first
- Add 'not has_reasoning' guard so poisoned histories (no reasoning)
still get '' injected for DeepSeek/Kimi
- Healthy same-provider reasoning promotion path unchanged
Refs: #15250, #15213
* fix(tui): honor launch model overrides
* fix(tui): preserve provider precedence on startup
* fix(tui): address startup provider review
* fix(tui): avoid network lookup during startup
* fix(tui): share static model detection
* fix(tui): bind provider as model alias
* fix(tui): apply ui-tui fix pass and restore type-check
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
* fix(tui): resolve startup model aliases statically
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
* fix(tui): share overlay close controls
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
* fix(tui): sync inference model after switches
- keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches
- clarify static provider detection remapping docs
* refactor(tui): tighten overlay helpers
- rename overlay help text component to match its role
- share picker window math across model, session, and skills overlays
* fix(tui): align overlay q shortcut casing
Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.
* fix(tui): honor client copy shortcut over ssh
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
* refactor(tui): share remote shell detection
Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.
* fix(tui): trim whitespace-only selection chrome
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
* refactor(tui): simplify remote copy hotkey hints
Use an explicit conditional table instead of spread casting for SSH copy hint rows.
* fix(tui): preserve rendered indentation in selections
- trim only empty edge rows instead of full selected text
- bound selection paint using unwritten cells so rendered indentation remains copyable
* fix(tui): preserve code block indentation in selection
Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.
* fix(tui): track rendered spaces for selection copy
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
* refactor(tui): format screen imports
Keep screen.ts import ordering aligned with the ui-tui formatter.
* fix(tui): clamp copied selection bounds
Clamp copied selection columns to the screen width before scanning rendered cells.
* Update run_agent.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: remove has_reasoning guard — inject empty reasoning_content for DeepSeek/Kimi tool_calls unconditionally
* docs(obliteratus): link YouTube video guide in SKILL.md (#15808)
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
* docs: embed tutorial videos on webhooks + auxiliary models pages (#15809)
- webhooks.md: adds a Video Tutorial section under the intro with a
responsive YouTube iframe (WNYe5mD4fY8).
- configuration.md: adds a Video Tutorial subsection under Auxiliary
Models with a responsive YouTube iframe (NoF-YajElIM).
Both use a 16:9 aspect-ratio wrapper so the embeds scale cleanly on
mobile. Verified with `npm run build` — MDX parses clean, no new
warnings or broken links introduced.
* fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages
Previously _copy_reasoning_content_for_api only padded reasoning_content
when the assistant message had tool_calls. DeepSeek V4 thinking mode
requires the field on every assistant turn, including plain text replies
without tool_calls.
- Remove the 'sou…
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
…NousResearch#15356) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from NousResearch#12068 / PR NousResearch#12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
repair_tool_call_json()utility inutils.pythat applies best-effort JSON sanitization (handles unescaped control characters viastrict=False, single-quoted strings, and other common llama.cpp output quirks) before falling back to{}.json.loads(tool_call.function.arguments)call sites acrossrun_agent.py(5 locations),mini_swe_runner.py(2 locations), andtui_gateway/server.py(1 location) with the new repair function.Changes
utils.py: Newrepair_tool_call_json()and_escape_invalid_chars_in_json_strings()functionsrun_agent.py: Import + userepair_tool_call_jsonat all 5 tool-call argument parsing sitesmini_swe_runner.py: Userepair_tool_call_jsonat 2 parsing sitestui_gateway/server.py: Userepair_tool_call_jsonat 1 parsing sitetests/test_repair_tool_call_json.py: 13 regression testsTest plan
pytest tests/test_repair_tool_call_json.py -v)pytest tests/test_utils_truthy_values.py -v)Closes #12068
🤖 Generated with Claude Code