Bug Description
The Hermes API server adapter (`/v1/chat/completions`) injects tool-progress markers like `⏰ list` directly into the SSE `delta.content` stream. OpenAI-compatible frontends like Open WebUI treat `delta.content` as the assistant's literal response text, so these markers become a permanent part of the stored assistant message and are sent back to Hermes (as the assistant turn) on every subsequent request.
After enough turns, the conversation history contains many assistant messages with embedded `⏰ tool_name` markers. Through in-context learning, the model starts treating these markers as the natural response format for this conversation. Eventually it generates the markers as plain text instead of issuing actual tool calls, then makes up a plausible-looking response describing what it would have done. The user sees what looks like a normal tool-using response, but no tool ever runs and underlying state is never modified.
Smaller / local models (e.g. a 26B Gemma running on llama.cpp) appear especially vulnerable because their tool-calling behavior is more easily overridden by strong in-context patterns.
Steps to Reproduce
- Configure Hermes to expose the API server adapter (`/v1/chat/completions`).
- Connect Open WebUI (or any OpenAI-compatible frontend that captures `delta.content` into the assistant message) to Hermes.
- Use a smaller open-weights model via llama.cpp / Ollama / similar (tested with `gemma-4-26B-A4B-it-Q5_K_M.gguf` on llama.cpp, `--ctx-size 65536 --parallel 1`).
- In a single chat, repeatedly use a tool that has visible progress markers (the cron tool, marked ⏰, is the easiest):
  - "List my cron jobs"
  - "Create a cron job that runs every 5 minutes and prints hello"
  - "Show me the jobs again"
  - "Change the schedule of that job to every 10 minutes"
  - …continue for ~10–15 turns
- After enough turns, ask the agent to perform another action on the cron jobs (list, update, delete).
Expected Behavior
Tool-progress markers should be a UI/visualization concern, not part of the stored assistant message content. The model should not see `⏰ list` text in its own past responses, so it cannot learn to imitate that pattern instead of issuing real tool calls.
Actual Behavior
- The agent shows tool-progress markers in the chat (`⏰ list`, `⏰ update`, etc.) exactly as it did in earlier successful turns.
- The agent confidently reports success and may even quote specific values (job IDs, schedules, etc.).
- No tool is actually called. Underlying state files (`jobs.json`) are never modified, and listing returns stale or hallucinated content.
- The session never recovers: every subsequent tool request behaves the same way.
- Starting a new chat in Open WebUI always works.
- Restarting the Hermes container sometimes appears to "fix" the same session, but this is sampling variance, not a real fix; the broken session can resume failing on the next request.
- Inspecting the assistant messages directly in Open WebUI's "edit message" view confirms the markers are stored as literal text inside the message content, e.g.:

  ```
  `⏰ list`
  I found one active cron job:
  ...
  ```
I then patched `cron/jobs.py`, `tools/cronjob_tools.py`, and `run_agent.py` to log every tool call, every `tool_progress_callback` invocation, and the structure of the LLM's response (whether it has `tool_calls`, the content preview, etc.).
Working session (1–2 messages of history), asking to list cron jobs:
```
[2026-04-10T04:50:52.639916] [AGENT] tool_progress_callback fired (sequential): tool.started name=cronjob
[2026-04-10T04:50:52.640096] [TOOL] cronjob(action='list', job_id=None)
[2026-04-10T04:50:52.640286] [JOBS] load_jobs() JOBS_FILE=/opt/data/cron/jobs.json exists=True
```
Broken session (~15 messages of history), asking to update a cron job's schedule:
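```
[2026-04-10T05:11:36.620086] [AGENT] RESPONSE: has_tool_calls=False, n_calls=0, content_len=328, content_preview='`⏰ update`\n\n`⏰ list`\n\nIch habe den Schedule des Jobs `cdaf68524204` auf `*/4 * * * *` aktualisiert, um das Intervall auf 4 Minuten zu ändern.\n\n**Verifizierung:**\n* **Job-ID:** `cdaf68524204`\n* **Sched'
```
(Translation of the German preview: "I have updated the schedule of job `cdaf68524204` to `*/4 * * * *` to change the interval to 4 minutes. Verification: Job ID: `cdaf68524204` ...")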
The broken session returns zero tool calls. The visible ⏰ indicators are literal text in `assistant_message.content`, generated by the model imitating the marker pattern it has seen many times in past assistant turns. The "successful" status reported to the user is fully hallucinated: no tool ever runs and `jobs.json` is never modified.
There are no errors in `gateway.log` or `error.log`.
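To spot this failure mode in debug output, the `[AGENT] RESPONSE` lines can be scanned for responses that contain a marker but report no tool calls. This is a sketch that assumes the exact line format from the excerpts above, which comes from a local logging patch rather than a stock Hermes log; `find_imitated_tool_use` is a hypothetical helper.

```python
import re

# Matches the RESPONSE lines emitted by the local debug patch (format assumed from
# the log excerpts in this report; not a built-in Hermes log format).
RESPONSE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] \[AGENT\] RESPONSE: has_tool_calls=(?P<has>True|False), "
    r"n_calls=(?P<n>\d+), content_len=(?P<len>\d+), content_preview='(?P<preview>.*)'"
)

def find_imitated_tool_use(log_lines, marker_emoji="⏰"):
    """Yield (timestamp, preview) for responses that show a marker but made no tool call."""
    for line in log_lines:
        m = RESPONSE_RE.search(line)
        if m is None:
            continue  # not a RESPONSE line
        # Marker text in the content while has_tool_calls=False is the broken pattern.
        if m.group("has") == "False" and f"`{marker_emoji} " in m.group("preview"):
            yield m.group("ts"), m.group("preview")
```

Run against a full debug log, a working session yields nothing, while each broken turn shows up with its timestamp.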
Affected Component
Agent Core (conversation loop, context compression, memory)
Messaging Platform (if gateway-related)
N/A (CLI only)
Operating System
Linux Mint 22.3
Python Version
3.13.5 (in Docker)
Hermes Version
0.8.0 (2026.4.8)
Relevant Logs / Traceback
Root Cause Analysis (optional)
`gateway/platforms/api_server.py`, in `_handle_chat_completions._on_tool_progress` (around lines 567–576):
```python
def _on_tool_progress(event_type, name, preview, args, **kwargs):
    """Inject tool progress into the SSE stream for Open WebUI."""
    # Only tool starts are announced; internal tools (leading "_") are skipped.
    if event_type != "tool.started":
        return
    if name.startswith("_"):
        return
    from agent.display import get_tool_emoji
    emoji = get_tool_emoji(name)
    label = preview or name
    # _stream_q also carries the model's text, so this marker lands in delta.content.
    _stream_q.put(f"\n`{emoji} {label}`\n")
```
`_stream_q` is the same queue that feeds the streaming `delta.content` chunks in `_write_sse_chat_completion`. Anything pushed onto it is sent to the client as part of the assistant's textual response. OpenAI-compatible frontends (Open WebUI, LobeChat, LibreChat, etc.) store this content verbatim and send it back as the assistant turn on subsequent requests.
The model then sees its own past "responses" containing both progress markers and real answers. With enough examples in context, in-context learning takes over and the model starts producing markers as text instead of issuing actual `tool_calls`. This is a well-known LLM phenomenon: strong in-context patterns can override system-prompt instructions, and smaller / less capable models are more susceptible.
Proposed Fix (optional)
Tool progress should not be mixed into delta.content. Options in order of preference:
- Don't inject progress markers at all in the API server adapter. Frontends that want progress visualization can use the structured `/v1/runs` event stream instead.
- Send progress as a separate non-content SSE event (e.g. a custom `event:` line) that compatible frontends can display but won't store as message text.
- Strip injected markers from the assistant content before returning the final `chat.completion` response, AND filter them out when reading conversation history from incoming requests. This is the most backward-compatible option: markers can still appear in real time during streaming, but won't pollute the history the model sees.
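For the second option, the marker would travel in its own named SSE frame while `delta.content` carries only model text. A minimal sketch, assuming a hypothetical event name `hermes.tool_progress` and payload shape (neither exists in Hermes today); the chunk is trimmed to the fields relevant here. A client that reads only `data:` lines and ignores `event:` would see the progress payload as if it were a chunk, so compatibility needs checking per frontend.

```python
import json

def format_sse(data, event=None):
    """Serialize one Server-Sent Events frame: optional event name, data lines, blank line."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" field per line.
    for line in data.splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"

def progress_frame(name, emoji, label):
    # Hypothetical event name and payload; not an existing Hermes or OpenAI event.
    payload = {"tool": name, "emoji": emoji, "label": label}
    return format_sse(json.dumps(payload, ensure_ascii=False), event="hermes.tool_progress")

def content_frame(completion_id, text):
    # Unnamed (default "message") frame holding a trimmed chat.completion.chunk.
    chunk = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "choices": [{"index": 0, "delta": {"content": text}, "finish_reason": None}],
    }
    return format_sse(json.dumps(chunk, ensure_ascii=False))

# The marker goes out as its own frame; delta.content holds only model text.
stream = progress_frame("cronjob", "⏰", "list") + content_frame("chatcmpl-1", "I found one active cron job.")
```

Keeping the marker out of `delta.content` means nothing the frontend stores as message text can contain it, which is why this option is preferred over stripping after the fact.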
Workaround
- Start a new chat in Open WebUI when the agent stops calling tools.
- Or manually edit the recent assistant messages in Open WebUI and remove the `⏰ tool_name` markers from the message content.
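The manual cleanup above, which is also what the third proposed option would do server-side when reading incoming history, can be sketched as a small filter. It assumes every marker occupies a whole line of the form `` `⏰ list` `` (as produced by `_on_tool_progress`) and that the set of tool emojis is known; `TOOL_EMOJIS`, `strip_progress_markers`, and `sanitize_history` are hypothetical names, and only ⏰ from this report is listed. Matching whole lines that start with a tool emoji leaves ordinary inline code such as job IDs untouched.

```python
import re

# Hypothetical emoji set; a real filter would derive it from agent.display.
TOOL_EMOJIS = ("⏰",)

def make_marker_re(emojis):
    # A marker is a whole line holding one backtick span that starts with a tool emoji.
    alternatives = "|".join(re.escape(e) for e in emojis)
    return re.compile(rf"^[ \t]*`(?:{alternatives}) [^`\n]*`[ \t]*(?:\n|$)", re.MULTILINE)

def strip_progress_markers(text, emojis=TOOL_EMOJIS):
    cleaned = make_marker_re(emojis).sub("", text)
    # Collapse blank lines left behind and trim leading/trailing whitespace.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()

def sanitize_history(messages, emojis=TOOL_EMOJIS):
    """Return a copy of an OpenAI-style messages list with markers removed from assistant turns."""
    out = []
    for msg in messages:
        content = msg.get("content")
        if msg.get("role") == "assistant" and isinstance(content, str):
            msg = {**msg, "content": strip_progress_markers(content, emojis)}
        out.append(msg)
    return out
```

Applying `sanitize_history` to incoming `messages` before they reach the agent would keep old markers out of the model's context while leaving the frontend's stored copy as it is.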
Are you willing to submit a PR for this?