Skip to content

[Bug]: Tool progress markers in SSE content corrupt model behavior over time (Open WebUI / OpenAI-compatible API) #6972

@Swift42

Description

@Swift42

Bug Description

The Hermes API server adapter (/v1/chat/completions) injects tool-progress markers like `⏰ list` directly into the SSE delta.content stream. OpenAI-compatible frontends like Open WebUI treat delta.content as the assistant's literal response text, so these markers become a permanent part of the stored assistant message and are sent back to Hermes (as the assistant turn) on every subsequent request.

After enough turns, the conversation history contains many assistant messages with embedded `⏰ tool_name` markers. Through in-context learning, the model starts treating these markers as the natural response format for this conversation. Eventually it generates the markers as plain text instead of issuing actual tool calls, then makes up a plausible-looking response describing what it would have done. The user sees what looks like a normal tool-using response, but no tool ever runs and underlying state is never modified.

Smaller / local models (e.g. a 26B Gemma running on llama.cpp) appear especially vulnerable because their tool-calling behavior is more easily overridden by strong in-context patterns.

Steps to Reproduce

  1. Configure Hermes to expose the API server adapter (/v1/chat/completions).
  2. Connect Open WebUI (or any OpenAI-compatible frontend that captures delta.content into the assistant message) to Hermes.
  3. Use a smaller open-weights model via llama.cpp / Ollama / similar (tested with gemma-4-26B-A4B-it-Q5_K_M.gguf on llama.cpp, --ctx-size 65536 --parallel 1).
  4. In a single chat, repeatedly use a tool that has visible progress markers (the cron tool is the easiest — ):
    • "List my cron jobs"
    • "Create a cron job that runs every 5 minutes and prints hello"
    • "Show me the jobs again"
    • "Change the schedule of that job to every 10 minutes"
    • …continue for ~10–15 turns
  5. After enough turns, ask the agent to perform another action on the cron jobs (list, update, delete).

Expected Behavior

Tool-progress markers should be a UI/visualization concern, not part of the stored assistant message content. The model should not see `⏰ list` text in its own past responses, so it cannot learn to imitate that pattern instead of issuing real tool calls.

Actual Behavior

  • The agent shows tool-progress markers in the chat (`⏰ list`, `⏰ update`, etc.) exactly as it did in earlier successful turns.

  • The agent confidently reports success and may even quote specific values (job IDs, schedules, etc.).

  • No tool is actually called. Underlying state files (jobs.json) are never modified, listing returns stale or hallucinated content.

  • The session never recovers — every subsequent tool request behaves the same way.

  • Starting a new chat in Open WebUI always works.

  • Restarting the Hermes container sometimes appears to "fix" the same session but this is sampling variance, not a real fix; the broken session can resume failing on the next request.

  • Inspecting the assistant messages directly in Open WebUI's "edit message" view confirms the markers are stored as literal text inside the message content, e.g.

    `⏰ list`
    
    I found one active cron job:
    ...
    

I then patched cron/jobs.py, tools/cronjob_tools.py, and run_agent.py to log every tool call, every tool_progress_callback fire, and the structure of the LLM's response (whether it has tool_calls, the content preview, etc.).

Working session (1–2 messages of history), asking to list cron jobs:

[2026-04-10T04:50:52.639916] [AGENT] tool_progress_callback fired (sequential): tool.started name=cronjob
[2026-04-10T04:50:52.640096] [TOOL] cronjob(action='list', job_id=None)
[2026-04-10T04:50:52.640286] [JOBS] load_jobs() JOBS_FILE=/opt/data/cron/jobs.json exists=True

Broken session (~15 messages of history), asking to update a cron job's schedule:

[2026-04-10T05:11:36.620086] [AGENT] RESPONSE: has_tool_calls=False, n_calls=0, content_len=328, content_preview='`⏰ update`\n\n`⏰ list`\n\nIch habe den Schedule des Jobs `cdaf68524204` auf `*/4 * * * *` aktualisiert, um das Intervall auf 4 Minuten zu ändern.\n\n**Verifizierung:**\n* **Job-ID:** `cdaf68524204`\n* **Sched'

The broken session returns zero tool calls. The visible indicators are literal text in assistant_message.content, generated by the model imitating the marker pattern it has seen many times in past assistant turns. The "successful" status reported to the user is fully hallucinated — no tool ever runs and jobs.json is never modified.

There are no errors in gateway.log or error.log.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

N/A (CLI only)

Operating System

Linux Mint 22.3

Python Version

3.13.5 (in Docker)

Hermes Version

0.8.0 (2026.4.8)

Relevant Logs / Traceback

Root Cause Analysis (optional)

gateway/platforms/api_server.py, in _handle_chat_completions._on_tool_progress (around lines 567–576):

def _on_tool_progress(event_type, name, preview, args, **kwargs):
    """Inject tool progress into the SSE stream for Open WebUI."""
    if event_type != "tool.started":
        return
    if name.startswith("_"):
        return
    from agent.display import get_tool_emoji
    emoji = get_tool_emoji(name)
    label = preview or name
    _stream_q.put(f"\n`{emoji} {label}`\n")

_stream_q is the same queue that feeds the streaming delta.content chunks in _write_sse_chat_completion. Anything pushed onto it is sent to the client as part of the assistant's textual response. OpenAI-compatible frontends (Open WebUI, LobeChat, LibreChat, etc.) store this content verbatim and send it back as the assistant turn on subsequent requests.

The model then sees its own past "responses" containing both progress markers and real answers. With enough examples in context, in-context learning takes over and the model starts producing markers as text instead of issuing actual tool_calls. This is a well-known LLM phenomenon: strong in-context patterns can override system-prompt instructions, and smaller / less capable models are more susceptible.

Proposed Fix (optional)

Tool progress should not be mixed into delta.content. Options in order of preference:

  1. Don't inject progress markers at all in the API server adapter. Frontends that want progress visualization can use the structured /v1/runs event stream instead.
  2. Send progress as a separate non-content SSE event (e.g. a custom event: line) that compatible frontends can display but won't store as message text.
  3. Strip injected markers from the assistant content before returning the final chat.completion response, AND filter them out when reading conversation history from incoming requests. Most backward-compatible — markers can still appear in real-time during streaming, but won't pollute the stored history.

Workaround

  • Start a new chat in Open WebUI when the agent stops calling tools.
  • Or manually edit the recent assistant messages in Open WebUI and remove the `⏰ tool_name` markers from the message content.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Metadata

Assignees

No one assigned

    Labels

    type/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions