[Bug]: Tool progress markers in SSE content corrupt model behavior over time (Open WebUI / OpenAI-compatible API)

### Bug Description

The Hermes API server adapter (`/v1/chat/completions`) injects tool-progress markers like `` `⏰ list` `` directly into the SSE `delta.content` stream. OpenAI-compatible frontends like Open WebUI treat `delta.content` as the assistant's literal response text, so these markers become a permanent part of the stored assistant message and are sent back to Hermes (as the assistant turn) on every subsequent request.

After enough turns, the conversation history contains many assistant messages with embedded `` `⏰ tool_name` `` markers. Through in-context learning, the model starts treating these markers as the natural response format for this conversation. Eventually it generates the markers as plain text instead of issuing actual tool calls, then makes up a plausible-looking response describing what it would have done. The user sees what looks like a normal tool-using response, but no tool ever runs and underlying state is never modified.

Smaller / local models (e.g. a 26B Gemma running on llama.cpp) appear especially vulnerable because their tool-calling behavior is more easily overridden by strong in-context patterns.


### Steps to Reproduce

1. Configure Hermes to expose the API server adapter (`/v1/chat/completions`).
2. Connect Open WebUI (or any OpenAI-compatible frontend that captures `delta.content` into the assistant message) to Hermes.
3. Use a smaller open-weights model via llama.cpp / Ollama / similar (tested with `gemma-4-26B-A4B-it-Q5_K_M.gguf` on llama.cpp, `--ctx-size 65536 --parallel 1`).
4. In a single chat, repeatedly use a tool that has visible progress markers (the cron tool is the easiest — `⏰`):
   - "List my cron jobs"
   - "Create a cron job that runs every 5 minutes and prints hello"
   - "Show me the jobs again"
   - "Change the schedule of that job to every 10 minutes"
   - …continue for ~10–15 turns
5. After enough turns, ask the agent to perform another action on the cron jobs (list, update, delete).


### Expected Behavior

Tool-progress markers should be a UI/visualization concern, not part of the stored assistant message content. The model should not see `` `⏰ list` `` text in its own past responses, so it cannot learn to imitate that pattern instead of issuing real tool calls.


### Actual Behavior

- The agent shows tool-progress markers in the chat (`` `⏰ list` ``, `` `⏰ update` ``, etc.) exactly as it did in earlier successful turns.
- The agent confidently reports success and may even quote specific values (job IDs, schedules, etc.).
- **No tool is actually called.** Underlying state files (`jobs.json`) are never modified, listing returns stale or hallucinated content.
- The session never recovers — every subsequent tool request behaves the same way.
- Starting a **new** chat in Open WebUI always works.
- Restarting the Hermes container sometimes appears to "fix" the same session but this is sampling variance, not a real fix; the broken session can resume failing on the next request.
- Inspecting the assistant messages directly in Open WebUI's "edit message" view confirms the markers are stored as literal text inside the message content, e.g.

  ```
  `⏰ list`

  I found one active cron job:
  ...
  ```

I then patched `cron/jobs.py`, `tools/cronjob_tools.py`, and `run_agent.py` to log every tool call, every `tool_progress_callback` fire, and the structure of the LLM's response (whether it has tool_calls, the content preview, etc.).

**Working session (1–2 messages of history), asking to list cron jobs:**

```
[2026-04-10T04:50:52.639916] [AGENT] tool_progress_callback fired (sequential): tool.started name=cronjob
[2026-04-10T04:50:52.640096] [TOOL] cronjob(action='list', job_id=None)
[2026-04-10T04:50:52.640286] [JOBS] load_jobs() JOBS_FILE=/opt/data/cron/jobs.json exists=True
```

**Broken session (~15 messages of history), asking to update a cron job's schedule:**

```
[2026-04-10T05:11:36.620086] [AGENT] RESPONSE: has_tool_calls=False, n_calls=0, content_len=328, content_preview='`⏰ update`\n\n`⏰ list`\n\nIch habe den Schedule des Jobs `cdaf68524204` auf `*/4 * * * *` aktualisiert, um das Intervall auf 4 Minuten zu ändern.\n\n**Verifizierung:**\n* **Job-ID:** `cdaf68524204`\n* **Sched'
```

The broken session returns **zero tool calls**. The visible `⏰` indicators are literal text in `assistant_message.content`, generated by the model imitating the marker pattern it has seen many times in past assistant turns. The "successful" status reported to the user is fully hallucinated — no tool ever runs and `jobs.json` is never modified.

There are no errors in `gateway.log` or `error.log`.

### Affected Component

Agent Core (conversation loop, context compression, memory)

### Messaging Platform (if gateway-related)

N/A (CLI only)

### Operating System

Linux Mint 22.3

### Python Version

3.13.5 (in Docker)

### Hermes Version

0.8.0 (2026.4.8)

### Relevant Logs / Traceback

```shell

```

### Root Cause Analysis (optional)

`gateway/platforms/api_server.py`, in `_handle_chat_completions._on_tool_progress` (around lines 567–576):

```python
def _on_tool_progress(event_type, name, preview, args, **kwargs):
    """Inject tool progress into the SSE stream for Open WebUI."""
    if event_type != "tool.started":
        return
    if name.startswith("_"):
        return
    from agent.display import get_tool_emoji
    emoji = get_tool_emoji(name)
    label = preview or name
    _stream_q.put(f"\n`{emoji} {label}`\n")
```

`_stream_q` is the same queue that feeds the streaming `delta.content` chunks in `_write_sse_chat_completion`. Anything pushed onto it is sent to the client as part of the assistant's textual response. OpenAI-compatible frontends (Open WebUI, LobeChat, LibreChat, etc.) store this content verbatim and send it back as the assistant turn on subsequent requests.

The model then sees its own past "responses" containing both progress markers and real answers. With enough examples in context, in-context learning takes over and the model starts producing markers as text instead of issuing actual `tool_calls`. This is a well-known LLM phenomenon: strong in-context patterns can override system-prompt instructions, and smaller / less capable models are more susceptible.


### Proposed Fix (optional)

Tool progress should not be mixed into `delta.content`. Options in order of preference:

1. **Don't inject progress markers at all** in the API server adapter. Frontends that want progress visualization can use the structured `/v1/runs` event stream instead.
2. **Send progress as a separate non-content SSE event** (e.g. a custom `event:` line) that compatible frontends can display but won't store as message text.
3. **Strip injected markers from the assistant content** before returning the final `chat.completion` response, AND filter them out when reading conversation history from incoming requests. Most backward-compatible — markers can still appear in real-time during streaming, but won't pollute the stored history.

## Workaround

- Start a new chat in Open WebUI when the agent stops calling tools.
- Or manually edit the recent assistant messages in Open WebUI and remove the `` `⏰ tool_name` `` markers from the message content.

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Tool progress markers in SSE content corrupt model behavior over time (Open WebUI / OpenAI-compatible API) #6972

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Workaround

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: Tool progress markers in SSE content corrupt model behavior over time (Open WebUI / OpenAI-compatible API) #6972

Description

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Workaround

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions