[Bug] Ollama concurrency limit causes literal '(empty)' responses on Telegram #6559

**Date observed:** 2026-04-09  
**Hermes version:** v0.7.0 (2026.4.3)  
**Model in use:** `minimax-m2.7:cloud` via `http://localhost:11434/v1` (Ollama custom provider)  
**Platform where bug occurs:** Telegram (gateway mode)  
**Not reproducible via:** CLI (`hermes chat -q "..."`) — works fine there

---

## Symptoms

- Hermes replies with the literal text `(empty)` on Telegram.
- Happens mid-conversation, not on the first message.
- Can happen multiple times in a row for the same message.

---

## Root Cause

The Ollama cloud endpoint allows at most **3 concurrent model calls**. When multiple agents (Hermes + OpenClaw) run against the same endpoint at once and exceed that limit, the API returns HTTP 200 with an empty `choices[0].message.content` and no tool calls or reasoning.

When this happens, `run_agent.py` (around line 8896) substitutes the literal string `"(empty)"` as the response, which gets forwarded to the user on Telegram verbatim.

The existing retry logic (`_try_activate_fallback`) only triggers on HTTP-level failures (null response, missing `choices`), not on a successful HTTP 200 response where the model returned empty content.
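
The gap between the two cases can be sketched as follows. This is a minimal illustration, not code from `run_agent.py`: `classify_completion` is a hypothetical helper, the payload shape is assumed to follow the OpenAI-compatible chat completion format served at `/v1`, and `reasoning_content` is an assumed field name for reasoning text.

```python
from typing import Any, Optional


def classify_completion(response: Optional[dict]) -> str:
    """Classify a chat-completion payload (hypothetical helper, for illustration).

    "transport_failure": null response or missing choices, the cases the
                         issue says the existing fallback already covers.
    "empty_success":     HTTP 200 payload with no content, tool calls, or
                         reasoning, the case that currently slips through.
    "ok":                anything the agent can act on.
    """
    if not response or not response.get("choices"):
        return "transport_failure"

    message: dict[str, Any] = response["choices"][0].get("message") or {}
    content = (message.get("content") or "").strip()
    tool_calls = message.get("tool_calls") or []
    # Assumed field name; providers differ in how they expose reasoning text.
    reasoning = (message.get("reasoning_content") or "").strip()

    if not content and not tool_calls and not reasoning:
        return "empty_success"
    return "ok"
```

Only the `"empty_success"` branch corresponds to the behaviour reported here; a retry or fallback keyed on that result would cover the HTTP 200 case.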

### Relevant code path (`run_agent.py`):
```
line ~8835: final_response = assistant_message.content or ""
line ~8840: if not self._has_content_after_think_block(final_response):
              ...
              # After exhausting prefill retries:
line ~8896:   assistant_msg["content"] = "(empty)"
line ~8905:   final_response = "(empty)"
              break  # ← sent to user as-is
```

---

## Proposed Fix

### Part 1 — Exponential backoff retry (in `run_agent.py`, around line 9121)

After incrementing `_empty_content_retries`, retry with backoff before giving up:

```python
if _truly_empty and not _has_structured and self._empty_content_retries < 3:
    self._empty_content_retries += 1
    backoff = min(5 * (2 ** (self._empty_content_retries - 1)), 20)  # 5s, 10s, 20s (capped at 20s)
    self._vprint(
        f"{self.log_prefix}↻ Empty response — retrying ({self._empty_content_retries}/3) "
        f"after {backoff}s backoff..."
    )
    time.sleep(backoff)
    continue
```
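
To check the delays that expression produces, it can be evaluated on its own. The sketch below pulls the formula into a standalone function; `backoff_seconds` and its parameter names are hypothetical and used only for illustration.

```python
def backoff_seconds(retry: int, base: int = 5, cap: int = 20) -> int:
    """Delay before the given retry (1-based): base doubles each time, up to cap."""
    return min(base * (2 ** (retry - 1)), cap)


schedule = [backoff_seconds(n) for n in (1, 2, 3)]
print(schedule)       # [5, 10, 20]
print(sum(schedule))  # 35
```

With a base of 5 and a cap of 20, three consecutive empty responses add up to 35 seconds of waiting before the final message is sent, which may be worth weighing against how long a Telegram user is expected to wait.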

### Part 2 — User-friendly final message

If all retries are exhausted, replace `(empty)` with a user-friendly message:
```python
# Instead of: final_response = "(empty)"
# Use:
final_response = "⚠️ The model returned an empty response. Please try again in a moment."
```

### Part 3 — Status callback (optional enhancement)

On the first empty retry, signal the gateway so the user sees a typing indicator:
```python
if self._empty_content_retries == 1 and self.status_callback:
    try:
        self.status_callback("typing")  # or ("warning", "Model overloaded, retrying...")
    except Exception:
        pass
```

---

## Testing

**To reproduce:** Run both Hermes (gateway) and OpenClaw simultaneously with `minimax-m2.7:cloud` against the same Ollama endpoint that has a 3-request parallel limit. Any message after both are active will likely trigger the empty response.

**Proposed test cases** (new file `tests/test_empty_response.py`):
1. Mock API client to return empty content once, then valid response → verify 1 retry with backoff succeeds
2. Mock API client to return empty content on every call → verify 3 retries, then graceful failure
3. Mock API client to return reasoning-only (no visible content) → verify falls back to reasoning text
4. Verify: after 3 exhausted retries, response is user-friendly message NOT literal `"(empty)"`
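
A rough sketch of how cases 1, 2 and 4 might look is below. It does not exercise the real agent: `respond_with_retry` is a hypothetical stand-in for the retry loop in `run_agent.py`, `scripted` is a hypothetical fake client, and the sleep function is injected so the tests run without waiting. Case 3 (reasoning-only) is omitted because it depends on how the agent exposes reasoning text.

```python
from typing import Callable, Iterable, List

FRIENDLY_MESSAGE = "⚠️ The model returned an empty response. Please try again in a moment."


def respond_with_retry(
    call_model: Callable[[], str],
    max_retries: int = 3,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> str:
    """Hypothetical stand-in for the agent loop: retry empty content with backoff."""
    retries = 0
    while True:
        content = (call_model() or "").strip()
        if content:
            return content
        if retries >= max_retries:
            # Retries exhausted: return a readable message, never "(empty)".
            return FRIENDLY_MESSAGE
        retries += 1
        sleep(min(5 * (2 ** (retries - 1)), 20))


def scripted(replies: Iterable[str]) -> Callable[[], str]:
    """Fake API client that returns the given replies in order."""
    remaining = iter(replies)
    return lambda: next(remaining)


def test_single_empty_then_valid():
    delays: List[float] = []
    result = respond_with_retry(scripted(["", "hello"]), sleep=delays.append)
    assert result == "hello"
    assert delays == [5]  # exactly one retry, after the first backoff step


def test_always_empty_gives_friendly_message():
    delays: List[float] = []
    # Four empty replies: the initial call plus three retries.
    result = respond_with_retry(scripted([""] * 4), sleep=delays.append)
    assert result == FRIENDLY_MESSAGE
    assert result != "(empty)"
    assert delays == [5, 10, 20]
```

Injecting `sleep` keeps the tests fast and lets them assert on the backoff schedule directly; in the real test file the equivalent would probably be patching `time.sleep`.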

---

## Workaround (current)

- Avoid running OpenClaw and Hermes simultaneously to stay under the 3-parallel-request limit.
- Configure a `fallback_model` in `~/.hermes/config.yaml` pointing to a second model/provider that can absorb overflow.

---

## Evidence

- Session file: `~/.hermes/sessions/session_20260409_074150_69a8f1d3.json` — multiple assistant messages with `"content": "(empty)"` at messages 56, 58, 111, 113, 115, 142, 144.
- Agent log entries at time of occurrence (`~/.hermes/logs/agent.log`):
  ```
  10:13:28 response ready: platform=telegram chat=11****9 time=7.3s api_calls=1 response=7 chars
  ```
  (7 chars = "(empty)")
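
To check other session files for the same symptom, something like the following could be used. The session file layout is an assumption here (a top-level `messages` list whose entries carry `role` and `content`), and both function names are hypothetical.

```python
import json
from pathlib import Path


def empty_reply_indexes(session: dict) -> list[int]:
    """Indexes of assistant messages whose content is the literal "(empty)".

    Assumes the session has a top-level "messages" list of dicts with
    "role" and "content" keys; adjust if the real layout differs.
    """
    return [
        index
        for index, message in enumerate(session.get("messages", []))
        if message.get("role") == "assistant" and message.get("content") == "(empty)"
    ]


def scan_session_file(path: str) -> list[int]:
    """Load a session JSON file and return the indexes of "(empty)" replies."""
    data = json.loads(Path(path).expanduser().read_text(encoding="utf-8"))
    return empty_reply_indexes(data)
```

Calling `scan_session_file` on a path under `~/.hermes/sessions/` would then list the affected message positions, provided the assumed layout holds.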
