[Bug] Ollama concurrency limit causes literal '(empty)' responses on Telegram #6559

@davidvv

Description

Date observed: 2026-04-09
Hermes version: v0.7.0 (2026.4.3)
Model in use: minimax-m2.7:cloud via http://localhost:11434/v1 (Ollama custom provider)
Platform where bug occurs: Telegram (gateway mode)
Not reproducible via: CLI (hermes chat -q "...") — works fine there


Symptoms

  • Hermes replies with the literal text (empty) on Telegram.
  • Happens mid-conversation, not on the first message.
  • Can happen multiple times in a row for the same message.

Root Cause

The Ollama cloud endpoint has a parallel request limit of 3 concurrent model calls. When multiple agents (Hermes + OpenClaw) are running simultaneously against the same endpoint, the API returns HTTP 200 with an empty choices[0].message.content — no tool calls, no reasoning.

When this happens, run_agent.py (around line 8896) substitutes the literal string "(empty)" as the response, which gets forwarded to the user on Telegram verbatim.

The existing retry logic (_try_activate_fallback) only triggers on HTTP-level failures (null response, missing choices), not on a successful HTTP 200 response where the model returned empty content.

Relevant code path (run_agent.py):

line ~8835: final_response = assistant_message.content or ""
line ~8840: if not self._has_content_after_think_block(final_response):
              ...
              # After exhausting prefill retries:
line ~8896:   assistant_msg["content"] = "(empty)"
line ~8905:   final_response = "(empty)"
              break  # ← sent to user as-is
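
The root cause above depends on telling a usable reply apart from an HTTP 200 that carries nothing. A minimal sketch of that check against an OpenAI-compatible response dict follows. The helper name is_empty_completion is hypothetical, and the "reasoning" field name is an assumption (providers differ on where they put reasoning text); neither is taken from run_agent.py.

```python
from typing import Any, Mapping


def is_empty_completion(response: Mapping[str, Any]) -> bool:
    """Return True when a chat completion carries nothing usable.

    Hypothetical helper: "usable" means visible content, tool calls,
    or reasoning text on the first choice.
    """
    choices = response.get("choices") or []
    if not choices:
        # Missing choices: the malformed case, distinct from empty content.
        return True

    message = choices[0].get("message") or {}
    content = (message.get("content") or "").strip()
    has_tool_calls = bool(message.get("tool_calls"))
    # ASSUMPTION: reasoning text lives under "reasoning"; adjust per provider.
    has_reasoning = bool(str(message.get("reasoning") or "").strip())

    return not (content or has_tool_calls or has_reasoning)
```

A check of this shape would let the agent loop treat an empty HTTP 200 the same way it treats a malformed response, instead of falling through to the "(empty)" placeholder.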

Proposed Fix

Part 1 — Exponential backoff retry (in run_agent.py, around line 9121)

After incrementing _empty_content_retries, retry with backoff before giving up:

if _truly_empty and not _has_structured and self._empty_content_retries < 3:
    self._empty_content_retries += 1
    backoff = min(5 * (2 ** (self._empty_content_retries - 1)), 20)  # 5s, 10s, 20s (capped at 20s)
    self._vprint(
        f"{self.log_prefix}↻ Empty response — retrying ({self._empty_content_retries}/3) "
        f"after {backoff}s backoff..."
    )
    time.sleep(backoff)
    continue
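
To make the delays concrete, the backoff expression can be pulled into a standalone function and evaluated for each retry. This is only an illustration: backoff_delay is a hypothetical name, and the base of 5 and cap of 20 are copied from the expression in the snippet above.

```python
def backoff_delay(attempt: int, base: float = 5.0, cap: float = 20.0) -> float:
    """Capped exponential backoff: base * 2^(attempt - 1), never above cap.

    `attempt` is 1-based, matching _empty_content_retries after its increment.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(base * 2 ** (attempt - 1), cap)


schedule = [backoff_delay(n) for n in (1, 2, 3)]
print(schedule)       # [5.0, 10.0, 20.0]
print(sum(schedule))  # 35.0 seconds of waiting in the worst case
```

With a base of 5 the three retries wait 5, 10, and 20 seconds. A base of 2 would give 2, 4, and 8 seconds and never reach the cap.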

Part 2 — User-friendly final message

If all retries are exhausted, replace (empty) with a user-friendly message:

# Instead of: final_response = "(empty)"
# Use:
final_response = "⚠️ The model returned an empty response. Please try again in a moment."

Part 3 — Status callback (optional enhancement)

On the first empty retry, signal the gateway so the user sees a typing indicator:

if self._empty_content_retries == 1 and self.status_callback:
    try:
        self.status_callback("typing")  # or ("warning", "Model overloaded, retrying...")
    except Exception:
        pass

Testing

To reproduce: Run both Hermes (gateway) and OpenClaw simultaneously with minimax-m2.7:cloud against the same Ollama endpoint that has a 3-request parallel limit. Any message after both are active will likely trigger the empty response.

Proposed test cases (new file tests/test_empty_response.py):

  1. Mock API client to return empty content once, then valid response → verify 1 retry with backoff succeeds
  2. Mock API client to return empty content on every call → verify exactly 3 retries, then graceful failure
  3. Mock API client to return reasoning-only (no visible content) → verify falls back to reasoning text
  4. Verify: after 3 exhausted retries, response is user-friendly message NOT literal "(empty)"
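
Cases 1, 2, and 4 above could be sketched along these lines. Because the real agent loop is not reproduced here, run_with_retries is a hypothetical stand-in that mirrors the proposed behavior (3 retries, capped backoff, friendly final message), with sleep injected so the tests do not actually wait.

```python
from unittest.mock import Mock

FRIENDLY = "⚠️ The model returned an empty response. Please try again in a moment."


def run_with_retries(call_model, sleep, max_retries=3):
    """Hypothetical stand-in for the agent loop's empty-response handling."""
    retries = 0
    while True:
        content = (call_model() or "").strip()
        if content:
            return content
        if retries >= max_retries:
            return FRIENDLY  # retries exhausted: never return "(empty)"
        retries += 1
        sleep(min(5 * 2 ** (retries - 1), 20))


def test_one_empty_then_valid():
    call, sleep = Mock(side_effect=["", "hello"]), Mock()
    assert run_with_retries(call, sleep) == "hello"
    sleep.assert_called_once_with(5)


def test_always_empty_gives_friendly_message():
    call, sleep = Mock(return_value=""), Mock()
    result = run_with_retries(call, sleep)
    assert result == FRIENDLY
    assert result != "(empty)"
    assert call.call_count == 4  # 1 initial call + 3 retries
    assert [c.args[0] for c in sleep.call_args_list] == [5, 10, 20]
```

In the real test file the mocks would wrap the agent's API client rather than a bare callable; the assertions on call count and sleep arguments carry over unchanged.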

Workaround (current)

  • Avoid running OpenClaw and Hermes simultaneously to stay under the 3-parallel-request limit.
  • Configure a fallback_model in ~/.hermes/config.yaml pointing to a second model/provider that can absorb overflow.

Evidence

  • Session file: ~/.hermes/sessions/session_20260409_074150_69a8f1d3.json — multiple assistant messages with "content": "(empty)" at messages 56, 58, 111, 113, 115, 142, 144.
  • Agent log entries at time of occurrence (~/.hermes/logs/agent.log):
    10:13:28 response ready: platform=telegram chat=11****9 time=7.3s api_calls=1 response=7 chars
    
    (7 chars = "(empty)")

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • platform/telegram: Telegram bot adapter
  • provider/ollama: Ollama / local models
  • sweeper:implemented-on-main: Sweeper, behavior already present on current main
  • type/bug: Something isn't working
