Skip to content

fix: empty response recovery for reasoning models (mimo, qwen, GLM)#8609

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-81f1601a
Apr 12, 2026
Merged

fix: empty response recovery for reasoning models (mimo, qwen, GLM)#8609
teknium1 merged 1 commit into
mainfrom
hermes/hermes-81f1601a

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Fixes the (empty) response bug affecting open reasoning models on OpenRouter (mimo-v2-pro, qwen3.5, GLM).

Root cause: Models like mimo-v2-pro always return reasoning/reasoning_details fields via OpenRouter — even on transient empty responses. The recovery chain had a not _has_structured guard on the retry path that blocked retries for ANY model with reasoning after the 2 prefill attempts were exhausted. Combined with retry counters that never reset during tool-calling turns, this caused permanent (empty) in long sessions.

Fixes

1. Allow retries after prefill exhaustion

# Before: retry path blocked when reasoning present
if _truly_empty and not _has_structured and self._empty_content_retries < 3:

# After: allow retries once prefill is exhausted
_prefill_exhausted = _has_structured and self._thinking_prefill_retries >= 2
if _truly_empty and (not _has_structured or _prefill_exhausted) and self._empty_content_retries < 3:

2. Reset counters on tool-call recovery
When prefill succeeds (model makes proper tool calls after prefill), reset both _thinking_prefill_retries and _empty_content_retries. Previously these only reset on successful content, never during tool-calling turns — so a model cycling empty→prefill→tools→empty burned all prefill attempts permanently.

3. Strip think blocks before truly-empty check
_truly_empty now uses _strip_think_blocks() so inline <think> content doesn't make the string falsely non-empty.

Reproduction

qwen/qwen3.5-9b on OpenRouter reliably produces the triggering response: model emits tool calls as XML inside its reasoning field instead of proper function calls. Response has content=None, tool_calls=None, reasoning='...<tool_call>...'. Prefill recovery works on first attempt, but counter accumulation caused permanent (empty) in sustained sessions.

Test plan

  • Updated 2 existing tests to match new recovery depth (6 attempts instead of 3)
  • python -m pytest tests/run_agent/test_run_agent.py -k 'empty or prefill or thinking' — 22 passed

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
@teknium1 teknium1 merged commit d6785dc into main Apr 12, 2026
4 of 6 checks passed
@teknium1 teknium1 deleted the hermes/hermes-81f1601a branch April 12, 2026 22:38
forsonny pushed a commit to forsonny/hermes-agent that referenced this pull request Apr 12, 2026
…o, qwen, GLM)

Upstream commit d6785dc (PR NousResearch#8609):
- Fix empty response retry logic that blocked retries for reasoning models
  after prefill exhaustion. Models like mimo-v2-pro always populate reasoning
  fields via OpenRouter, so the old not _has_structured guard prevented
  retries for every reasoning model after prefill.
- Remove tool timing footer (upstream cleanup)
- Remove format_tool_timing_footer import
- Replace logger.debug with pass for non-critical failures
- 134 files changed, net -16K lines (mostly test/data cleanup)

Self-improve: automated upstream merge
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…ousResearch#8609)

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…ousResearch#8609)

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ousResearch#8609)

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…ousResearch#8609)

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ousResearch#8609)

Three fixes for the (empty) response bug affecting open reasoning models:

1. Allow retries after prefill exhaustion — models like mimo-v2-pro always
   populate reasoning fields via OpenRouter, so the old 'not _has_structured'
   guard on the retry path blocked retries for EVERY reasoning model after
   the 2 prefill attempts.  Now: 2 prefills + 3 retries = 6 total attempts
   before (empty).

2. Reset prefill/retry counters on tool-call recovery — the counters
   accumulated across the entire conversation, never resetting during
   tool-calling turns.  A model cycling empty→prefill→tools→empty burned
   both prefill attempts and the third empty got zero recovery.  Now
   counters reset when prefill succeeds with tool calls.

3. Strip think blocks before _truly_empty check — inline <think> content
   made the string non-empty, skipping both retry paths.

Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models.
Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead
of proper function calls, causing content=None + tool_calls=None + reasoning
with embedded <tool_call> XML.  Prefill recovery works but counter
accumulation caused permanent (empty) in long sessions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant