Bug Description
When using GLM models (e.g., glm-4, glm-5.1) served through an Ollama proxy, the agent sometimes exits the response loop prematurely — the user sees tool-call progress in the logs but never receives the final text response.
Root Cause
Two issues combine to cause this:
1. Ollama returns finish_reason="stop" on truncated output
Ollama's OpenAI-compatible API returns finish_reason="stop" when a GLM model's output is actually truncated (hit max output tokens). Per the OpenAI API spec, truncated responses should return finish_reason="length". Hermes correctly handles "length" by requesting a continuation, but "stop" causes Hermes to treat the truncated output as complete and exit the loop.
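To make the failure mode concrete, here is a minimal sketch of how a generic agent loop might branch on finish_reason. This is not Hermes's actual code: next_action and its return values are hypothetical names used only for illustration.

```python
from typing import Optional


def next_action(finish_reason: Optional[str], has_tool_calls: bool) -> str:
    """Decide what a generic agent loop does after one model turn.

    Hypothetical helper for illustration only; not taken from Hermes.
    """
    if has_tool_calls:
        # Tool calls are executed first, then the loop continues.
        return "run_tools"
    if finish_reason == "length":
        # Spec-compliant truncation signal: ask the model to continue.
        return "continue"
    # "stop" (or anything else) is treated as a complete answer.
    return "finish"
```

With a compliant backend, a truncated turn yields "continue". With the behavior described above, the same truncated turn arrives as "stop" and falls through to "finish", so the loop exits with a partial response.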
2. Recent change made the behavior more visible
Commit 0d25e1c1 (PR #10472) restricted the _last_content_with_tools fallback to housekeeping-only tool turns. Previously, when GLM returned content alongside substantive tools (terminal, search_files, etc.) and then went silent on the next turn, the fallback would display that mid-task narration as a "final answer." After the change, the fallback no longer applies to substantive tools, so the model's silence is exposed: the user sees the tool execution log but no final response.
In the old version, the bug was masked by a fallback that displayed incomplete narration. The new version correctly removes that fallback (it was showing "I'll scan the directory..." as a final answer), but now there's nothing to catch the GLM truncation case.
Evidence
From agent.log, tasks that should produce lengthy responses show very short final outputs:
api_calls=22 response=171 chars
api_calls=17 response=132 chars
The model was making tool calls throughout the conversation but the final text response was cut off mid-sentence without any natural ending punctuation.
Suggested Fix
Add a truncation-detection heuristic for models that incorrectly report finish_reason="stop" on truncated output:
- When finish_reason="stop" with no tool_calls, check if the response ends with natural punctuation (., 。, !, ?, ?, \n, :, ))
- If no natural ending → treat as truncated and append a continuation message
- Retry up to 2 times to prevent infinite loops
This is defensive programming on the Hermes side. The proper fix should be in Ollama (returning "length" when appropriate), but since Hermes may encounter other non-compliant backends, adding a generic truncation-detection heuristic improves robustness.
This approach is consistent with the existing _is_qwen_portal() and _qwen_prepare_chat_messages() patterns in the codebase for model-specific workarounds.
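The heuristic proposed in this section could look roughly like the sketch below. It is an illustration under stated assumptions, not a patch against Hermes: Turn, looks_truncated, collect_final_text, and request_continuation are hypothetical names, the ending set mirrors the list in this issue, and treating an empty response as "not truncated" is an assumption made here to keep the silent-model case separate.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Endings treated as "natural", mirroring the list in this issue (tune as needed).
NATURAL_ENDINGS = (".", "。", "!", "?", "\n", ":", ")")
MAX_CONTINUATIONS = 2  # retry cap to prevent infinite loops


@dataclass
class Turn:
    """Hypothetical container for one model turn."""
    text: str
    finish_reason: Optional[str]
    has_tool_calls: bool = False


def looks_truncated(text: str, finish_reason: Optional[str], has_tool_calls: bool) -> bool:
    """Return True when a "stop" turn appears to have been cut off mid-sentence."""
    if finish_reason != "stop" or has_tool_calls:
        return False
    if not text:
        # Assumption: empty responses are a separate case, not handled here.
        return False
    return not text.endswith(NATURAL_ENDINGS)


def collect_final_text(first: Turn, request_continuation: Callable[[str], Turn]) -> str:
    """Append up to MAX_CONTINUATIONS continuations while the text looks truncated."""
    text, turn = first.text, first
    for _ in range(MAX_CONTINUATIONS):
        if not looks_truncated(text, turn.finish_reason, turn.has_tool_calls):
            break
        # request_continuation receives the text so far and returns the next turn.
        turn = request_continuation(text)
        text += turn.text
    return text
```

The check deliberately runs on the unstripped text so that a trailing newline still counts as a natural ending, as listed above. A real integration would also need to decide how to gate this by model or backend, in the same spirit as the existing Qwen workarounds.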
Environment
- Hermes Agent v0.9.0 (commit fb903b8)
- Model: glm-5.1:cloud via Ollama proxy
- Backend: Ollama on localhost:11434
Related
- Commit 0d25e1c1 (PR #10472, "fix: prevent premature loop exit when weak models return empty after substantive tool calls"): the change that made this bug more visible
- finish_reason returning wrong values (null instead of stop/length)
- _is_qwen_portal() / _qwen_prepare_chat_messages() workarounds already in the codebase