Skip to content

Bug: Ollama returns finish_reason='stop' on truncated GLM responses, causing agent to silently drop final output #10711

@peter20201011-cmyk

Description

@peter20201011-cmyk

Bug Description

When using GLM models (e.g., glm-4, glm-5.1) served through an Ollama proxy, the agent sometimes exits the response loop prematurely — the user sees tool-call progress in the logs but never receives the final text response.

Root Cause

Two issues combine to cause this:

1. Ollama returns finish_reason="stop" on truncated output

Ollama's OpenAI-compatible API returns finish_reason="stop" when a GLM model's output is actually truncated (hit max output tokens). Per the OpenAI API spec, truncated responses should return finish_reason="length". Hermes correctly handles "length" by requesting a continuation, but "stop" causes Hermes to treat the truncated output as complete and exit the loop.

2. Recent change made the behavior more visible

Commit 0d25e1c1 (PR #10472) restricted _last_content_with_tools fallback to housekeeping-only tool turns. Previously, when GLM returned content alongside substantive tools (terminal, search_files, etc.) and then went silent on the next turn, the fallback would display that mid-task narration as a "final answer." After the change, this fallback no longer applies to substantive tools, so the model's silence is exposed — the user sees tool execution log but no final response.

In the old version, the bug was masked by a fallback that displayed incomplete narration. The new version correctly removes that fallback (it was showing "I'll scan the directory..." as a final answer), but now there's nothing to catch the GLM truncation case.

Evidence

From agent.log, tasks that should produce lengthy responses show very short final outputs:

api_calls=22  response=171 chars
api_calls=17  response=132 chars

The model was making tool calls throughout the conversation but the final text response was cut off mid-sentence without any natural ending punctuation.

Suggested Fix

Add a truncation-detection heuristic for models that incorrectly report finish_reason="stop" on truncated output:

  1. When finish_reason="stop" with no tool_calls, check if the response ends with natural punctuation (., , !, , ?, \n, :, ))
  2. If no natural ending → treat as truncated and append a continuation message
  3. Retry up to 2 times to prevent infinite loops

This is defensive programming on the Hermes side. The proper fix should be in Ollama (returning "length" when appropriate), but since Hermes may encounter other non-compliant backends, adding a generic truncation-detection heuristic improves robustness.

This approach is consistent with the existing _is_qwen_portal() and _qwen_prepare_chat_messages() patterns in the codebase for model-specific workarounds.

Environment

  • Hermes Agent v0.9.0 (commit fb903b8)
  • Model: glm-5.1:cloud via Ollama proxy
  • Backend: Ollama on localhost:11434

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions