Bug: Ollama returns finish_reason='stop' on truncated GLM responses, causing agent to silently drop final output #10711

## Bug Description

When using GLM models (e.g., `glm-4`, `glm-5.1`) served through an Ollama proxy, the agent sometimes exits the response loop prematurely: the user sees tool-call progress in the logs but never receives the final text response.

## Root Cause

Two issues combine to cause this:

### 1. Ollama returns `finish_reason="stop"` on truncated output
Ollama's OpenAI-compatible API returns `finish_reason="stop"` even when a GLM model's output was actually **truncated** because it hit the maximum output token limit. Per the OpenAI API spec, truncated responses should return `finish_reason="length"`. Hermes correctly handles `"length"` by requesting a continuation, but on `"stop"` it treats the truncated output as complete and exits the loop.
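To illustrate why a misreported `finish_reason` ends the loop early, here is a minimal sketch of a continuation loop that trusts the backend. It is not Hermes's actual code: `collect_response`, the reply dictionary shape, and both fake backends are hypothetical names used only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical reply shape for this sketch: {"content": str, "finish_reason": str}
Reply = Dict[str, str]


def collect_response(call_backend: Callable[[List[str]], Reply], max_turns: int = 5) -> str:
    """Keep requesting continuations while the backend reports "length"."""
    chunks: List[str] = []
    for _ in range(max_turns):
        reply = call_backend(chunks)
        chunks.append(reply["content"])
        if reply["finish_reason"] != "length":
            break  # "stop" is trusted as a complete answer
    return "".join(chunks)


def compliant_backend(chunks: List[str]) -> Reply:
    """Fake backend that reports "length" on the truncated first chunk."""
    parts = [("The scan found three ", "length"), ("stale lock files.", "stop")]
    content, reason = parts[len(chunks)]
    return {"content": content, "finish_reason": reason}


def misreporting_backend(chunks: List[str]) -> Reply:
    """Fake backend that reports "stop" even though the text is cut off."""
    return {"content": "The scan found three ", "finish_reason": "stop"}


print(repr(collect_response(compliant_backend)))
print(repr(collect_response(misreporting_backend)))
```

With the compliant backend the loop asks for a continuation and returns the full sentence. With the misreporting backend it returns the cut-off fragment, which matches the symptom described in this issue.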

### 2. Recent change made the behavior more visible
Commit `0d25e1c1` (PR #10472) restricted the `_last_content_with_tools` fallback to housekeeping-only tool turns. Previously, when GLM returned content alongside substantive tools (terminal, search_files, etc.) and then went silent on the next turn, the fallback displayed that mid-task narration as a "final answer." After the change, the fallback no longer applies to substantive tools, so the model's silence is exposed: the user sees the tool execution log but no final response.

In other words, the old fallback masked the bug by displaying incomplete narration. Removing it was correct (it was showing "I'll scan the directory..." as a final answer), but nothing now catches the GLM truncation case.

## Evidence

From `agent.log`, tasks that should produce lengthy responses show very short final outputs:
```
api_calls=22  response=171 chars
api_calls=17  response=132 chars
```

The model made tool calls throughout the conversation, but the final text response was cut off mid-sentence, with no natural ending punctuation.
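A quick way to look for other affected turns is to scan the log for runs with many API calls but a very short final response. The sketch below assumes the summary fields appear exactly as in the excerpt above. The function name `suspicious_turns`, both thresholds, and the third sample line are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Matches the summary fields shown in the excerpt above.
SUMMARY = re.compile(r"api_calls=(\d+)\s+response=(\d+) chars")


def suspicious_turns(
    lines: Iterable[str], min_calls: int = 10, max_chars: int = 200
) -> List[Tuple[int, int]]:
    """Return (api_calls, response_chars) for long runs with short answers.

    The thresholds are arbitrary assumptions and should be tuned per workload.
    """
    hits = []
    for line in lines:
        match = SUMMARY.search(line)
        if not match:
            continue
        calls, chars = int(match.group(1)), int(match.group(2))
        if calls >= min_calls and chars <= max_chars:
            hits.append((calls, chars))
    return hits


sample = [
    "api_calls=22  response=171 chars",
    "api_calls=17  response=132 chars",
    "api_calls=3  response=95 chars",  # made-up line: short run, not flagged
]
print(suspicious_turns(sample))
```

A short response is only a hint. Each flagged turn still needs a manual check to confirm that the text ends mid-sentence.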

## Suggested Fix

Add a truncation-detection heuristic for models that incorrectly report `finish_reason="stop"` on truncated output:

1. When `finish_reason="stop"` arrives with no `tool_calls`, check whether the response ends with natural punctuation (`.`, `。`, `!`, `？`, `?`, `\n`, `:`, `)`)
2. If it does not, treat the response as truncated and append a continuation message
3. Retry at most 2 times to prevent infinite loops
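The three steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a patch against Hermes: `looks_truncated`, `complete_with_retries`, the reply dictionary shape, and the `call_backend` callable are all hypothetical, and skipping empty replies is an extra assumption that the steps do not mention.

```python
from typing import Callable, Dict, Optional, Sequence

# Ending characters taken from step 1 of the list above.
NATURAL_ENDINGS = (".", "。", "!", "？", "?", "\n", ":", ")")
MAX_CONTINUATIONS = 2  # step 3: bounded retries


def looks_truncated(
    content: str, finish_reason: str, tool_calls: Optional[Sequence] = None
) -> bool:
    """Heuristic: a "stop" reply with no tool calls that ends without punctuation."""
    if finish_reason != "stop" or tool_calls:
        return False
    if not content:
        return False  # assumption: empty replies are a separate case
    # No strip() here, because a trailing newline counts as a natural ending.
    return not content.endswith(NATURAL_ENDINGS)


def complete_with_retries(call_backend: Callable[[str], Dict], first_reply: Dict) -> str:
    """Request continuations while the latest reply still looks cut off."""
    text = first_reply["content"]
    reply = first_reply
    for _ in range(MAX_CONTINUATIONS):
        if not looks_truncated(
            reply["content"], reply["finish_reason"], reply.get("tool_calls")
        ):
            break
        reply = call_backend(text)  # hypothetical: sends text plus a "continue" message
        text += reply["content"]
    return text
```

The heuristic will sometimes misfire, for example on a reply that legitimately ends with a code identifier or a list item without punctuation, so the retry cap is what keeps the cost of a false positive bounded.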

This is defensive programming on the Hermes side. The proper fix should be in Ollama (returning `"length"` when appropriate), but since Hermes may encounter other non-compliant backends, adding a generic truncation-detection heuristic improves robustness.

This approach is consistent with the existing `_is_qwen_portal()` and `_qwen_prepare_chat_messages()` patterns in the codebase for model-specific workarounds.

## Environment

- Hermes Agent v0.9.0 (commit fb903b8f)
- Model: `glm-5.1:cloud` via Ollama proxy
- Backend: Ollama on localhost:11434

## Related

- Commit `0d25e1c1` (PR #10472) — the change that made this bug more visible
- Ollama issue #7547 — `finish_reason` returning wrong values (null instead of stop/length)
- Similar pattern to `_is_qwen_portal()` / `_qwen_prepare_chat_messages()` workarounds already in the codebase
