Ollama `glm-5.1:cloud` stalls after tool results in agent runs while direct Ollama App chat works

### Summary

When OpenClaw is configured to use Ollama with `glm-5.1:cloud`, normal chat in the Ollama App works, but agent runs in OpenClaw can stall after a successful tool execution. The UI shows tool output, then no assistant response arrives for a long time, and the run eventually ends with timeout/abort/network errors.

This appears to be specific to the agent/tool loop path (`toolResult -> next model response`); simple direct chat is not affected.

### Environment

- OpenClaw: `2026.5.7`
- Model: `ollama/glm-5.1:cloud`
- API type: `ollama`
- Primary Ollama base URL: `http://127.0.0.1:11434`
- Agent defaults include:
  - `streaming: true`
  - `thinkingDefault: high`
- I also had multiple Ollama fallbacks configured (`ollama2` .. `ollama5`) pointing to the same model/provider family before falling back to NVIDIA.

### What works

- `glm-5.1:cloud` in the Ollama App works normally for direct chat.
- Short/simple OpenClaw prompts may also work.

### What fails

- Multi-step OpenClaw agent runs that use tools.
- After a tool completes successfully and the tool result is visible, OpenClaw sometimes does not receive the next usable assistant response from `glm-5.1:cloud`.
- From the user side this looks like a freeze/hang.

### Expected behavior

After a successful tool call and tool result, OpenClaw should receive the next assistant response or fail over quickly and visibly.
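
To make "fail over quickly and visibly" concrete, here is a minimal sketch of the behavior I have in mind. It is not OpenClaw code: `call_with_failover`, the `(name, fn)` candidate shape, and the timeout value are all hypothetical, and the model calls are stand-in Python callables.

```python
import concurrent.futures as cf


def call_with_failover(candidates, request, timeout_s):
    """Try each (name, fn) candidate in order with a per-candidate deadline.

    Returns (winner_name, result, errors). `errors` records why earlier
    candidates were skipped, so the failover is visible to the caller.
    """
    errors = []
    for name, fn in candidates:
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn, request)
        try:
            return name, future.result(timeout=timeout_s), errors
        except cf.TimeoutError:
            # The stalled call keeps running in its worker thread; a real
            # implementation would also need to cancel the underlying request.
            errors.append((name, "timeout"))
        except Exception as exc:
            errors.append((name, f"error: {exc}"))
        finally:
            pool.shutdown(wait=False)
    raise RuntimeError(f"all candidates failed: {errors}")
```

The point of the sketch is only that the deadline applies to each model step (including the one right after a tool result) and that the reason for moving on is reported rather than swallowed.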

### Actual behavior

The run can stall between `toolResult` and the next assistant message, then eventually fail with abort/timeout/network errors.

### Minimal repro pattern

1. Configure OpenClaw primary model as `ollama/glm-5.1:cloud` using `api: "ollama"`.
2. Run an agent task that performs several tool calls.
3. Observe that tool calls execute and tool output is shown.
4. After one of the `toolResult` messages, the run may stop producing assistant output.
5. Eventually it ends with timeout/abort/network errors, or only recovers after retries/compaction/fallback.
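
To check whether the stall also happens outside OpenClaw, the `toolResult -> next model response` step can be approximated directly against Ollama's `/api/chat` endpoint. This is a rough sketch, not what OpenClaw actually sends: the message shapes follow my understanding of the Ollama chat API (the `tool_name` field on the tool message is an assumption), and the prompt, tool name, and tool output are placeholders.

```python
import json
import urllib.request


def build_tool_followup(model, user_prompt, tool_name, tool_args, tool_output):
    """Build a chat payload that already contains a tool call and its result."""
    return {
        "model": model,
        "stream": False,
        "messages": [
            {"role": "user", "content": user_prompt},
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {"function": {"name": tool_name, "arguments": tool_args}}
                ],
            },
            # Assumption: the tool result is sent back as a "tool" role message.
            {"role": "tool", "tool_name": tool_name, "content": tool_output},
        ],
    }


def post_chat(base_url, payload, timeout_s=60):
    """POST the payload to /api/chat and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read())
```

Usage would be something like `post_chat("http://127.0.0.1:11434", build_tool_followup("glm-5.1:cloud", "List files", "exec", {"command": "ls"}, "a.txt"))`. If this returns promptly while the OpenClaw run stalls, that points at the orchestration path rather than the model endpoint.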

### Evidence from local logs

1. A session where a tool finishes successfully and the next model step aborts immediately:

```json
{"type":"message","message":{"role":"toolResult","toolName":"exec","isError":false}}
{"type":"custom","customType":"openclaw:prompt-error","data":{"provider":"ollama","model":"glm-5.1:cloud","api":"ollama","error":"aborted | cron: job execution timed out"}}
{"type":"message","message":{"role":"assistant","stopReason":"error","errorMessage":"This operation was aborted"}}
```
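
For anyone who wants to look for the same pattern in their own session logs, this is roughly how I scanned mine. It is a small sketch that assumes the session log is JSON Lines with the fields shown in the excerpt above; `find_post_tool_failures` is my own helper, not part of OpenClaw.

```python
import json


def find_post_tool_failures(lines):
    """Return failures that directly follow a successful toolResult event."""
    failures = []
    pending_tool = None  # (line number, tool name) of the last successful tool result
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines instead of failing the whole scan
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        message = event.get("message") or {}
        if kind == "message" and message.get("role") == "toolResult":
            ok = message.get("isError") is False
            pending_tool = (lineno, message.get("toolName")) if ok else None
        elif kind == "custom" and event.get("customType") == "openclaw:prompt-error":
            if pending_tool:
                failures.append({
                    "tool_line": pending_tool[0],
                    "tool": pending_tool[1],
                    "error_line": lineno,
                    "error": (event.get("data") or {}).get("error"),
                })
            pending_tool = None
        elif kind == "message" and message.get("role") == "assistant":
            if pending_tool and message.get("stopReason") == "error":
                failures.append({
                    "tool_line": pending_tool[0],
                    "tool": pending_tool[1],
                    "error_line": lineno,
                    "error": message.get("errorMessage"),
                })
            pending_tool = None
    return failures
```

It can be fed an open file handle directly, for example `find_post_tool_failures(open(path))`, where `path` is wherever the session log lives on your machine.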

2. Timeout/failover path in gateway logs:

```text
[agent/embedded] embedded run failover decision: decision=fallback_model reason=timeout from=ollama/glm-5.1:cloud
[diagnostic] lane task error: error="FailoverError: LLM request timed out."
[model-fallback/decision] candidate=ollama/glm-5.1:cloud reason=timeout next=nvidia/z-ai/glm-5.1
```

3. Network-level failures from the same model/provider path:

```text
error=LLM request failed: network connection was interrupted. rawError=fetch failed | read ECONNRESET
error=LLM request failed: network connection error. rawError=fetch failed | Client network socket disconnected before secure TLS connection was established
```
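
Because the same stall surfaces under several different error strings, I grouped them into rough buckets when counting occurrences. The sketch below shows that grouping; the bucket names and the substring markers are my own choice based only on the messages quoted in this issue, not an OpenClaw classification.

```python
# Substrings taken from the log excerpts in this issue (matched case-insensitively).
NETWORK_MARKERS = ("econnreset", "socket disconnected", "fetch failed")


def classify_error(text):
    """Map a raw error string to 'network', 'timeout', 'abort', or 'other'."""
    lowered = text.lower()
    if any(marker in lowered for marker in NETWORK_MARKERS):
        return "network"
    # Checked before 'aborted' so that "aborted | ... timed out" counts as a timeout.
    if "timed out" in lowered:
        return "timeout"
    if "aborted" in lowered:
        return "abort"
    return "other"
```

With this grouping, an abort caused by a timeout is counted as a timeout, which matches how the failures looked from the user side.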

4. In longer tool loops, context pressure also shows up:

```text
Context overflow: estimated context size exceeds safe threshold during tool loop.
context overflow detected (attempt 1/3); attempting auto-compaction for ollama/glm-5.1:cloud
```

### Why I think this is not just a generic Ollama/App problem

- The same model works in the Ollama App for direct chat.
- The breakage is most visible in OpenClaw agent orchestration after tool results.
- The failure pattern is silent enough that from the UI it looks like the agent is frozen, even though the underlying issue seems to be timeout/abort/network handling in the model handoff after tools.

### Questions

- Is there a known incompatibility or instability with `glm-5.1:cloud` in the OpenClaw tool loop path?
- Should OpenClaw fail over earlier/more explicitly after `toolResult -> next prompt` stalls?
- Is there any recommended config for cloud Ollama models in agent mode (reduced thinking, no streaming, lower context pressure, different timeout strategy)?

If useful, I can provide a redacted config excerpt and additional redacted session/gateway logs.

Ollama `glm-5.1:cloud` stalls after tool results in agent runs while direct Ollama App chat works #79350