Bug Description
When using the NVIDIA Build provider with model z-ai/glm4.7, the API call fails with a Thinking Budget Exhausted error. The same model ID works in OpenClaw, but Hermes cannot call it successfully.
Steps to Reproduce
1. Run hermes setup and configure custom with the NVIDIA Build API key.
2. Select model z-ai/glm4.7.
3. Try to chat: hermes chat or via the gateway.
4. Error:
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
💭 Reasoning exhausted the output token budget — no visible response was produced.
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
⚠️ Thinking Budget Exhausted
The model used all its output tokens on reasoning and had none left
for the actual response.
To fix this:
→ Lower reasoning effort: /thinkon low or /thinkon minimal
→ Increase the output token limit: set model.max_tokens in
config.yaml
5. Switch back to the Ollama (cloud) model minimax-m2.7 without changing anything else; it works.
Expected Behavior
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
💭 Reasoning exhausted the output token budget — no visible response was produced.
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
⚠️ **Thinking Budget Exhausted**
The model used all its output tokens on reasoning and had none left
for the actual response.
To fix this:
→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`
→ Increase the output token limit: set `model.max_tokens` in
config.yaml
Actual Behavior
Configuration (config.yaml, .env, hermes setup)
Affected Component
Setup / Installation
Messaging Platform (if gateway-related)
No response
Operating System
Docker on Synology (DSM 7.2.2-72806)
Python Version
3.13.5
Hermes Version
Hermes Agent v0.8.0 (2026.4.8)
Relevant Logs / Traceback
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
💭 Reasoning exhausted the output token budget — no visible response was produced.
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
⚠️ **Thinking Budget Exhausted**
The model used all its output tokens on reasoning and had none left
for the actual response.
To fix this:
→ Lower reasoning effort: `/thinkon low` or `/thinkon minimal`
→ Increase the output token limit: set `model.max_tokens` in
config.yaml
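For triage, the condition reported in the log above can be checked against a raw provider response outside Hermes. The following is a minimal sketch, not Hermes code. It assumes the endpoint returns an OpenAI-compatible chat completion body and that the model's reasoning text arrives in a `reasoning_content` field on the message; both are assumptions that may not hold for NVIDIA Build. The helper name `is_thinking_budget_exhausted` is hypothetical.

```python
def is_thinking_budget_exhausted(response: dict) -> bool:
    """Return True when the first choice hit the token limit with reasoning but no visible text.

    Assumes an OpenAI-compatible chat completion body. The `reasoning_content`
    field name is an assumption and may differ between providers.
    """
    choices = response.get("choices") or []
    if not choices:
        return False
    choice = choices[0]
    message = choice.get("message") or {}
    visible = (message.get("content") or "").strip()
    reasoning = (message.get("reasoning_content") or "").strip()
    # Truncated by max tokens, no visible answer, but some reasoning was produced.
    return choice.get("finish_reason") == "length" and not visible and bool(reasoning)


# Illustrative payloads (made up for this sketch, not captured from the API).
truncated = {
    "choices": [
        {
            "finish_reason": "length",
            "message": {"content": "", "reasoning_content": "Let me think about this..."},
        }
    ]
}
complete = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {"content": "Hello!", "reasoning_content": "Short plan."},
        }
    ]
}

print(is_thinking_budget_exhausted(truncated))
print(is_thinking_budget_exhausted(complete))
```

If the same request body sent directly to the provider also yields `finish_reason` of `length` with empty content, the limit being applied is likely too small for this model's reasoning output; if the direct call succeeds, the difference is more likely in how Hermes builds the request.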
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?