[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns Thinking Budget Exhausted

### Bug Description

When usingNVIDIA Build  provider with model z-ai/glm4.7 , the API returns Thinking Budget Exhausted error, The model ID works on  OpenClaw, but Hermes cannot callit successfully.

### Steps to Reproduce

1. Run hermes setup and configure custom with NVIDIA Build API key
2. Select model  z-ai/glm4.7
3. Try to chat: hernes chat or via gateway
4.  Error:  ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
  💭 Reasoning exhausted the output token budget — no visible response was produced.
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     ⚠️ **Thinking Budget Exhausted**                                         
                                                                              
     The model used all its output tokens on reasoning and had none left      
     for the actual response.                                                 
                                                                              
     To fix this:                                                             
     → Lower reasoning effort: `/thinkon low` or `/thinkon minimal`           
     → Increase the output token limit: set `model.max_tokens` in             
     config.yaml
5.return to Ollama（cloud） model  minimax-m2.7  ，changing nothing，it works


### Expected Behavior

 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
  💭 Reasoning exhausted the output token budget — no visible response was produced.
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     ⚠️ **Thinking Budget Exhausted**                                         
                                                                              
     The model used all its output tokens on reasoning and had none left      
     for the actual response.                                                 
                                                                              
     To fix this:                                                             
     → Lower reasoning effort: `/thinkon low` or `/thinkon minimal`           
     → Increase the output token limit: set `model.max_tokens` in             
     config.yaml

### Actual Behavior

Configuration (configyaml, env, hermes setup)

### Affected Component

Setup / Installation

### Messaging Platform (if gateway-related)

_No response_

### Operating System

docker in synology（DSM 7.2.2-72806） docker

### Python Version

3.13.5

### Hermes Version

Hermes Agent v0.8.0 (2026.4.8)

### Relevant Logs / Traceback

```shell
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
  💭 Reasoning exhausted the output token budget — no visible response was produced.
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     ⚠️ **Thinking Budget Exhausted**                                         
                                                                              
     The model used all its output tokens on reasoning and had none left      
     for the actual response.                                                 
                                                                              
     To fix this:                                                             
     → Lower reasoning effort: `/thinkon low` or `/thinkon minimal`           
     → Increase the output token limit: set `model.max_tokens` in             
     config.yaml
```

### Root Cause Analysis (optional)

_No response_

### Proposed Fix (optional)

_No response_

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns Thinking Budget Exhausted #7729

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns Thinking Budget Exhausted #7729

Description

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions