[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns model hit max output tokens

### Bug Description

When usingNVIDIA Build provider with model z-ai/glm4.7 , the API returns model hit max output tokens error, The model ID works on OpenClaw, but Hermes cannot callit successfully.

### Steps to Reproduce

1.Run hermes setup and configure custom with NVIDIA Build API key

2.Select model z-ai/glm4.7

3.Try to chat: hernes chat or via gateway

4.Error: ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     Error: Response remained truncated after 3 continuation attempts   
5.return to Ollama（cloud） model minimax-m2.7 ，changing nothing，it works

### Expected Behavior

⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     Error: Response remained truncated after 3 continuation attempts   

### Actual Behavior

Configuration (configyaml, env, hermes setup)

### Affected Component

Setup / Installation

### Messaging Platform (if gateway-related)

_No response_

### Operating System

docker in synology（DSM 7.2.2-72806） docker

### Python Version

3.13.5

### Hermes Version

Hermes Agent v0.9.0 (v2026.4.13)

### Relevant Logs / Traceback

```shell
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
 ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
   ─  ⚕ Hermes  ───────────────────────────────────────────────────────────── 
                                                                              
     Error: Response remained truncated after 3 continuation attempts
```

### Root Cause Analysis (optional)

_No response_

### Proposed Fix (optional)

_No response_

### Are you willing to submit a PR for this?

- [ ] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns model hit max output tokens #9372

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: NVIDIA Build API modle z-ai/glm4.7 returns model hit max output tokens #9372

Description

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Operating System

Python Version

Hermes Version

Relevant Logs / Traceback

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions