Bug Description
When using the NVIDIA Build provider with model z-ai/glm4.7, the API returns a "model hit max output tokens" error. The model ID works on OpenClaw, but Hermes cannot call it successfully.
Steps to Reproduce
1. Run hermes setup and configure a custom provider with the NVIDIA Build API key
2. Select model z-ai/glm4.7
3. Try to chat: hermes chat or via gateway
4. Error:
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
Error: Response remained truncated after 3 continuation attempts
5. Switch back to the Ollama (cloud) model minimax-m2.7, changing nothing else, and it works
Expected Behavior
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
Error: Response remained truncated after 3 continuation attempts
Actual Behavior
Configuration (config.yaml, .env, hermes setup)
Affected Component
Setup / Installation
Messaging Platform (if gateway-related)
No response
Operating System
Docker on Synology (DSM 7.2.2-72806)
Python Version
3.13.5
Hermes Version
Hermes Agent v0.9.0 (v2026.4.13)
Relevant Logs / Traceback
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
⚠️ Response truncated (finish_reason='length') - model hit max output tokens
─ ⚕ Hermes ─────────────────────────────────────────────────────────────
Error: Response remained truncated after 3 continuation attempts
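For anyone triaging, here is a minimal sketch of the retry pattern these log lines suggest. It is not Hermes's actual code: `Completion`, `TruncatedError`, `complete_with_continuation`, and the `call_model` callback are hypothetical names, and the limit of 3 is taken only from the error message above.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_CONTINUATIONS = 3  # assumption: taken from the "3 continuation attempts" log line


@dataclass
class Completion:
    """Hypothetical stand-in for one chat-completion choice."""
    text: str
    finish_reason: str  # "stop" or "length", as in OpenAI-style APIs


class TruncatedError(RuntimeError):
    """Raised when the reply is still cut off after all continuation attempts."""


def complete_with_continuation(
    call_model: Callable[[str], Completion], prompt: str
) -> str:
    """Call the model and request continuations while the reply is truncated."""
    parts: List[str] = []
    reply = call_model(prompt)
    parts.append(reply.text)
    attempts = 0
    while reply.finish_reason == "length":
        if attempts == MAX_CONTINUATIONS:
            raise TruncatedError(
                f"Response remained truncated after {MAX_CONTINUATIONS} "
                "continuation attempts"
            )
        attempts += 1
        print("Response truncated (finish_reason='length') "
              "- model hit max output tokens")
        # Ask the model to carry on from the text gathered so far.
        reply = call_model(prompt + "".join(parts))
        parts.append(reply.text)
    return "".join(parts)
```

If the provider reports finish_reason='length' on every call, a loop like this prints three warnings and then fails, which matches the output above. That would point at the output-token limit sent to (or applied by) the provider for this model rather than at the continuation logic, but this is a guess and not confirmed.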
Root Cause Analysis (optional)
No response
Proposed Fix (optional)
No response
Are you willing to submit a PR for this?