Summary
Hermes' `zai` provider successfully calls Z.AI's GLM models (e.g. `glm-5.1` via the
coding plan endpoint https://open.bigmodel.cn/api/coding/paas/v4), but
`reasoning_content` is always empty, even on prompts that reliably trigger
chain-of-thought when the same model is called directly.
Root cause: `_supports_reasoning_extra_body` (run_agent.py:7501) reports
open.bigmodel.cn as reasoning-capable, which causes the chat-completions
transport to inject:
```python
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
```
But Z.AI's API doesn't recognize the OpenRouter-style `reasoning` field and
silently ignores it. Z.AI's documented thinking parameter is:
```python
extra_body["thinking"] = {"type": "enabled"}
```
This is the same shape as the existing `is_kimi` branch in chat_completions.py:232-240.
Without `thinking` set, GLM-5.1 / GLM-4.6 / GLM-4.5 return regular completions,
with no `reasoning_content` deltas in streaming responses and no
`message.reasoning_content` field in non-streaming responses.
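The following sketch makes the mismatch concrete by showing where the two `extra_body` shapes end up on the wire. It assumes the OpenAI Python SDK's behavior of merging `extra_body` keys into the top level of the JSON request body. `build_wire_body` is a hypothetical helper written only for illustration. It is not part of Hermes or the SDK.

```python
def build_wire_body(base: dict, extra_body: dict) -> dict:
    """Hypothetical helper: approximate how extra_body keys end up at the
    top level of the JSON request body sent to the endpoint."""
    body = dict(base)  # shallow copy so `base` is left untouched
    body.update(extra_body)
    return body


base = {
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "Walk through the math step by step."}],
    "stream": True,
}

# What Hermes sends today: an OpenRouter-style field that Z.AI ignores.
current = build_wire_body(base, {"reasoning": {"enabled": True, "effort": "medium"}})

# What Z.AI documents for turning on thinking.
proposed = build_wire_body(base, {"thinking": {"type": "enabled"}})

print(sorted(current))   # ['messages', 'model', 'reasoning', 'stream']
print(sorted(proposed))  # ['messages', 'model', 'stream', 'thinking']
```

Under this assumption, the current payload never contains a top-level `thinking` key, so Z.AI falls back to a regular completion.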
Reproduction
Direct cURL against the same endpoint, same model, same payload:
```bash
curl -sS https://open.bigmodel.cn/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role":"user","content":"Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. Walk through the math step by step."}],
    "thinking": {"type": "enabled"},
    "stream": true
  }'
```
→ 542 streamed `reasoning_content` chunks + 118 `content` chunks.
Same payload through Hermes' agent loop with the same model and endpoint:
→ 0 `reasoning_content` chunks, only `content`.
The only difference between the working cURL request and Hermes' wire payload is
the `thinking` field. This was verified by dumping Hermes' actual `stream_kwargs`
just before the SDK call and replaying them with the OpenAI Python SDK 2.32.0.
Adding `extra_body={"thinking": {"type": "enabled"}}` to the same payload
immediately produces hundreds of `reasoning_content` deltas.
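Chunk counts like the ones above can be tallied from a raw stream with a small counter like the one below. The input could be, for example, the cURL output saved to a file. This is a hypothetical helper, not the script that produced the numbers in this issue. It assumes the OpenAI-compatible SSE format: `data: {...}` lines terminated by `data: [DONE]`, with reasoning text carried in `choices[].delta.reasoning_content`.

```python
import json
from collections import Counter
from typing import Iterable


def count_stream_deltas(sse_lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count chunks carrying non-empty reasoning_content
    vs. content in an OpenAI-compatible SSE stream."""
    counts = Counter()
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("reasoning_content"):
                counts["reasoning_content"] += 1
            if delta.get("content"):
                counts["content"] += 1
    return counts


sample = [
    'data: {"choices":[{"delta":{"reasoning_content":"3 apples x $2 = $6"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Total: $16"}}]}',
    "data: [DONE]",
]
print(dict(count_stream_deltas(sample)))  # {'reasoning_content': 1, 'content': 1}
```

A saved stream can be passed in directly, e.g. `count_stream_deltas(open("stream.txt"))`, since a file object iterates line by line.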
Proposed fix
Mirror the `is_kimi` branch for Z.AI in `chat_completions.py`:
- In `run_agent.py` (next to `_is_kimi`), compute:
```python
_is_zai = base_url_host_matches(self._base_url_lower, "open.bigmodel.cn")
```
Pass `is_zai=_is_zai` to `build_kwargs`.
- In `chat_completions.py` `build_kwargs`, after the `is_kimi` block:
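```python
if params.get("is_zai", False):
    # Z.AI ignores the OpenRouter-style `reasoning` field; its documented
    # switch is `thinking`. Default to enabled, and honor an explicit
    # reasoning_config["enabled"] = False.
    _zai_thinking_enabled = True
    if reasoning_config and isinstance(reasoning_config, dict):
        if reasoning_config.get("enabled") is False:
            _zai_thinking_enabled = False
    extra_body["thinking"] = {
        "type": "enabled" if _zai_thinking_enabled else "disabled",
    }
```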
- Optionally, exclude `open.bigmodel.cn` from `_supports_reasoning_extra_body`
so the no-op `reasoning` field is no longer sent. This is cosmetic, since Z.AI
ignores the field anyway, but it keeps the wire payload clean.
With this change, `reasoning_config.enabled = False` correctly disables thinking
on Z.AI, mirroring the Kimi branch's on/off semantics. Effort levels could be
added later; Z.AI's docs don't currently document an effort scalar for the
`thinking` parameter.
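For a quick unit-level check of the intended on/off semantics, the proposed behavior can be condensed into a standalone function. This is a sketch under stated assumptions, not Hermes code. `host_matches` is a hypothetical stand-in for Hermes' `base_url_host_matches`, whose implementation isn't shown here. `zai_extra_body` is a hypothetical wrapper around the proposed `is_zai` branch.

```python
from typing import Any, Optional
from urllib.parse import urlparse

ZAI_HOST = "open.bigmodel.cn"


def host_matches(base_url: str, domain: str) -> bool:
    """Hypothetical stand-in for base_url_host_matches: true if the URL's
    host is `domain` itself or a subdomain of it (not a substring match)."""
    host = (urlparse(base_url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def zai_extra_body(base_url: str, reasoning_config: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Sketch of the proposed Z.AI branch: emit `thinking`, not `reasoning`."""
    extra_body: dict[str, Any] = {}
    if not host_matches(base_url, ZAI_HOST):
        return extra_body  # other providers are out of scope for this sketch
    # Thinking defaults to on; only an explicit enabled=False turns it off.
    enabled = True
    if isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False:
        enabled = False
    extra_body["thinking"] = {"type": "enabled" if enabled else "disabled"}
    return extra_body


url = "https://open.bigmodel.cn/api/coding/paas/v4"
print(zai_extra_body(url, None))                          # {'thinking': {'type': 'enabled'}}
print(zai_extra_body(url, {"enabled": False}))            # {'thinking': {'type': 'disabled'}}
print(zai_extra_body("https://api.openai.com/v1", None))  # {}
```

Any `effort` value in `reasoning_config` is simply ignored here, consistent with Z.AI not documenting an effort scalar for `thinking`.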
Environment
References