Z.AI / GLM via `zai` provider never returns reasoning_content — Hermes sends `extra_body.reasoning` (OpenRouter-style) but Z.AI expects `extra_body.thinking={"type":"enabled"}`

## Summary

Hermes' `zai` provider successfully calls Z.AI's GLM models (e.g. `glm-5.1` via the
coding plan endpoint `https://open.bigmodel.cn/api/coding/paas/v4`), but
`reasoning_content` is **always empty** — even on questions that clearly trigger
chain-of-thought when the same model is hit directly.

Root cause: Hermes signals "reasoning capable" for `open.bigmodel.cn` in
`_supports_reasoning_extra_body` (run_agent.py:7501), which causes the chat-
completions transport to inject:

```python
extra_body["reasoning"] = {"enabled": True, "effort": "medium"}
```

But Z.AI's API doesn't recognize the OpenRouter-style `reasoning` field. It
silently ignores it. Z.AI's documented thinking parameter is:

```python
extra_body["thinking"] = {"type": "enabled"}
```

(same shape as the existing `is_kimi` branch in chat_completions.py:232-240).

Without `thinking` set, GLM-5.1 / GLM-4.6 / GLM-4.5 produce regular completions
with no `reasoning_content` deltas in streaming responses, and no
`message.reasoning_content` field in non-streaming.

## Reproduction

Direct cURL against the same endpoint, same model, same payload:

```bash
curl -sS https://open.bigmodel.cn/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role":"user","content":"Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. Walk through the math step by step."}],
    "thinking": {"type": "enabled"},
    "stream": true
  }'
```

→ 542 streamed `reasoning_content` chunks + 118 `content` chunks.

Same payload through Hermes' agent loop with the same model and endpoint:
→ 0 `reasoning_content` chunks, only `content`.

The only difference between the working cURL and Hermes' wire payload is the
`thinking` field. Verified by dumping Hermes' actual `stream_kwargs` right
before the SDK call and replaying it via the OpenAI Python SDK 2.32.0: adding
`extra_body={"thinking": {"type": "enabled"}}` to the same payload
immediately produces hundreds of `reasoning_content` deltas.
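
The replay can be sketched roughly as follows. This is a minimal sketch, not Hermes code: `count_deltas` and `replay` are hypothetical helper names, the prompt is shortened, and it assumes the `openai` package is installed and `GLM_API_KEY` is set. `reasoning_content` is not a typed field in the SDK, so it is read defensively with `getattr`.

```python
import os
from typing import Any, Iterable


def count_deltas(chunks: Iterable[Any]) -> tuple[int, int]:
    """Return (reasoning_chunks, content_chunks) for a streamed response."""
    reasoning = content = 0
    for chunk in chunks:
        choices = getattr(chunk, "choices", None) or []
        if not choices:
            continue  # some chunks carry no choices at all
        delta = choices[0].delta
        # reasoning_content is a provider extension; it may be absent or None.
        if getattr(delta, "reasoning_content", None):
            reasoning += 1
        if getattr(delta, "content", None):
            content += 1
    return reasoning, content


def replay(thinking: bool) -> tuple[int, int]:
    """Hypothetical replay: Z.AI-style `thinking` vs. the OpenRouter-style field."""
    from openai import OpenAI  # lazy import so count_deltas works without the SDK

    client = OpenAI(
        api_key=os.environ["GLM_API_KEY"],
        base_url="https://open.bigmodel.cn/api/coding/paas/v4",
    )
    if thinking:
        extra_body = {"thinking": {"type": "enabled"}}
    else:
        # What Hermes currently injects for this host, per this report.
        extra_body = {"reasoning": {"enabled": True, "effort": "medium"}}
    stream = client.chat.completions.create(
        model="glm-5.1",
        messages=[{"role": "user", "content": "Walk through 3 * 2 + 2 * 5 step by step."}],
        stream=True,
        extra_body=extra_body,
    )
    return count_deltas(stream)
```

Calling `replay(True)` and `replay(False)` back to back should show whether the first count is non-zero only when `thinking` is sent; exact chunk counts will vary per run.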

## Proposed fix

Mirror the `is_kimi` branch for Z.AI in `chat_completions.py`:

1. In `run_agent.py` (next to `_is_kimi`), compute:
   ```python
   _is_zai = base_url_host_matches(self._base_url_lower, "open.bigmodel.cn")
   ```
   Pass `is_zai=_is_zai` to `build_kwargs`.

2. In `chat_completions.py` `build_kwargs`, after the `is_kimi` block:
   ```python
   if params.get("is_zai", False):
       _zai_thinking_enabled = True
       if reasoning_config and isinstance(reasoning_config, dict):
           if reasoning_config.get("enabled") is False:
               _zai_thinking_enabled = False
       extra_body["thinking"] = {
           "type": "enabled" if _zai_thinking_enabled else "disabled",
       }
   ```

3. Optionally, exclude `open.bigmodel.cn` from `_supports_reasoning_extra_body`
   so the no-op `reasoning` field is no longer sent (cosmetic: Z.AI ignores it
   anyway, but it pollutes the wire payload).

This makes `reasoning_config.enabled = False` correctly disable thinking on
Z.AI, with the same on/off semantics as the Kimi branch. Effort levels could
be added later; Z.AI's docs don't currently document an effort scalar for the
`thinking` parameter.
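
For illustration, the steps above can be condensed into a standalone function that is easy to unit-test outside Hermes. This is a sketch under assumptions: `is_zai_host` is a hypothetical stand-in for `base_url_host_matches` (whose actual matching rules are not shown in this issue), and `zai_extra_body` is a hypothetical name for logic that would live inside `build_kwargs`.

```python
from typing import Optional
from urllib.parse import urlparse


def is_zai_host(base_url: str) -> bool:
    """Hypothetical stand-in for base_url_host_matches(..., "open.bigmodel.cn")."""
    host = (urlparse(base_url).hostname or "").lower()
    return host == "open.bigmodel.cn" or host.endswith(".open.bigmodel.cn")


def zai_extra_body(base_url: str, reasoning_config: Optional[dict]) -> dict:
    """Build the extra_body a Z.AI request would carry under the proposed fix."""
    extra_body: dict = {}
    if not is_zai_host(base_url):
        return extra_body  # other providers are handled elsewhere
    # Thinking defaults to on; only an explicit enabled=False turns it off.
    enabled = True
    if isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False:
        enabled = False
    extra_body["thinking"] = {"type": "enabled" if enabled else "disabled"}
    # Step 3: the OpenRouter-style "reasoning" key is deliberately never set here.
    return extra_body
```

Keeping the decision in a pure function like this would let the on/off behaviour be covered by tests without a live API key.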

## Environment
- Hermes commit: `f67a61dc` (recent main)
- Provider: `zai`
- Endpoint: `https://open.bigmodel.cn/api/coding/paas/v4` (coding plan)
- Models tested: `glm-5.1`
- OpenAI SDK: `2.32.0`

## References
- [Z.AI thinking docs](https://docs.bigmodel.cn/cn/guide/capabilities/thinking)
- [GLM-5.1 model card](https://docs.bigmodel.cn/cn/guide/models/text/glm-5.1)
- [Coding plan GLM-5.1 setup](https://docs.bigmodel.cn/cn/coding-plan/using5-1)
