Z.AI / GLM via zai provider never returns reasoning_content — Hermes sends extra_body.reasoning (OpenRouter-style) but Z.AI expects extra_body.thinking={"type":"enabled"} #16533

@nicolas-hey

Summary

Hermes' zai provider successfully calls Z.AI's GLM models (e.g. glm-5.1 via the
coding plan endpoint https://open.bigmodel.cn/api/coding/paas/v4), but
reasoning_content is always empty — even on questions that clearly trigger
chain-of-thought when the same model is hit directly.

Root cause: Hermes signals "reasoning capable" for open.bigmodel.cn in
_supports_reasoning_extra_body (run_agent.py:7501), which causes the chat-
completions transport to inject:

extra_body["reasoning"] = {"enabled": True, "effort": "medium"}

Z.AI's API does not recognize the OpenRouter-style reasoning field and
silently ignores it. Z.AI's documented thinking parameter is:

extra_body["thinking"] = {"type": "enabled"}

(same shape as the existing is_kimi branch in chat_completions.py:232-240).

Without thinking set, GLM-5.1 / GLM-4.6 / GLM-4.5 produce regular completions
with no reasoning_content deltas in streaming responses, and no
message.reasoning_content field in non-streaming.
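
To make the mismatch concrete, here is a minimal sketch of the two request bodies as they would look on the wire. It assumes the OpenAI Python SDK behavior of merging extra_body keys into the top level of the JSON body; wire_body is a hypothetical helper written for this illustration, not a Hermes function, and the message content is a placeholder.

```python
import json


def wire_body(base: dict, extra_body: dict) -> dict:
    """Hypothetical helper: merge extra_body keys into the top-level request body."""
    merged = dict(base)
    merged.update(extra_body)
    return merged


# Placeholder request; only the reasoning-related key differs below.
base = {
    "model": "glm-5.1",
    "messages": [{"role": "user", "content": "hi"}],
    "stream": True,
}

# What Hermes currently injects for open.bigmodel.cn (OpenRouter-style).
hermes_body = wire_body(base, {"reasoning": {"enabled": True, "effort": "medium"}})

# What Z.AI documents as its thinking parameter.
zai_body = wire_body(base, {"thinking": {"type": "enabled"}})

# Keys present in exactly one of the two bodies.
differing_keys = sorted(set(hermes_body) ^ set(zai_body))
print(differing_keys)
print(json.dumps(zai_body["thinking"]))
```

Everything else in the two bodies is identical, which is why the request succeeds either way and the missing reasoning output is easy to overlook.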

Reproduction

Direct cURL against the same endpoint, same model, same payload:

curl -sS https://open.bigmodel.cn/api/coding/paas/v4/chat/completions \
  -H "Authorization: Bearer $GLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "glm-5.1",
    "messages": [{"role":"user","content":"Xiao Ming has 3 apples at $2 each and 2 lbs of bananas at $5/lb. Walk through the math step by step."}],
    "thinking": {"type": "enabled"},
    "stream": true
  }'

→ 542 streamed `reasoning_content` chunks + 118 `content` chunks.

Same payload through Hermes' agent loop with the same model and endpoint:
→ 0 `reasoning_content` chunks, only `content`.

The only difference between the working cURL and Hermes' wire payload is the
`thinking` field. Verified by dumping Hermes' actual `stream_kwargs` right
before the SDK call and replaying it via the OpenAI Python SDK 2.32.0 — adding
`extra_body={"thinking": {"type": "enabled"}}` to the same payload
immediately produces hundreds of `reasoning_content` deltas.
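
For anyone repeating the comparison, the chunk counts can be tallied with a small script. This is a rough sketch, not part of Hermes: it assumes the stream uses the usual OpenAI-compatible SSE framing (`data: {...}` lines ending with `data: [DONE]`) and that reasoning arrives under `choices[].delta.reasoning_content`. The name `count_deltas` and the sample lines are made up for illustration.

```python
import json
from typing import Iterable


def count_deltas(sse_lines: Iterable[str]) -> dict:
    """Count chunks carrying non-empty reasoning_content vs content deltas."""
    counts = {"reasoning_content": 0, "content": 0}
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for key in counts:
                if delta.get(key):
                    counts[key] += 1
    return counts


# Illustrative lines only; real streams contain more fields per chunk.
sample = [
    'data: {"choices":[{"delta":{"reasoning_content":"3 apples at 2 each"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"The total is"}}]}',
    "data: [DONE]",
]
result = count_deltas(sample)
print(result)
```

Passing `sys.stdin` as `sse_lines` would let the same function read the output of the cURL command above when it is piped in.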

Proposed fix

Mirror the `is_kimi` branch for Z.AI in `chat_completions.py`:

  1. In `run_agent.py` (next to `_is_kimi`), compute:
    ```python
    _is_zai = base_url_host_matches(self._base_url_lower, "open.bigmodel.cn")
    ```
    Pass `is_zai=_is_zai` to `build_kwargs`.

  2. In `chat_completions.py` `build_kwargs`, after the `is_kimi` block:
    ```python
    if params.get("is_zai", False):
        # Thinking defaults to on; only an explicit enabled=False turns it off.
        _zai_thinking_enabled = True
        if reasoning_config and isinstance(reasoning_config, dict):
            if reasoning_config.get("enabled") is False:
                _zai_thinking_enabled = False
        extra_body["thinking"] = {
            "type": "enabled" if _zai_thinking_enabled else "disabled",
        }
    ```

  3. Optionally, exclude `open.bigmodel.cn` from `_supports_reasoning_extra_body`
    so the no-op `reasoning` field is no longer sent (cosmetic — Z.AI ignores it
    anyway, but it pollutes the wire payload).

With this change, `reasoning_config.enabled = False` correctly disables
thinking on Z.AI, with the same on/off semantics as the Kimi branch. Effort
levels could be added later; Z.AI's docs don't currently document an effort
scalar for the `thinking` parameter.
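
The on/off mapping can be checked in isolation. The sketch below pulls the proposed logic into a standalone function so it can be unit-tested without the rest of `build_kwargs`. The name `zai_thinking_param` is hypothetical, and the sketch assumes `reasoning_config` is either `None` or a dict, as in the snippet above.

```python
from typing import Optional


def zai_thinking_param(reasoning_config: Optional[dict]) -> dict:
    """Map a Hermes-style reasoning_config to a Z.AI-style thinking value.

    Thinking stays enabled unless the config explicitly sets enabled=False.
    Any effort value is ignored, since no effort scalar is assumed for thinking.
    """
    enabled = True
    if isinstance(reasoning_config, dict) and reasoning_config.get("enabled") is False:
        enabled = False
    return {"type": "enabled" if enabled else "disabled"}


# Missing config and an effort-only config both leave thinking on.
default_param = zai_thinking_param(None)
effort_param = zai_thinking_param({"enabled": True, "effort": "medium"})

# An explicit opt-out turns it off.
disabled_param = zai_thinking_param({"enabled": False})

print(default_param, effort_param, disabled_param)
```

Using an identity check against `False` rather than a truthiness test means a config that omits `enabled` or sets it to `None` still leaves thinking on, which matches the default in the proposed block.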

Environment

References

Labels

P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/zai (ZAI provider), type/bug (Something isn't working)
