feat(agent): add z.ai/GLM-5 preserved thinking support#11494
Open
neuneu2k wants to merge 1 commit into
Open
Conversation
Author
|
I haven't done a pull request in github in ages, my apologies for the quality of the paperwork. |
Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for
multi-turn agent loops.
Three changes in run_agent.py:
1. _is_zai_direct() helper — detects zai provider or known z.ai/bigmodel
endpoint URLs (api.z.ai, open.bigmodel.cn).
2. _build_api_kwargs() — injects thinking parameter in extra_body
for GLM-5/4.7 models:
- Default: {type: enabled, compact_history: false} (preserved thinking)
- reasoning_config.enabled=false → {type: disabled}
- GLM-4.6/4.5 excluded (they auto-determine thinking)
3. Message sanitization — re-injects reasoning_content on assistant
messages for z.ai so multi-turn reasoning continuity works with
compact_history=false.
Response-side extraction was already handled by the generic
_extract_reasoning() method (checks reasoning_content field).
Tests: 19 new tests covering detection, parameter injection, config
gating, and multi-turn passthrough.
cab1736 to
cfa2fd0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for multi-turn agent loops.
Three changes in run_agent.py:
_is_zai_direct() helper — detects zai provider or known z.ai/bigmodel endpoint URLs (api.z.ai, open.bigmodel.cn).
_build_api_kwargs() — injects thinking parameter in extra_body for GLM-5/4.7 models:
Message sanitization — re-injects reasoning_content on assistant messages for z.ai so multi-turn reasoning continuity works with compact_history=false.
Response-side extraction was already handled by the generic _extract_reasoning() method (checks reasoning_content field).
Tests: 19 new tests covering detection, parameter injection, config gating, and multi-turn passthrough.
What does this PR do?
The GLM 5 family, and to a lesser degree the 4.7 line, has been trained on preserved interleaved thinking, It's supposed to improve chained tool calling by keeping the reasoning steps in context instead as a short term memory.
This PR enables preserved thinking mode on z.ai models if and only if they are served directly from their inference endpoints.
Related Issue
Fixes Preserved thinking for GLM models when the inference provider supports it.
Type of Change
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/A