Skip to content

feat(agent): add z.ai/GLM-5 preserved thinking support#11494

Open
neuneu2k wants to merge 1 commit into
NousResearch:mainfrom
neuneu2k:feature/glm-preserved-thinking
Open

feat(agent): add z.ai/GLM-5 preserved thinking support#11494
neuneu2k wants to merge 1 commit into
NousResearch:mainfrom
neuneu2k:feature/glm-preserved-thinking

Conversation

@neuneu2k

@neuneu2k neuneu2k commented Apr 17, 2026

Copy link
Copy Markdown

Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for multi-turn agent loops.

Three changes in run_agent.py:

  1. _is_zai_direct() helper — detects zai provider or known z.ai/bigmodel endpoint URLs (api.z.ai, open.bigmodel.cn).

  2. _build_api_kwargs() — injects thinking parameter in extra_body for GLM-5/4.7 models:

    • Default: {type: enabled, compact_history: false} (preserved thinking)
    • reasoning_config.enabled=false → {type: disabled}
    • GLM-4.6/4.5 excluded (they auto-determine thinking)
  3. Message sanitization — re-injects reasoning_content on assistant messages for z.ai so multi-turn reasoning continuity works with compact_history=false.

Response-side extraction was already handled by the generic _extract_reasoning() method (checks reasoning_content field).

Tests: 19 new tests covering detection, parameter injection, config gating, and multi-turn passthrough.

What does this PR do?

The GLM 5 family, and to a lesser degree the 4.7 line, has been trained on preserved interleaved thinking, It's supposed to improve chained tool calling by keeping the reasoning steps in context instead as a short term memory.

This PR enables preserved thinking mode on z.ai models if and only if they are served directly from their inference endpoints.

Related Issue

Fixes Preserved thinking for GLM models when the inference provider supports it.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Debian GNU/Linux 12 (bookworm)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

@neuneu2k neuneu2k marked this pull request as ready for review April 17, 2026 08:49
@neuneu2k

Copy link
Copy Markdown
Author

I haven't done a pull request in github in ages, my apologies for the quality of the paperwork.

Enable z.ai/Zhipu GLM-5.x and GLM-4.7 preserved thinking mode for
multi-turn agent loops.

Three changes in run_agent.py:

1. _is_zai_direct() helper — detects zai provider or known z.ai/bigmodel
   endpoint URLs (api.z.ai, open.bigmodel.cn).

2. _build_api_kwargs() — injects thinking parameter in extra_body
   for GLM-5/4.7 models:
   - Default: {type: enabled, compact_history: false} (preserved thinking)
   - reasoning_config.enabled=false → {type: disabled}
   - GLM-4.6/4.5 excluded (they auto-determine thinking)

3. Message sanitization — re-injects reasoning_content on assistant
   messages for z.ai so multi-turn reasoning continuity works with
   compact_history=false.

Response-side extraction was already handled by the generic
_extract_reasoning() method (checks reasoning_content field).

Tests: 19 new tests covering detection, parameter injection, config
gating, and multi-turn passthrough.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/zai ZAI provider type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Preserved thinking for GLM models when the inference provider supports it.

2 participants