Skip to content

[Feature]: Preserved thinking for GLM models when the inference provider supports it. #11483

@neuneu2k

Description

@neuneu2k

Problem or Use Case

The GLM 5 series (and to a lesser degree the 4.7) has been trained in a ‘preserved thinking' mode for agentic use.

In that mode, interleaved reasoning traces in between tool calls are retained to serve as a short term memory for persistence of intent in long tool call chains, it's supposed to significantly improve long session capabilities and while it does increase (cached) input tokens and context use, if it does limit the number of necessary tool calls necessary to a task, it’s likely worthwhile.

While this is a feature that is intrinsic to the model, I’d only activate it on inference providers that publicize supporting it, it may conflict with prefix caching hit rate on providers who don’t.

Proposed Solution

Default preserved thinking on supported models if they are directly provided by z.ai.

Alternatives Considered

No response

Feature Type

Performance / reliability

Scope

Small (single file, < 50 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions