[Feature]: Preserved thinking for GLM models when the inference provider supports it.

### Problem or Use Case

The GLM 5 series (and to a lesser degree the 4.7) has been trained in a ‘preserved thinking' mode for agentic use.

In that mode, interleaved reasoning traces in between tool calls are retained to serve as a short term memory for persistence of intent in long tool call chains, it's supposed to significantly improve long session capabilities and while it does increase (cached) input tokens and context use, if it does limit the number of necessary tool calls necessary to a task, it’s likely worthwhile.

While this is a feature that is intrinsic to the model, I’d only activate it on inference providers that publicize supporting it, it may conflict with prefix caching hit rate on providers who don’t.


### Proposed Solution

Default preserved thinking on supported models if they are directly provided by z.ai.

### Alternatives Considered

_No response_

### Feature Type

Performance / reliability

### Scope

Small (single file, < 50 lines)

### Contribution

- [x] I'd like to implement this myself and submit a PR

### Debug Report (optional)

```shell

```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature]: Preserved thinking for GLM models when the inference provider supports it. #11483

Problem or Use Case

Proposed Solution

Alternatives Considered

Feature Type

Scope

Contribution

Debug Report (optional)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature]: Preserved thinking for GLM models when the inference provider supports it. #11483

Description

Problem or Use Case

Proposed Solution

Alternatives Considered

Feature Type

Scope

Contribution

Debug Report (optional)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions