Side queries on the fast model use the main model's per-model settings

### What happened?

I configured my main model and my fast model separately under `modelProviders` in `~/.qwen/settings.json`, including model-specific options like `extra_body.enable_thinking`. My main model is a thinking model and I want it to think; my fast model is configured to **never** think (its entry sets `extra_body.enable_thinking: false` and `extra_body.thinking.type: "disabled"`).

Despite that, side queries that run on the fast model (e.g. session-title generation, recap, tool-use summary) come back with `reasoning_content` populated. The fast model is clearly thinking even though my settings tell it not to.

When I checked the captured request body, I noticed that the per-model settings configured for my fast model are not the ones being applied. The fast model's request inherits the **main model's** settings instead: only the model id in the URL/body is swapped, and everything else (`extra_body`, sampling params, reasoning config, base URL, API key) carries over from the main model's entry.

In my case, that means side queries to `deepseek-v4-flash` are sent with my main model's `enable_thinking: true`, which is the exact opposite of what `deepseek-v4-flash`'s own entry says. The model dutifully thinks, returns `reasoning_content`, and the side query takes longer and costs more than it should.

### What did you expect to happen?

When a side query runs on the fast model, the request should use the **fast model's** per-model settings — the same `extra_body`, sampling params, and reasoning config that the fast model would use if I selected it as my main model. The fast model entry in `modelProviders` should be the source of truth for how the fast model is called, regardless of whether the call originates from a main turn, a title generator, a recap, or any other side query.

In other words: each model's `modelProviders` entry should fully describe how that model is called. Today, only the main model's entry is consulted; the fast model's entry is effectively ignored except for its model id.

This also matters beyond the thinking/no-thinking case:

- If the fast model lives on a different provider/base URL than the main model, side queries today would silently go to the wrong endpoint with the wrong key. It only "works" if both models happen to share the same base URL and API key (as in my setup, since both are on Dashscope).
- If a user sets different `samplingParams` per model (e.g. lower `temperature` for the fast model to reduce variance on summaries), those settings are ignored on side queries.
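To make the expected behavior concrete, here is a minimal sketch of a per-model lookup. It is not the project's actual code: the `ModelEntry` shape, the `resolveCallConfig` helper, the `openai` auth-type key, the `envKey` field, and the `main-thinking-model` id are all hypothetical. Only `modelProviders`, `generationConfig`, `extra_body`, `samplingParams`, and `deepseek-v4-flash` come from this report.

```typescript
// Hypothetical shapes: only the field names quoted in this issue
// (generationConfig, extra_body, samplingParams) come from the report.
interface GenerationConfig {
  extra_body?: Record<string, unknown>;
  samplingParams?: Record<string, number>;
}

interface ModelEntry {
  id: string;
  baseUrl?: string;
  envKey?: string; // assumed: name of the env var that holds the API key
  generationConfig?: GenerationConfig;
}

// Assumed layout: entries grouped by auth type, as in modelProviders.<authType>[].
type ModelProviders = Record<string, ModelEntry[]>;

interface ResolvedCall {
  model: string;
  baseUrl?: string;
  envKey?: string;
  generationConfig: GenerationConfig;
}

// Build the call from the requested model's own entry, never from another model's.
function resolveCallConfig(providers: ModelProviders, modelId: string): ResolvedCall {
  for (const entries of Object.values(providers)) {
    const entry = entries.find((e) => e.id === modelId);
    if (entry) {
      return {
        model: entry.id,
        baseUrl: entry.baseUrl,
        envKey: entry.envKey,
        generationConfig: entry.generationConfig ?? {},
      };
    }
  }
  // No entry found: use empty defaults rather than inheriting the main model's settings.
  return { model: modelId, generationConfig: {} };
}

// Sample data mirroring the setup described in this issue (ids partly made up).
const providers: ModelProviders = {
  openai: [
    {
      id: "main-thinking-model",
      generationConfig: { extra_body: { enable_thinking: true } },
    },
    {
      id: "deepseek-v4-flash",
      generationConfig: {
        extra_body: { enable_thinking: false, thinking: { type: "disabled" } },
        samplingParams: { temperature: 0.2 },
      },
    },
  ],
};

const sideQuery = resolveCallConfig(providers, "deepseek-v4-flash");
console.log(sideQuery.generationConfig.extra_body);
```

With a lookup like this, a title or recap call on the fast model would carry `enable_thinking: false` no matter which model is selected as the main one, and a fast model hosted on a different provider would get its own base URL and key.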

### Steps to reproduce

1. In `~/.qwen/settings.json`, set `model.name` to a thinking model that defaults to thinking-on (e.g. a Qwen3 or DeepSeek thinking-mode model). In its `modelProviders` entry, set `extra_body: { enable_thinking: true }` under `generationConfig`.
2. Set `fastModel` to a different model whose `modelProviders` entry sets `extra_body: { enable_thinking: false }` (or `thinking: { type: "disabled" }`) under `generationConfig`.
3. Run an interactive session, send two short messages so the auto-title generator triggers, then exit. (Or run any other side-query path: `/rename --auto`, recap, etc.)
4. Observe: the fast model's response for the title call contains `reasoning_content`, despite that model's entry disabling thinking.
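For step 4, a small helper like the one below can check a captured response instead of eyeballing it. This is a sketch that assumes an OpenAI-compatible chat completion shape with `reasoning_content` on each choice's `message`; the `ChatCompletionLike` type and `hasReasoning` function are hypothetical names, and the sample payloads are made up for illustration.

```typescript
// Assumed minimal shape of an OpenAI-compatible chat completion response.
// Only the reasoning_content field name comes from this issue.
interface ChatCompletionLike {
  choices: Array<{
    message: {
      content?: string | null;
      reasoning_content?: string | null;
    };
  }>;
}

// True when any choice carries non-empty reasoning text,
// i.e. the model thought even though its entry disabled thinking.
function hasReasoning(response: ChatCompletionLike): boolean {
  return response.choices.some((choice) => {
    const reasoning = choice.message.reasoning_content;
    return typeof reasoning === "string" && reasoning.trim().length > 0;
  });
}

// Illustrative payloads, not real captures.
const thinkingResponse: ChatCompletionLike = {
  choices: [{ message: { content: "Fix side-query settings", reasoning_content: "The user wants a title..." } }],
};
const plainResponse: ChatCompletionLike = {
  choices: [{ message: { content: "Fix side-query settings" } }],
};

console.log(hasReasoning(thinkingResponse), hasReasoning(plainResponse));
```

If the helper returns `true` for a title or recap response from a fast model whose entry disables thinking, the bug is reproduced.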

### Why this matters

Two practical consequences:

- **Cost and latency.** Side queries are intended to be cheap and fast. When they accidentally inherit "think first" from the main model, the fast model burns its small token budget on reasoning, which can make the request slower than the main turn it's supporting and occasionally truncate the actual answer.
- **User trust in settings.** Putting per-model config under `modelProviders.<authType>[].generationConfig` strongly implies that those settings apply whenever that model is used. The current behavior silently overrides them on every side query, which is hard to discover without inspecting outgoing traffic.

Side queries on the fast model use the main model's per-model settings #3765

What happened?

What did you expect to happen?

Steps to reproduce

Why this matters

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Side queries on the fast model use the main model's per-model settings #3765

Description

What happened?

What did you expect to happen?

Steps to reproduce

Why this matters

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions