Side queries on the fast model use the main model's per-model settings #3765

@tanzhenxin

What happened?

I configured my main model and my fast model separately under modelProviders in ~/.qwen/settings.json, including model-specific options like extra_body.enable_thinking. My main model is a thinking model and I want it to think; my fast model is configured to never think (its entry sets extra_body.enable_thinking: false and extra_body.thinking.type: "disabled").

Despite that, side queries that run on the fast model (e.g. session-title generation, recap, tool-use summary) come back with reasoning_content populated. The fast model is clearly thinking even though my settings tell it not to.

When I checked the captured request body, I noticed that the per-model settings configured for my fast model are not the ones being applied. The fast model's request inherits the main model's settings instead — only the model id in the URL/body is swapped, everything else (extra_body, sampling params, reasoning config, base URL, API key) carries over from whatever the main model has.

In my case, that means side queries to deepseek-v4-flash are sent with my main model's enable_thinking: true, which is the exact opposite of what deepseek-v4-flash's own entry says. The model dutifully thinks, returns reasoning_content, and the side query takes longer and costs more than it should.

What did you expect to happen?

When a side query runs on the fast model, the request should use the fast model's per-model settings — the same extra_body, sampling params, and reasoning config that the fast model would use if I selected it as my main model. The fast model entry in modelProviders should be the source of truth for how the fast model is called, regardless of whether the call originates from a main turn, a title generator, a recap, or any other side query.

In other words: each model's modelProviders entry should fully describe how that model is called. Today, only the main model's entry is consulted; the fast model's entry is effectively ignored except for its model id.
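
The difference between the two behaviors can be sketched roughly as follows. This is an illustration only, not the project's actual code: the type names, the function names, the `baseUrl` and `envKey` fields, the placeholder id `main-thinking-model`, and the example URL are all assumptions made up for this sketch. Only `generationConfig`, `extra_body`, `samplingParams`, `enable_thinking`, and `deepseek-v4-flash` come from the report above.

```typescript
// Hypothetical shapes for illustration; not the project's real type definitions.
interface GenerationConfig {
  samplingParams?: Record<string, number>;
  extra_body?: Record<string, unknown>;
}

interface ModelEntry {
  id: string;
  baseUrl: string; // assumed field name
  envKey: string; // assumed: name of the env var that holds the API key
  generationConfig?: GenerationConfig;
}

interface CallConfig {
  model: string;
  baseUrl: string;
  envKey: string;
  generationConfig: GenerationConfig;
}

// Observed behavior: reuse the main model's call config and swap only the model id.
function observedSideQueryConfig(main: CallConfig, fastModelId: string): CallConfig {
  return { ...main, model: fastModelId };
}

// Expected behavior: build the call config from the target model's own entry.
function expectedSideQueryConfig(entries: ModelEntry[], modelId: string): CallConfig {
  const entry = entries.find((e) => e.id === modelId);
  if (!entry) {
    throw new Error(`No modelProviders entry for ${modelId}`);
  }
  return {
    model: entry.id,
    baseUrl: entry.baseUrl,
    envKey: entry.envKey,
    generationConfig: entry.generationConfig ?? {},
  };
}

// Placeholder entries; "main-thinking-model" and the URL are not real values.
const exampleEntries: ModelEntry[] = [
  {
    id: "main-thinking-model",
    baseUrl: "https://example.invalid/v1",
    envKey: "EXAMPLE_API_KEY",
    generationConfig: { extra_body: { enable_thinking: true } },
  },
  {
    id: "deepseek-v4-flash",
    baseUrl: "https://example.invalid/v1",
    envKey: "EXAMPLE_API_KEY",
    generationConfig: { extra_body: { enable_thinking: false } },
  },
];

const mainConfig = expectedSideQueryConfig(exampleEntries, "main-thinking-model");
const observed = observedSideQueryConfig(mainConfig, "deepseek-v4-flash");
const expected = expectedSideQueryConfig(exampleEntries, "deepseek-v4-flash");

console.log(observed.generationConfig.extra_body?.["enable_thinking"]);
console.log(expected.generationConfig.extra_body?.["enable_thinking"]);
```

In this sketch the first call carries the main model's `enable_thinking` value alongside the fast model's id, which matches what the captured request body showed. The second call takes every field from the fast model's own entry.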

This also matters beyond the thinking/no-thinking case:

  • If the fast model lives on a different provider/base URL than the main model, side queries would silently go to the main model's endpoint with the main model's API key. It only "works" if both models happen to share the same base URL and API key (as in my setup, since both are on Dashscope).
  • If a user sets different samplingParams per model (e.g. lower temperature for the fast model to reduce variance on summaries), those settings are ignored on side queries.

Steps to reproduce

  1. In ~/.qwen/settings.json, set model.name to a thinking model that defaults to thinking-on (e.g. a Qwen3 or DeepSeek thinking-mode model). Give its modelProviders entry extra_body: { enable_thinking: true }.
  2. Set fastModel to a different model whose modelProviders entry has extra_body: { enable_thinking: false } (or thinking: { type: "disabled" }).
  3. Run an interactive session, send two short messages so the auto-title generator triggers, then exit. (Or run any other side-query path: /rename --auto, recap, etc.)
  4. Observe: the fast model's response for the title call contains reasoning_content, despite that model's entry disabling thinking.
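
For reference, the setup in steps 1 and 2 and the check in step 4 might look roughly like the sketch below. The settings object is an assumed rendering of `~/.qwen/settings.json` as a TypeScript literal: the `openai` key stands in for `<authType>`, `main-thinking-model` is a placeholder id, and the `id` field name and exact nesting are assumptions. The response shape is likewise a minimal assumption about an OpenAI-compatible chat completion that carries `reasoning_content` on the message. `hasReasoningContent` is a hypothetical helper, not part of any CLI.

```typescript
// Assumed layout of the relevant settings; key names other than those quoted
// in this issue (modelProviders, generationConfig, extra_body, fastModel) are guesses.
const reproSettings = {
  model: { name: "main-thinking-model" }, // placeholder id for a thinking model
  fastModel: "deepseek-v4-flash",
  modelProviders: {
    openai: [
      {
        id: "main-thinking-model",
        generationConfig: { extra_body: { enable_thinking: true } },
      },
      {
        id: "deepseek-v4-flash",
        generationConfig: {
          extra_body: { enable_thinking: false, thinking: { type: "disabled" } },
        },
      },
    ],
  },
};

// Minimal assumed shape of a captured chat-completion response.
interface ChatCompletionLike {
  choices: Array<{
    message: { content?: string | null; reasoning_content?: string | null };
  }>;
}

// Returns true if any choice carries non-empty reasoning_content,
// i.e. the model produced thinking output for this call.
function hasReasoningContent(response: ChatCompletionLike): boolean {
  return response.choices.some(
    (choice) => (choice.message.reasoning_content ?? "").trim().length > 0,
  );
}

// Hand-written sample responses for illustration, not real captured traffic.
const thinkingResponse: ChatCompletionLike = {
  choices: [{ message: { content: "Fix login bug", reasoning_content: "The user asked about..." } }],
};
const plainResponse: ChatCompletionLike = {
  choices: [{ message: { content: "Fix login bug" } }],
};

console.log(JSON.stringify(reproSettings, null, 2));
console.log(hasReasoningContent(thinkingResponse), hasReasoningContent(plainResponse));
```

Running a check like this against the captured title-generation response makes step 4 mechanical: with the fast model's entry disabling thinking, the helper would be expected to return false for side-query responses.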

Why this matters

Two practical consequences:

  • Cost and latency. Side queries are intended to be cheap and fast. When they accidentally inherit "think first" from the main model, the fast model burns its small token budget on reasoning, which can make the request slower than the main turn it's supporting and occasionally truncate the actual answer.
  • User trust in settings. Putting per-model config under modelProviders.<authType>[].generationConfig strongly implies that those settings apply whenever that model is used. The current behavior silently overrides them on every side query, which is hard to discover without inspecting outgoing traffic.
