Disable reasoning on all fastModel side queries (follow-up to #3759) #3760

@tanzhenxin

Description

What would you like to be added?

When a side query runs on fastModel, reasoning should be disabled by default. The forked-agent path already does this. The remaining fastModel consumers do not, and they should: they are all small, bounded, latency-sensitive tasks where reasoning provides no value.

Sites in scope:

| Consumer | Workload | Output budget |
|---|---|---|
| Session recap | Generate <recap>…</recap> summary | 300 tokens |
| Session title | Schema-constrained {title} JSON | 100 tokens |
| Tool-use summary | Short label for a completed tool batch | 60 tokens |
| Rename (kebab-case) | 2-4 word kebab-case session name | very short |
| Auto-memory recall selector | Schema-constrained list of filenames (proposed in #3759) | small |
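A minimal sketch of the requested default, assuming a hypothetical side-query options object. The names `SideQueryOptions`, `disableReasoning`, `maxOutputTokens` and `fastModelSideQueryOptions` are illustrative and are not the project's actual API. It shows every fastModel consumer from the table above getting reasoning disabled by default, with the token budgets the table lists.

```typescript
// Hypothetical request shape for a fastModel side query. Field names are
// assumptions for illustration, not the project's real interface.
interface SideQueryOptions {
  model: string;
  disableReasoning: boolean;
  maxOutputTokens?: number;
}

type Consumer =
  | "sessionRecap"
  | "sessionTitle"
  | "toolUseSummary"
  | "rename"
  | "autoMemoryRecall";

// Output budgets from the table; undefined where the table only gives a
// qualitative size ("very short", "small").
const OUTPUT_BUDGETS: Record<Consumer, number | undefined> = {
  sessionRecap: 300,
  sessionTitle: 100,
  toolUseSummary: 60,
  rename: undefined,
  autoMemoryRecall: undefined,
};

// Every fastModel consumer disables reasoning by default, mirroring what
// the forked-agent path already does.
function fastModelSideQueryOptions(
  consumer: Consumer,
  fastModel: string,
): SideQueryOptions {
  return {
    model: fastModel,
    disableReasoning: true,
    maxOutputTokens: OUTPUT_BUDGETS[consumer],
  };
}
```

Centralizing the default in one helper means a new fastModel consumer picks up "reasoning off" without having to remember it, which is the gap the rename command illustrates.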

Why is this needed?

Three reasons apply uniformly to every row of the table:

The prompts already say "don't reason." Several of these consumers' system prompts explicitly instruct the model to skip reasoning and preamble. Disabling reasoning at the API level is the principled way to express the same intent. Today we rely on the prompt alone to keep the model in line, and capable thinking models routinely ignore such instructions.

The post-processors strip reasoning anyway. The recap path extracts only the content inside <recap>…</recap> and discards everything before it, which is where any leaked reasoning ends up. We are paying for tokens that get thrown away.
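The recap post-processing described above can be sketched as follows. This is a minimal illustration assuming a regex-based extractor; `extractRecap` is a hypothetical name, not the project's actual function. Any text the model emits before `<recap>` is dropped, so reasoning tokens spent there are billed but never used.

```typescript
// Returns the trimmed text inside the first <recap>…</recap> pair, or null
// if the tags are missing. Everything outside the tags (typically leaked
// reasoning before <recap>) is discarded.
function extractRecap(raw: string): string | null {
  const match = /<recap>([\s\S]*?)<\/recap>/.exec(raw);
  return match ? (match[1] ?? "").trim() : null;
}
```

For example, `extractRecap("Let me think...\n<recap>Fixed the parser bug.</recap>")` returns `"Fixed the parser bug."`; the preamble is paid for and then thrown away.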

The output budgets are tiny. Tool-use summary caps at 60 tokens, title at 100, recap at 300. On a reasoning-heavy fast model, the reasoning trace alone can exhaust the budget before any structured output is produced. This is the same failure mode as #3759, where the auto-memory recall selector aborts after 5 seconds because the main model spends its time thinking.
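The budget arithmetic can be made concrete with a small sketch. It assumes the provider counts reasoning tokens against the same output cap as the visible answer, which holds for some APIs but not necessarily all; `simulateOutput` and the reasoning-token counts used below are hypothetical.

```typescript
interface BudgetCheck {
  answerTokens: number; // tokens actually available for the visible answer
  truncated: boolean; // true if the answer did not fit in what was left
}

// Assumption: reasoning and answer tokens share one output cap.
function simulateOutput(
  outputBudget: number,
  reasoningTokens: number,
  answerTokensNeeded: number,
): BudgetCheck {
  // Whatever the reasoning trace consumes is no longer available.
  const left = Math.max(0, outputBudget - reasoningTokens);
  return {
    answerTokens: Math.min(left, answerTokensNeeded),
    truncated: answerTokensNeeded > left,
  };
}
```

With the 60-token tool-use summary budget and a hypothetical 80-token reasoning trace, `simulateOutput(60, 80, 20)` yields `{ answerTokens: 0, truncated: true }`; with reasoning disabled, `simulateOutput(60, 0, 20)` yields `{ answerTokens: 20, truncated: false }`.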

The rename command is the most pointed example: its source already carries a comment saying it "doesn't need main-model reasoning," yet the code still sends no such signal to the API.

Additional context

Quality risk. Title and recap quality could in principle degrade slightly on borderline conversations when reasoning is disabled. In practice these tasks are simple enough that this is unlikely to matter, and they are best-effort cosmetic features; the gains in latency, cost, and reliability outweigh the marginal quality risk.

Provider caveat. Some thinking models (e.g. deepseek-reasoner) cannot have thinking disabled at all; the existing pipeline already documents this. Users who choose such a model as their fastModel will not see the benefit, but nothing will break.
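One way to get the "nothing will break" behavior is to send the disable signal only to models that can honor it. The sketch below assumes a hypothetical set-based capability lookup; `CANNOT_DISABLE_THINKING` and `applyReasoningPreference` are illustrative names, and only `deepseek-reasoner` comes from this issue. The existing pipeline may handle this differently.

```typescript
// Hypothetical capability list. deepseek-reasoner is included because it
// cannot have thinking disabled; any other entries would be assumptions.
const CANNOT_DISABLE_THINKING = new Set<string>(["deepseek-reasoner"]);

interface RequestOptions {
  model: string;
  disableReasoning: boolean;
}

// Requests "reasoning off" only where the model supports it; otherwise it
// silently keeps the model's default behavior instead of failing the query.
function applyReasoningPreference(
  model: string,
  wantDisabled: boolean,
): RequestOptions {
  const supported = !CANNOT_DISABLE_THINKING.has(model);
  return { model, disableReasoning: wantDisabled && supported };
}
```

Degrading to the model's default keeps side queries working for users who pick such a model as fastModel; they simply do not get the latency and cost savings.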

Adjacent follow-ups (out of scope). A few side queries (memory governance, forget-selection, sub-agent spec generation, and the next-speaker classifier) currently default to the main model rather than fastModel. They have the same shape and would benefit from the same combined treatment (use fastModel and disable reasoning), but that is a separate cleanup and should not be bundled into this issue.

Relationship to #3759. That issue is a bug report specifically about the auto-memory recall selector, which currently uses the main model with a 5-second deadline and aborts every turn. The fix there moves the selector onto fastModel. This issue is the broader cleanup: once the selector is on fastModel, it should also disable reasoning, and the same change should be applied to the four other consumers that already use fastModel today.

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
