What would you like to be added?
When a side query runs on fastModel, reasoning should be disabled by default. The forked-agent path already does this. The remaining fastModel consumers do not, and they should — they are all small, bounded, latency-sensitive tasks where reasoning provides no value.
Sites in scope:
| Consumer | Workload | Output budget |
| --- | --- | --- |
| Session recap | Generate `<recap>…</recap>` summary | 300 tokens |
| Session title | Schema-constrained `{title}` JSON | 100 tokens |
| Tool-use summary | Short label for a completed tool batch | 60 tokens |
| Rename (kebab-case) | 2-4 word kebab-case session name | very short |
| Auto-memory recall selector | Schema-constrained list of filenames (proposed in #3759) | small |
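To make the requested default concrete, here is a minimal sketch of a shared config builder for these consumers. Every name in it (`SideQueryConfig`, `buildSideQueryConfig`, the consumer keys, the `reasoning.enabled` field) is hypothetical and not taken from the codebase; only the budgets come from the table above, and the two rows without an exact figure are left unset.

```typescript
// Hypothetical types for illustration; none of these names exist in the codebase.
type Consumer =
  | "session-recap"
  | "session-title"
  | "tool-use-summary"
  | "rename"
  | "auto-memory-recall-selector";

interface SideQueryConfig {
  model: string;
  maxOutputTokens: number | undefined; // undefined where the table gives no exact figure
  reasoning: { enabled: boolean };
}

// Output budgets copied from the table; "very short" and "small" stay unset.
const OUTPUT_BUDGETS: Record<Consumer, number | undefined> = {
  "session-recap": 300,
  "session-title": 100,
  "tool-use-summary": 60,
  "rename": undefined,
  "auto-memory-recall-selector": undefined,
};

function buildSideQueryConfig(
  consumer: Consumer,
  fastModel: string,
  overrides: Partial<SideQueryConfig> = {},
): SideQueryConfig {
  return {
    model: fastModel,
    maxOutputTokens: OUTPUT_BUDGETS[consumer],
    // Reasoning is off by default; a caller has to opt back in explicitly.
    reasoning: { enabled: false },
    ...overrides,
  };
}
```

Routing all five consumers through one builder like this would keep the default in a single place, so a new fastModel consumer inherits it without having to remember the flag.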
Why is this needed?
Three converging reasons apply to every consumer in the table:
The prompts already say "don't reason." Several of the system prompts in these consumers literally instruct the model to skip reasoning and preamble. Disabling reasoning at the API level is the principled version of the same intent. Today we rely on the prompt alone to keep the model in line, and capable thinking models routinely ignore that instruction.
The post-processors strip reasoning anyway. The recap path extracts only the content inside <recap>…</recap> and discards everything before it — i.e. the leaked reasoning. We are paying for tokens that get thrown away.
The output budgets are tiny. Tool-use summary caps at 60 tokens, title at 100, recap at 300. On a reasoning-heavy fast model, the reasoning trace alone can blow the budget before any structured output is produced — the same failure mode as #3759 (auto-memory recall selector aborting after 5 seconds because the main model spends its time thinking).
The rename command is the most pointed example: the source already has a comment that "doesn't need main-model reasoning," yet the code still does not signal that to the API.
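The second and third reasons interact, and a toy model shows how. The sketch below is illustrative only: `extractRecap` and `truncateToBudget` are hypothetical helpers, and counting whitespace-separated words is a crude stand-in for a real tokenizer. It shows that leaked reasoning placed before the `<recap>` tag can use up the whole 300-unit budget, leaving the post-processor nothing to extract.

```typescript
// Hypothetical post-processor: keep only what sits inside <recap>…</recap>.
function extractRecap(raw: string): string | null {
  const match = raw.match(/<recap>([\s\S]*?)<\/recap>/);
  return match?.[1]?.trim() ?? null;
}

// Crude stand-in for an output token cap: keep the first `budget` words.
function truncateToBudget(text: string, budget: number): string {
  return text.split(/\s+/).filter(Boolean).slice(0, budget).join(" ");
}

const leakedReasoning = new Array<string>(400).fill("thinking").join(" ");
const recap = "<recap>Refactored the login flow.</recap>";

// Reasoning leaks first and consumes the entire 300-unit budget.
const withReasoning = truncateToBudget(`${leakedReasoning} ${recap}`, 300);
// Reasoning disabled: the recap is the whole output and fits easily.
const withoutReasoning = truncateToBudget(recap, 300);

console.log(extractRecap(withReasoning)); // prints: null
console.log(extractRecap(withoutReasoning)); // prints: Refactored the login flow.
```

In the first case every generated unit is paid for and then discarded; in the second the same budget yields a usable recap.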
Additional context
Quality risk. Title and recap quality could in principle degrade slightly on borderline conversations when reasoning is disabled. In practice these tasks are simple enough that this is unlikely to matter, and they are best-effort cosmetic features — the win in latency, cost, and reliability outweighs the marginal quality risk.
Provider caveat. Some thinking models (e.g. deepseek-reasoner) cannot have thinking disabled at all; the existing pipeline already documents this. Users who choose such a model as their fastModel will not see the benefit, but nothing will break.
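One way to picture the caveat is as a request that is always made but not always honored. The sketch below is an assumption for illustration, not a description of how the existing pipeline handles it: the set, the `ReasoningRequest` shape, and `requestNoReasoning` are all hypothetical, and the set holds only the one model this issue names as an example.

```typescript
// Hypothetical list; the issue names deepseek-reasoner only as an example.
const CANNOT_DISABLE_THINKING = new Set<string>(["deepseek-reasoner"]);

interface ReasoningRequest {
  disableRequested: boolean; // what the side query asks for
  disableHonored: boolean; // whether the chosen model can comply
}

function requestNoReasoning(fastModel: string): ReasoningRequest {
  return {
    // The side query always asks for reasoning to be off.
    disableRequested: true,
    // Models that cannot turn thinking off run as before; nothing throws.
    disableHonored: !CANNOT_DISABLE_THINKING.has(fastModel),
  };
}
```

The point of the shape is that an unsupported model degrades to today's behavior rather than causing an error.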
Adjacent follow-ups (out of scope). A small number of side queries — memory governance, forget-selection, sub-agent spec generation, next-speaker classifier — currently default to the main model rather than fastModel. They have the same shape and would benefit from the same combined treatment (use fastModel + disable reasoning), but that is a separate cleanup and should not be bundled into this issue.
Relationship to #3759. That issue is a bug report for the auto-memory recall selector specifically — it currently uses the main model with a 5-second deadline and aborts every turn. The fix there pulls the selector onto fastModel. This issue is the broader cleanup: once the selector is on fastModel, it should also disable reasoning, and the same disable should be applied to the four other consumers that already use fastModel today.