server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)#13771
server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)#13771ochafik merged 13 commits intoggml-org:masterfrom
server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)#13771Conversation
|
yes this can be useful, I thought about it in #13272 , which is part of my idea about implementing the thinking budget. just to be less confused between |
|
Consider adding Granite's |
@CISC I hadn't seen that one, thanks for bringing this up! Strong case for support through @ngxson's #13272, the request param could override the flag then, or something. |
server: add --reasoning-format=disabled to disable thinking (incl. qwen3 w/ enable_thinking:false)server: add --reasoning-format=nothink to disable thinking (incl. qwen3 w/ enable_thinking:false)
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
common/arg.cpp
Outdated
| "controls whether thought tags are allowed and/or extracted from the response, and in which format they're returned; one of:\n" | ||
| "- none: leaves thoughts unparsed in `message.content`\n" | ||
| "- deepseek: puts thoughts in `message.reasoning_content` (except in streaming mode, which behaves as `none`)\n" | ||
| "- nothink: prevents generation of thoughts (forcibly closing thoughts tag or setting template-specific variables such as `enable_thinking: false` for Qwen3)\n" |
There was a problem hiding this comment.
doesn't feel worth adding a separate flag at this stage, wdyt?
Tbh I think we should still separate it to another flag. The format meaning it only format the response, not changing the behavior, but here nothink changes the generation behavior
There was a problem hiding this comment.
I think it's ok to just add a flag called --reasoning-budget and only support either -1 (unlimited budget) or 0 (no think) for now
server: add --reasoning-format=nothink to disable thinking (incl. qwen3 w/ enable_thinking:false)server: add --reasoning-budget to disable thinking (incl. qwen3 w/ enable_thinking:false)
server: add --reasoning-budget to disable thinking (incl. qwen3 w/ enable_thinking:false)server: add --reasoning-budget 0 to disable thinking (incl. qwen3 w/ enable_thinking:false)
|
@ngxson & @ochafik I have a question regarding the usage. Simply adding This request: Returns the following: |
|
@countzero You need to start the server with |
|
@kth8 Thank you for the hint. That indeed works now: @ngxson & @ochafik As a developer I would like to use the Suggestion: Activate |
|
Please take a look: #13877 |
|
I am not able to get reasoning-budget to work |
|
@jacekpoplawski you didn't run with |
|
does it work for you with --jinja? |

This allows disabling thinking for all supported thinking models (QwQ, DeepSeek R1 distills, Qwen3, Command R7B), when the flag
--reasoning-budget 0is set"enable_thinking": falseas extra template context variable (similar to Support jinja extra template kwargs (Qwen3 enable_thinking feature), from command line and from client #13196, which will still be very useful in general)For per-request behaviour, see #13272 (discussion on upcoming reasoning budget request param) and #13196 (support passing generic kvs).
cc/ @matteoserva
cc/ @ngxson Not sure about the slight alteration of the semantics of the CLI flag (updated docs + inline help), but doesn't feel worth adding a separate flag at this stage, wdyt?