server: real-time reasoning interruption via control endpoint #23971
I'm going to improve it by making the button
sure, let's go with this
f63742c to 6d67931
Builds on the manual reasoning budget trigger from ggml-org#23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls common_sampler_reasoning_budget_force to end thinking mid-generation. Exposed as POST /v1/chat/completions/control with { id_slot, action }; the opt-in reasoning_control flag arms the budget sampler on demand. Works with both router and single-model setups. Includes a minimal WebUI button as a skeleton for further UI work.
Add isReasoning to the chat store, mirroring the isLoading pattern: a per-conversation map, a private setter, a public accessor and a reactive export. It is set from the stream callbacks: true on reasoning chunks, false on the first content chunk, reset on stream end and resynced on conversation switch. The skip button now keys off isReasoning, so it shows only during the thinking phase, not for the whole generation.
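The state transitions described in this commit can be sketched as follows. This is a minimal illustration, not the WebUI's actual store: `ReasoningState`, `StreamEvent` and `handle` are hypothetical names, and the reactive export and the resync on conversation switch are left out.

```typescript
// Hypothetical event shape; the real stream callbacks are not shown in this PR text.
type StreamEvent =
  | { kind: "reasoning"; text: string }
  | { kind: "content"; text: string }
  | { kind: "end" };

class ReasoningState {
  // One flag per conversation, mirroring the isLoading pattern.
  private reasoningByConv = new Map<string, boolean>();

  // Private setter: only stream handling may change the flag.
  private setReasoning(convId: string, value: boolean): void {
    if (value) {
      this.reasoningByConv.set(convId, true);
    } else {
      this.reasoningByConv.delete(convId);
    }
  }

  // Public accessor: conversations that were never seen report false.
  isReasoning(convId: string): boolean {
    return this.reasoningByConv.get(convId) ?? false;
  }

  handle(convId: string, ev: StreamEvent): void {
    switch (ev.kind) {
      case "reasoning":
        this.setReasoning(convId, true);
        break;
      case "content":
        // Clearing on every content chunk has the same effect as clearing on the first one.
        this.setReasoning(convId, false);
        break;
      case "end":
        this.setReasoning(convId, false);
        break;
    }
  }
}
```

A skip button bound to `isReasoning(convId)` would then be visible only between the first reasoning chunk and the first content chunk.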
Move the chat completion routes, the slots route and the reasoning control action out of chat.service into api-endpoints and a dedicated control-actions module. No behavior change; this drops the magic strings so the control protocol has a single source of truth.
6d67931 to f617ae0
Address @ngxson review on the control endpoint. Switch from id_slot to the chat completion id to avoid a TOCTOU (time-of-check to time-of-use) race: the slot can be reassigned between the lookup and the control request, so matching the live completion (oaicompat_cmpl_id) is safe and a finished one simply matches nothing. Rename the action to reasoning_end, guard it on the reasoning_control flag of the target slot, and reduce the response to {success} with an optional message.
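To illustrate why matching on the completion id sidesteps the race, here is a small TypeScript model of the lookup. The server itself is C++, so this is only a sketch of the logic: the `Slot` shape, the `forcedEnd` flag (a stand-in for calling common_sampler_reasoning_budget_force), the error messages, and the choice to report `success: false` when nothing matches are all assumptions.

```typescript
interface Slot {
  id: number;
  oaicompat_cmpl_id: string | null; // null while the slot is idle
  reasoning_control: boolean;       // opt-in flag set by the request
  forcedEnd: boolean;               // stand-in for forcing the reasoning budget
}

interface ControlResult {
  success: boolean;
  message?: string;
}

// Match the live completion rather than a slot index: if the slot was
// reassigned or the completion finished, the id matches nothing.
function applyReasoningEnd(slots: Slot[], completionId: string): ControlResult {
  const slot = slots.find((s) => s.oaicompat_cmpl_id === completionId);
  if (!slot) {
    return { success: false, message: "no live completion with this id" };
  }
  if (!slot.reasoning_control) {
    return { success: false, message: "reasoning_control is not enabled" };
  }
  slot.forcedEnd = true;
  return { success: true };
}
```

With an id_slot based lookup, a request aimed at slot 0 could hit whichever completion happens to occupy slot 0 at that moment; here a stale id leaves every slot untouched.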
Keep the streamed completion id on the message and post it back to the
control endpoint instead of probing /slots. Drops the slot discovery
and the TOCTOU that came with it. Action renamed to reasoning_end,
response read as {success}.
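A client-side call matching this description might look like the sketch below. The endpoint path, the `reasoning_end` action and the `{success}` response with an optional message come from the commits above; the request field name `id` for the completion id is an assumption, as are the function name and the injected `fetchImpl` parameter (used here so the function can be exercised without a server).

```typescript
// Minimal fetch-compatible signature; the global fetch can be passed directly.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

interface ControlResponse {
  success: boolean;
  message?: string;
}

async function requestReasoningEnd(
  baseUrl: string,
  completionId: string,
  fetchImpl: FetchLike
): Promise<ControlResponse> {
  const res = await fetchImpl(`${baseUrl}/v1/chat/completions/control`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // "id" is a hypothetical field name for the streamed completion id.
    body: JSON.stringify({ id: completionId, action: "reasoning_end" }),
  });
  if (!res.ok) {
    return { success: false, message: "control request failed" };
  }
  const data = (await res.json()) as Partial<ControlResponse>;
  const out: ControlResponse = { success: data.success === true };
  if (typeof data.message === "string") {
    out.message = data.message;
  }
  return out;
}
```

Because the request carries the completion id the client already received while streaming, no extra round trip to /slots is needed before sending it.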
ngxson left a comment
remember to also update the server docs to add the new endpoint
many AI-generated comments are for the AI to respond to your prompt; they don't have real technical value. please consider removing all comments about TOCTOU
Move the control fields into task_params and drop the redundant comments on the control path.
also need to update server docs
Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
Per @allozaur review, clearer name for the streamed completion id.
I need to fix a regression
The webui streams through the agentic flow, which relayed onModel but not onCompletionId, so the completion id never reached the message and the control request was never sent. Relay it through the flow and its callbacks type, declare id on the chunk type, and log an explicit error when the button fires without a usable id.
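The shape of this regression and its fix can be sketched as below. All names here (`StreamCallbacks`, `relayCallbacks`, `StreamingMessage`, `skipReasoning`) are hypothetical; the sketch only shows that an intermediate layer has to forward each callback explicitly, and that the button handler should fail loudly when no id is available.

```typescript
interface StreamCallbacks {
  onModel?: (model: string) => void;
  onCompletionId?: (id: string) => void;
}

// An intermediate flow builds its own callbacks object. Any callback it does
// not forward is silently lost, which is how the completion id went missing.
function relayCallbacks(outer: StreamCallbacks): StreamCallbacks {
  return {
    onModel: (model) => outer.onModel?.(model),
    onCompletionId: (id) => outer.onCompletionId?.(id),
  };
}

interface StreamingMessage {
  completionId?: string;
}

// Returns true if the control request was handed to `send`.
function skipReasoning(msg: StreamingMessage, send: (id: string) => void): boolean {
  if (!msg.completionId) {
    console.error("reasoning_end requested but the message has no completion id");
    return false;
  }
  send(msg.completionId);
  return true;
}
```

Declaring `onCompletionId` on the shared callbacks type means a layer that constructs callbacks against that type has the field available to forward, rather than relying on an untyped pass-through.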
The model is a property of the completion, so read it from the streaming message like the id, not from the model dropdown which is unrelated UI state. Makes the request self-consistent by construction instead of just unlikely to drift.
This time it's reliable and semantically more correct:
Is there an option to disable this? This new feature (which is nice and cleanly implemented) causes token generation to run 20 t/s slower on my hardware. As aldehir mentioned in #23949, it does cause performance issues for some people. I looked around the webui for an option to toggle it off, but didn't see one.
Overview
Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls common_sampler_reasoning_budget_force to end thinking mid-generation. Exposed as POST /v1/chat/completions/control with { id_slot, action }; the opt-in reasoning_control flag arms the budget sampler on demand. Works with both router and single-model setups.
Minimal WebUI button as a skeleton for further UI work. cc @allozaur
Additional information
Video
think-stop.mp4
Closes #23944