server: real-time reasoning interruption via control endpoint by ServeurpersoCom · Pull Request #23971 · ggml-org/llama.cpp

ServeurpersoCom · 2026-06-01T12:53:44Z

Overview

Builds on the manual reasoning budget trigger from #23949. Adds a CONTROL task that mirrors the CANCEL path on the live slot and calls common_sampler_reasoning_budget_force to end thinking mid-generation. Clients send POST /v1/chat/completions/control with { id_slot, action }; the opt-in reasoning_control flag arms the budget sampler on demand. Works in both router and single-model modes.

Minimal WebUI button as a skeleton for further UI work. cc @allozaur

Additional information

Video

think-stop.mp4

Closes #23944

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES Opus 4.8 High + MCP container w/GPU

ServeurpersoCom · 2026-06-01T13:06:45Z

I'm going to improve it by making the button ~~grey or~~ disappear at the end of the reasoning phase. WDYT @allozaur? That gives an event-ready skeleton, and you can do the design on top.

allozaur · 2026-06-01T13:15:48Z

> I'm going to improve it by making the button grey or disappear at the end of the reasoning inference, WDYT @allozaur, to make an event-ready skeleton and you do the design on top.

sure, let's go with this


Add isReasoning to the chat store, mirroring the isLoading pattern: per conversation map, private setter, public accessor and reactive export. Set from the stream callbacks, true on reasoning chunks, false on the first content chunk, reset on stream end and resynced on conversation switch. The skip button now keys off isReasoning so it shows only during the thinking phase, not the whole generation.
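The state transitions described above can be sketched as a small per-conversation map. This is a minimal illustration, not the webui's actual store: the class name `ReasoningState`, the `StreamChunk` shape, and the method names are assumptions, and the reactive export is left out.

```typescript
// Hypothetical chunk shape: a stream delta carries reasoning text or content text.
type StreamChunk = { reasoning?: string; content?: string };

// Hypothetical store: one isReasoning flag per conversation id.
class ReasoningState {
  private readonly byConversation = new Map<string, boolean>();

  // Private setter: only the stream callbacks below may change the flag.
  private set(convId: string, value: boolean): void {
    if (value) {
      this.byConversation.set(convId, true);
    } else {
      this.byConversation.delete(convId);
    }
  }

  // Public accessor: unknown conversations are simply "not reasoning".
  isReasoning(convId: string): boolean {
    return this.byConversation.get(convId) ?? false;
  }

  // True on reasoning chunks, false as soon as the first content chunk arrives.
  onChunk(convId: string, chunk: StreamChunk): void {
    if (chunk.content) {
      this.set(convId, false);
    } else if (chunk.reasoning) {
      this.set(convId, true);
    }
  }

  // Reset when the stream ends, whatever phase it was in.
  onStreamEnd(convId: string): void {
    this.set(convId, false);
  }
}
```

A skip button keyed off `isReasoning(activeConversationId)` is then visible only between the first reasoning chunk and the first content chunk.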

Move the chat completion routes, the slots route and the reasoning control action out of chat.service into api-endpoints and a dedicated control-actions module. No behavior change, drops the magic strings so the control protocol has a single source of truth.

@ngxson

Address @ngxson review on the control endpoint. Switch from id_slot to the chat completion id to avoid a TOCTOU: the slot can be reassigned between the lookup and the control request, so matching the live completion (oaicompat_cmpl_id) is safe and a finished one simply matches nothing. Rename the action to reasoning_end, guard it on the reasoning_control flag of the target slot, and reduce the response to {success} with an optional message.
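Why matching on the completion id removes the race can be shown with a toy model. The sketch below is written in TypeScript for illustration only and is not the server's C++ code: the `SlotModel` shape, `applyControl`, and the message strings are assumptions, and setting `reasoningEnded` stands in for forcing the reasoning budget.

```typescript
// Hypothetical model of a server slot; field names are illustrative.
interface SlotModel {
  id: number;
  completionId: string | null; // id of the completion currently running, if any
  reasoningControl: boolean;   // whether the request opted in
  reasoningEnded: boolean;     // stand-in for the forced end of thinking
}

interface ControlResult {
  success: boolean;
  message?: string;
}

function applyControl(slots: SlotModel[], completionId: string, action: string): ControlResult {
  if (action !== "reasoning_end") {
    return { success: false, message: "unknown action" };
  }
  // Match the live completion rather than a slot index: a slot that has been
  // reassigned carries a different completion id and is never hit by mistake.
  const slot = slots.find((s) => s.completionId === completionId);
  if (!slot) {
    return { success: false, message: "no live completion with this id" };
  }
  if (!slot.reasoningControl) {
    return { success: false, message: "reasoning control not enabled" };
  }
  slot.reasoningEnded = true;
  return { success: true };
}
```

With an `id_slot` target, a request for a finished completion could land on whatever completion now occupies that slot; with the id target it matches nothing.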

Keep the streamed completion id on the message and post it back to the control endpoint instead of probing /slots. Drops the slot discovery and the TOCTOU that came with it. Action renamed to reasoning_end, response read as {success}.

ngxson

remember to also update server docs to add the new endpoint

Many AI-generated comments are just the AI responding to your prompt; they have no real technical value. Please consider removing all comments about TOCTOU.

Move the control fields into task_params and drop the redundant comments on the control path.

ngxson · 2026-06-01T15:13:18Z

also need to update server docs

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

@allozaur

Per @allozaur review, clearer name for the streamed completion id.

ServeurpersoCom · 2026-06-01T17:07:17Z

I need to fix a regression ~~following the renaming / from rebasing onto latest master on my server.~~
Retrieving the completion ID from the UI side is unreliable, and the request does not always go through.

The webui streams through the agentic flow, which relayed onModel but not onCompletionId, so the completion id never reached the message and the control request was never sent. Relay it through the flow and its callbacks type, declare id on the chunk type, and log an explicit error when the button fires without a usable id.
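The failure mode is a wrapper that forwards some callbacks but not others. A minimal sketch of the relay follows; `onModel` and `onCompletionId` come from the description above, while `FlowCallbacks`, `FlowChunk`, `consumeStream`, and `runAgenticFlow` are hypothetical names.

```typescript
// Hypothetical callbacks type shared by the stream reader and the agentic flow.
interface FlowCallbacks {
  onModel?: (model: string) => void;
  onCompletionId?: (id: string) => void;
  onContent?: (text: string) => void;
}

// Hypothetical chunk type; note that `id` is declared on it.
interface FlowChunk {
  id?: string;
  model?: string;
  content?: string;
}

// Inner layer: reports id and model once, content for every chunk that has it.
function consumeStream(chunks: FlowChunk[], cb: FlowCallbacks): void {
  let idSent = false;
  let modelSent = false;
  for (const chunk of chunks) {
    if (chunk.id && !idSent) {
      cb.onCompletionId?.(chunk.id);
      idSent = true;
    }
    if (chunk.model && !modelSent) {
      cb.onModel?.(chunk.model);
      modelSent = true;
    }
    if (chunk.content) {
      cb.onContent?.(chunk.content);
    }
  }
}

// Wrapper layer: every callback the UI relies on has to be forwarded explicitly.
function runAgenticFlow(chunks: FlowChunk[], outer: FlowCallbacks): void {
  consumeStream(chunks, {
    onModel: (model) => outer.onModel?.(model),
    onCompletionId: (id) => outer.onCompletionId?.(id), // the relay that was missing
    onContent: (text) => outer.onContent?.(text),
  });
}
```

If the `onCompletionId` line is dropped from the wrapper, the stream still renders normally, which is why the missing id only surfaced when the button was pressed.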

The model is a property of the completion, so read it from the streaming message like the id, not from the model dropdown which is unrelated UI state. Makes the request self-consistent by construction instead of just unlikely to drift.
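A sketch of building the control request from the streaming message alone is shown below. The endpoint path and the `reasoning_end` action come from this PR; the body field names `id` and `model`, the `StreamingMessage` shape, and `buildReasoningEndRequest` are assumptions, so check the server docs for the real payload.

```typescript
// Hypothetical view of the message being streamed.
interface StreamingMessage {
  completionId?: string;
  model?: string;
}

type ControlBuildResult =
  | { ok: true; path: string; body: { id: string; action: string; model?: string } }
  | { ok: false; error: string };

const CONTROL_PATH = "/v1/chat/completions/control";
const ACTION_REASONING_END = "reasoning_end";

// Both id and model are read from the message that is actually being generated,
// never from unrelated UI state such as a model dropdown.
function buildReasoningEndRequest(msg: StreamingMessage): ControlBuildResult {
  if (!msg.completionId) {
    // Surface an explicit error instead of silently sending nothing.
    return { ok: false, error: "no completion id on the streaming message" };
  }
  const body: { id: string; action: string; model?: string } = {
    id: msg.completionId,
    action: ACTION_REASONING_END,
  };
  if (msg.model) {
    body.model = msg.model;
  }
  return { ok: true, path: CONTROL_PATH, body };
}
```

The caller would POST `body` as JSON to `path` and read `{ success }` from the response.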

ServeurpersoCom · 2026-06-01T21:14:27Z

This time it's reliable and semantically more correct:

The ID has to be passed through the agentic flow.
The model name has to be read from the actual message being inferred.

CMay · 2026-06-02T10:18:55Z

Is there an option to disable this? This new feature (which is nice and cleanly implemented) causes token generation to run 20 t/s slower on my hardware. As aldehir mentioned in #23949, it does cause performance issues for some people.

Looked around the webui for an option to toggle it off, but didn't see one.

ServeurpersoCom requested review from a team as code owners June 1, 2026 12:53

ServeurpersoCom requested review from allozaur and ngxson June 1, 2026 13:09

ngxson reviewed Jun 1, 2026


Comment thread tools/server/server-context.cpp Outdated

Comment thread tools/server/server-context.cpp Outdated

Comment thread tools/server/server-task.h Outdated

Comment thread tools/server/server-context.cpp Outdated

github-actions Bot added examples server server/ui labels Jun 1, 2026

ServeurpersoCom force-pushed the realtime-reasoning-control branch from f63742c to 6d67931 Compare June 1, 2026 14:05

ServeurpersoCom added 3 commits June 1, 2026 16:09

ServeurpersoCom force-pushed the realtime-reasoning-control branch from 6d67931 to f617ae0 Compare June 1, 2026 14:09

ServeurpersoCom added 2 commits June 1, 2026 16:33

ui: target reasoning control by completion id

c86ab4d

Keep the streamed completion id on the message and post it back to the control endpoint instead of probing /slots. Drops the slot discovery and the TOCTOU that came with it. Action renamed to reasoning_end, response read as {success}.

allozaur approved these changes Jun 1, 2026


ServeurpersoCom requested a review from ngxson June 1, 2026 14:56

ngxson reviewed Jun 1, 2026


Comment thread tools/server/server-context.cpp Outdated

Comment thread tools/server/server-context.cpp Outdated

Comment thread tools/server/server-task.h Outdated

server: address review from @ngxson

ae128b3

Move the control fields into task_params and drop the redundant comments on the control path.

server: document the reasoning control endpoint

4143e6c

ngxson approved these changes Jun 1, 2026


allozaur approved these changes Jun 1, 2026


Comment thread tools/ui/src/lib/types/database.d.ts Outdated

ServeurpersoCom and others added 2 commits June 1, 2026 18:34

Update tools/ui/src/lib/types/database.d.ts

21664f4

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>

ui: rename cmplId to completionId

92cbf44

Per @allozaur review, clearer name for the streamed completion id.

allozaur approved these changes Jun 1, 2026


ServeurpersoCom added 2 commits June 1, 2026 23:06

allozaur approved these changes Jun 2, 2026


allozaur merged commit 354ebac into ggml-org:master Jun 2, 2026
26 of 28 checks passed

allozaur merged 11 commits into ggml-org:master from ServeurpersoCom:realtime-reasoning-control
