Any way to send `reasoning_content` back to llama-server via the builtin-web UI? #18368

tarruda · 2025-12-25T14:01:52Z

tarruda
Dec 25, 2025

Some LLMs don't need reasoning_content when rendering the chat template, but a few like Minimax M2 were trained on interleaved thinking and in fact the chat template shows that it expects reasoning_content when rendering the chat history.

Inspecting requests sent by the web UI, it seems reasoning_content received by the frontend is not sent back to the server:

This might reduce output quality in multi-turn interactions.

Answered by ServeurpersoCom

Dec 26, 2025

I have a big refactor pending review by Alek and I'll submit a PR :)

View full answer

woheller69 · 2025-12-25T19:15:59Z

woheller69
Dec 25, 2025

related to #17430 ?

0 replies

aldehir · 2025-12-26T08:16:28Z

aldehir
Dec 26, 2025
Collaborator

I believe only Ministral 3 supports interleaved thinking with user messages. All other models (gpt-oss, MiniMax M2, Kimi K2, Nemotron Nano 3) only keep it during tool call/tool response loops. So there's little value with the current Web UI.

I think it should be considered in the upcoming tool calling/MCP Web UI integration.

cc @ngxson @allozaur @ServeurpersoCom

21 replies

aldehir Dec 26, 2025
Collaborator

@ServeurpersoCom, we don't need to concern ourselves with the tags. Just sending back the reasoning_content field is sufficient. The parsing in common should ensure it gets populated in the template.

ServeurpersoCom Dec 26, 2025
Collaborator

If it's a matter of returning the reasoning_content exactly as it was received on frontend, it's trivial.

woheller69 Dec 26, 2025

this is what the old webui did with "Exclude thought process when sending requests to API" setting OFF, right?

ServeurpersoCom Dec 26, 2025
Collaborator

I have a big refactor pending review by Alek and I'll submit a PR :)

Answer selected by tarruda

aldehir Dec 26, 2025
Collaborator

this is what the old webui did with "Exclude thought process when sending requests to API" setting OFF, right?

Not quite. In the old WebUI, it relied on extracting the thinking tags from the assistant message. The option simply turned this off. The CoT was still received in the content field.

It is now preferred to parse the reasoning on the backend and pass it to clients in the separate reasoning_content field.

woheller69 Dec 26, 2025

I would like to have exactly that option back...

ngxson Dec 26, 2025
Maintainer

It won't simply be enough to just put reasoning_content back into the server. In most cases, jinja chat template simply don't know how to handle it, and it will be removed from the message.

I doubt if we can really have a good balance approach for interleaved thinking models. For these model, we probably need to also send and store the non-parsed version of the message (bypass chat parsing).

aldehir Dec 26, 2025
Collaborator

It won't simply be enough to just put reasoning_content back into the server. In most cases, jinja chat template simply don't know how to handle it, and it will be removed from the message.

Just like the parsing extracts the reasoning traces, the chat init/formatting can transform them to the appropriate field in the template. The specialization already exists for a particular template, so no generalized approach is necessary. The only thing is to standardize the field, which has settled to reasoning_content now.

This is already done for Command R7B, GPT-OSS, and Ministral. Templates for MiniMax M2 and Kimi K2 already look for message.reasoning_content, so no additional work is needed for them.

Of course, this falls apart if models start emitting other artifacts.

ngxson · 2025-12-26T18:49:15Z

ngxson
Dec 26, 2025
Maintainer

@aldehir Moving this to a dedicated thread: I'm wondering if it would be useful to have a code checking if a template support putting back reasoning_content or not

We can do this by just try to format a chat w/ and w/o the reasoning_content. There will be 3 cases:

reasoning_content is injected successfully into the content (we ignore checking the order for now)
reasoning_content is ignored by the template
An exception is thrown

Depending on this, we can maybe show a warning, or decide if reasoning_content can be (safely) supported.

3 replies

aldehir Dec 26, 2025
Collaborator

Sure. If we move to a class-based API (#18215), then the chat template lifecycle can match the server process and provide functions to query its capabilities. I'm guessing you'll need this information upfront to present a warning. Or copy how it's done in supports_thinking().

I wonder what's the benefit of the check, though. It's added complexity for information that can be inferred after detecting the chat format.

ngxson Dec 26, 2025
Maintainer

I think for now, the main benefit would be to report to the webui if the chat template supports feeding back the reasoning content. Just to make it more visible to end-user, maybe?

aldehir Dec 26, 2025
Collaborator

I think it's a great idea.

I was thinking we could just return a known set of capabilities depending on the chat format. Honestly, though, the checks are only a few lines of code so there's no harm in implementing them.

Any way to send reasoning_content back to llama-server via the builtin-web UI? #18368

Uh oh!

tarruda Dec 25, 2025

Replies: 3 comments · 24 replies

Uh oh!

woheller69 Dec 25, 2025

Uh oh!

Uh oh!

aldehir Dec 26, 2025 Collaborator

Uh oh!

aldehir Dec 26, 2025 Collaborator

Uh oh!

ServeurpersoCom Dec 26, 2025 Collaborator

Uh oh!

woheller69 Dec 26, 2025

Uh oh!

ServeurpersoCom Dec 26, 2025 Collaborator

Uh oh!

aldehir Dec 26, 2025 Collaborator

Uh oh!

woheller69 Dec 26, 2025

Uh oh!

ngxson Dec 26, 2025 Maintainer

Uh oh!

Uh oh!

aldehir Dec 26, 2025 Collaborator

Uh oh!

ngxson Dec 26, 2025 Maintainer

Uh oh!

Uh oh!

aldehir Dec 26, 2025 Collaborator

Uh oh!

ngxson Dec 26, 2025 Maintainer

Uh oh!

aldehir Dec 26, 2025 Collaborator

Any way to send `reasoning_content` back to llama-server via the builtin-web UI? #18368

tarruda
Dec 25, 2025

Replies: 3 comments 24 replies

woheller69
Dec 25, 2025

aldehir
Dec 26, 2025
Collaborator

aldehir Dec 26, 2025
Collaborator

ServeurpersoCom Dec 26, 2025
Collaborator

ServeurpersoCom Dec 26, 2025
Collaborator

aldehir Dec 26, 2025
Collaborator

ngxson Dec 26, 2025
Maintainer

aldehir Dec 26, 2025
Collaborator

ngxson
Dec 26, 2025
Maintainer

aldehir Dec 26, 2025
Collaborator

ngxson Dec 26, 2025
Maintainer

aldehir Dec 26, 2025
Collaborator