Any way to send reasoning_content back to llama-server via the builtin-web UI?
#18368
-
|
Some LLMs don't need reasoning_content when rendering the chat template, but a few like Minimax M2 were trained on interleaved thinking and in fact the chat template shows that it expects reasoning_content when rendering the chat history. Inspecting requests sent by the web UI, it seems
This might reduce output quality in multi-turn interactions. |
Beta Was this translation helpful? Give feedback.
Replies: 3 comments 24 replies
-
|
related to #17430 ? |
Beta Was this translation helpful? Give feedback.
-
|
I believe only Ministral 3 supports interleaved thinking with user messages. All other models (gpt-oss, MiniMax M2, Kimi K2, Nemotron Nano 3) only keep it during tool call/tool response loops. So there's little value with the current Web UI. I think it should be considered in the upcoming tool calling/MCP Web UI integration. |
Beta Was this translation helpful? Give feedback.
-
|
@aldehir Moving this to a dedicated thread: I'm wondering if it would be useful to have a code checking if a template support putting back We can do this by just try to format a chat w/ and w/o the
Depending on this, we can maybe show a warning, or decide if |
Beta Was this translation helpful? Give feedback.

I have a big refactor pending review by Alek and I'll submit a PR :)