-
Notifications
You must be signed in to change notification settings - Fork 15.4k
Description
Bug Description
When serving a gpt-oss model (e.g. ggml-org/gpt-oss-120b-GGUF) with --reasoning-format auto, the first request succeeds but subsequent multi-turn requests with tool calls fail with HTTP 500:
Jinja Exception: Cannot pass both content and thinking in an assistant message with tool calls!
Regression
Introduced in PR #16937 (commit 87c9efc3b) — "common : move gpt-oss reasoning processing to init params".
That PR moved the thinking field assignment from output serialization (common_chat_msgs_to_json_oaicompat) to input processing (common_chat_params_init_gpt_oss). The new code conditionally adds thinking from reasoning_content when tool calls are present, but does not erase content from the adjusted message.
Root Cause
In common/chat.cpp — common_chat_params_init_gpt_oss():
if (has_reasoning_content && has_tool_calls) {
auto adjusted_message = msg;
adjusted_message["thinking"] = msg.at("reasoning_content");
// BUG: "content" is not erased — template forbids having both
adjusted_messages.push_back(adjusted_message);
}The gpt-oss Jinja template (models/templates/openai-gpt-oss-120b.jinja) explicitly checks that tool-call messages have either content or thinking, not both — they render to the same <|channel|>analysis slot. When the client sends back conversation history containing assistant messages with content, reasoning_content, and tool_calls, the adjusted message ends up with {content, thinking, tool_calls} and the template raises an error.
Steps to Reproduce
- Start
llama-serverwith a gpt-oss model (e.g.llama-server -hf ggml-org/gpt-oss-120b-GGUF --reasoning-format auto --jinja -fa on) - Send a chat completion request that triggers tool calls
- Send a follow-up request including the full conversation history (with the assistant's
content,reasoning_content, andtool_calls) - Server returns HTTP 500 with the Jinja exception
Fix
Add adjusted_message.erase("content") after setting thinking:
if (has_reasoning_content && has_tool_calls) {
auto adjusted_message = msg;
adjusted_message["thinking"] = msg.at("reasoning_content");
adjusted_message.erase("content"); // template forbids both content and thinking with tool_calls
adjusted_messages.push_back(adjusted_message);
}