Bug: Jinja template error on multi-turn tool calls with reasoning_content (gpt-oss) #19701

@abhijitb11

Bug Description

When serving a gpt-oss model (e.g. ggml-org/gpt-oss-120b-GGUF) with --reasoning-format auto, the first request succeeds but subsequent multi-turn requests with tool calls fail with HTTP 500:

Jinja Exception: Cannot pass both content and thinking in an assistant message with tool calls!

Regression

Introduced in PR #16937 (commit 87c9efc3b) — "common : move gpt-oss reasoning processing to init params".

That PR moved the thinking field assignment from output serialization (common_chat_msgs_to_json_oaicompat) to input processing (common_chat_params_init_gpt_oss). The new code conditionally adds thinking from reasoning_content when tool calls are present, but does not erase content from the adjusted message.

Root Cause

In common/chat.cpp, in common_chat_params_init_gpt_oss():

if (has_reasoning_content && has_tool_calls) {
    auto adjusted_message = msg;
    adjusted_message["thinking"] = msg.at("reasoning_content");
    // BUG: "content" is not erased — template forbids having both
    adjusted_messages.push_back(adjusted_message);
}

The gpt-oss Jinja template (models/templates/openai-gpt-oss-120b.jinja) explicitly checks that tool-call messages have either content or thinking, not both — they render to the same <|channel|>analysis slot. When the client sends back conversation history containing assistant messages with content, reasoning_content, and tool_calls, the adjusted message ends up with {content, thinking, tool_calls} and the template raises an error.
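The template's constraint can be modeled as a small predicate, which makes it easy to see why the adjusted message is rejected. This is only a sketch: Message is a std::map stand-in for the JSON object used in llama.cpp, violates_template_guard is a hypothetical name, and treating a field as "present" when its value is non-empty is an assumption about how the template tests it.

```cpp
#include <map>
#include <string>

// Stand-in for a JSON chat message: field name -> raw value (assumption,
// the real code operates on a JSON object).
using Message = std::map<std::string, std::string>;

// Hypothetical helper: true when the field exists and is non-empty.
static bool has_value(const Message & msg, const std::string & key) {
    auto it = msg.find(key);
    return it != msg.end() && !it->second.empty();
}

// Hypothetical model of the template's guard: a message with tool calls
// may carry "content" or "thinking", but not both.
bool violates_template_guard(const Message & msg) {
    return has_value(msg, "tool_calls") &&
           has_value(msg, "content") &&
           has_value(msg, "thinking");
}
```

Under this model, a message holding {content, thinking, tool_calls} trips the guard, while the same message without content passes.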

Steps to Reproduce

  1. Start llama-server with a gpt-oss model (e.g. llama-server -hf ggml-org/gpt-oss-120b-GGUF --reasoning-format auto --jinja -fa on)
  2. Send a chat completion request that triggers tool calls
  3. Send a follow-up request including the full conversation history (with the assistant's content, reasoning_content, and tool_calls)
  4. Server returns HTTP 500 with the Jinja exception

Fix

Add adjusted_message.erase("content") after setting thinking:

if (has_reasoning_content && has_tool_calls) {
    auto adjusted_message = msg;
    adjusted_message["thinking"] = msg.at("reasoning_content");
    adjusted_message.erase("content");  // template forbids both content and thinking with tool_calls
    adjusted_messages.push_back(adjusted_message);
}
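To check the effect of the fix in isolation, the adjustment can be reproduced with a stand-in message type. This is a sketch under assumptions, not the llama.cpp code: Message is a std::map rather than a JSON object, adjust_messages is a hypothetical wrapper, and the pass-through else branch is assumed for illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a JSON chat message: field name -> raw value (assumption).
using Message = std::map<std::string, std::string>;

// Hypothetical wrapper around the adjustment shown above, with the fix applied.
std::vector<Message> adjust_messages(const std::vector<Message> & messages) {
    std::vector<Message> adjusted_messages;
    for (const auto & msg : messages) {
        const bool has_reasoning_content = msg.count("reasoning_content") > 0;
        const bool has_tool_calls        = msg.count("tool_calls") > 0;
        if (has_reasoning_content && has_tool_calls) {
            auto adjusted_message = msg;
            adjusted_message["thinking"] = msg.at("reasoning_content");
            // The fix: drop "content" so the template never sees both fields.
            adjusted_message.erase("content");
            adjusted_messages.push_back(adjusted_message);
        } else {
            // Assumed pass-through for all other messages.
            adjusted_messages.push_back(msg);
        }
    }
    return adjusted_messages;
}
```

With this in place, an assistant message sent back with content, reasoning_content, and tool_calls comes out carrying thinking and tool_calls but no content, which is the shape the template accepts.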
