[Bugfix] [Frontend] Responses API, fix merging of messages#42189
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the chat message construction logic to merge consecutive assistant-side response items, such as reasoning, tool calls, and output text, into single assistant messages. It also updates the test suite with comprehensive merging policy checks. Feedback suggests extending the merging logic to account for existing message history to prevent template issues and replacing assertions with robust type checks when modifying tool call lists to ensure compatibility with immutable types.
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
@yzong-rh thanks, but I am not a coauthor ;) (no code seems to be directly taken from my abandoned PR + coauthorship potentially could complicate DCO) Your PR looks good, two minor remarks:
|
Signed-off-by: Yifan Zong <yzong@redhat.com>
I pulled from you PR to work on top of it. Enough changes may have been made that not much is left of the original code. Happy to amend DCO and give you proper attribution
I removed the explicit merge function to avoid code duplication in the if / else chain, although you are right that
Agree with you that both would indicate a problem given the current models. Other than switching from a gpt-oss to a native model mid-session you mentioned, another case is that the user turn somehow got lost in the conversation, in which case having two consecutive assistant turns might be preferable over a single assistant turn with concatenated content and reasoning. I originally aimed to replicate the llama.cpp PR but a less greedy approach might be less surprising here. Plus, the current content concatenation path isn't that robust, and the code will be much simpler. Updated the logic and BFCL as sanity check (no perf regression, as expected) |
Signed-off-by: Yifan <yzong@redhat.com>
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com>
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com>
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com>
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com> (cherry picked from commit 6ff7405)
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
…ect#42189) Signed-off-by: Yifan Zong <yzong@redhat.com> Signed-off-by: Yifan <yzong@redhat.com>
Co-Authored-By: bfroemel
Based on #37294
With help from: @bfroemel, @weiguangli-io and @aayushbaluni
Purpose
Addresses #37167 (
responses API, combining of message and tool call) by merging contiguous assistant-side Responses items into a single Chat Completions assistant message on the non-Harmony path.Merging with the previous message only happens when the previously constructed message is an assistant message; otherwise a new assistant message is started.
llama.cppPR #19773vLLMPR #37294output_textcontentif no previouscontent.content.contentif it hasreasoningonly.function_calltool_calls.tool_calls.tool_calls.reasoningreasoningif no previousreasoning.reasoning_content.reasoning.Note:
Converting Responses items into Chat Completion for non-Harmony models is inherently lossy. Partial-completion may apply to multiple merged Responses items. These are necessary tradeoffs in order to preserve single assistant turn for template rendering.
Existing code also does not handle refusal or multi-part content. This PR doesn't remedy this.
Test Plan
Unit test:
Token-level comparisons:
Send a Responses API request with send_response.py using tools.json and gemma4.step1.response.json or qwen36.step1.response.json. Note that they are real conversations generated by the model.
And compare the decoded prompt token ids.
BFCL evals
Test Result
Unit test pass.
Prompt token comparison:
tool_call_idmismatch, see [Bugfix] Gemma 4 chat template crash with missing tool name and tool id #42188BFCL results:
Qwen3.6:
multi_turn_miss_funcmulti_turn_miss_parammulti_turn_long_contextmulti_turn_baseGemma 4:
multi_turn_miss_funcmulti_turn_miss_parammulti_turn_long_contextmulti_turn_basecc @qandrew @chaunceyjiang
Made with Cursor