[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls #38909
Code Review
This pull request fixes a duplication bug in the Gemma4ToolParser by removing the manual reconstruction of current_text from previous_text and buffered delta_text. This change ensures that characters are not duplicated when a tool call ends or when parsing HTML content within tool arguments. Corresponding unit tests have been added to verify the fix for both plain text following a tool call and HTML arguments. I have no feedback to provide.
This pull request has merge conflicts that must be resolved before it can be
@yoke233 I tested this PR, and it does solve the problem. You need to resolve the merge conflicts before it can be merged.
Can you rebase and resolve the conflicts?
…alls Stop rebuilding current_text from buffered streaming deltas in the Gemma4 tool parser and add regression coverage for plain text and HTML content after tool calls. Co-authored-by: OpenAI Codex Signed-off-by: yoke233 <yoke2012@gmail.com>
37c7891 to 0cd2c0a
Rebased onto the latest I kept this PR scoped to the original
I did not fold in other related fixes here.
Hi @yoke233, the pre-commit checks have failed. Please run:
uv pip install pre-commit>=4.5.1
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
For future commits, Tip Is
Signed-off-by: yoke233 <yoke2012@gmail.com>
chaunceyjiang left a comment
LGTM.
/cc @lucianommartins PTAL.
thanks @chaunceyjiang - LGTM, if all tests pass it is go! thanks @yoke233!
@chaunceyjiang can we get this merged please
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com>
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com> (cherry picked from commit 1cddaca)
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com> (cherry picked from commit 1cddaca) (cherry picked from commit b6e6392)
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com> (cherry picked from commit 87efc0f)
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com> (cherry picked from commit 87efc0f) (cherry picked from commit d47d6c6)
…alls (vllm-project#38909) Signed-off-by: yoke233 <yoke2012@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
Summary
Fix a Gemma4 streaming bug where buffered deltas are stitched back into current_text, which can corrupt normal text after or inside tool calls.

This showed up most clearly with HTML content in tool arguments such as write_file(content=...), where tags like <html> and <meta> could be streamed into malformed output such as <<htmlhtml or <<metameta.

Fixes #38910.
Root cause
Gemma4ToolParser.extract_tool_calls_streaming() buffered delta_text to avoid leaking partial special tokens and then rebuilt current_text from previous_text and the buffered delta_text.

That mixes buffered output state with the original accumulated model text. When a buffered < is replayed into current_text, the parser can produce invalid intermediate states like <<div>, <<html, and <<meta. Those bad intermediate states then interact with the argument diff logic and can duplicate tag names in the final streamed tool arguments.

Fix
Keep current_text from the upstream streaming state and use buffered delta_text only for emission. Do not reconstruct current_text from the buffered delta.

Tests
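The failure mode that the regression tests guard against can be sketched in isolation. The snippet below is a toy model, not the vLLM parser: simulate_stream and the rule of holding back a single trailing < are assumptions made for illustration, and the real parser's buffering of partial special tokens is likely more involved. It only shows why rebuilding current_text from previous_text plus a replayed buffer can yield <<div>, while keeping the upstream current_text does not.

```python
def simulate_stream(chunks, rebuild_current_text):
    """Toy model (hypothetical, not the vLLM parser) of buffered streaming.

    Returns (states, emitted): the current_text the parser would see at
    each step, and the text emitted to the client.
    """
    previous_text = ""  # upstream accumulated model text before this chunk
    buffered = ""       # held-back text that might start a special token
    states = []
    emitted = ""

    for chunk in chunks:
        # Upstream state: accumulated model text including this chunk.
        current_text = previous_text + chunk

        # Replay anything held back on the previous step.
        delta_text = buffered + chunk
        buffered = ""

        # Assumption: hold back a trailing "<" because it could be the
        # start of a special token.
        if delta_text.endswith("<"):
            buffered = "<"
            delta_text = delta_text[:-1]

        if rebuild_current_text:
            # Pre-fix behaviour in this model: previous_text already
            # contains the held-back "<", so replaying it doubles it.
            current_text = previous_text + delta_text

        states.append(current_text)
        emitted += delta_text        # the buffer is used only for emission
        previous_text += chunk       # upstream always advances by the raw chunk

    return states, emitted


chunks = ["<", "div>", "hi"]
buggy_states, buggy_emitted = simulate_stream(chunks, rebuild_current_text=True)
fixed_states, fixed_emitted = simulate_stream(chunks, rebuild_current_text=False)
```

With these chunks, the rebuilt variant sees <<div> as its second state, while the upstream-state variant sees <div>. The emitted text is <div>hi in both cases, which is consistent with the description that the corruption comes from the intermediate current_text rather than from the buffered emission itself.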
Why this is not duplicating an existing PR:
Validation run:
uv run --no-project python -
<div> intact instead of producing <<div> internally
<html>/<meta> intact instead of producing duplicated prefixes such as <<htmlhtml or <<metameta
Note:
tests/tool_parsers/test_gemma4_tool_parser.py.
AI disclosure
This PR includes AI-assisted code generation and analysis. I reviewed the changes, reproduced the bug path, and validated the fix myself.