fix: Gemma 4 MoE multi-turn tool call corruption #1665

Merged
jundot merged 3 commits into jundot:main from kreeger:fix/gemma-4-moe-tool-call-fix
Jun 5, 2026

@kreeger kreeger commented Jun 4, 2026

Contributor

👋 First time contribution for me! Please let me know if you'd like to see stuff changed on this changeset, I'm happy to work with you all. Been loving oMLX for weeks now, figured it was time to give back.

Problem domain

Multi-turn tool-calling conversations with Gemma 4 26B A4B were breaking on the second and later turns. Stray protocol tokens (<|tool_call>, <tool_call|>) were leaking from the streaming layer into the assistant message content. When that message was fed back through apply_chat_template, the tokens were treated as real special tokens, producing unbalanced delimiters that corrupted the model's context for all subsequent turns.

This PR fixes that by identifying and stripping stray closing markers in tool_calling.py, and by sanitizing assistant content for Gemma 4 before conversation history is re-rendered.

Root cause(s)

  1. ToolCallStreamFilter didn't strip stray close markers.
    • A <tool_call|> emitted outside a matched open/close pair passed through unfiltered
    • If the tokenizer split it across two feed calls (<tool_call| then >), the buffer flushed each half immediately rather than holding the first half for reassembly
  2. extract_gemma4_messages didn't sanitize content before re-rendering history.
    • Any marker that survived the streaming pass was embedded verbatim into the next turn's prompt
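
To make the first root cause concrete, here is a minimal sketch of partial-prefix hold logic for a stray close marker. It is not the actual ToolCallStreamFilter: the class name `StrayCloseFilter` and the helper `partial_suffix_len` are hypothetical, and the sketch only strips a stray `<tool_call|>`, ignoring the matched open/close handling the real filter also performs.

```python
# Hypothetical sketch, not the oMLX implementation.
# Assumption: the only stray close marker of interest is "<tool_call|>".
STRAY_CLOSE_MARKERS = ("<tool_call|>",)


def partial_suffix_len(text: str, markers=STRAY_CLOSE_MARKERS) -> int:
    """Length of the longest suffix of `text` that is a proper prefix of a marker."""
    best = 0
    for marker in markers:
        # Try the longest candidate prefix first; stop at the first match.
        for n in range(min(len(marker) - 1, len(text)), 0, -1):
            if text.endswith(marker[:n]):
                best = max(best, n)
                break
    return best


class StrayCloseFilter:
    """Strips stray close markers from a stream, even when split across chunks."""

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, chunk: str) -> str:
        self._buf += chunk
        # Drop any complete stray marker now present in the buffer.
        for marker in STRAY_CLOSE_MARKERS:
            self._buf = self._buf.replace(marker, "")
        # Hold back a trailing fragment that could still grow into a marker.
        hold = partial_suffix_len(self._buf)
        cut = len(self._buf) - hold
        out, self._buf = self._buf[:cut], self._buf[cut:]
        return out

    def flush(self) -> str:
        """Emit whatever is still held once the stream ends."""
        out, self._buf = self._buf, ""
        return out
```

Feeding `"Here is the result.<tool_call|"` and then `">"` emits only `"Here is the result."`, because the first call holds the 11-character fragment until the second call completes the marker. A held fragment that never completes, such as a lone `<` at the end of the stream, is returned by `flush()` so no prose is lost.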

How to reproduce

  1. Load any Gemma 4 26B A4B model with at least one tool defined.
  2. Send a request that causes a tool call.
  3. Append the tool result and send a follow-up turn.
  4. Observe incoherent output or tool use failure on turn 2+. Verbose prompt logging will show <tool_call|> embedded literally in the rendered prompt.
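
When checking for the symptom in step 4, a small diagnostic like the one below may help. It is only a sketch: it assumes OpenAI-style message dicts with `role` and `content` keys, and the function names `find_leaked_messages` and `marker_balance` are hypothetical, not part of oMLX.

```python
# Hypothetical diagnostic helpers; message shape is an assumption.
OPEN_MARKER = "<|tool_call>"
CLOSE_MARKER = "<tool_call|>"


def find_leaked_messages(messages: list[dict]) -> list[int]:
    """Indices of assistant messages whose content contains a protocol token."""
    leaked = []
    for i, msg in enumerate(messages):
        if msg.get("role") != "assistant":
            continue
        content = msg.get("content") or ""  # content may be None on tool-call turns
        if OPEN_MARKER in content or CLOSE_MARKER in content:
            leaked.append(i)
    return leaked


def marker_balance(rendered_prompt: str) -> tuple[int, int]:
    """Count open and close markers in a rendered prompt; unequal counts suggest a leak."""
    return rendered_prompt.count(OPEN_MARKER), rendered_prompt.count(CLOSE_MARKER)
```

Running `find_leaked_messages` on the history before sending turn 2 shows which stored assistant message carries the stray token, and `marker_balance` on the logged prompt shows whether the open and close counts diverge.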

Edge cases considered

  • Split-feed: <tool_call|> split as <tool_call| + > across two feed() calls
    • Fixed by adding stray-close markers to the partial-prefix hold logic
  • Well-formed </tool_call> in prose: The strip is scoped to the bare stray tokens only (no leading slash), so legitimate closing tags in prose are untouched
  • Prose around a marker: The regex removes only the token
    • "Here is the result.<tool_call|>" becomes "Here is the result."
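
The content-side edge cases above can be illustrated with a short sketch. This is not the code in extract_gemma4_messages: the names `strip_stray_markers` and `sanitize_history` are hypothetical, the message shape (dicts with `role` and `content`) is an assumption, and the regex covers only the two tokens named in this PR.

```python
import re

# Assumption: only these two bare Gemma 4 protocol tokens are stripped.
# "</tool_call>" (with a leading slash) does not match either alternative.
_STRAY_MARKER_RE = re.compile(r"<\|tool_call>|<tool_call\|>")


def strip_stray_markers(content: str) -> str:
    """Remove only the stray tokens, leaving surrounding prose intact."""
    return _STRAY_MARKER_RE.sub("", content)


def sanitize_history(messages: list[dict]) -> list[dict]:
    """Return a copy of the history with assistant content sanitized."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
            # Copy the dict so the caller's history is not mutated.
            msg = {**msg, "content": strip_stray_markers(msg["content"])}
        cleaned.append(msg)
    return cleaned
```

With this sketch, `strip_stray_markers("Here is the result.<tool_call|>")` returns `"Here is the result."`, while a string containing `</tool_call>` comes back unchanged.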

Issues to close

This ought to finally close #617, which I was getting bitten by. It might also close #1465 and/or #1410, but I'm not sure about that; those users should try re-testing if/when this makes it into main.

kreeger added 3 commits June 4, 2026 13:43
Strip stray `<|tool_call>` / `<tool_call|>` tokens from assistant
message content before feeding history back to `apply_chat_template`.

These tokens only belong in `tool_calls`/`tool_responses`; any
occurrence in `content` is a streaming leak artifact. Left in place, the
template embeds them as real special tokens, producing unbalanced
open/close counts that corrupt the model's context on subsequent turns.

ToolCallStreamFilter flushed a split close marker (e.g. <tool_call|
then >) immediately because _partial_suffix_len did not detect it as
a partial prefix worth holding. Add stray-close markers to the hold
logic so both halves reassemble before the strip check fires.
@kreeger kreeger changed the title from "fix: Gemma 4 multi-turn tool call corruption" to "fix: Gemma 4 MoE multi-turn tool call corruption" Jun 4, 2026
jundot commented Jun 5, 2026

Owner

Thanks for tracking this down. The Gemma 4 multi-turn repro lines up with the leak through stored assistant content, and this fixes the path I care about for #617. I found one small scope issue in the streaming filter for non-Gemma XML close markers, so I will merge this and fold that into a follow-up on main.

@jundot jundot merged commit e74f755 into jundot:main Jun 5, 2026
@kreeger kreeger deleted the fix/gemma-4-moe-tool-call-fix branch June 5, 2026 13:43