fix: Gemma 4 MoE multi-turn tool call corruption by kreeger · Pull Request #1665 · jundot/omlx

kreeger · 2026-06-04T19:27:06Z

👋 First time contribution for me! Please let me know if you'd like to see stuff changed on this changeset, I'm happy to work with you all. Been loving oMLX for weeks now, figured it was time to give back.

Problem domain

Multi-turn tool-calling conversations with Gemma 4 26B A4B were breaking on the second and later turns. Stray protocol tokens (<|tool_call>, <tool_call|>) were leaking from the streaming layer into the assistant message content. When that message was fed back through apply_chat_template, the tokens were treated as real special tokens, producing unbalanced delimiters that corrupted the model's context for all subsequent turns.

This PR fixes that in two places: it strips stray closing markers in the streaming filter in tool_calling.py, and it sanitizes assistant content for Gemma 4 before the conversation history is re-rendered.

Root cause(s)

ToolCallStreamFilter didn't strip stray close markers.
- A <tool_call|> emitted outside a matched open/close pair passed through unfiltered
- If the tokenizer split it across two feed calls (<tool_call| then >), the buffer would also flush each half immediately rather than holding them for reassembly
extract_gemma4_messages didn't sanitize content before re-rendering history.
- Any marker that survived the streaming pass was embedded verbatim into the next turn's prompt

How to reproduce

Load any Gemma 4 26B A4B model with at least one tool defined.
Send a request that causes a tool call.
Append the tool result and send a follow-up turn.
Observe incoherent output or tool use failure on turn 2+. Verbose prompt logging will show <tool_call|> embedded literally in the rendered prompt.

Edge cases considered

Split-feed: <tool_call|> split as <tool_call| + > across two feed() calls
- Fixed by adding stray-close markers to the partial-prefix hold logic
Well-formed </tool_call> in prose: The strip is scoped to the bare stray tokens only (no leading slash), so legitimate closing tags in prose are untouched
Prose around a marker: The regex removes only the token
- "Here is the result.<tool_call|>" becomes "Here is the result."

Issues to close

This ought to finally close #617, which I was getting bit by. It might also close #1465 and/or #1410, but I'm not sure about that; those users should try re-testing if/when this makes it into main.

Strip stray `<|tool_call>` / `<tool_call|>` tokens from assistant message content before feeding history back to `apply_chat_template`. These tokens only belong in `tool_calls`/`tool_responses`; any occurrence in `content` is a streaming leak artifact. Left in place, the template embeds them as real special tokens, producing unbalanced open/close counts that corrupt the model's context on subsequent turns.

ToolCallStreamFilter flushed a split close marker (e.g. <tool_call| then >) immediately because _partial_suffix_len did not detect it as a partial prefix worth holding. Add stray-close markers to the hold logic so both halves reassemble before the strip check fires.
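The hold logic described here can be sketched roughly as follows. This is an illustration under assumptions, not the actual ToolCallStreamFilter: `StrayCloseFilter` and `partial_suffix_len` are hypothetical names, and the sketch ignores matched open/close pairs entirely, handling only the stray close marker.

```python
# Assumption: a single stray close marker is enough to show the idea.
STRAY_CLOSE_MARKERS = ("<tool_call|>",)


def partial_suffix_len(buffer: str, markers=STRAY_CLOSE_MARKERS) -> int:
    """Length of the longest buffer suffix that is a proper prefix of a marker."""
    best = 0
    for marker in markers:
        # Try the longest candidate first; stop at the first match per marker.
        for n in range(min(len(buffer), len(marker) - 1), 0, -1):
            if buffer.endswith(marker[:n]):
                best = max(best, n)
                break
    return best


class StrayCloseFilter:
    """Hypothetical streaming filter that strips stray close markers."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> str:
        self._buffer += chunk
        # Strip any marker that is now complete in the buffer.
        for marker in STRAY_CLOSE_MARKERS:
            self._buffer = self._buffer.replace(marker, "")
        # Hold back a trailing partial marker so it can be reassembled.
        hold = partial_suffix_len(self._buffer)
        cut = len(self._buffer) - hold
        out, self._buffer = self._buffer[:cut], self._buffer[cut:]
        return out

    def flush(self) -> str:
        """Emit whatever is still held once the stream ends."""
        out, self._buffer = self._buffer, ""
        return out
```

With the split from the description, `feed("Here is the result.<tool_call|")` emits only the prose and holds the partial marker, and the following `feed(">")` completes the marker and emits nothing. Text that merely looks like the start of a marker is delayed by one call rather than dropped.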

jundot · 2026-06-05T02:53:08Z

Thanks for tracking this down. The Gemma 4 multi-turn repro lines up with the leak through stored assistant content, and this fixes the path I care about for #617. I found one small scope issue in the streaming filter for non-Gemma XML close markers, so I will merge this and fold that into a follow-up on main.

kreeger added 3 commits June 4, 2026 13:43

fix: Remove stray closing markers for Gemma 4 tool calls

04addc8

kreeger changed the title ~~fix: Gemma 4 multi-turn tool call corruption~~ fix: Gemma 4 MoE multi-turn tool call corruption Jun 4, 2026

jundot merged commit e74f755 into jundot:main Jun 5, 2026

kreeger deleted the fix/gemma-4-moe-tool-call-fix branch June 5, 2026 13:43
