fix: tool-call JSON leaks into content with stream+tools on tokenizer-template models (#10052) #10057
The JSON-schema-to-GBNF property sort used `aOrder != 0 && bOrder != 0` as its "is this key ordered?" guard. That treats index 0 (the first key listed in `properties_order`) as unset, so `properties_order: name,arguments` fell back to alphabetical ordering and still emitted "arguments" before "name".

Use presence in the order map instead: listed keys sort by their index and ahead of unlisted keys, which keep a stable alphabetical order. This makes the documented `properties_order: name,arguments` actually produce name-first tool-call JSON.

Relates to #10052.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
…template owns templating (#10052)

When use_tokenizer_template delegates templating to the backend (llama.cpp), the backend also owns tool-call grammar generation and parsing. LocalAI was still generating its own GBNF grammar and sending it down. With a grammar present, llama.cpp does not hand the tools to its template, so its native peg/json tool parser never engages: it streams the grammar-constrained tool-call JSON back as plain content instead of emitting tool_calls.

In streaming mode the JSON object leaked into the content field, and the Go-side incremental detector never gated content because the LocalAI-generated grammar emitted "arguments" before "name".

The GGUF auto-import path already couples use_tokenizer_template with grammar.disable, but that block is skipped when a template is already configured, so gallery and hand-written configs (e.g. qwen3) that set the tokenizer template directly never got the paired grammar.disable.

- SetDefaults now enforces the coupling for every config: when use_tokenizer_template is set, grammar generation is disabled and tools flow to the backend's native (name-first) pipeline. This also fixes already-installed models without editing each config.
- Set function.grammar.disable in the shared gallery/qwen3.yaml, which is the base config referenced by every qwen3 gallery entry.

Verified end to end against qwen3-4b with stream:true + tools: content no longer carries the tool-call JSON, reasoning is classified separately, and tool calls stream as proper name-first tool_calls deltas.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
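The coupling that this commit message says `SetDefaults` now enforces could look roughly like the sketch below. The types and the function `applyGrammarDefaults` are hypothetical and heavily simplified; the real LocalAI config structs and field names may differ. Only the rule itself (tokenizer template implies grammar disabled, otherwise left alone) comes from the message.

```go
package main

import "fmt"

// GrammarConfig and ModelConfig are hypothetical, simplified stand-ins for
// the real LocalAI config types; the field names are illustrative only.
type GrammarConfig struct {
	Disable bool
}

type ModelConfig struct {
	UseTokenizerTemplate bool
	Grammar              GrammarConfig
}

// applyGrammarDefaults enforces the coupling for any config, regardless of
// how it was created: if the backend owns templating, Go-side grammar
// generation is switched off. Configs without the tokenizer template are
// left untouched.
func applyGrammarDefaults(cfg *ModelConfig) {
	if cfg.UseTokenizerTemplate {
		cfg.Grammar.Disable = true
	}
}

func main() {
	qwen := ModelConfig{UseTokenizerTemplate: true}
	applyGrammarDefaults(&qwen)
	fmt.Println(qwen.Grammar.Disable) // true

	other := ModelConfig{}
	applyGrammarDefaults(&other)
	fmt.Println(other.Grammar.Disable) // false
}
```

Applying the rule at defaulting time, rather than only in the import path, is what lets already-installed configs pick up the fix without being edited.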
Relationship to #9722 / #10055 / #9940
While investigating, I confirmed this PR shares a root cause with the two open PRs against #9722. All three are facets of the same tension: LocalAI's Go-side tool-call parsing/streaming runs in parallel with llama.cpp's C++ chat-template autoparser, and the two collide. Each addresses a different collision symptom, at a different layer:
There is no file overlap between the three branches, so they don't conflict:
Interaction worth a reviewer's attention
This PR makes every
Verification (this branch, qwen3-4b, llama.cpp)
Note for #9940 (FYI, not blocking this PR)
The
A name-shape validation (e.g. reject anything not matching |
What
Fixes #10052: with `stream: true` + `tools` on a tokenizer-template model (e.g. qwen3), the chat-completions endpoint streamed the tool-call JSON object out through the `content` field (in addition to emitting the real `tool_calls`).

Root cause
When `use_tokenizer_template` delegates templating to the backend (llama.cpp), the backend also owns tool-call grammar generation and parsing. LocalAI was still generating its own GBNF grammar and sending it down. With a Go grammar present, llama.cpp does not pass the tools to its template (`has_grammar_from_go=1`), so its native peg/json tool parser never engages: it streams the grammar-constrained tool-call JSON back as plain `content` rather than `tool_calls`.

Two contributing defects:
- The GGUF auto-import path couples `use_tokenizer_template` with `grammar.disable`, but that block is skipped when a template is already configured, so gallery and hand-written configs (qwen3) that set the tokenizer template directly never got the paired `grammar.disable`.
- The JSON-schema-to-GBNF property sort used `aOrder != 0 && bOrder != 0` as its guard, treating index 0 (the first key in `properties_order`) as "unset". So even `properties_order: name,arguments` fell back to alphabetical order and emitted `arguments` before `name`, which prevented the streaming detector from ever gating content.

Changes
- `SetDefaults` now couples `use_tokenizer_template ⇒ grammar.disable` for every config, so gallery, hand-written, and GGUF-import configs agree. Tools flow to the backend's native (name-first) pipeline. This also fixes already-installed models without editing each config.
- `gallery/qwen3.yaml` (the shared base referenced by every qwen3 entry) sets `function.grammar.disable: true` explicitly.
- Fixed the `properties_order` sentinel so a listed key at index 0 is honored (presence-in-map instead of `!= 0`).

Verification
End-to-end against `qwen3-4b` (`stream:true` + `tools`), built from this branch:
- `content` now carries the actual text (`Hello! How can I assist you today?`), reasoning is classified separately, and no JSON leaks. Previously `content` dribbled out `{"arguments":...,"name":"answer"}`.
- Tool calls stream as proper name-first `tool_calls` deltas (`{"name":"exec",...}` then the arguments) with zero content leak.

New tests (TDD, Ginkgo):
- `core/config`: `SetDefaults` disables Go grammar when the tokenizer template is used (and leaves it enabled otherwise).
- `pkg/functions/grammars`: `properties_order: name,arguments` yields name-first; empty order stays alphabetical; listed keys sort ahead of unlisted ones.

`golangci-lint run --new-from-merge-base=master` → 0 issues.

🤖 Generated with Claude Code