fix(openai): stop streaming tool-call double-emission when autoparser is active by bozhouDev · Pull Request #10055 · mudler/LocalAI

bozhouDev · 2026-05-29T02:45:39Z

Description

Addresses #9722.

Streaming /v1/chat/completions could emit the same logical tool call at multiple index values. In processStreamWithTools (core/http/endpoints/openai/chat_stream_workers.go) the Go-side iterative parser (ParseXMLIterative / ParseJSONIterative) runs on every token and emits tool-call deltas, while the C++ chat-template autoparser delivers its own tool calls via ChatDeltas, which are flushed at end-of-stream by ToolCallsFromChatDeltas → buildDeferredToolCallChunks. With both paths active the same call is emitted twice at different indices, so OpenAI clients that accumulate tool calls by index end up dispatching the tool once per emitted index instead of once per logical call.
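To make the client-side effect concrete, here is a minimal sketch of index-keyed accumulation. The `ToolCallDelta` and `ToolCall` types and the `accumulate` helper are hypothetical simplifications for illustration; they are not LocalAI's types or any SDK's API. The input models one call streamed in fragments at index 0 and the same call arriving again, whole, at index 1.

```go
package main

import (
	"fmt"
	"sort"
)

// ToolCallDelta is a hypothetical, simplified stand-in for one streamed
// tool-call fragment. It is not LocalAI's or any SDK's actual type.
type ToolCallDelta struct {
	Index     int
	Name      string
	Arguments string // argument fragment, concatenated per index
}

// ToolCall is the assembled call a client would dispatch.
type ToolCall struct {
	Name      string
	Arguments string
}

// accumulate merges deltas by Index and returns the calls in index order.
func accumulate(deltas []ToolCallDelta) []ToolCall {
	byIndex := map[int]*ToolCall{}
	for _, d := range deltas {
		tc, ok := byIndex[d.Index]
		if !ok {
			tc = &ToolCall{}
			byIndex[d.Index] = tc
		}
		if d.Name != "" {
			tc.Name = d.Name
		}
		tc.Arguments += d.Arguments
	}
	indices := make([]int, 0, len(byIndex))
	for i := range byIndex {
		indices = append(indices, i)
	}
	sort.Ints(indices)
	calls := make([]ToolCall, 0, len(indices))
	for _, i := range indices {
		calls = append(calls, *byIndex[i])
	}
	return calls
}

func main() {
	deltas := []ToolCallDelta{
		// Fragments of one call, as a per-token parser might stream them.
		{Index: 0, Name: "get_weather", Arguments: `{"city":`},
		{Index: 0, Arguments: `"Paris"}`},
		// The same logical call again, flushed whole at a new index.
		{Index: 1, Name: "get_weather", Arguments: `{"city":"Paris"}`},
	}
	for _, c := range accumulate(deltas) {
		fmt.Printf("dispatch %s(%s)\n", c.Name, c.Arguments)
	}
}
```

Because the client has no way to tell that index 1 repeats index 0, it ends up with two identical calls and dispatches both.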

This skips the Go-side iterative parser once the autoparser is producing tool calls (hasChatDeltaToolCalls). The deferred flush stays guarded by lastEmittedCount, so the race where the Go parser emitted before the flag flipped also remains single-emission. Backends without an autoparser (e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.
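The gating described above can be modeled roughly as follows. This is a simplified sketch, not the code in chat_stream_workers.go: the `streamState` type and its methods are hypothetical, and it assumes `lastEmittedCount` counts the tool calls already sent so that the deferred flush only emits calls beyond that count. Only the names `hasChatDeltaToolCalls` and `lastEmittedCount` come from the PR.

```go
package main

import "fmt"

// ToolCall and Chunk are simplified, hypothetical types for this sketch.
type ToolCall struct {
	Name      string
	Arguments string
}

type Chunk struct {
	Index int
	Call  ToolCall
}

// streamState is a hypothetical model of the per-stream bookkeeping.
type streamState struct {
	hasChatDeltaToolCalls bool // set once the autoparser delivers a tool call
	lastEmittedCount      int  // assumption: number of calls already emitted
	out                   []Chunk
}

// emitFrom emits only the calls that have not been emitted yet.
func (s *streamState) emitFrom(calls []ToolCall) {
	for i := s.lastEmittedCount; i < len(calls); i++ {
		s.out = append(s.out, Chunk{Index: i, Call: calls[i]})
	}
	if len(calls) > s.lastEmittedCount {
		s.lastEmittedCount = len(calls)
	}
}

// onToken models the per-token Go-side parser pass. It is skipped once the
// autoparser is producing tool calls.
func (s *streamState) onToken(parsedSoFar []ToolCall) {
	if s.hasChatDeltaToolCalls {
		return
	}
	s.emitFrom(parsedSoFar)
}

// onChatDeltaToolCall models the autoparser delivering a tool call.
func (s *streamState) onChatDeltaToolCall() { s.hasChatDeltaToolCalls = true }

// flushDeferred models the end-of-stream flush of autoparser tool calls.
func (s *streamState) flushDeferred(autoparsed []ToolCall) { s.emitFrom(autoparsed) }

// simulate runs one stream with a single logical tool call, in either order.
func simulate(goParserFirst bool) []Chunk {
	call := []ToolCall{{Name: "get_weather", Arguments: `{"city":"Paris"}`}}
	s := &streamState{}
	if goParserFirst {
		s.onToken(call) // race: Go parser emits before the flag flips
		s.onChatDeltaToolCall()
	} else {
		s.onChatDeltaToolCall()
		s.onToken(call) // skipped: the autoparser owns tool calls
	}
	s.flushDeferred(call)
	return s.out
}

func main() {
	fmt.Println(len(simulate(false)), len(simulate(true)))
}
```

In this model both orderings yield a single chunk at index 0: either the per-token pass is skipped and the flush emits the call, or the per-token pass emits it first and the flush finds nothing new to send.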

Notes for Reviewers

Scope: this targets the default autoparser-on path from #9722, "Streaming /v1/chat/completions emits the same tool_call at multiple index values" (the llama.cpp chat-template autoparser, which is what the reporter ran). The issue also mentions a disable_peg_parser: true variant where the Go iterative parser alone produces multiple indices. That path keeps hasChatDeltaToolCalls=false, so it is intentionally left untouched here and would need a separate change.
Verification: go build ./core/http/endpoints/openai/, go vet, gofmt -l, and the existing core/http/endpoints/openai test suite all pass. I was not able to reproduce against a live llama.cpp model in my environment, so a maintainer sanity-check with a tool-calling GGUF (the repro in #9722) would be appreciated. The change is intentionally minimal and mirrors the existing preferAutoparser philosophy already used in this file, so non-autoparser backends are unaffected.
I'm happy to adjust the approach — the issue lists alternatives (tracking autoparser emissions in lastEmittedCount, or a (name, arguments) dedupe net) if you prefer one of those.
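For reference, the (name, arguments) dedupe net mentioned as an alternative could look roughly like the sketch below. It is not part of this PR and not taken from the issue: the `ToolCall` type and the `canonicalArgs` and `dedupe` helpers are hypothetical, and re-encoding the arguments with encoding/json so that whitespace and key-order differences do not defeat the comparison is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ToolCall is a simplified, hypothetical type for this sketch.
type ToolCall struct {
	Name      string
	Arguments string // JSON-encoded arguments
}

// canonicalArgs re-encodes JSON arguments so that whitespace and object key
// order do not affect comparison. Invalid JSON is returned unchanged.
func canonicalArgs(raw string) string {
	var v interface{}
	if err := json.Unmarshal([]byte(raw), &v); err != nil {
		return raw
	}
	b, err := json.Marshal(v) // map keys are written in sorted order
	if err != nil {
		return raw
	}
	return string(b)
}

// dedupe keeps the first occurrence of each (name, arguments) pair.
func dedupe(calls []ToolCall) []ToolCall {
	seen := map[[2]string]struct{}{}
	out := make([]ToolCall, 0, len(calls))
	for _, c := range calls {
		key := [2]string{c.Name, canonicalArgs(c.Arguments)}
		if _, dup := seen[key]; dup {
			continue
		}
		seen[key] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	calls := []ToolCall{
		{Name: "get_weather", Arguments: `{"city":"Paris"}`},
		{Name: "get_weather", Arguments: `{ "city": "Paris" }`},
		{Name: "get_weather", Arguments: `{"city":"Rome"}`},
	}
	for _, c := range dedupe(calls) {
		fmt.Printf("%s(%s)\n", c.Name, c.Arguments)
	}
}
```

One trade-off of a net like this is that it also collapses two intentionally identical calls in the same response, which gating on hasChatDeltaToolCalls does not do.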

Signed commits

Yes, I signed my commits.

… is active

Streaming /v1/chat/completions could emit the same logical tool call at multiple `index` values. In processStreamWithTools the Go-side iterative parser (ParseXMLIterative / ParseJSONIterative) runs on every token and emits tool-call deltas, while the C++ chat-template autoparser delivers its own tool calls via ChatDeltas that are flushed at end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths active the same call is emitted twice at different indices, so OpenAI clients that accumulate tool calls by `index` dispatch the tool N times.

Skip the Go-side iterative parser once the autoparser is producing tool calls (hasChatDeltaToolCalls). The deferred flush stays guarded by lastEmittedCount, so the race where the Go parser emitted before the flag flipped also remains single-emission. Backends without an autoparser (e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.

Refs mudler#9722

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>

localai-bot mentioned this pull request May 29, 2026

fix: tool-call JSON leaks into content with stream+tools on tokenizer-template models (#10052) #10057

Merged

mudler merged commit e1a782b into mudler:master May 29, 2026
56 checks passed

localai-bot mentioned this pull request May 29, 2026

fix(functions): validate auto-detected XML tool-call names — robust glm-4.5/Hermes guard (#9722, supersedes #9940) #10059

Merged

mudler added the bug Something isn't working label May 29, 2026

BrewTestBot mentioned this pull request May 29, 2026

localai 4.3.5 Homebrew/homebrew-core#285377

Merged

BrewTestBot mentioned this pull request Jun 10, 2026

localai 4.4.0 Homebrew/homebrew-core#287347

Merged

localai-bot mentioned this pull request Jun 12, 2026

Agent always ever answers {"{" #9419

Closed

mudler merged 1 commit into mudler:master from bozhouDev:fix/streaming-tool-call-double-emit
