
fix(openai): stop streaming tool-call double-emission when autoparser is active#10055

Merged
mudler merged 1 commit into mudler:master from bozhouDev:fix/streaming-tool-call-double-emit on May 29, 2026


@bozhouDev (Contributor)

Description

Addresses #9722.

Streaming /v1/chat/completions could emit the same logical tool call at multiple index values. In processStreamWithTools (core/http/endpoints/openai/chat_stream_workers.go), the Go-side iterative parser (ParseXMLIterative / ParseJSONIterative) runs on every token and emits tool-call deltas, while the C++ chat-template autoparser delivers its own tool calls via ChatDeltas, which are flushed at end-of-stream by ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths active, the same call is emitted twice at different indices, so OpenAI clients that accumulate tool calls by index end up dispatching the same tool call multiple times.
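To illustrate why a duplicate index matters on the client side, here is a minimal sketch of index-keyed accumulation. The toolCallDelta type and accumulateByIndex function are hypothetical simplifications for illustration, not LocalAI or OpenAI SDK code.

```go
package main

import "fmt"

// toolCallDelta is a hypothetical, simplified streamed tool-call fragment.
type toolCallDelta struct {
	Index     int
	Name      string
	Arguments string
}

// accumulateByIndex merges fragments that share an index, the way
// index-keyed clients typically rebuild tool calls from a stream.
func accumulateByIndex(deltas []toolCallDelta) map[int]toolCallDelta {
	calls := make(map[int]toolCallDelta)
	for _, d := range deltas {
		c := calls[d.Index]
		c.Index = d.Index
		if d.Name != "" {
			c.Name = d.Name
		}
		c.Arguments += d.Arguments
		calls[d.Index] = c
	}
	return calls
}

func main() {
	stream := []toolCallDelta{
		// Incremental deltas for one logical call at index 0.
		{Index: 0, Name: "get_weather", Arguments: `{"city":`},
		{Index: 0, Arguments: `"Rome"}`},
		// The same logical call emitted again at index 1.
		{Index: 1, Name: "get_weather", Arguments: `{"city":"Rome"}`},
	}
	calls := accumulateByIndex(stream)
	fmt.Println(len(calls)) // 2: the client sees two calls and dispatches both
}
```

Because the client has no way to know that index 1 repeats index 0, each extra index becomes an extra dispatch.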

This change skips the Go-side iterative parser once the autoparser is producing tool calls (hasChatDeltaToolCalls). Because the deferred flush stays guarded by lastEmittedCount, the race in which the Go parser emits before the flag flips still results in a single emission. Backends without an autoparser (e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.
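The gating described above can be sketched roughly as follows. This is a simplified model, not the code in chat_stream_workers.go: the streamState type, its methods, and the use of plain strings for tool calls are assumptions made for illustration. Only the names hasChatDeltaToolCalls and lastEmittedCount come from the PR.

```go
package main

import "fmt"

// streamState is a hypothetical, trimmed-down stand-in for per-stream state.
type streamState struct {
	hasChatDeltaToolCalls bool     // set once the autoparser has produced a tool call
	lastEmittedCount      int      // number of tool calls already sent to the client
	emitted               []string // what the client has received so far
}

// onToken models the per-token path: the Go-side parser result is only
// emitted while the autoparser has not produced any tool calls.
func (s *streamState) onToken(goParsed []string, autoparserSawToolCall bool) {
	if autoparserSawToolCall {
		s.hasChatDeltaToolCalls = true
	}
	if s.hasChatDeltaToolCalls {
		return // skip the Go-side iterative parser entirely
	}
	for i := s.lastEmittedCount; i < len(goParsed); i++ {
		s.emitted = append(s.emitted, goParsed[i])
	}
	if len(goParsed) > s.lastEmittedCount {
		s.lastEmittedCount = len(goParsed)
	}
}

// flushDeferred models the end-of-stream flush: only autoparser calls
// beyond lastEmittedCount are sent.
func (s *streamState) flushDeferred(autoparserCalls []string) {
	for i := s.lastEmittedCount; i < len(autoparserCalls); i++ {
		s.emitted = append(s.emitted, autoparserCalls[i])
	}
	if len(autoparserCalls) > s.lastEmittedCount {
		s.lastEmittedCount = len(autoparserCalls)
	}
}

func main() {
	// Race case: the Go parser emits before the flag flips.
	s := &streamState{}
	s.onToken([]string{"get_weather"}, false)
	s.onToken([]string{"get_weather"}, true)
	s.flushDeferred([]string{"get_weather"})
	fmt.Println(s.emitted) // [get_weather]
}
```

In this model a backend without an autoparser never sets the flag, so the per-token path behaves exactly as before.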

Notes for Reviewers

  • Scope: this targets the default autoparser-on path from #9722 ("Streaming /v1/chat/completions emits the same tool_call at multiple index values"), i.e. the llama.cpp chat-template autoparser, which is what the reporter ran. The issue also mentions a disable_peg_parser: true variant where the Go iterative parser alone produces multiple indices. That path keeps hasChatDeltaToolCalls=false, so it is intentionally left untouched here and would need a separate change.
  • Verification: go build ./core/http/endpoints/openai/, go vet, gofmt -l, and the existing core/http/endpoints/openai test suite all pass. I was not able to reproduce against a live llama.cpp model in my environment, so a maintainer sanity-check with a tool-calling GGUF (the repro in #9722) would be appreciated. The change is intentionally minimal and mirrors the existing preferAutoparser philosophy already used in this file, so non-autoparser backends are unaffected.
  • I'm happy to adjust the approach: the issue lists alternatives (tracking autoparser emissions in lastEmittedCount, or a (name, arguments) dedupe net) if you prefer one of those.
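For comparison, the (name, arguments) dedupe net mentioned as an alternative could look roughly like the sketch below. It is not part of this PR, and the toolCall type and dedupeToolCalls function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// toolCall is a hypothetical, simplified completed tool call.
type toolCall struct {
	Name      string
	Arguments string
}

// dedupeToolCalls keeps the first occurrence of each (name, arguments)
// pair and drops later exact repeats, preserving order.
func dedupeToolCalls(calls []toolCall) []toolCall {
	seen := make(map[toolCall]struct{}, len(calls))
	out := make([]toolCall, 0, len(calls))
	for _, c := range calls {
		if _, ok := seen[c]; ok {
			continue
		}
		seen[c] = struct{}{}
		out = append(out, c)
	}
	return out
}

func main() {
	calls := []toolCall{
		{Name: "get_weather", Arguments: `{"city":"Rome"}`},
		{Name: "get_weather", Arguments: `{"city":"Rome"}`},
		{Name: "get_weather", Arguments: `{"city":"Oslo"}`},
	}
	fmt.Println(len(dedupeToolCalls(calls))) // 2
}
```

One trade-off of this approach is that it compares argument strings exactly, so it would also collapse a model's intentional repeat of an identical call and would miss duplicates whose JSON differs only in whitespace or key order.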

Signed commits

  • Yes, I signed my commits.

… is active

Streaming /v1/chat/completions could emit the same logical tool call at
multiple `index` values. In processStreamWithTools the Go-side iterative
parser (ParseXMLIterative / ParseJSONIterative) runs on every token and
emits tool-call deltas, while the C++ chat-template autoparser delivers
its own tool calls via ChatDeltas that are flushed at end-of-stream by
ToolCallsFromChatDeltas -> buildDeferredToolCallChunks. With both paths
active the same call is emitted twice at different indices, so OpenAI
clients that accumulate tool calls by `index` dispatch the tool N times.

Skip the Go-side iterative parser once the autoparser is producing tool
calls (hasChatDeltaToolCalls). The deferred flush stays guarded by
lastEmittedCount, so the race where the Go parser emitted before the flag
flipped also remains single-emission. Backends without an autoparser
(e.g. vLLM) keep hasChatDeltaToolCalls=false and are unaffected.

Refs mudler#9722

Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: bozhouDev <259759010+bozhouDev@users.noreply.github.com>

Labels

bug Something isn't working
