fix(streaming): preserve all tool_calls in OpenAI batch responses by h3c-hexin · Pull Request #1686 · Hmbown/CodeWhale

h3c-hexin · 2026-05-15T10:37:25Z

Summary

Fixes a bug where multiple tool_calls in a single assistant message lose all but the last ToolCallStarted event when streaming from any OpenAI-compatible backend (vLLM / Ollama / LM Studio / Together AI / etc.).

Symptom: when a model emits N batch tool_calls in one round, listeners see 1 × chat:tool_start and N × chat:tool_end. The TUI / runtime API / embedder bridges end up with N-1 orphan tool_result blocks that have no matching tool_use block in the assistant history. Session persistence drops those tool calls, so a subsequent recall_archive or cycle restart cannot reconstruct the turn.
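To make "orphan" concrete: an orphan is a tool_result whose tool_use id never appears in the history. The sketch below assumes a hypothetical, simplified `Block` type and a hypothetical `orphan_results` helper (neither is the project's real history type) and shows how the bug's outcome of 1 tool_use plus 7 tool_results leaves 6 orphans.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified history block; not the project's real type.
#[derive(Debug)]
pub enum Block {
    ToolUse { id: String },
    ToolResult { tool_use_id: String },
}

/// Returns the ids of tool_result blocks with no matching tool_use block.
pub fn orphan_results(history: &[Block]) -> Vec<String> {
    // Collect every tool_use id that was actually announced.
    let uses: HashSet<&str> = history
        .iter()
        .filter_map(|b| match b {
            Block::ToolUse { id } => Some(id.as_str()),
            _ => None,
        })
        .collect();
    // Any tool_result pointing at an unknown id is an orphan.
    history
        .iter()
        .filter_map(|b| match b {
            Block::ToolResult { tool_use_id } if !uses.contains(tool_use_id.as_str()) => {
                Some(tool_use_id.clone())
            }
            _ => None,
        })
        .collect()
}
```

With only the last call's tool_use recorded (ids such as `call_6` are illustrative), every other result is reported as an orphan.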

Reproduction (vLLM 0.7 + Qwen3.6-35B-A3B, prompt asks for 7-file Tauri scaffold):

backend dispatches:   7 × write_file + 1 × exec_shell
ApprovalRequired log: 9 events ✓
listeners receive:    1 × chat:tool_start, 7 × chat:tool_end

Root cause: run_turn (core/engine/turn_loop.rs:354) tracked the streaming tool block with a single current_tool_index: Option<usize>. The Anthropic-style adapter emits Start/Stop in lockstep, so the slot never overlaps. But the OpenAI streaming parser (client/chat.rs:1954-2064) emits every ContentBlockStart::ToolUse event as soon as its delta lands, then batches every ContentBlockStop at finish_reason. Each new Start overwrites the slot, so the first Stop's take() returns the last tool's index (the wrong tool) and every later take() returns None.
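A minimal replay of the pre-fix logic, assuming simplified event shapes (`Ev`, `announced_with_single_slot`, and `openai_batch` are hypothetical names, not the crate's API). It returns `(stop_index, tool_uses_idx)` pairs for each announced start, which makes the mis-routing visible.

```rust
/// Hypothetical, simplified stream events; not the crate's real types.
#[derive(Clone, Copy, Debug)]
pub enum Ev {
    StartToolUse { index: u32 },
    Stop { index: u32 },
}

/// Mirrors the old single-slot tracking. Returns (stop block index,
/// tool_uses position) for every ToolCallStarted that would be emitted.
pub fn announced_with_single_slot(events: &[Ev]) -> Vec<(u32, usize)> {
    let mut tool_uses_len = 0usize; // stands in for tool_uses.len()
    let mut current_tool_index: Option<usize> = None;
    let mut announced = Vec::new();
    for ev in events {
        match *ev {
            Ev::StartToolUse { .. } => {
                // Each Start overwrites the one shared slot.
                current_tool_index = Some(tool_uses_len);
                tool_uses_len += 1;
            }
            Ev::Stop { index } => {
                // The Stop's own index is never consulted; take() drains the slot.
                if let Some(idx) = current_tool_index.take() {
                    announced.push((index, idx));
                }
            }
        }
    }
    announced
}

/// OpenAI-style ordering: every Start first, every Stop batched at finish_reason.
pub fn openai_batch(n: u32) -> Vec<Ev> {
    let mut evs: Vec<Ev> = (0..n).map(|index| Ev::StartToolUse { index }).collect();
    evs.extend((0..n).map(|index| Ev::Stop { index }));
    evs
}
```

For a 7-tool batch this yields a single pair, `(0, 6)`: the Stop for block 0 announces tool #7. Lockstep (Anthropic-style) ordering still works, which is why the bug only shows up on OpenAI-compatible backends.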

Fix: replace the single slot with HashMap<u32 block_index, usize tool_uses_idx>. Start inserts, InputJsonDelta looks up by the outer ContentBlockDelta index, Stop removes by its own index. Routing no longer depends on the equally-overwritten current_block_kind; a successful remove(&index) already proves the Stop belongs to a tool block.

Five inline edits in turn_loop.rs; no public API changes.
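The same replay with the fix applied, as a sketch: the slot becomes a map from block index to `tool_uses` position, Start inserts, and Stop removes by its own index. `Ev` and `announced_with_index_map` are hypothetical stand-ins; the map's name follows the `current_tool_indices` identifier used in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stream events; not the crate's real types.
#[derive(Clone, Copy, Debug)]
pub enum Ev {
    StartToolUse { index: u32 },
    Stop { index: u32 },
}

/// Returns (stop block index, tool_uses position) for each announced start.
pub fn announced_with_index_map(events: &[Ev]) -> Vec<(u32, usize)> {
    let mut tool_uses_len = 0usize; // stands in for tool_uses.len()
    // block index -> position of that tool in tool_uses
    let mut current_tool_indices: HashMap<u32, usize> = HashMap::new();
    let mut announced = Vec::new();
    for ev in events {
        match *ev {
            Ev::StartToolUse { index } => {
                current_tool_indices.insert(index, tool_uses_len);
                tool_uses_len += 1;
            }
            Ev::Stop { index } => {
                // Some(_) proves this Stop closes a tool block; no kind slot needed.
                if let Some(idx) = current_tool_indices.remove(&index) {
                    announced.push((index, idx));
                }
            }
        }
    }
    announced
}
```

Because each Stop looks up its own key, the result no longer depends on arrival order: batched, reversed, or lockstep Stops all pair block `i` with `tool_uses[i]`, and a Stop for a non-tool index simply finds nothing.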

Testing

cargo test --all-features: all related tests pass (the 3 failing tools::web_run::tests::* tests are pre-existing flakes unrelated to this change; they reproduce on main under high concurrency and pass when run individually)
cargo fmt --all -- --check
cargo clippy --all-targets --all-features — zero warnings
Added regression test batch_tool_calls_preserve_all_tool_use_indices in core/engine/turn_loop.rs::tests
Manual end-to-end verification: vLLM + Qwen3.6-35B + 7-file Tauri scaffold prompt now shows all 7 write_file tool_use blocks paired with their tool_result blocks in the assistant history

Checklist

Updated docs or comments as needed (inline comments at all 5 change sites explain the bug and the fix)
Added or updated tests where relevant
Verified TUI behavior manually

When an OpenAI-compatible backend (vLLM, Ollama, LM Studio, Together AI, self-hosted vLLM/SGLang, etc.) streams an assistant message containing multiple tool_calls in a single round, only the **last** tool's `Event::ToolCallStarted` was firing. The preceding N-1 tool calls executed and produced tool_result events but never announced their start to consumers (TUI / runtime API / embedder bridges), leaving those consumers with N-1 orphan tool_result blocks that have no matching tool_use block in the assistant history.

## Reproduction

```text
backend dispatches:  7 × write_file + 1 × exec_shell
log shows:           7 × ApprovalRequired events ✓
listeners receive:   1 × chat:tool_start, 7 × chat:tool_end
session history:     1 tool_use + 7 tool_result (6 orphans)
```

Tested against vLLM 0.7 + Qwen3.6-35B-A3B with a "scaffold 7-file Tauri template" prompt. Any model and backend combination that emits batch tool_calls triggers this, which is typical when a single LLM round asks for multiple parallel file writes or edits.

## Root cause

`run_turn` tracked the currently streaming tool block with a single `current_tool_index: Option<usize>`. The Anthropic-style adapter (non-streaming response → events at `chat.rs::L1807`) emits Start/Stop pairs in lockstep, so the slot never overlaps. But the OpenAI streaming parser (`chat.rs::L1954-2064`) emits every `ContentBlockStart::ToolUse` as soon as a tool_call delta lands, then batches every `ContentBlockStop` at `finish_reason`:

```text
Start { index: 0 }      // tool #1
Delta { index: 0, .. }
Start { index: 1 }      // tool #2: overwrites current_tool_index
Delta { index: 1, .. }
…
Start { index: 6 }      // current_tool_index = Some(6)
Delta { index: 6, .. }
Stop  { index: 0 }      // take() returns Some(6)  ← wrong tool!
Stop  { index: 1 }      // take() returns None
Stop  { index: 2 }      // take() returns None
…
```

The first `Stop` consumes the last index and emits `ToolCallStarted` for the wrong `tool_uses` entry; every subsequent `Stop` finds the slot already `None` and skips the entire `if let Some(index) = …` branch, dropping the announcement.

## Fix

Replace the single slot with `HashMap<u32 block_index, usize tool_uses_idx>`:

- `ContentBlockStart::ToolUse` and `::ServerToolUse` insert the `(event.index → tool_uses.len())` mapping.
- `InputJsonDelta` looks up by the `ContentBlockDelta` outer index.
- `ContentBlockStop` removes by the stop's index, so each Stop routes to its own `tool_uses` entry regardless of arrival order.

Routing no longer depends on `current_block_kind` (which has the same single-slot overwrite problem); `current_tool_indices.remove(&index)` returning `Some(_)` already proves the Stop belongs to a tool block.

## Tests

Added `batch_tool_calls_preserve_all_tool_use_indices` in `core/engine/turn_loop.rs::tests`. It feeds 7 Starts and 7 Stops through the same `HashMap` API used by `run_turn` and asserts that every index round-trips.

Manual end-to-end verification: vLLM + Qwen3.6-35B + 7-file Tauri template prompt → frontend `messages` history now contains all 7 `write_file` tool_use blocks paired with their tool_result blocks.
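The Fix section's three routing rules (insert on Start, look up on `InputJsonDelta`, remove on Stop) can be sketched end to end as follows. `Ev`, `ToolUse`, and `route` are hypothetical simplified stand-ins, not the crate's types; the sketch also assumes, for illustration, that deltas for different tools may interleave, which index-keyed routing handles either way.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stream events; not the crate's real types.
#[derive(Clone, Debug)]
pub enum Ev {
    StartToolUse { index: u32, name: String },
    InputJsonDelta { index: u32, partial_json: String },
    Stop { index: u32 },
}

/// Hypothetical accumulated tool call.
#[derive(Debug, PartialEq)]
pub struct ToolUse {
    pub name: String,
    pub input_json: String,
}

/// Routes every event by its block index. Returns the accumulated tool uses
/// plus the tool_uses positions in the order their starts were announced.
pub fn route(events: &[Ev]) -> (Vec<ToolUse>, Vec<usize>) {
    let mut tool_uses: Vec<ToolUse> = Vec::new();
    let mut current_tool_indices: HashMap<u32, usize> = HashMap::new();
    let mut started = Vec::new();
    for ev in events {
        match ev {
            Ev::StartToolUse { index, name } => {
                current_tool_indices.insert(*index, tool_uses.len());
                tool_uses.push(ToolUse { name: name.clone(), input_json: String::new() });
            }
            Ev::InputJsonDelta { index, partial_json } => {
                // Look up by the delta's own index, not "whichever Start came last".
                if let Some(&i) = current_tool_indices.get(index) {
                    tool_uses[i].input_json.push_str(partial_json);
                }
            }
            Ev::Stop { index } => {
                if let Some(i) = current_tool_indices.remove(index) {
                    started.push(i);
                }
            }
        }
    }
    (tool_uses, started)
}
```

Each tool's argument JSON is reassembled from its own fragments, and both Stops announce their own entries.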

gemini-code-assist

Code Review

This pull request addresses a bug in handling OpenAI-compatible streaming responses where multiple tool calls are batched. It replaces the single current_tool_index with a HashMap to track multiple concurrent tool blocks by their index, ensuring that ContentBlockStop events are correctly routed. A regression test was also added to verify the fix. Feedback points out that while the tool use logic is corrected, other block types like Text and Thinking still rely on a single current_block_kind slot, which may lead to similar issues in batched responses; it is recommended to apply a similar HashMap approach to those block types to ensure full consistency.

gemini-code-assist · 2026-05-15T10:39:03Z

+                        // Route the Stop using event.index (via
+                        // `current_tool_indices`) rather than the single
+                        // `current_block_kind` slot. In an OpenAI batch
+                        // tool-call stream every Stop after the first sees
+                        // `stopped_kind = None` because `take()` cleared the
+                        // slot, so the original `matches!(stopped_kind, …)`
+                        // check would skip every tool except the last.


While this fix correctly addresses the issue for ToolUse blocks by using the current_tool_indices map, the logic for Text and Thinking blocks remains broken for batched responses.

As noted in the comment at lines 681-684, stopped_kind is derived from current_block_kind.take() (at line 662), which clears the single slot. In an OpenAI-compatible stream where multiple ContentBlockStop events are batched at the end, only the first Stop event will have a non-None stopped_kind. If a Text block (usually index 0) is not the first to stop in the batch, its completion logic (e.g., setting pending_message_complete at line 665) will be skipped because stopped_kind will be None or ToolUse (if a tool stopped first).

To fully resolve this for all block types, current_block_kind should also be replaced with a map (e.g., HashMap<u32, ContentBlockKind>) to track the kind of each active block by its index.
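A sketch of the reviewer's suggestion, not the merged code: track each open block's kind by index so a Text block still resolves as Text even when tool Stops arrive first. The `ContentBlockKind` name comes from the review; its variants and the `BlockTracker` wrapper are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Assumed variants, mirroring the Text/Thinking/ToolUse kinds named in the review.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ContentBlockKind {
    Text,
    Thinking,
    ToolUse,
}

/// Hypothetical per-index replacement for the single `current_block_kind` slot.
#[derive(Default)]
pub struct BlockTracker {
    kinds: HashMap<u32, ContentBlockKind>,
}

impl BlockTracker {
    /// Records the kind of a block when its Start arrives.
    pub fn start(&mut self, index: u32, kind: ContentBlockKind) {
        self.kinds.insert(index, kind);
    }

    /// Returns the kind of the block this Stop closes, or None for an unknown index.
    pub fn stop(&mut self, index: u32) -> Option<ContentBlockKind> {
        self.kinds.remove(&index)
    }
}
```

With a Text block at index 0 and tools at 1 and 2, Stops arriving as 1, 2, 0 still report Text for index 0, so completion logic such as setting `pending_message_complete` would no longer be skipped.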

gemini-code-assist Bot reviewed May 15, 2026


Hmbown merged commit a528ea9 into Hmbown:main May 15, 2026
8 checks passed

Hmbown mentioned this pull request May 15, 2026

chore(release): prepare v0.8.38 #1698

Merged

h3c-hexin deleted the fix/openai-streaming-batch-tool-calls branch May 25, 2026 05:44

Hmbown merged 1 commit into Hmbown:main from h3c-hexin:fix/openai-streaming-batch-tool-calls