fix(streaming): preserve all tool_calls in OpenAI batch responses #1686
When an OpenAI-compatible backend (vLLM, Ollama, LM Studio, Together AI,
self-hosted SGLang, etc.) streams an assistant message containing
multiple tool_calls in a single round, only the **last** tool's
`Event::ToolCallStarted` fired. The preceding N-1 tool calls
executed and produced tool_result events, but never announced their
start to consumers (TUI / runtime API / embedder bridges), leaving them
with N orphan tool_result blocks and no matching tool_use blocks in the
assistant history.
## Reproduction
```text
backend dispatches: 7 × write_file + 1 × exec_shell
log shows: 7 × ApprovalRequired events ✓
listeners receive: 1 × chat:tool_start, 7 × chat:tool_end
session history: 1 tool_use + 7 tool_result (6 orphans)
```
Tested against vLLM 0.7 + Qwen3.6-35B-A3B with a "scaffold 7-file Tauri
template" prompt. Any model+backend combo that emits batch tool_calls
trips this — typical when a single LLM round asks for multiple parallel
file writes or edits.
## Root cause
`run_turn` tracked the currently-streaming tool block with a single
`current_tool_index: Option<usize>`. The Anthropic-style adapter
(non-streaming response → events at `chat.rs::L1807`) emits
Start/Stop pairs in lockstep so the slot never overlaps. But the
OpenAI streaming parser (`chat.rs::L1954-2064`) emits every
`ContentBlockStart::ToolUse` as soon as a tool_call delta lands, then
batches every `ContentBlockStop` at `finish_reason`:
```text
Start { index: 0 } // tool #1
Delta { index: 0, .. }
Start { index: 1 } // tool #2 — overwrites current_tool_index
Delta { index: 1, .. }
…
Start { index: 6 } // current_tool_index = Some(6)
Delta { index: 6, .. }
Stop { index: 0 } // take() returns Some(6) ← wrong tool!
Stop { index: 1 } // take() returns None
Stop { index: 2 } // take() returns None
…
```
The first `Stop` consumes the last index and emits `ToolCallStarted`
for the wrong `tool_uses` entry; every subsequent `Stop` finds the
slot already `None` and skips the entire `if let Some(index) = …`
branch, dropping the announcement.
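The overwrite can be reproduced in isolation. The sketch below is a minimal model, not the code in `run_turn`: `StreamEvent`, `route_with_single_slot`, and `batched_events` are hypothetical names, and only the single `Option<usize>` slot mirrors the description above.

```rust
/// Simplified stand-in for the stream events (hypothetical type, not the real one).
#[allow(dead_code)] // `Start.index` is deliberately ignored by the single-slot logic
#[derive(Debug, Clone, Copy)]
enum StreamEvent {
    Start { index: u32 },
    Stop { index: u32 },
}

/// Replays events through one `Option<usize>` slot. For each Stop, records
/// which `tool_uses` entry it was routed to (`None` means nothing was announced).
fn route_with_single_slot(events: &[StreamEvent]) -> Vec<(u32, Option<usize>)> {
    let mut current_tool_index: Option<usize> = None;
    let mut tool_uses_len = 0usize;
    let mut routed = Vec::new();
    for event in events {
        match *event {
            StreamEvent::Start { .. } => {
                // Every Start overwrites the slot with the newest entry.
                current_tool_index = Some(tool_uses_len);
                tool_uses_len += 1;
            }
            StreamEvent::Stop { index } => {
                // `take()` empties the slot, so only the first Stop sees a value.
                routed.push((index, current_tool_index.take()));
            }
        }
    }
    routed
}

/// Builds the OpenAI-style ordering: all Starts first, then all Stops.
fn batched_events(n: u32) -> Vec<StreamEvent> {
    (0..n)
        .map(|index| StreamEvent::Start { index })
        .chain((0..n).map(|index| StreamEvent::Stop { index }))
        .collect()
}
```

Under these assumptions, `route_with_single_slot(&batched_events(7))` routes the Stop for index 0 to entry 6 and every later Stop to `None`, which matches the trace above. With Start/Stop in lockstep, each Stop is routed to its own entry.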
## Fix
Replace the single slot with `HashMap<u32 block_index, usize
tool_uses_idx>`:
- `ContentBlockStart::ToolUse` and `::ServerToolUse` insert the
`(event.index → tool_uses.len())` mapping.
- `InputJsonDelta` looks up by the `ContentBlockDelta` outer index.
- `ContentBlockStop` removes by the stop's index, so each Stop routes
to its own `tool_uses` entry regardless of arrival order.
Routing no longer depends on `current_block_kind` (which has the same
single-slot overwrite problem); `current_tool_indices.remove(&index)`
returning `Some(_)` already proves the Stop belongs to a tool block.
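The three bullet points above can be sketched as follows. This is an illustration under assumptions rather than the patched `run_turn`: `StreamEvent`, `ToolUse`, and `apply_events` are hypothetical names, and only the `HashMap<u32, usize>` named `current_tool_indices` follows the description.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the stream events (hypothetical type).
enum StreamEvent {
    Start { index: u32, name: &'static str },
    Delta { index: u32, partial_json: &'static str },
    Stop { index: u32 },
}

/// Simplified stand-in for one `tool_uses` entry (hypothetical type).
#[derive(Debug, PartialEq)]
struct ToolUse {
    name: String,
    input_json: String,
}

/// Applies the events and returns the collected tool uses together with the
/// `tool_uses` positions that were announced, in the order their Stops arrived.
fn apply_events(events: &[StreamEvent]) -> (Vec<ToolUse>, Vec<usize>) {
    let mut tool_uses = Vec::new();
    // block index from the stream -> position in `tool_uses`
    let mut current_tool_indices: HashMap<u32, usize> = HashMap::new();
    let mut announced = Vec::new();
    for event in events {
        match event {
            StreamEvent::Start { index, name } => {
                current_tool_indices.insert(*index, tool_uses.len());
                tool_uses.push(ToolUse {
                    name: name.to_string(),
                    input_json: String::new(),
                });
            }
            StreamEvent::Delta { index, partial_json } => {
                // Look up by the delta's own block index, not by "latest Start".
                if let Some(&pos) = current_tool_indices.get(index) {
                    tool_uses[pos].input_json.push_str(partial_json);
                }
            }
            StreamEvent::Stop { index } => {
                // A successful remove proves this Stop belongs to a tool block.
                if let Some(pos) = current_tool_indices.remove(index) {
                    announced.push(pos);
                }
            }
        }
    }
    (tool_uses, announced)
}
```

Because each Stop removes only its own key, batched or out-of-order Stops still reach the right entry, and a Stop for an index that is not in the map (for example a text block) is ignored by this branch.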
## Tests
Added `batch_tool_calls_preserve_all_tool_use_indices` in
`core/engine/turn_loop.rs::tests` — feeds 7 Starts and 7 Stops through
the same `HashMap` API used by `run_turn`, asserts every index round-trips.
Manual end-to-end verification: vLLM + Qwen3.6-35B + 7-file Tauri
template prompt → frontend `messages` history now contains all 7
`write_file` tool_use blocks paired with their tool_result blocks.
Code Review
This pull request addresses a bug in handling OpenAI-compatible streaming responses where multiple tool calls are batched. It replaces the single `current_tool_index` with a `HashMap` to track multiple concurrent tool blocks by their index, ensuring that `ContentBlockStop` events are correctly routed. A regression test was also added to verify the fix. Feedback points out that while the tool use logic is corrected, other block types like Text and Thinking still rely on a single `current_block_kind` slot, which may lead to similar issues in batched responses; it is recommended to apply a similar `HashMap` approach to those block types to ensure full consistency.
// Route the Stop using event.index (via
// `current_tool_indices`) rather than the single
// `current_block_kind` slot. In an OpenAI batch
// tool-call stream every Stop after the first sees
// `stopped_kind = None` because `take()` cleared the
// slot, so the original `matches!(stopped_kind, …)`
// check would skip every tool except the last.
While this fix correctly addresses the issue for `ToolUse` blocks by using the `current_tool_indices` map, the logic for Text and Thinking blocks remains broken for batched responses.
As noted in the comment at lines 681-684, `stopped_kind` is derived from `current_block_kind.take()` (at line 662), which clears the single slot. In an OpenAI-compatible stream where multiple `ContentBlockStop` events are batched at the end, only the first Stop event will have a non-`None` `stopped_kind`. If a Text block (usually index 0) is not the first to stop in the batch, its completion logic (e.g., setting `pending_message_complete` at line 665) will be skipped because `stopped_kind` will be `None` or `ToolUse` (if a tool stopped first).
To fully resolve this for all block types, `current_block_kind` should also be replaced with a map (e.g., `HashMap<u32, ContentBlockKind>`) to track the kind of each active block by its index.
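A rough sketch of the suggested per-index tracking might look like the following. `BlockTracker` and its methods are hypothetical, and the `ContentBlockKind` variants are assumed from the block types named in this comment; the real enum may differ.

```rust
use std::collections::HashMap;

/// Assumed variants; the real `ContentBlockKind` may have others.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentBlockKind {
    Text,
    Thinking,
    ToolUse,
}

/// Hypothetical helper that remembers the kind of every open block by index,
/// instead of keeping a single "current kind" slot.
struct BlockTracker {
    active: HashMap<u32, ContentBlockKind>,
}

impl BlockTracker {
    fn new() -> Self {
        BlockTracker { active: HashMap::new() }
    }

    /// Called on a block start: record the kind under the block's index.
    fn start(&mut self, index: u32, kind: ContentBlockKind) {
        self.active.insert(index, kind);
    }

    /// Called on a block stop: returns the kind that was opened at `index`,
    /// or `None` if no block with that index is currently open.
    fn stop(&mut self, index: u32) -> Option<ContentBlockKind> {
        self.active.remove(&index)
    }
}
```

With this shape, a Text block at index 0 that stops after a tool block still yields `Some(ContentBlockKind::Text)`, so its completion logic would not depend on being the first Stop in the batch.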
## Summary
Fixes a bug where multiple `tool_calls` in a single assistant message lose all but the last `ToolCallStarted` event when streaming from any OpenAI-compatible backend (vLLM / Ollama / LM Studio / Together AI / etc.).

Symptom: when a model emits N batch tool_calls in one round, listeners see `1 × chat:tool_start` and `N × chat:tool_end`. The TUI / runtime API / embedder bridges end up with `N` orphan `tool_result` blocks and no matching `tool_use` blocks in assistant history. Session persistence drops the tool calls; subsequent `recall_archive` / cycle restart cannot reconstruct the turn.

Reproduction: vLLM 0.7 + Qwen3.6-35B-A3B, with a prompt that asks for a 7-file Tauri scaffold.

Root cause: `run_turn` (`core/engine/turn_loop.rs:354`) tracked the streaming tool block with a single `current_tool_index: Option<usize>`. The Anthropic-style adapter emits `Start`/`Stop` in lockstep so the slot never overlaps. But the OpenAI streaming parser (`client/chat.rs:1954-2064`) emits all `ContentBlockStart::ToolUse` events as soon as their deltas land, then batches every `ContentBlockStop` at `finish_reason`. Each new `Start` overwrites the slot; the first `Stop`'s `take()` returns the last index (wrong tool), and every later `take()` returns `None`.

Fix: replace the single slot with `HashMap<u32 block_index, usize tool_uses_idx>`. `Start` inserts, `InputJsonDelta` looks up by the outer `ContentBlockDelta` index, `Stop` removes by its own index. Routing no longer depends on the equally-overwritten `current_block_kind`; a successful `remove(&index)` already proves the Stop belongs to a tool block.

5-place inline edit in `turn_loop.rs`, no public API changes.

## Testing
- `cargo test --all-features`: all related tests pass (3 pre-existing flaky `tools::web_run::tests::*` failures unrelated, reproducible on `main` under high concurrency, pass individually)
- `cargo fmt --all -- --check`
- `cargo clippy --all-targets --all-features`: zero warnings
- `batch_tool_calls_preserve_all_tool_use_indices` in `core/engine/turn_loop.rs::tests`
- `write_file` tool_use blocks paired with their tool_result blocks in the assistant history