Batch completion events to reduce `notify()` calls during streaming #46802
Merged
Conversation
Instead of calling `cx.notify()` for every token during agent streaming, batch all immediately-available events using `now_or_never()` and process them in a single `update()` call with one `notify()` at the end. This approach is deterministic: it processes exactly what is available right now, adapting naturally to network speed. When tokens arrive slowly, you get one at a time; when they arrive in bursts, they batch automatically.

Changes:

- Remove `cx.notify()` from `handle_text_event`, `handle_thinking_event`, and `handle_redacted_thinking_event`
- Batch events in the streaming loop using `now_or_never()`
- Call `cx.notify()` once per batch instead of per event
- Keep `cx.notify()` in `handle_tool_use_event` for immediate tool feedback
rtfeldman added a commit that referenced this pull request on Jan 14, 2026
…6802)

## Problem

Profiling showed that during agent streaming, `thread.rs:1393:23` was appearing constantly as a hotspot. The issue was that every single token from the model triggered:

1. `this.update(cx, ...)`
2. `handle_text_event()` (or thinking/redacted_thinking)
3. `cx.notify()`
4. UI re-render

This created significant foreground thread pressure, contributing to ~500ms delays visible in the profiler.

## Solution

Batch all immediately-available events using `now_or_never()` and process them in a single `update()` call with one `notify()` at the end. This approach is deterministic: it processes exactly what is available right now, adapting naturally to network speed:

- When tokens arrive slowly, you get one at a time
- When they arrive in bursts, they batch automatically

## Changes

- Remove `cx.notify()` from `handle_text_event`, `handle_thinking_event`, and `handle_redacted_thinking_event`
- Batch events in the streaming loop using `now_or_never()`
- Call `cx.notify()` once per batch instead of per event
- Keep `cx.notify()` in `handle_tool_use_event` for immediate tool feedback

Release Notes:

- Improve streaming tool call performance by batching UI updates.
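A toy model of the change, counting how often "notify" would fire for the same event traffic under the old per-event scheme versus the new per-batch scheme; `Notifier` is purely illustrative (in Zed the call is `cx.notify()` on the app context):

```rust
// Hypothetical stand-in for the UI context's notify mechanism.
struct Notifier {
    calls: usize,
}

impl Notifier {
    fn notify(&mut self) {
        self.calls += 1;
    }
}

// Old scheme: each handler (e.g. handle_text_event) ended with a notify,
// so N streamed tokens caused N re-render requests.
fn per_event(events: &[u32], cx: &mut Notifier) {
    for _event in events {
        cx.notify();
    }
}

// New scheme: handlers no longer notify; one notify fires per batch,
// however many events the batch absorbed.
fn per_batch(batches: &[Vec<u32>], cx: &mut Notifier) {
    for batch in batches {
        for _event in batch {
            // process the event without notifying
        }
        cx.notify();
    }
}
```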
rtfeldman pushed a commit that referenced this pull request on Jan 15, 2026
Closes #ISSUE

Reproduction steps:

- Zed main branch
- Use Ollama (likely provider-agnostic) in the agent panel with the following prompt: "Explore the codebase with subagents"
- Panic:

```
thread 'main' (36268) panicked at C:\Users\username\.cargo\registry\src\index.crates.io-1949cf8c6b5b557f\futures-util-0.3.31\src\stream\unfold.rs:108:21:
Unfold must not be polled after it returned `Poll::Ready(None)`
```

Following the stack trace we get to `Thread::run_turn_internal`. I believe the panic happens in the following code, which was introduced in #46802:

```rust
// Collect all immediately available events to process as a batch
let mut batch = vec![first_event];
while let Some(event) = events.next().now_or_never().flatten() {
    batch.push(event);
}
```

Both `Option`s get flattened; however, the inner `Option` represents the end of the stream, after which polling the stream using `.next()` will result in a panic. We could fix the logic in this particular spot, but I believe the simpler solution is to `.fuse()` the stream, which stops the stream from panicking even after it has ended. This also prevents misuse in the future.

The panic was introduced on main and did not land in a release yet, so no release notes.

Release Notes:

- N/A