
Batch completion events to reduce notify() calls during streaming#46802

Merged
rtfeldman merged 1 commit into `main` from `improve-perf` on Jan 14, 2026
Conversation


@rtfeldman rtfeldman commented Jan 14, 2026

Problem

Profiling showed that during agent streaming, thread.rs:1393:23 was appearing constantly as a hotspot. The issue was that every single token from the model triggered:

  1. this.update(cx, ...)
  2. handle_text_event() (or thinking/redacted_thinking)
  3. cx.notify()
  4. UI re-render

This created significant foreground thread pressure, contributing to ~500ms delays visible in the profiler.

Solution

Batch all immediately-available events using now_or_never() and process them in a single update() call with one notify() at the end.

This approach is deterministic - it processes exactly what's available right now, adapting naturally to network speed:

  • When tokens arrive slowly, you get one at a time
  • When they arrive in bursts, they batch automatically

Changes

  • Remove cx.notify() from handle_text_event, handle_thinking_event, and handle_redacted_thinking_event
  • Batch events in the streaming loop using now_or_never()
  • Call cx.notify() once per batch instead of per-event
  • Keep cx.notify() in handle_tool_use_event for immediate tool feedback

Release Notes:

  • Improve streaming tool call performance by batching UI updates.

@cla-bot added the `cla-signed` label (The user has signed the Contributor License Agreement) on Jan 14, 2026
@zed-industries-bot
Contributor

Warnings
⚠️

This PR is missing release notes.

Please add a "Release Notes" section that describes the change:

Release Notes:

- Added/Fixed/Improved ...

If your change is not user-facing, you can use "N/A" for the entry:

Release Notes:

- N/A

Generated by 🚫 dangerJS against 3594c4f

@rtfeldman rtfeldman marked this pull request as ready for review January 14, 2026 17:14
@rtfeldman rtfeldman merged commit 5e9634c into main Jan 14, 2026
26 checks passed
@rtfeldman rtfeldman deleted the improve-perf branch January 14, 2026 17:14
rtfeldman added a commit that referenced this pull request Jan 14, 2026
rtfeldman pushed a commit that referenced this pull request Jan 15, 2026
Closes #ISSUE

Reproduction steps:

- Zed main branch
- Use Ollama (likely provider agnostic) in the agent panel with the
following prompt: "Explore the codebase with subagents"
- Panic

```
thread 'main' (36268) panicked at C:\Users\username\.cargo\registry\src\index.crates.io-1949cf8c6b5b557f\futures-util-0.3.31\src\stream\unfold.rs:108:21:
Unfold must not be polled after it returned `Poll::Ready(None)`
```

Following the stack trace, we get to `Thread::run_turn_internal`. I believe the panic happens in the following code, which was introduced in #46802:

```rust
// Collect all immediately available events to process as a batch
let mut batch = vec![first_event];
while let Some(event) = events.next().now_or_never().flatten() {
    batch.push(event);
}
```

Both `Option`s get flattened: the outer one from `now_or_never()` (ready vs. pending) and the inner one from the stream itself (item vs. end of stream). Once the inner `Option` is `None`, the stream has ended, and polling it again with `.next()` results in a panic.

We could fix the logic in this particular spot, but I believe the simpler solution is to `.fuse()` the stream, which keeps the stream from panicking even after it has ended. This also prevents misuse in the future.

The panic was introduced on main and has not landed in a release yet, so
no release notes.

Release Notes:

- N/A