Remove `UpdateSignal::Stop` by IvanPleshkov · Pull Request #8050 · qdrant/qdrant

IvanPleshkov · 2026-02-03T18:48:01Z

Summary

This PR removes UpdateSignal::Stop and changes the WAL truncation mechanism.

Description

When implementing the update queue feature, we discovered a problem with the current worker stop mechanism. If the update worker has many pending operations in the channel, sending UpdateSignal::Stop means we have to wait until all operations are processed before the worker actually stops - the stop signal sits at the end of the queue behind all pending operations.

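As a solution, the update worker no longer receives a stop signal through the channel. Instead, the worker waits on both the channel and a separate stop signal using tokio::select, so a stop request is seen regardless of how many operations are queued. When it stops, the update worker also returns the channel receiver with all pending operations.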
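
A minimal sketch of the idea, not the code from this PR: it uses only the standard library, with an `AtomicBool` standing in for the stop signal and `recv_timeout` polling standing in for `tokio::select`. The names `Operation`, `run_worker`, and `apply` are hypothetical and exist only for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical stand-in for a queued update operation.
#[derive(Debug, PartialEq)]
pub struct Operation(pub u64);

/// Applies operations until `stop` is set or every sender is dropped,
/// then hands the receiver back so the caller can read what is still queued.
pub fn run_worker(
    receiver: Receiver<Operation>,
    stop: Arc<AtomicBool>,
    mut apply: impl FnMut(Operation),
) -> Receiver<Operation> {
    loop {
        // The stop flag lives outside the queue, so it is observed
        // no matter how many operations are waiting in the channel.
        if stop.load(Ordering::Acquire) {
            break;
        }
        match receiver.recv_timeout(Duration::from_millis(10)) {
            Ok(op) => apply(op),
            // Nothing arrived in time: loop around and re-check the flag.
            Err(RecvTimeoutError::Timeout) => continue,
            // All senders are gone: nothing more can arrive.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    // Pending operations stay inside the receiver; none are discarded here.
    receiver
}
```

With an in-band stop message the worker would have to drain the whole queue first. Here the caller sets the flag, waits for the worker to return, and can read the remaining operations from the returned receiver with `try_iter()`.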

WAL truncation

Because the new stopping mechanism can stop the update worker immediately and return all pending operations, this PR also refactors WAL truncation, which no longer needs an overcomplicated update worker.
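
The ordering this implies (stop the workers, truncate, then restart the workers even if truncation fails) can be sketched as below. This is an assumption-laden illustration, not the PR's implementation: the `Workers` trait, `EventLog`, and `with_workers_stopped` are hypothetical names, and the real code also handles locks and pending operations.

```rust
/// Hypothetical control surface for the shard's background workers.
pub trait Workers {
    fn stop(&mut self);
    fn restart(&mut self);
}

/// Hypothetical recorder used to observe the order of calls.
#[derive(Default)]
pub struct EventLog {
    pub events: Vec<&'static str>,
}

impl Workers for EventLog {
    fn stop(&mut self) {
        self.events.push("stop");
    }
    fn restart(&mut self) {
        self.events.push("restart");
    }
}

/// Runs `truncate` while the workers are stopped.
/// The workers are restarted whether or not `truncate` succeeds,
/// and the result of `truncate` is returned unchanged.
pub fn with_workers_stopped<W: Workers, T, E>(
    workers: &mut W,
    truncate: impl FnOnce() -> Result<T, E>,
) -> Result<T, E> {
    workers.stop();
    // Keep the result instead of returning early with `?`,
    // so that the restart below always runs.
    let result = truncate();
    workers.restart();
    result
}
```

Holding the result and restarting before returning keeps a failed truncation from leaving the shard without running workers.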

Open problems

Can we change the update channel size via a config update? If yes, there is a problem in the config optimizer update: we cannot pass all pending operations to the new update worker if the config shrinks the update queue size.

refactor wal truncation restart workers even if wal truncation fails

lib/collection/src/update_workers/update_worker.rs

lib/collection/src/shards/local_shard/shard_ops.rs

timvisee · 2026-02-05T09:42:59Z

lib/collection/src/shards/local_shard/updaters.rs

```diff
+        // Swap to new sender - new operations will go to the new channel
+        let _old_sender = self.update_sender.swap(Arc::new(update_sender));
+        // Signal all workers to stop
         update_handler.stop_flush_worker();
+        update_handler.stop_update_worker();
+
+        // Wait for workers to finish and get pending operations from the old channel
+        let pending_receiver = update_handler.wait_workers_stops().await?;
+
+        // Forward pending operations from old receiver to new channel
+        if let Some(mut old_receiver) = pending_receiver {
+            let sender = self.update_sender.load();
+            while let Ok(signal) = old_receiver.try_recv() {
+                // Forward pending operations to new channel
+                // Use try_send to avoid blocking - if channel is full, operations are dropped
+                let _ = sender.try_send(signal);
+            }
+        }
```


Why do we create a new channel here? Can't we repurpose the existing receiver since we now get it back?

I'm not sure what the proper behavior is.
If we repurpose the existing receiver, the update queue size cannot change until a restart.
If we change the update queue size, we may lose operations when the new size is smaller.
What I can propose: create a new channel only when the update queue size has changed and the pending operations in the old queue fit into the new size.
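
To make the risk concrete, here is a small standard-library sketch of forwarding pending operations into a smaller bounded channel with a non-blocking send. It is an illustration under assumptions, not the PR's code: `forward_pending` is a hypothetical helper, and unlike the diff above it returns the operations that did not fit instead of silently discarding them.

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// Moves everything currently queued in `old` into `new` without blocking.
/// Returns the operations that could not be forwarded, either because
/// `new` was full or because its receiver was dropped.
pub fn forward_pending<T>(old: &Receiver<T>, new: &SyncSender<T>) -> Vec<T> {
    let mut rejected = Vec::new();
    // `try_iter` yields queued items and stops as soon as the queue is empty.
    for op in old.try_iter() {
        match new.try_send(op) {
            Ok(()) => {}
            // Both error variants give the value back, so nothing is lost here.
            Err(TrySendError::Full(op)) | Err(TrySendError::Disconnected(op)) => {
                rejected.push(op)
            }
        }
    }
    rejected
}
```

For example, forwarding four queued operations into a channel created with `sync_channel(2)` leaves two operations in the returned vector. Those are the ones that `let _ = sender.try_send(signal)` would drop.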

Because the update queue size is a node-level config and cannot be changed at runtime, I changed this logic to reuse the old receiver.
If no receiver is returned, I propose creating a new channel instead of returning an error.

This cannot be changed at runtime. We agreed to handle the case of 'lower setting than current required queue size' in a separate PR.

coderabbitai · 2026-02-05T14:26:08Z

Caution

Review failed

The pull request is closed.

📝 Walkthrough

The PR refactors LocalShard's worker management and shutdown flow by replacing channel-based Stop signals with direct method invocations. It introduces a CancellationToken mechanism for the update worker instead of atomic skip-updates guards. The UpdateSignal enum is simplified by removing the Stop variant and adjusting Plunger. The wait_update_workers_stop method signature changes to return pending operation receiver information. WAL truncation is reimplemented to acquire locks, stop workers, collect pending operations, perform truncation, and restart workers with fresh channels. Test expectations for WAL recovery are tightened accordingly.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Notify optimization handles to stop on shard drop #7765 — Directly related; adds notify_optimization_handles_to_stop method invoked during LocalShard drop.
WAL replay honors applied_seq #8008 — Related; modifies WAL replay workflow and UpdateSignal/update_handler channel-based receiver collection.
Persist applied seq at regular interval #7976 — Related; modifies update_worker_fn signature, return type, and UpdateHandler public APIs.

Suggested reviewers

timvisee
agourlay


* remove UpdateSignal Stop

  refactor wal truncation restart workers even if wal truncation fails
* dont duplicate notification code in update worker
* send manually cancellation token to update worker
* revert stop_update_workers and keep as is
* refactor wal truncation
* are you happy clippy
* Inline breaks into single block
* Remove closure and call directly
* dont recreate channels in config update
* more comments
* debug assert
* fmt

Co-authored-by: timvisee <tim@visee.me>
Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>

IvanPleshkov added this to the Update queue milestone Feb 3, 2026

remove UpdateSignal Stop

15e914f

refactor wal truncation restart workers even if wal truncation fails

IvanPleshkov force-pushed the remove-UpdateSignal-Stop branch from f0481d0 to 15e914f February 3, 2026 20:01

github-actions bot mentioned this pull request Feb 3, 2026

Flaky test common::stoppable_task::tests::test_task_stop #6876

Open

IvanPleshkov added 5 commits February 4, 2026 10:05

dont duplicate notification code in update worker

63006ce

send manually cancellation token to update worker

de84046

revert stop_update_workers and keep as is

2e98e2b

refactor wal truncation

07967b0

are you happy clippy

61e8572

IvanPleshkov marked this pull request as ready for review February 4, 2026 19:06

IvanPleshkov requested review from generall and timvisee February 4, 2026 19:06

timvisee added 2 commits February 5, 2026 10:46

Inline breaks into single block

a7c40a7

Remove closure and call directly

295d54b

timvisee reviewed Feb 5, 2026

qdrant deleted a comment from coderabbitai bot Feb 5, 2026

timvisee mentioned this pull request Feb 5, 2026

Enable update queue #8046

Merged

5 tasks

IvanPleshkov added 2 commits February 5, 2026 13:46

dont recreate channels in config update

fde297b

more comments

bd44817

qdrant deleted a comment from coderabbitai bot Feb 5, 2026

timvisee approved these changes Feb 5, 2026

generall added 2 commits February 5, 2026 15:22

debug assert

5e317e7

fmt

7545c83

generall approved these changes Feb 5, 2026

generall merged commit 90923e5 into dev Feb 5, 2026
12 checks passed

generall deleted the remove-UpdateSignal-Stop branch February 5, 2026 14:23

coderabbitai bot mentioned this pull request Feb 12, 2026

Dont lock WAL while serialization #8093

Merged

coderabbitai bot mentioned this pull request Feb 16, 2026

Enable lint detecting ignored unit patterns #8142

Closed

This was referenced Mar 10, 2026

Change backpressure policy for deferred points #8239

Merged

Add internal wait enum, don't let forward proxy wait on optimizers #8394

Merged

Fix update worker hanging in deferred wait loop on config change #8410

Merged

generall merged 12 commits into dev from remove-UpdateSignal-Stop

3 participants
