```rust
let keep_operation_in_ram = pending_operations_count < DEFAULT_UPDATE_QUEUE_RAM_BUFFER;
let operation = keep_operation_in_ram.then_some(Box::new(operation.operation));

channel_permit.send(UpdateSignal::Operation(OperationData {
```
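A hedged, dependency-free sketch of this gating logic, with a `String` standing in for the real operation type (an assumption for illustration; the PR uses the collection's own operation struct):

```rust
// Sketch: keep the operation body in the channel message only while the
// queue is short; otherwise the worker re-reads the body from the WAL.
const DEFAULT_UPDATE_QUEUE_RAM_BUFFER: usize = 200;

/// Returns Some(boxed operation) when the body should travel in RAM,
/// or None when the update worker should re-read it from the WAL.
fn operation_payload(pending_operations_count: usize, operation: String) -> Option<Box<String>> {
    let keep_operation_in_ram = pending_operations_count < DEFAULT_UPDATE_QUEUE_RAM_BUFFER;
    // `then` (not `then_some`) so the Box is only allocated when it is kept.
    keep_operation_in_ram.then(|| Box::new(operation))
}
```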
This struct is at least 152 bytes per operation, even for just a pointer to the WAL.
Just the hw_measurements field takes 120 bytes.
Given that, we should probably box it too. An alternative would be an enum that covers all variants, but I can't come up with a good one for these three fields.
I think it's worth a little bit of effort.
Thanks for this measurement, I didn't attach it to the PR description.
Boxing alone won't help, because we still need this field; a box would just move the memory around without any benefit. But I can try, as a separate PR, to refactor the hw counter and reduce the Arc count here (by Arc-ing the whole structure):
```rust
pub struct HwSharedDrain {
    pub(crate) cpu_counter: Arc<AtomicUsize>,
    pub(crate) payload_io_read_counter: Arc<AtomicUsize>,
    pub(crate) payload_io_write_counter: Arc<AtomicUsize>,
    pub(crate) payload_index_io_read_counter: Arc<AtomicUsize>,
    pub(crate) payload_index_io_write_counter: Arc<AtomicUsize>,
    pub(crate) vector_io_read_counter: Arc<AtomicUsize>,
    pub(crate) vector_io_write_counter: Arc<AtomicUsize>,
}
```
Is that fine with you?
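For illustration, a minimal sketch of what such a refactor could look like, assuming the counters move behind a single `Arc` so cloning the drain bumps one refcount instead of seven (the inner-type name is hypothetical, and only two of the counters are shown):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical inner struct: plain atomics, shared as one allocation.
struct HwSharedDrainInner {
    cpu_counter: AtomicUsize,
    vector_io_read_counter: AtomicUsize,
    // ... remaining counters elided for brevity
}

#[derive(Clone)]
pub struct HwSharedDrain {
    inner: Arc<HwSharedDrainInner>,
}

impl HwSharedDrain {
    pub fn new() -> Self {
        Self {
            inner: Arc::new(HwSharedDrainInner {
                cpu_counter: AtomicUsize::new(0),
                vector_io_read_counter: AtomicUsize::new(0),
            }),
        }
    }

    // All clones drain into the same shared counters.
    pub fn add_cpu(&self, n: usize) {
        self.inner.cpu_counter.fetch_add(n, Ordering::Relaxed);
    }

    pub fn cpu(&self) -> usize {
        self.inner.cpu_counter.load(Ordering::Relaxed)
    }
}
```

The handle itself then shrinks to one pointer, which also helps the per-operation size concern raised above.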
> Boxing alone won't help, because we still need this field; a box would just move the memory around without any benefit. But I can try, as a separate PR, to refactor the hw counter and reduce the Arc count here (by Arc-ing the whole structure):
Good point. Even values that we drop from memory will still have their HW aggregator attached. Thanks for the correction.
If this is only used to report hardware statistics for wait=true responses, maybe it'd be worth making wait=true and using the queue mutually exclusive. 🤔
> But I can try as a separate PR to refactor hw counter, where I can try to reduce Arc count here (by arc-ing the whole structure):
Not sure on that one. Let's focus on the current feature first.
```rust
let update_sender = self.update_sender.load();
let pending_operations_count = update_sender
    .max_capacity()
    .saturating_sub(update_sender.capacity());
```
Can you add a little comment noting that `capacity` is actually the number of remaining available slots? This is rather confusing in Tokio.
Good point, I was also confused when I first encountered `capacity`. Added a comment.
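For context: in `tokio::sync::mpsc`, `Sender::capacity()` returns the number of currently free permits, while `max_capacity()` returns the bound passed to `channel(n)`. A dependency-free mock of that accounting (the `MockSender` type is hypothetical), showing why the subtraction yields the pending count:

```rust
// Mock of the relevant tokio::sync::mpsc::Sender accounting:
// `capacity()` is the number of FREE slots right now,
// `max_capacity()` is the configured channel bound.
struct MockSender {
    max_capacity: usize,
    in_flight: usize, // messages sent but not yet received
}

impl MockSender {
    fn max_capacity(&self) -> usize {
        self.max_capacity
    }
    fn capacity(&self) -> usize {
        self.max_capacity - self.in_flight
    }
}

// Same expression as in the PR: pending = configured bound - free slots.
fn pending_operations_count(sender: &MockSender) -> usize {
    sender.max_capacity().saturating_sub(sender.capacity())
}
```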
```rust
/// Maximum number of operations which are stored in RAM in update worker queue.
/// If there are more pending operations, operation data
/// will be read from WAL when processing the operation.
pub const DEFAULT_UPDATE_QUEUE_RAM_BUFFER: usize = 50;
```
Any strict reason for lowering this number from 100 or 200?
To make the changes measurable. And you did a measurement where you found an upload speed drop.
I replied to your message with bfb and reverted this constant back to 200. Now this PR does not affect dev perf while the update queue is 100.
Checked locally. Maybe my hardware was not good enough to measure the drop, as I get around 29K/s on both branches. I described in the PR that deserialization has a huge price, while reading from the WAL cannot be avoided. Your measurement is more expected than mine.
Note that I tested on a debug build.
Are you suggesting that the 'lock, deserialize, unlock' cycles on the WAL are causing this? Yeah, I'll give setting the buffer size to unlimited a shot to see if it shows a different picture. I'll also run two release builds to see how they behave.
To make sure that the reason is deserialization, I measured the possible drop points (WAL blocking, IO, deserialization) using plain timers and got milliseconds for deserialization even in a release build. All other possible drop points were much smaller.
Actionable comments posted: 1
In `lib/collection/src/shards/local_shard/shard_ops.rs`, around lines 74-80: there is a typo in the comment near `pending_operations_count` that explains the `Sender::capacity` behavior. Change "regarging" to "regarding" (or rephrase to "as per the Tokio docs") to clear the codespell failure.
Using debug build: [benchmark results omitted]
Using perf build: [benchmark results omitted]
It shows the problem goes away in release builds, at least for this workload. Any objection to reverting the queue size limit from 50 to 200, to at least make the debug build case better?
No objections, I have already pushed the buffer size limit changes (to 500; beyond 100 it doesn't matter how large it is).
* update queue dont keep ops in ram showcase
* always load operation from WAL
* revert operations buffering; cound only pending in update worker operations; fix typo; use channel size instead of wal index
* remove result expect
* decrease buffering const
* review remarks
* are you happy codespell
Summary
This PR reduces the RAM usage of pending operations in the Update Worker.
Description
The Update Worker has a channel of pending operations. Each pending operation has an OperationData, which contains the request data. In case of a bulk upload the channel may become huge and consume a lot of RAM.
This PR changes the logic of the Update Worker: instead of storing the request in RAM, the Update Worker reads the request from the WAL.
Performance
With buffering, no performance drops were found.
Using the vector db benchmark, the size of the update didn't reach the buffer size limit.
Using BFB, I got a full update queue with the params --threads=8 --num-vectors=1000000. But still, the difference between runs is larger than the difference between branches.
Without buffering (by changing DEFAULT_UPDATE_QUEUE_RAM_BUFFER to zero), there is a performance drop because of deserialization: local measurements showed that deserialization might take 1-2 ms for a large batched upsert. That is a fine price for an unlimited update queue where RAM usage has higher priority than deserialization cost.
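The plain-timer measurement described in the discussion can be sketched generically. The decode step below is a stand-in (the real code deserializes the collection's operation type from WAL records with serde; here little-endian `u64`s are decoded so the sketch stays dependency-free):

```rust
use std::time::{Duration, Instant};

// Stand-in for the WAL read path: decode a raw record back into values.
fn read_and_deserialize(raw: &[u8]) -> Vec<u64> {
    raw.chunks_exact(8)
        .map(|chunk| u64::from_le_bytes(chunk.try_into().unwrap()))
        .collect()
}

// Wrap the suspected hot path in a plain timer, as done in the discussion
// above, to compare it against the other candidate drop points.
fn timed_read(raw: &[u8]) -> (Vec<u64>, Duration) {
    let start = Instant::now();
    let ops = read_and_deserialize(raw);
    (ops, start.elapsed())
}
```

Timing each candidate (WAL locking, IO, deserialization) separately this way is what isolated deserialization as the dominant cost.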