Skip to content

Add deferred flusher to Gridstore#7446

Merged
timvisee merged 13 commits intodevfrom
gridstore-flush-closure
Nov 18, 2025
Merged

Add deferred flusher to Gridstore#7446
timvisee merged 13 commits intodevfrom
gridstore-flush-closure

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented Oct 23, 2025

Add a deferred flusher to Gridstore, using which we flush asynchronously through a closure.

This makes the flushing behavior of Gridstore consistent with the other storage types we have.

Specifically this:

  1. takes a copy of pending updates in the tracker
  2. deferred, in a closure:
    a. write the copied updates to disk (ignoring any new updates)
    b. bump the list of pending updates to remove the now persisted entries

This is a building block I need for #7443.

As a bonus this removes all Arc<RwLock<_>> wrappers around Gridstore, making the world a nicer place 🍀. It now also respects &self/&mut self.

Crasher run: https://github.com/qdrant/crasher/actions/runs/18755857443 🟢

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@timvisee timvisee force-pushed the gridstore-flush-closure branch 2 times, most recently from c953108 to e98073c Compare October 23, 2025 16:47
@timvisee timvisee marked this pull request as ready for review October 23, 2025 16:47
@timvisee timvisee force-pushed the gridstore-flush-closure branch from e98073c to 8fb001f Compare October 23, 2025 16:50

/// Create flusher that durably persists all pending changes when invoked
pub fn flusher(&self) -> Flusher {
let pending_updates = self.tracker.read().pending_updates.clone();
Copy link
Member Author

@timvisee timvisee Oct 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Key change: here we now take a snapshot of all pending changes. Only these are flushed. This is consistent with our other flushers.

Comment on lines +253 to +260
old_pointers.extend(updates.to_outdated_pointers());

// Bump the pending updates for this point offset, drop entry if fully persisted
if let Some(pending_updates) = self.pending_updates.get_mut(&point_offset)
&& pending_updates.drain_persisted_and_drop(&updates)
{
self.pending_updates.remove(&point_offset);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function now takes the snapshot of pending changes.

Here we bump the latest pending changes, so that we drain all changes that we now persist.

Once this function is complete, the list of pending changes will only contain changes that still need to be persisted.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 23, 2025
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 23, 2025
coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee force-pushed the gridstore-flush-closure branch from bda5571 to 2181b95 Compare October 24, 2025 10:43
@timvisee
Copy link
Member Author

This now has a bunch of tests added to ensure correct behavior. The tests are actually the majority of this PR.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 24, 2025
@timvisee timvisee force-pushed the gridstore-flush-closure branch from 2181b95 to 4738476 Compare October 24, 2025 10:49
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

let instant = Instant::now();

storage.flush().unwrap();
storage.flusher()().unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please run the flush_bench for a quick sanity check that the performance is similar to dev 🤞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

It is worse than I expected. It may just be because now there's extra work. Not sure yet if we have to be very concerned about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done some basic profiling on the benchmark and I cannot find a clear root cause for the degradation.

The previous code was sub-millisecond fast already, maybe the additional locking makes things slower 🤷

I think this is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 μs even for 100 updates. Likely measuring locks.

Agreed 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bustle_bench does not show any noticeable degradation 👍

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 27, 2025
@timvisee timvisee added the do not merge Pull requests blocked on an external event label Oct 31, 2025
@timvisee
Copy link
Member Author

Let's postpone merging this till after the 1.16 release. Then we can test it on chaos testing for some time without the risk of affecting the release.

@timvisee timvisee force-pushed the gridstore-flush-closure branch from 575ec80 to d482fa7 Compare November 18, 2025 12:30
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Nov 18, 2025
@timvisee timvisee removed the do not merge Pull requests blocked on an external event label Nov 18, 2025
@timvisee timvisee requested a review from agourlay November 18, 2025 13:11
@timvisee timvisee merged commit 8bbeda5 into dev Nov 18, 2025
21 of 22 checks passed
@timvisee timvisee deleted the gridstore-flush-closure branch November 18, 2025 13:21
timvisee added a commit that referenced this pull request Nov 25, 2025
* Wrap the tracker and bitmask in Arc

* Implement deferred flusher for gridstore, defer tracker writes

* Remove clone

* Dynamically adjust list of pending updates to drain what is flushed

* Return proper flusher, defer premature Gridstore flushing errors

* Use deferred Gridstore flusher across storages

* Remove all Arc<RwLock<_>> wrappers around Gridstore in storages

* Flush pages inside closure, also wrap them in a lock

* Remove direct flush function from Gridstore

* Add test to assert behavior of deferred flushing in Gridstore

* Add much more extensive test, including concurrent flushes and deletes

* Add test for draining value pointer, make sure to drop when flushed

* Feature gate RocksDB components
@timvisee timvisee mentioned this pull request Nov 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants