Skip to content

Fix Gridstore flushing, correctly implement drain persist#7741

Merged
timvisee merged 4 commits intodevfrom
fix-gridstore-tracker-drain-persist
Dec 11, 2025
Merged

Fix Gridstore flushing, correctly implement drain persist#7741
timvisee merged 4 commits intodevfrom
fix-gridstore-tracker-drain-persist

Conversation

@timvisee
Copy link
Member

@timvisee timvisee commented Dec 10, 2025

Fix a critical Gridstore bug that breaks flushing with alternating puts and deletes.

Gridstore uses deferred flushing. It means that we copy the list of pending updates when the flusher is created. Once flushing is invoked and updates are persisted, we remove these from the current list of updates (which might already have new entries). This 'removing' we call 'drain persisted'. It is critical to persist every change only once.

This drain function was broken and failed on some edge cases. For the problematic edge case we found in testing, I've added a new test.

The problem only appears when deferring a flush (for some time). If flushing immediately, the pending and persisted set of updates are equal and everything works as expected.

Bear in mind that I use the concept of a 'set' and 'unset' list in this PR. I plan to open a separate PR to apply this concept everywhere as it's much more easy to understand. Since it's only partially applied in this PR it might look like boilerplate.

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

@timvisee timvisee marked this pull request as ready for review December 10, 2025 16:45
}

#[test]
fn test_value_pointer_drain_set_unset_set() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New test is here. This failed before fixing drain_persisted_and_drop.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Dec 10, 2025
Comment on lines +159 to +164
// In current history, range of unset entries
let unset_range = if self.latest_is_set {
0..self.history.len().saturating_sub(1)
} else {
0..self.history.len()
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll significantly improve this in a future PR where I'll use a separate set/unset list.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Dec 10, 2025
Comment on lines +134 to +138
debug_assert_eq!(
self.history.iter().copied().collect::<AHashSet<_>>().len(),
self.history.len(),
"self must not have duplicate pointers in history",
);
Copy link
Member Author

@timvisee timvisee Dec 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also do the same debug assertion in the caller of this function. But I vote to keep it for now, until I more properly refactor it later. It does not affect release builds.

Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes make sense. Though I made an alternative solution to (IMO) simplify draining logic in #7745. Let's discuss tomorrow 😄

Comment on lines +145 to +148
(Some(last), Some(set)) if last == set => {
self.history.pop();
self.latest_is_set = false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as checking if the entire PointerUpdates is the same? In such case we can return early here, since the rest of the code will remove the remaining history.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also thought so.

But in this function I implemented the complete thing on purpose, because I don't want to make any assumptions anymore. Don't want to spend another week on it in the future. 🙃

@timvisee timvisee merged commit 14ebe94 into dev Dec 11, 2025
15 checks passed
@timvisee timvisee deleted the fix-gridstore-tracker-drain-persist branch December 11, 2025 09:27
timvisee added a commit that referenced this pull request Dec 18, 2025
* Fix Gridstore tracker drain and persist not working properly

* Patch existing test

* Add new test to assert buggy scenario we found

* Mention PR in test
@timvisee timvisee mentioned this pull request Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants