Fix Gridstore flushing, correctly implement drain persist#7741
Conversation
lib/gridstore/src/tracker.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_value_pointer_drain_set_unset_set() { |
There was a problem hiding this comment.
New test is here. This failed before fixing drain_persisted_and_drop.
| // In current history, range of unset entries | ||
| let unset_range = if self.latest_is_set { | ||
| 0..self.history.len().saturating_sub(1) | ||
| } else { | ||
| 0..self.history.len() | ||
| }; |
There was a problem hiding this comment.
I'll significantly improve this in a future PR where I'll use a separate set/unset list.
| debug_assert_eq!( | ||
| self.history.iter().copied().collect::<AHashSet<_>>().len(), | ||
| self.history.len(), | ||
| "self must not have duplicate pointers in history", | ||
| ); |
There was a problem hiding this comment.
We also do the same debug assertion in the caller of this function. But I vote to keep it for now, until I more properly refactor it later. It does not affect release builds.
| (Some(last), Some(set)) if last == set => { | ||
| self.history.pop(); | ||
| self.latest_is_set = false; | ||
| } |
There was a problem hiding this comment.
Isn't this the same as checking if the entire PointerUpdates is the same? In such case we can return early here, since the rest of the code will remove the remaining history.
There was a problem hiding this comment.
I also thought so.
But in this function I implemented the complete thing on purpose, because I don't want to make any assumptions anymore. Don't want to spend another week on it in the future. 🙃
* Fix Gridstore tracker drain and persist not working properly * Patch existing test * Add new test to assert buggy scenario we found * Mention PR in test
Fix a critical Gridstore bug that breaks flushing with alternating puts and deletes.
Gridstore uses deferred flushing. It means that we copy the list of pending updates when the flusher is created. Once flushing is invoked and updates are persisted, we remove these from the current list of updates (which might already have new entries). This 'removing' we call 'drain persisted'. It is critical to persist every change only once.
This drain function was broken and failed on some edge cases. For the problematic edge case we found in testing, I've added a new test.
The problem only appears when deferring a flush (for some time). If flushing immediately, the pending and persisted set of updates are equal and everything works as expected.
Bear in mind that I use the concept of a 'set' and 'unset' list in this PR. I plan to open a separate PR to apply this concept everywhere as it's much more easy to understand. Since it's only partially applied in this PR it might look like boilerplate.
All Submissions:
devbranch. Did you create your branch fromdev?Changes to Core Features: