Bump segment version on {delete,update,upsert}-by-filter requests #7157
Thanks! Happy to see a test for this as well. Before I approve, I tagged @ffuugoo for a review too. We must validate that this doesn't have a negative effect on partial snapshots for R/W segregation.
It likely will.
@JojiiOfficial, could you run some tests on this one to check its behavior alongside partial snapshots? The concern is that every segment whose version gets bumped is transferred (almost) in full, even though its data didn't change. That may be expensive on (large) R/W deployments. An alternative might be to keep a separate version number in the segment holder, which these empty updates could bump instead. Though I'd rather keep it as is and not add an extra version number unless it's required.
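To make the partial-snapshot concern concrete, here is a minimal sketch. It assumes, hypothetically, that a partial snapshot ships every segment whose local version is newer than the version the receiving replica reports for it (or that the receiver has never seen). `segments_to_transfer` and the id-to-version maps are illustrative names, not Qdrant's API. Bumping a version for an operation that matched zero points is enough to get an unchanged segment selected.

```rust
use std::collections::HashMap;

/// Hypothetical selection rule: a segment (id -> version) is transferred if the
/// receiver doesn't have it, or holds it at an older version. Whether its data
/// actually changed is not considered.
fn segments_to_transfer(local: &HashMap<u32, u64>, remote: &HashMap<u32, u64>) -> Vec<u32> {
    let mut ids: Vec<u32> = local
        .iter()
        .filter(|&(id, version)| match remote.get(id) {
            Some(remote_version) => version > remote_version,
            None => true,
        })
        .map(|(&id, _)| id)
        .collect();
    ids.sort_unstable(); // HashMap iteration order is unspecified
    ids
}

fn main() {
    // Versions the receiving replica already holds, per segment id.
    let remote: HashMap<u32, u64> = HashMap::from([(1, 10), (2, 12)]);
    let mut local = remote.clone();
    // Nothing changed locally, so nothing needs to be sent.
    assert!(segments_to_transfer(&local, &remote).is_empty());

    // Op 13 is a delete-by-filter that matched zero points, but segment 1's
    // version is bumped anyway so the WAL entry can be acknowledged.
    local.insert(1, 13);

    // Segment 1 is now selected for transfer even though its data is unchanged.
    assert_eq!(segments_to_transfer(&local, &remote), vec![1]);
    println!("{:?}", segments_to_transfer(&local, &remote));
}
```

On a large segment this turns a no-op into an (almost) full transfer, which is why a counter kept outside the per-segment versions is attractive.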
|
Why doesn't WAL flush/truncate solve this? Do we only truncate up to a common (min? max?) version across all segments? UPD: I think I remember now: we truncate to either the min or the max flushed segment version, so if you do 100 ops that don't hit any segment, you get 100 untruncated ops in the WAL?
Correct. So here we (currently) artificially bump a segment version if a *-by-filter operation matched zero points, and then acknowledge it in the WAL, to prevent this problem. Or rather, to prevent WAL replay from re-running very expensive operations, which can take a very long time even if they were already properly flushed.
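The truncation problem can be modelled in a few lines. This is a sketch under an assumed, simplified rule (not Qdrant's actual code): if any segment has unflushed changes, the WAL is acknowledged up to the lowest persisted version among those segments; if everything is flushed, up to the highest persisted version. `Segment`, `flush` and `wal_ack_version` are hypothetical names. An operation that touches no segment never raises any version, so it can't be acknowledged until a segment is bumped to its operation number.

```rust
/// Hypothetical segment model: `version` is the last operation applied to it,
/// `persisted` the last operation flushed to disk.
struct Segment {
    version: u64,
    persisted: u64,
}

impl Segment {
    fn flush(&mut self) {
        self.persisted = self.version;
    }
}

/// Assumed rule: keep the WAL from the oldest unflushed change onward;
/// if nothing is unflushed, everything up to the max persisted version is durable.
fn wal_ack_version(segments: &[Segment]) -> u64 {
    let min_unsaved = segments
        .iter()
        .filter(|s| s.version > s.persisted)
        .map(|s| s.persisted)
        .min();
    match min_unsaved {
        Some(v) => v,
        None => segments.iter().map(|s| s.persisted).max().unwrap_or(0),
    }
}

fn main() {
    let mut segments = vec![
        Segment { version: 4, persisted: 4 },
        Segment { version: 3, persisted: 3 },
    ];

    // Op 5: a delete-by-filter that matches zero points. No segment changes,
    // so even after a flush the WAL can only be acknowledged up to op 4, and
    // op 5 (with its filter) would be replayed on restart.
    let no_op: u64 = 5;
    segments.iter_mut().for_each(Segment::flush);
    assert_eq!(wal_ack_version(&segments), 4);

    // Artificially bump one segment's version to the op number, then flush.
    segments[0].version = no_op;
    segments.iter_mut().for_each(Segment::flush);
    assert_eq!(wal_ack_version(&segments), 5);
}
```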
|
Hm. Maybe we could add an in-memory counter specifically for filtered ops that did not match anything then... 😞 |
Force-pushed from 956de24 to 1f773bf
* Bump segment versions on *-by-filter operations to acknowledge WAL
* Assert exactly one segment with the new operation version
* Review remarks
* Separate segment version number to not affect partial snapshots
* Adjust unit test
* Remove old bump_segment_version implementations
* Clarify max_persisted_segment_version_overwrite in flush_all
* Move trait implementations - Clippy
* Apply segment bumping to all *-by-filter functions
* Remove AtomicOptionU64
* Update function name and comments
* Remove max condition, start with overwrite value
* Add assertion, can never have zero segments
* Make max_persisted_segment_version_overwrite monotonic with fetch_max

---------

Co-authored-by: timvisee <tim@visee.me>
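The commit list mentions moving to a separate version number and making `max_persisted_segment_version_overwrite` monotonic with `fetch_max`. Below is a minimal sketch of that idea, not the PR's code: `SegmentHolderSketch`, `bump_max_segment_version_overwrite` and `acknowledged_version` are hypothetical, and it assumes the overwrite is only consulted once all segments are fully flushed. Because per-segment versions are untouched, partial snapshots would not see a change. `AtomicU64::fetch_max` keeps the counter from moving backwards when concurrent updates finish out of order.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

/// Hypothetical holder-level counter for operations that matched no points.
struct SegmentHolderSketch {
    max_persisted_segment_version_overwrite: AtomicU64,
}

impl SegmentHolderSketch {
    fn new() -> Self {
        Self { max_persisted_segment_version_overwrite: AtomicU64::new(0) }
    }

    /// Record that `op_num` matched nothing. `fetch_max` only ever raises the
    /// value, so a late call with an older op number is ignored.
    fn bump_max_segment_version_overwrite(&self, op_num: u64) {
        self.max_persisted_segment_version_overwrite
            .fetch_max(op_num, Ordering::SeqCst);
    }

    /// Version to acknowledge in the WAL, assuming all segments are flushed:
    /// the overwrite covers no-op operations that no segment has recorded.
    fn acknowledged_version(&self, max_persisted_segment_version: u64) -> u64 {
        let overwrite = self
            .max_persisted_segment_version_overwrite
            .load(Ordering::SeqCst);
        max_persisted_segment_version.max(overwrite)
    }
}

fn main() {
    let holder = SegmentHolderSketch::new();

    // Concurrent no-op filter operations finishing in arbitrary order.
    thread::scope(|s| {
        for op_num in [3, 11, 8] {
            let holder = &holder;
            s.spawn(move || holder.bump_max_segment_version_overwrite(op_num));
        }
    });

    assert_eq!(holder.acknowledged_version(4), 11);
    assert_eq!(holder.acknowledged_version(15), 15);
}
```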
Currently, update requests by filter that don't match any points are not acknowledged in the write-ahead log (WAL).
This triggers a WAL replay when Qdrant restarts. If there were many such no-op operations, or operations with large filters,
this sometimes slowed down startup significantly and needlessly.
This PR fixes the issue by manually bumping segment versions if no point was matched.