Conversation
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
Outdated
Show resolved
Hide resolved
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
Outdated
Show resolved
Hide resolved
90e710f to
9e8323e
Compare
363a137 to
f98b8f5
Compare
* fix deadlock * move LockedSegmentHolder into a dedicated file * reorder locks for snapshot * fmt * upd comments
| // Unproxy all segments | ||
| // Always do this to prevent leaving proxy segments behind | ||
| log::trace!("Unproxying all shard segments after function is applied"); | ||
| let _update_guard = update_lock.blocking_write(); |
There was a problem hiding this comment.
I think this was incorrectly used. Logic inside try_unproxy_segment and unproxy_all_segments was usign 2-step propagation of updates, which assumes segment could change during the first stage, but this lock should have prevented it already on the first step.
New logic changes it into just a single step propagation with updates guard as an explicit parameter, so function can't be misused.
| proxy_segment: LockedSegment, | ||
| ) -> Result<RwLockUpgradableReadGuard<SegmentHolder>, RwLockUpgradableReadGuard<SegmentHolder>> | ||
| { | ||
| updates_guard: parking_lot::MutexGuard<'a, ()>, // Guarantee no updates are happening |
There was a problem hiding this comment.
this lock should guarantee there are no parallel updates happening
There was a problem hiding this comment.
Making it a newtype could make it clearer.
There was a problem hiding this comment.
new type for guard? you mean like this pub type = ... ?
There was a problem hiding this comment.
I mean a wrapper type: https://doc.rust-lang.org/rust-by-example/generics/new_types.html
pub type doesn't create a new type, it creates merely an alias.
There was a problem hiding this comment.
@copilot suggest a type wrapper for updates_guard
| let mut write_segments = RwLockUpgradableReadGuard::upgrade(segments_lock); | ||
|
|
||
| // Batch 2: propagate changes to wrapped segment with segment holder write lock | ||
| // Propagate proxied changes to wrapped segment, take it out and swap with proxy | ||
| // Important: put the wrapped segment back with its original segment ID | ||
| let wrapped_segment = { | ||
| let mut proxy_segment = proxy_segment.write(); | ||
| if let Err(err) = proxy_segment.propagate_to_wrapped() { | ||
| log::error!( | ||
| "Propagating proxy segment {segment_id} changes to wrapped segment failed, ignoring: {err}", | ||
| ); | ||
| } | ||
| proxy_segment.wrapped_segment.clone() | ||
| }; | ||
| let wrapped_segment = proxy_segment.read().wrapped_segment.clone(); | ||
| write_segments.replace(segment_id, wrapped_segment).unwrap(); | ||
|
|
||
| drop(updates_guard); // Release updates lock as soon as possible |
There was a problem hiding this comment.
section where reads are locked is very minimal now. Updates can hold in queue
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs
Outdated
Show resolved
Hide resolved
…izer.rs Co-authored-by: Ivan Boldyrev <ivan.boldyrev@qdrant.com>
This comment was marked as resolved.
This comment was marked as resolved.
* Initial plan * Add UpdatesGuard newtype wrapper for updates_guard Co-authored-by: generall <1935623+generall@users.noreply.github.com> * Fix linter issue - remove trailing space in doc comment Co-authored-by: generall <1935623+generall@users.noreply.github.com> * Remove unnecessary new() method, use tuple constructor directly Co-authored-by: generall <1935623+generall@users.noreply.github.com> * fix clippy --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: generall <1935623+generall@users.noreply.github.com> Co-authored-by: generall <andrey@vasnetsov.com>
Co-authored-by: Jojii <15957865+JojiiOfficial@users.noreply.github.com>
timvisee
left a comment
There was a problem hiding this comment.
I'd like to see a benchmark on this one. I'm fine with triggering one in CI after merging.
* non-locking retrieve * optimization & write exclusive lock via upgradable_read * faster segment holder locks debug (#8024) * fix deadlock * move LockedSegmentHolder into a dedicated file * reorder locks for snapshot * fmt * upd comments * Update lib/collection/src/collection_manager/optimizers/segment_optimizer.rs Co-authored-by: Ivan Boldyrev <ivan.boldyrev@qdrant.com> * Add UpdatesGuard type wrapper for updates_guard (#8031) * Initial plan * Add UpdatesGuard newtype wrapper for updates_guard Co-authored-by: generall <1935623+generall@users.noreply.github.com> * Fix linter issue - remove trailing space in doc comment Co-authored-by: generall <1935623+generall@users.noreply.github.com> * Remove unnecessary new() method, use tuple constructor directly Co-authored-by: generall <1935623+generall@users.noreply.github.com> * fix clippy --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: generall <1935623+generall@users.noreply.github.com> Co-authored-by: generall <andrey@vasnetsov.com> * Remove aliassed lifetime Co-authored-by: Jojii <15957865+JojiiOfficial@users.noreply.github.com> --------- Co-authored-by: Ivan Boldyrev <ivan.boldyrev@qdrant.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: generall <1935623+generall@users.noreply.github.com> Co-authored-by: Tim Visée <tim+github@visee.me> Co-authored-by: Jojii <15957865+JojiiOfficial@users.noreply.github.com>
This PR showcases an alternative approach for dealing with a lock of SegmentsHolder.
Previously, we used to lock SegmentsHolder for the whole duration of the read and write operations, while locking for write during swap of optimized segments.
This approach could cause the following:
This can cause tail latencies spikes and maybe even timeouts.
What new approach does:
As a result, searches should be minimally affected by optimized segment swap.
Downside: we need to do extra copy of segment arcs, but it seems to be an unmeasurable difference.
UPD:
During implementation, I faced unexpected deadlock scenario, which led to further investigation of how LockerSegmentHolder is used. I changed it from type alias to dedicated type, so code navidation now becomes a little bit easier. Additionally I had to inspect and change locking pattern in the snapshot fucntion.
Overall locking schema how looks like this:
ToDo:
this PR only introduces new mechanism to one search request. Further PRs should apply it to all suitable read methods.