Add debug assertion for deleted version on proxy#7300
Conversation
| let point_version_in_write_segment = write_segment_guard | ||
| .point_version(point_id) | ||
| .unwrap_or_default(); | ||
| let mut was_deleted_in_writable = false; |
There was a problem hiding this comment.
a debug_assert here failed showing that we apply the same op_num twice on the write segment
Co-authored-by: Tim Visée <tim+github@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
📝 WalkthroughWalkthroughThe changes add safety checks when inserting deleted-point markers: move_if_exists (lib/shard/src/proxy_segment/mod.rs) and delete_point (lib/shard/src/proxy_segment/segment_entry.rs) now capture the previous value returned by the insert into deleted_points, compute whether the point was newly deleted, and include a debug_assert that any previously stored operation_version is strictly less than the current op_num. No public APIs were changed. Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/shard/src/proxy_segment/mod.rs(1 hunks)lib/shard/src/proxy_segment/segment_entry.rs(1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/shard/src/proxy_segment/mod.rslib/shard/src/proxy_segment/segment_entry.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/shard/src/proxy_segment/mod.rslib/shard/src/proxy_segment/segment_entry.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/shard/src/proxy_segment/mod.rs
📚 Learning: 2025-04-20T19:08:45.034Z
Learnt from: generall
PR: qdrant/qdrant#6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.
Applied to files:
lib/shard/src/proxy_segment/mod.rslib/shard/src/proxy_segment/segment_entry.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: integration-tests
- GitHub Check: lint
🔇 Additional comments (1)
lib/shard/src/proxy_segment/segment_entry.rs (1)
220-233: Same overwrite risk in the Proxy branch; apply the same guardMirror the fix from the Original branch to avoid clobbering a newer/equal delete record.
Apply this diff:
- let prev = self.deleted_points.write().insert( - point_id, - ProxyDeletedPoint { - local_version: op_num, - operation_version: op_num, - }, - ); - was_deleted = prev.is_none(); - if let Some(prev) = prev { - debug_assert!( - prev.operation_version < op_num, - "Overriding deleted flag {prev:?} with older op_num:{op_num}", - ) - } + let mut deleted_points = self.deleted_points.write(); + match deleted_points.get(&point_id).copied() { + Some(prev) => { + if prev.operation_version < op_num { + deleted_points.insert( + point_id, + ProxyDeletedPoint { + local_version: op_num, + operation_version: op_num, + }, + ); + } else { + debug_assert!( + prev.operation_version >= op_num, + "Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}", + ); + } + } + None => { + deleted_points.insert( + point_id, + ProxyDeletedPoint { + local_version: op_num, + operation_version: op_num, + }, + ); + was_deleted = true; + } + }
| let prev = deleted_points_write.insert( | ||
| point_id, | ||
| ProxyDeletedPoint { | ||
| local_version, | ||
| operation_version: op_num, | ||
| }, | ||
| ); | ||
| if let Some(prev) = prev { | ||
| debug_assert!( | ||
| prev.operation_version < op_num, | ||
| "Overriding deleted flag {prev:?} with older op_num:{op_num}", | ||
| ) | ||
| } |
There was a problem hiding this comment.
Avoid TOCTOU overwrite: re-check under write lock and only insert when strictly newer
Between the earlier read and this insert, another thread/proxy can add a newer delete for the same point. In release builds you’ll overwrite it, despite the debug_assert (debug-only). Re-check the existing entry under the write lock and skip insert unless the incoming op is strictly newer, and also consider local_version.
Apply this diff:
- let prev = deleted_points_write.insert(
- point_id,
- ProxyDeletedPoint {
- local_version,
- operation_version: op_num,
- },
- );
- if let Some(prev) = prev {
- debug_assert!(
- prev.operation_version < op_num,
- "Overriding deleted flag {prev:?} with older op_num:{op_num}",
- )
- }
+ match deleted_points_write.get(&point_id).copied() {
+ Some(prev) if prev.local_version >= local_version || prev.operation_version >= op_num => {
+ // Keep newer/equal record; do not override
+ }
+ _ => {
+ let prev = deleted_points_write.insert(
+ point_id,
+ ProxyDeletedPoint {
+ local_version,
+ operation_version: op_num,
+ },
+ );
+ if let Some(prev) = prev {
+ debug_assert!(
+ prev.operation_version < op_num,
+ "Overriding deleted flag {prev:?} with older op_num:{op_num}",
+ )
+ }
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let prev = deleted_points_write.insert( | |
| point_id, | |
| ProxyDeletedPoint { | |
| local_version, | |
| operation_version: op_num, | |
| }, | |
| ); | |
| if let Some(prev) = prev { | |
| debug_assert!( | |
| prev.operation_version < op_num, | |
| "Overriding deleted flag {prev:?} with older op_num:{op_num}", | |
| ) | |
| } | |
| match deleted_points_write.get(&point_id).copied() { | |
| Some(prev) if prev.local_version >= local_version || prev.operation_version >= op_num => { | |
| // Keep newer/equal record; do not override | |
| } | |
| _ => { | |
| let prev = deleted_points_write.insert( | |
| point_id, | |
| ProxyDeletedPoint { | |
| local_version, | |
| operation_version: op_num, | |
| }, | |
| ); | |
| if let Some(prev) = prev { | |
| debug_assert!( | |
| prev.operation_version < op_num, | |
| "Overriding deleted flag {prev:?} with older op_num:{op_num}", | |
| ) | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/shard/src/proxy_segment/mod.rs around lines 197 to 209, after acquiring
the write lock you must re-check any existing entry for point_id and only
insert/replace when the incoming ProxyDeletedPoint (op_num and local_version) is
strictly newer; fetch the current entry from deleted_points_write, compare
operation_version (and tie-break with local_version if needed), and if the
existing entry is newer or equal then skip the insert; otherwise perform the
insert/replace with the new ProxyDeletedPoint so you avoid the TOCTOU overwrite
that the debug_assert alone does not prevent in release builds.
| let prev = self.deleted_points.write().insert( | ||
| point_id, | ||
| ProxyDeletedPoint { | ||
| local_version: op_num, | ||
| operation_version: op_num, | ||
| }, | ||
| ); | ||
| was_deleted = prev.is_none(); | ||
| if let Some(prev) = prev { | ||
| debug_assert!( | ||
| prev.operation_version < op_num, | ||
| "Overriding deleted flag {prev:?} with older op_num:{op_num}", | ||
| ) | ||
| .is_none(); | ||
| } |
There was a problem hiding this comment.
Do not overwrite a newer/equal delete; debug_assert is insufficient in release builds
This unconditionally overwrites an existing entry, and the debug_assert only fires in debug. In release, an older/equal op_num can silently clobber a newer delete scheduled by another thread/proxy, breaking correctness.
Guard before insert and only replace when the incoming op_num is strictly newer.
Apply this diff:
- let prev = self.deleted_points.write().insert(
- point_id,
- ProxyDeletedPoint {
- local_version: op_num,
- operation_version: op_num,
- },
- );
- was_deleted = prev.is_none();
- if let Some(prev) = prev {
- debug_assert!(
- prev.operation_version < op_num,
- "Overriding deleted flag {prev:?} with older op_num:{op_num}",
- )
- }
+ let mut deleted_points = self.deleted_points.write();
+ match deleted_points.get(&point_id).copied() {
+ Some(prev) => {
+ if prev.operation_version < op_num {
+ // Upgrade to newer op
+ deleted_points.insert(
+ point_id,
+ ProxyDeletedPoint {
+ local_version: op_num,
+ operation_version: op_num,
+ },
+ );
+ } else {
+ // Keep newer/equal entry
+ debug_assert!(
+ prev.operation_version >= op_num,
+ "Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}",
+ );
+ }
+ }
+ None => {
+ deleted_points.insert(
+ point_id,
+ ProxyDeletedPoint {
+ local_version: op_num,
+ operation_version: op_num,
+ },
+ );
+ was_deleted = true;
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let prev = self.deleted_points.write().insert( | |
| point_id, | |
| ProxyDeletedPoint { | |
| local_version: op_num, | |
| operation_version: op_num, | |
| }, | |
| ); | |
| was_deleted = prev.is_none(); | |
| if let Some(prev) = prev { | |
| debug_assert!( | |
| prev.operation_version < op_num, | |
| "Overriding deleted flag {prev:?} with older op_num:{op_num}", | |
| ) | |
| .is_none(); | |
| } | |
| let mut deleted_points = self.deleted_points.write(); | |
| match deleted_points.get(&point_id).copied() { | |
| Some(prev) => { | |
| if prev.operation_version < op_num { | |
| // Upgrade to newer op | |
| deleted_points.insert( | |
| point_id, | |
| ProxyDeletedPoint { | |
| local_version: op_num, | |
| operation_version: op_num, | |
| }, | |
| ); | |
| } else { | |
| // Keep newer/equal entry | |
| debug_assert!( | |
| prev.operation_version >= op_num, | |
| "Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}", | |
| ); | |
| } | |
| } | |
| None => { | |
| deleted_points.insert( | |
| point_id, | |
| ProxyDeletedPoint { | |
| local_version: op_num, | |
| operation_version: op_num, | |
| }, | |
| ); | |
| was_deleted = true; | |
| } | |
| } |
🤖 Prompt for AI Agents
In lib/shard/src/proxy_segment/segment_entry.rs around lines 201 to 214, the
code unconditionally inserts a ProxyDeletedPoint and relies on debug_assert to
catch stale overwrites; in release builds this can silently overwrite a newer or
equal delete. Change the logic to first check existing entry from
self.deleted_points.write() and only insert/replace if there is no existing
entry or the existing.entry.operation_version is strictly less than op_num; if
existing.operation_version >= op_num, do nothing (no overwrite). Keep the
debug_assert for the unexpected case where existing.operation_version >= op_num
if you want, but ensure the actual write happens only when op_num is strictly
newer.
* Fix applying stale delete operation on proxy * correct init of flag * Update lib/shard/src/proxy_segment/mod.rs Co-authored-by: Tim Visée <tim+github@visee.me> * Update lib/shard/src/proxy_segment/segment_entry.rs Co-authored-by: Tim Visée <tim+github@visee.me> * Update lib/shard/src/proxy_segment/segment_entry.rs Co-authored-by: Tim Visée <tim+github@visee.me> * rollback unecessary check --------- Co-authored-by: Tim Visée <tim+github@visee.me>
The continuous snapshot test failing in #7295 has shown that we are applying stale delete operation on write segments.
I discovered this while sprinkling the handling of delete version with debug assertions.
The fix proposed here sadly does not fix #7295 but improve correctness.