Skip to content

Add debug assertion for deleted version on proxy#7300

Merged
agourlay merged 6 commits intodevfrom
fix-applying--stale-delete-op-on-proxy
Sep 24, 2025
Merged

Add debug assertion for deleted version on proxy#7300
agourlay merged 6 commits intodevfrom
fix-applying--stale-delete-op-on-proxy

Conversation

@agourlay
Copy link
Member

@agourlay agourlay commented Sep 24, 2025

The continuous snapshot test failing in #7295 has shown that we are applying stale delete operation on write segments.

I discovered this while sprinkling the handling of delete version with debug assertions.

The fix proposed here sadly does not fix #7295 but improve correctness.

let point_version_in_write_segment = write_segment_guard
.point_version(point_id)
.unwrap_or_default();
let mut was_deleted_in_writable = false;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a debug_assert here failed showing that we apply the same op_num twice on the write segment

@agourlay agourlay marked this pull request as ready for review September 24, 2025 10:28
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 24, 2025
agourlay and others added 3 commits September 24, 2025 14:24
Co-authored-by: Tim Visée <tim+github@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 24, 2025

📝 Walkthrough

Walkthrough

The changes add safety checks when inserting deleted-point markers: move_if_exists (lib/shard/src/proxy_segment/mod.rs) and delete_point (lib/shard/src/proxy_segment/segment_entry.rs) now capture the previous value returned by the insert into deleted_points, compute whether the point was newly deleted, and include a debug_assert that any previously stored operation_version is strictly less than the current op_num. No public APIs were changed.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • timvisee
  • generall

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Add debug assertion for deleted version on proxy" succinctly and accurately summarizes the main change — adding debug assertions to validate deleted-version handling on proxy segments — and is concise and specific to the diff.
Description Check ✅ Passed The PR description clearly states the motivation (continuous snapshot test failure) and explains that debug assertions were added to improve correctness of delete-version handling on proxy segments, which directly matches the code changes in the diff.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-applying--stale-delete-op-on-proxy

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c33562 and e14ba35.

📒 Files selected for processing (1)
  • lib/shard/src/proxy_segment/segment_entry.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/shard/src/proxy_segment/segment_entry.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
  • GitHub Check: e2e-tests
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@agourlay agourlay changed the title Fix applying stale delete operation on proxy Add debug assertion for deleted version on proxy Sep 24, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b91472c and 9c33562.

📒 Files selected for processing (2)
  • lib/shard/src/proxy_segment/mod.rs (1 hunks)
  • lib/shard/src/proxy_segment/segment_entry.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/shard/src/proxy_segment/mod.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/shard/src/proxy_segment/mod.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/shard/src/proxy_segment/mod.rs
📚 Learning: 2025-04-20T19:08:45.034Z
Learnt from: generall
PR: qdrant/qdrant#6403
File: lib/collection/src/collection_manager/holders/proxy_segment.rs:1178-1197
Timestamp: 2025-04-20T19:08:45.034Z
Learning: In the ProxySegment implementation, overwriting previous index changes when inserting a DeleteIfIncompatible operation is intentional. When an incompatible index is detected, the DeleteIfIncompatible operation should take precedence over any previously scheduled operations on that index.

Applied to files:

  • lib/shard/src/proxy_segment/mod.rs
  • lib/shard/src/proxy_segment/segment_entry.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: integration-tests
  • GitHub Check: lint
🔇 Additional comments (1)
lib/shard/src/proxy_segment/segment_entry.rs (1)

220-233: Same overwrite risk in the Proxy branch; apply the same guard

Mirror the fix from the Original branch to avoid clobbering a newer/equal delete record.

Apply this diff:

-                    let prev = self.deleted_points.write().insert(
-                        point_id,
-                        ProxyDeletedPoint {
-                            local_version: op_num,
-                            operation_version: op_num,
-                        },
-                    );
-                    was_deleted = prev.is_none();
-                    if let Some(prev) = prev {
-                        debug_assert!(
-                            prev.operation_version < op_num,
-                            "Overriding deleted flag {prev:?} with older op_num:{op_num}",
-                        )
-                    }
+                    let mut deleted_points = self.deleted_points.write();
+                    match deleted_points.get(&point_id).copied() {
+                        Some(prev) => {
+                            if prev.operation_version < op_num {
+                                deleted_points.insert(
+                                    point_id,
+                                    ProxyDeletedPoint {
+                                        local_version: op_num,
+                                        operation_version: op_num,
+                                    },
+                                );
+                            } else {
+                                debug_assert!(
+                                    prev.operation_version >= op_num,
+                                    "Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}",
+                                );
+                            }
+                        }
+                        None => {
+                            deleted_points.insert(
+                                point_id,
+                                ProxyDeletedPoint {
+                                    local_version: op_num,
+                                    operation_version: op_num,
+                                },
+                            );
+                            was_deleted = true;
+                        }
+                    }

Comment on lines +197 to +209
let prev = deleted_points_write.insert(
point_id,
ProxyDeletedPoint {
local_version,
operation_version: op_num,
},
);
if let Some(prev) = prev {
debug_assert!(
prev.operation_version < op_num,
"Overriding deleted flag {prev:?} with older op_num:{op_num}",
)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid TOCTOU overwrite: re-check under write lock and only insert when strictly newer

Between the earlier read and this insert, another thread/proxy can add a newer delete for the same point. In release builds you’ll overwrite it, despite the debug_assert (debug-only). Re-check the existing entry under the write lock and skip insert unless the incoming op is strictly newer, and also consider local_version.

Apply this diff:

-            let prev = deleted_points_write.insert(
-                point_id,
-                ProxyDeletedPoint {
-                    local_version,
-                    operation_version: op_num,
-                },
-            );
-            if let Some(prev) = prev {
-                debug_assert!(
-                    prev.operation_version < op_num,
-                    "Overriding deleted flag {prev:?} with older op_num:{op_num}",
-                )
-            }
+            match deleted_points_write.get(&point_id).copied() {
+                Some(prev) if prev.local_version >= local_version || prev.operation_version >= op_num => {
+                    // Keep newer/equal record; do not override
+                }
+                _ => {
+                    let prev = deleted_points_write.insert(
+                        point_id,
+                        ProxyDeletedPoint {
+                            local_version,
+                            operation_version: op_num,
+                        },
+                    );
+                    if let Some(prev) = prev {
+                        debug_assert!(
+                            prev.operation_version < op_num,
+                            "Overriding deleted flag {prev:?} with older op_num:{op_num}",
+                        )
+                    }
+                }
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let prev = deleted_points_write.insert(
point_id,
ProxyDeletedPoint {
local_version,
operation_version: op_num,
},
);
if let Some(prev) = prev {
debug_assert!(
prev.operation_version < op_num,
"Overriding deleted flag {prev:?} with older op_num:{op_num}",
)
}
match deleted_points_write.get(&point_id).copied() {
Some(prev) if prev.local_version >= local_version || prev.operation_version >= op_num => {
// Keep newer/equal record; do not override
}
_ => {
let prev = deleted_points_write.insert(
point_id,
ProxyDeletedPoint {
local_version,
operation_version: op_num,
},
);
if let Some(prev) = prev {
debug_assert!(
prev.operation_version < op_num,
"Overriding deleted flag {prev:?} with older op_num:{op_num}",
)
}
}
}
🤖 Prompt for AI Agents
In lib/shard/src/proxy_segment/mod.rs around lines 197 to 209, after acquiring
the write lock you must re-check any existing entry for point_id and only
insert/replace when the incoming ProxyDeletedPoint (op_num and local_version) is
strictly newer; fetch the current entry from deleted_points_write, compare
operation_version (and tie-break with local_version if needed), and if the
existing entry is newer or equal then skip the insert; otherwise perform the
insert/replace with the new ProxyDeletedPoint so you avoid the TOCTOU overwrite
that the debug_assert alone does not prevent in release builds.

Comment on lines +201 to +214
let prev = self.deleted_points.write().insert(
point_id,
ProxyDeletedPoint {
local_version: op_num,
operation_version: op_num,
},
);
was_deleted = prev.is_none();
if let Some(prev) = prev {
debug_assert!(
prev.operation_version < op_num,
"Overriding deleted flag {prev:?} with older op_num:{op_num}",
)
.is_none();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Do not overwrite a newer/equal delete; debug_assert is insufficient in release builds

This unconditionally overwrites an existing entry, and the debug_assert only fires in debug. In release, an older/equal op_num can silently clobber a newer delete scheduled by another thread/proxy, breaking correctness.

Guard before insert and only replace when the incoming op_num is strictly newer.

Apply this diff:

-                    let prev = self.deleted_points.write().insert(
-                        point_id,
-                        ProxyDeletedPoint {
-                            local_version: op_num,
-                            operation_version: op_num,
-                        },
-                    );
-                    was_deleted = prev.is_none();
-                    if let Some(prev) = prev {
-                        debug_assert!(
-                            prev.operation_version < op_num,
-                            "Overriding deleted flag {prev:?} with older op_num:{op_num}",
-                        )
-                    }
+                    let mut deleted_points = self.deleted_points.write();
+                    match deleted_points.get(&point_id).copied() {
+                        Some(prev) => {
+                            if prev.operation_version < op_num {
+                                // Upgrade to newer op
+                                deleted_points.insert(
+                                    point_id,
+                                    ProxyDeletedPoint {
+                                        local_version: op_num,
+                                        operation_version: op_num,
+                                    },
+                                );
+                            } else {
+                                // Keep newer/equal entry
+                                debug_assert!(
+                                    prev.operation_version >= op_num,
+                                    "Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}",
+                                );
+                            }
+                        }
+                        None => {
+                            deleted_points.insert(
+                                point_id,
+                                ProxyDeletedPoint {
+                                    local_version: op_num,
+                                    operation_version: op_num,
+                                },
+                            );
+                            was_deleted = true;
+                        }
+                    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let prev = self.deleted_points.write().insert(
point_id,
ProxyDeletedPoint {
local_version: op_num,
operation_version: op_num,
},
);
was_deleted = prev.is_none();
if let Some(prev) = prev {
debug_assert!(
prev.operation_version < op_num,
"Overriding deleted flag {prev:?} with older op_num:{op_num}",
)
.is_none();
}
let mut deleted_points = self.deleted_points.write();
match deleted_points.get(&point_id).copied() {
Some(prev) => {
if prev.operation_version < op_num {
// Upgrade to newer op
deleted_points.insert(
point_id,
ProxyDeletedPoint {
local_version: op_num,
operation_version: op_num,
},
);
} else {
// Keep newer/equal entry
debug_assert!(
prev.operation_version >= op_num,
"Refusing to override deleted flag {prev:?} with older or equal op_num:{op_num}",
);
}
}
None => {
deleted_points.insert(
point_id,
ProxyDeletedPoint {
local_version: op_num,
operation_version: op_num,
},
);
was_deleted = true;
}
}
🤖 Prompt for AI Agents
In lib/shard/src/proxy_segment/segment_entry.rs around lines 201 to 214, the
code unconditionally inserts a ProxyDeletedPoint and relies on debug_assert to
catch stale overwrites; in release builds this can silently overwrite a newer or
equal delete. Change the logic to first check existing entry from
self.deleted_points.write() and only insert/replace if there is no existing
entry or the existing.entry.operation_version is strictly less than op_num; if
existing.operation_version >= op_num, do nothing (no overwrite). Keep the
debug_assert for the unexpected case where existing.operation_version >= op_num
if you want, but ensure the actual write happens only when op_num is strictly
newer.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 24, 2025
@agourlay agourlay merged commit bf8e86f into dev Sep 24, 2025
16 checks passed
@agourlay agourlay deleted the fix-applying--stale-delete-op-on-proxy branch September 24, 2025 13:01
timvisee added a commit that referenced this pull request Sep 29, 2025
* Fix applying stale delete operation on proxy

* correct init of flag

* Update lib/shard/src/proxy_segment/mod.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

* Update lib/shard/src/proxy_segment/segment_entry.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

* Update lib/shard/src/proxy_segment/segment_entry.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

* rollback unecessary check

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
@coderabbitai coderabbitai bot mentioned this pull request Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants