Skip to content

Fix non visible points during snapshots#7248

Merged
agourlay merged 3 commits intodevfrom
fix-non-visible-point-during-snapshot
Sep 12, 2025
Merged

Fix non visible points during snapshots#7248
agourlay merged 3 commits intodevfrom
fix-non-visible-point-during-snapshot

Conversation

@agourlay
Copy link
Member

@agourlay agourlay commented Sep 12, 2025

This PR fixes an issue where points previously inserted/updated are not visible during a snapshot.

During a snapshot, all segments are proxified and share the same write segment for the sake of efficiency.

Each segment is snapshotted and unproxified immediately if possible, meaning the inner wrapped segment is promoted to the segment holder.

As a follow up, after all segments have been unproxified, the shared write segment is promoted to the segment holder if it contains points accumulated during the snapshot process.

An issue appears between the unproxifying of the last segment and the promotion of the shared write segment.

During a short period of time, the content of the write segment is not visible to readers because:

  • it is not accessible via any proxy segments
  • the write segment itself has not been promoted yet

This problem is visible in the provided test which displays a breakage of the read-my-write consistency during snapshot.

This PR proposes to fix the issue by making sure we keep at least one proxy segment alive to be promoted atomically with the shared write segment.

@agourlay agourlay marked this pull request as ready for review September 12, 2025 14:02
let mut result = Ok(());
let mut unproxied_segment_ids = Vec::with_capacity(proxies.len());
// Reverse to unproxify first non-appenable segments
proxies.reverse();
Copy link
Member Author

@agourlay agourlay Sep 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have a rock solid rational for this change, but it is necessary for the fix below to work properly

}
LockedSegment::Proxy(_) => None,
LockedSegment::Proxy(_) => {
log::debug!("Double proxy segment creation");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we demote this to trace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to keep this way because double proxies are mysterious beasts.

Seeing it in the log is a clear sign that we are dealing with a complex situation like snapshot + optimizer.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 12, 2025
Co-authored-by: Tim Visée <tim+github@visee.me>
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Sep 12, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 12, 2025

📝 Walkthrough

Walkthrough

  • Adds per-point retrieval and per-point payload updates in continuous_snapshot_test.rs, including new imports for per-point operations.
  • Updates ProxySegment::new to log a debug message when encountering a double proxy creation in lib/shard/src/proxy_segment/mod.rs.
  • Adjusts unproxy processing in lib/shard/src/segment_holder/mod.rs: reverse unproxy order to handle non-appendable segments first, ensure at least one proxy remains before the final unproxy-all pass, and filter out successfully unproxied segments.
  • Adds a debug-only assertion in upsert_with_payload in lib/shard/src/update.rs to verify point existence post-upsert without payload.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • timvisee
  • generall

Pre-merge checks (3 passed)

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly and accurately summarizes the primary fix: preventing points from becoming not visible during snapshots by adjusting proxy/unproxy behavior. It is concise, on-topic, and directly related to the changes described in the PR and test updates. A minor wording improvement (e.g., "Fix invisible points during snapshots") could improve grammar but is optional.
Description Check ✅ Passed The description clearly explains the bug, its root cause, and the proposed fix (keeping at least one proxy alive so the shared write segment can be promoted atomically), and it references the included test that reproduces the issue. It is directly related to the changes in the raw summary and PR objectives and provides adequate context for reviewers.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-non-visible-point-during-snapshot

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
lib/shard/src/update.rs (1)

294-298: Include op_num in debug assertion for faster forensics

Adding the operation number helps correlate with WAL/logs when this ever fires.

-    debug_assert!(
-        segment.has_point(point_id),
-        "the point {point_id} should be present immediately after the upsert"
-    );
+    debug_assert!(
+        segment.has_point(point_id),
+        "point {point_id} should exist right after upsert, op={op_num}"
+    );
lib/collection/tests/integration/continuous_snapshot_test.rs (3)

151-169: Strengthen retrieval check by asserting the returned ID

This guards against accidental duplicates/mismatches during the snapshot window.

-                for i in 0..points_count {
-                    let retrieve_point = PointRequestInternal {
-                        ids: vec![i.into()],
+                for i in 0..points_count {
+                    let expected_id = i.into();
+                    let retrieve_point = PointRequestInternal {
+                        ids: vec![expected_id],
                         with_payload: None,
                         with_vector: WithVector::Bool(false),
                     };
                     let hw_counter = HwMeasurementAcc::disposable();
                     let retrieve_result = collection
                         .retrieve(
                             retrieve_point,
                             None,
                             &ShardSelectorInternal::All,
                             None,
                             hw_counter,
                         )
                         .await?;
                     assert_eq!(retrieve_result.len(), 1);
+                    assert_eq!(retrieve_result[0].id, expected_id);
                 }

171-194: Optional: Validate payload persistence

After SetPayload, consider retrieving the same point once (with payload requested) and asserting the fields to catch regressions end-to-end.


12-16: Imports look good; minor style note

If you prefer stricter explicitness, you can replace i.into() in this test with PointIdType::from(i) and extend the import to include PointIdType. Pure style; no functional change.

Also applies to: 24-27

lib/shard/src/segment_holder/mod.rs (2)

1002-1015: Make the “keep one proxy” guard more self-explanatory

A tiny readability tweak clarifies intent and avoids mental math during reviews.

-            let remaining = proxies.len() - unproxied_segment_ids.len();
-            if remaining > 1 {
+            let is_last_proxy = unproxied_segment_ids.len() + 1 == proxies.len();
+            // Keep at least one proxy alive until the final unproxy pass.
+            if !is_last_proxy {
                 match Self::try_unproxy_segment(segments_lock, *segment_id, proxy_segment.clone()) {
                     Ok(lock) => {
                         segments_lock = lock;
                         unproxied_segment_ids.push(*segment_id);
                     }
                     Err(lock) => segments_lock = lock,
                 }
             }

1017-1017: Avoid O(n²) retain by using a set

For many segments, this reduces retain’s per-check cost.

-proxies.retain(|(id, _)| !unproxied_segment_ids.contains(id));
+let removed: HashSet<_> = unproxied_segment_ids.iter().copied().collect();
+proxies.retain(|(id, _)| !removed.contains(id));
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 160becb and 58f2f68.

📒 Files selected for processing (4)
  • lib/collection/tests/integration/continuous_snapshot_test.rs (2 hunks)
  • lib/shard/src/proxy_segment/mod.rs (1 hunks)
  • lib/shard/src/segment_holder/mod.rs (2 hunks)
  • lib/shard/src/update.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/shard/src/proxy_segment/mod.rs
  • lib/shard/src/update.rs
  • lib/collection/tests/integration/continuous_snapshot_test.rs
  • lib/shard/src/segment_holder/mod.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/shard/src/proxy_segment/mod.rs
  • lib/shard/src/update.rs
  • lib/shard/src/segment_holder/mod.rs
**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

  • lib/collection/tests/integration/continuous_snapshot_test.rs
🧠 Learnings (1)
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/collection/tests/integration/continuous_snapshot_test.rs
🧬 Code graph analysis (1)
lib/shard/src/update.rs (3)
lib/segment/src/entry/entry_point.rs (1)
  • has_point (207-207)
lib/segment/src/segment/entry.rs (1)
  • has_point (446-448)
lib/shard/src/proxy_segment/segment_entry.rs (1)
  • has_point (605-612)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: integration-tests
🔇 Additional comments (2)
lib/shard/src/proxy_segment/mod.rs (1)

58-61: Double-proxy debug log is fine

This only triggers in rare/misconfigured paths; keeping it at debug makes sense for diagnostics. LGTM.

lib/shard/src/segment_holder/mod.rs (1)

972-974: Unproxy non-appendable first — good call

Reversing here ensures we drop non-appendables earlier, reducing the proxy window on immutable segments. LGTM.

@agourlay agourlay merged commit de19cb4 into dev Sep 12, 2025
16 checks passed
@agourlay agourlay deleted the fix-non-visible-point-during-snapshot branch September 12, 2025 16:40
timvisee added a commit that referenced this pull request Sep 29, 2025
* Fix non visible points during snapshots

* Update lib/shard/src/segment_holder/mod.rs

Co-authored-by: Tim Visée <tim+github@visee.me>

* code review from mr.rabbit

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
@timvisee timvisee mentioned this pull request Sep 29, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 14, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants