Fix non-visible points during snapshots #7248
Conversation
```rust
let mut result = Ok(());
let mut unproxied_segment_ids = Vec::with_capacity(proxies.len());
// Reverse to unproxify first non-appenable segments
proxies.reverse();
```
I do not have a rock-solid rationale for this change, but it is necessary for the fix below to work properly.
```diff
 }
-LockedSegment::Proxy(_) => None,
+LockedSegment::Proxy(_) => {
+    log::debug!("Double proxy segment creation");
```
Shall we demote this to trace?
I'd like to keep it this way because double proxies are mysterious beasts.
Seeing it in the log is a clear sign that we are dealing with a complex situation like snapshot + optimizer.
Co-authored-by: Tim Visée <tim+github@visee.me>
Actionable comments posted: 0
🧹 Nitpick comments (6)
lib/shard/src/update.rs (1)
294-298: Include op_num in debug assertion for faster forensics

Adding the operation number helps correlate with WAL/logs when this ever fires.

```diff
-debug_assert!(
-    segment.has_point(point_id),
-    "the point {point_id} should be present immediately after the upsert"
-);
+debug_assert!(
+    segment.has_point(point_id),
+    "point {point_id} should exist right after upsert, op={op_num}"
+);
```

lib/collection/tests/integration/continuous_snapshot_test.rs (3)
151-169: Strengthen retrieval check by asserting the returned ID

This guards against accidental duplicates/mismatches during the snapshot window.

```diff
 for i in 0..points_count {
+    let expected_id = i.into();
     let retrieve_point = PointRequestInternal {
-        ids: vec![i.into()],
+        ids: vec![expected_id],
         with_payload: None,
         with_vector: WithVector::Bool(false),
     };
     let hw_counter = HwMeasurementAcc::disposable();
     let retrieve_result = collection
         .retrieve(
             retrieve_point,
             None,
             &ShardSelectorInternal::All,
             None,
             hw_counter,
         )
         .await?;
     assert_eq!(retrieve_result.len(), 1);
+    assert_eq!(retrieve_result[0].id, expected_id);
 }
```
171-194: Optional: Validate payload persistence

After SetPayload, consider retrieving the same point once (with payload requested) and asserting the fields to catch regressions end-to-end.
12-16: Imports look good; minor style note

If you prefer stricter explicitness, you can replace i.into() in this test with PointIdType::from(i) and extend the import to include PointIdType. Pure style; no functional change.
Also applies to: 24-27
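The explicit-From style the rule above asks for can be illustrated with a stand-in type. PointIdType itself lives in the qdrant codebase, so the `Id` type below is a hypothetical substitute showing only the call-site difference:

```rust
// Hypothetical Id type standing in for PointIdType.
#[derive(Debug, PartialEq)]
struct Id(u64);

impl From<u64> for Id {
    fn from(v: u64) -> Self {
        Id(v)
    }
}

fn main() {
    // `.into()` works, but the target type is only visible in the annotation.
    let implicit: Id = 7u64.into();
    // `Id::from(..)` names the target type at the call site.
    let explicit = Id::from(7u64);
    assert_eq!(implicit, explicit);
}
```

Both compile to the same code; the explicit form simply keeps the conversion target readable during review.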
lib/shard/src/segment_holder/mod.rs (2)
1002-1015: Make the "keep one proxy" guard more self-explanatory

A tiny readability tweak clarifies intent and avoids mental math during reviews.

```diff
-let remaining = proxies.len() - unproxied_segment_ids.len();
-if remaining > 1 {
+let is_last_proxy = unproxied_segment_ids.len() + 1 == proxies.len();
+// Keep at least one proxy alive until the final unproxy pass.
+if !is_last_proxy {
     match Self::try_unproxy_segment(segments_lock, *segment_id, proxy_segment.clone()) {
         Ok(lock) => {
             segments_lock = lock;
             unproxied_segment_ids.push(*segment_id);
         }
         Err(lock) => segments_lock = lock,
     }
 }
```
1017-1017: Avoid O(n²) retain by using a set

For many segments, this reduces retain's per-check cost.

```diff
-proxies.retain(|(id, _)| !unproxied_segment_ids.contains(id));
+let removed: HashSet<_> = unproxied_segment_ids.iter().copied().collect();
+proxies.retain(|(id, _)| !removed.contains(id));
```
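The set-based retain suggested above can be sketched in isolation. The `retain_unremoved` helper and the id/payload pairs below are illustrative stand-ins, not Qdrant's actual types:

```rust
use std::collections::HashSet;

// Drop every proxy whose id appears in `removed_ids`. Collecting the
// removed ids into a HashSet makes each retain check O(1) instead of a
// linear scan over the removed list (O(n²) overall for n proxies).
fn retain_unremoved(
    proxies: Vec<(u64, &'static str)>,
    removed_ids: &[u64],
) -> Vec<(u64, &'static str)> {
    let removed: HashSet<u64> = removed_ids.iter().copied().collect();
    let mut proxies = proxies;
    proxies.retain(|(id, _)| !removed.contains(id));
    proxies
}

fn main() {
    let proxies = vec![(1, "a"), (2, "b"), (3, "c")];
    let kept = retain_unremoved(proxies, &[1, 3]);
    assert_eq!(kept, vec![(2, "b")]);
}
```

For the handful of segments a shard typically holds, both variants are effectively instant; the set only pays off as counts grow.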
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- lib/collection/tests/integration/continuous_snapshot_test.rs (2 hunks)
- lib/shard/src/proxy_segment/mod.rs (1 hunks)
- lib/shard/src/segment_holder/mod.rs (2 hunks)
- lib/shard/src/update.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
- lib/shard/src/proxy_segment/mod.rs
- lib/shard/src/update.rs
- lib/collection/tests/integration/continuous_snapshot_test.rs
- lib/shard/src/segment_holder/mod.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
- lib/shard/src/proxy_segment/mod.rs
- lib/shard/src/update.rs
- lib/shard/src/segment_holder/mod.rs
**/{tests,benches}/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged
Files:
lib/collection/tests/integration/continuous_snapshot_test.rs
🧠 Learnings (1)
📚 Learning: 2025-08-10T18:30:02.986Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.986Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/collection/tests/integration/continuous_snapshot_test.rs
🧬 Code graph analysis (1)
lib/shard/src/update.rs (3)
lib/segment/src/entry/entry_point.rs (1)
- has_point (207-207)

lib/segment/src/segment/entry.rs (1)
- has_point (446-448)

lib/shard/src/proxy_segment/segment_entry.rs (1)
- has_point (605-612)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: storage-compat-test
- GitHub Check: lint
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: e2e-tests
- GitHub Check: test-consensus-compose
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: integration-tests
🔇 Additional comments (2)
lib/shard/src/proxy_segment/mod.rs (1)
58-61: Double-proxy debug log is fine

This only triggers in rare/misconfigured paths; keeping it at debug makes sense for diagnostics. LGTM.
lib/shard/src/segment_holder/mod.rs (1)
972-974: Unproxy non-appendable first — good call

Reversing here ensures we drop non-appendables earlier, reducing the proxy window on immutable segments. LGTM.
* Fix non visible points during snapshots
* Update lib/shard/src/segment_holder/mod.rs
* code review from mr.rabbit

---------

Co-authored-by: Tim Visée <tim+github@visee.me>
This PR fixes an issue where points previously inserted/updated are not visible during a snapshot.
During a snapshot, all segments are proxified and share the same write segment for the sake of efficiency.
Each segment is snapshotted and unproxified immediately if possible, meaning the inner wrapped segment is promoted to the segment holder.
As a follow-up, after all segments have been unproxified, the shared write segment is promoted to the segment holder if it contains points accumulated during the snapshot process.
An issue appears between the unproxifying of the last segment and the promotion of the shared write segment.
During a short period of time, the content of the write segment is not visible to readers: the proxies that forwarded reads to it have already been removed, while the write segment itself has not yet been added to the segment holder.
This problem is visible in the provided test, which demonstrates a breakage of read-my-write consistency during snapshots.
This PR proposes to fix the issue by making sure we keep at least one proxy segment alive to be promoted atomically with the shared write segment.
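As a rough sketch of the race and the fix, here is a single-threaded toy model of the snapshot flow described above. `Seg`, `Holder`, and the method names are invented for illustration and are not Qdrant's real types; the real code must additionally make the final swap atomic under the segment holder's lock:

```rust
use std::collections::BTreeSet;

// Toy model: a proxy wraps a read-only segment, and all proxies share one
// write segment collecting updates made while the snapshot runs.
#[derive(Clone)]
enum Seg {
    Plain(BTreeSet<u64>),
    Proxy { wrapped: BTreeSet<u64> },
}

struct Holder {
    segs: Vec<Seg>,
    write: BTreeSet<u64>, // shared write segment, not yet in `segs`
}

impl Holder {
    // A read sees every segment in the holder, plus the shared write
    // segment, but only while at least one proxy is alive to forward to it.
    fn visible(&self) -> BTreeSet<u64> {
        let mut out = BTreeSet::new();
        let mut any_proxy = false;
        for s in &self.segs {
            match s {
                Seg::Plain(p) => out.extend(p.iter().copied()),
                Seg::Proxy { wrapped } => {
                    any_proxy = true;
                    out.extend(wrapped.iter().copied());
                }
            }
        }
        if any_proxy {
            out.extend(self.write.iter().copied());
        }
        out
    }

    // Promote the wrapped segment, dropping the proxy.
    fn unproxy(&mut self, i: usize) {
        if let Seg::Proxy { wrapped } = self.segs[i].clone() {
            self.segs[i] = Seg::Plain(wrapped);
        }
    }

    // The fix: unproxy everything except the last proxy, then swap that
    // proxy and promote the write segment in a single step, so readers
    // never observe a state missing the write segment's points.
    fn finish_keeping_last_proxy(&mut self) {
        if self.segs.is_empty() {
            return;
        }
        let last = self.segs.len() - 1;
        for i in 0..last {
            self.unproxy(i);
        }
        self.unproxy(last);
        self.segs.push(Seg::Plain(self.write.clone())); // promote write segment
        self.write.clear();
    }
}

fn main() {
    let segs = vec![
        Seg::Proxy { wrapped: BTreeSet::from([1]) },
        Seg::Proxy { wrapped: BTreeSet::from([2]) },
    ];
    let write = BTreeSet::from([99]); // point upserted during the snapshot

    // Buggy ordering: unproxy every proxy before promoting the write
    // segment. In the window in between, point 99 is invisible.
    let mut buggy = Holder { segs: segs.clone(), write: write.clone() };
    buggy.unproxy(0);
    buggy.unproxy(1);
    assert!(!buggy.visible().contains(&99)); // read-my-write broken here

    // Fixed ordering: the last proxy stays alive until promotion.
    let mut fixed = Holder { segs, write };
    fixed.unproxy(0);
    assert!(fixed.visible().contains(&99)); // still visible via the proxy
    fixed.finish_keeping_last_proxy();
    assert!(fixed.visible().contains(&99)); // visible after promotion
}
```

In this single-threaded model the last swap is trivially atomic; in the real shard, concurrent readers are what make holding the final proxy until promotion necessary.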