Skip to content

fix segment repair on load#7400

Merged
timvisee merged 5 commits intodevfrom
fix-segment-clearing
Oct 14, 2025
Merged

fix segment repair on load#7400
timvisee merged 5 commits intodevfrom
fix-segment-clearing

Conversation

@generall
Copy link
Member

Consider following situation:

  • We remove some point from the segment
  • Flush operation begins, it flushes point ids (deletes them)
  • Process gets killed

In this situation we end up in a configuration, where WAL contains delete operation by point-id, but no such point-id is registered anywhere. But the payload and payload indexes are still there, displaying wrong statistics.

We actually had a mechanism, which was supposed to clean-up leftovers of the deleted points.

Unfortunatelly, the existing mechanism was broken.
This PR fixes the segment repair process. In order to do so effectively, it intoduces two hacks:

  • Points versions are set to 0 on deletions. If recovery process sees the non-zero version with no point-is attaches, it assumes this point needs cleaning.
  • LocalShards are initialized with empty WAL operation to bump first meaningful operration version to be >0. This is our long-living problem, which often causes troubles. In this case I didn't find a suitable workaround. (I did consider u64:MAX, but it will compromise the efficiency of compressed versions representation)

@generall generall requested review from coszio and timvisee October 13, 2025 22:27
@generall generall mentioned this pull request Oct 13, 2025
Comment on lines +613 to +624
/// This operation inserts an empty operation into WAL.
/// Operation does nothing, but takes a spot in WAL.
/// We need it mostly to force WAL to start with something besides zero as a first operation number.
pub async fn insert_fake_operation(&self) -> CollectionResult<()> {
let mut operation = OperationWithClockTag {
operation: CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(
PointInsertOperationsInternal::from(vec![]),
)),
clock_tag: None,
};
self.wal.lock_and_write(&mut operation).await?;
Ok(())
Copy link
Member Author

@generall generall Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hack I couldn't avoid.

Ok(applied)
}

pub fn delete_point_internal(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this into separate fucntion to have a single source of truth for clearing segment data on delete

// Get rid of unmapped points.
let mut internal_ids_to_delete = HashSet::new();
let id_tracker = self.id_tracker.borrow();
for internal_id in id_tracker.iter_ids() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was broken, cause iter_ids skips points which are already partially deleted.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 14, 2025
operation: CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(
PointInsertOperationsInternal::from(vec![]),
)),
clock_tag: None,
Copy link
Member

@timvisee timvisee Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's best to set this clock tag, to aid with WAL delta. Let me validate that on the side.

In WAL delta resolution, we use version 0 as a 'receive everything' trick. We rely on an operation with clock tick 0 to exist, or at least we hope it exists.

If an operation with tick 0 cannot be found, it will fall back to other transfer mechanisms such as stream records.

Having an operation with clock tick 0 is semi-critical. So I vote to set the clock tag here. Sending this operation over to other peers is not a problem.

Even when clock tick 0 is requested, the WAL delta transfer may still be much faster than stream records. This may be the case when a new peer recently joined, and we only request clock tick 0 for that peer.

Copy link
Member

@timvisee timvisee Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rectify that. After closer inspection, I believe we're good with not setting the clock.

Clocks track their own versions, and it is not related to WAL operation versions. That means that the next operation will get clock tick 0 which we desire to have.

Having this no-op operation here acts exactly the same as if the WAL was still completely empty.

Conclusion: no change is necessary! 🙌

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/segment/src/id_tracker/mutable_id_tracker.rs (1)

382-389: Avoid unwrap on BufWriter::into_inner to prevent panic in IO path

Using unwrap() can panic on a late flush error. Propagate as OperationError like in versions path.

Apply this diff:

-    writer.flush()?;
-    let file = writer.into_inner().unwrap();
+    writer.flush()?;
+    let file = writer.into_inner().map_err(|err| err.into_error())?;
lib/segment/src/id_tracker/in_memory_id_tracker.rs (1)

52-61: Unify set_internal_version semantics with other trackers

Current guard only updates when a mapping exists, violating the trait doc for drop_internal (“still removes (unsets) version”) and making cleanup brittle. Align with Mutable/Immutable: always size the vector and set.

Apply this diff:

-        if self.external_id(internal_id).is_some() {
-            if let Some(old_version) = self.internal_to_version.get_mut(internal_id as usize) {
-                *old_version = version;
-            } else {
-                self.internal_to_version.resize(internal_id as usize + 1, 0);
-                self.internal_to_version[internal_id as usize] = version;
-            }
-        }
+        if internal_id as usize >= self.internal_to_version.len() {
+            self.internal_to_version.resize(internal_id as usize + 1, 0);
+        }
+        self.internal_to_version[internal_id as usize] = version;
🧹 Nitpick comments (2)
lib/segment/src/id_tracker/id_tracker_base.rs (2)

63-66: Minor doc wording nit

“removes( unsets )” → “removes (unsets)”.

Apply this diff:

-    /// Same as `drop`, but by internal ID
-    /// If mapping doesn't exist, still removes( unsets ) version.
+    /// Same as `drop`, but by internal ID
+    /// If mapping doesn't exist, still removes (unsets) version.

176-214: Cleanup logic works; consider calling drop_internal and adjust docs

  • The implementation also handles “mapping without version,” not only “version without mapping.” Update docs for clarity.
  • Prefer drop_internal(internal_id) over drop(external_id) to make behavior uniform across implementations and avoid extra lookups.

Apply this diff:

-    ///
-    /// Returns a list of internal ids, that have non-zero versions, but are missing in the id mapping.
-    /// Those points should be removed from all other parts of the segment.
+    ///
+    /// Returns a list of internal IDs that are inconsistent:
+    /// - have non-zero versions but no mapping, or
+    /// - have a mapping but no version (insertion didn’t complete).
+    /// The caller should remove these points from vector/payload storages.
@@
-        let mut to_remove = Vec::new();
+        let mut to_remove_internal = Vec::new();
@@
-                if let Some(external_id) = self.external_id(internal_id) {
-                    to_remove.push(external_id);
-                    to_return.push(internal_id);
-                } else {
+                if self.external_id(internal_id).is_some() {
+                    to_remove_internal.push(internal_id);
+                    to_return.push(internal_id);
+                } else {
                     debug_assert!(false, "internal id {internal_id} has no external id");
                 }
             }
         }
-        for external_id in to_remove {
-            self.drop(external_id)?;
+        for internal_id in to_remove_internal {
+            self.drop_internal(internal_id)?;
             #[cfg(debug_assertions)]
             {
-                log::debug!("dropped version for point {external_id} without version");
+                log::debug!("dropped mapping for point {internal_id} without version");
             }
         }

If the rocksdb feature is enabled, SimpleIdTracker’s drop_internal currently does not zero versions. That can cause repeated cleanup for “version without mapping” cases. Confirm rocksdb is disabled in target builds, or plan a follow-up to align SimpleIdTracker semantics. Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 662bc35 and 43f5113.

📒 Files selected for processing (13)
  • lib/collection/src/shards/local_shard/mod.rs (4 hunks)
  • lib/collection/src/tests/wal_recovery_test.rs (1 hunks)
  • lib/common/common/src/counter/counter_cell.rs (1 hunks)
  • lib/segment/src/fixtures/payload_context_fixture.rs (3 hunks)
  • lib/segment/src/id_tracker/compressed/versions_store.rs (2 hunks)
  • lib/segment/src/id_tracker/id_tracker_base.rs (5 hunks)
  • lib/segment/src/id_tracker/immutable_id_tracker.rs (4 hunks)
  • lib/segment/src/id_tracker/in_memory_id_tracker.rs (3 hunks)
  • lib/segment/src/id_tracker/mutable_id_tracker.rs (4 hunks)
  • lib/segment/src/id_tracker/simple_id_tracker.rs (2 hunks)
  • lib/segment/src/segment/entry.rs (1 hunks)
  • lib/segment/src/segment/mod.rs (1 hunks)
  • lib/segment/src/segment/segment_ops.rs (4 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/collection/src/tests/wal_recovery_test.rs
  • lib/segment/src/id_tracker/in_memory_id_tracker.rs
  • lib/collection/src/shards/local_shard/mod.rs
  • lib/common/common/src/counter/counter_cell.rs
  • lib/segment/src/id_tracker/id_tracker_base.rs
  • lib/segment/src/id_tracker/simple_id_tracker.rs
  • lib/segment/src/segment/segment_ops.rs
  • lib/segment/src/id_tracker/compressed/versions_store.rs
  • lib/segment/src/id_tracker/immutable_id_tracker.rs
  • lib/segment/src/segment/entry.rs
  • lib/segment/src/fixtures/payload_context_fixture.rs
  • lib/segment/src/segment/mod.rs
  • lib/segment/src/id_tracker/mutable_id_tracker.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/collection/src/tests/wal_recovery_test.rs
  • lib/segment/src/id_tracker/in_memory_id_tracker.rs
  • lib/collection/src/shards/local_shard/mod.rs
  • lib/common/common/src/counter/counter_cell.rs
  • lib/segment/src/id_tracker/id_tracker_base.rs
  • lib/segment/src/id_tracker/simple_id_tracker.rs
  • lib/segment/src/segment/segment_ops.rs
  • lib/segment/src/id_tracker/compressed/versions_store.rs
  • lib/segment/src/id_tracker/immutable_id_tracker.rs
  • lib/segment/src/segment/entry.rs
  • lib/segment/src/fixtures/payload_context_fixture.rs
  • lib/segment/src/segment/mod.rs
  • lib/segment/src/id_tracker/mutable_id_tracker.rs
**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

  • lib/collection/src/tests/wal_recovery_test.rs
🧠 Learnings (2)
📓 Common learnings
Learnt from: timvisee
PR: qdrant/qdrant#7157
File: lib/shard/src/segment_holder/mod.rs:808-814
Timestamp: 2025-09-01T11:42:06.964Z
Learning: In Qdrant's segment holder, panicking when no segments exist during flush_all is intentional and preferred over graceful error handling, as having zero segments could permanently corrupt the WAL by acknowledging u64::MAX. The maintainers consider this condition impossible and use the panic as a fail-fast safety mechanism to prevent data corruption.
📚 Learning: 2025-10-13T22:58:03.099Z
Learnt from: generall
PR: qdrant/qdrant#7400
File: lib/segment/src/id_tracker/simple_id_tracker.rs:234-241
Timestamp: 2025-10-13T22:58:03.099Z
Learning: SimpleIdTracker in lib/segment/src/id_tracker/simple_id_tracker.rs is being deprecated and should not receive fixes related to version tracking or recovery logic, as it has a different version storage structure that is incompatible with newer trackers.

Applied to files:

  • lib/segment/src/id_tracker/in_memory_id_tracker.rs
  • lib/segment/src/id_tracker/id_tracker_base.rs
  • lib/segment/src/id_tracker/simple_id_tracker.rs
  • lib/segment/src/id_tracker/immutable_id_tracker.rs
  • lib/segment/src/segment/entry.rs
  • lib/segment/src/fixtures/payload_context_fixture.rs
  • lib/segment/src/id_tracker/mutable_id_tracker.rs
🧬 Code graph analysis (9)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (7)
lib/segment/src/fixtures/payload_context_fixture.rs (5)
  • internal_id (61-70)
  • external_id (72-76)
  • drop_internal (91-97)
  • iter_internal_versions (181-185)
  • new (39-45)
lib/segment/src/id_tracker/id_tracker_base.rs (8)
  • internal_id (46-46)
  • internal_id (274-282)
  • external_id (51-51)
  • external_id (284-292)
  • drop_internal (65-65)
  • drop_internal (326-334)
  • iter_internal_versions (169-171)
  • iter_internal_versions (463-473)
lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
  • internal_id (211-213)
  • external_id (215-217)
  • drop_internal (242-253)
  • iter_internal_versions (337-346)
lib/segment/src/id_tracker/immutable_id_tracker.rs (5)
  • internal_id (407-409)
  • external_id (411-413)
  • drop_internal (434-443)
  • iter_internal_versions (503-507)
  • new (281-357)
lib/segment/src/id_tracker/simple_id_tracker.rs (4)
  • internal_id (210-212)
  • external_id (214-216)
  • drop_internal (234-241)
  • iter_internal_versions (300-309)
lib/segment/src/id_tracker/point_mappings.rs (3)
  • internal_id (81-86)
  • external_id (88-96)
  • new (41-53)
lib/segment/src/segment_constructor/segment_builder.rs (1)
  • new (83-168)
lib/collection/src/shards/local_shard/mod.rs (1)
lib/shard/src/wal.rs (3)
  • path (187-189)
  • first_index (200-203)
  • last_index (206-208)
lib/segment/src/id_tracker/id_tracker_base.rs (8)
lib/segment/src/fixtures/payload_context_fixture.rs (6)
  • drop_internal (91-97)
  • internal_id (61-70)
  • iter_internal_versions (181-185)
  • cleanup_versions (187-190)
  • new (39-45)
  • external_id (72-76)
lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
  • drop_internal (242-253)
  • internal_id (211-213)
  • iter_internal_versions (337-346)
  • external_id (215-217)
lib/segment/src/id_tracker/immutable_id_tracker.rs (5)
  • drop_internal (434-443)
  • internal_id (407-409)
  • iter_internal_versions (503-507)
  • new (281-357)
  • external_id (411-413)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (5)
  • drop_internal (89-97)
  • internal_id (64-66)
  • iter_internal_versions (158-167)
  • new (24-26)
  • external_id (68-70)
lib/segment/src/id_tracker/simple_id_tracker.rs (4)
  • drop_internal (234-241)
  • internal_id (210-212)
  • iter_internal_versions (300-309)
  • external_id (214-216)
lib/segment/src/id_tracker/point_mappings.rs (3)
  • internal_id (81-86)
  • new (41-53)
  • external_id (88-96)
lib/segment/src/id_tracker/compressed/compressed_point_mappings.rs (2)
  • internal_id (86-88)
  • external_id (90-96)
lib/segment/src/segment/segment_ops.rs (1)
  • cleanup_versions (657-659)
lib/segment/src/id_tracker/simple_id_tracker.rs (6)
lib/segment/src/fixtures/payload_context_fixture.rs (5)
  • drop_internal (91-97)
  • internal_id (61-70)
  • external_id (72-76)
  • iter_internal_versions (181-185)
  • new (39-45)
lib/segment/src/id_tracker/id_tracker_base.rs (8)
  • drop_internal (65-65)
  • drop_internal (326-334)
  • internal_id (46-46)
  • internal_id (274-282)
  • external_id (51-51)
  • external_id (284-292)
  • iter_internal_versions (169-171)
  • iter_internal_versions (463-473)
lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
  • drop_internal (242-253)
  • internal_id (211-213)
  • external_id (215-217)
  • iter_internal_versions (337-346)
lib/segment/src/id_tracker/immutable_id_tracker.rs (5)
  • drop_internal (434-443)
  • internal_id (407-409)
  • external_id (411-413)
  • iter_internal_versions (503-507)
  • new (281-357)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (5)
  • drop_internal (89-97)
  • internal_id (64-66)
  • external_id (68-70)
  • iter_internal_versions (158-167)
  • new (24-26)
lib/segment/src/id_tracker/point_mappings.rs (3)
  • internal_id (81-86)
  • external_id (88-96)
  • new (41-53)
lib/segment/src/segment/segment_ops.rs (3)
lib/segment/src/fixtures/payload_context_fixture.rs (2)
  • internal_id (61-70)
  • cleanup_versions (187-190)
lib/segment/src/id_tracker/id_tracker_base.rs (4)
  • internal_id (46-46)
  • internal_id (274-282)
  • cleanup_versions (179-214)
  • cleanup_versions (475-483)
lib/common/common/src/counter/hardware_counter.rs (1)
  • disposable (60-73)
lib/segment/src/id_tracker/compressed/versions_store.rs (2)
lib/segment/src/id_tracker/compressed/internal_to_external.rs (1)
  • iter (94-109)
lib/segment/src/id_tracker/compressed/external_to_internal.rs (1)
  • iter (172-176)
lib/segment/src/id_tracker/immutable_id_tracker.rs (7)
lib/segment/src/fixtures/payload_context_fixture.rs (5)
  • internal_id (61-70)
  • drop_internal (91-97)
  • external_id (72-76)
  • iter_internal_versions (181-185)
  • new (39-45)
lib/segment/src/id_tracker/id_tracker_base.rs (8)
  • internal_id (46-46)
  • internal_id (274-282)
  • drop_internal (65-65)
  • drop_internal (326-334)
  • external_id (51-51)
  • external_id (284-292)
  • iter_internal_versions (169-171)
  • iter_internal_versions (463-473)
lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
  • internal_id (211-213)
  • drop_internal (242-253)
  • external_id (215-217)
  • iter_internal_versions (337-346)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (5)
  • internal_id (64-66)
  • drop_internal (89-97)
  • external_id (68-70)
  • iter_internal_versions (158-167)
  • new (24-26)
lib/segment/src/id_tracker/simple_id_tracker.rs (4)
  • internal_id (210-212)
  • drop_internal (234-241)
  • external_id (214-216)
  • iter_internal_versions (300-309)
lib/segment/src/id_tracker/point_mappings.rs (3)
  • internal_id (81-86)
  • external_id (88-96)
  • new (41-53)
lib/segment/src/id_tracker/compressed/compressed_point_mappings.rs (3)
  • internal_id (86-88)
  • external_id (90-96)
  • new (43-57)
lib/segment/src/fixtures/payload_context_fixture.rs (4)
lib/segment/src/id_tracker/id_tracker_base.rs (10)
  • internal_id (46-46)
  • internal_id (274-282)
  • external_id (51-51)
  • external_id (284-292)
  • drop_internal (65-65)
  • drop_internal (326-334)
  • iter_internal_versions (169-171)
  • iter_internal_versions (463-473)
  • cleanup_versions (179-214)
  • cleanup_versions (475-483)
lib/segment/src/id_tracker/mutable_id_tracker.rs (4)
  • internal_id (211-213)
  • external_id (215-217)
  • drop_internal (242-253)
  • iter_internal_versions (337-346)
lib/segment/src/id_tracker/immutable_id_tracker.rs (5)
  • internal_id (407-409)
  • external_id (411-413)
  • drop_internal (434-443)
  • iter_internal_versions (503-507)
  • new (281-357)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (4)
  • internal_id (64-66)
  • external_id (68-70)
  • drop_internal (89-97)
  • iter_internal_versions (158-167)
lib/segment/src/id_tracker/mutable_id_tracker.rs (6)
lib/segment/src/fixtures/payload_context_fixture.rs (5)
  • internal_id (61-70)
  • external_id (72-76)
  • drop_internal (91-97)
  • iter_internal_versions (181-185)
  • new (39-45)
lib/segment/src/id_tracker/id_tracker_base.rs (8)
  • internal_id (46-46)
  • internal_id (274-282)
  • external_id (51-51)
  • external_id (284-292)
  • drop_internal (65-65)
  • drop_internal (326-334)
  • iter_internal_versions (169-171)
  • iter_internal_versions (463-473)
lib/segment/src/id_tracker/immutable_id_tracker.rs (5)
  • internal_id (407-409)
  • external_id (411-413)
  • drop_internal (434-443)
  • iter_internal_versions (503-507)
  • new (281-357)
lib/segment/src/id_tracker/in_memory_id_tracker.rs (5)
  • internal_id (64-66)
  • external_id (68-70)
  • drop_internal (89-97)
  • iter_internal_versions (158-167)
  • new (24-26)
lib/segment/src/id_tracker/simple_id_tracker.rs (4)
  • internal_id (210-212)
  • external_id (214-216)
  • drop_internal (234-241)
  • iter_internal_versions (300-309)
lib/segment/src/id_tracker/point_mappings.rs (3)
  • internal_id (81-86)
  • external_id (88-96)
  • new (41-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: e2e-tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consistency
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: lint
🔇 Additional comments (29)
lib/common/common/src/counter/counter_cell.rs (1)

124-125: Appreciate the typo fix.

Correcting the docstring wording keeps the documentation crisp.

lib/collection/src/shards/local_shard/mod.rs (4)

44-45: LGTM! Imports support the fake operation construction.

These imports are correctly added to support the construction of the empty WAL operation in insert_fake_operation.


613-625: Verify clock tag and method visibility.

Based on past review comments, consider the following:

  1. Clock tag value: Line 621 sets clock_tag: None. However, a previous review comment from @timvisee suggested: "I think it's best to set this clock tag, to aid with WAL delta" and noted that "having an operation with clock tick 0 is semi critical" for WAL delta resolution. Verify whether clock_tag: Some(0) should be used instead of None to ensure proper WAL delta transfer behavior, especially for newly joined peers.

  2. Method visibility: The method is marked pub, which exposes it as part of the public API. If this is intended only for internal initialization during build/load, consider restricting visibility to pub(crate) or pub(super).

Based on past review comments and learning context.


649-652: LGTM! Improved observability.

The updated log message now includes both the lower bound (first_index) and upper bound (last_index) of the WAL recovery range, which improves observability during shard recovery.


608-608: Verify adding insert_fake_operation to the load path
The load method currently omits self.insert_fake_operation().await?, so shards loaded from WAL may still start at version 0. Confirm whether you need to invoke insert_fake_operation in load (e.g., after line 451, before load_from_wal) to guarantee a non-zero initial operation version.

lib/collection/src/tests/wal_recovery_test.rs (1)

17-18: LGTM!

The test logger initialization is appropriate for test contexts and aligns with the PR's goal of enabling logging during WAL recovery tests.

lib/segment/src/id_tracker/compressed/versions_store.rs (1)

97-105: LGTM!

The iter method correctly reconstructs compressed versions by pairing lower and upper parts. The implementation is consistent with the existing get method and aligns with similar iterators in related compressed mapping modules.

lib/segment/src/segment/segment_ops.rs (3)

302-327: LGTM!

The new delete_point_internal method centralizes internal deletion logic. The commented-out vector deletion propagation is appropriately explained and addresses a real ordering issue with segment flushes.


497-518: LGTM!

The updated check_consistency_and_repair correctly delegates cleanup to delete_point_internal for each ID returned by cleanup_versions. The use of a disposable hardware counter for internal operations is appropriate.


656-659: LGTM!

The updated cleanup_versions signature correctly returns the list of IDs to clean, aligning with the broader IdTracker API evolution across the codebase.

lib/segment/src/fixtures/payload_context_fixture.rs (4)

87-89: LGTM!

The updated drop method correctly delegates to drop_internal, consistent with the pattern used across other IdTracker implementations.


91-97: LGTM!

The drop_internal implementation is appropriate for this test fixture. Note that it sets the version to 0 rather than DELETED_POINT_VERSION, which is acceptable given this fixture's limited version tracking support.


181-185: LGTM!

Returning an empty iterator from iter_internal_versions is appropriate for this fixture, which doesn't track per-point versions.


187-189: LGTM!

The cleanup_versions implementation correctly returns an empty vector as this fixture doesn't support cleaning up orphan versions. The signature matches the updated IdTracker trait.

lib/segment/src/segment/mod.rs (1)

107-113: LGTM!

The conditional compilation correctly limits the flush call to RocksDB-enabled builds. The comment clearly explains the intent and necessity for RocksDB's panic avoidance.

lib/segment/src/segment/entry.rs (1)

145-149: LGTM!

The deletion flow is correctly simplified by delegating to delete_point_internal, while maintaining proper version tracking via version_tracker.set_payload.

lib/segment/src/id_tracker/simple_id_tracker.rs (2)

234-241: Implementation aligns with deprecation plan.

The drop_internal method implements the minimal required functionality for trait compliance. Note that it doesn't set DELETED_POINT_VERSION, which is a known limitation due to SimpleIdTracker's incompatible version storage structure and planned deprecation. Based on learnings.


300-309: LGTM!

The iter_internal_versions implementation correctly iterates over the internal version storage, consistent with the pattern used in other IdTracker implementations.

lib/segment/src/id_tracker/immutable_id_tracker.rs (4)

388-405: LGTM!

The simplified set_internal_version correctly enforces that the immutable tracker cannot extend the version list, with an appropriate debug assertion.


423-432: LGTM!

The drop method correctly marks the point as deleted and sets its version to DELETED_POINT_VERSION, ensuring proper recovery behavior.


434-443: LGTM!

The drop_internal method correctly handles internal deletion by dropping the external mapping and marking the point as deleted with DELETED_POINT_VERSION.


503-507: LGTM!

The iter_internal_versions implementation correctly delegates to the compressed versions iterator, providing the required trait functionality.

lib/segment/src/id_tracker/mutable_id_tracker.rs (4)

88-93: Doc update clarifies dedup semantics — good

Clarifies append-only persistence and in-memory dedup to latest. LGTM.


231-240: Correctly mark deletions with DELETED_POINT_VERSION

Setting version=DELETED_POINT_VERSION on drop enables effective repair. LGTM.


242-252: drop_internal flow looks correct

Drops external mapping if present and zeroes internal version. Matches new semantics. LGTM.


337-345: iter_internal_versions exposure is appropriate

Provides full (internal_id, version) enumeration needed by cleanup. LGTM.

lib/segment/src/id_tracker/in_memory_id_tracker.rs (1)

158-167: iter_internal_versions impl is fine

Enumerates into (internal_id, version) pairs as expected. LGTM.

lib/segment/src/id_tracker/id_tracker_base.rs (2)

25-27: DELETED_POINT_VERSION docs — good

Clarifies meaning vs. state. Matches prior feedback. LGTM.


167-172: iter_internal_versions requirement — good

Needed by repair flow. LGTM.

Comment on lines 81 to +87
fn drop(&mut self, external_id: PointIdType) -> OperationResult<()> {
self.mappings.drop(external_id);
let internal_id = self.mappings.drop(external_id);
if let Some(internal_id) = internal_id {
self.set_internal_version(internal_id, DELETED_POINT_VERSION)?;
}
Ok(())
}
Copy link
Contributor

@coderabbitai coderabbitai bot Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Bug: version zeroing is a no‑op due to mapping‑existence guard

You drop the mapping, then call set_internal_version; but set_internal_version only updates when a mapping exists, so the version isn’t zeroed. This breaks cleanup invariants and contradicts trait docs.

Fix by setting the version before removing the mapping:

Apply these diffs.

For drop(external_id):

-        let internal_id = self.mappings.drop(external_id);
-        if let Some(internal_id) = internal_id {
-            self.set_internal_version(internal_id, DELETED_POINT_VERSION)?;
-        }
-        Ok(())
+        let internal_id = self.mappings.internal_id(&external_id);
+        if let Some(internal_id) = internal_id {
+            self.set_internal_version(internal_id, DELETED_POINT_VERSION)?;
+        }
+        self.mappings.drop(external_id);
+        Ok(())

For drop_internal(internal_id):

-        if let Some(external_id) = self.mappings.external_id(internal_id) {
-            self.mappings.drop(external_id);
-        }
-
-        self.set_internal_version(internal_id, DELETED_POINT_VERSION)?;
+        self.set_internal_version(internal_id, DELETED_POINT_VERSION)?;
+        if let Some(external_id) = self.mappings.external_id(internal_id) {
+            self.mappings.drop(external_id);
+        }

Also applies to: 89-97

🤖 Prompt for AI Agents
In lib/segment/src/id_tracker/in_memory_id_tracker.rs around lines 81-87 and
89-97, the drop logic currently removes the mapping before calling
set_internal_version, but set_internal_version is a no-op if the mapping is
already removed; change both drop(external_id) and drop_internal(internal_id) to
set the internal version to DELETED_POINT_VERSION before removing the mapping so
the version gets zeroed reliably, then proceed to remove the mapping and return
Ok; ensure error propagation from set_internal_version is preserved.

Copy link
Member

@timvisee timvisee Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call.

I flipped the ordering for this one in 1451dcd

cc @generall

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 14, 2025
@timvisee timvisee force-pushed the fix-segment-clearing branch from 6b088b7 to 1451dcd Compare October 14, 2025 09:35
@qdrant qdrant deleted a comment from coderabbitai bot Oct 14, 2025
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 14, 2025
coderabbitai[bot]

This comment was marked as resolved.

@generall generall requested a review from timvisee October 14, 2025 11:50
@qdrant qdrant deleted a comment from coderabbitai bot Oct 14, 2025
@timvisee timvisee merged commit 02de826 into dev Oct 14, 2025
15 checks passed
@timvisee timvisee deleted the fix-segment-clearing branch October 14, 2025 12:06
timvisee added a commit that referenced this pull request Nov 14, 2025
* fix segment repair on load

* workaround for rocksdb

* Minor comment tweaks

* In in memory ID tracker, unset version first before dropping mapping

* rename cleanup_versions into fix inconsistencies

---------

Co-authored-by: timvisee <tim@visee.me>
@timvisee timvisee mentioned this pull request Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants