Skip to content

derive Eq for Filter (and underlying types)#7419

Merged
generall merged 8 commits intodevfrom
make-filter-eq
Oct 20, 2025
Merged

derive Eq for Filter (and underlying types)#7419
generall merged 8 commits intodevfrom
make-filter-eq

Conversation

@generall
Copy link
Member

No description provided.

@generall generall requested review from KShivendu and coszio October 17, 2025 17:43
coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@generall generall mentioned this pull request Oct 17, 2025
2 tasks
@qdrant qdrant deleted a comment from coderabbitai bot Oct 20, 2025
coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Oct 20, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough

Walkthrough

This PR systematically introduces OrderedFloat wrappers for floating-point values across the codebase to enable total ordering for numeric comparisons. Key changes include: converting Range types to use Range<OrderedFloat<FloatPayloadType>> instead of Range<FloatPayloadType>, changing GeoPoint coordinates and GeoRadius radius from f64 to OrderedFloat<f64>, and introducing a new public CustomIdChecker wrapper with Condition::new_custom constructor. Multiple types gain Eq, Hash, and Ord trait derives. Field access patterns are updated to use .0 notation for inner values, and GeoPoint construction shifts from struct literals to GeoPoint::new_unchecked. The changes propagate consistently across benchmarks, integration tests, filter types, and internal indexing logic.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

This PR affects 40+ files with a systematic but extensive refactoring. While individual changes follow consistent patterns (wrapping with OrderedFloat, updating field access via .0, replacing constructors), the scope is large and involves:

  • Public API surface modifications (Range, GeoPoint, Condition types)
  • Coordinated changes across multiple interconnected systems (geo indexing, numeric indexing, condition checking)
  • Extensive field access pattern updates requiring careful verification
  • Multiple trait derive additions affecting equality/hashing semantics

The homogeneity of the pattern mitigates per-file complexity, but the breadth demands careful attention to consistency and potential interaction issues.

Possibly related PRs

  • qdrant/qdrant#6644: Modifies GeoPoint and related geo type conversions in lib/segment/src/types.rs, directly overlapping with this PR's coordinate representation changes.

Suggested reviewers

  • timvisee
  • coszio
  • KShivendu

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning No pull request description was provided by the author. The description field is completely empty, which means it does not describe any part of the changeset or explain the motivation and scope of the changes. According to the evaluation criteria, a description that fails to describe any part of the changeset falls under the failure condition. Add a pull request description that explains the purpose and scope of the changes. At minimum, the description should clarify why OrderedFloat wrappers are being introduced, how the trait derivations support this effort, and any breaking changes to public APIs (such as the GeoPoint constructor changes and Range field type modifications).
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "derive Eq for Filter (and underlying types)" is related to the changeset but significantly understates the scope of changes. While the PR does add Eq derives to Filter and many underlying types (FieldCondition, GeoPoint, RangeInterface, etc.), the changeset contains substantial additional work beyond trait derivation. Major changes include introducing OrderedFloat wrappers throughout the codebase to wrap numeric types, adding new constructors like GeoPoint::new_unchecked, modifying type signatures (e.g., Range fields now use OrderedFloat), and refactoring condition creation with Condition::new_custom. The title captures only one aspect of the work and is overly narrow in scope.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch make-filter-eq

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)

351-359: Fix exponential/gaussian decay: sign must be negative to keep range [0,1].

Currently returns e^(+λ⋅d) and e^(+λ⋅d²) which grow unbounded and violate the debug assertion. Use negative exponent.

Apply:

 fn exp_decay(x: PreciseScore, target: PreciseScore, lambda: PreciseScore) -> PreciseScore {
     let diff = (x - target).abs();
-    (lambda * diff).exp()
+    (-lambda * diff).exp()
 }
 
 fn gauss_decay(x: PreciseScore, target: PreciseScore, lambda: PreciseScore) -> PreciseScore {
     let diff = x - target;
-    (lambda * diff * diff).exp()
+    (-lambda * diff * diff).exp()
 }

168-175: Use correct datetime units; divide millis by 1_000, not microseconds.

Encoding/decoding uses milliseconds; here we divide by 1_000_000 and tests match that. Align to ms→s.

Apply:

-// Convert from i64 to f64.
-// f64's 53 bits of sign + mantissa for microseconds means a span of exact equivalence of
-// about 285 years, after which precision starts dropping
-let float_micros = datetime.timestamp() as PreciseScore;
-
-// Convert to seconds
-let float_seconds = float_micros / 1_000_000.0;
-
-Ok(float_seconds)
+// DateTimePayloadType::timestamp() is milliseconds.
+let millis = datetime.timestamp() as PreciseScore;
+// Convert to seconds
+let seconds = millis / 1_000.0;
+Ok(seconds)

And update tests:

-Ok("2025-03-18".parse::<DateTimePayloadType>().unwrap().timestamp() as PreciseScore / 1_000_000.0)
+Ok("2025-03-18".parse::<DateTimePayloadType>().unwrap().timestamp() as PreciseScore / 1_000.0)

-Ok("2025-03-19T12:00:00".parse::<DateTimePayloadType>().unwrap().timestamp() as PreciseScore / 1_000_000.0)
+Ok("2025-03-19T12:00:00".parse::<DateTimePayloadType>().unwrap().timestamp() as PreciseScore / 1_000.0)

Also applies to: 536-538, 550-552

lib/segment/src/index/field_index/geo_index/mod.rs (2)

229-231: Fix encoding after GeoPoint fields became OrderedFloat

to_be_bytes() is on f64. With GeoPoint { lat: OrderedFloat(f64), lon: OrderedFloat(f64) }, this must unwrap the inner value; otherwise this won’t compile and risks wrong serialization if coerced later.

Apply:

 fn encode_db_value(value: &GeoPoint) -> [u8; 16] {
     let mut result: [u8; 16] = [0; 16];
-    result[0..8].clone_from_slice(&value.lat.to_be_bytes());
-    result[8..16].clone_from_slice(&value.lon.to_be_bytes());
+    result[0..8].clone_from_slice(&value.lat.0.to_be_bytes());
+    result[8..16].clone_from_slice(&value.lon.0.to_be_bytes());
     result
 }

998-1010: Tests: unwrap OrderedFloat before trig conversions

to_radians() is on f64. Unwrap OrderedFloat fields in radius_to_polygon.

-    let angular_lat = circle.center.lat.to_radians();
+    let angular_lat = circle.center.lat.0.to_radians();
     let mut min_lat = (angular_lat - angular_radius).to_degrees();
     let mut max_lat = (angular_lat + angular_radius).to_degrees();

-    let (min_lon, max_lon) = if LAT_RANGE.start < min_lat && max_lat < LAT_RANGE.end {
-        let angular_lon = circle.center.lon.to_radians();
+    let (min_lon, max_lon) = if LAT_RANGE.start < min_lat && max_lat < LAT_RANGE.end {
+        let angular_lon = circle.center.lon.0.to_radians();
         let delta_lon = (angular_radius.sin() / angular_lat.cos()).asin();
lib/segment/src/index/field_index/geo_hash.rs (1)

539-547: Unwrap OrderedFloat before angle ops

to_radians() is on f64. Unwrap OrderedFloat fields first.

-    let angular_lat = circle.center.lat.to_radians();
+    let angular_lat = circle.center.lat.0.to_radians();
@@
-        let angular_lon = circle.center.lon.to_radians();
+        let angular_lon = circle.center.lon.0.to_radians();
lib/segment/src/types.rs (2)

2693-2699: Fix: hashing OrderedFloat radius without re-wrapping.

radius is already OrderedFloat<f64>. OrderedFloat(*radius) is a type error and will not compile. Hash the wrapper directly and update the comment.

Apply this diff:

 impl Hash for GeoRadius {
     fn hash<H: hash::Hasher>(&self, state: &mut H) {
         let GeoRadius { center, radius } = self;
         center.hash(state);
-        // Hash f64 by converting to bits
-        OrderedFloat(*radius).hash(state);
+        // Hash radius via its OrderedFloat wrapper
+        radius.hash(state);
     }
 }

2754-2756: Fix: arithmetic on OrderedFloat in polygon validation.

first.lat/first.lon are OrderedFloat<f64>. Calling .abs() on the wrapper won’t compile. Compare using inner f64.

Apply this diff:

-            && ((first.lat - last.lat).abs() > f64::EPSILON
-                || (first.lon - last.lon).abs() > f64::EPSILON)
+            && ((first.lat.0 - last.lat.0).abs() > f64::EPSILON
+                || (first.lon.0 - last.lon.0).abs() > f64::EPSILON)
🧹 Nitpick comments (15)
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (1)

8-8: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x) over implicit conversions. For consistency across the codebase, consider using OrderedFloat::from(f64::from(...)) instead.

As per coding guidelines.

Apply this diff for consistency:

-                gte: Some(OrderedFloat(f64::from(left_range))),
-                lte: Some(OrderedFloat(f64::from(right_range))),
+                gte: Some(OrderedFloat::from(f64::from(left_range))),
+                lte: Some(OrderedFloat::from(f64::from(right_range))),

Also applies to: 175-176

lib/segment/tests/integration/byte_storage_hnsw_test.rs (1)

8-8: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x). For consistency with other test files like gpu_hnsw_test.rs, consider using OrderedFloat::from(f64::from(...)) instead.

As per coding guidelines.

Apply this diff for consistency:

-                gte: Some(OrderedFloat(f64::from(left_range))),
-                lte: Some(OrderedFloat(f64::from(right_range))),
+                gte: Some(OrderedFloat::from(f64::from(left_range))),
+                lte: Some(OrderedFloat::from(f64::from(right_range))),

Also applies to: 227-228

lib/segment/tests/integration/filtrable_hnsw_test.rs (1)

10-10: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x). For consistency across test files, consider using OrderedFloat::from(f64::from(...)) instead.

As per coding guidelines.

Apply this diff for consistency:

-                gte: Some(OrderedFloat(f64::from(left_range))),
-                lte: Some(OrderedFloat(f64::from(right_range))),
+                gte: Some(OrderedFloat::from(f64::from(left_range))),
+                lte: Some(OrderedFloat::from(f64::from(right_range))),

Also applies to: 189-190

lib/segment/benches/range_filtering.rs (1)

10-10: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x). For consistency across the codebase, consider using OrderedFloat::from(...) instead.

As per coding guidelines.

Apply this diff for consistency:

-            gte: Some(OrderedFloat(rng.random_range(0.0..MAX_RANGE / 2.0))),
-            lte: Some(OrderedFloat(rng.random_range(MAX_RANGE / 2.0..MAX_RANGE))),
+            gte: Some(OrderedFloat::from(rng.random_range(0.0..MAX_RANGE / 2.0))),
+            lte: Some(OrderedFloat::from(rng.random_range(MAX_RANGE / 2.0..MAX_RANGE))),

Also applies to: 34-35

lib/segment/src/data_types/order_by.rs (1)

91-96: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x) over implicit conversions. For consistency and clarity in production code, consider using OrderedFloat::from(...) instead.

As per coding guidelines.

Apply this diff for consistency:

                 StartFrom::Integer(i) => {
-                    RangeInterface::Float(self.direction().as_range_from(OrderedFloat(*i as f64)))
+                    RangeInterface::Float(self.direction().as_range_from(OrderedFloat::from(*i as f64)))
                 }
                 StartFrom::Float(f) => {
-                    RangeInterface::Float(self.direction().as_range_from(OrderedFloat(*f)))
+                    RangeInterface::Float(self.direction().as_range_from(OrderedFloat::from(*f)))
                 }
lib/collection/benches/batch_query_bench.rs (1)

21-21: Consider using explicit OrderedFloat::from() for consistency.

The direct constructor syntax OrderedFloat(value) works correctly, but the coding guidelines suggest preferring explicit SomeType::from(x). For consistency across the codebase, consider using OrderedFloat::from(...) instead.

As per coding guidelines.

Apply this diff for consistency:

                 Range {
                     lt: None,
-                    gt: Some(OrderedFloat(-1.)),
+                    gt: Some(OrderedFloat::from(-1.)),
                     gte: None,
-                    lte: Some(OrderedFloat(100.0)),
+                    lte: Some(OrderedFloat::from(100.0)),
                 },

Also applies to: 141-143

lib/segment/tests/integration/byte_storage_quantization_test.rs (1)

380-382: Use OrderedFloat::from(...) for consistency.

Minor style tweak: prefer OrderedFloat::from(x) over tuple construction to align with the explicit From guideline.

-                gte: Some(OrderedFloat(f64::from(left_range))),
-                lte: Some(OrderedFloat(f64::from(right_range))),
+                gte: Some(OrderedFloat::from(f64::from(left_range))),
+                lte: Some(OrderedFloat::from(f64::from(right_range))),

As per coding guidelines.

lib/shard/src/query/mod.rs (1)

461-462: Prefer OrderedFloat::from(lambda).

Style-only: explicit From reads clearer and matches the codebase trend.

-                    lambda: OrderedFloat(lambda),
+                    lambda: OrderedFloat::from(lambda),

As per coding guidelines.

lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (1)

558-561: LGTM: consistent geohash encoding during add.

Matches OrderedFloat wrapping and centralized error reporting.

Consider a small helper to reduce repetition and centralize messages:

fn geohash_of(p: &GeoPoint) -> Result<GeoHash, OperationError> {
    encode_max_precision(p.lon.0, p.lat.0)
        .map_err(|e| OperationError::service_error(format!("Malformed geo points: {e}")))
}

Then replace call sites with geohash_of(&point).

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)

151-152: Prefer explicit From over implicit Into for geo points.

Make the conversions explicit for readability and guideline compliance.

As per coding guidelines.

-Ok(Haversine.distance((*origin).into(), value.into()))
+Ok(Haversine.distance(geo::Point::from(*origin), geo::Point::from(value)))
lib/segment/src/payload_storage/query_checker.rs (1)

276-283: Replace .into() with explicit constructors for OwnedPayloadRef.

Switch to OwnedPayloadRef::from to follow repository style and avoid implicit conversions.

As per coding guidelines.

-                            s.payload_ptr(point_id).map(|x| x.into())
+                            s.payload_ptr(point_id).map(OwnedPayloadRef::from)
...
-                            s.payload_ptr(point_id).map(|x| x.into())
+                            s.payload_ptr(point_id).map(OwnedPayloadRef::from)
...
-                        .replace(payload_ptr.or_else(|| Some((&self.empty_payload).into())));
+                        .replace(payload_ptr.or_else(|| Some(OwnedPayloadRef::from(&self.empty_payload))));

Also applies to: 300-307, 311-314

src/actix/api/local_shard_api.rs (1)

190-204: Avoid implicit .into() for Option; use explicit Some(...).

Clearer and fits project style.

As per coding guidelines.

-            Some(filter) => get_hash_ring_filter(
+            Some(filter) => Some(get_hash_ring_filter(
                 &dispatcher,
                 &access,
                 &path.collection,
                 AccessRequirements::new(),
                 filter.expected_shard_id,
                 &pass,
-            )
-            .await?
-            .into(),
+            )
+            .await?),
             None => None,
         };
-        (Some(filter), Some(hash_ring)) => hash_ring.merge_owned(filter).into(),
-        (Some(filter), None) => filter.into(),
-        (None, Some(hash_ring)) => hash_ring.into(),
+        (Some(filter), Some(hash_ring)) => Some(hash_ring.merge_owned(filter)),
+        (Some(filter), None) => Some(filter),
+        (None, Some(hash_ring)) => Some(hash_ring),

Also applies to: 307-314

lib/segment/src/index/query_optimization/condition_converter.rs (1)

189-191: Avoid reaching into newtype internals for CustomIdChecker

Using cond.0.check(point_id) leaks the wrapper’s representation. Prefer destructuring the newtype or exposing a check(&self, id) on CustomIdChecker and calling it directly.

Example:

-            Condition::CustomIdChecker(cond) => {
+            Condition::CustomIdChecker(cond) => {
                 let segment_ids: AHashSet<_> = id_tracker
                     .iter_external()
-                    .filter(|&point_id| cond.0.check(point_id))
+                    .filter(|&point_id| cond.check(point_id)) // or destructure: `CustomIdChecker(inner)`
                     .filter_map(|external_id| id_tracker.internal_id(external_id))
                     .collect();
lib/segment/src/index/field_index/geo_hash.rs (1)

259-261: Follow guideline: prefer explicit From over Into

Use Coord::from((lon, lat)) instead of (lon, lat).into().

As per coding guidelines:

-pub fn encode_max_precision(lon: f64, lat: f64) -> Result<GeoHash, GeohashError> {
-    let encoded_string = encode((lon, lat).into(), GEOHASH_MAX_LENGTH)?;
+pub fn encode_max_precision(lon: f64, lat: f64) -> Result<GeoHash, GeohashError> {
+    let encoded_string = encode(Coord::from((lon, lat)), GEOHASH_MAX_LENGTH)?;
     GeoHash::try_from(encoded_string)
 }
lib/segment/src/types.rs (1)

3191-3197: Update comments to match pointer-identity Eq/Hash semantics.

The implementation now hashes and compares by Arc pointer identity, but comments still state “cannot hash/compare” and imply degenerate behavior.

Apply this diff:

 impl Hash for CustomIdChecker {
     fn hash<H: hash::Hasher>(&self, state: &mut H) {
-        // We cannot hash the inner function
-        // This means that two different CustomIdChecker conditions will have the same hash,
-        // but that's acceptable since we cannot do better, and only expected to be used
-        // for logging and profiling purposes.
+        // Hash by pointer identity of the inner Arc target
         std::ptr::hash(Arc::as_ptr(&self.0), state);
     }
 }

 impl PartialEq for CustomIdChecker {
     fn eq(&self, other: &Self) -> bool {
-        // We cannot compare the inner function
-        // This means that two different CustomIdChecker conditions will never be equal,
-        // but that's acceptable since we cannot do better, and only expected to be used
-        // for logging and profiling purposes.
+        // Pointer identity comparison; reflexive and stable
         Arc::ptr_eq(&self.0, &other.0)
     }
 }

Based on learnings.

Also applies to: 3201-3207

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75434ca and 4a5e498.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (35)
  • lib/api/src/grpc/conversions.rs (7 hunks)
  • lib/collection/benches/batch_query_bench.rs (2 hunks)
  • lib/collection/benches/batch_search_bench.rs (2 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (1 hunks)
  • lib/collection/src/shards/shard_holder/resharding.rs (1 hunks)
  • lib/collection/src/tests/payload.rs (2 hunks)
  • lib/edge/python/Cargo.toml (1 hunks)
  • lib/edge/python/src/types/filter/geo.rs (2 hunks)
  • lib/edge/python/src/types/filter/range.rs (3 hunks)
  • lib/segment/benches/range_filtering.rs (2 hunks)
  • lib/segment/src/data_types/order_by.rs (1 hunks)
  • lib/segment/src/fixtures/payload_fixtures.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_hash.rs (23 hunks)
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2 hunks)
  • lib/segment/src/index/field_index/geo_index/mod.rs (17 hunks)
  • lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (5 hunks)
  • lib/segment/src/index/field_index/mmap_point_to_values.rs (3 hunks)
  • lib/segment/src/index/field_index/numeric_index/mod.rs (5 hunks)
  • lib/segment/src/index/field_index/numeric_index/tests.rs (2 hunks)
  • lib/segment/src/index/query_optimization/condition_converter.rs (3 hunks)
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2 hunks)
  • lib/segment/src/index/struct_payload_index.rs (1 hunks)
  • lib/segment/src/payload_storage/condition_checker.rs (6 hunks)
  • lib/segment/src/payload_storage/query_checker.rs (4 hunks)
  • lib/segment/src/types.rs (25 hunks)
  • lib/segment/tests/integration/byte_storage_hnsw_test.rs (2 hunks)
  • lib/segment/tests/integration/byte_storage_quantization_test.rs (2 hunks)
  • lib/segment/tests/integration/exact_search_test.rs (2 hunks)
  • lib/segment/tests/integration/filtrable_hnsw_test.rs (2 hunks)
  • lib/segment/tests/integration/gpu_hnsw_test.rs (2 hunks)
  • lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (2 hunks)
  • lib/segment/tests/integration/multivector_quantization_test.rs (2 hunks)
  • lib/segment/tests/integration/payload_index_test.rs (5 hunks)
  • lib/shard/src/query/mod.rs (1 hunks)
  • src/actix/api/local_shard_api.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead

Files:

  • lib/collection/src/tests/payload.rs
  • lib/segment/src/data_types/order_by.rs
  • lib/shard/src/query/mod.rs
  • lib/segment/tests/integration/filtrable_hnsw_test.rs
  • lib/segment/tests/integration/byte_storage_quantization_test.rs
  • lib/segment/src/index/query_optimization/condition_converter.rs
  • lib/segment/src/index/field_index/numeric_index/mod.rs
  • lib/segment/src/fixtures/payload_fixtures.rs
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs
  • lib/collection/benches/batch_query_bench.rs
  • lib/segment/src/payload_storage/condition_checker.rs
  • lib/collection/src/operations/universal_query/collection_query.rs
  • lib/collection/src/shards/shard_holder/resharding.rs
  • lib/edge/python/src/types/filter/geo.rs
  • lib/segment/src/index/struct_payload_index.rs
  • src/actix/api/local_shard_api.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs
  • lib/segment/src/payload_storage/query_checker.rs
  • lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
  • lib/segment/tests/integration/byte_storage_hnsw_test.rs
  • lib/segment/tests/integration/multivector_quantization_test.rs
  • lib/edge/python/src/types/filter/range.rs
  • lib/segment/tests/integration/gpu_hnsw_test.rs
  • lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
  • lib/segment/src/index/field_index/numeric_index/tests.rs
  • lib/collection/benches/batch_search_bench.rs
  • lib/segment/tests/integration/exact_search_test.rs
  • lib/segment/tests/integration/payload_index_test.rs
  • lib/segment/src/index/field_index/mmap_point_to_values.rs
  • lib/segment/benches/range_filtering.rs
  • lib/api/src/grpc/conversions.rs
  • lib/segment/src/index/field_index/geo_hash.rs
  • lib/segment/src/index/field_index/geo_index/mod.rs
  • lib/segment/src/types.rs
**/src/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)

Files:

  • lib/collection/src/tests/payload.rs
  • lib/segment/src/data_types/order_by.rs
  • lib/shard/src/query/mod.rs
  • lib/segment/src/index/query_optimization/condition_converter.rs
  • lib/segment/src/index/field_index/numeric_index/mod.rs
  • lib/segment/src/fixtures/payload_fixtures.rs
  • lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs
  • lib/segment/src/payload_storage/condition_checker.rs
  • lib/collection/src/operations/universal_query/collection_query.rs
  • lib/collection/src/shards/shard_holder/resharding.rs
  • lib/edge/python/src/types/filter/geo.rs
  • lib/segment/src/index/struct_payload_index.rs
  • src/actix/api/local_shard_api.rs
  • lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs
  • lib/segment/src/payload_storage/query_checker.rs
  • lib/edge/python/src/types/filter/range.rs
  • lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs
  • lib/segment/src/index/field_index/numeric_index/tests.rs
  • lib/segment/src/index/field_index/mmap_point_to_values.rs
  • lib/api/src/grpc/conversions.rs
  • lib/segment/src/index/field_index/geo_hash.rs
  • lib/segment/src/index/field_index/geo_index/mod.rs
  • lib/segment/src/types.rs
**/{tests,benches}/**/*.rs

📄 CodeRabbit inference engine (.github/review-rules.md)

Using .unwrap() and panic!() in tests and benchmarks is acceptable and should not be flagged

Files:

  • lib/collection/src/tests/payload.rs
  • lib/segment/tests/integration/filtrable_hnsw_test.rs
  • lib/segment/tests/integration/byte_storage_quantization_test.rs
  • lib/collection/benches/batch_query_bench.rs
  • lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs
  • lib/segment/tests/integration/byte_storage_hnsw_test.rs
  • lib/segment/tests/integration/multivector_quantization_test.rs
  • lib/segment/tests/integration/gpu_hnsw_test.rs
  • lib/collection/benches/batch_search_bench.rs
  • lib/segment/tests/integration/exact_search_test.rs
  • lib/segment/tests/integration/payload_index_test.rs
  • lib/segment/benches/range_filtering.rs
🧠 Learnings (2)
📚 Learning: 2025-08-15T11:42:00.297Z
Learnt from: IvanPleshkov
PR: qdrant/qdrant#7043
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:86-90
Timestamp: 2025-08-15T11:42:00.297Z
Learning: In lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs, overflow protection for encoded_storage_size computation (quantized_vector_size * vectors_count) is implemented in PR #7048, using checked_mul with u64 casting to prevent silent overflow.

Applied to files:

  • lib/segment/tests/integration/byte_storage_quantization_test.rs
  • lib/segment/tests/integration/multivector_quantization_test.rs
📚 Learning: 2025-08-23T22:24:44.276Z
Learnt from: generall
PR: qdrant/qdrant#7100
File: tests/openapi/test_fts.py:255-263
Timestamp: 2025-08-23T22:24:44.276Z
Learning: In Qdrant's OpenAPI schema, the `filter.must` field accepts either a single Condition object or an array of Condition objects, not just arrays. The schema uses `anyOf` to allow both formats.

Applied to files:

  • lib/segment/src/types.rs
🧬 Code graph analysis (22)
lib/segment/tests/integration/filtrable_hnsw_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/tests/integration/byte_storage_quantization_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
lib/segment/src/index/field_index/histogram.rs (4)
  • from_f64 (56-56)
  • from_f64 (79-81)
  • from_f64 (100-102)
  • from_f64 (121-123)
lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (2)
lib/segment/src/types.rs (6)
  • new_unchecked (1760-1765)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
lib/segment/src/index/query_optimization/rescore_formula/parsed_formula.rs (1)
  • new_geo_distance (147-149)
lib/segment/src/payload_storage/condition_checker.rs (1)
lib/segment/src/types.rs (1)
  • new_unchecked (1760-1765)
lib/collection/src/shards/shard_holder/resharding.rs (1)
lib/segment/src/types.rs (7)
  • new_must_not (3578-3585)
  • new_custom (3182-3184)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
src/actix/api/local_shard_api.rs (2)
lib/segment/src/types.rs (6)
  • new_custom (3182-3184)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
lib/collection/src/shards/shard_holder/resharding.rs (1)
  • hash_ring_filter (499-513)
lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (1)
lib/segment/src/index/field_index/geo_hash.rs (1)
  • encode_max_precision (258-261)
lib/segment/src/payload_storage/query_checker.rs (1)
lib/segment/src/types.rs (7)
  • new_unchecked (1760-1765)
  • new_geo_bounding_box (2906-2918)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
lib/segment/tests/integration/multivector_filtrable_hnsw_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/tests/integration/byte_storage_hnsw_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/tests/integration/multivector_quantization_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/tests/integration/gpu_hnsw_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (2)
lib/segment/src/index/field_index/geo_hash.rs (1)
  • encode_max_precision (258-261)
lib/segment/src/common/operation_error.rs (1)
  • service_error (84-89)
lib/segment/src/index/field_index/numeric_index/tests.rs (2)
lib/segment/src/types.rs (6)
  • new_range (2875-2887)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
lib/segment/src/index/field_index/numeric_index/mod.rs (2)
  • index (149-165)
  • new (826-828)
lib/segment/tests/integration/exact_search_test.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/tests/integration/payload_index_test.rs (1)
lib/segment/src/types.rs (1)
  • new_unchecked (1760-1765)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
lib/segment/src/types.rs (1)
  • new_unchecked (1760-1765)
lib/api/src/grpc/conversions.rs (1)
lib/segment/src/types.rs (16)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
lib/segment/src/index/field_index/geo_hash.rs (1)
lib/segment/src/types.rs (22)
  • new (424-437)
  • new (1755-1758)
  • new (3136-3138)
  • new (3419-3421)
  • new (3432-3434)
  • from (140-142)
  • from (200-202)
  • from (256-261)
  • from (265-270)
  • from (851-853)
  • from (857-859)
  • from (863-865)
  • from (900-909)
  • from (913-916)
  • from (943-952)
  • from (964-967)
  • from (1204-1248)
  • from (1780-1787)
  • from (1791-1793)
  • from (1797-1799)
  • from (1901-1903)
  • new_unchecked (1760-1765)
lib/segment/src/index/field_index/geo_index/mod.rs (1)
lib/segment/src/types.rs (1)
  • new_unchecked (1760-1765)
lib/segment/src/types.rs (3)
lib/segment/src/data_types/vectors.rs (3)
  • new_unchecked (355-367)
  • hash (36-55)
  • hash (329-338)
lib/segment/src/data_types/order_by.rs (3)
  • hash (61-67)
  • hash (142-147)
  • eq (192-199)
lib/sparse/src/common/sparse_vector.rs (1)
  • hash (25-31)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: lint
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
🔇 Additional comments (45)
lib/segment/src/index/field_index/numeric_index/tests.rs (2)

151-165: LGTM! Correct OrderedFloat wrapping for range comparisons.

The transformation from Range<FloatPayloadType> to Range<OrderedFloat<FloatPayloadType>> is implemented correctly. The explicit use of OrderedFloat::from follows the coding guidelines, and the .clone() on line 160 is necessary since ordered_range is consumed by RangeInterface::Float and then reused in FieldCondition::new_range.


595-602: LGTM! Consistent OrderedFloat transformation.

The same transformation pattern is correctly applied in the test helper. This ensures all test cases use OrderedFloat-wrapped ranges consistently with the main implementation.

lib/collection/src/tests/payload.rs (1)

7-7: LGTM! Correct type alignment with OrderedFloat wrapper.

The addition of the OrderedFloat import and its usage to wrap the radius value properly aligns the test with the updated GeoRadius type signature, supporting the PR's goal to enable total ordering for floating-point values.

Also applies to: 64-64

lib/collection/src/operations/universal_query/collection_query.rs (1)

421-433: LGTM! Systematic OrderedFloat constructor change.

The change from OrderedFloat::from(...) to OrderedFloat(...) is correct and part of the PR's systematic refactoring to introduce OrderedFloat wrappers. The tuple struct constructor is idiomatic and functionally equivalent to the From trait for simple wrappers. The lambda calculation logic remains correct: it computes 1.0 - diversity when diversity is provided, otherwise defaults to DEFAULT_MMR_LAMBDA (0.5).

lib/edge/python/Cargo.toml (1)

22-22: LGTM!

The addition of the ordered-float dependency is necessary to support the OrderedFloat-wrapped numeric types introduced throughout the codebase.

lib/segment/tests/integration/gpu_hnsw_test.rs (1)

8-8: LGTM!

The explicit use of OrderedFloat::from() follows the coding guidelines and correctly wraps the numeric bounds for total ordering.

Also applies to: 169-170

lib/edge/python/src/types/filter/geo.rs (1)

45-49: LGTM! OrderedFloat wrapping is correctly applied.

The radius field is now wrapped in OrderedFloat while maintaining the public API (the constructor still accepts f64). This internal change enables the Eq trait for the underlying types without breaking the Python interface.

lib/collection/benches/batch_search_bench.rs (1)

133-138: LGTM! Benchmark updated for OrderedFloat-wrapped Range bounds.

The range boundaries are correctly wrapped in OrderedFloat, consistent with the new internal representation of Range<OrderedFloat<T>>.

lib/segment/src/fixtures/payload_fixtures.rs (1)

218-226: LGTM! Fixture generation updated for OrderedFloat ranges.

The random range conditions now correctly wrap their bounds in OrderedFloat, ensuring consistency with the updated Range<OrderedFloat<T>> type used throughout the codebase.

lib/collection/src/shards/shard_holder/resharding.rs (1)

301-301: LGTM! Correctly using the new public constructor.

The change from direct variant construction to Condition::new_custom() properly uses the new public API for creating custom ID checker conditions, maintaining better encapsulation.

lib/edge/python/src/types/filter/range.rs (1)

33-53: LGTM! OrderedFloat integration is complete and correct.

The internal representation now uses Range<OrderedFloat<FloatPayloadType>> while the public API maintains ergonomic float parameters. All four range bounds (gte, gt, lte, lt) are correctly wrapped in the constructor.

lib/segment/tests/integration/multivector_quantization_test.rs (1)

360-365: LGTM! Test updated for OrderedFloat-wrapped Range.

The range filter construction correctly wraps the boundaries in OrderedFloat, ensuring the test uses the new internal representation.

lib/api/src/grpc/conversions.rs (3)

697-704: LGTM! GeoPoint conversion correctly extracts OrderedFloat values.

The conversion from internal GeoPoint (with OrderedFloat coordinates) to gRPC GeoPoint (with raw f64) correctly extracts the inner values using .0.


1729-1753: LGTM! GeoRadius conversions properly handle OrderedFloat wrapping.

Both directions of the conversion are correct:

  • gRPC → internal: wraps radius in OrderedFloat
  • internal → gRPC: extracts inner value with .0

Note: The TODO comment about lossy f32 conversion on line 1751 is pre-existing behavior, not introduced by this PR.


1792-1841: LGTM! Range and GeoPoint conversions are correctly implemented.

The new bidirectional conversions for Range<OrderedFloat<FloatPayloadType>> properly wrap and unwrap all four range boundaries. The GeoPoint conversion correctly wraps coordinates in OrderedFloat when converting from gRPC to internal representation.

lib/segment/tests/integration/exact_search_test.rs (1)

187-195: LGTM! Test correctly uses OrderedFloat-wrapped Range bounds.

The range filter construction properly wraps gte and lte boundaries in OrderedFloat, maintaining consistency with the updated type signatures throughout the codebase.

lib/segment/src/index/struct_payload_index.rs (1)

567-570: LGTM: correct newtype access.

Using cond.0 matches the CustomIdChecker wrapper. No further changes needed.

lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (4)

176-183: LGTM: safe error mapping when loading Gridstore.

Switch to inner coords and map_err is correct and robust.


322-325: LGTM: robust RocksDB insert path.

Properly encodes with inner coords and propagates errors.


371-374: LGTM: robust RocksDB delete path.

Error propagation instead of panic is good.


509-512: LGTM: in-memory removal now non-panicking.

Keeps error handling consistent with storage paths.

lib/segment/src/index/query_optimization/rescore_formula/formula_scorer.rs (1)

465-466: GeoPoint constructor switch LGTM.

Using GeoPoint::new_unchecked(lon, lat) aligns with new API and OrderedFloat fields.

Also applies to: 470-471, 533-534

lib/segment/src/payload_storage/query_checker.rs (2)

167-169: Correct tuple access for CustomIdChecker.

Using cond.0.check(point_id) is the right adjustment after wrapping the checker.


501-504: Geo and OrderedFloat test updates LGTM.

new_unchecked for GeoPoint and OrderedFloat bounds reflect the new types correctly.

Also applies to: 509-512, 520-521

lib/segment/tests/integration/payload_index_test.rs (1)

15-16: Test adaptations to OrderedFloat + GeoPoint::new_unchecked look good.

Using OrderedFloat for float ranges and wrapping radius, plus constructor-based GeoPoint creation, matches updated APIs.

Also applies to: 717-725, 1011-1016, 1031-1035, 1054-1060

lib/segment/src/index/field_index/numeric_index/mod.rs (2)

329-333: Consistent mapping from OrderedFloat ranges to index keys.

Using float.0 with T::from_f64 is correct; date-time mapping also consistent.

Also applies to: 926-930, 1218-1224


1001-1019: Construct float Range with OrderedFloat via explicit wrapping.

Creating Range bounds with OrderedFloat(val.to_f64()) is consistent with the new type definitions.

src/actix/api/local_shard_api.rs (1)

301-303: Constructor swap to Condition::new_custom is correct.

Matches the new API and keeps call sites clean.

lib/segment/src/index/field_index/mmap_point_to_values.rs (1)

479-507: GeoPoint test updates LGTM.

Construction via new_unchecked matches storage format and read/write paths.

lib/segment/src/index/query_optimization/condition_converter.rs (2)

345-355: Range → Range<OrderedFloat>: confirm truncation semantics for IntIndex

range.map(|f| f.0 as IntPayloadType) truncates toward zero. If float ranges are used against int fields, confirm this is the intended behavior (same as before) and is documented in API expectations.


357-362: FloatIndex path looks good

Wrapping stored values with OrderedFloat(*value) to check against Range<OrderedFloat<_>> is correct and keeps comparisons consistent.

lib/segment/src/index/field_index/geo_index/mod.rs (1)

223-224: Decode path update LGTM

Using GeoPoint::new_unchecked(lon, lat) aligns with the new GeoPoint representation.

lib/segment/src/payload_storage/condition_checker.rs (2)

224-233: Range ValueChecker matches storage-side changes

Wrapping JSON numbers with OrderedFloat before check_range keeps ordering consistent.

If NaN can appear in payloads, confirm intended behavior with OrderedFloat’s total ordering vs prior f64 partial ordering.


253-254: GeoPoint construction changes LGTM

Switching to GeoPoint::new_unchecked(lon, lat) is consistent with the new API and avoids field shuffling.

Also applies to: 270-271, 287-290

lib/segment/src/types.rs (11)

1693-1711: LGTM: GeoPoint now total-orderable and hashable.

Switch to OrderedFloat<f64> for lon/lat with Eq/Ord/Hash derives is consistent; serde via GeoPointShadow preserves API. Note: internal code should access .0 when numeric ops are needed (see polygon validation fix above).


1714-1717: LGTM: GeoLineString derives Eq/Hash.

Enables use as map/set keys; consistent with GeoPoint changes.


1755-1758: LGTM: validated constructor + const new_unchecked.

new() validates, new_unchecked() is clear and used where inputs are trusted. Const is a nice touch for test/const contexts.

Consider documenting new_unchecked safety expectations in rustdoc.

Also applies to: 1760-1765


1780-1786: LGTM: Conversions updated for OrderedFloat.

Destructuring in From<GeoPoint> for geo::Point and exposing (lon.0, lat.0) in From<GeoPoint> for RawGeoPoint are correct.

Also applies to: 1798-1799


2539-2544: LGTM: Ranges over OrderedFloat and stronger derives.

  • RangeInterface::FloatRange<OrderedFloat<_>> eliminates float-Eq/Hash pitfalls.
  • Added Eq/Ord on Range<T> is appropriate.
  • Hash impl hashes each bound explicitly.

Also applies to: 2547-2565, 2567-2573


2657-2664: LGTM: Geo filter types derive Eq/Hash and use OrderedFloat radius.

Box and polygon become set/map-friendly; GeoRadius’s radius: OrderedFloat<f64> plus .0 in checks is consistent. Fix the hash impl per earlier comment.

Also applies to: 2684-2691, 2703-2706


2721-2723: LGTM: Downstream geo conversions use inner f64.

.0 access in PolygonWrapper::check_point and GeoPolygon::convert keeps geo crate interop correct.

Also applies to: 2773-2776, 2789-2792


2829-2858: LGTM: FieldCondition Eq/Hash and float Range API.

Derives and new_range signature align with the OrderedFloat shift, improving determinism in collections and caching.

Also applies to: 2875-2887


3121-3132: LGTM: Nested/NestedCondition now Hash/Eq.

Enables reliable dedup and memoization of nested filters.


3155-3179: LGTM: Condition Eq/Hash and custom checker constructor.

Condition is now collection-safe; new_custom + serde(skip) keeps custom checkers internal to server APIs. Make sure OpenAPI schema excludes/ignores this variant.

Also applies to: 3182-3185, 3187-3188


3502-3510: LGTM: MinShould/Filter derive Hash/Eq.

Consistent with enclosing Condition; helps with filter caching and deduping. Behavior remains order-sensitive (by design).

Also applies to: 3513-3548

Comment on lines 223 to 226
let removed_geo_hashes: Vec<_> = removed_geo_points
.into_iter()
.map(|geo_point| encode_max_precision(geo_point.lon, geo_point.lat).unwrap())
.map(|geo_point| encode_max_precision(geo_point.lon.0, geo_point.lat.0).unwrap())
.collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid unwrap() on geohash encode; handle malformed points gracefully.

A bad/corrupted geohash input will panic during mmap load. Prefer skipping with a warning to keep the index load resilient.

-            let removed_geo_hashes: Vec<_> = removed_geo_points
-                .into_iter()
-                .map(|geo_point| encode_max_precision(geo_point.lon.0, geo_point.lat.0).unwrap())
-                .collect();
+            let removed_geo_hashes: Vec<_> = removed_geo_points
+                .into_iter()
+                .filter_map(|geo_point| {
+                    match encode_max_precision(geo_point.lon.0, geo_point.lat.0) {
+                        Ok(hash) => Some(hash),
+                        Err(e) => {
+                            log::warn!("Skipping malformed GeoPoint during mmap load: {e}");
+                            None
+                        }
+                    }
+                })
+                .collect();
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let removed_geo_hashes: Vec<_> = removed_geo_points
.into_iter()
.map(|geo_point| encode_max_precision(geo_point.lon, geo_point.lat).unwrap())
.map(|geo_point| encode_max_precision(geo_point.lon.0, geo_point.lat.0).unwrap())
.collect();
let removed_geo_hashes: Vec<_> = removed_geo_points
.into_iter()
.filter_map(|geo_point| {
match encode_max_precision(geo_point.lon.0, geo_point.lat.0) {
Ok(hash) => Some(hash),
Err(e) => {
log::warn!("Skipping malformed GeoPoint during mmap load: {e}");
None
}
}
})
.collect();
🤖 Prompt for AI Agents
In lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs around
lines 223-226, avoid unwrapping the result of encode_max_precision which can
panic on malformed/corrupted geo points; instead filter and collect only
successful encodings and handle errors by logging a warning with identifying
context (e.g., the offending lon/lat and index or key) so the load continues
resiliently. Replace the map(...unwrap()) with an iterator that matches on the
Result, pushes Ok values into the collected Vec, and for Err emits a warning (do
not abort), ensuring the rest of removed_geo_hashes are processed normally.

Comment on lines 360 to 363
for removed_geo_point in removed_geo_points {
let removed_geo_hash: GeoHash =
encode_max_precision(removed_geo_point.lon, removed_geo_point.lat).unwrap();
encode_max_precision(removed_geo_point.lon.0, removed_geo_point.lat.0).unwrap();
removed_geo_hashes.push(removed_geo_hash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Propagate encode errors instead of unwrap() in remove_point.

This path can also panic on malformed data. Return an OperationError instead.

-            let removed_geo_hash: GeoHash =
-                encode_max_precision(removed_geo_point.lon.0, removed_geo_point.lat.0).unwrap();
+            let removed_geo_hash: GeoHash = encode_max_precision(
+                removed_geo_point.lon.0,
+                removed_geo_point.lat.0,
+            )
+            .map_err(|e| {
+                OperationError::service_error(format!("Malformed geo point in remove_point: {e}"))
+            })?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for removed_geo_point in removed_geo_points {
let removed_geo_hash: GeoHash =
encode_max_precision(removed_geo_point.lon, removed_geo_point.lat).unwrap();
encode_max_precision(removed_geo_point.lon.0, removed_geo_point.lat.0).unwrap();
removed_geo_hashes.push(removed_geo_hash);
for removed_geo_point in removed_geo_points {
let removed_geo_hash: GeoHash = encode_max_precision(
removed_geo_point.lon.0,
removed_geo_point.lat.0,
)
.map_err(|e| {
OperationError::service_error(format!("Malformed geo point in remove_point: {e}"))
})?;
removed_geo_hashes.push(removed_geo_hash);
🤖 Prompt for AI Agents
In lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs around
lines 360 to 363, replace the unwrap() on encode_max_precision so encoding
failures are propagated as an OperationError instead of panicking: call
encode_max_precision(...)? (or encode_max_precision(...).map_err(|e|
OperationError::...)? to convert the encode error into the crate's
OperationError variant) and return the error from remove_point; ensure the
function's return type permits returning an OperationError and update any
map_err conversion to produce an appropriate OperationError message indicating
geo encoding failed.

Comment on lines +118 to 122
Some(Self {
lon: OrderedFloat(lon),
lat: OrderedFloat(lat),
})
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Write f64s from OrderedFloat inner field; current code won’t compile.

OrderedFloat<f64> doesn’t implement zerocopy write; write the inner f64.

Apply:

     fn read_from_mmap(bytes: &[u8]) -> Option<Self> {
         let (lon, bytes) = f64::read_from_prefix(bytes).ok()?;
         let (lat, _) = f64::read_from_prefix(bytes).ok()?;
-
-        Some(Self {
-            lon: OrderedFloat(lon),
-            lat: OrderedFloat(lat),
-        })
+        Some(Self { lon: OrderedFloat(lon), lat: OrderedFloat(lat) })
     }
 
     fn write_to_mmap(value: Self, bytes: &mut [u8]) -> Option<()> {
-        value.lon.write_to_prefix(bytes).ok()?;
+        value.lon.0.write_to_prefix(bytes).ok()?;
         bytes
             .get_mut(std::mem::size_of::<f64>()..)
-            .and_then(|bytes| value.lat.write_to_prefix(bytes).ok())
+            .and_then(|bytes| value.lat.0.write_to_prefix(bytes).ok())
     }

Also applies to: 124-129

🤖 Prompt for AI Agents
In lib/segment/src/index/field_index/mmap_point_to_values.rs around lines
118-122 and similarly 124-129, the code attempts to write OrderedFloat<f64>
values via zerocopy but OrderedFloat does not implement the required trait;
change the writes/read-writes to use the inner f64 values instead (access the .0
field of OrderedFloat when writing to or reading from the mmap/zerocopy buffer),
ensuring all read/write calls use plain f64 and preserve wrapping/unwrapping to
OrderedFloat only at the API boundaries.

@generall generall merged commit 29a1197 into dev Oct 20, 2025
15 checks passed
@generall generall deleted the make-filter-eq branch October 20, 2025 15:27
timvisee pushed a commit that referenced this pull request Nov 14, 2025
* derive Eq for Filter (and underlying types)

* fix tests

* use ptr to compare and hash CustomIdChecker

* fix gpu test

* fmt

* use OrderedFloat directly

* post-rebase fixes

* fmt

---------

Co-authored-by: Luis Cossío <luis.cossio@outlook.com>
@coderabbitai coderabbitai bot mentioned this pull request Dec 7, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants