Skip to content

Remove is_stopped from RawScorer #6305

Merged
generall merged 2 commits intodevfrom
RawScorer_stopped
Apr 9, 2025
Merged

Remove is_stopped from RawScorer #6305
generall merged 2 commits intodevfrom
RawScorer_stopped

Conversation

@xzfc
Copy link
Member

@xzfc xzfc commented Apr 2, 2025

Depends on #6299 as it touches the same benchmark. Ignore first commit during the review.

This PR removes is_stopped flag from RawScorer as suggested in #6245 (comment). Instead, it is passed to search functions explicitly.

@xzfc xzfc requested review from generall and timvisee April 2, 2025 13:59
@xzfc xzfc changed the title Update hnsw_build_asymptotic benchmark Remove is_stopped from RawScorer Apr 2, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 2, 2025

📝 Walkthrough

Walkthrough

The changes primarily update the error handling and cancellation mechanics across various modules. A new conversion implementation for transforming a CancelledError into a CollectionError has been introduced. Benchmark, test, and production routines have been modified to incorporate a DEFAULT_STOPPED constant into method signatures, ensuring a consistent parameter for cancellation checks. Methods in the HNSW index and related graph layers now include an additional is_stopped parameter and return a CancellableResult, replacing previous implementations that handled cancellation flags differently. Several functions have been updated to remove the is_stopped parameter from constructors and initializations, shifting the cancellation check to be performed during method calls. In the test suites and GPU-related files, calls to search and scorer functions have been adjusted to either add or remove the DEFAULT_STOPPED argument accordingly, ensuring unified error propagation and stoppage behavior throughout the codebase.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b23bc08 and 6428f15.

📒 Files selected for processing (1)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/vector_storage/raw_scorer.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: lint

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (7)
lib/segment/benches/fixture.rs (3)

38-39: Consider handling potential I/O or parsing errors more gracefully.
Calls to unwrap() on lines 39, 43 can cause runtime panics when loading or constructing the graph. Even in benchmarks, it might be helpful to handle these errors gracefully and log them for easier troubleshooting.

A possible approach:

- let raw_scorer = vector_holder.get_raw_scorer(added_vector).unwrap();
+ let raw_scorer = match vector_holder.get_raw_scorer(added_vector) {
+     Ok(s) => s,
+     Err(err) => {
+         eprintln!("Failed to create raw scorer: {}", err);
+         return; // or handle differently
+     }
+ };

Also applies to: 41-44


65-65: Typo in progress bar message.
"Buildng" should be spelled "Building" for clarity and consistency.

- ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}")
+ ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}")

54-73: Confirm parallel build logic.
The logic that builds the initial points on a single thread and the remaining points in parallel is generally fine. However, ensure that the concurrency is beneficial for typical dataset sizes (and doesn’t degrade performance for smaller sets). Also, confirm that concurrency is safe if any environment calls these benchmarks simultaneously.

If you want, I can prepare a script or new test to measure performance differences and concurrency safety under multiple runs.

lib/segment/src/common/operation_error.rs (1)

215-217: Naming convention for cancellation error.
The struct CancelledError is concise and fine. However, consider whether the American-style “CanceledError” (one L) is more consistent with the rest of the codebase or commonly used standards. This is just a style preference; no functional changes needed.

lib/segment/benches/hnsw_build_asymptotic.rs (2)

37-45: Handle .unwrap() in benchmarks.
The .unwrap() calls in the benchmark loops reflect an assumption that index building and searching will always succeed. Consider converting them into graceful error logs or panics with messages to more easily diagnose potential issues during automated bench runs.

Also applies to: 55-66


150-155: Ensure memory constraints in large-scale benchmarks.
Allocating up to 10M or 20M vectors in memory is significant. Confirm your environment has enough resources for these benchmarks. Otherwise, out-of-memory conditions may skew benchmarking results or crash the process.

I can provide a resource usage script to monitor memory consumption across these large-scale benchmarks if needed.

Also applies to: 172-176

lib/segment/src/index/hnsw_index/graph_layers.rs (1)

67-68: Efficient cancellation checks in _search_on_level.
The introduction of the is_stopped parameter and the inline check_process_stopped(is_stopped)?; within the loop ensures timely cancellation. However, running this check in tight loops could potentially impact performance for large data sets. Consider adjusting the frequency of checks if performance benchmarks show overhead.

Also applies to: 73-74, 93-93

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dce4a94 and 6646045.

📒 Files selected for processing (24)
  • lib/collection/src/operations/types.rs (2 hunks)
  • lib/segment/benches/fixture.rs (1 hunks)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_search_graph.rs (4 hunks)
  • lib/segment/benches/scorer_mmap.rs (2 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/common/operation_error.rs (1 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (5 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (3 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
  • lib/segment/src/vector_storage/async_raw_scorer.rs (6 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (8 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (0 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (0 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🧰 Additional context used
🧬 Code Definitions (15)
lib/segment/benches/vector_search.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer (117-207)
lib/segment/src/data_types/query_context.rs (2)
  • is_stopped (52-54)
  • is_stopped (203-207)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
  • raw_scorer (147-165)
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • raw_scorer_from_query_scorer (553-568)
lib/segment/src/fixtures/index_fixtures.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • raw_scorer_impl (271-303)
lib/segment/benches/scorer_mmap.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/index/plain_vector_index.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer (117-207)
lib/segment/src/index/hnsw_index/hnsw.rs (4)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer (117-207)
lib/segment/src/index/hnsw_index/gpu/batched_points.rs (1)
  • points (92-94)
lib/segment/src/index/field_index/map_index/immutable_map_index.rs (1)
  • points (216-216)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (15-76)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (4)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
  • new (96-106)
  • graph (173-175)
lib/segment/src/vector_storage/async_raw_scorer.rs (3)
  • new (22-29)
  • new (43-57)
  • new (203-223)
lib/segment/src/index/hnsw_index/entry_points.rs (1)
  • new (34-39)
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
  • new (33-53)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (15-76)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/data_types/query_context.rs (2)
  • is_stopped (52-54)
  • is_stopped (203-207)
lib/segment/src/vector_storage/async_raw_scorer.rs (2)
lib/segment/src/data_types/query_context.rs (2)
  • is_stopped (52-54)
  • is_stopped (203-207)
lib/segment/src/vector_storage/raw_scorer.rs (2)
  • peek_top_all (94-98)
  • peek_top_all (955-962)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consensus
  • GitHub Check: test (macos-latest)
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (79)
lib/segment/src/fixtures/index_fixtures.rs (2)

19-19: Import statement updated to reflect refactored design pattern

The removal of DEFAULT_STOPPED from the import statement aligns with the PR objective of removing the is_stopped flag from the RawScorer component. This import change is consistent with the overall design shift toward passing the stop flag explicitly to search functions rather than storing it within the RawScorer.


142-150: Implementation correctly follows the new design pattern

The get_raw_scorer method has been properly updated to no longer pass the DEFAULT_STOPPED parameter, which is consistent with the removal of the is_stopped flag from the RawScorer component. This change ensures the code follows the new pattern of passing stop flags explicitly to search functions instead.

#!/bin/bash
# Verify that the raw_scorer_impl function signature has been updated
rg -A 2 "fn raw_scorer_impl" lib/segment/src/vector_storage/
lib/segment/benches/vector_search.rs (2)

17-19: Import correctly includes DEFAULT_STOPPED constant

The import statement has been updated to include DEFAULT_STOPPED, which is necessary for the explicit passing of the stop flag to search functions. This change aligns with the PR's objective of removing the is_stopped flag from the RawScorer component.


76-77: Search function signature updated to include stop flag parameter

The peek_top_all method call has been properly updated to include the &DEFAULT_STOPPED parameter, implementing the new pattern of explicitly passing stop flags to search functions rather than storing them within the RawScorer. This change ensures consistency with the overall codebase refactoring.

lib/segment/benches/scorer_mmap.rs (2)

15-17: Import statement properly includes DEFAULT_STOPPED constant

The import statement has been updated to include DEFAULT_STOPPED, which is necessary for the new design pattern of explicitly passing stop flags to search functions.


73-74: Search function call correctly includes stop flag parameter

The peek_top_all method call has been properly updated to include the &DEFAULT_STOPPED parameter, which is consistent with the PR's objective of explicitly passing stop flags to search functions rather than storing them within the RawScorer.

lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1)

117-117: Function call updated to match new signature without is_stopped parameter

The async_raw_scorer::new function call has been updated to remove the is_stopped parameter, aligning with the PR's objective of removing the is_stopped flag from the RawScorer component. This change is consistent with the pattern now being implemented throughout the codebase.

#!/bin/bash
# Verify that the async_raw_scorer::new function signature has been updated
rg -A 5 "async_raw_scorer::new" lib/segment/src/vector_storage/
lib/collection/src/operations/types.rs (1)

1251-1255: LGTM: Implementation for From<CancelledError> for CollectionError added.

This addition allows direct conversion from CancelledError to CollectionError, which is consistent with the PR objective of streamlining error propagation for cancellation.

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (4)

16-16: Added required import for DEFAULT_STOPPED.

This import aligns with the PR objective to use the explicit stopped flag parameter.


32-40: Updated search_entry to include explicit stopped flag parameter.

The method now takes &DEFAULT_STOPPED as an additional parameter, consistent with the refactoring to remove internal handling of the stop condition from RawScorer.


42-50: Updated search_on_level to include explicit stopped flag parameter.

Similar to the above change, this method now takes &DEFAULT_STOPPED as an additional parameter.


91-93: Updated graph_layers.search to include explicit stopped flag parameter.

The search method now takes &DEFAULT_STOPPED as its last parameter, consistent with the overall pattern of making cancellation checks explicit.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)

23-23: Added DEFAULT_STOPPED to imports.

This import is necessary for the updated method calls throughout the file.


130-132: Updated peek_top_iter to include explicit stopped flag parameter.

The method now takes &DEFAULT_STOPPED as an additional parameter, consistent with the refactoring approach.


175-175: Updated peek_top_all to include explicit stopped flag parameter.

The implementation now passes &DEFAULT_STOPPED to the method call, matching the updated method signature.


238-240: Updated another peek_top_iter call to include explicit stopped flag parameter.

This change is consistent with the previous updates to method calls.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)

296-296: Updated import to include DEFAULT_STOPPED constant

The import statement now includes DEFAULT_STOPPED from the vector_storage module, which will be used in test functions for operation cancellation.


404-405: Added DEFAULT_STOPPED parameter to peek_top_all method call

The peek_top_all method now requires an additional parameter to handle operation cancellation. This replaces the previous approach where the RawScorer had an internal stopped flag.


410-413: Updated peek_top_iter method call to include DEFAULT_STOPPED parameter

The call to peek_top_iter has been updated to include the &DEFAULT_STOPPED parameter, consistent with the changes in signature of the scorer's methods. The code is also reformatted with line breaks for better readability.


489-491: Added DEFAULT_STOPPED parameter to peek_top_iter in test_delete_points function

Similar to the previous changes, the peek_top_iter method call now includes the &DEFAULT_STOPPED parameter to support the explicit cancellation mechanism.


516-518: Updated peek_top_iter in another test function

The call to peek_top_iter has been updated consistently with the other changes to include the &DEFAULT_STOPPED parameter.


541-541: Added DEFAULT_STOPPED parameter to peek_top_all in test_delete_points

Consistent with the other changes, this peek_top_all method call now includes the &DEFAULT_STOPPED parameter.


608-610: Added DEFAULT_STOPPED parameter to peek_top_iter in test_update_from_delete_points

The call to peek_top_iter has been updated to include the &DEFAULT_STOPPED parameter to support explicit cancellation.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)

18-20: Restructured imports and added DEFAULT_STOPPED

The imports have been reorganized to group related items together and include the DEFAULT_STOPPED constant, which is now used in the test functions.


88-90: Updated peek_top_iter to include the DEFAULT_STOPPED parameter

The peek_top_iter method call now includes the &DEFAULT_STOPPED parameter, matching the updated method signature that requires a reference to a stopping condition. This improves code consistency across the codebase.


184-186: Updated peek_top_iter in do_test_update_from_delete_points function

Added the &DEFAULT_STOPPED parameter to the peek_top_iter method call, consistent with the signature change to explicitly pass the stopping condition.

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7)

17-19: Restructured imports and added DEFAULT_STOPPED

The imports have been reorganized into a block-style format and now include the DEFAULT_STOPPED constant from the vector_storage module.


62-64: Updated peek_top_iter call in do_test_delete_points

The peek_top_iter method call now includes the &DEFAULT_STOPPED parameter and has been reformatted for better readability. This change is part of the broader refactoring to make cancellation explicit rather than internal.


85-87: Added DEFAULT_STOPPED parameter to another peek_top_iter call

Consistent with other changes, this peek_top_iter call has been updated to include the &DEFAULT_STOPPED parameter.


107-107: Updated peek_top_all call with DEFAULT_STOPPED parameter

The peek_top_all method now requires the &DEFAULT_STOPPED parameter, which has been added to this call.


168-170: Updated peek_top_iter in do_test_update_from_delete_points

The call to peek_top_iter has been updated to include the &DEFAULT_STOPPED parameter and reformatted for better readability.


216-218: Updated peek_top_iter in do_test_score_points

The call to peek_top_iter in the do_test_score_points function has been updated to include the &DEFAULT_STOPPED parameter and reformatted.


236-238: Updated another peek_top_iter call in do_test_score_points

Another call to peek_top_iter has been consistently updated to include the &DEFAULT_STOPPED parameter.

lib/segment/src/index/plain_vector_index.rs (3)

21-21: Updated import to use new_raw_scorer instead of new_stoppable_raw_scorer

The import statement now includes new_raw_scorer instead of new_stoppable_raw_scorer, reflecting the refactoring of how cancellation is handled.


112-124: Refactored to use new_raw_scorer and explicitly pass is_stopped parameter

The code now uses new_raw_scorer instead of new_stoppable_raw_scorer and has been updated to explicitly pass the &is_stopped parameter to the peek_top_iter method. Additionally, the error handling has been improved by using and_then instead of map, which provides better control flow for error propagation.


138-145: Updated unfiltered search to use new_raw_scorer and explicit is_stopped parameter

Similar to the filtered search case, the unfiltered search now uses new_raw_scorer instead of new_stoppable_raw_scorer and passes the &is_stopped parameter explicitly to the peek_top_all method. The error handling has also been improved with and_then.

lib/segment/src/index/sparse_index/sparse_vector_index.rs (4)

39-39: Correctly updated import for the new function name.

The import has been correctly updated to use new_raw_scorer instead of the previous new_stoppable_raw_scorer function, consistent with the PR objective of removing the is_stopped flag from the RawScorer.


300-305: Properly updated to use new raw scorer function.

The code now uses new_raw_scorer function, correctly removing the is_stopped parameter that was previously passed to new_stoppable_raw_scorer. This change aligns with the PR objective to simplify the raw scorer implementation.


318-318: Correctly added explicit stop condition to the search function.

The peek_top_iter function call now correctly receives the is_stopped parameter explicitly, following the new pattern of passing the stop condition directly to search functions rather than storing it in the RawScorer.


322-322: Correctly added explicit stop condition to the search function.

The peek_top_all function call now correctly receives the is_stopped parameter explicitly, following the same pattern as with the peek_top_iter function. This ensures consistent handling of stop conditions throughout the codebase.

lib/segment/benches/hnsw_search_graph.rs (5)

10-13: Properly updated imports for the new pattern.

The imports have been updated to:

  1. Use the new fixture module (line 10)
  2. Add the DEFAULT_STOPPED constant (line 13)

These changes support the PR's objective of explicitly passing stop conditions to search functions instead of embedding them in the scorer.


23-23: Introduced a new fixture module for better code organization.

Adding a dedicated fixture module improves code organization by extracting the graph generation logic into a reusable component, which aligns with good software engineering practices.


32-33: Simplified graph creation using the new fixture module.

The benchmark now uses fixture::make_cached_graph to create the test vectors and graph layers, which significantly simplifies the benchmark setup. Based on the provided context, this function encapsulates the vector generation and graph building logic that was previously inline.


43-47: Correctly updated search call with explicit stop condition.

The search function call now includes the &DEFAULT_STOPPED parameter, aligning with the PR's objective of passing stop conditions explicitly to search functions instead of storing them in the scorer.


60-64: Correctly updated second search call with explicit stop condition.

The second search call in the compressed benchmark also correctly includes the &DEFAULT_STOPPED parameter, ensuring consistent behavior across different benchmark tests.

lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (2)

161-161: Successfully removed the is_stopped parameter from raw scorer creation.

The call to raw_scorer_from_query_scorer has been updated to no longer pass the is_stopped parameter, which aligns with the PR objective to simplify the raw scorer implementation by removing the stop condition from it.


171-171: Consistently removed is_stopped from all raw scorer creation calls.

All instances of raw_scorer_from_query_scorer calls have been systematically updated to remove the is_stopped parameter. This consistency ensures that the refactoring is complete and no legacy usage patterns remain in the codebase.

Also applies to: 182-182, 192-192, 225-225, 237-237, 249-249, 261-261

lib/segment/src/index/hnsw_index/hnsw.rs (5)

57-57: Updated import to use the new raw scorer function.

The import has been correctly modified to use new_raw_scorer instead of the previous new_stoppable_raw_scorer, consistent with the PR objective.


65-65: Changed SINGLE_THREADED_HNSW_BUILD_THRESHOLD from private to public.

Making the SINGLE_THREADED_HNSW_BUILD_THRESHOLD constant public allows external components (like the new fixture module) to access this value, which is necessary for consistent graph building behavior across the codebase.

Also applies to: 67-67


365-378: Successfully updated all raw scorer creation calls.

All instances of raw scorer creation have been systematically updated to use new_raw_scorer instead of new_stoppable_raw_scorer, ensuring consistent implementation of the new pattern across the codebase.

Also applies to: 582-593, 691-702, 1037-1043, 1091-1096


840-846: Added explicit stop condition to graph search function.

The graph.search call now includes the explicit &is_stopped parameter, properly implementing the new pattern of passing stop conditions directly to search functions rather than embedding them in scorers.


912-912: Added explicit stop condition to peek_top_iter function.

The peek_top_iter call now correctly includes the &is_stopped parameter, consistent with the updated function signature and the PR's objective.

lib/segment/benches/fixture.rs (1)

27-33:

Details

❓ Verification inconclusive

Validate existence of Cargo environment variables.
The code constructs a cache path using Cargo environment variables (CARGO_TARGET_TMPDIR, CARGO_PKG_NAME, CARGO_CRATE_NAME). Please ensure these are always set in every environment where this benchmark is expected to run. Otherwise, env!() will panic if any are undefined.

Would you like to verify usage of these environment variables across the build and CI pipelines?


🏁 Script executed:

#!/bin/bash
# Verify usage of required environment variables in repository build scripts.
rg -A 5 'env!("CARGO_TARGET_TMPDIR"|CARGO_PKG_NAME|CARGO_CRATE_NAME)'

Length of output: 71


Attention: Confirm that Cargo environment variables are consistently available for the benchmark.
The cache path is built using env!("CARGO_TARGET_TMPDIR"), env!("CARGO_PKG_NAME"), and env!("CARGO_CRATE_NAME"). Our initial search did not yield evidence of explicit provisioning of these variables in build scripts or CI configurations. Please manually verify that your build/CI pipelines provide these environment variables (or that Cargo sets them automatically) to avoid any runtime panics when they’re missing. Specifically, review the configurations for environments where these benchmarks are run.

  • File: lib/segment/benches/fixture.rs (Lines 27-33)
  • Snippet:
        let path = Path::new(env!("CARGO_TARGET_TMPDIR"))
            .join(env!("CARGO_PKG_NAME"))
            .join(env!("CARGO_CRATE_NAME"))
            .join(format!(
                "{num_vectors}-{dim}-{m}-{ef_construct}-{use_heuristic}-{:?}",
                METRIC::distance(),
            ));
lib/segment/src/common/operation_error.rs (1)

229-234:

Details

✅ Verification successful

Assess atomic ordering.
Using Relaxed ordering in check_process_stopped might suffice if a best-effort check is enough. If there's a need for stronger memory ordering to guarantee visibility across threads, consider using Acquire or stronger.

Could you verify if there are other places in the code requiring stronger guarantees for this cancellation check?


🏁 Script executed:

#!/bin/bash
# Search for all references of `check_process_stopped` to verify usage context.
rg -A 5 'check_process_stopped'

Length of output: 33769


Atomic ordering usage confirmed: no changes required.
Our investigation shows that the check_process_stopped function is consistently invoked across various modules with Ordering::Relaxed, which aligns with its design as a best-effort cancellation check. There is no evidence from the usage contexts that a stronger memory ordering (like Acquire) is necessary at this time. If future concurrency scenarios reveal stricter consistency requirements, this decision can be revisited.

lib/segment/benches/hnsw_build_asymptotic.rs (3)

13-13: Verify default stopped usage.
DEFAULT_STOPPED is passed to each benchmark. This global atomic might remain false most of the time, so it won’t truly simulate an early stop condition in typical usage. If real cancellation scenarios need measuring, consider customizing or toggling this value mid-benchmark.

Do you want a separate benchmark that flips the atomic to test partial builds or partial scoring?


29-32: Lazy initialization improves memory usage.
Using LazyCell to delay constructing large data sets (5k or 1M vectors) is an effective approach to prevent incurring memory costs until the benchmark runs. This is especially helpful when multiple benchmarks exist, but only a few get selected. Good job!

Also applies to: 49-53


77-77: Consider overhead of freeing large sets.
Dropping setup_5k and setup_1m promptly is a great idea to avoid inflating memory usage across subsequent benchmarks. Since this is an explicit drop, ensure benchmarks are isolated so that performance results from the next test aren’t influenced by ongoing memory cleanup.

Would you like a follow-up script to check if large memory allocations persist due to lingering references?

Also applies to: 83-83

lib/segment/src/vector_storage/raw_scorer.rs (7)

15-17: Updated imports for improved error handling

The imports now include CancellableResult and check_process_stopped to support the new cancellation mechanism.


87-92: Function signature updated to support explicit cancellation

The peek_top_iter method now accepts an is_stopped parameter and returns a CancellableResult, which aligns with the PR objective to remove the is_stopped flag from RawScorer and pass it explicitly.


94-98: Consistently updated peek_top_all signature for cancellation

Similar to peek_top_iter, this method has been updated to accept an explicit is_stopped parameter and return a CancellableResult.


117-117: Function renamed for clarity

The function name has been simplified from new_stoppable_raw_scorer to new_raw_scorer, reflecting that the stopping functionality is now passed explicitly rather than being a special variant.


209-209: Added DEFAULT_STOPPED constant

A new static DEFAULT_STOPPED variable provides a default non-stopping state that can be used when cancellation isn't needed.


908-953: Updated implementation to use explicit cancellation check

The peek_top_iter implementation now:

  1. Uses check_process_stopped to verify if processing should be halted
  2. Returns a proper CancellableResult wrapped with Ok

This implementation properly handles the cancellation check in a centralized way.


955-962: Updated peek_top_all to forward cancellation flag

The implementation now correctly forwards the is_stopped parameter to peek_top_iter, maintaining the cancellation capability throughout the call chain.

lib/segment/src/vector_storage/async_raw_scorer.rs (4)

11-11: Updated import for cancellation support

Added CancellableResult to support the new cancellation mechanism in async operations.


128-141: Updated peek_top_iter signature and implementation for explicit cancellation

The method has been updated to:

  1. Accept an is_stopped parameter instead of using an internal flag
  2. Return a CancellableResult type
  3. Use the parameter in the take_while iterator adaptor to control iteration

This change aligns with the PR objective to make cancellation explicit.


160-171: Updated peek_top_all consistently with cancellation pattern

Similar to peek_top_iter, this method has been updated to accept an explicit is_stopped parameter, return a CancellableResult, and use the parameter for cancellation checking.


217-217: Removed is_stopped from builder

The storage field is now directly assigned without the intermediate is_stopped parameter, consistent with the pattern of not storing the cancellation state in the object.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6)

5-5: Added AtomicBool import

Added the necessary import for AtomicBool to support the explicit cancellation pattern.


384-387: Added explicit cancellation parameter to search calls

Now passing a newly created AtomicBool set to false to the search_entry method and unwrapping the result. This ensures compatibility with the updated method signature that supports cancellation.


436-438: Added explicit cancellation parameter to search level calls

Similar to the previous change, now passing a cancellation flag to _search_on_level and unwrapping the result, maintaining consistency with the cancellation pattern throughout the codebase.


583-583: Imported DEFAULT_STOPPED constant

Using the shared DEFAULT_STOPPED constant for test cases, ensuring consistent behavior when cancellation isn't needed.


743-745: Use DEFAULT_STOPPED in search calls in tests

Test cases now properly pass the DEFAULT_STOPPED constant to the search method and unwrap the result, adapting to the new cancellation pattern.


838-840: Consistently using DEFAULT_STOPPED in all test cases

Similar to the previous test case, this test has been updated to pass the cancellation flag and handle the result type, ensuring consistent testing of the new cancellation mechanism.

lib/segment/src/index/hnsw_index/graph_layers.rs (5)

4-4: Integrating cancellation mechanism imports looks consistent.
The newly imported AtomicBool, CancellableResult, OperationError, OperationResult, and check_process_stopped are well-aligned with the updated approach for cancellation support, ensuring that these methods can reliably handle operation termination.

Also applies to: 15-17


102-103: search_on_level signature update is coherent.
Passing is_stopped and returning a CancellableResult extends consistent cancellation handling to higher-level searches. This approach is well-structured for gracefully halting operations.

Also applies to: 108-115


126-127: search_entry gracefully handles cancellation.
Adding is_stopped with check_process_stopped inside loops provides a clean mechanism for early termination, improving user control over extended searches.

Also applies to: 135-136, 157-157


252-253: Refined search function with cancellation support.
Including an AtomicBool parameter and returning early if no entry point is found helps unify error handling. Subsequent calls to search_entry and search_on_level properly propagate cancellation, clarifying the request flow for callers.

Also applies to: 255-255, 263-264, 265-272


394-394: Test coverage of DEFAULT_STOPPED usage.
The updated test code demonstrates correct usage of the DEFAULT_STOPPED constant in functions like search and search_on_level. This confirms that the cancellation pathway is successfully integrated into testing, ensuring robust coverage.

Also applies to: 408-410, 451-462

Copy link
Contributor

@coszio coszio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does make more sense to me 🧠

Comment on lines 925 to 1017
for point_id in &mut *points {
if self.is_stopped.load(Ordering::Relaxed) {
break;
}
check_process_stopped(is_stopped)?;
if !self.check_vector(point_id) {
Copy link
Contributor

@coszio coszio Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it was the same before, but loading the atomic bool on every iteration seems wasteful, even with Ordering::Relaxed.

I made this change, to only check the flag every 1000th iteration and the benchmarks improved a bit

-            for point_id in &mut *points {
-                check_process_stopped(is_stopped)?;
+            for point_id in points.check_stop_every(1000, || check_process_stopped(is_stopped).is_err()) {
                if !self.check_vector(point_id) {
                    continue;
                }
                chunk[chunk_size] = point_id;
                chunk_size += 1;
                if chunk_size == VECTOR_READ_BATCH_SIZE {
                    break;
                }
            }
+           check_process_stopped(is_stopped)?;

(using diff to highlight the important bits)

hnsw-index-build-asymptotic/build-n-search-hnsw-5k
                        time:   [24.581 µs 24.694 µs 24.822 µs]
                        change: [+0.5515% +1.4465% +2.4173%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) high mild
  10 (10.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-1M
                        time:   [72.063 µs 72.978 µs 73.928 µs]
                        change: [-0.2191% +1.8195% +3.8988%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-1M-score-point
                        time:   [30.522 µs 30.890 µs 31.290 µs]
                        change: [-4.4593% -3.4018% -2.3604%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

scoring-vector/score-point
                        time:   [11.643 µs 11.678 µs 11.712 µs]
                        change: [-0.5989% -0.3534% -0.0951%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
scoring-vector/score-point-10x
                        time:   [11.618 µs 11.630 µs 11.644 µs]
                        change: [-1.8210% -1.3121% -0.7243%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
scoring-vector/score-point-50x
                        time:   [19.912 µs 20.070 µs 20.250 µs]
+                        change: [-10.370% -8.1031% -5.7106%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

scoring-vector/basic-score-point
                        time:   [47.388 µs 47.690 µs 48.017 µs]
                        change: [-1.9483% -1.0812% -0.1799%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
scoring-vector/basic-score-point-10x
                        time:   [45.719 µs 45.876 µs 46.058 µs]
+                        change: [-6.1074% -5.6304% -5.1639%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  17 (17.00%) high severe

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't confirm.

scoring-vector/score-point-50x calls RawScorer::score_points, and scoring-vector/basic-score-point-10x calls DotProductMetric::similarity. None are using peek_top_iter.

Other benchmark, storage-score-all/storage vector search which uses peek_top_all, seems not to be affected by this change.

Also, I'm not sure that checking atomic bool every iteration is worse than checking it every Nth iteration (at least, for one reader thread + one writer thread on x86).

On my test (code), it's slightly worse: (more is better)

check_every_nth_iteration:   6633641984
check_every_iteration:       8359090765

@xzfc xzfc force-pushed the RawScorer_stopped branch from 6646045 to 068725a Compare April 4, 2025 16:23
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/segment/src/common/operation_error.rs (3)

216-219: Well-designed error type for cancellation handling

The new CancelledError marker struct and CancellableResult<T> type alias provide a cleaner way to handle cancellation events. This simplifies error handling by removing the need for error descriptions at every cancellation point.

Consider adding documentation comments for these new types to explain their purpose and usage patterns.

+/// A simple marker type that indicates a process was cancelled
 #[derive(Debug, Copy, Clone)]
 pub struct CancelledError;

+/// Result type for operations that can be cancelled
 pub type CancellableResult<T> = Result<T, CancelledError>;

221-227: Use lowercase parameter names in trait implementations

The implementation correctly converts CancelledError to OperationError, but the parameter name CancelledError shadows the type name. This doesn't follow Rust naming conventions.

 impl From<CancelledError> for OperationError {
-    fn from(CancelledError: CancelledError) -> Self {
+    fn from(_error: CancelledError) -> Self {
         OperationError::Cancelled {
             description: PROCESS_CANCELLED_BY_SERVICE_MESSAGE.to_string(),
         }
     }
 }

229-234: Simplified cancellation check implementation

The new implementation of check_process_stopped is cleaner and more straightforward, returning a CancellableResult instead of an OperationResult. This aligns well with the PR objective of refactoring how stopping is handled across the codebase.

Consider adding documentation to explain the function's purpose and behavior, especially since this is a public API.

+/// Checks if a process is marked as stopped
+///
+/// Returns `Ok(())` if the process should continue, or `Err(CancelledError)` if it should stop
 pub fn check_process_stopped(stopped: &AtomicBool) -> CancellableResult<()> {
     if stopped.load(Ordering::Relaxed) {
         return Err(CancelledError);
     }
     Ok(())
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6646045 and 068725a.

📒 Files selected for processing (26)
  • lib/collection/src/operations/types.rs (2 hunks)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_search_graph.rs (4 hunks)
  • lib/segment/benches/scorer_mmap.rs (2 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/common/operation_error.rs (1 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (5 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (3 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
  • lib/segment/src/vector_storage/async_raw_scorer.rs (6 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (8 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (0 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (0 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (13)
  • lib/segment/benches/scorer_mmap.rs
  • lib/segment/src/fixtures/index_fixtures.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs
  • lib/collection/src/operations/types.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/index/hnsw_index/hnsw.rs
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs
  • lib/segment/benches/vector_search.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/async_raw_scorer.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
  • lib/segment/src/index/hnsw_index/graph_layers.rs
🧰 Additional context used
🧬 Code Definitions (7)
lib/segment/src/index/plain_vector_index.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer (117-207)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (15-76)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (1)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/segment/src/vector_storage/raw_scorer.rs (1)
  • new_raw_scorer_for_test (258-269)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
  • vector_storage (69-71)
lib/common/common/src/fixed_length_priority_queue.rs (1)
  • top (80-82)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (15-76)
🪛 GitHub Actions: Storage compatibility tests
lib/segment/src/vector_storage/raw_scorer.rs

[error] 234-234: cannot find value is_stopped in this scope


[error] 355-355: cannot find value is_stopped in this scope


[error] 462-462: cannot find value is_stopped in this scope


[error] 569-569: cannot find value is_stopped in this scope


[error] 693-693: cannot find value is_stopped in this scope


[error] 802-802: cannot find value is_stopped in this scope


[error] 226-226: this function takes 3 arguments but 4 arguments were supplied


[error] 347-347: this function takes 3 arguments but 4 arguments were supplied


[error] 561-561: this function takes 3 arguments but 4 arguments were supplied


[error] 685-685: this function takes 3 arguments but 4 arguments were supplied


[error] 794-794: this function takes 3 arguments but 4 arguments were supplied

🪛 GitHub Actions: Rust tests
lib/segment/src/vector_storage/raw_scorer.rs

[error] 234-234: cannot find value is_stopped in this scope


[error] 355-355: cannot find value is_stopped in this scope


[error] 462-462: cannot find value is_stopped in this scope


[error] 569-569: cannot find value is_stopped in this scope


[error] 693-693: cannot find value is_stopped in this scope


[error] 802-802: cannot find value is_stopped in this scope


[error] 911-911: cannot find value is_stopped in this scope


[error] 226-226: this function takes 3 arguments but 4 arguments were supplied


[error] 347-347: this function takes 3 arguments but 4 arguments were supplied


[error] 454-454: this function takes 3 arguments but 4 arguments were supplied


[error] 561-561: this function takes 3 arguments but 4 arguments were supplied


[error] 685-685: this function takes 3 arguments but 4 arguments were supplied


[error] 794-794: this function takes 3 arguments but 4 arguments were supplied


[error] 903-903: this function takes 3 arguments but 4 arguments were supplied

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test
🔇 Additional comments (44)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1)

723-723: Signature change aligns with removal of is_stopped from RawScorer

The method call to raw_scorer has been updated to remove the is_stopped parameter, reducing the argument count from five to four. This change aligns with the PR objective of removing the is_stopped flag from RawScorer.

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3)

482-482: Added DEFAULT_STOPPED import to support updated method signatures

Adding the DEFAULT_STOPPED constant to the imports enables the explicit passing of the stop condition to search functions, supporting the PR objective of removing is_stopped from RawScorer.


761-762: Updated search_on_level call to pass DEFAULT_STOPPED explicitly

The search_on_level method now receives an explicit stop condition parameter and returns a Result that needs to be unwrapped. This change aligns with the PR's approach to make cancellation checks more explicit.


988-990: Updated search_on_level call in test_gpu_heuristic to pass DEFAULT_STOPPED explicitly

Consistent with the other changes, the search_on_level method call in the heuristic test now receives the explicit stop condition parameter and handles the returned Result.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)

23-23: Added DEFAULT_STOPPED import to support updated scoring method signatures

Adding the DEFAULT_STOPPED constant to the imports enables the explicit passing of the stop condition to scoring methods like peek_top_iter and peek_top_all.


130-132: Updated peek_top_iter call to use explicit stop condition parameter

The peek_top_iter method now accepts an explicit stop condition parameter (&DEFAULT_STOPPED) and returns a Result that requires unwrapping, aligning with the PR's approach of making cancellation checks more explicit.


153-155: Updated second peek_top_iter call with explicit stop condition

Consistently with the previous call, this instance of peek_top_iter also passes the explicit stop condition parameter and handles the returned Result.


175-175: Updated peek_top_all call with explicit stop condition

The peek_top_all method now also accepts the explicit stop condition parameter, consistent with the overall approach to make cancellation checks explicit throughout the codebase.

lib/segment/src/index/hnsw_index/gpu/mod.rs (3)

116-116: Added DEFAULT_STOPPED import to support updated search method signatures

Adding the DEFAULT_STOPPED constant to the imports enables the explicit passing of the stop condition to the search functions in the test module.


242-244: Updated GPU search call to include explicit stop condition

The search method called on the GPU graph now accepts an explicit stop condition parameter and returns a Result that needs to be unwrapped, aligning with the PR's objective of making cancellation explicit.


253-255: Updated CPU search call to include explicit stop condition

The search method called on the CPU reference graph now also accepts an explicit stop condition parameter, maintaining consistency between GPU and CPU implementations.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)

18-20: Import adds DEFAULT_STOPPED constant

The import statement now includes the DEFAULT_STOPPED constant from the vector_storage module, which will be used as a parameter in scorer methods.


88-90: Scorer method now includes stop condition parameter

The peek_top_iter method call has been updated to include the &DEFAULT_STOPPED parameter, aligning with the PR's objective of passing the stop condition explicitly rather than having it internally managed by RawScorer.


184-186: Consistent addition of stop condition parameter

Similar to the previous update, this peek_top_iter call has been correctly modified to include the &DEFAULT_STOPPED parameter, ensuring consistency across the codebase with the new function signature.

lib/segment/src/index/plain_vector_index.rs (3)

21-21: Updated import to use new_raw_scorer

Import has been changed to use new_raw_scorer instead of the old new_stoppable_raw_scorer function, matching the PR's objective of removing internal stop condition handling from RawScorer.


112-124: Improved function call and error handling

This change includes:

  1. Replacing new_stoppable_raw_scorer with new_raw_scorer
  2. Changing from map to and_then for better error propagation
  3. Adding &is_stopped parameter to peek_top_iter

This properly implements the new approach of passing stop condition explicitly rather than having it internally in RawScorer.


138-145: Consistent updates for unfiltered search path

Similar to the filtered search path, the unfiltered search path has been updated to use new_raw_scorer and pass &is_stopped to peek_top_all. The error handling has also been improved using and_then.

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (4)

16-16: Added import for DEFAULT_STOPPED constant

Added import for the DEFAULT_STOPPED constant that will be used in search function calls.


32-40: Updated search_entry call with stop condition parameter

The search_entry method call has been updated to include &DEFAULT_STOPPED as a parameter, and now explicitly handles potential errors with .unwrap(). This is consistent with the PR's objective of passing stop conditions explicitly.


42-50: Updated search_on_level call with stop condition parameter

Similar to the previous update, the search_on_level method call has been modified to include &DEFAULT_STOPPED and now handles errors with .unwrap(). This maintains consistency with the new approach for handling stop conditions.


91-93: Updated graph_layers.search call with stop condition parameter

The search method call on graph_layers has been updated to include &DEFAULT_STOPPED and now handles errors with .unwrap(). This ensures that the test properly uses the new function signature.

lib/segment/benches/hnsw_search_graph.rs (5)

13-13: Added import for DEFAULT_STOPPED constant

Added import for the DEFAULT_STOPPED constant from the vector_storage module for use in search function calls.


23-23: Added fixture module

Introduced a new fixture module to simplify benchmark setup and improve code organization.


32-33: Simplified benchmark setup

Replaced complex vector generation and HNSW index building with a call to fixture::make_cached_graph. This refactoring improves code readability and maintainability by abstracting the setup logic into a reusable function.


43-47: Updated search call with stop condition parameter

The graph_layers.search method call has been updated to include &DEFAULT_STOPPED and now handles errors with .unwrap(). The code is now properly formatted with improved readability.


60-64: Consistently updated compressed benchmark search call

Similar to the uncompressed benchmark, the compressed benchmark's graph_layers.search call has been updated to include &DEFAULT_STOPPED and handle errors with .unwrap(). The code formatting has also been improved.

lib/segment/benches/hnsw_build_asymptotic.rs (10)

4-5: Import of LazyCell improves setup efficiency

The addition of LazyCell from std::cell is a good optimization that allows lazy initialization of test setups, improving resource allocation.


13-13: Standard import of DEFAULT_STOPPED for explicit stopping condition

Importing DEFAULT_STOPPED aligns with the PR objective to remove is_stopped from RawScorer and pass it explicitly to search functions.


22-22: Good modularization with fixture module

Moving the fixture creation code to a separate module improves code organization and reusability.


29-33: Improved resource management with LazyCell for test setup

Using LazyCell for the 5k vector setup ensures resources are only allocated when needed, which is especially important for benchmarks.


41-44: Explicit passing of stopping condition

The addition of &DEFAULT_STOPPED as a parameter to the search method aligns with the PR objective to make stopping conditions more explicit.


49-53: Good constant definition for vector count

Explicitly defining NUM_VECTORS as a constant improves code readability and maintainability.


55-56: Clear benchmark naming convention

Renaming from "build-n-search-hnsw-10x" to "build-n-search-hnsw-1M" improves clarity by clearly indicating the dataset size.


62-64: Consistent usage of explicit stopping condition

Consistent passing of &DEFAULT_STOPPED to search methods throughout the codebase ensures uniform error handling.


77-77: Improved vector generation using NUM_VECTORS

Using NUM_VECTORS in the random range for generating points to score aligns the test with the actual size of the dataset.


150-155: LazyCell for efficient vector generation

Using LazyCell for generating vectors ensures they are only created when needed, improving memory efficiency.

lib/segment/src/vector_storage/raw_scorer.rs (8)

15-17: Enhanced error handling with CancellableResult

Good improvement to imports by adding CancellableResult and related error handling utilities, which supports the new explicit cancellation approach.


87-92: Explicit cancellation parameter in peek_top_iter

Modifying the function signature to accept an explicit is_stopped parameter and return a CancellableResult improves error handling and makes cancellation logic more explicit.


94-98: Explicit cancellation parameter in peek_top_all

Same improvement as with peek_top_iter - adding explicit cancellation parameter and returning CancellableResult improves code clarity and error propagation.


117-118: Simplified raw scorer creation function

Renaming from new_stoppable_raw_scorer to new_raw_scorer and removing the is_stopped parameter reflects the design change to handle cancellation more explicitly.


912-913: Improved cancellation check in peek_top_iter implementation

The implementation now properly accepts the is_stopped parameter and returns a CancellableResult, aligning with the interface changes.


926-926: Consider optimizing the stopping condition check

Loading the atomic bool on every iteration could impact performance. Consider only checking the flag periodically (e.g., every 1000th iteration) as suggested in a previous review comment, which showed performance improvements.

-            for point_id in &mut *points {
-                check_process_stopped(is_stopped)?;
+            for (i, point_id) in (&mut *points).enumerate() {
+                if i % 1000 == 0 {
+                    check_process_stopped(is_stopped)?;
+                }
                if !self.check_vector(point_id) {
                    continue;
                }
                chunk[chunk_size] = point_id;
                chunk_size += 1;
                if chunk_size == VECTOR_READ_BATCH_SIZE {
                    break;
                }
            }
+           check_process_stopped(is_stopped)?;

952-952: Good error handling with CancellableResult

Wrapping the return value in Ok() correctly implements the CancellableResult return type, allowing proper error propagation.


955-961: Consistent implementation of peek_top_all

The implementation of peek_top_all correctly forwards the is_stopped parameter to peek_top_iter, ensuring consistent cancellation behavior throughout the code.

@xzfc xzfc force-pushed the RawScorer_stopped branch 2 times, most recently from 1e5d8e3 to b5b898d Compare April 4, 2025 18:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/benches/fixture.rs (1)

32-83: Efficient caching and parallel graph construction.

  1. The parallel approach with rayon's iterators is a good optimization for large vector counts.
  2. The single-threaded threshold usage prevents overhead for small inserts.
  3. Consider a small nitpick: line 75 displays "Buildng HNSW"; might fix the spelling to "Building HNSW".
-                ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}").unwrap(),
+                ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}").unwrap(),
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 068725a and 1e5d8e3.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • Cargo.toml (1 hunks)
  • lib/collection/src/operations/types.rs (2 hunks)
  • lib/segment/Cargo.toml (1 hunks)
  • lib/segment/benches/fixture.rs (1 hunks)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_search_graph.rs (4 hunks)
  • lib/segment/benches/scorer_mmap.rs (2 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/common/operation_error.rs (1 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (5 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (3 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
  • lib/segment/src/vector_storage/async_raw_scorer.rs (6 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (8 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (0 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (0 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
🚧 Files skipped from review as they are similar to previous changes (20)
  • lib/segment/benches/vector_search.rs
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs
  • lib/segment/benches/scorer_mmap.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/fixtures/index_fixtures.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/collection/src/operations/types.rs
  • lib/segment/src/index/plain_vector_index.rs
  • lib/segment/src/common/operation_error.rs
  • lib/segment/src/index/hnsw_index/hnsw.rs
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
  • lib/segment/benches/hnsw_build_asymptotic.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
  • lib/segment/src/vector_storage/async_raw_scorer.rs
  • lib/segment/src/vector_storage/raw_scorer.rs
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (22-86)
🔇 Additional comments (32)
lib/segment/Cargo.toml (1)

32-32: LGTM: Adding humantime dependency for testing

The addition of the humantime dependency to the dev-dependencies section is correct and follows the project's pattern of using workspace-defined versions.

Cargo.toml (1)

180-180: LGTM: Adding humantime to workspace dependencies

The addition of the humantime crate (version 2.1.0) to the workspace dependencies is appropriate. This crate is useful for parsing and formatting human-readable time durations, which aligns with the database system's needs for reporting timing information.

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2)

17-19: LGTM: Updated imports to include DEFAULT_STOPPED

The import statement reorganization is clean and properly includes the new DEFAULT_STOPPED constant.


62-64: LGTM: Updated scorer method calls with DEFAULT_STOPPED parameter

These changes correctly implement the architectural change described in the PR objectives - instead of using an internal is_stopped flag in the RawScorer, the flag is now passed explicitly to the search methods. The addition of .unwrap() handles potential errors that might be returned from these methods now.

Also applies to: 85-87, 107-107, 168-170, 216-218, 236-238

lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)

16-16: LGTM: Added import for DEFAULT_STOPPED

The import of DEFAULT_STOPPED from the vector_storage module is appropriate for the changes made in this file.


32-40: LGTM: Updated search method calls with DEFAULT_STOPPED parameter

The search methods (search_entry, search_on_level, and search) have been consistently updated to include the &DEFAULT_STOPPED parameter and add error handling via unwrap(). This maintains consistency with the architectural change of passing cancellation flags explicitly rather than storing them in the RawScorer.

Also applies to: 42-50, 91-93

lib/segment/benches/hnsw_search_graph.rs (6)

10-10: No issues with the updated imports.
These newly imported items (FakeFilterContext, random_vector) appear consistent with the subsequent usage.


13-13: New import for DEFAULT_STOPPED.
Including the default atomic cancellation token aligns with the broader refactoring to pass an external stop flag.


23-23: Good modularization by introducing mod fixture.
This separation helps keep benchmark setup logic organized in its own module.


32-33: Streamlined setup with make_cached_graph.
Replacing manual graph construction with the helper function from fixture greatly improves readability and maintainability.


43-47: Explicitly passing a stop token enhances cancellation handling.
Adding &DEFAULT_STOPPED makes it consistent with the new design requiring an external atomic stop flag.


60-64: Maintaining consistent approach to cancellation checks.
Reusing &DEFAULT_STOPPED in the second benchmark phase is coherent. No issues found.

lib/segment/benches/fixture.rs (4)

1-15: Overall structure and imports look fine.
The usage of rayon, rand, and indicatif is appropriate for parallel processing with progress reporting. No immediate concerns regarding thread safety or concurrency, given that FakeFilterContext seems trivial.


16-21: Clear doc comments.
The added documentation on caching and deterministic generation of vectors is helpful, ensuring others can navigate and reuse this code.


22-31: Naming alignment and generic bounds.
The function name make_cached_graph is self-explanatory; the generic type parameter METRIC is well-bounded by Metric<f32> + Sync + Send.


88-92: Helper method effectively reads last-updated file data.
updated_ago is a neat utility for user feedback; no immediate issues. Just ensure any errors get logged for debugging.

lib/segment/src/index/hnsw_index/graph_layers.rs (16)

4-4: Importing AtomicBool.
Matches the new approach where is_stopped is passed in from outside to support cancellation.


15-17: Introducing CancellableResult and check_process_stopped.
These changes unify cancellation checks across methods. This is a clear improvement over embedding the flag directly.


67-68: Extending _search_on_level signature with is_stopped.
This extra parameter ensures early termination in the greedy search routine if the process is canceled.


73-74: Immediate cancellation check within the loop.
This frequent check is appropriate for potentially long-running loops, but watch out for the overhead of atomic loads in very large loops.


103-103: Parameter addition in search_on_level.
No concerns; the function properly returns a CancellableResult to propagate potential cancellation.


127-127: Added is_stopped to search_entry.
Aligns with the new design, letting the user or calling scope handle cancellation.


135-135: Cancellation check in search_entry.
The location is correct—cancellation is polled before each down-level search iteration.


252-253: Extended search signature.
Adds the is_stopped parameter to the top-level method, making the entire search pipeline cancellable.


255-255: Early return with an empty result.
If there is no valid entry point, returning an empty vector is safer than erroring out—this prevents confusion in upper layers.


263-264: Passing is_stopped further down.
Ensures cancellation is honored before the final zero-level step begins.


265-265: New line for chaining method call.
Minor addition to support the next call. Looks good.


266-271: Propagation of is_stopped parameter into search_on_level.
Keeps the call chain consistent, preventing partial subsearch from ignoring the stop request.


272-272: Proper final step.
Returning the filtered top results in a cancellable manner. Implementation remains consistent.


394-394: Import of DEFAULT_STOPPED.
Used in tests to provide a default non-canceled reference. Straightforward usage.


409-410: search call updated in the test helper.
Helps validate the new cancellation-aware search. No issues with usage.


452-462: search_on_level calls updated in tests.
The updated signature ensures these tests confirm the presence of the external cancellation token.

@xzfc xzfc force-pushed the RawScorer_stopped branch from b5b898d to 258073b Compare April 4, 2025 18:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (8)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)

32-50: Consider using expect(...) instead of unwrap()
In tests, replacing .unwrap() with .expect("search_in_builder failed") can provide more clarity if an error occurs, facilitating faster debugging.

-        .unwrap();
+        .expect("search_entry call failed in test");

91-93: Use expect(...) in place of unwrap() in test code
Similar rationale: clearer error messages help pinpoint failures quickly.

-            .unwrap()
+            .expect("search call failed in test");
lib/segment/benches/hnsw_search_graph.rs (2)

43-47: Use expect(...) for better error context
Consider providing a message in case of search failure to aid in diagnosing issues during benchmarking.

-                    .unwrap(),
+                    .expect("search in uncompressed bench failed"),

60-64: Replace .unwrap() with .expect(...)
Improves clarity if the search operation encounters unexpected behavior.

-                    .unwrap(),
+                    .expect("search in compressed bench failed"),
lib/segment/src/common/operation_error.rs (1)

229-230: Rust naming convention
Consider using a lowercase variable name for the parameter within fn from(CancelledError: CancelledError) to maintain standard Rust style.

lib/segment/benches/fixture.rs (3)

16-21: Helpful doc comments on caching
The explanation of caching mechanics is clear. Consider mentioning how the environment variable-based path might behave if these variables are not set.


75-75: Typo in progress style template
"Buildng" should be corrected to "Building."

-                ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}").unwrap(),
+                ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}").unwrap(),

88-92: Graceful fallback for file metadata
If the file metadata is unavailable or restricted, updated_ago will fail. Consider a more robust fallback or user-friendly error message.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1e5d8e3 and b5b898d.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • Cargo.toml (1 hunks)
  • lib/collection/src/operations/types.rs (2 hunks)
  • lib/segment/Cargo.toml (1 hunks)
  • lib/segment/benches/fixture.rs (1 hunks)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_search_graph.rs (4 hunks)
  • lib/segment/benches/scorer_mmap.rs (2 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/common/operation_error.rs (1 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (5 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (3 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
  • lib/segment/src/vector_storage/async_raw_scorer.rs (6 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (8 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (0 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (0 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (21)
  • Cargo.toml
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs
  • lib/collection/src/operations/types.rs
  • lib/segment/benches/vector_search.rs
  • lib/segment/Cargo.toml
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/segment/src/fixtures/index_fixtures.rs
  • lib/segment/src/index/hnsw_index/hnsw.rs
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
  • lib/segment/benches/scorer_mmap.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
  • lib/segment/src/index/plain_vector_index.rs
  • lib/segment/src/vector_storage/async_raw_scorer.rs
  • lib/segment/src/vector_storage/raw_scorer.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
🧰 Additional context used
🧬 Code Definitions (3)
lib/segment/src/common/operation_error.rs (1)
lib/collection/src/operations/types.rs (16)
  • from (98-105)
  • from (208-228)
  • from (277-298)
  • from (689-691)
  • from (709-711)
  • from (1194-1199)
  • from (1203-1208)
  • from (1212-1262)
  • from (1266-1268)
  • from (1272-1277)
  • from (1281-1286)
  • from (1290-1295)
  • from (1299-1304)
  • from (1308-1313)
  • from (1317-1322)
  • from (1326-1331)
lib/segment/benches/fixture.rs (1)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
  • new (96-106)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (22-86)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consensus-compose
🔇 Additional comments (36)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1)

16-16: Use of DEFAULT_STOPPED import is consistent with new cancellation approach
It's good to see the introduction of a dedicated atomic bool for handling cancellation checks.

lib/segment/src/common/operation_error.rs (3)

232-233: CancellableResult introduction is clear
Defining CancellableResult clarifies which operations may fail due to cancellation.


234-240: Consistent error conversion
This From<CancelledError> implementation cleanly integrates cancellations into the existing OperationError type.


242-247:

Details

❓ Verification inconclusive

Check memory ordering
Relaxed is likely sufficient for a cancellation check, but verify that it aligns with concurrency requirements. If stronger guarantees are needed, consider Acquire/SeqCst.


🏁 Script executed:

#!/bin/bash
# Searching for references to check_process_stopped usage to confirm concurrency assumptions
rg -A 5 "check_process_stopped"

Length of output: 33815


Memory Ordering in Cancellation Flag Check

The current implementation of check_process_stopped consistently uses Relaxed ordering throughout the codebase. This is acceptable if the sole purpose of the cancellation flag is to signal termination without enforcing happens-before relationships with other shared data. However, please verify that no additional synchronization is expected when cancellation is triggered. If later requirements demand that all preceding writes be visible after cancellation, consider switching the load (and the corresponding flag store) to use Acquire or even SeqCst ordering.

  • Verify that the cancellation flag isn’t relied upon for establishing memory visibility beyond a simple termination signal.
  • Confirm that the flag’s writer uses an appropriate memory ordering if stronger guarantees become necessary.
lib/segment/benches/hnsw_build_asymptotic.rs (15)

4-4: Improved resource management with LazyCell.

Using LazyCell is a good approach for lazy initialization of potentially expensive resources in benchmarks.


13-13: Added DEFAULT_STOPPED parameter to align with cancellation changes.

This import supports the updated search method signatures that now require an explicit cancellation flag.


22-22: Good modularization using fixture module.

Extracting common functionality to a separate module improves code organization and reusability.


29-32: Improved benchmarking setup with cached graph initialization.

Using LazyCell for lazy initialization of the graph ensures the setup cost doesn't affect benchmark measurements and the cached graph is only created when needed.


34-35: Renamed benchmark to clearly indicate vector size.

Changing from "build-n-search-hnsw" to "build-n-search-hnsw-5k" makes the benchmark purpose clearer.


41-44: Added cancellation flag to search method call.

The search method now includes &DEFAULT_STOPPED parameter and handles the Result with unwrap. This aligns with the changes to remove internal is_stopped handling from the RawScorer.


47-48: Added explicit resource cleanup.

Using drop(setup_5k) explicitly releases resources when they're no longer needed, which is important for benchmarks to avoid memory usage affecting subsequent tests.


49-53: Improved constant naming and vector size definition.

Using a named constant NUM_VECTORS with a clear value (1M) improves code readability and maintenance.


55-56: Renamed benchmark to accurately reflect vector count.

Changing from "build-n-search-hnsw-10x" to "build-n-search-hnsw-1M" clearly communicates the benchmark's purpose.


62-65: Updated search call with cancellation flag and error handling.

The search method now includes &DEFAULT_STOPPED parameter and unwraps the result, consistent with the changes across the codebase.


68-69: Renamed benchmark for clarity and consistency.

Updating from "build-n-search-hnsw-10x-score-point" to "build-n-search-hnsw-1M-score-point" provides a clearer indication of the benchmark's purpose.


77-77: Fixed random range to use correct vector count.

Using NUM_VECTORS instead of a hardcoded value ensures consistency with the actual number of vectors in the benchmark.


83-83: Added resource cleanup for 1M benchmark setup.

Explicitly dropping the setup ensures proper resource management between benchmarks.


150-155: Implemented LazyCell pattern for basic scoring benchmark.

This change provides consistent resource management across all benchmarks in the file.


171-176: Consistent LazyCell usage across benchmarks.

Using the same pattern for all benchmarks ensures consistent behavior and resource management.

lib/segment/src/index/hnsw_index/graph_layers.rs (17)

4-4: Added AtomicBool for cancellation support.

This import enables thread-safe cancellation checks across the codebase.


15-17: Enhanced error handling with cancellation support.

Adding CancellableResult and check_process_stopped enables proper propagation of cancellation signals throughout the code.


61-68: Updated method signature to support explicit cancellation.

The _search_on_level method now accepts an is_stopped parameter and returns a CancellableResult, enabling proper cancellation handling.


73-74: Added cancellation check during search operation.

The check_process_stopped call allows the operation to be cancelled at regular intervals during execution.


92-93: Updated return value to support cancellation pattern.

Returning Ok(()) aligns with the new CancellableResult return type.


96-103: Updated search_on_level signature with cancellation support.

Adding the is_stopped parameter and updating the return type to CancellableResult maintains consistency across the codebase.


108-115: Updated method call and return value to handle cancellation.

The call to _search_on_level now passes the is_stopped parameter and handles the result with proper error propagation.


120-127: Updated search_entry signature with cancellation support.

Adding the is_stopped parameter and updating the return type to CancellableResult maintains consistency across the codebase.


135-136: Added cancellation check within loop.

Checking for cancellation at the start of each iteration allows for responsive cancellation handling.


157-157: Updated return value to support cancellation.

Wrapping the result in Ok() aligns with the new return type.


246-253: Updated search method signature with cancellation support.

Adding the is_stopped parameter and updating the return type to CancellableResult completes the consistent pattern across all search-related methods.


255-256: Updated return value for early exit case.

Using Ok(Vec::default()) instead of returning directly maintains the cancellation result pattern.


258-264: Added cancellation parameter to search_entry call.

Passing the is_stopped parameter and handling the result with ? ensures proper cancellation propagation.


265-272: Updated search_on_level call with cancellation handling.

The updated call passes the cancellation flag and properly handles the result to propagate cancellation.


394-394: Added DEFAULT_STOPPED import for tests.

This import ensures tests can properly use the cancellation mechanism.


408-411: Updated test's search method call with cancellation parameter.

Tests now pass &DEFAULT_STOPPED to match the updated method signature and unwrap the result.


451-462: Updated test's search_on_level call to include cancellation parameter.

Updating the test ensures it remains compatible with the new method signature.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)

162-193: Consider updating test-only method for consistency.

The search_entry_on_level method is only used in tests but hasn't been updated to follow the new pattern with the is_stopped parameter and CancellableResult return type.

For consistency with the other search methods, consider updating this method:

fn search_entry_on_level(
    &self,
    entry_point: PointOffsetType,
    level: usize,
    points_scorer: &mut FilteredScorer,
+   is_stopped: &AtomicBool,
-) -> ScoredPointOffset {
+) -> CancellableResult<ScoredPointOffset> {
    let limit = self.get_m(level);
    let mut links: Vec<PointOffsetType> = Vec::with_capacity(2 * self.get_m(0));
    let mut current_point = ScoredPointOffset {
        idx: entry_point,
        score: points_scorer.score_point(entry_point),
    };

    let mut changed = true;
    while changed {
+       check_process_stopped(is_stopped)?;
        changed = false;

        links.clear();
        self.links_map(current_point.idx, level, |link| {
            links.push(link);
        });

        let scores = points_scorer.score_points(&mut links, limit);
        scores.iter().copied().for_each(|score_point| {
            if score_point.score > current_point.score {
                changed = true;
                current_point = score_point;
            }
        });
    }
-   current_point
+   Ok(current_point)
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5b898d and 258073b.

📒 Files selected for processing (24)
  • lib/segment/benches/hnsw_build_asymptotic.rs (4 hunks)
  • lib/segment/benches/hnsw_search_graph.rs (4 hunks)
  • lib/segment/benches/scorer_mmap.rs (2 hunks)
  • lib/segment/benches/vector_search.rs (2 hunks)
  • lib/segment/src/fixtures/index_fixtures.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (3 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (5 hunks)
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (3 hunks)
  • lib/segment/src/index/plain_vector_index.rs (3 hunks)
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs (3 hunks)
  • lib/segment/src/vector_storage/async_raw_scorer.rs (6 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (6 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (10 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (0 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (10 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (0 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (5 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3 hunks)
💤 Files with no reviewable changes (2)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (19)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
  • lib/segment/src/index/plain_vector_index.rs
  • lib/segment/src/index/hnsw_index/gpu/mod.rs
  • lib/segment/src/index/sparse_index/sparse_vector_index.rs
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs
  • lib/segment/benches/vector_search.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
  • lib/segment/src/fixtures/index_fixtures.rs
  • lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
  • lib/segment/src/index/hnsw_index/hnsw.rs
  • lib/segment/benches/scorer_mmap.rs
  • lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
  • lib/segment/src/vector_storage/raw_scorer.rs
  • lib/segment/src/vector_storage/async_raw_scorer.rs
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs
🧰 Additional context used
🧬 Code Definitions (2)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
  • make_cached_graph (22-86)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
  • links (125-127)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (36)
lib/segment/benches/hnsw_search_graph.rs (5)

10-11: Good addition of DEFAULT_STOPPED import

The import for DEFAULT_STOPPED is consistent with the PR's objective to pass the stopped flag explicitly to search functions instead of maintaining it internally in RawScorer.

Also applies to: 13-13


23-23: Good modularization with the fixture module

Adding the fixture module helps reduce code duplication and improves maintainability by extracting common functionality used across benchmarks.


32-33: Well-implemented code extraction

Replacing the detailed graph generation code with a single call to fixture::make_cached_graph significantly simplifies this file while preserving functionality. This improves maintainability and readability.


43-47: Correctly updated search function call

The search function call has been properly updated to include the new &DEFAULT_STOPPED parameter, aligning with the PR objective to pass the stopped flag explicitly rather than maintaining it internally.


60-64: Consistently updated search function call

This second search function call is also correctly updated with the &DEFAULT_STOPPED parameter, maintaining consistency throughout the codebase.

lib/segment/benches/hnsw_build_asymptotic.rs (14)

4-4: Good use of LazyCell for resource management

Using LazyCell for lazy initialization is a good improvement that allows benchmark setup to be created only when needed, improving memory usage efficiency.


13-13: Consistent use of DEFAULT_STOPPED

The import of DEFAULT_STOPPED is consistent with the changes in other files and the PR's objective to pass the stopped flag explicitly to search functions.


22-22: Good reuse of fixture module

Reusing the fixture module across benchmark files promotes code reuse and ensures consistent behavior in all benchmark tests.


29-33: Improved benchmark setup with LazyCell

This implementation uses LazyCell to initialize the 5k vector benchmark setup only when needed, which is more efficient than eagerly creating resources that might not be used immediately.


34-34: Clearer benchmark naming

Renaming the benchmark to "build-n-search-hnsw-5k" better reflects what's being tested and makes the benchmark results more meaningful.


41-43: Correctly updated search call with DEFAULT_STOPPED

The search function call has been properly updated to include the new &DEFAULT_STOPPED parameter, consistent with the PR objectives.


47-47: Good resource management with explicit drop

Explicitly dropping the LazyCell after use ensures resources are properly freed, which is especially important for large benchmark setups.


49-53: Well-structured initialization for 1M vectors

The constant definition followed by LazyCell initialization for the 1M vector benchmark setup is well structured and makes the code's intent clear.


55-55: Descriptive benchmark naming convention

Renaming the benchmarks to clearly indicate they're using 1M vectors improves clarity and makes benchmark results more interpretable.

Also applies to: 68-68


62-64: Consistently updated search function call

This search function call is correctly updated with the &DEFAULT_STOPPED parameter, maintaining consistency with other changes in the codebase.


77-77: Improved random range definition

Using NUM_VECTORS as the upper bound for the random range is more consistent and maintainable than a hardcoded value, as it will automatically adjust if the vector count changes.


83-83: Good cleanup with explicit drop

Explicitly dropping the setup after use ensures proper resource cleanup, which is important for benchmarks using large amounts of memory.


150-155: Consistent use of LazyCell pattern

The LazyCell pattern is consistently applied throughout the file for all benchmark setups that require significant memory. This promotes a consistent approach to resource management.

Also applies to: 171-176


168-168: Thorough resource cleanup

The explicit drops of all LazyCell instances ensure thorough resource cleanup throughout the benchmark execution.

Also applies to: 189-189

lib/segment/src/index/hnsw_index/graph_layers.rs (17)

4-4: Appropriate addition of AtomicBool import.

The addition of the AtomicBool import supports the new cancellation mechanism being implemented, which is consistent with the PR's objective of explicitly passing the stop condition flag.


15-17: Good restructuring of error handling imports.

The addition of cancellation-specific imports from operation_error aligns perfectly with the implementation of the cancellation mechanism. Using CancellableResult and check_process_stopped provides a clean way to handle operation cancellation.


67-68: Well-implemented signature change for cancellation support.

Adding the is_stopped parameter and updating the return type to CancellableResult<()> enables proper cancellation propagation in the search mechanism. This change aligns with the PR objective of explicitly passing the stop condition.


73-74: Effective placement of cancellation check.

The cancellation check is strategically placed at the beginning of each iteration in the search loop. This ensures responsive cancellation without significantly impacting performance, as it's not in the innermost loop of the score calculation.


92-93: Proper return value update.

Updating the return to explicitly use Ok(()) correctly implements the new return type and ensures the function conforms to the updated trait signature.


102-103: Consistent signature update for search_on_level method.

The addition of the is_stopped parameter and return type change to CancellableResult maintains consistency with the cancellation pattern being implemented throughout the search functionality.


108-116: Correct propagation of cancellation parameter.

The method properly passes the is_stopped parameter to the lower-level search function and handles the result with the ? operator, which will propagate any cancellation. The return value is also correctly wrapped in Ok().


126-127: Consistent cancellation support in search_entry method.

Adding the is_stopped parameter and updating the return type to CancellableResult<ScoredPointOffset> ensures cancellation capability is consistently implemented across all search-related methods.


135-136: Strategic placement of cancellation check.

Placing the cancellation check at the beginning of each level iteration in search_entry allows for timely cancellation between each search level without impacting the inner loop performance.


157-158: Proper return value update for search_entry.

The function now correctly returns the result wrapped in Ok() to conform to the updated return type.


252-253: Consistent implementation of cancellation support in search method.

Adding the is_stopped parameter and updating the return type to CancellableResult<Vec<ScoredPointOffset>> completes the implementation of cancellation support throughout the search hierarchy.


255-256: Updated return statement for consistency.

The empty result is now properly wrapped in Ok(Vec::default()) to match the updated return type. This ensures proper error handling when no entry point is found.


258-264: Consistent parameter passing in search method.

The is_stopped parameter is correctly passed to the search_entry method call, ensuring cancellation capability is propagated down the call stack.


265-272: Proper result handling for cancellation.

The search_on_level call now includes the is_stopped parameter, and both method calls use the ? operator for propagating cancellation errors. The final result is correctly wrapped in Ok().


394-394: Appropriate import for tests.

The import of DEFAULT_STOPPED is necessary for the updated test code to work with the new method signatures.


408-411: Updated test code for new method signature.

The test code correctly passes &DEFAULT_STOPPED to the search method and handles the result with .unwrap(), maintaining the test's functionality with the new API.


451-463: Test code correctly updated for new search_on_level signature.

The test for search_on_level has been properly updated to include the &DEFAULT_STOPPED parameter and handle the CancellableResult return type with .unwrap().

// and notify the user that they better use the default IO implementation.

pq.into_sorted_vec()
Ok(pq.into_sorted_vec())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check flag one more time before returning? If operation was cancelled we might have partial result. Since we make it explicit, it might be worth maintaining the declared guarantees

@timvisee timvisee force-pushed the RawScorer_stopped branch from 6428f15 to b23bc08 Compare April 8, 2025 14:13
@generall generall merged commit a534e95 into dev Apr 9, 2025
34 checks passed
@generall generall deleted the RawScorer_stopped branch April 9, 2025 08:54
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* Introduce CancelledError

* Remove is_stopped from RawScorer
@coderabbitai coderabbitai bot mentioned this pull request Oct 16, 2025
@coderabbitai coderabbitai bot mentioned this pull request Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants