Conversation
📝 WalkthroughWalkthroughThe changes primarily update the error handling and cancellation mechanics across various modules. A new conversion implementation for transforming a 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
lib/segment/benches/fixture.rs (3)
38-39: Consider handling potential I/O or parsing errors more gracefully.
Calls tounwrap()on lines 39, 43 can cause runtime panics when loading or constructing the graph. Even in benchmarks, it might be helpful to handle these errors gracefully and log them for easier troubleshooting.A possible approach:
- let raw_scorer = vector_holder.get_raw_scorer(added_vector).unwrap(); + let raw_scorer = match vector_holder.get_raw_scorer(added_vector) { + Ok(s) => s, + Err(err) => { + eprintln!("Failed to create raw scorer: {}", err); + return; // or handle differently + } + };Also applies to: 41-44
65-65: Typo in progress bar message.
"Buildng" should be spelled "Building" for clarity and consistency.- ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}") + ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}")
54-73: Confirm parallel build logic.
The logic that builds the initial points on a single thread and the remaining points in parallel is generally fine. However, ensure that the concurrency is beneficial for typical dataset sizes (and doesn’t degrade performance for smaller sets). Also, confirm that concurrency is safe if any environment calls these benchmarks simultaneously.If you want, I can prepare a script or new test to measure performance differences and concurrency safety under multiple runs.
lib/segment/src/common/operation_error.rs (1)
215-217: Naming convention for cancellation error.
The structCancelledErroris concise and fine. However, consider whether the American-style “CanceledError” (one L) is more consistent with the rest of the codebase or commonly used standards. This is just a style preference; no functional changes needed.lib/segment/benches/hnsw_build_asymptotic.rs (2)
37-45: Handle.unwrap()in benchmarks.
The.unwrap()calls in the benchmark loops reflect an assumption that index building and searching will always succeed. Consider converting them into graceful error logs or panics with messages to more easily diagnose potential issues during automated bench runs.Also applies to: 55-66
150-155: Ensure memory constraints in large-scale benchmarks.
Allocating up to 10M or 20M vectors in memory is significant. Confirm your environment has enough resources for these benchmarks. Otherwise, out-of-memory conditions may skew benchmarking results or crash the process.I can provide a resource usage script to monitor memory consumption across these large-scale benchmarks if needed.
Also applies to: 172-176
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
67-68: Efficient cancellation checks in_search_on_level.
The introduction of theis_stoppedparameter and the inlinecheck_process_stopped(is_stopped)?;within the loop ensures timely cancellation. However, running this check in tight loops could potentially impact performance for large data sets. Consider adjusting the frequency of checks if performance benchmarks show overhead.Also applies to: 73-74, 93-93
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
lib/collection/src/operations/types.rs(2 hunks)lib/segment/benches/fixture.rs(1 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_search_graph.rs(4 hunks)lib/segment/benches/scorer_mmap.rs(2 hunks)lib/segment/benches/vector_search.rs(2 hunks)lib/segment/src/common/operation_error.rs(1 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(11 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(5 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(3 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/vector_storage/async_raw_scorer.rs(6 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(6 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs(8 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(0 hunks)lib/segment/src/vector_storage/raw_scorer.rs(10 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(1 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(0 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🧰 Additional context used
🧬 Code Definitions (15)
lib/segment/benches/vector_search.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (2)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer(117-207)lib/segment/src/data_types/query_context.rs (2)
is_stopped(52-54)is_stopped(203-207)
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
raw_scorer(147-165)
lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
raw_scorer_from_query_scorer(553-568)
lib/segment/src/fixtures/index_fixtures.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
raw_scorer_impl(271-303)
lib/segment/benches/scorer_mmap.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/src/index/plain_vector_index.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer(117-207)
lib/segment/src/index/hnsw_index/hnsw.rs (4)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer(117-207)lib/segment/src/index/hnsw_index/gpu/batched_points.rs (1)
points(92-94)lib/segment/src/index/field_index/map_index/immutable_map_index.rs (1)
points(216-216)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(15-76)
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (4)
lib/segment/src/index/hnsw_index/hnsw.rs (2)
new(96-106)graph(173-175)lib/segment/src/vector_storage/async_raw_scorer.rs (3)
new(22-29)new(43-57)new(203-223)lib/segment/src/index/hnsw_index/entry_points.rs (1)
new(34-39)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (1)
new(33-53)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(15-76)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/data_types/query_context.rs (2)
is_stopped(52-54)is_stopped(203-207)
lib/segment/src/vector_storage/async_raw_scorer.rs (2)
lib/segment/src/data_types/query_context.rs (2)
is_stopped(52-54)is_stopped(203-207)lib/segment/src/vector_storage/raw_scorer.rs (2)
peek_top_all(94-98)peek_top_all(955-962)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (79)
lib/segment/src/fixtures/index_fixtures.rs (2)
19-19: Import statement updated to reflect refactored design patternThe removal of
DEFAULT_STOPPEDfrom the import statement aligns with the PR objective of removing theis_stoppedflag from theRawScorercomponent. This import change is consistent with the overall design shift toward passing the stop flag explicitly to search functions rather than storing it within the RawScorer.
142-150: Implementation correctly follows the new design patternThe
get_raw_scorermethod has been properly updated to no longer pass theDEFAULT_STOPPEDparameter, which is consistent with the removal of theis_stoppedflag from theRawScorercomponent. This change ensures the code follows the new pattern of passing stop flags explicitly to search functions instead.#!/bin/bash # Verify that the raw_scorer_impl function signature has been updated rg -A 2 "fn raw_scorer_impl" lib/segment/src/vector_storage/lib/segment/benches/vector_search.rs (2)
17-19: Import correctly includes DEFAULT_STOPPED constantThe import statement has been updated to include
DEFAULT_STOPPED, which is necessary for the explicit passing of the stop flag to search functions. This change aligns with the PR's objective of removing theis_stoppedflag from theRawScorercomponent.
76-77: Search function signature updated to include stop flag parameterThe
peek_top_allmethod call has been properly updated to include the&DEFAULT_STOPPEDparameter, implementing the new pattern of explicitly passing stop flags to search functions rather than storing them within theRawScorer. This change ensures consistency with the overall codebase refactoring.lib/segment/benches/scorer_mmap.rs (2)
15-17: Import statement properly includes DEFAULT_STOPPED constantThe import statement has been updated to include
DEFAULT_STOPPED, which is necessary for the new design pattern of explicitly passing stop flags to search functions.
73-74: Search function call correctly includes stop flag parameterThe
peek_top_allmethod call has been properly updated to include the&DEFAULT_STOPPEDparameter, which is consistent with the PR's objective of explicitly passing stop flags to search functions rather than storing them within theRawScorer.lib/segment/src/vector_storage/tests/async_raw_scorer.rs (1)
117-117: Function call updated to match new signature without is_stopped parameterThe
async_raw_scorer::newfunction call has been updated to remove theis_stoppedparameter, aligning with the PR's objective of removing theis_stoppedflag from theRawScorercomponent. This change is consistent with the pattern now being implemented throughout the codebase.#!/bin/bash # Verify that the async_raw_scorer::new function signature has been updated rg -A 5 "async_raw_scorer::new" lib/segment/src/vector_storage/lib/collection/src/operations/types.rs (1)
1251-1255: LGTM: Implementation forFrom<CancelledError> for CollectionErroradded.This addition allows direct conversion from
CancelledErrortoCollectionError, which is consistent with the PR objective of streamlining error propagation for cancellation.lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (4)
16-16: Added required import for DEFAULT_STOPPED.This import aligns with the PR objective to use the explicit stopped flag parameter.
32-40: Updated search_entry to include explicit stopped flag parameter.The method now takes
&DEFAULT_STOPPEDas an additional parameter, consistent with the refactoring to remove internal handling of the stop condition from RawScorer.
42-50: Updated search_on_level to include explicit stopped flag parameter.Similar to the above change, this method now takes
&DEFAULT_STOPPEDas an additional parameter.
91-93: Updated graph_layers.search to include explicit stopped flag parameter.The search method now takes
&DEFAULT_STOPPEDas its last parameter, consistent with the overall pattern of making cancellation checks explicit.lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)
23-23: Added DEFAULT_STOPPED to imports.This import is necessary for the updated method calls throughout the file.
130-132: Updated peek_top_iter to include explicit stopped flag parameter.The method now takes
&DEFAULT_STOPPEDas an additional parameter, consistent with the refactoring approach.
175-175: Updated peek_top_all to include explicit stopped flag parameter.The implementation now passes
&DEFAULT_STOPPEDto the method call, matching the updated method signature.
238-240: Updated another peek_top_iter call to include explicit stopped flag parameter.This change is consistent with the previous updates to method calls.
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7)
296-296: Updated import to include DEFAULT_STOPPED constantThe import statement now includes
DEFAULT_STOPPEDfrom thevector_storagemodule, which will be used in test functions for operation cancellation.
404-405: Added DEFAULT_STOPPED parameter to peek_top_all method callThe
peek_top_allmethod now requires an additional parameter to handle operation cancellation. This replaces the previous approach where theRawScorerhad an internal stopped flag.
410-413: Updated peek_top_iter method call to include DEFAULT_STOPPED parameterThe call to
peek_top_iterhas been updated to include the&DEFAULT_STOPPEDparameter, consistent with the changes in signature of the scorer's methods. The code is also reformatted with line breaks for better readability.
489-491: Added DEFAULT_STOPPED parameter to peek_top_iter in test_delete_points functionSimilar to the previous changes, the
peek_top_itermethod call now includes the&DEFAULT_STOPPEDparameter to support the explicit cancellation mechanism.
516-518: Updated peek_top_iter in another test functionThe call to
peek_top_iterhas been updated consistently with the other changes to include the&DEFAULT_STOPPEDparameter.
541-541: Added DEFAULT_STOPPED parameter to peek_top_all in test_delete_pointsConsistent with the other changes, this
peek_top_allmethod call now includes the&DEFAULT_STOPPEDparameter.
608-610: Added DEFAULT_STOPPED parameter to peek_top_iter in test_update_from_delete_pointsThe call to
peek_top_iterhas been updated to include the&DEFAULT_STOPPEDparameter to support explicit cancellation.lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)
18-20: Restructured imports and added DEFAULT_STOPPEDThe imports have been reorganized to group related items together and include the
DEFAULT_STOPPEDconstant, which is now used in the test functions.
88-90: Updated peek_top_iter to include the DEFAULT_STOPPED parameterThe
peek_top_itermethod call now includes the&DEFAULT_STOPPEDparameter, matching the updated method signature that requires a reference to a stopping condition. This improves code consistency across the codebase.
184-186: Updated peek_top_iter in do_test_update_from_delete_points functionAdded the
&DEFAULT_STOPPEDparameter to thepeek_top_itermethod call, consistent with the signature change to explicitly pass the stopping condition.lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (7)
17-19: Restructured imports and added DEFAULT_STOPPEDThe imports have been reorganized into a block-style format and now include the
DEFAULT_STOPPEDconstant from thevector_storagemodule.
62-64: Updated peek_top_iter call in do_test_delete_pointsThe
peek_top_itermethod call now includes the&DEFAULT_STOPPEDparameter and has been reformatted for better readability. This change is part of the broader refactoring to make cancellation explicit rather than internal.
85-87: Added DEFAULT_STOPPED parameter to another peek_top_iter callConsistent with other changes, this
peek_top_itercall has been updated to include the&DEFAULT_STOPPEDparameter.
107-107: Updated peek_top_all call with DEFAULT_STOPPED parameterThe
peek_top_allmethod now requires the&DEFAULT_STOPPEDparameter, which has been added to this call.
168-170: Updated peek_top_iter in do_test_update_from_delete_pointsThe call to
peek_top_iterhas been updated to include the&DEFAULT_STOPPEDparameter and reformatted for better readability.
216-218: Updated peek_top_iter in do_test_score_pointsThe call to
peek_top_iterin thedo_test_score_pointsfunction has been updated to include the&DEFAULT_STOPPEDparameter and reformatted.
236-238: Updated another peek_top_iter call in do_test_score_pointsAnother call to
peek_top_iterhas been consistently updated to include the&DEFAULT_STOPPEDparameter.lib/segment/src/index/plain_vector_index.rs (3)
21-21: Updated import to use new_raw_scorer instead of new_stoppable_raw_scorerThe import statement now includes
new_raw_scorerinstead ofnew_stoppable_raw_scorer, reflecting the refactoring of how cancellation is handled.
112-124: Refactored to use new_raw_scorer and explicitly pass is_stopped parameterThe code now uses
new_raw_scorerinstead ofnew_stoppable_raw_scorerand has been updated to explicitly pass the&is_stoppedparameter to thepeek_top_itermethod. Additionally, the error handling has been improved by usingand_theninstead ofmap, which provides better control flow for error propagation.
138-145: Updated unfiltered search to use new_raw_scorer and explicit is_stopped parameterSimilar to the filtered search case, the unfiltered search now uses
new_raw_scorerinstead ofnew_stoppable_raw_scorerand passes the&is_stoppedparameter explicitly to thepeek_top_allmethod. The error handling has also been improved withand_then.lib/segment/src/index/sparse_index/sparse_vector_index.rs (4)
39-39: Correctly updated import for the new function name.The import has been correctly updated to use
new_raw_scorerinstead of the previousnew_stoppable_raw_scorerfunction, consistent with the PR objective of removing theis_stoppedflag from theRawScorer.
300-305: Properly updated to use new raw scorer function.The code now uses
new_raw_scorerfunction, correctly removing theis_stoppedparameter that was previously passed tonew_stoppable_raw_scorer. This change aligns with the PR objective to simplify the raw scorer implementation.
318-318: Correctly added explicit stop condition to the search function.The
peek_top_iterfunction call now correctly receives theis_stoppedparameter explicitly, following the new pattern of passing the stop condition directly to search functions rather than storing it in the RawScorer.
322-322: Correctly added explicit stop condition to the search function.The
peek_top_allfunction call now correctly receives theis_stoppedparameter explicitly, following the same pattern as with thepeek_top_iterfunction. This ensures consistent handling of stop conditions throughout the codebase.lib/segment/benches/hnsw_search_graph.rs (5)
10-13: Properly updated imports for the new pattern.The imports have been updated to:
- Use the new fixture module (line 10)
- Add the DEFAULT_STOPPED constant (line 13)
These changes support the PR's objective of explicitly passing stop conditions to search functions instead of embedding them in the scorer.
23-23: Introduced a new fixture module for better code organization.Adding a dedicated fixture module improves code organization by extracting the graph generation logic into a reusable component, which aligns with good software engineering practices.
32-33: Simplified graph creation using the new fixture module.The benchmark now uses
fixture::make_cached_graphto create the test vectors and graph layers, which significantly simplifies the benchmark setup. Based on the provided context, this function encapsulates the vector generation and graph building logic that was previously inline.
43-47: Correctly updated search call with explicit stop condition.The search function call now includes the
&DEFAULT_STOPPEDparameter, aligning with the PR's objective of passing stop conditions explicitly to search functions instead of storing them in the scorer.
60-64: Correctly updated second search call with explicit stop condition.The second search call in the compressed benchmark also correctly includes the
&DEFAULT_STOPPEDparameter, ensuring consistent behavior across different benchmark tests.lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs (2)
161-161: Successfully removed the is_stopped parameter from raw scorer creation.The call to
raw_scorer_from_query_scorerhas been updated to no longer pass theis_stoppedparameter, which aligns with the PR objective to simplify the raw scorer implementation by removing the stop condition from it.
171-171: Consistently removed is_stopped from all raw scorer creation calls.All instances of
raw_scorer_from_query_scorercalls have been systematically updated to remove theis_stoppedparameter. This consistency ensures that the refactoring is complete and no legacy usage patterns remain in the codebase.Also applies to: 182-182, 192-192, 225-225, 237-237, 249-249, 261-261
lib/segment/src/index/hnsw_index/hnsw.rs (5)
57-57: Updated import to use the new raw scorer function.The import has been correctly modified to use
new_raw_scorerinstead of the previousnew_stoppable_raw_scorer, consistent with the PR objective.
65-65: Changed SINGLE_THREADED_HNSW_BUILD_THRESHOLD from private to public.Making the
SINGLE_THREADED_HNSW_BUILD_THRESHOLDconstant public allows external components (like the new fixture module) to access this value, which is necessary for consistent graph building behavior across the codebase.Also applies to: 67-67
365-378: Successfully updated all raw scorer creation calls.All instances of raw scorer creation have been systematically updated to use
new_raw_scorerinstead ofnew_stoppable_raw_scorer, ensuring consistent implementation of the new pattern across the codebase.Also applies to: 582-593, 691-702, 1037-1043, 1091-1096
840-846: Added explicit stop condition to graph search function.The
graph.searchcall now includes the explicit&is_stoppedparameter, properly implementing the new pattern of passing stop conditions directly to search functions rather than embedding them in scorers.
912-912: Added explicit stop condition to peek_top_iter function.The
peek_top_itercall now correctly includes the&is_stoppedparameter, consistent with the updated function signature and the PR's objective.lib/segment/benches/fixture.rs (1)
27-33:Details
❓ Verification inconclusive
Validate existence of Cargo environment variables.
The code constructs a cache path using Cargo environment variables (CARGO_TARGET_TMPDIR,CARGO_PKG_NAME,CARGO_CRATE_NAME). Please ensure these are always set in every environment where this benchmark is expected to run. Otherwise,env!()will panic if any are undefined.Would you like to verify usage of these environment variables across the build and CI pipelines?
🏁 Script executed:
#!/bin/bash # Verify usage of required environment variables in repository build scripts. rg -A 5 'env!("CARGO_TARGET_TMPDIR"|CARGO_PKG_NAME|CARGO_CRATE_NAME)'Length of output: 71
Attention: Confirm that Cargo environment variables are consistently available for the benchmark.
The cache path is built usingenv!("CARGO_TARGET_TMPDIR"),env!("CARGO_PKG_NAME"), andenv!("CARGO_CRATE_NAME"). Our initial search did not yield evidence of explicit provisioning of these variables in build scripts or CI configurations. Please manually verify that your build/CI pipelines provide these environment variables (or that Cargo sets them automatically) to avoid any runtime panics when they’re missing. Specifically, review the configurations for environments where these benchmarks are run.
- File:
lib/segment/benches/fixture.rs(Lines 27-33)- Snippet:
let path = Path::new(env!("CARGO_TARGET_TMPDIR")) .join(env!("CARGO_PKG_NAME")) .join(env!("CARGO_CRATE_NAME")) .join(format!( "{num_vectors}-{dim}-{m}-{ef_construct}-{use_heuristic}-{:?}", METRIC::distance(), ));lib/segment/src/common/operation_error.rs (1)
229-234:Details
✅ Verification successful
Assess atomic ordering.
UsingRelaxedordering incheck_process_stoppedmight suffice if a best-effort check is enough. If there's a need for stronger memory ordering to guarantee visibility across threads, consider usingAcquireor stronger.Could you verify if there are other places in the code requiring stronger guarantees for this cancellation check?
🏁 Script executed:
#!/bin/bash # Search for all references of `check_process_stopped` to verify usage context. rg -A 5 'check_process_stopped'Length of output: 33769
Atomic ordering usage confirmed: no changes required.
Our investigation shows that thecheck_process_stoppedfunction is consistently invoked across various modules withOrdering::Relaxed, which aligns with its design as a best-effort cancellation check. There is no evidence from the usage contexts that a stronger memory ordering (likeAcquire) is necessary at this time. If future concurrency scenarios reveal stricter consistency requirements, this decision can be revisited.lib/segment/benches/hnsw_build_asymptotic.rs (3)
13-13: Verify default stopped usage.
DEFAULT_STOPPEDis passed to each benchmark. This global atomic might remainfalsemost of the time, so it won’t truly simulate an early stop condition in typical usage. If real cancellation scenarios need measuring, consider customizing or toggling this value mid-benchmark.Do you want a separate benchmark that flips the atomic to test partial builds or partial scoring?
29-32: Lazy initialization improves memory usage.
UsingLazyCellto delay constructing large data sets (5k or 1M vectors) is an effective approach to prevent incurring memory costs until the benchmark runs. This is especially helpful when multiple benchmarks exist, but only a few get selected. Good job!Also applies to: 49-53
77-77: Consider overhead of freeing large sets.
Droppingsetup_5kandsetup_1mpromptly is a great idea to avoid inflating memory usage across subsequent benchmarks. Since this is an explicit drop, ensure benchmarks are isolated so that performance results from the next test aren’t influenced by ongoing memory cleanup.Would you like a follow-up script to check if large memory allocations persist due to lingering references?
Also applies to: 83-83
lib/segment/src/vector_storage/raw_scorer.rs (7)
15-17: Updated imports for improved error handlingThe imports now include
CancellableResultandcheck_process_stoppedto support the new cancellation mechanism.
87-92: Function signature updated to support explicit cancellationThe
peek_top_itermethod now accepts anis_stoppedparameter and returns aCancellableResult, which aligns with the PR objective to remove theis_stoppedflag fromRawScorerand pass it explicitly.
94-98: Consistently updatedpeek_top_allsignature for cancellationSimilar to
peek_top_iter, this method has been updated to accept an explicitis_stoppedparameter and return aCancellableResult.
117-117: Function renamed for clarityThe function name has been simplified from
new_stoppable_raw_scorertonew_raw_scorer, reflecting that the stopping functionality is now passed explicitly rather than being a special variant.
209-209: Added DEFAULT_STOPPED constantA new static
DEFAULT_STOPPEDvariable provides a default non-stopping state that can be used when cancellation isn't needed.
908-953: Updated implementation to use explicit cancellation checkThe
peek_top_iterimplementation now:
- Uses
check_process_stoppedto verify if processing should be halted- Returns a proper
CancellableResultwrapped withOkThis implementation properly handles the cancellation check in a centralized way.
955-962: Updatedpeek_top_allto forward cancellation flagThe implementation now correctly forwards the
is_stoppedparameter topeek_top_iter, maintaining the cancellation capability throughout the call chain.lib/segment/src/vector_storage/async_raw_scorer.rs (4)
11-11: Updated import for cancellation supportAdded
CancellableResultto support the new cancellation mechanism in async operations.
128-141: Updatedpeek_top_itersignature and implementation for explicit cancellationThe method has been updated to:
- Accept an
is_stoppedparameter instead of using an internal flag- Return a
CancellableResulttype- Use the parameter in the
take_whileiterator adaptor to control iterationThis change aligns with the PR objective to make cancellation explicit.
160-171: Updatedpeek_top_allconsistently with cancellation patternSimilar to
peek_top_iter, this method has been updated to accept an explicitis_stoppedparameter, return aCancellableResult, and use the parameter for cancellation checking.
217-217: Removedis_stoppedfrom builderThe
storagefield is now directly assigned without the intermediateis_stoppedparameter, consistent with the pattern of not storing the cancellation state in the object.lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6)
5-5: Added AtomicBool importAdded the necessary import for
AtomicBoolto support the explicit cancellation pattern.
384-387: Added explicit cancellation parameter to search callsNow passing a newly created
AtomicBoolset tofalseto thesearch_entrymethod and unwrapping the result. This ensures compatibility with the updated method signature that supports cancellation.
436-438: Added explicit cancellation parameter to search level callsSimilar to the previous change, now passing a cancellation flag to
_search_on_leveland unwrapping the result, maintaining consistency with the cancellation pattern throughout the codebase.
583-583: Imported DEFAULT_STOPPED constantUsing the shared
DEFAULT_STOPPEDconstant for test cases, ensuring consistent behavior when cancellation isn't needed.
743-745: Use DEFAULT_STOPPED in search calls in testsTest cases now properly pass the
DEFAULT_STOPPEDconstant to thesearchmethod and unwrap the result, adapting to the new cancellation pattern.
838-840: Consistently using DEFAULT_STOPPED in all test casesSimilar to the previous test case, this test has been updated to pass the cancellation flag and handle the result type, ensuring consistent testing of the new cancellation mechanism.
lib/segment/src/index/hnsw_index/graph_layers.rs (5)
4-4: Integrating cancellation mechanism imports looks consistent.
The newly importedAtomicBool,CancellableResult,OperationError,OperationResult, andcheck_process_stoppedare well-aligned with the updated approach for cancellation support, ensuring that these methods can reliably handle operation termination.Also applies to: 15-17
102-103:search_on_levelsignature update is coherent.
Passingis_stoppedand returning aCancellableResultextends consistent cancellation handling to higher-level searches. This approach is well-structured for gracefully halting operations.Also applies to: 108-115
126-127:search_entrygracefully handles cancellation.
Addingis_stoppedwithcheck_process_stoppedinside loops provides a clean mechanism for early termination, improving user control over extended searches.Also applies to: 135-136, 157-157
252-253: Refinedsearchfunction with cancellation support.
Including anAtomicBoolparameter and returning early if no entry point is found helps unify error handling. Subsequent calls tosearch_entryandsearch_on_levelproperly propagate cancellation, clarifying the request flow for callers.Also applies to: 255-255, 263-264, 265-272
394-394: Test coverage ofDEFAULT_STOPPEDusage.
The updated test code demonstrates correct usage of theDEFAULT_STOPPEDconstant in functions likesearchandsearch_on_level. This confirms that the cancellation pathway is successfully integrated into testing, ensuring robust coverage.Also applies to: 408-410, 451-462
coszio
left a comment
There was a problem hiding this comment.
This does make more sense to me 🧠
| for point_id in &mut *points { | ||
| if self.is_stopped.load(Ordering::Relaxed) { | ||
| break; | ||
| } | ||
| check_process_stopped(is_stopped)?; | ||
| if !self.check_vector(point_id) { |
There was a problem hiding this comment.
I know it was the same before, but loading the atomic bool on every iteration seems wasteful, even with Ordering::Relaxed.
I made this change, to only check the flag every 1000th iteration and the benchmarks improved a bit
- for point_id in &mut *points {
- check_process_stopped(is_stopped)?;
+ for point_id in points.check_stop_every(1000, || check_process_stopped(is_stopped).is_err()) {
if !self.check_vector(point_id) {
continue;
}
chunk[chunk_size] = point_id;
chunk_size += 1;
if chunk_size == VECTOR_READ_BATCH_SIZE {
break;
}
}
+ check_process_stopped(is_stopped)?;(using diff to highlight the important bits)
hnsw-index-build-asymptotic/build-n-search-hnsw-5k
time: [24.581 µs 24.694 µs 24.822 µs]
change: [+0.5515% +1.4465% +2.4173%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) high mild
10 (10.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-1M
time: [72.063 µs 72.978 µs 73.928 µs]
change: [-0.2191% +1.8195% +3.8988%] (p = 0.08 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-1M-score-point
time: [30.522 µs 30.890 µs 31.290 µs]
change: [-4.4593% -3.4018% -2.3604%] (p = 0.00 < 0.05)
Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
5 (5.00%) high mild
6 (6.00%) high severe
scoring-vector/score-point
time: [11.643 µs 11.678 µs 11.712 µs]
change: [-0.5989% -0.3534% -0.0951%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
3 (3.00%) high mild
1 (1.00%) high severe
scoring-vector/score-point-10x
time: [11.618 µs 11.630 µs 11.644 µs]
change: [-1.8210% -1.3121% -0.7243%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
scoring-vector/score-point-50x
time: [19.912 µs 20.070 µs 20.250 µs]
+ change: [-10.370% -8.1031% -5.7106%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) high mild
6 (6.00%) high severe
scoring-vector/basic-score-point
time: [47.388 µs 47.690 µs 48.017 µs]
change: [-1.9483% -1.0812% -0.1799%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
scoring-vector/basic-score-point-10x
time: [45.719 µs 45.876 µs 46.058 µs]
+ change: [-6.1074% -5.6304% -5.1639%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) low mild
1 (1.00%) high mild
17 (17.00%) high severeThere was a problem hiding this comment.
Can't confirm.
scoring-vector/score-point-50x calls RawScorer::score_points, and scoring-vector/basic-score-point-10x calls DotProductMetric::similarity. None are using peek_top_iter.
Other benchmark, storage-score-all/storage vector search which uses peek_top_all, seems not to be affected by this change.
Also, I'm not sure that checking atomic bool every iteration is worse than checking it every Nth iteration (at least, for one reader thread + one writer thread on x86).
On my test (code), it's slightly worse: (more is better)
check_every_nth_iteration: 6633641984
check_every_iteration: 8359090765
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/segment/src/common/operation_error.rs (3)
216-219: Well-designed error type for cancellation handlingThe new
CancelledErrormarker struct andCancellableResult<T>type alias provide a cleaner way to handle cancellation events. This simplifies error handling by removing the need for error descriptions at every cancellation point.Consider adding documentation comments for these new types to explain their purpose and usage patterns.
+/// A simple marker type that indicates a process was cancelled #[derive(Debug, Copy, Clone)] pub struct CancelledError; +/// Result type for operations that can be cancelled pub type CancellableResult<T> = Result<T, CancelledError>;
221-227: Use lowercase parameter names in trait implementationsThe implementation correctly converts
CancelledErrortoOperationError, but the parameter nameCancelledErrorshadows the type name. This doesn't follow Rust naming conventions.impl From<CancelledError> for OperationError { - fn from(CancelledError: CancelledError) -> Self { + fn from(_error: CancelledError) -> Self { OperationError::Cancelled { description: PROCESS_CANCELLED_BY_SERVICE_MESSAGE.to_string(), } } }
229-234: Simplified cancellation check implementationThe new implementation of
check_process_stoppedis cleaner and more straightforward, returning aCancellableResultinstead of anOperationResult. This aligns well with the PR objective of refactoring how stopping is handled across the codebase.Consider adding documentation to explain the function's purpose and behavior, especially since this is a public API.
+/// Checks if a process is marked as stopped +/// +/// Returns `Ok(())` if the process should continue, or `Err(CancelledError)` if it should stop pub fn check_process_stopped(stopped: &AtomicBool) -> CancellableResult<()> { if stopped.load(Ordering::Relaxed) { return Err(CancelledError); } Ok(()) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (26)
lib/collection/src/operations/types.rs(2 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_search_graph.rs(4 hunks)lib/segment/benches/scorer_mmap.rs(2 hunks)lib/segment/benches/vector_search.rs(2 hunks)lib/segment/src/common/operation_error.rs(1 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(3 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(3 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(11 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(5 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(3 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/vector_storage/async_raw_scorer.rs(6 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(6 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs(8 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(0 hunks)lib/segment/src/vector_storage/raw_scorer.rs(10 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(1 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(0 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (13)
- lib/segment/benches/scorer_mmap.rs
- lib/segment/src/fixtures/index_fixtures.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/collection/src/operations/types.rs
- lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/hnsw.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/benches/vector_search.rs
- lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
- lib/segment/src/vector_storage/async_raw_scorer.rs
- lib/segment/src/index/hnsw_index/graph_layers_builder.rs
- lib/segment/src/index/hnsw_index/graph_layers.rs
🧰 Additional context used
🧬 Code Definitions (7)
lib/segment/src/index/plain_vector_index.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer(117-207)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(15-76)
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (1)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/segment/src/vector_storage/raw_scorer.rs (1)
new_raw_scorer_for_test(258-269)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)
lib/segment/src/index/sparse_index/sparse_vector_index.rs (1)
vector_storage(69-71)lib/common/common/src/fixed_length_priority_queue.rs (1)
top(80-82)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(15-76)
🪛 GitHub Actions: Storage compatibility tests
lib/segment/src/vector_storage/raw_scorer.rs
[error] 234-234: cannot find value is_stopped in this scope
[error] 355-355: cannot find value is_stopped in this scope
[error] 462-462: cannot find value is_stopped in this scope
[error] 569-569: cannot find value is_stopped in this scope
[error] 693-693: cannot find value is_stopped in this scope
[error] 802-802: cannot find value is_stopped in this scope
[error] 226-226: this function takes 3 arguments but 4 arguments were supplied
[error] 347-347: this function takes 3 arguments but 4 arguments were supplied
[error] 561-561: this function takes 3 arguments but 4 arguments were supplied
[error] 685-685: this function takes 3 arguments but 4 arguments were supplied
[error] 794-794: this function takes 3 arguments but 4 arguments were supplied
🪛 GitHub Actions: Rust tests
lib/segment/src/vector_storage/raw_scorer.rs
[error] 234-234: cannot find value is_stopped in this scope
[error] 355-355: cannot find value is_stopped in this scope
[error] 462-462: cannot find value is_stopped in this scope
[error] 569-569: cannot find value is_stopped in this scope
[error] 693-693: cannot find value is_stopped in this scope
[error] 802-802: cannot find value is_stopped in this scope
[error] 911-911: cannot find value is_stopped in this scope
[error] 226-226: this function takes 3 arguments but 4 arguments were supplied
[error] 347-347: this function takes 3 arguments but 4 arguments were supplied
[error] 454-454: this function takes 3 arguments but 4 arguments were supplied
[error] 561-561: this function takes 3 arguments but 4 arguments were supplied
[error] 685-685: this function takes 3 arguments but 4 arguments were supplied
[error] 794-794: this function takes 3 arguments but 4 arguments were supplied
[error] 903-903: this function takes 3 arguments but 4 arguments were supplied
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test
🔇 Additional comments (44)
lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs (1)
723-723: Signature change aligns with removal of is_stopped from RawScorerThe method call to
raw_scorerhas been updated to remove theis_stoppedparameter, reducing the argument count from five to four. This change aligns with the PR objective of removing theis_stoppedflag fromRawScorer.lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (3)
482-482: Added DEFAULT_STOPPED import to support updated method signaturesAdding the
DEFAULT_STOPPEDconstant to the imports enables the explicit passing of the stop condition to search functions, supporting the PR objective of removingis_stoppedfromRawScorer.
761-762: Updated search_on_level call to pass DEFAULT_STOPPED explicitlyThe
search_on_levelmethod now receives an explicit stop condition parameter and returns aResultthat needs to be unwrapped. This change aligns with the PR's approach to make cancellation checks more explicit.
988-990: Updated search_on_level call in test_gpu_heuristic to pass DEFAULT_STOPPED explicitlyConsistent with the other changes, the
search_on_levelmethod call in the heuristic test now receives the explicit stop condition parameter and handles the returnedResult.lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)
23-23: Added DEFAULT_STOPPED import to support updated scoring method signaturesAdding the
DEFAULT_STOPPEDconstant to the imports enables the explicit passing of the stop condition to scoring methods likepeek_top_iterandpeek_top_all.
130-132: Updated peek_top_iter call to use explicit stop condition parameterThe
peek_top_itermethod now accepts an explicit stop condition parameter (&DEFAULT_STOPPED) and returns aResultthat requires unwrapping, aligning with the PR's approach of making cancellation checks more explicit.
153-155: Updated second peek_top_iter call with explicit stop conditionConsistently with the previous call, this instance of
peek_top_iteralso passes the explicit stop condition parameter and handles the returnedResult.
175-175: Updated peek_top_all call with explicit stop conditionThe
peek_top_allmethod now also accepts the explicit stop condition parameter, consistent with the overall approach to make cancellation checks explicit throughout the codebase.lib/segment/src/index/hnsw_index/gpu/mod.rs (3)
116-116: Added DEFAULT_STOPPED import to support updated search method signaturesAdding the
DEFAULT_STOPPEDconstant to the imports enables the explicit passing of the stop condition to the search functions in the test module.
242-244: Updated GPU search call to include explicit stop conditionThe search method called on the GPU graph now accepts an explicit stop condition parameter and returns a
Resultthat needs to be unwrapped, aligning with the PR's objective of making cancellation explicit.
253-255: Updated CPU search call to include explicit stop conditionThe search method called on the CPU reference graph now also accepts an explicit stop condition parameter, maintaining consistency between GPU and CPU implementations.
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (3)
18-20: Import adds DEFAULT_STOPPED constantThe import statement now includes the
DEFAULT_STOPPEDconstant from the vector_storage module, which will be used as a parameter in scorer methods.
88-90: Scorer method now includes stop condition parameterThe
peek_top_itermethod call has been updated to include the&DEFAULT_STOPPEDparameter, aligning with the PR's objective of passing the stop condition explicitly rather than having it internally managed by RawScorer.
184-186: Consistent addition of stop condition parameterSimilar to the previous update, this
peek_top_itercall has been correctly modified to include the&DEFAULT_STOPPEDparameter, ensuring consistency across the codebase with the new function signature.lib/segment/src/index/plain_vector_index.rs (3)
21-21: Updated import to use new_raw_scorerImport has been changed to use
new_raw_scorerinstead of the oldnew_stoppable_raw_scorerfunction, matching the PR's objective of removing internal stop condition handling from RawScorer.
112-124: Improved function call and error handlingThis change includes:
- Replacing
new_stoppable_raw_scorerwithnew_raw_scorer- Changing from
maptoand_thenfor better error propagation- Adding
&is_stoppedparameter topeek_top_iterThis properly implements the new approach of passing stop condition explicitly rather than having it internally in RawScorer.
138-145: Consistent updates for unfiltered search pathSimilar to the filtered search path, the unfiltered search path has been updated to use
new_raw_scorerand pass&is_stoppedtopeek_top_all. The error handling has also been improved usingand_then.lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (4)
16-16: Added import for DEFAULT_STOPPED constantAdded import for the
DEFAULT_STOPPEDconstant that will be used in search function calls.
32-40: Updated search_entry call with stop condition parameterThe
search_entrymethod call has been updated to include&DEFAULT_STOPPEDas a parameter, and now explicitly handles potential errors with.unwrap(). This is consistent with the PR's objective of passing stop conditions explicitly.
42-50: Updated search_on_level call with stop condition parameterSimilar to the previous update, the
search_on_levelmethod call has been modified to include&DEFAULT_STOPPEDand now handles errors with.unwrap(). This maintains consistency with the new approach for handling stop conditions.
91-93: Updated graph_layers.search call with stop condition parameterThe
searchmethod call ongraph_layershas been updated to include&DEFAULT_STOPPEDand now handles errors with.unwrap(). This ensures that the test properly uses the new function signature.lib/segment/benches/hnsw_search_graph.rs (5)
13-13: Added import for DEFAULT_STOPPED constantAdded import for the
DEFAULT_STOPPEDconstant from the vector_storage module for use in search function calls.
23-23: Added fixture moduleIntroduced a new
fixturemodule to simplify benchmark setup and improve code organization.
32-33: Simplified benchmark setupReplaced complex vector generation and HNSW index building with a call to
fixture::make_cached_graph. This refactoring improves code readability and maintainability by abstracting the setup logic into a reusable function.
43-47: Updated search call with stop condition parameterThe
graph_layers.searchmethod call has been updated to include&DEFAULT_STOPPEDand now handles errors with.unwrap(). The code is now properly formatted with improved readability.
60-64: Consistently updated compressed benchmark search callSimilar to the uncompressed benchmark, the compressed benchmark's
graph_layers.searchcall has been updated to include&DEFAULT_STOPPEDand handle errors with.unwrap(). The code formatting has also been improved.lib/segment/benches/hnsw_build_asymptotic.rs (10)
4-5: Import of LazyCell improves setup efficiencyThe addition of LazyCell from std::cell is a good optimization that allows lazy initialization of test setups, improving resource allocation.
13-13: Standard import of DEFAULT_STOPPED for explicit stopping conditionImporting DEFAULT_STOPPED aligns with the PR objective to remove is_stopped from RawScorer and pass it explicitly to search functions.
22-22: Good modularization with fixture moduleMoving the fixture creation code to a separate module improves code organization and reusability.
29-33: Improved resource management with LazyCell for test setupUsing LazyCell for the 5k vector setup ensures resources are only allocated when needed, which is especially important for benchmarks.
41-44: Explicit passing of stopping conditionThe addition of
&DEFAULT_STOPPEDas a parameter to the search method aligns with the PR objective to make stopping conditions more explicit.
49-53: Good constant definition for vector countExplicitly defining NUM_VECTORS as a constant improves code readability and maintainability.
55-56: Clear benchmark naming conventionRenaming from "build-n-search-hnsw-10x" to "build-n-search-hnsw-1M" improves clarity by clearly indicating the dataset size.
62-64: Consistent usage of explicit stopping conditionConsistent passing of
&DEFAULT_STOPPEDto search methods throughout the codebase ensures uniform error handling.
77-77: Improved vector generation using NUM_VECTORSUsing NUM_VECTORS in the random range for generating points to score aligns the test with the actual size of the dataset.
150-155: LazyCell for efficient vector generationUsing LazyCell for generating vectors ensures they are only created when needed, improving memory efficiency.
lib/segment/src/vector_storage/raw_scorer.rs (8)
15-17: Enhanced error handling with CancellableResultGood improvement to imports by adding CancellableResult and related error handling utilities, which supports the new explicit cancellation approach.
87-92: Explicit cancellation parameter in peek_top_iterModifying the function signature to accept an explicit
is_stoppedparameter and return aCancellableResultimproves error handling and makes cancellation logic more explicit.
94-98: Explicit cancellation parameter in peek_top_allSame improvement as with peek_top_iter - adding explicit cancellation parameter and returning CancellableResult improves code clarity and error propagation.
117-118: Simplified raw scorer creation functionRenaming from
new_stoppable_raw_scorertonew_raw_scorerand removing theis_stoppedparameter reflects the design change to handle cancellation more explicitly.
912-913: Improved cancellation check in peek_top_iter implementationThe implementation now properly accepts the is_stopped parameter and returns a CancellableResult, aligning with the interface changes.
926-926: Consider optimizing the stopping condition checkLoading the atomic bool on every iteration could impact performance. Consider only checking the flag periodically (e.g., every 1000th iteration) as suggested in a previous review comment, which showed performance improvements.
- for point_id in &mut *points { - check_process_stopped(is_stopped)?; + for (i, point_id) in (&mut *points).enumerate() { + if i % 1000 == 0 { + check_process_stopped(is_stopped)?; + } if !self.check_vector(point_id) { continue; } chunk[chunk_size] = point_id; chunk_size += 1; if chunk_size == VECTOR_READ_BATCH_SIZE { break; } } + check_process_stopped(is_stopped)?;
952-952: Good error handling with CancellableResultWrapping the return value in
Ok()correctly implements the CancellableResult return type, allowing proper error propagation.
955-961: Consistent implementation of peek_top_allThe implementation of peek_top_all correctly forwards the is_stopped parameter to peek_top_iter, ensuring consistent cancellation behavior throughout the code.
1e5d8e3 to
b5b898d
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/benches/fixture.rs (1)
32-83: Efficient caching and parallel graph construction.
- The parallel approach with
rayon's iterators is a good optimization for large vector counts.- The single-threaded threshold usage prevents overhead for small inserts.
- Consider a small nitpick: line 75 displays "Buildng HNSW"; might fix the spelling to "Building HNSW".
- ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}").unwrap(), + ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}").unwrap(),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (29)
Cargo.toml(1 hunks)lib/collection/src/operations/types.rs(2 hunks)lib/segment/Cargo.toml(1 hunks)lib/segment/benches/fixture.rs(1 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_search_graph.rs(4 hunks)lib/segment/benches/scorer_mmap.rs(2 hunks)lib/segment/benches/vector_search.rs(2 hunks)lib/segment/src/common/operation_error.rs(1 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(3 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(3 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(11 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(5 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(3 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/vector_storage/async_raw_scorer.rs(6 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(6 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs(8 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(0 hunks)lib/segment/src/vector_storage/raw_scorer.rs(10 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(1 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(0 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
🚧 Files skipped from review as they are similar to previous changes (20)
- lib/segment/benches/vector_search.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/benches/scorer_mmap.rs
- lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
- lib/segment/src/fixtures/index_fixtures.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/collection/src/operations/types.rs
- lib/segment/src/index/plain_vector_index.rs
- lib/segment/src/common/operation_error.rs
- lib/segment/src/index/hnsw_index/hnsw.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/benches/hnsw_build_asymptotic.rs
- lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
- lib/segment/src/index/hnsw_index/graph_layers_builder.rs
- lib/segment/src/vector_storage/async_raw_scorer.rs
- lib/segment/src/vector_storage/raw_scorer.rs
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/benches/hnsw_search_graph.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(22-86)
🔇 Additional comments (32)
lib/segment/Cargo.toml (1)
32-32: LGTM: Adding humantime dependency for testingThe addition of the
humantimedependency to the dev-dependencies section is correct and follows the project's pattern of using workspace-defined versions.Cargo.toml (1)
180-180: LGTM: Adding humantime to workspace dependenciesThe addition of the
humantimecrate (version 2.1.0) to the workspace dependencies is appropriate. This crate is useful for parsing and formatting human-readable time durations, which aligns with the database system's needs for reporting timing information.lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (2)
17-19: LGTM: Updated imports to include DEFAULT_STOPPEDThe import statement reorganization is clean and properly includes the new
DEFAULT_STOPPEDconstant.
62-64: LGTM: Updated scorer method calls with DEFAULT_STOPPED parameterThese changes correctly implement the architectural change described in the PR objectives - instead of using an internal
is_stoppedflag in theRawScorer, the flag is now passed explicitly to the search methods. The addition of.unwrap()handles potential errors that might be returned from these methods now.Also applies to: 85-87, 107-107, 168-170, 216-218, 236-238
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)
16-16: LGTM: Added import for DEFAULT_STOPPEDThe import of
DEFAULT_STOPPEDfrom the vector_storage module is appropriate for the changes made in this file.
32-40: LGTM: Updated search method calls with DEFAULT_STOPPED parameterThe search methods (
search_entry,search_on_level, andsearch) have been consistently updated to include the&DEFAULT_STOPPEDparameter and add error handling viaunwrap(). This maintains consistency with the architectural change of passing cancellation flags explicitly rather than storing them in theRawScorer.Also applies to: 42-50, 91-93
lib/segment/benches/hnsw_search_graph.rs (6)
10-10: No issues with the updated imports.
These newly imported items (FakeFilterContext,random_vector) appear consistent with the subsequent usage.
13-13: New import forDEFAULT_STOPPED.
Including the default atomic cancellation token aligns with the broader refactoring to pass an external stop flag.
23-23: Good modularization by introducingmod fixture.
This separation helps keep benchmark setup logic organized in its own module.
32-33: Streamlined setup withmake_cached_graph.
Replacing manual graph construction with the helper function fromfixturegreatly improves readability and maintainability.
43-47: Explicitly passing a stop token enhances cancellation handling.
Adding&DEFAULT_STOPPEDmakes it consistent with the new design requiring an external atomic stop flag.
60-64: Maintaining consistent approach to cancellation checks.
Reusing&DEFAULT_STOPPEDin the second benchmark phase is coherent. No issues found.lib/segment/benches/fixture.rs (4)
1-15: Overall structure and imports look fine.
The usage ofrayon,rand, andindicatifis appropriate for parallel processing with progress reporting. No immediate concerns regarding thread safety or concurrency, given thatFakeFilterContextseems trivial.
16-21: Clear doc comments.
The added documentation on caching and deterministic generation of vectors is helpful, ensuring others can navigate and reuse this code.
22-31: Naming alignment and generic bounds.
The function namemake_cached_graphis self-explanatory; the generic type parameterMETRICis well-bounded byMetric<f32> + Sync + Send.
88-92: Helper method effectively reads last-updated file data.
updated_agois a neat utility for user feedback; no immediate issues. Just ensure any errors get logged for debugging.lib/segment/src/index/hnsw_index/graph_layers.rs (16)
4-4: ImportingAtomicBool.
Matches the new approach whereis_stoppedis passed in from outside to support cancellation.
15-17: IntroducingCancellableResultandcheck_process_stopped.
These changes unify cancellation checks across methods. This is a clear improvement over embedding the flag directly.
67-68: Extending_search_on_levelsignature withis_stopped.
This extra parameter ensures early termination in the greedy search routine if the process is canceled.
73-74: Immediate cancellation check within the loop.
This frequent check is appropriate for potentially long-running loops, but watch out for the overhead of atomic loads in very large loops.
103-103: Parameter addition insearch_on_level.
No concerns; the function properly returns aCancellableResultto propagate potential cancellation.
127-127: Addedis_stoppedtosearch_entry.
Aligns with the new design, letting the user or calling scope handle cancellation.
135-135: Cancellation check insearch_entry.
The location is correct—cancellation is polled before each down-level search iteration.
252-253: Extendedsearchsignature.
Adds theis_stoppedparameter to the top-level method, making the entire search pipeline cancellable.
255-255: Early return with an empty result.
If there is no valid entry point, returning an empty vector is safer than erroring out—this prevents confusion in upper layers.
263-264: Passingis_stoppedfurther down.
Ensures cancellation is honored before the final zero-level step begins.
265-265: New line for chaining method call.
Minor addition to support the next call. Looks good.
266-271: Propagation ofis_stoppedparameter intosearch_on_level.
Keeps the call chain consistent, preventing partial subsearch from ignoring the stop request.
272-272: Proper final step.
Returning the filtered top results in a cancellable manner. Implementation remains consistent.
394-394: Import ofDEFAULT_STOPPED.
Used in tests to provide a default non-canceled reference. Straightforward usage.
409-410:searchcall updated in the test helper.
Helps validate the new cancellation-aware search. No issues with usage.
452-462:search_on_levelcalls updated in tests.
The updated signature ensures these tests confirm the presence of the external cancellation token.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (8)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (2)
32-50: Consider usingexpect(...)instead ofunwrap()
In tests, replacing.unwrap()with.expect("search_in_builder failed")can provide more clarity if an error occurs, facilitating faster debugging.- .unwrap(); + .expect("search_entry call failed in test");
91-93: Useexpect(...)in place ofunwrap()in test code
Similar rationale: clearer error messages help pinpoint failures quickly.- .unwrap() + .expect("search call failed in test");lib/segment/benches/hnsw_search_graph.rs (2)
43-47: Useexpect(...)for better error context
Consider providing a message in case of search failure to aid in diagnosing issues during benchmarking.- .unwrap(), + .expect("search in uncompressed bench failed"),
60-64: Replace.unwrap()with.expect(...)
Improves clarity if the search operation encounters unexpected behavior.- .unwrap(), + .expect("search in compressed bench failed"),lib/segment/src/common/operation_error.rs (1)
229-230: Rust naming convention
Consider using a lowercase variable name for the parameter withinfn from(CancelledError: CancelledError)to maintain standard Rust style.lib/segment/benches/fixture.rs (3)
16-21: Helpful doc comments on caching
The explanation of caching mechanics is clear. Consider mentioning how the environment variable-based path might behave if these variables are not set.
75-75: Typo in progress style template
"Buildng" should be corrected to "Building."- ProgressStyle::with_template("{percent:>3}% Buildng HNSW {wide_bar}").unwrap(), + ProgressStyle::with_template("{percent:>3}% Building HNSW {wide_bar}").unwrap(),
88-92: Graceful fallback for file metadata
If the file metadata is unavailable or restricted,updated_agowill fail. Consider a more robust fallback or user-friendly error message.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (29)
Cargo.toml(1 hunks)lib/collection/src/operations/types.rs(2 hunks)lib/segment/Cargo.toml(1 hunks)lib/segment/benches/fixture.rs(1 hunks)lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_search_graph.rs(4 hunks)lib/segment/benches/scorer_mmap.rs(2 hunks)lib/segment/benches/vector_search.rs(2 hunks)lib/segment/src/common/operation_error.rs(1 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(3 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(3 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(11 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(5 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(3 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/vector_storage/async_raw_scorer.rs(6 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(6 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs(8 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(0 hunks)lib/segment/src/vector_storage/raw_scorer.rs(10 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(1 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(0 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (21)
- Cargo.toml
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/collection/src/operations/types.rs
- lib/segment/benches/vector_search.rs
- lib/segment/Cargo.toml
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/segment/src/fixtures/index_fixtures.rs
- lib/segment/src/index/hnsw_index/hnsw.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/benches/scorer_mmap.rs
- lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/src/index/plain_vector_index.rs
- lib/segment/src/vector_storage/async_raw_scorer.rs
- lib/segment/src/vector_storage/raw_scorer.rs
- lib/segment/src/index/hnsw_index/graph_layers_builder.rs
🧰 Additional context used
🧬 Code Definitions (3)
lib/segment/src/common/operation_error.rs (1)
lib/collection/src/operations/types.rs (16)
from(98-105)from(208-228)from(277-298)from(689-691)from(709-711)from(1194-1199)from(1203-1208)from(1212-1262)from(1266-1268)from(1272-1277)from(1281-1286)from(1290-1295)from(1299-1304)from(1308-1313)from(1317-1322)from(1326-1331)
lib/segment/benches/fixture.rs (1)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
new(96-106)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(22-86)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
🔇 Additional comments (36)
lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs (1)
16-16: Use ofDEFAULT_STOPPEDimport is consistent with new cancellation approach
It's good to see the introduction of a dedicated atomic bool for handling cancellation checks.lib/segment/src/common/operation_error.rs (3)
232-233: CancellableResult introduction is clear
DefiningCancellableResultclarifies which operations may fail due to cancellation.
234-240: Consistent error conversion
ThisFrom<CancelledError>implementation cleanly integrates cancellations into the existingOperationErrortype.
242-247:Details
❓ Verification inconclusive
Check memory ordering
Relaxedis likely sufficient for a cancellation check, but verify that it aligns with concurrency requirements. If stronger guarantees are needed, consider Acquire/SeqCst.
🏁 Script executed:
#!/bin/bash # Searching for references to check_process_stopped usage to confirm concurrency assumptions rg -A 5 "check_process_stopped"Length of output: 33815
Memory Ordering in Cancellation Flag Check
The current implementation of
check_process_stoppedconsistently usesRelaxedordering throughout the codebase. This is acceptable if the sole purpose of the cancellation flag is to signal termination without enforcing happens-before relationships with other shared data. However, please verify that no additional synchronization is expected when cancellation is triggered. If later requirements demand that all preceding writes be visible after cancellation, consider switching the load (and the corresponding flag store) to useAcquireor evenSeqCstordering.
- Verify that the cancellation flag isn’t relied upon for establishing memory visibility beyond a simple termination signal.
- Confirm that the flag’s writer uses an appropriate memory ordering if stronger guarantees become necessary.
lib/segment/benches/hnsw_build_asymptotic.rs (15)
4-4: Improved resource management with LazyCell.Using
LazyCellis a good approach for lazy initialization of potentially expensive resources in benchmarks.
13-13: Added DEFAULT_STOPPED parameter to align with cancellation changes.This import supports the updated search method signatures that now require an explicit cancellation flag.
22-22: Good modularization using fixture module.Extracting common functionality to a separate module improves code organization and reusability.
29-32: Improved benchmarking setup with cached graph initialization.Using LazyCell for lazy initialization of the graph ensures the setup cost doesn't affect benchmark measurements and the cached graph is only created when needed.
34-35: Renamed benchmark to clearly indicate vector size.Changing from "build-n-search-hnsw" to "build-n-search-hnsw-5k" makes the benchmark purpose clearer.
41-44: Added cancellation flag to search method call.The search method now includes
&DEFAULT_STOPPEDparameter and handles the Result with unwrap. This aligns with the changes to remove internalis_stoppedhandling from the RawScorer.
47-48: Added explicit resource cleanup.Using
drop(setup_5k)explicitly releases resources when they're no longer needed, which is important for benchmarks to avoid memory usage affecting subsequent tests.
49-53: Improved constant naming and vector size definition.Using a named constant
NUM_VECTORSwith a clear value (1M) improves code readability and maintenance.
55-56: Renamed benchmark to accurately reflect vector count.Changing from "build-n-search-hnsw-10x" to "build-n-search-hnsw-1M" clearly communicates the benchmark's purpose.
62-65: Updated search call with cancellation flag and error handling.The search method now includes
&DEFAULT_STOPPEDparameter and unwraps the result, consistent with the changes across the codebase.
68-69: Renamed benchmark for clarity and consistency.Updating from "build-n-search-hnsw-10x-score-point" to "build-n-search-hnsw-1M-score-point" provides a clearer indication of the benchmark's purpose.
77-77: Fixed random range to use correct vector count.Using
NUM_VECTORSinstead of a hardcoded value ensures consistency with the actual number of vectors in the benchmark.
83-83: Added resource cleanup for 1M benchmark setup.Explicitly dropping the setup ensures proper resource management between benchmarks.
150-155: Implemented LazyCell pattern for basic scoring benchmark.This change provides consistent resource management across all benchmarks in the file.
171-176: Consistent LazyCell usage across benchmarks.Using the same pattern for all benchmarks ensures consistent behavior and resource management.
lib/segment/src/index/hnsw_index/graph_layers.rs (17)
4-4: Added AtomicBool for cancellation support.This import enables thread-safe cancellation checks across the codebase.
15-17: Enhanced error handling with cancellation support.Adding
CancellableResultandcheck_process_stoppedenables proper propagation of cancellation signals throughout the code.
61-68: Updated method signature to support explicit cancellation.The
_search_on_levelmethod now accepts anis_stoppedparameter and returns aCancellableResult, enabling proper cancellation handling.
73-74: Added cancellation check during search operation.The
check_process_stoppedcall allows the operation to be cancelled at regular intervals during execution.
92-93: Updated return value to support cancellation pattern.Returning
Ok(())aligns with the newCancellableResultreturn type.
96-103: Updated search_on_level signature with cancellation support.Adding the
is_stoppedparameter and updating the return type toCancellableResultmaintains consistency across the codebase.
108-115: Updated method call and return value to handle cancellation.The call to
_search_on_levelnow passes theis_stoppedparameter and handles the result with proper error propagation.
120-127: Updated search_entry signature with cancellation support.Adding the
is_stoppedparameter and updating the return type toCancellableResultmaintains consistency across the codebase.
135-136: Added cancellation check within loop.Checking for cancellation at the start of each iteration allows for responsive cancellation handling.
157-157: Updated return value to support cancellation.Wrapping the result in
Ok()aligns with the new return type.
246-253: Updated search method signature with cancellation support.Adding the
is_stoppedparameter and updating the return type toCancellableResultcompletes the consistent pattern across all search-related methods.
255-256: Updated return value for early exit case.Using
Ok(Vec::default())instead of returning directly maintains the cancellation result pattern.
258-264: Added cancellation parameter to search_entry call.Passing the
is_stoppedparameter and handling the result with?ensures proper cancellation propagation.
265-272: Updated search_on_level call with cancellation handling.The updated call passes the cancellation flag and properly handles the result to propagate cancellation.
394-394: Added DEFAULT_STOPPED import for tests.This import ensures tests can properly use the cancellation mechanism.
408-411: Updated test's search method call with cancellation parameter.Tests now pass
&DEFAULT_STOPPEDto match the updated method signature and unwrap the result.
451-462: Updated test's search_on_level call to include cancellation parameter.Updating the test ensures it remains compatible with the new method signature.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
162-193: Consider updating test-only method for consistency.The
search_entry_on_levelmethod is only used in tests but hasn't been updated to follow the new pattern with theis_stoppedparameter andCancellableResultreturn type.For consistency with the other search methods, consider updating this method:
fn search_entry_on_level( &self, entry_point: PointOffsetType, level: usize, points_scorer: &mut FilteredScorer, + is_stopped: &AtomicBool, -) -> ScoredPointOffset { +) -> CancellableResult<ScoredPointOffset> { let limit = self.get_m(level); let mut links: Vec<PointOffsetType> = Vec::with_capacity(2 * self.get_m(0)); let mut current_point = ScoredPointOffset { idx: entry_point, score: points_scorer.score_point(entry_point), }; let mut changed = true; while changed { + check_process_stopped(is_stopped)?; changed = false; links.clear(); self.links_map(current_point.idx, level, |link| { links.push(link); }); let scores = points_scorer.score_points(&mut links, limit); scores.iter().copied().for_each(|score_point| { if score_point.score > current_point.score { changed = true; current_point = score_point; } }); } - current_point + Ok(current_point) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
lib/segment/benches/hnsw_build_asymptotic.rs(4 hunks)lib/segment/benches/hnsw_search_graph.rs(4 hunks)lib/segment/benches/scorer_mmap.rs(2 hunks)lib/segment/benches/vector_search.rs(2 hunks)lib/segment/src/fixtures/index_fixtures.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(3 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(3 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(11 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(5 hunks)lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs(3 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(3 hunks)lib/segment/src/vector_storage/async_raw_scorer.rs(6 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(6 hunks)lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs(10 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(0 hunks)lib/segment/src/vector_storage/raw_scorer.rs(10 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(1 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(0 hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(5 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(3 hunks)
💤 Files with no reviewable changes (2)
- lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs
- lib/segment/src/vector_storage/quantized/quantized_vectors.rs
🚧 Files skipped from review as they are similar to previous changes (19)
- lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs
- lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs
- lib/segment/src/index/plain_vector_index.rs
- lib/segment/src/index/hnsw_index/gpu/mod.rs
- lib/segment/src/index/sparse_index/sparse_vector_index.rs
- lib/segment/src/vector_storage/tests/async_raw_scorer.rs
- lib/segment/benches/vector_search.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
- lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/tests.rs
- lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs
- lib/segment/src/fixtures/index_fixtures.rs
- lib/segment/src/index/hnsw_index/tests/test_compact_graph_layer.rs
- lib/segment/src/index/hnsw_index/hnsw.rs
- lib/segment/benches/scorer_mmap.rs
- lib/segment/src/vector_storage/quantized/quantized_scorer_builder.rs
- lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs
- lib/segment/src/vector_storage/raw_scorer.rs
- lib/segment/src/vector_storage/async_raw_scorer.rs
- lib/segment/src/index/hnsw_index/graph_layers_builder.rs
🧰 Additional context used
🧬 Code Definitions (2)
lib/segment/benches/hnsw_build_asymptotic.rs (1)
lib/segment/benches/fixture.rs (1)
make_cached_graph(22-86)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/index/hnsw_index/graph_links.rs (1)
links(125-127)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (36)
lib/segment/benches/hnsw_search_graph.rs (5)
10-11: Good addition of DEFAULT_STOPPED importThe import for DEFAULT_STOPPED is consistent with the PR's objective to pass the stopped flag explicitly to search functions instead of maintaining it internally in RawScorer.
Also applies to: 13-13
23-23: Good modularization with the fixture moduleAdding the fixture module helps reduce code duplication and improves maintainability by extracting common functionality used across benchmarks.
32-33: Well-implemented code extractionReplacing the detailed graph generation code with a single call to
fixture::make_cached_graphsignificantly simplifies this file while preserving functionality. This improves maintainability and readability.
43-47: Correctly updated search function callThe search function call has been properly updated to include the new
&DEFAULT_STOPPEDparameter, aligning with the PR objective to pass the stopped flag explicitly rather than maintaining it internally.
60-64: Consistently updated search function callThis second search function call is also correctly updated with the
&DEFAULT_STOPPEDparameter, maintaining consistency throughout the codebase.lib/segment/benches/hnsw_build_asymptotic.rs (14)
4-4: Good use of LazyCell for resource managementUsing LazyCell for lazy initialization is a good improvement that allows benchmark setup to be created only when needed, improving memory usage efficiency.
13-13: Consistent use of DEFAULT_STOPPEDThe import of DEFAULT_STOPPED is consistent with the changes in other files and the PR's objective to pass the stopped flag explicitly to search functions.
22-22: Good reuse of fixture moduleReusing the fixture module across benchmark files promotes code reuse and ensures consistent behavior in all benchmark tests.
29-33: Improved benchmark setup with LazyCellThis implementation uses LazyCell to initialize the 5k vector benchmark setup only when needed, which is more efficient than eagerly creating resources that might not be used immediately.
34-34: Clearer benchmark namingRenaming the benchmark to "build-n-search-hnsw-5k" better reflects what's being tested and makes the benchmark results more meaningful.
41-43: Correctly updated search call with DEFAULT_STOPPEDThe search function call has been properly updated to include the new
&DEFAULT_STOPPEDparameter, consistent with the PR objectives.
47-47: Good resource management with explicit dropExplicitly dropping the LazyCell after use ensures resources are properly freed, which is especially important for large benchmark setups.
49-53: Well-structured initialization for 1M vectorsThe constant definition followed by LazyCell initialization for the 1M vector benchmark setup is well structured and makes the code's intent clear.
55-55: Descriptive benchmark naming conventionRenaming the benchmarks to clearly indicate they're using 1M vectors improves clarity and makes benchmark results more interpretable.
Also applies to: 68-68
62-64: Consistently updated search function callThis search function call is correctly updated with the
&DEFAULT_STOPPEDparameter, maintaining consistency with other changes in the codebase.
77-77: Improved random range definitionUsing NUM_VECTORS as the upper bound for the random range is more consistent and maintainable than a hardcoded value, as it will automatically adjust if the vector count changes.
83-83: Good cleanup with explicit dropExplicitly dropping the setup after use ensures proper resource cleanup, which is important for benchmarks using large amounts of memory.
150-155: Consistent use of LazyCell patternThe LazyCell pattern is consistently applied throughout the file for all benchmark setups that require significant memory. This promotes a consistent approach to resource management.
Also applies to: 171-176
168-168: Thorough resource cleanupThe explicit drops of all LazyCell instances ensure thorough resource cleanup throughout the benchmark execution.
Also applies to: 189-189
lib/segment/src/index/hnsw_index/graph_layers.rs (17)
4-4: Appropriate addition of AtomicBool import.The addition of the
AtomicBoolimport supports the new cancellation mechanism being implemented, which is consistent with the PR's objective of explicitly passing the stop condition flag.
15-17: Good restructuring of error handling imports.The addition of cancellation-specific imports from
operation_erroraligns perfectly with the implementation of the cancellation mechanism. UsingCancellableResultandcheck_process_stoppedprovides a clean way to handle operation cancellation.
67-68: Well-implemented signature change for cancellation support.Adding the
is_stoppedparameter and updating the return type toCancellableResult<()>enables proper cancellation propagation in the search mechanism. This change aligns with the PR objective of explicitly passing the stop condition.
73-74: Effective placement of cancellation check.The cancellation check is strategically placed at the beginning of each iteration in the search loop. This ensures responsive cancellation without significantly impacting performance, as it's not in the innermost loop of the score calculation.
92-93: Proper return value update.Updating the return to explicitly use
Ok(())correctly implements the new return type and ensures the function conforms to the updated trait signature.
102-103: Consistent signature update for search_on_level method.The addition of the
is_stoppedparameter and return type change toCancellableResultmaintains consistency with the cancellation pattern being implemented throughout the search functionality.
108-116: Correct propagation of cancellation parameter.The method properly passes the
is_stoppedparameter to the lower-level search function and handles the result with the?operator, which will propagate any cancellation. The return value is also correctly wrapped inOk().
126-127: Consistent cancellation support in search_entry method.Adding the
is_stoppedparameter and updating the return type toCancellableResult<ScoredPointOffset>ensures cancellation capability is consistently implemented across all search-related methods.
135-136: Strategic placement of cancellation check.Placing the cancellation check at the beginning of each level iteration in
search_entryallows for timely cancellation between each search level without impacting the inner loop performance.
157-158: Proper return value update for search_entry.The function now correctly returns the result wrapped in
Ok()to conform to the updated return type.
252-253: Consistent implementation of cancellation support in search method.Adding the
is_stoppedparameter and updating the return type toCancellableResult<Vec<ScoredPointOffset>>completes the implementation of cancellation support throughout the search hierarchy.
255-256: Updated return statement for consistency.The empty result is now properly wrapped in
Ok(Vec::default())to match the updated return type. This ensures proper error handling when no entry point is found.
258-264: Consistent parameter passing in search method.The
is_stoppedparameter is correctly passed to thesearch_entrymethod call, ensuring cancellation capability is propagated down the call stack.
265-272: Proper result handling for cancellation.The
search_on_levelcall now includes theis_stoppedparameter, and both method calls use the?operator for propagating cancellation errors. The final result is correctly wrapped inOk().
394-394: Appropriate import for tests.The import of
DEFAULT_STOPPEDis necessary for the updated test code to work with the new method signatures.
408-411: Updated test code for new method signature.The test code correctly passes
&DEFAULT_STOPPEDto thesearchmethod and handles the result with.unwrap(), maintaining the test's functionality with the new API.
451-463: Test code correctly updated for new search_on_level signature.The test for
search_on_levelhas been properly updated to include the&DEFAULT_STOPPEDparameter and handle theCancellableResultreturn type with.unwrap().
| // and notify the user that they better use the default IO implementation. | ||
|
|
||
| pq.into_sorted_vec() | ||
| Ok(pq.into_sorted_vec()) |
There was a problem hiding this comment.
should we check flag one more time before returning? If operation was cancelled we might have partial result. Since we make it explicit, it might be worth maintaining the declared guarantees
6428f15 to
b23bc08
Compare
* Introduce CancelledError * Remove is_stopped from RawScorer
Depends on #6299 as it touches the same benchmark. Ignore first commit during the review.
This PR removes
is_stoppedflag fromRawScoreras suggested in #6245 (comment). Instead, it is passed to search functions explicitly.