Conversation
📝 Walkthrough: This set of changes adds a new boolean feature flag, `hnsw_healing`, gating the new HNSW graph-healing behaviour.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
1402-1478: 💡 Verification agent — 🧩 Analysis chain

**Consider tolerant vector comparison to reduce the false-negative reuse rate**

`old_vector == new_vector` performs an exact floating-point comparison. Even when a vector has not semantically changed, different storage back-ends (e.g. serialization, quantization round-trip, SIMD alignment) may re-emit the very same numbers with minor binary drift, causing values that are numerically equal to fail an `==` test. As soon as that happens the point is treated as "missing", forcing the healing path and potentially discarding a large portion of an otherwise good graph.

A cheap alternative is to fall back to an approximate check that is already available in the code base (e.g. cosine/L2 similarity ≥ 1 - 1e-6):

```diff
- if old_vector == new_vector {
+ // Treat vectors as equal if the distance is below a tiny epsilon
+ if metric::cosine::distance(&old_vector, &new_vector) <= 1e-6 {
```

This keeps the incremental-build fast path effective while still
**Use an approximate-equality check instead of `==` for floating-point vectors**

Exact `old_vector == new_vector` will drop points whose values only differ by tiny fp noise. To avoid spuriously entering the healing path, replace it with an element-wise epsilon test. For example, in lib/segment/src/index/hnsw_index/hnsw.rs around line 1435:

```diff
- if old_vector == new_vector {
+ // Treat vectors as equal if all components differ by at most EPS
+ const EPS: f32 = 1e-6;
+ if old_vector.len() == new_vector.len()
+     && old_vector
+         .iter()
+         .zip(new_vector.iter())
+         .all(|(a, b)| (a - b).abs() <= EPS)
+ {
      old_to_new[old_offset as usize] = Some(new_offset);
      valid_points += 1;
```

This simple check avoids pulling in non-existent `metric::cosine::distance` functions and keeps the fast path intact while only rejecting truly changed vectors.
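As a standalone sketch, the reviewer's element-wise test could look like the following (the `EPS` value and the function name are illustrative, not taken from the Qdrant code base):

```rust
/// Illustrative tolerance; a real value would depend on the storage precision.
const EPS: f32 = 1e-6;

/// Compare two vectors element-wise with a small tolerance, so that tiny
/// binary drift between storage back-ends does not force the healing path.
fn vectors_approx_equal(old_vector: &[f32], new_vector: &[f32]) -> bool {
    old_vector.len() == new_vector.len()
        && old_vector
            .iter()
            .zip(new_vector.iter())
            .all(|(a, b)| (a - b).abs() <= EPS)
}
```

Unlike `==`, this still rejects vectors of different lengths or with genuinely changed components.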
🧹 Nitpick comments (2)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
386-407: **Possible data race between concurrent healing tasks**

```rust
to_heal.into_par_iter().try_for_each(|old_offset| {
    old_index.migrate_point(old_offset, &graph_layers_builder)
})
```

`migrate_point` mutates shared state inside `graph_layers_builder` (ready_list, link containers, entry points). Internally the builder relies on `RwLock`/`Mutex` for safety, so the code compiles, but:

- every healed point writes the global `ready_list`; with `par_iter` there will be heavy cross-thread contention on the same lock, significantly degrading throughput for large batches;
- a thread may call `search_shortcuts_on_level` before another thread marks a neighbour as "ready", therefore exploring a slower "old-graph" path and producing sub-optimal shortcuts.

If practical, heal sequentially per level or switch to a two-phase approach:

- build a local `Vec<(point_id, links)>` in parallel;
- merge the results into the builder in a single-threaded pass.

That removes lock contention and eliminates the ready-list timing race.
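The suggested two-phase approach can be sketched with plain `std::thread` (all names and the dummy link rule below are illustrative, not Qdrant code):

```rust
use std::sync::mpsc;
use std::thread;

type PointId = u32;

/// Phase 1: each worker computes links for its own chunk of points locally,
/// with no shared mutable state. Phase 2: a single thread merges all results,
/// so the builder never sees concurrent writes.
fn two_phase_build(points: Vec<PointId>, workers: usize) -> Vec<(PointId, Vec<PointId>)> {
    let (tx, rx) = mpsc::channel();
    let chunk = ((points.len() + workers - 1) / workers).max(1);
    let mut handles = Vec::new();
    for part in points.chunks(chunk) {
        let part = part.to_vec();
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Local, lock-free computation (dummy link rule for the sketch).
            let local: Vec<(PointId, Vec<PointId>)> =
                part.iter().map(|&p| (p, vec![p.wrapping_add(1)])).collect();
            tx.send(local).unwrap();
        }));
    }
    drop(tx); // close the channel once all workers hold their own senders

    // Single-threaded merge pass: no lock contention here.
    let mut merged: Vec<(PointId, Vec<PointId>)> = rx.into_iter().flatten().collect();
    for h in handles {
        h.join().unwrap();
    }
    merged.sort_by_key(|(p, _)| *p);
    merged
}
```

The merge pass is where `add_new_point`-style mutation would happen in the real builder, on one thread only.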
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (1)

497-608: **Heavy lock usage & per-call allocations may hurt healing performance**

`search_shortcuts_on_level` is invoked for every healed point and:

- acquires `ready_list.read()` on every neighbour inspection, potentially thousands of times per call;
- allocates new `Vec`s (`old_offsets`, `scores_buffer`, `pending`) for each call, although their maximum length is bounded by `2 * M`.

Under large update batches this becomes a hot path. Recommended micro-optimisations:

```diff
- let mut old_offsets: Vec<PointOffsetType> = Vec::with_capacity(2 * limit);
- let mut scores_buffer = Vec::with_capacity(limit);
+ // Re-use small vec stacks on the caller thread to avoid heap traffic
+ let mut old_offsets = smallvec::SmallVec::<[PointOffsetType; 32]>::new();
+ let mut scores_buffer = smallvec::SmallVec::<[f32; 32]>::new();

- let mut pending = Vec::new();
+ let mut pending = VecDeque::with_capacity(limit);
```

and cache the `ready_list` guard once per call:

```rust
let ready = self.ready_list.read();
...
if ready[new_offset as usize] { ... }
```

This removes a large chunk of lock acquisitions and allocations without changing semantics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- `lib/common/common/src/fixed_length_priority_queue.rs` (2 hunks)
- `lib/common/common/src/flags.rs` (3 hunks)
- `lib/segment/src/index/hnsw_index/graph_layers.rs` (1 hunk)
- `lib/segment/src/index/hnsw_index/graph_layers_builder.rs` (5 hunks)
- `lib/segment/src/index/hnsw_index/hnsw.rs` (11 hunks)
- `lib/segment/src/index/hnsw_index/search_context.rs` (1 hunk)
🧰 Additional context used
🧬 Code Graph Analysis (1)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/index/hnsw_index/search_context.rs (1)
- `new` (16-21)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (8)
lib/common/common/src/fixed_length_priority_queue.rs (2)
48-51: **Good refactoring using the new `is_full()` method.** The `push` method now leverages the newly added `is_full()` method instead of directly comparing the length. This improves code readability and maintainability.

94-97: **Well-designed helper method for the queue capacity check.** The new `is_full()` method encapsulates the logic for checking whether the queue has reached its fixed capacity. This is a clean extraction of a common operation that was previously embedded in the `push` method, improving code reusability and readability.
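A minimal sketch of the extracted helper (the struct below is an illustrative stand-in, not the real `FixedLengthPriorityQueue`):

```rust
/// Toy fixed-capacity queue demonstrating the `is_full()` extraction:
/// `push` reads as intent instead of a raw length comparison.
struct FixedLengthQueue<T> {
    items: Vec<T>,
    capacity: usize,
}

impl<T: Ord> FixedLengthQueue<T> {
    fn new(capacity: usize) -> Self {
        Self { items: Vec::with_capacity(capacity), capacity }
    }

    /// The helper: one place to express "at capacity".
    fn is_full(&self) -> bool {
        self.items.len() >= self.capacity
    }

    fn push(&mut self, value: T) {
        if self.is_full() {
            // At capacity: replace the smallest element if the new one is better.
            if let Some(min_idx) = self
                .items
                .iter()
                .enumerate()
                .min_by(|(_, a), (_, b)| a.cmp(b))
                .map(|(i, _)| i)
            {
                if self.items[min_idx] < value {
                    self.items[min_idx] = value;
                }
            }
        } else {
            self.items.push(value);
        }
    }
}
```

The real queue is heap-based; only the `is_full()` extraction is the point here.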
lib/common/common/src/flags.rs (4)

25-26: **LGTM: New feature flag for HNSW healing.** The new boolean flag is well-documented with a clear purpose statement.

35-35: **LGTM: Appropriate default value for the new feature flag.** Setting the default value to `false` for the new `hnsw_healing` flag is appropriate for a new experimental feature.

54-54: **LGTM: Updated destructuring pattern.** Correctly added the new `hnsw_healing` flag to the destructuring pattern in `init_feature_flags`.

61-61: **LGTM: Updated "all" flag handling.** Properly enables the new `hnsw_healing` flag when the "all" flag is set to `true`.
lib/segment/src/index/hnsw_index/graph_layers.rs (1)

106-107: **Good decoupling of search-context creation from candidate addition.** This change separates the creation of the `SearchContext` from the addition of the initial candidate point. This approach is more flexible and aligns with modifications in the `SearchContext` constructor, improving control over the search initialization process.
lib/segment/src/index/hnsw_index/search_context.rs (1)

16-20: **Improved `SearchContext` constructor design.** The constructor has been simplified to only initialize an empty context with the specified capacity, removing the requirement for an initial entry point. This change supports more flexible usage patterns where candidates can be added explicitly after initialization, which is particularly useful for the new HNSW healing functionality.
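The revised pattern can be sketched as follows (a toy stand-in for the real `SearchContext`; the heap element type is illustrative):

```rust
use std::collections::BinaryHeap;

/// (score, point id) — a simplified scored-point stand-in.
type ScoredPoint = (i64, u32);

/// Sketch of the revised design: `new` builds an empty context,
/// and entry points are pushed explicitly afterwards.
struct SearchContext {
    candidates: BinaryHeap<ScoredPoint>,
}

impl SearchContext {
    /// Creates an empty context; no entry point is required up front.
    fn new(capacity: usize) -> Self {
        Self { candidates: BinaryHeap::with_capacity(capacity) }
    }

    /// Candidates are added explicitly by the caller after construction.
    fn process_candidate(&mut self, point: ScoredPoint) {
        self.candidates.push(point);
    }
}
```

Separating construction from seeding lets callers such as the healing path decide when (and whether) to inject entry points.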
```diff
 fn migrate_point(
     &self,
     src_old: PointOffsetType,
     builder: &GraphLayersBuilder,
 ) -> OperationResult<()> {
     let links = &self.index.graph.links;
     let point_level = links.point_level(src_old);

+    let hardware_counter = HardwareCounterCell::disposable(); // Internal operation. No measurements needed
+    let vector_storage = self.index.vector_storage.borrow();
+    let old_scorer = new_raw_scorer(
+        vector_storage.get_vector(src_old).as_vec_ref().into(),
+        &vector_storage,
+        hardware_counter,
+    )?;
+
     let links = (0..=point_level).map(|level| {
-        links
-            .links(src_old, level)
-            .take(if level == 0 { self.m0 } else { self.m })
-            .filter_map(|dst_old| self.old_to_new[dst_old as usize])
-            .collect()
+        let level_m = if level == 0 { self.m0 } else { self.m };
+        let mut new_links = Vec::new();
+        let mut need_fixing = 0;
+
+        for link in links.links(src_old, level).take(level_m) {
+            if let Some(new_link) = self.old_to_new[link as usize] {
+                new_links.push(new_link);
+            } else {
+                need_fixing += 1;
+            }
+        }
+
+        if need_fixing > 0 {
+            let shortcuts = builder
+                .search_shortcuts_on_level(src_old, level, old_scorer.as_ref(), self)
+                .into_iter_sorted();
+            let mut container = LinksContainer::with_capacity(level_m);
+            let new_scorer = |a, b| {
+                let a_old = self.new_to_old[a as usize].unwrap();
+                let b_old = self.new_to_old[b as usize].unwrap();
+                old_scorer.score_internal(a_old, b_old)
+            };
+            container.fill_from_sorted_with_heuristic(shortcuts, need_fixing, new_scorer);
+            new_links.extend_from_slice(container.links());
+        }
+
+        new_links
     });
-    Some(links.collect())
+
+    builder.add_new_point(self.old_to_new[src_old as usize].unwrap(), links.collect());
+
+    Ok(())
 }
```
**`unwrap()` on internal mapping can panic in inconsistent states**

`new_scorer` assumes that every shortcut returned by `search_shortcuts_on_level` has a valid back-mapping:

```rust
let a_old = self.new_to_old[a as usize].unwrap();
```

While the surrounding logic should guarantee this, any future change in `search_shortcuts_on_level` or misuse of `LinksContainer` could break the invariant and bring the whole build down with an unexpected panic. A safer pattern is to propagate the error explicitly:

```diff
- let a_old = self.new_to_old[a as usize].unwrap();
+ let a_old = self.new_to_old[a as usize]
+     .ok_or_else(|| OperationError::service_error("Inconsistent mapping (a)"))?;
```

Do this for both `a_old` and `b_old`, or at minimum replace `unwrap` with `debug_assert!` so that release builds stay panic-free.
🤖 Prompt for AI Agents
In lib/segment/src/index/hnsw_index/hnsw.rs around lines 1555 to 1604, the use
of unwrap() on self.new_to_old[a as usize] and self.new_to_old[b as usize]
inside the new_scorer closure can cause panics if the internal mapping is
inconsistent. To fix this, replace unwrap() with explicit error handling by
propagating the error or, at minimum, replace unwrap() calls with debug_assert!
checks to ensure release builds do not panic while still catching issues in
debug builds.
generall left a comment

There are a lot of things which require more detailed explanation. I propose to schedule a call to comment on the stuff together.
```rust
    vector_storage: &VectorStorageEnum,
    id_tracker: &IdTrackerSS,
) -> Option<Self> {
    feature_flags.incremental_hnsw_building.then_some(())?;
```
I would rather make this more explicit.
```diff
         // Old index has deleted points.
-        // FIXME: Not supported yet.
-        return None;
+        feature_flags.hnsw_healing.then_some(())?;
```
Same; this might be compact, but it's not an obvious use of `?`.
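For comparison, the compact gate and an explicit early return behave identically (a hypothetical toy function, not the PR's code):

```rust
/// Compact form used in the PR: `then_some(())` yields `None` when the
/// flag is false, and `?` propagates that as an early exit.
fn gated_compact(enabled: bool) -> Option<u32> {
    enabled.then_some(())?; // terse, but the control flow is easy to miss
    Some(42)
}

/// Explicit form the reviewer is asking for: the early exit is visible
/// at a glance.
fn gated_explicit(enabled: bool) -> Option<u32> {
    if !enabled {
        return None;
    }
    Some(42)
}
```

Both compile to the same behaviour; the explicit version just makes the `None` path obvious to a reader.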
```diff
-            // FIXME: Not supported yet.
-            return None;
+            missing_points += 1;
+            feature_flags.hnsw_healing.then_some(())?;
```
```diff
-            // FIXME: Not supported yet.
-            return None;
+            missing_points += 1;
+            feature_flags.hnsw_healing.then_some(())?;
```
```rust
        {
            // Vector is deleted, but still present in the graph.
            missing_points += 1;
            feature_flags.hnsw_healing.then_some(())?;
```
```diff
-            .filter_map(|dst_old| self.old_to_new[dst_old as usize])
-            .collect()
+        let level_m = if level == 0 { self.m0 } else { self.m };
+        let mut new_links = Vec::new();
```

Suggested change:

```diff
- let mut new_links = Vec::new();
+ let mut new_links = Vec::with_capacity(level_m);
```
```diff
-        if let Some(links) = old_index.get_links(vector_id) {
-            graph_layers_builder.add_new_point(vector_id, links);
+        if let Some(old_offset) = old_index.new_to_old[vector_id as usize] {
+            to_heal.push(old_offset);
```
It looks like the only condition for `to_heal` is that the point exists in the other graph. Which is confusing, because we could already re-use points before. Do we call all migrations healing, or am I missing something?
```rust
            }
        }

        for old_offset in 0..old_to_new.len() {
```
This section requires a comment
```rust
        let hardware_counter = HardwareCounterCell::disposable(); // Internal operation. No measurements needed
        let vector_storage = self.index.vector_storage.borrow();
        let old_scorer = new_raw_scorer(
            vector_storage.get_vector(src_old).as_vec_ref().into(),
            &vector_storage,
            hardware_counter,
        )?;

        let links = (0..=point_level).map(|level| {
```
`links` -> `links_per_level`; otherwise it is not clear without a type hint.
Actionable comments posted: 1
♻️ Duplicate comments (1)
lib/segment/src/index/hnsw_index/hnsw.rs (1)
1621-1623: ⚠️ Potential issue

**`unwrap()` on internal mapping can panic in inconsistent states**

`new_scorer` assumes that every shortcut returned by `search_shortcuts_on_level` has a valid back-mapping:

```rust
let a_old = self.new_to_old[a as usize].unwrap();
```

While the surrounding logic should guarantee this, any future change in `search_shortcuts_on_level` or misuse of `LinksContainer` could break the invariant and bring the whole build down with an unexpected panic. A safer pattern is to propagate the error explicitly:

```diff
 let new_scorer = |a, b| {
-    let a_old = self.new_to_old[a as usize].unwrap();
-    let b_old = self.new_to_old[b as usize].unwrap();
-    old_scorer.score_internal(a_old, b_old)
+    match (self.new_to_old[a as usize], self.new_to_old[b as usize]) {
+        (Some(a_old), Some(b_old)) => old_scorer.score_internal(a_old, b_old),
+        _ => {
+            // This should not happen, but handle it gracefully
+            debug_assert!(false, "Missing mapping in new_scorer");
+            0.0 // Return worst possible score for invalid mappings
+        }
+    }
 };
```
🧹 Nitpick comments (3)

lib/segment/src/index/hnsw_index/hnsw.rs (2)

1496-1506: **This section requires a comment**

This block identifies points that once existed in the old index but were deleted in its tracker, marking them as missing points for healing purposes. Add a more detailed comment explaining this loop's purpose:

```diff
 // Loop 1: use both id trackers to find matching external ids.
 for item in itertools::merge_join_by(
     id_tracker.iter_from(None),
     old_id_tracker.iter_from(None),
     |(new_external_id, _), (old_external_id, _)| new_external_id.cmp(old_external_id),
 ) {
-    // Loop 2: find points that once were in the old index, but deleted in
-    // its id tracker.
+    // Loop 2: Find points that were deleted in the old index's tracker
+    // but still exist in the graph structure. These points need healing
+    // since they have connections in the graph but no valid vectors.
     for old_offset in 0..old_to_new.len() {
         let old_offset = old_offset as PointOffsetType;
         if old_id_tracker.is_deleted_point(old_offset) && old_graph_has_point(old_offset) {
             missing_points += 1;
             if !feature_flags.hnsw_healing {
                 return None;
             }
         }
     }
 }
```
1593-1630: **This requires a comment**

The `links_on_level` method is complex and would benefit from a clear explanation of how it identifies and fixes links when migrating points from an old index. Add a detailed comment explaining the method's purpose and approach:

```diff
+/// Reconstructs links for a point on a specific level during migration.
+///
+/// This method performs two key operations:
+/// 1. Preserves valid links that point to points still existing in the new index
+/// 2. For links pointing to deleted/missing points, finds alternative shortcuts
+///    using a specialized graph search to maintain connectivity
 fn links_on_level(
     &self,
     src_old: PointOffsetType,
     level: usize,
     builder: &GraphLayersBuilder,
     old_scorer: &dyn RawScorer,
 ) -> Vec<PointOffsetType> {
     let links = &self.index.graph.links;
     let level_m = if level == 0 { self.m0 } else { self.m };
     let mut new_links = Vec::with_capacity(level_m);
     let mut need_fixing = 0;

     for link in links.links(src_old, level).take(level_m) {
         if let Some(new_link) = self.old_to_new[link as usize] {
             new_links.push(new_link);
         } else {
             need_fixing += 1;
         }
     }

     if need_fixing > 0 {
         let shortcuts = builder
             .search_shortcuts_on_level(src_old, level, old_scorer, self)
             .into_iter_sorted();
         let mut container = LinksContainer::with_capacity(level_m);
         let new_scorer = |a, b| {
             let a_old = self.new_to_old[a as usize].unwrap();
             let b_old = self.new_to_old[b as usize].unwrap();
             old_scorer.score_internal(a_old, b_old)
         };
         container.fill_from_sorted_with_heuristic(shortcuts, need_fixing, new_scorer);
         new_links.extend_from_slice(container.links());
     }

     new_links
 }
```

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (1)
526-612: **Comprehensive implementation of shortcut search**

The `search_shortcuts_on_level` method effectively implements a specialized search that can traverse deleted points to find valid shortcuts. The implementation is thorough and covers edge cases well. A few debug assertions could be improved with more informative error messages to help diagnose issues in development:

```diff
 debug_assert!(
     false,
-    "The new graph points expected to be a subset of the old graph points"
+    "Found a point in new graph that doesn't exist in old graph. Point ID: {new_link}"
 );

 debug_assert!(
     offset_new_ready.is_none(),
-    "For ready points, every neighbor expected to be in the new graph"
+    "Found a ready point with neighbors missing from new graph. Point ID: {candidate_old.idx}"
 );
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `lib/segment/src/index/hnsw_index/graph_layers_builder.rs` (6 hunks)
- `lib/segment/src/index/hnsw_index/hnsw.rs` (11 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: lint
- GitHub Check: storage-compat-test
🔇 Additional comments (16)
lib/segment/src/index/hnsw_index/hnsw.rs (9)
15-15: **Good addition of the `FeatureFlags` import.** The code now imports the `FeatureFlags` struct, which will be used to control the new healing functionality.

261-267: **Feature flag gating for incremental building.** Properly gates the old index evaluation with feature flags, ensuring compatibility with existing configurations.

387-406: **Good implementation of parallel migration.** The migration process has been parallelized using the thread pool, which should significantly improve performance when reusing the index. The timers also help with performance monitoring.

1383-1384: **Well-structured data model for tracking point status.** Adding the `missing_points` field allows tracking points that need healing, which is essential for determining whether healing is more efficient than rebuilding.

1408-1411: **Proper feature flag check.** Early return if incremental building is disabled, which prevents unnecessary computation.

1437-1441: **Improved point validation.** The new helper function `old_graph_has_point` provides a clear way to check whether a point is included in the old graph by looking for outgoing links at level 0.

1512-1519: **Well-designed healing threshold.** Limiting healing to cases where less than 10% of points are missing is a good optimization. If too many points need healing, rebuilding from scratch is likely more efficient.

1542-1546: **Improved logging for operational visibility.** Added detailed logging of reused and healed points, which helps with monitoring and debugging the healing process.

1564-1566: **Useful accessor method.** The `graph()` method provides clean access to the internal graph structure, which is needed for the healing process.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (7)
6-8: **Appropriate import updates.** Updated imports to support the new functionality while maintaining a clean dependency structure.

354-358: **Improved parameter naming.** Changed the parameter name from `levels` to `links_by_level` to better reflect the actual content of the parameter, making the code more self-documenting.

387-388: **Updated initialization pattern for `SearchContext`.** Split the search-context initialization into separate creation and processing steps, improving consistency with other parts of the codebase and making the flow clearer.

501-525: **Well-documented dual-graph search algorithm.** The documentation for the `search_shortcuts_on_level` method clearly explains the dual-graph search approach and how it differs from regular search. This is essential for understanding this complex algorithm.

554-559: **Efficient pruning of the search space.** This early-termination check improves efficiency by skipping candidates that won't improve the results, which is particularly important for a search algorithm that might traverse many deleted points.

564-578: **Conditional graph traversal based on point readiness.** This section intelligently decides whether to use the new or old graph for traversal based on whether the point is ready in the new graph, implementing the dual-graph search strategy described in the comments.

601-608: **Careful handling of search results.** The code distinguishes between points that should be added to the search results (points with mappings to the new index) and points that should be queued for further exploration (points without mappings), implementing the traversal strategy described in the comments.
```diff
 /// Migrate point from old index to new index.
 fn migrate_point(
     &self,
     src_old: PointOffsetType,
     builder: &GraphLayersBuilder,
 ) -> OperationResult<()> {
     let links = &self.index.graph.links;
     let point_level = links.point_level(src_old);

-    let links = (0..=point_level).map(|level| {
-        links
-            .links(src_old, level)
-            .take(if level == 0 { self.m0 } else { self.m })
-            .filter_map(|dst_old| self.old_to_new[dst_old as usize])
-            .collect()
-    });
-    Some(links.collect())
+    let hardware_counter = HardwareCounterCell::disposable(); // Internal operation. No measurements needed
+    let vector_storage = self.index.vector_storage.borrow();
+    let old_scorer = new_raw_scorer(
+        vector_storage.get_vector(src_old).as_vec_ref().into(),
+        &vector_storage,
+        hardware_counter,
+    )?;
+
+    let links_by_level: Vec<Vec<PointOffsetType>> = (0..=point_level)
+        .map(|level| self.links_on_level(src_old, level, builder, old_scorer.as_ref()))
+        .collect();
+
+    builder.add_new_point(self.old_to_new[src_old as usize].unwrap(), links_by_level);
+
+    Ok(())
 }
```
**This migration function needs additional error handling**

The `migrate_point` function is a core part of the healing process, migrating a point from the old index to the new one. However, it uses `.unwrap()` on the mapping access, which could panic if the mapping becomes inconsistent. Safer handling of potential mapping errors can prevent runtime panics:
```diff
 fn migrate_point(
     &self,
     src_old: PointOffsetType,
     builder: &GraphLayersBuilder,
 ) -> OperationResult<()> {
     let links = &self.index.graph.links;
     let point_level = links.point_level(src_old);
     let hardware_counter = HardwareCounterCell::disposable(); // Internal operation. No measurements needed
     let vector_storage = self.index.vector_storage.borrow();
     let old_scorer = new_raw_scorer(
         vector_storage.get_vector(src_old).as_vec_ref().into(),
         &vector_storage,
         hardware_counter,
     )?;
     let links_by_level: Vec<Vec<PointOffsetType>> = (0..=point_level)
         .map(|level| self.links_on_level(src_old, level, builder, old_scorer.as_ref()))
         .collect();
-    builder.add_new_point(self.old_to_new[src_old as usize].unwrap(), links_by_level);
+    if let Some(new_id) = self.old_to_new[src_old as usize] {
+        builder.add_new_point(new_id, links_by_level);
+    } else {
+        return Err(OperationError::service_error(
+            format!("Inconsistent mapping for point {}", src_old)
+        ));
+    }
     Ok(())
 }
```
🤖 Prompt for AI Agents
In lib/segment/src/index/hnsw_index/hnsw.rs around lines 1568 to 1592, the
migrate_point function uses unwrap() on self.old_to_new[src_old as usize], which
can cause a panic if the mapping is inconsistent. Replace unwrap() with proper
error handling by checking if the mapping exists and returning an error if it
does not, to prevent runtime panics and ensure safer migration.
* Let `SearchContext::new` create empty context
* `FixedLengthPriorityQueue`: add `is_full()`
* Implement HNSW healing
* Apply review suggestions
* Review comments and docstrings

---------

Co-authored-by: generall <andrey@vasnetsov.com>
This PR implements a graph healing algorithm, allowing the HNSW index to reuse the old graph even if some of the vectors are updated or deleted.

This is done by the new graph search function `GraphLayersBuilder::search_shortcuts_on_level`, which finds shortcut links around the deleted points (check its documentation comment for more info). The main graph building is still done in two stages (migrate phase and main phase).

But in this PR,

By default, this feature is hidden behind the new `hnsw_healing` feature flag. The threshold is 10% of missing points (if more than that, the old index is discarded).
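The threshold decision described above can be sketched as follows (the constant and function names are illustrative, not taken from the PR):

```rust
/// Illustrative threshold: heal only while missing points stay under 10%
/// of the total; otherwise discard the old index and rebuild from scratch.
const MAX_MISSING_PERCENT: usize = 10;

fn should_heal(missing_points: usize, total_points: usize) -> bool {
    // Integer arithmetic avoids float rounding at the boundary.
    total_points > 0 && missing_points * 100 < total_points * MAX_MISSING_PERCENT
}
```

The integer comparison `missing * 100 < total * 10` is exact, so the cutoff behaves predictably at the boundary.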
Benchmark on random data:

- Index build time from scratch: 4519 ms
- Index build time with healing (2% updated): 117 + 363 = 480 ms (117 ms for the migrate phase, 363 ms for the main phase)
- Index build time with healing (4% updated): 245 + 506 = 751 ms
- Index build time with healing (8% updated): 660 + 725 = 1385 ms