Merge take_snapshot and take_partial_snapshot methods#6232
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
1302-1332: Ensure partial snapshot logic and error handling are clear.This new
take_snapshotimplementation straightforwardly delegates to bothwrapped_segmentandwrite_segment. Consider these points:
- If snapshot creation fails for one segment, there is no fallback or rollback mechanism.
- Currently, partial snapshot is not fully implemented (consistent with PR objectives).
- You may want additional logs or error-handling steps to clarify partial snapshot progress and potential failures.
lib/segment/src/segment/snapshot.rs (1)
129-218:get_segment_manifest: versioning logic is thorough.
- Collects and version-tags files meticulously, ensuring unversioned entries do not override versioned entries.
- Inserts “”unversioned” placeholders for RocksDB references.
- Includes a TODO for RocksDB versioning, which may be worth exploring soon to ensure complete partial snapshot coverage.
Overall, the logic is clear and consistent, though be mindful of the performance impact if the file set is large.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
lib/collection/src/collection_manager/holders/proxy_segment.rs(4 hunks)lib/collection/src/collection_manager/holders/segment_holder.rs(2 hunks)lib/segment/src/data_types/segment_manifest.rs(2 hunks)lib/segment/src/entry/entry_point.rs(2 hunks)lib/segment/src/entry/mod.rs(1 hunks)lib/segment/src/entry/snapshot_entry.rs(1 hunks)lib/segment/src/segment/entry.rs(2 hunks)lib/segment/src/segment/mod.rs(3 hunks)lib/segment/src/segment/partial_snapshot.rs(0 hunks)lib/segment/src/segment/snapshot.rs(2 hunks)
💤 Files with no reviewable changes (1)
- lib/segment/src/segment/partial_snapshot.rs
🧰 Additional context used
🧬 Code Definitions (4)
lib/segment/src/data_types/segment_manifest.rs (4)
lib/collection/src/collection_manager/holders/segment_holder.rs (3)
get(99-104)get(308-312)is_empty(178-180)lib/collection/src/collection_manager/holders/proxy_segment.rs (2)
is_empty(930-932)is_empty(1380-1382)lib/segment/src/entry/entry_point.rs (1)
is_empty(218-218)lib/segment/src/segment/entry.rs (1)
is_empty(428-430)
lib/collection/src/collection_manager/holders/proxy_segment.rs (5)
lib/segment/src/segment/snapshot.rs (1)
take_snapshot(29-100)lib/segment/src/entry/snapshot_entry.rs (1)
take_snapshot(16-23)lib/collection/src/collection_manager/optimizers/indexing_optimizer.rs (1)
temp_path(247-249)lib/collection/src/collection_manager/optimizers/vacuum_optimizer.rs (1)
temp_path(174-176)lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (1)
temp_path(223-225)
lib/segment/src/segment/mod.rs (1)
lib/segment/src/common/operation_error.rs (1)
service_error(84-89)
lib/segment/src/segment/snapshot.rs (10)
lib/segment/src/vector_storage/vector_storage_base.rs (2)
files(85-85)files(723-749)lib/collection/src/collection_manager/holders/proxy_segment.rs (4)
take_snapshot(1303-1332)collect_segment_manifests(1334-1346)version(355-360)version(1411-1416)lib/segment/src/entry/snapshot_entry.rs (2)
take_snapshot(16-23)collect_segment_manifests(25-25)lib/segment/src/data_types/segment_manifest.rs (1)
version(12-16)lib/segment/src/index/vector_index_base.rs (2)
files(37-37)files(194-208)lib/segment/src/id_tracker/id_tracker_base.rs (2)
files(163-163)files(390-397)lib/segment/src/index/payload_index_base.rs (1)
files(151-151)lib/segment/src/payload_storage/payload_storage_base.rs (1)
files(76-76)lib/segment/src/vector_storage/quantized/quantized_vectors.rs (1)
files(202-215)lib/segment/src/segment/entry.rs (1)
version(35-37)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (28)
lib/segment/src/entry/mod.rs (2)
2-2: Module renaming looks good.The change from
partial_snapshot_entrytosnapshot_entryreflects the consolidation of snapshot functionality described in the PR objectives.
4-5: Appropriate public exports added.Good addition of public exports for
SegmentEntryandSnapshotEntry, making them directly accessible from the entry module. This improves the API's usability.lib/collection/src/collection_manager/holders/segment_holder.rs (2)
21-21: Added necessary import for SegmentManifests.Good addition of the import for
SegmentManifests, which is required for the updated snapshot method signature.
1358-1364: Method call updated with the new parameter.The
take_snapshotmethod call has been updated to include a defaultSegmentManifestsinstance, aligning with the method signature changes. This is a good implementation of the PR's objective to merge the snapshot methods.lib/segment/src/data_types/segment_manifest.rs (3)
36-38: Method reordering looks good.The
getmethod has been reordered within the implementation. The functionality remains unchanged, which is appropriate.
40-42: Useful helper method added.The
is_emptymethod provides a convenient way to check if there are any manifests present. This is a good addition that will help with conditional logic in the snapshot implementation.
53-59: Good utility method for creating empty manifests.The
emptymethod provides a clean way to create an empty manifest with a specified segment ID. This will be useful for initializing new manifests without redundant code.lib/segment/src/entry/snapshot_entry.rs (3)
10-10: Good trait renaming.Renaming from
PartialSnapshotEntrytoSnapshotEntryreflects the consolidation of snapshot functionality as described in the PR objectives.
11-16: Clear and comprehensive documentation.The added documentation clearly explains the purpose and functionality of the
take_snapshotmethod. This is particularly important given the method signature changes.
20-22: Method signature updated correctly with new parameters.The method signature has been updated to include the
formatandmanifestparameters, properly implementing the PR's objective to merge the snapshot methods.lib/segment/src/segment/entry.rs (2)
2-2: Remove unused import for efficiencyIt appears that only
PathBufis needed here, so removing otherstd::pathimports helps keep the module focused and free of clutter.
28-28: Add new imports for extended type usageImporting
VectorDataInfo,VectorName, and others aligns with their usage in the segment implementation. This change keeps the code organized.lib/segment/src/segment/mod.rs (4)
10-10: Expose snapshot module publiclyChanging
mod snapshottopub mod snapshotmakes snapshot functionality available outside this module. This is beneficial for broader integration.
17-17: Import bothPathandPathBufAdding
Pathis a sensible change, enabling direct usage in functions that manipulate filesystem paths.
27-27: Update imports to includeOperationErrorImporting
OperationErrorenables clearer error handling practices within this module.
120-130: Introducedestroy_rocksdbfunctionThis function cleanly destroys a RocksDB instance at a given path and translates any failure into
OperationError. Consider ensuring it’s only called when the DB is no longer in use to avoid concurrency issues.lib/segment/src/entry/entry_point.rs (4)
2-2: Import onlyPathBufRestricting the import to
PathBufhelps keep imports minimal and consistent with usage in this file.
15-15: Use the newSnapshotEntrytraitSwitching from
PartialSnapshotEntrytoSnapshotEntryaligns with the consolidated snapshot logic across the codebase.
22-23: Refine import list for type usageIncluding the newly introduced or reorganized types supports the updated snapshot-based workflow.
29-29: Adopt unified snapshot approachHaving
SegmentEntrydirectly extendSnapshotEntrycentralizes the snapshot functionality and makes future enhancements more straightforward.lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
21-21: Import for implementing the SnapshotEntry trait.This import is necessary for the subsequent implementation of the
SnapshotEntrytrait and looks correct.lib/segment/src/segment/snapshot.rs (7)
1-21: Imports for snapshot functionality and dependencies.These new imports (including
HashMap,HashSet,Deref, andSnapshotFormat) are appropriate for the snapshot implementation and appear consistent with the rest of the file.
28-101: Review the newly introducedtake_snapshotmethod.
- The logic correctly prevents multiple snapshots of the same segment by using
snapshotted_segments.- The segment is flushed before snapshotting to ensure a consistent state.
- The partial snapshot path starts to appear if
manifestis not empty—even though partial logic is still evolving, this is a suitable placeholder.- The
Ancientformat is explicitly unsupported with a debug assertion and returns a helpful error.Overall, this method effectively handles fundamental snapshot creation. In the future, consider concurrency and performance implications of synchronous flushes for large segments.
102-106: Simplecollect_segment_manifestsfunction.This function adds the segment’s manifest into the provided
SegmentManifestsmap. The approach is minimal and matches the trait's requirement.
108-127:segment_idfor extracting UUID from path.Validates the segment directory name as a UUID and returns an error if not. This check is robust and prevents ambiguous or invalid segment naming.
220-238: Comprehensive file gathering infiles.This enumerates vector data, index data, payloads, and the ID tracker to generate a complete file list. The approach is straightforward and essential for snapshot consistency.
240-258:versioned_filesaligns well with the unversionedfilesmethod.Mirrors the logic of
files, but incorporates specific version tracking from each component. This design fosters future partial snapshot enhancements.
368-397:updated_filesmethod for partial snapshot detection.Compares file versions between the old manifest and the current one to detect changes. This is crucial for deciding which files to include in partial snapshots. The logic is clear and well-structured.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/tests/integration/segment_on_disk_snapshot.rs (1)
158-166: Consider adding a test for partial snapshots.While the current implementation uses
SegmentManifests::default()for regular snapshots, it would be beneficial to add a test case that exercises the partial snapshot functionality once it's implemented.// Example test case to add in the future: #[test] fn test_partial_snapshot() { // Setup similar to existing tests // Create a non-default manifest for partial snapshot let partial_manifest = SegmentManifests::new_partial(...); // Take a partial snapshot segment.take_snapshot( temp_dir.path(), &tar, format, &partial_manifest, &mut HashSet::new(), ).unwrap(); // Verify partial snapshot behavior }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/segment/tests.rs(2 hunks)lib/segment/tests/integration/segment_on_disk_snapshot.rs(2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/tests/integration/segment_on_disk_snapshot.rs (1)
lib/segment/src/segment_constructor/segment_builder.rs (1)
new(76-157)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (6)
lib/segment/src/segment/tests.rs (3)
15-15: LGTM: New import for SegmentManifests is correctly added.This import aligns with the PR objective of unifying
take_snapshotandtake_partial_snapshotmethods by introducing a manifest parameter.
17-17: LGTM: Correctly importing SnapshotEntry trait.Using the underscore suffix in the import statement is the proper way to import a trait solely for its methods.
211-216: LGTM: Updated method call with the new manifest parameter.The
take_snapshotmethod call is correctly updated to include the new&SegmentManifests::default()parameter. This aligns with the PR objective of unifying the snapshot methods.lib/segment/tests/integration/segment_on_disk_snapshot.rs (3)
9-9: LGTM: Appropriate import for SegmentManifests added.Added the necessary import for the
SegmentManifeststype which is now required for the updatedtake_snapshotmethod.
12-12: LGTM: Correctly importing SnapshotEntry trait.The use of the underscore suffix in
use segment::entry::snapshot_entry::SnapshotEntry as _;properly imports the trait methods.
159-165: LGTM: Updated take_snapshot method call with manifest parameter.The method call has been correctly updated to include the new
&SegmentManifests::default()parameter, consistent with the PR objective of unifying the snapshot methods.
0311f59 to
3cf94f6
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/segment/src/segment/snapshot.rs (2)
80-85: Consider removing debug_assert for unsupported formatThe
debug_assert!(false, ...)pattern is unusual here. Since the code already returns an error for unsupported formats, the assertion adds little value and could be removed.match format { SnapshotFormat::Ancient => { - debug_assert!(false, "Unsupported snapshot format: {format:?}"); return Err(OperationError::service_error(format!( "Unsupported snapshot format: {format:?}" ))); }
206-211:Details
❓ Verification inconclusive
Consider addressing RocksDB versioning TODO
There's a TODO indicating RocksDB versioning is needed, but the code adds RocksDB files with
FileVersion::Unversioned. While this works for now, it might be worth creating a more structured roadmap for implementing proper RocksDB versioning in the future.
🏁 Script executed:
#!/bin/bash # Check if there are any other TODOs or discussions about RocksDB versioning rg -i "version.*rocksdb|rocksdb.*version" --type rustLength of output: 906
Action: Plan a structured roadmap for RocksDB versioning
In
lib/segment/src/segment/snapshot.rs(lines 206–211), the code assignsFileVersion::Unversionedto RocksDB virtual files and leaves a TODO regarding versioning. Although this temporary solution works, it’s advisable to outline a clear roadmap for implementing proper RocksDB versioning in the future—especially since similar versioning-related discussions exist in other modules (e.g.,common/rocksdb_wrapper.rsand some id tracker files).
- Verify if a dedicated issue or roadmap exists for RocksDB versioning.
- If not, consider opening an issue to track and plan the transition to a versioned approach.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
lib/collection/src/collection_manager/holders/proxy_segment.rs(4 hunks)lib/collection/src/collection_manager/holders/segment_holder.rs(2 hunks)lib/segment/src/data_types/segment_manifest.rs(2 hunks)lib/segment/src/entry/entry_point.rs(2 hunks)lib/segment/src/entry/mod.rs(1 hunks)lib/segment/src/entry/snapshot_entry.rs(1 hunks)lib/segment/src/segment/entry.rs(2 hunks)lib/segment/src/segment/mod.rs(3 hunks)lib/segment/src/segment/partial_snapshot.rs(0 hunks)lib/segment/src/segment/snapshot.rs(2 hunks)lib/segment/src/segment/tests.rs(2 hunks)lib/segment/tests/integration/segment_on_disk_snapshot.rs(2 hunks)
💤 Files with no reviewable changes (1)
- lib/segment/src/segment/partial_snapshot.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- lib/segment/src/entry/mod.rs
- lib/segment/src/data_types/segment_manifest.rs
- lib/collection/src/collection_manager/holders/segment_holder.rs
- lib/segment/src/segment/tests.rs
- lib/segment/src/entry/snapshot_entry.rs
- lib/segment/tests/integration/segment_on_disk_snapshot.rs
🧰 Additional context used
🧬 Code Definitions (2)
lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
lib/segment/src/entry/snapshot_entry.rs (1)
take_snapshot(16-23)
lib/segment/src/segment/mod.rs (1)
lib/segment/src/common/operation_error.rs (1)
service_error(84-89)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (21)
lib/segment/src/segment/entry.rs (1)
2-2: Appropriate removal of unused Path import.The import of
Pathis no longer needed as thetake_snapshotmethod (that used&Pathin its signature) has been removed from this implementation as part of consolidating snapshot functionality.lib/segment/src/segment/mod.rs (4)
10-10: Exposing the snapshot module as public is well-aligned with the PR goal.Making the
snapshotmodule public is necessary to allow external code to access the consolidated snapshot implementation, which supports the PR objective of merging the snapshot methods.
17-17: Appropriate import change to include Path.Adding
Pathto the import is necessary for the newdestroy_rocksdbfunction parameter.
27-27: Good addition of OperationError import.This import is needed for the new
destroy_rocksdbfunction to create and return operation errors.
121-130: Well-implemented utility function for RocksDB cleanup.The
destroy_rocksdbfunction provides a clean way to handle RocksDB instance cleanup, which is likely used in the new snapshot implementation. The error handling is robust, converting RocksDB errors to operation errors with descriptive messages.lib/segment/src/entry/entry_point.rs (3)
2-2: Appropriate removal of Path import.The
Pathimport is no longer needed since thetake_snapshotmethod (which used this type) has been removed from theSegmentEntrytrait.
15-15: Good transition from PartialSnapshotEntry to SnapshotEntry.Updating the import to use
SnapshotEntryinstead ofPartialSnapshotEntrysupports the PR goal of consolidating snapshot functionality.
29-29: Well-designed trait inheritance change.Changing the trait inheritance from
PartialSnapshotEntrytoSnapshotEntrycentralizes the snapshot functionality in a single trait, which aligns perfectly with the PR goal of merging the snapshot methods.lib/collection/src/collection_manager/holders/proxy_segment.rs (7)
21-21: Good update to import SnapshotEntry instead of PartialSnapshotEntry.This change correctly updates the import to match the new trait inheritance structure, supporting the consolidated snapshot approach.
1308-1309: Well-implemented trait change from PartialSnapshotEntry to SnapshotEntry.The implementation correctly transitions from implementing
PartialSnapshotEntryto implementingSnapshotEntry, which is necessary for the consolidated snapshot functionality.
1310-1316: Successfully updated method signature to include manifest parameter.The
take_snapshotmethod signature has been properly updated to include the newmanifestparameter, as required by the PR objectives.
1317-1318: Good addition of logging for improved debugging.Adding a log message at the beginning of the method provides useful context for debugging snapshot operations on proxy segments.
1319-1326: Correctly passing manifest to wrapped segment.The implementation properly passes the new
manifestparameter to the wrapped segment'stake_snapshotmethod, ensuring consistent snapshot behavior.
1328-1336: Correctly passing manifest to write segment.The implementation properly passes the new
manifestparameter to the write segment'stake_snapshotmethod, ensuring consistent snapshot behavior across all components.
1888-1888: Tests correctly updated to include manifest parameter.The test cases have been properly updated to include the
&SegmentManifests::default()parameter when callingtake_snapshot, ensuring that the tests continue to work with the new method signature.Also applies to: 1897-1897
lib/segment/src/segment/snapshot.rs (6)
34-35: Good addition of the manifest parameter for supporting partial snapshotsThe implementation aligns with the PR objectives to merge
take_snapshotandtake_partial_snapshotmethods. The newmanifestparameter allows for conditional file inclusion based on file versions, which is essential for partial snapshots.
49-69: Well-implemented conditional file inclusion logicThis code elegantly handles both regular and partial snapshots through the same method. When
manifestis empty, it creates a regular snapshot with all files. Otherwise, it determines which files have changed since the previous snapshot by comparing file versions, enabling the partial snapshot functionality.
102-105: Good implementation of manifest collectionThe
collect_segment_manifestsmethod is a clean and straightforward implementation that builds upon the existingget_segment_manifestfunctionality. This helps maintain a consistent approach to manifest handling throughout the codebase.
109-127: Well-structured segment_id extraction with validationThe method correctly extracts the segment ID from the path with proper error handling. The debug assertion validates that the ID is a valid UUID, which is a good safeguard against potential issues. Consider adding this validation in production builds too.
136-142: Excellent documentation for versioning logicThe implementation includes detailed comments explaining the complex versioning logic, which makes the code much more maintainable. These comments clearly document the expected behavior and edge cases, which is particularly valuable for this critical component.
Also applies to: 165-173
368-397: Well-implemented file comparison logicThe
updated_filesfunction provides a clear implementation for determining which files should be included in a partial snapshot. The detailed comments explaining the inclusion criteria make the logic easy to understand. The comparison of different version types (unversioned vs. versioned) is handled elegantly.
3cf94f6 to
159e37e
Compare
| let mut empty_manifest = None; | ||
| let request_manifest = manifest | ||
| .get(segment_id) | ||
| .unwrap_or_else(|| empty_manifest.insert(SegmentManifest::empty(segment_id))); |
There was a problem hiding this comment.
I don't really get how this works. Shouldn't we take out the empty_manifest in between changing it to:
let request_manifest = manifest
.get(segment_id)
.unwrap_or_else(|| SegmentManifest::empty(segment_id));There was a problem hiding this comment.
manifest is Option<&SegmentManifest>. We can't return &SegmentManifest::empty(...) from unwrap_or_else, so we either have to clone manifest or use this trick to conditionally initialize empty manifest (into empty_manifest variable) and return reference to this empty manifest from unwrap_or_else call.
There was a problem hiding this comment.
Ah, I get it, it's because of the reference.
Construction of SegmentManifest::empty(segment_id) is cheap though, it's only a &str -> String.
This PR merges
ShardReplicaSet::take_snapshotandShardReplicaSet::take_partial_snapshotmethods. They are very similar to each other, and it's easier (on the HTTP API level) to call a single method instead of two separate ones.This PR adds
manifestparameter totake_snapshotand logic that creates a partial snapshot (instead of a regular one) based on this parameter, but it's not yet propagated intotake_snapshot(it will be done in yet another PR, that will add partial snapshot API endpoints), so it always creates a regular snapshot.All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --all --all-featurescommand?Changes to Core Features: