Simplify ShardKeyMapping implementation#6220
Conversation
|
Spray and pray PR, really bet on those integration tests 🤞✅ |
📝 WalkthroughWalkthroughThe pull request implements widespread changes to standardize the handling of shard key mappings across the codebase. The update replaces the previous wrapper type ( Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
💤 Files with no reviewable changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
lib/storage/src/content_manager/toc/collection_meta_ops.rs (1)
386-386: Unnecessary self-assignment could be removedThe line
let shards_key_mapping = shards_key_mapping;is a self-assignment that doesn't perform any transformation. This was likely a result of removing the previous.to_map()call during theShardKeyMappingsimplification refactoring. Consider removing this line altogether since the original variable can be used directly.- let shards_key_mapping = shards_key_mapping;lib/collection/src/collection/snapshots.rs (1)
187-188: Update comment to reflect new implementationThe comment "Use wrapper type to support both formats" is now inconsistent with the implementation, which directly uses
ShardKeyMapping. Update the comment to reflect the new implementation approach.- // Use wrapper type to support both formats + // Read the shard key mapping file let shard_key_mapping: ShardKeyMapping = read_json(&mapping_path)?;lib/collection/src/shards/shard_holder/shard_mapping.rs (3)
18-30: Consider restricting direct mutation viaDerefMut.While implementing
DerefandDerefMutimproves convenience, it exposes the coreHashMapfor mutation, potentially bypassing invariants. Consider providing explicit API methods instead to keep the mapping consistent.
32-44: Rename method for clarity.The function name
shards()might be renamed to match the “shard_id_to_shard_key” comment. This improves readability and consistency.- pub fn shards(&self) -> HashMap<ShardId, ShardKey> { + pub fn shard_id_to_shard_key(&self) -> HashMap<ShardId, ShardKey> {
101-115: Feature-flag-based serialization.Relying on
use_new_shard_key_mapping_formatis fine, but to prevent confusion (e.g. numeric keys with the old format), consider forcing the new format if numeric shard keys are found. This may reduce edge cases.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
lib/collection/src/collection/mod.rs(3 hunks)lib/collection/src/collection/snapshots.rs(2 hunks)lib/collection/src/collection/state_management.rs(2 hunks)lib/collection/src/collection_state.rs(2 hunks)lib/collection/src/shards/shard_holder/mod.rs(6 hunks)lib/collection/src/shards/shard_holder/shard_mapping.rs(2 hunks)lib/storage/src/content_manager/toc/collection_meta_ops.rs(1 hunks)src/migrations/single_to_cluster.rs(1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/collection/src/collection/mod.rs (3)
lib/collection/src/shards/shard_holder/mod.rs (1) (1)
shards(280-280)lib/collection/src/collection/state_management.rs (1) (1)
shards(183-183)lib/collection/src/shards/shard_holder/shard_mapping.rs (1) (1)
shards(35-44)
🔇 Additional comments (21)
lib/collection/src/collection_state.rs (1)
26-27: Implementation matches TODO comment - good work!This change successfully implements the TODO comment by replacing the wrapper type with the new
ShardKeyMappingtype.src/migrations/single_to_cluster.rs (2)
100-100: Simplified accessing of shard key mapping entriesThe change from using
.to_map()to directly using.iter()on theshards_key_mappingobject aligns with the overall refactoring to simplify theShardKeyMappingimplementation.
104-104: Removed reference operator when accessing shard IDRemoved the reference operator (
&) fromshard_idwhen callingget(), which matches with how the newShardKeyMappingtype handles shard IDs.lib/collection/src/collection/state_management.rs (2)
179-179: Signature successfully updated to reflect the new type.This change aligns well with the overall refactor to use
ShardKeyMapping.
12-12:Details
❓ Verification inconclusive
Confirm old references are fully removed.
Swapping from
ShardKeyMappingWrappertoShardKeyMappingis consistent with the new design. Please ensure no references toShardKeyMappingWrapperremain in the codebase.
🏁 Script executed:
#!/bin/bash rg 'ShardKeyMappingWrapper'Length of output: 27
Let's perform a slightly more targeted search before finalizing this review comment. I'll re-run the search specifically in Rust files to be sure there are no stale references:
Once you verify that this command produces no output, we can confidently conclude that all references to
ShardKeyMappingWrapperhave been removed.Rewritten Review Comment:
Ensure complete removal of
ShardKeyMappingWrapper.
The change fromShardKeyMappingWrappertoShardKeyMappinginlib/collection/src/collection/state_management.rs(line 12) aligns with the new design. Please confirm by re-running a thorough search (e.g., usingrg -t rust 'ShardKeyMappingWrapper') that no lingering references toShardKeyMappingWrapperexist anywhere in the codebase.
🏁 Script executed:
#!/bin/bash rg -t rust 'ShardKeyMappingWrapper'Length of output: 35
Action: Confirm removal of
ShardKeyMappingWrapperfrom the entire codebase.
The change inlib/collection/src/collection/state_management.rs(line 12) replacingShardKeyMappingWrapperwithShardKeyMappingfollows the new design. Automated searches using both generic and Rust-specific queries have not returned any occurrences ofShardKeyMappingWrapper. However, the output was minimal, so please perform a manual review (or run an alternate search) across the repository to fully ensure no residual references remain.lib/collection/src/collection/mod.rs (3)
50-50: Import change for the refactored type.Switching the import to
ShardKeyMappingreflects the new approach. Looks good.
108-108: Parameter type refactor.Replacing
Option<ShardKeyMappingWrapper>withOption<ShardKeyMapping>ensures consistency with the updated mapping structure.
352-353: Correct approach to return owned keys.Using
.cloned().collect()to produce a vector of ownedShardKeys is concise and clear.lib/collection/src/shards/shard_holder/shard_mapping.rs (7)
2-2: Additional import for ops.Importing
std::opsto implementDerefandDerefMutis appropriate. No issues found.
12-16: New struct definition for simplified shard mapping.Defining
ShardKeyMappingwith serde attributes and derived traits appears solid. Good job incorporating the new format handling.
58-63: Iterator logic is straightforward.Collecting all shard IDs through
iter_shard_idsis clear and follows well from usingHashSetfor uniqueness.
68-74: Check for multiple keys referencing the same ID.This returns the first key found if a single shard ID appears under multiple shard keys. Please verify that such overlap cannot occur or else handle conflicts explicitly.
77-91: Backward compatibility for old vs. new storage formats.Parsing the helper’s
OldorNewvariants supports existing and new workflows effectively. Nicely done.
94-99: Untagged enum for flexible deserialization.Untagged serde approach provides a seamless fallback for legacy data. Good choice here.
117-139: Simple intermediate struct for new format.
KeyIdsPairis straightforward and helps cleanly separate serialization logic. Looks good.lib/collection/src/shards/shard_holder/mod.rs (6)
17-17: LGTM! Updated import for ShardKeyMapping.This import change is part of the refactoring to use the direct
ShardKeyMappingtype instead of a wrapper.
59-59: LGTM! Updated field type from wrapper to direct type.Changed the field type from
SaveOnDiskShardKeyMappingWrappertoSaveOnDisk<ShardKeyMapping>, removing an unnecessary wrapper layer.
79-80: LGTM! Updated initialization to use direct ShardKeyMapping type.Simplified initialization now loads a
SaveOnDisk<ShardKeyMapping>directly instead of wrapping it.
119-120: LGTM! Simplified return type.Method now returns
ShardKeyMappingdirectly instead of wrapping it, consistent with the type simplification.
275-275: LGTM! Updated parameter type.Changed parameter type from
ShardKeyMappingWrappertoShardKeyMappingfor consistency.
132-139:Details
❓ Verification inconclusive
LGTM! Streamlined implementation with direct mapping.
The implementation has been simplified to:
- Extract the shard ID to key mapping
- Write the mapping directly to storage
- Update the local cache
This is cleaner than the previous approach that required conversion to a map first.
🏁 Script executed:
#!/bin/bash # Verify that the ShardKeyMapping type implements the necessary shards() method ast-grep --pattern 'impl $_ for ShardKeyMapping { $$$ fn shards($$$) -> $_ { $$$ } $$$ }'Length of output: 103
Action Required: Verify ShardKeyMapping Implementation
The updated streamlined implementation looks clean and correct. However, the initial automated check for the
shards()method onShardKeyMappingdid not return output. Please manually verify that theShardKeyMappingtype indeed implements ashards()method with the correct behavior, ensuring that it returns the expected mapping.Bullet points for review:
- Confirm that
ShardKeyMappinghas an implementation offn shards(&self) -> _(with the correct return type) that extracts the shard ID to key mapping.- Ensure that the behavior of this method matches the expectations in the streamlined code (i.e., direct mapping extraction, proper updating of storage, and cache).
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/shards/shard_holder/shard_mapping.rs (2)
34-35: Consider addressing the TODO in this PRThe TODO for renaming the method is included in the PR. Consider addressing it now rather than leaving it for future work, especially if this is part of a larger API refactoring effort.
68-74: Consider addressing the TODO and potential performance issue
- The TODO for renaming the method is included in the PR. Consider addressing it now.
- The implementation searches for a shard ID in all shard keys, which could be inefficient for large mappings. If this is a frequently called method, consider adding an inverse mapping from shard ID to shard key as a field in the struct.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/shards/shard_holder/shard_mapping.rs(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (8)
lib/collection/src/shards/shard_holder/shard_mapping.rs (8)
2-2: Well-structured encapsulation with Deref patternThe refactoring encapsulates the shard key mapping inside a struct with proper derive traits for serialization/deserialization, which is a better design approach than the previous type alias. Using the
Derefpattern (importingstd::ops) allows for backward compatibility while providing proper encapsulation.Also applies to: 12-16
18-30: Good implementation of Deref and DerefMutThe
DerefandDerefMuttraits allow the struct to be used as if it were aHashMapdirectly, providing backward compatibility with existing code while maintaining proper encapsulation. This is a standard Rust idiom for wrapping collections.
35-44: Method implementation looks correctThe implementation of
shards()correctly usesflat_mapto transform the mapping from shard key to shard IDs into a mapping from shard ID to shard key. The approach is efficient and clear.
59-63: Good iterator implementationCreating a dedicated iterator method is a good practice in Rust, allowing for more flexible and potentially more efficient code. Using
copied()instead of cloning each shard ID is also more efficient.
77-92: Clean implementation of conversion from SerdeHelperThe conversion from the serialization helper to the actual struct is clean and handles both the new and old formats correctly. The matching is straightforward and maps directly to the struct field.
94-99: Good serialization strategyUsing an untagged enum for serialization is a good approach to handle multiple formats without complicating the main struct. This separates serialization concerns from the core data structure.
101-117: Format selection aligns with previous discussionsThe logic for selecting between new and old formats correctly follows the requirements discussed in the previous reviews. It uses the new format if either the feature flag is enabled or if any numeric shard key is used, which preserves type information.
This implementation matches the discussion in the past review comments where using numeric shard keys should opt-in to the new format to preserve type information, even if it breaks backward compatibility.
119-141: Well-designed helper structThe
KeyIdsPairstruct and its implementations are well-designed for serialization purposes. Theinto_partsmethod andFromimplementation make the conversion between the pair and tuple formats clean and explicit.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/collection/src/shards/shard_holder/shard_mapping.rs (2)
18-30: Consider restricting direct mutation of the internal map.
ImplementingDerefandDerefMutonShardKeyMappingexposes the internalHashMapfor direct external modification, which can degrade encapsulation. If external mutation is intended, this is fine; otherwise, consider providing read/write methods to maintain stricter control.
64-72: Potential performance bottleneck for frequent lookups.
Since this performs a linear search over all keys, frequently callingshard_key(shard_id)may be costly for large data sets. Consider a dedicated reverse map if needed for performance.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/collection/src/collection/mod.rs(4 hunks)lib/collection/src/collection/state_management.rs(3 hunks)lib/collection/src/collection_state.rs(2 hunks)lib/collection/src/shards/shard_holder/mod.rs(6 hunks)lib/collection/src/shards/shard_holder/shard_mapping.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- lib/collection/src/collection_state.rs
- lib/collection/src/collection/state_management.rs
- lib/collection/src/collection/mod.rs
🧰 Additional context used
🧬 Code Definitions (1)
lib/collection/src/shards/shard_holder/shard_mapping.rs (3)
lib/common/common/src/flags.rs (1) (1)
feature_flags(38-46)lib/collection/src/collection/state_management.rs (1) (1)
shards(183-183)lib/collection/src/shards/shard_holder/mod.rs (1) (1)
shards(280-280)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (17)
lib/collection/src/shards/shard_holder/shard_mapping.rs (8)
2-2: No issues with addingstd::ops.
This import is clearly needed for implementingDerefandDerefMut.
12-16: Structured and well-derived type definition.
The use of#[derive(Default)]along withDeserializeandSerializestreamlines the creation and persistence ofShardKeyMapping. Overall, this approach is clean.
32-43: Efficient method for reversing the shard mapping.
Generating an inverse map on-the-fly is straightforward and maintains clarity. Complexity is reasonable for most use cases.
57-62: Iterator method looks good.
The fluent approach of returning an iterator (rather than a collected list) is memory-friendly.
75-91: Deserialization logic aligns well with migrating formats.
The fallback strategy for the old format ensures backward compatibility while enabling updates to the new format.
92-105: Preserving older format under a feature flag is a sensible approach.
The plan to remove this in a future version (as noted by TODO) is clear. No issues.
107-123: Correctly switching to the new format for numeric shard keys.
This addresses previous concerns about losing numeric type information, matching user and reviewer feedback.
126-148: Helper struct and conversions are straightforward.
Round-trip conversions for(ShardKey, HashSet<ShardId>)look clean. No further recommendations.lib/collection/src/shards/shard_holder/mod.rs (9)
17-17: Import looks appropriate.
No concerns regarding referencingShardKeyMappinghere.
59-59: Storing the mapping inSaveOnDiskis consistent.
PersistingShardKeyMappingwith the same pattern as other data structures is logical and maintains uniformity.
79-80: Initialization strategy makes sense.
Usingload_or_init_defaultis straightforward for ensuring the mapping is present after restarts.
119-120: Simple getter returning a cloned mapping.
This is fine if returning a full copy is expected. Monitor performance if the mapping grows large.
132-133: Straightforward signature update.
Accepting a plainShardKeyMappingparameter aligns with the new design.
134-135: Good use ofshard_id_to_shard_key().
Generating the inverse map once prevents repeated computation, improving readability and performance.
137-138: Overwriting the stored mapping is handled cleanly.
Using.write_optionalwithSome(shard_key_mapping)is clear and consistent with the rest of the codebase.
139-140: Properly caching the inverse mapping in memory.
No issues with synchronizing it right after the write operation.
275-276:apply_shards_stateparameter update is consistent.
IntegratingShardKeyMappingaligns the method with the rest of the refactor.
|
I consider this to be done. Thanks! 🙏 |
* Simplify `ShardKeyMapping` impl * Automatically switch to new mappings format, if `Number` shard key is used * Apply TODOs * Add comment describing the issue, leave instructions for migration * Cleanup Remove unnecessary assignment * Cleanup Remove misleading comment --------- Co-authored-by: timvisee <tim@visee.me>
This PR simplifies new
ShardKeyMappingformat support introduced in #6209.All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
cargo +nightly fmt --allcommand prior to submission?cargo clippy --all --all-featurescommand?Changes to Core Features: