DiffConfig: replace serde-based impl with explicit impl#7294
Conversation
|
@coderabbitai review |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
lib/collection/src/operations/types.rs (1)
1539-1540: Empty-diff check: semantics shift OK; minor readability nit.This directly treats None or Some(default) as empty, matching the new macro-based Diff semantics. Optionally, this reads a bit clearer with matches!.
Please confirm HnswConfigDiff derives Default + PartialEq so Some(default()) reliably represents “no fields set”.
-fn is_hnsw_diff_empty(hnsw_config: &Option<HnswConfigDiff>) -> bool { - hnsw_config.is_none() || *hnsw_config == Some(HnswConfigDiff::default()) -} +fn is_hnsw_diff_empty(hnsw_config: &Option<HnswConfigDiff>) -> bool { + matches!(hnsw_config.as_ref(), None | Some(d) if d == &HnswConfigDiff::default()) +}lib/collection/src/collection/collection_ops.rs (1)
190-190: Borrowed update for strict mode LGTM.Naming nit: strict_mode_diff is a full StrictModeConfig, not a diff type, but that’s existing and non-blocking.
lib/collection/src/config.rs (1)
450-464: Avoid cloning VectorsConfigDiff entries (optional).You can iterate by reference and clone only when needed, reducing intermediate allocations.
Apply this refactor:
- for (vector_name, update_params) in update_vectors_diff.0.iter() { - let vector_params = self.get_vector_params_mut(vector_name)?; - let VectorParamsDiff { - hnsw_config, - quantization_config, - on_disk, - } = update_params.clone(); + for (vector_name, update_params) in &update_vectors_diff.0 { + let vector_params = self.get_vector_params_mut(vector_name)?; + let VectorParamsDiff { + hnsw_config, + quantization_config, + on_disk, + } = update_params; - if let Some(hnsw_diff) = hnsw_config { - if let Some(existing_hnsw) = &vector_params.hnsw_config { - vector_params.hnsw_config = Some(existing_hnsw.update(&hnsw_diff)); - } else { - vector_params.hnsw_config = Some(hnsw_diff); - } + if let Some(hnsw_diff) = hnsw_config.as_ref() { + vector_params.hnsw_config = Some( + vector_params + .hnsw_config + .as_ref() + .map(|existing| existing.update(hnsw_diff)) + .unwrap_or_else(|| hnsw_diff.clone()), + ); } - if let Some(quantization_diff) = quantization_config { + if let Some(quantization_diff) = quantization_config.as_ref() { vector_params.quantization_config = match quantization_diff.clone() { QuantizationConfigDiff::Scalar(scalar) => { Some(QuantizationConfig::Scalar(scalar)) } QuantizationConfigDiff::Product(product) => { Some(QuantizationConfig::Product(product)) } QuantizationConfigDiff::Binary(binary) => { Some(QuantizationConfig::Binary(binary)) } QuantizationConfigDiff::Disabled(_) => None, } } - if let Some(on_disk) = on_disk { + if let Some(on_disk) = on_disk.as_ref() { vector_params.on_disk = Some(on_disk); } }lib/collection/src/operations/config_diff.rs (1)
392-397: Optional fields cannot be explicitly cleared via diff
impl DiffConfigMerge<T> for Option<T>treatsNonein diff as “no-op”. There’s no way to set a config field of typeOption<T>toNonethrough a diff. If clearing is required, consider a wrapper enum (e.g., Set(T)/Clear) orOption<Option<T>>in the diff type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
lib/collection/src/collection/collection_ops.rs(4 hunks)lib/collection/src/collection/mod.rs(3 hunks)lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs(2 hunks)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs(1 hunks)lib/collection/src/config.rs(2 hunks)lib/collection/src/operations/config_diff.rs(11 hunks)lib/collection/src/operations/types.rs(2 hunks)lib/storage/src/content_manager/toc/create_collection.rs(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests
- GitHub Check: storage-compat-test
- GitHub Check: test-consensus-compose
- GitHub Check: lint
🔇 Additional comments (19)
lib/collection/src/operations/types.rs (1)
52-52: Import tweak looks correct.Localizing ClockTag import removes the unused config_diff import coupling.
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (1)
251-251: API alignment with borrowed diffs.Switch to update_opt(param_hnsw.as_ref()) correctly follows the new DiffConfig API and avoids needless cloning.
lib/collection/src/collection/collection_ops.rs (3)
37-37: Borrowed-diff update for params LGTM.Infallible update(¶ms_diff) is consistent with the macro-generated DiffConfig.
55-55: Borrowed-diff update for HNSW config LGTM.
107-107: Borrowed-diff update for optimizers LGTM.lib/storage/src/content_manager/toc/create_collection.rs (4)
167-167: WAL config: update_opt(...as_ref()) LGTM.
169-173: Optimizers config: update_opt(...as_ref()) LGTM.
174-177: HNSW config: update_opt(...as_ref()) LGTM.
196-197: Strict mode: borrowed update LGTM.Confirm that StrictModeConfig::update(&StrictModeConfig) is intended (no separate Diff type).
lib/collection/src/config.rs (1)
285-288: Validation path uses borrowed per‑vector HNSW diff correctly.update_opt(vector_config.hnsw_config.as_ref()) matches the new API and preserves behavior.
lib/collection/src/collection/mod.rs (3)
135-136: Borrowed-diff optimizer overwrite in new() LGTM.
256-257: Borrowed-diff optimizer overwrite in load() LGTM.
848-849: Borrowed-diff in effective_optimizers_config() LGTM.lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (2)
119-121: Good switch to borrowed diff with as_ref()Passing
Option<&HnswConfigDiff>intoupdate_optis correct and aligns with the new API.
582-582: Tests updated to new update(&Diff) signatureUsing
hnsw_config_collection.update(&hnsw_config_vectorN)is correct with the borrow-based API.Also applies to: 587-587
lib/collection/src/operations/config_diff.rs (4)
18-31: Borrow-basedDiffConfigAPI looks goodTrait shape and
update_optdefault are sound. Call sites can now pass&DifforOption<&Diff>without allocations.
326-352: ConfirmStrictModeConfigis intended as its own Diff typeThis macro arm implements
DiffConfig<StrictModeConfig> for StrictModeConfig. This works if all listed fields are alreadyOption<...>(diff semantics). If some are plain values, the merge semantics will treatdiff.$fieldas a plain value, not anOption, which does not matchDiffConfigMerge’s expectedOption<T>. Please confirm the field shapes or adjust to a dedicatedStrictModeConfigDiff.
354-378: ManualFrom<OptimizersConfig>mapping looks correctCovers special handling for
max_optimization_threadsand deprecatedmemmap_threshold. Matches the tests.
461-466: Tests correctly migrated to borrowed-diff APIUsing
update(&diff)across types looks consistent and validates the new merge behavior, includingMaxOptimizationThreads.Also applies to: 472-474, 490-492, 513-517, 523-525
| impl DiffConfig<$TDiff> for $TSelf { | ||
| fn update(&self, diff: &$TDiff) -> Self { | ||
| let $TDiff { | ||
| $($field: _,)* | ||
| } = diff; // Make sure that we did not miss any field | ||
| $TSelf { | ||
| $($field: DiffConfigMerge::merge(&self.$field, &diff.$field),)* | ||
| $($ignored_field: self.$ignored_field.clone(),)* | ||
| } | ||
| } | ||
| } |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| impl From<$TSelf> for $TDiff { | ||
| fn from(config: $TSelf) -> Self { | ||
| let $TSelf { | ||
| $($field: _,)* | ||
| $($ignored_field: _,)* | ||
| } = config; // Make sure that we did not miss any field | ||
| $TDiff { | ||
| $($field: Option::from(config.$field),)* | ||
| } | ||
| } | ||
| } |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
I understand the benefit of having explicit approach with compile-time errors if we forget something. But why can't we just write the code instead of macro? Please don't tell me that macros are simpler. |
Sure. |
generall
left a comment
There was a problem hiding this comment.
It is even less boilerplate than I expected, thanks!
|
I liked the macro more 🥲 But let's merge it like this! |
879b85f to
47cf1fc
Compare
d2cfcd6 to
95979ee
Compare
47da93e to
0693595
Compare
95979ee to
ee24c30
Compare
📝 WalkthroughWalkthroughThis PR removes the merge crate dependency from workspace crates and refactors config-diff handling to use explicit, reference-based update semantics. The DiffConfig trait is redesigned: update now takes &Diff and returns Self (no Result), update_opt takes Option<&Diff>, and Merge-derived behavior and helpers are removed. Call sites are updated accordingly across collection, optimizers, and storage code. Several structs drop Merge derives; tests and logic now pass references. StrictModeSparse gains a new optional max_length field with validation. Error propagation from config updates is eliminated where update now returns plain values. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/collection/src/collection/mod.rs (1)
844-852: LGTM! Consider simplifying return type in future refactor.The update call correctly passes
&optimizers_overwriteby reference and wraps the infallible result inOk(), consistent with the newDiffConfig::updatesignature.Since
updateis now infallible, this function could potentially returnOptimizersConfigdirectly instead ofCollectionResult<OptimizersConfig>to better reflect that no error can occur. However, keeping theResulttype may be intentional for API consistency or future extensibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (12)
Cargo.toml(0 hunks)lib/collection/Cargo.toml(0 hunks)lib/collection/src/collection/collection_ops.rs(4 hunks)lib/collection/src/collection/mod.rs(3 hunks)lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs(2 hunks)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs(1 hunks)lib/collection/src/config.rs(2 hunks)lib/collection/src/operations/config_diff.rs(10 hunks)lib/collection/src/operations/types.rs(3 hunks)lib/segment/Cargo.toml(0 hunks)lib/segment/src/types.rs(3 hunks)lib/storage/src/content_manager/toc/create_collection.rs(2 hunks)
💤 Files with no reviewable changes (3)
- lib/collection/Cargo.toml
- Cargo.toml
- lib/segment/Cargo.toml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/*.rs: Prefer explicit SomeType::from(x) over implicit x.into() in Rust code
Do not use transmute_from_u8, transmute_to_u8, transmute_from_u8_to_slice, transmute_from_u8_to_mut_slice, transmute_to_u8_slice in new code; use bytemuck or zerocopy instead
Files:
lib/collection/src/collection/collection_ops.rslib/collection/src/config.rslib/collection/src/collection/mod.rslib/collection/src/collection_manager/optimizers/segment_optimizer.rslib/storage/src/content_manager/toc/create_collection.rslib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rslib/segment/src/types.rslib/collection/src/operations/types.rslib/collection/src/operations/config_diff.rs
**/src/**/*.rs
📄 CodeRabbit inference engine (.github/review-rules.md)
**/src/**/*.rs: Prefer exhaustive match arms over a catch-all _ arm to avoid missing new enum variants (except in tests/benchmarks or when provably safe)
Prefer explicit field ignoring with : _ over .. in struct patterns (except in tests/benchmarks or when provably safe)
Files:
lib/collection/src/collection/collection_ops.rslib/collection/src/config.rslib/collection/src/collection/mod.rslib/collection/src/collection_manager/optimizers/segment_optimizer.rslib/storage/src/content_manager/toc/create_collection.rslib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rslib/segment/src/types.rslib/collection/src/operations/types.rslib/collection/src/operations/config_diff.rs
🧬 Code graph analysis (2)
lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (1)
lib/collection/src/operations/config_diff.rs (7)
update(23-23)update(207-227)update(231-251)update(255-279)update(283-295)update(299-318)update(322-376)
lib/collection/src/operations/config_diff.rs (2)
lib/collection/src/config.rs (1)
from(52-63)lib/storage/src/content_manager/collection_meta_ops.rs (4)
from(85-87)from(91-93)from(97-99)from(431-472)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests-consensus
- GitHub Check: integration-tests
- GitHub Check: lint
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
🔇 Additional comments (33)
lib/collection/src/operations/types.rs (3)
52-52: LGTM: Import simplified correctly.The import was simplified from
super::{ClockTag, config_diff::*}to justsuper::ClockTag. Theconfig_diffitems are imported at line 55, so this change is correct.
1872-1872: LGTM: Merge trait removed as intended.The
Mergetrait has been correctly removed from the derive list forVectorParamsDiff, consistent with the PR's goal to replace serde-based implementations with explicit ones.
1535-1537: Verification complete: the default comparison is correct.The
HnswConfigDiffstruct derivesDefault(line 34 inlib/collection/src/operations/config_diff.rs) and all its fields areOptiontypes (m,ef_construct,full_scan_threshold,max_indexing_threads,on_disk, etc.). WhenDefault::default()is called onHnswConfigDiff, allOptionfields are initialized toNone, which correctly represents an empty diff with no changes. The comparison*hnsw_config == Some(HnswConfigDiff::default())accurately identifies empty diffs.lib/collection/src/collection/collection_ops.rs (4)
31-41: LGTM: Update method now uses reference semantics.The call to
config.params.update()now correctly passes¶ms_diffinstead of an owned value, consistent with the new DiffConfig trait design whereupdatetakes&Diffand returnsSelfinfallibly.
48-58: LGTM: HNSW config update uses reference semantics.The update call correctly passes
&hnsw_config_diff, aligning with the new reference-based DiffConfig API.
101-111: LGTM: Optimizer config update uses reference semantics.The update call correctly passes
&optimizer_config_diff, consistent with the new DiffConfig trait signature.
182-203: LGTM: Strict mode config update uses reference semantics.The update call at line 189 correctly passes
&strict_mode_diffinstead of an owned value. The pattern correctly updates the existing config in-place when present, or inserts the diff as a new config when absent.lib/storage/src/content_manager/toc/create_collection.rs (4)
167-167: LGTM: WAL config update uses reference via as_ref().The call to
update_optnow correctly passeswal_config_diff.as_ref()to convertOption<WalConfigDiff>toOption<&WalConfigDiff>, consistent with the new reference-based DiffConfig API.
169-172: LGTM: Optimizer config update uses reference via as_ref().The call to
update_optcorrectly passesoptimizers_config_diff.as_ref(), aligning with the reference-based update semantics.
174-177: LGTM: HNSW config update uses reference via as_ref().The call to
update_optcorrectly passeshnsw_config_diff.as_ref(). The reformatting to multiple lines improves readability while maintaining correctness.
188-204: LGTM: Strict mode config update uses reference semantics.At line 196, the update call correctly passes
&diffinstead of an owned value, consistent with the new DiffConfig trait design. The logic properly merges the diff with default config when provided.lib/collection/src/operations/config_diff.rs (13)
18-31: LGTM: Trait signature aligns with infallible, reference-based update semantics.The removal of
Resultand the switch to borrowing&Diffare consistent with the PR's goal of eliminating JSON serialization errors and making updates infallible.
33-161: LGTM: Merge derive removed as intended.Removal of
Mergederives aligns with the PR's shift to explicit implementations, eliminating reliance on themergecrate.
206-228: LGTM: Field-wise update logic is correct.The destructuring ensures exhaustive field coverage. The use of
unwrap_orfor required fields and.orfor optional fields correctly merges the diff with the base config.
230-252: LGTM: Diff-on-diff update logic is correct.Using
.orfor all fields is appropriate for merging two diff structs, where all fields areOption<T>.
254-280: LGTM: Optimizer config update logic is correct.Field destructuring is exhaustive. The use of
unwrap_orfor required fields,.orfor optional fields, andmap_orwithFrom::fromformax_optimization_threadscorrectly handles the merge and type conversion.
282-296: LGTM: WalConfig update logic is correct.All fields are destructured and merged using
unwrap_or, which is appropriate for the required fields.
298-319: LGTM: CollectionParams update logic is correct.The diff fields are merged correctly, and immutable fields (
shard_number,sharding_method,sparse_vectors,vectors) are preserved from the base config. Cloning ofsparse_vectorsandvectorsis necessary.
321-377: LGTM: StrictModeConfig update logic is correct.All fields use
.orto merge optional settings. The pattern.as_ref().or(self.X.as_ref()).cloned()correctly handlesOption<T>fields whereT: Clone.
379-401: LGTM: HnswConfig to HnswConfigDiff conversion is correct.Exhaustive destructuring ensures all fields are accounted for. Required fields are wrapped in
Some, and optional fields are passed through as-is.
403-417: LGTM: WalConfig to WalConfigDiff conversion is correct.All fields are destructured and wrapped in
Someto create the diff representation.
419-439: LGTM: CollectionParams to CollectionParamsDiff conversion is correct.Fields that are part of the diff are correctly wrapped in
Someor passed through. Fields not part of the diff are explicitly ignored with_, which satisfies the coding guideline to prefer explicit field ignoring.
441-467: LGTM: OptimizersConfig to OptimizersConfigDiff conversion is correct.All fields are handled appropriately. The mapping of
max_optimization_threadsfromOption<usize>toOption<MaxOptimizationThreads>usingmap(MaxOptimizationThreads::Threads)is correct.
501-591: LGTM: Tests updated to match new signature.All test calls to
.update()now pass references to diffs, which is consistent with the new trait signature.lib/collection/src/config.rs (2)
284-286: LGTM: Reference passed correctly to update_opt.The use of
.as_ref()to convertOption<HnswConfig>toOption<&HnswConfig>aligns with the new signature ofupdate_opt.
454-454: LGTM: Reference passed correctly to update.The call to
.update(&hnsw_diff)correctly passes a reference to the diff, consistent with the new trait signature.lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (1)
257-257: LGTM: Reference passed correctly to update_opt.The use of
.as_ref()correctly convertsOption<HnswConfig>toOption<&HnswConfig>for the newupdate_optsignature.lib/collection/src/collection_manager/optimizers/config_mismatch_optimizer.rs (2)
119-121: LGTM! Reference-based update aligns with DiffConfig refactor.The use of
as_ref()correctly borrows thehnsw_configinstead of consuming it, which aligns with the newDiffConfig::update_optsignature that takesOption<&Diff>. This preserves ownership of the original config invector_params.
582-587: LGTM! Test assertions correctly use reference-based update.The test assertions now pass references (
&hnsw_config_vector1,&hnsw_config_vector2) to theupdatemethod, which aligns with the newDiffConfig::updatesignature that takes&Diff. The test logic remains sound: it verifies that the segment's effective HNSW config matches the expected result of merging the collection-level config with vector-specific overrides.lib/segment/src/types.rs (2)
868-874: LGTM! Newmax_lengthfield is properly validated.The addition of the
max_lengthfield toStrictModeSparseis well-implemented with appropriate validation (range(min = 1)), serde skip configuration, and consistent with similar fields in the codebase.
868-882: Removal ofMergetrait is correct and complete.The
Mergetrait has been successfully removed fromStrictModeSparse,StrictModeSparseConfig,StrictModeMultivector,StrictModeMultivectorConfig, andStrictModeConfig. The explicitDiffConfigimplementation forStrictModeConfig(atlib/collection/src/operations/config_diff.rs:321-375) properly handles the nested configuration structs usingOption::or()logic, replacing them atomically rather than merging field-by-field. No individualDiffConfigimplementations are needed for the nested structs, as they are treated as complete units during configuration updates.lib/collection/src/collection/mod.rs (2)
132-136: LGTM! Refactor aligns with infallible DiffConfig trait.The update call now passes
&optimizers_overwriteby reference and assigns the result directly without error propagation, consistent with the new infallibleDiffConfig::updatesignature.
253-257: LGTM! Refactor aligns with infallible DiffConfig trait.The update call now passes
&optimizers_overwriteby reference and assigns the result directly without error propagation, consistent with the new infallibleDiffConfig::updatesignature.
* refactor: replace serde-based DiffConfig implementation with macro * refactor: macroexpand impl_diff_config * refactor: drop `merge` dependency as unused
This PR replaces serde-based
DiffConfigimplementation with explicit implementation. Depends on #7293.Issue
In
dev, we have this type coercion that copies fields from*ConfigDiffstructs into*Configstructs:It's hacky indeed. It's error-prone: missing/wrong field names not caught by a compiler). And it's makes the code hard to explore: rust-analyzer "find usages" feature will newer help you find out that
HnswConfig::copy_vectorscan be populated throughHnswConfigDiff::copy_vectors.Solution
This PR replaces this coercion-via-json with a few hand-rolled implementations of
impl DiffConfigandimpl From. Additionally,DiffConfigtrait methods are infallible now (as there no JSON serialization/serialization errors possible).Discarded approaches
Q: What about generating
*ConfigDifffrom*Configdefinition?A: I decided to keep them explicitly hand-rolled as these structures have some subtle differences, e.g., see the following diff. While it's possible encode these differences via proc macro and attributes, I find it less maintainable. And probably
macro_rulesgive better LSP experience than proc macros.Previous version of this PR were using macro_rules.
old description
This PR replaces this coercion-via-json with a
impl_diff_configthat macro that produces the conversion boilerplate. Any missing field will result in a compile error; and "find usages" onHnswConfigfields will find this definition. Additionally,DiffConfigtrait methods are infallible now (as there no JSON serialization/serialization errors possible). Also, the macro syntax mimics real Rust syntax, thusrustfmtwill format it correctly.