Conversation
📝 Walkthrough

Adds an optional `update_filter: Filter` to point upsert and vector-update message/schema types across gRPC proto, OpenAPI, and Rust REST types. The filter is converted and propagated through tonic gRPC converters, API handlers, operation construction, shard conversions, remote shard forwarding, and collection manager logic. Introduces `ConditionalInsertOperationInternal` and `PointOperations::UpsertPointsConditional`, new `conditional_upsert` and `update_vectors_conditional` code paths, a helper to compute excluded IDs by filter, updated sharding/splitting logic, RBAC checks, validation entries, tests (unit and OpenAPI), and small test updates to include the new field.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 9
🧹 Nitpick comments (32)
lib/collection/src/operations/mod.rs (1)
443-446: Exercise both presence/absence of `update_filter` in proptests

Right now, the arbitrary `UpdateVectors` case always uses `update_filter: None`. Generating cases with `Some(Filter)` too will better cover serialization/deserialization and downstream logic.

```diff
-        let update = Self::UpdateVectors(UpdateVectorsOp {
-            points: Vec::new(),
-            update_filter: None,
-        });
+        let update = any::<Option<Filter>>().prop_map(|update_filter| {
+            Self::UpdateVectors(UpdateVectorsOp {
+                points: Vec::new(),
+                update_filter,
+            })
+        });
@@
-        prop_oneof![Just(update), Just(delete), Just(delete_by_filter),].boxed()
+        prop_oneof![update, Just(delete), Just(delete_by_filter),].boxed()
```

lib/api/src/rest/schema.rs (2)
1192-1195: Document behavior and accept `update_if` alias for UpdateVectors

For vector updates, non-matching points must be skipped (no inserts happen for vector ops). Add docs and `alias = "update_if"` for a smooth transition.

```diff
-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, only points that match this filter will be updated.
+    /// Points that do not match will be skipped (vector updates never insert).
+    #[serde(default, skip_serializing_if = "Option::is_none", alias = "update_if")]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,
```
1203-1206: Clarify upsert wording and add `update_if` alias for PointsList

Refine the wording to avoid implying inserts for existing non-matching IDs and add a serde alias for backward compatibility.

```diff
-    /// If specified, only points that match this filter will be updated, others will be inserted
-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, existing points that match this filter will be updated.
+    /// Existing points that do not match are left unchanged.
+    /// Non-existing points in the request will be inserted.
+    #[serde(default, skip_serializing_if = "Option::is_none", alias = "update_if")]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,
```

lib/collection/src/operations/vector_ops.rs (2)
42-45: Add rustdoc clarifying vector-update semantics

Helpful to spell out that non-matching points are skipped (no inserts for vector updates).

```diff
 pub struct UpdateVectorsOp {
     /// Points with named vectors
     pub points: Vec<PointVectorsPersisted>,
-    /// Condition to check before updating vectors
+    /// Condition to check before updating vectors.
+    /// Points that do not match the filter will be skipped (vector updates never insert).
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub update_filter: Option<Filter>,
 }
```
97-125: Preserve `update_filter` (correct) and simplify sharding logic

Current logic correctly clones `update_filter` per shard. You can reduce duplication by reusing `split_iter_by_shard` for `PointVectorsPersisted`.

Add this impl:

```diff
+impl SplitByShard for Vec<PointVectorsPersisted> {
+    fn split_by_shard(self, ring: &HashRingRouter) -> OperationToShard<Self> {
+        split_iter_by_shard(self, |point| point.id, ring)
+    }
+}
```

Then simplify the match arm:

```diff
-            VectorOperations::UpdateVectors(UpdateVectorsOp {
-                points,
-                update_filter,
-            }) => {
-                let shard_points = points
-                    .into_iter()
-                    .flat_map(|point| {
-                        point_to_shards(&point.id, ring)
-                            .into_iter()
-                            .map(move |shard_id| (shard_id, point.clone()))
-                    })
-                    .fold(
-                        AHashMap::new(),
-                        |mut map: AHashMap<u32, Vec<PointVectorsPersisted>>, (shard_id, points)| {
-                            map.entry(shard_id).or_default().push(points);
-                            map
-                        },
-                    );
-                let shard_ops = shard_points.into_iter().map(|(shard_id, points)| {
-                    (
-                        shard_id,
-                        VectorOperations::UpdateVectors(UpdateVectorsOp {
-                            points,
-                            update_filter: update_filter.clone(),
-                        }),
-                    )
-                });
-                OperationToShard::by_shard(shard_ops)
-            }
+            VectorOperations::UpdateVectors(UpdateVectorsOp { points, update_filter }) => {
+                points
+                    .split_by_shard(ring)
+                    .map(|points| {
+                        VectorOperations::UpdateVectors(UpdateVectorsOp {
+                            points,
+                            update_filter: update_filter.clone(),
+                        })
+                    })
+            }
```

lib/segment/src/types.rs (3)
3298-3324: Guard empty ids and make the builder harder to misuse
- If ids is empty, this produces an unsatisfiable filter. Prefer a no-op early return to avoid surprising behavior.
- Add #[must_use] so callers don't accidentally drop the returned Filter.
Apply this diff:
```diff
-    /// Create an extended filtering condition, which would also include filter by given list of IDs.
-    pub fn merge_with_ids(self, ids: impl IntoIterator<Item = PointIdType>) -> Filter {
+    /// Create an extended filtering condition, which would also include filter by given list of IDs.
+    #[must_use]
+    pub fn merge_with_ids(self, ids: impl IntoIterator<Item = PointIdType>) -> Filter {
         let has_id_condition: HasIdCondition = ids.into_iter().collect();
+        // No IDs supplied — leave the filter unchanged
+        if has_id_condition.has_id.is_empty() {
+            return self;
+        }
         let Filter {
             should,
             min_should,
             must,
             must_not,
         } = self;
```
3298-3324: Avoid duplicating HasId in must by intersecting with existing one (optional)

If `must` already contains a `Condition::HasId`, consider intersecting the existing set with the new ids instead of pushing another condition. This:
- keeps condition count smaller (helps filter_max_conditions limits),
- reduces planner work, and
- makes intent explicit (restrict to IDs ∩ new_ids).
If implementing, ensure:
- Intersection is computed on the underlying AHashSet,
- Large sets retain Arc via MaybeArc threshold,
- If the intersection is empty, the filter becomes unsatisfiable (or consider returning a known-empty sentinel if you have one).
Would you like me to draft a concrete patch that performs the intersection while preserving MaybeArc semantics?
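The proposed intersection can be sketched as follows. This is a conceptual Python model, not the Rust implementation: plain `set`s stand in for the `AHashSet`/`MaybeArc` machinery, and `merge_has_id` is a hypothetical helper name.

```python
# Conceptual sketch: instead of pushing a second HasId condition into `must`,
# intersect the new ids with the ids of an existing HasId condition.
def merge_has_id(existing_ids, new_ids):
    """Restrict a HasId condition to existing_ids ∩ new_ids."""
    if existing_ids is None:  # no prior HasId condition on the filter
        return set(new_ids)
    return set(existing_ids) & set(new_ids)

# Intersecting keeps the condition count at one instead of two.
assert merge_has_id({1, 2, 3}, {2, 3, 4}) == {2, 3}
# An empty intersection makes the filter unsatisfiable.
assert merge_has_id({1}, {2}) == set()
# No prior HasId: the new ids become the condition.
assert merge_has_id(None, {7, 8}) == {7, 8}
```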
3298-3324: Add unit tests for merge_with_ids semantics

Cover at least:
- Adds HasId to must while preserving should/min_should/must_not.
- No-op when ids is empty (after the early-return change).
- Optional: Intersects with existing HasId if you adopt the refactor.
Here is a test scaffold you can drop into the existing tests module in this file:

```rust
#[test]
fn test_merge_with_ids_adds_must() {
    let base = Filter {
        should: Some(vec![Condition::Field(FieldCondition::new_match(
            JsonPath::new("k"),
            Match::new_value(ValueVariants::String("v".into())),
        ))]),
        min_should: None,
        must: None,
        must_not: Some(vec![Condition::IsNull(JsonPath::new("n").into())]),
    };
    let merged = base.clone().merge_with_ids([
        ExtendedPointId::from(1_u64),
        ExtendedPointId::from(2_u64),
    ]);
    assert!(merged.must.is_some());
    assert_eq!(merged.must.as_ref().unwrap().len(), 1);
    assert!(matches!(merged.must.as_ref().unwrap()[0], Condition::HasId(_)));
    // Preserve others
    assert_eq!(
        merged.should.as_ref().unwrap().len(),
        base.should.as_ref().unwrap().len()
    );
    assert_eq!(
        merged.must_not.as_ref().unwrap().len(),
        base.must_not.as_ref().unwrap().len()
    );
}

#[test]
fn test_merge_with_ids_empty_is_noop() {
    let base = Filter::new_must(Condition::Field(FieldCondition::new_is_empty(
        JsonPath::new("f"),
        true,
    )));
    let merged = base.clone().merge_with_ids(std::iter::empty());
    assert_eq!(merged, base);
}
```

lib/api/src/grpc/proto/points.proto (2)
138-139: Clarify semantics for existing IDs that fail the filter

The comment says “others will be inserted”, which can be misread when the point already exists but doesn’t match the filter. Consider clarifying that:
- existing points not matching the filter are left unchanged (not re-inserted),
- non-existing points are inserted.
This helps avoid ambiguity across clients.
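The per-id behavior described above can be illustrated with a small model. This is a toy Python sketch of the semantics, not the Rust implementation; `conditional_upsert` and `matches` are hypothetical names standing in for the server-side logic and the `update_filter` evaluation.

```python
# Illustrative model of conditional upsert: for each incoming point,
# insert if absent, update only if the existing point matches the filter.
def conditional_upsert(points, incoming, matches):
    for point_id, payload in incoming.items():
        if point_id not in points:
            points[point_id] = payload   # non-existing: always inserted
        elif matches(points[point_id]):
            points[point_id] = payload   # existing and matching: updated
        # existing but not matching: left unchanged (not re-inserted)
    return points

store = {1: {"v": 1}, 2: {"v": 99}}
result = conditional_upsert(
    store,
    {1: {"v": 10}, 2: {"v": 20}, 3: {"v": 30}},
    matches=lambda p: p["v"] < 50,
)
# Point 1 matched and was updated; point 2 did not match and was kept;
# point 3 did not exist and was inserted.
assert result == {1: {"v": 10}, 2: {"v": 99}, 3: {"v": 30}}
```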
799-800: Mirror the upsert comment clarification

Same nit as above: explicitly document behavior for existing IDs that don’t satisfy the filter to avoid misinterpretation by client implementers.
tests/openapi/test_conditional_update.py (3)
1-1: Remove unused import

`itemgetter` is imported but unused. Please remove to keep tests clean.

```diff
-from operator import itemgetter
```
19-20: Fix misleading comments

Comments say “Upsert and delete points”, but these blocks perform conditional upserts/updates. Reword to something like “Conditionally upsert point” / “Conditionally update vectors”.
Also applies to: 34-35, 77-78, 119-121
144-155: Make vector presence explicit in GET

To avoid relying on default behavior, consider requesting vectors explicitly in the GET (e.g., query param to include vectors) so the assertion on `result["vector"]` is robust across configs.
lib/collection/src/shards/remote_shard.rs (1)
246-257: Optional: de-duplicate upsert code paths

Upsert and conditional upsert branches are identical except for the converter. Consider extracting a small helper that builds the request from either variant, reducing duplication and potential drift.
Also applies to: 498-511
lib/storage/src/rbac/ops_checks.rs (1)
1278-1279: Vector Update test updated for new field

Initializing `update_filter: None` keeps tests compiling against the expanded struct. Consider adding a positive test with a non-None filter later.
src/common/update.rs (2)
248-269: Propagate update_filter: OK; consider renaming locals to reduce shadowing

Logic looks correct and preserves shard_key/inference usage. Minor readability tweak: avoid reusing the name `operation` for multiple different types in this function.
Suggested rename across these lines and the subsequent construction below:
```diff
-    let (operation, shard_key, usage, update_filter) = match operation {
+    let (points_op, shard_key, usage, update_filter) = match operation {
@@
-            let operation = PointInsertOperationsInternal::PointsBatch(batch);
-            (operation, shard_key, usage, update_filter)
+            let points_op = PointInsertOperationsInternal::PointsBatch(batch);
+            (points_op, shard_key, usage, update_filter)
@@
-            let operation = PointInsertOperationsInternal::PointsList(list);
-            (operation, shard_key, usage, update_filter)
+            let points_op = PointInsertOperationsInternal::PointsList(list);
+            (points_op, shard_key, usage, update_filter)
```

And in the block below (Lines 272-281):
```diff
-    let operation = if let Some(condition) = update_filter {
+    let collection_op = if let Some(condition) = update_filter {
         CollectionUpdateOperations::PointOperation(PointOperations::UpsertPointsConditional(
             ConditionalInsertOperationInternal {
-                points_op: operation,
+                points_op,
                 condition,
             },
         ))
     } else {
-        CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(operation))
+        CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(points_op))
     };
```

Then pass `collection_op` to `update()`.
360-364: Carry update_filter into UpdateVectorsOp: LGTM; optional naming nit

Correct propagation into `UpdateVectorsOp`. Optional: rename local `operation` to `vector_op` or `collection_op` for clarity when reading alongside other functions.

```diff
-    let operation = CollectionUpdateOperations::VectorOperation(VectorOperations::UpdateVectors(
+    let collection_op = CollectionUpdateOperations::VectorOperation(VectorOperations::UpdateVectors(
         UpdateVectorsOp {
             points,
             update_filter,
         },
     ));
@@
     let result = update(
         toc,
         &collection_name,
-        operation,
+        collection_op,
         internal_params,
         params,
         shard_key,
         access,
         hw_measurement_acc,
     )
```

docs/grpc/docs.md (4)
3880-3881: Add explicit semantics example for upsert update_filter

Text is clear. Consider adding a short example clarifying that for upsert:
- existing points matching the filter are updated,
- existing points not matching the filter are left unchanged,
- non-existing points are inserted regardless of filter.
3931-3932: Clarify interaction between UpdateVectors points list and update_filter

State explicitly that `update_filter` is ANDed with the provided point IDs, i.e., only those point IDs in the request that also match the filter will be updated; others are skipped.
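The AND semantics can be shown with a tiny sketch. This is illustrative Python only; `ids_to_update` and `matches` are hypothetical names, and real filter evaluation happens server-side in Rust.

```python
# Toy illustration of the AND semantics: update_filter is intersected with
# the point ids named in the request; it is never applied collection-wide.
def ids_to_update(requested_ids, all_points, matches):
    return {
        pid for pid in requested_ids
        if pid in all_points and matches(all_points[pid])
    }

collection = {1: {"color": "red"}, 2: {"color": "blue"}, 3: {"color": "red"}}
# The request names points 1 and 2; the filter matches points 1 and 3.
# Only point 1 (named AND matching) is updated; point 3 is untouched.
assert ids_to_update({1, 2}, collection, lambda p: p["color"] == "red") == {1}
```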
4910-4910: Mirror UpdateVectors semantics on the RPC request

Repeat the AND semantics with provided point IDs here to avoid ambiguity at the service layer. This helps client authors avoid assuming the filter is applied collection-wide.
4946-4946: Mention strict-mode/indexing implications for update_filter

Consider a short note: when strict mode enforces `unindexed_filtering_update=false`, `update_filter` conditions should target indexed fields; otherwise, the request may be rejected. This helps set user expectations.
src/tonic/api/update_common.rs (3)
62-65: Filter conversion duplicated across handlers — extract helper to reduce repetition

The `TryFrom` + `transpose` pattern is correct. It’s duplicated in multiple places (upsert, update_vectors, batch branches). Consider a small helper to centralize conversion.
Add helper (outside the changed range):
```rust
fn try_convert_filter(
    f: Option<grpc::qdrant::Filter>,
) -> Result<Option<segment::types::Filter>, Status> {
    f.map(segment::types::Filter::try_from).transpose()
}
```
```diff
-            update_filter: update_filter
-                .map(segment::types::Filter::try_from)
-                .transpose()?,
+            update_filter: try_convert_filter(update_filter)?,
```

Repeat similarly in update_vectors and update_batch branches below.
160-163: Use the same helper for update_filter conversion

Same duplication here; apply the `try_convert_filter` helper if you adopt it.

```diff
-            update_filter: update_filter
-                .map(segment::types::Filter::try_from)
-                .transpose()?,
+            update_filter: try_convert_filter(update_filter)?,
```
558-562: Batch UpdateVectors branch: correct pass-through of update_filter

Looks good and consistent with the non-batch handler. Consider adding an integration test that:
- sends multiple UpdateVectors ops where only a subset meets the filter,
- verifies only those are updated.
I can draft these tests if helpful.
Also applies to: 572-573
lib/api/src/grpc/qdrant.rs (2)
6028-6031: Clarify nested upsert semantics; consider enabling validation here too
- Suggest revising the doc to clarify behavior for existing non-matching points (left unchanged) vs inserts.
- For consistency with top-level messages, consider deriving `validator::Validate` for this struct in the proto and annotating `update_filter` with `#[validate(nested)]`.

Example doc tweak:
```diff
-    /// If specified, only points that match this filter will be updated, others will be inserted
+    /// Conditional upsert. Update existing points that match this filter.
+    /// Insert points that don't exist. Existing non-matching points are left unchanged.
```
validator::Validateand the field is marked for nested validation (then re-generate).
6093-6096: Make nested vector-update semantics explicit; optionally enable nested validation
- Clarify that non-matching points are ignored (no insert).
- For parity with top-level `UpdatePointVectors`, consider deriving `validator::Validate` for this struct in the proto and adding `#[validate(nested)]` to the field.

Suggested doc tweak:

```diff
-    /// If specified, only points that match this filter will be updated
+    /// Conditional vector update. Only existing points matching this filter will be updated; others are ignored.
```
88-119: Reduce duplication between upsert builders

`internal_conditional_upsert_points` largely duplicates `internal_upsert_points` construction logic except for `update_filter`. Consider factoring a small helper that builds `UpsertPoints` from common parts and takes an `Option<Filter>`.

This will centralize conversions and reduce maintenance cost.
lib/collection/src/operations/point_ops.rs (3)
486-492: New ConditionalInsertOperationInternal: aligns with PR goals

Structure is clear and minimal. Consider adding a brief doc comment clarifying semantics: inserts always; updates only if condition matches existing point.
639-656: Payload updates: filter + id merge is the right semantics

Using `update_filter.merge_with_ids([id])` ensures the payload update applies only when the condition holds for that specific point. This matches the conditional update intent.

Optional micro-optimization: avoid cloning the whole `Filter` per point by holding `let base = &update_filter;` and cloning only once inside `merge_with_ids`.

Also applies to: 681-695
1128-1151: Tests updated for new field: good baseline

Including `update_filter: None` in REST tests keeps schema alignment. Consider adding tests that exercise non-None `update_filter` in the appropriate suite (as planned).

I can add a focused unit/integration test that validates conditional upsert behavior for both batch and list variants. Want me to draft it?
lib/collection/src/collection_manager/segments_updater.rs (2)
600-618: Filter inversion to find excluded IDs is correct and efficient

`must = HasId(ids)` ∧ `must_not = Filter(condition)` yields exactly the set of provided IDs that do not satisfy the condition. Good reuse of `points_by_filter`.

Minor nit: rename closure argument in later usage from `idx` to `id` for readability.
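The exclusion logic can be sketched as follows. This is a conceptual Python model of the inverted filter, not the Rust code; `excluded_ids` and `condition` are hypothetical names.

```python
# Sketch of computing excluded ids: among the provided ids, find those that
# exist but do NOT satisfy the condition — conceptually
# must = HasId(ids) AND must_not = condition.
def excluded_ids(ids, points, condition):
    return {
        pid for pid in ids
        if pid in points and not condition(points[pid])
    }

points = {1: {"ok": True}, 2: {"ok": False}, 3: {"ok": True}}
# Of the requested ids {1, 2, 4}: id 2 exists and fails the condition, so it
# is excluded; id 4 does not exist, so it is not excluded (it may still be
# inserted by a conditional upsert).
assert excluded_ids({1, 2, 4}, points, lambda p: p["ok"]) == {2}
```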
619-642: Conditional upsert semantics implemented as intended
- Only existing points are tested against the condition.
- Non-matching existing points are dropped from the op.
- Non-existing points are inserted unconditionally.
This aligns with the described behavior in the PR summary. One small readability tweak:

```diff
-        points_op.retain_point_ids(|idx| !points_to_exclude.contains(idx));
+        points_op.retain_point_ids(|id| !points_to_exclude.contains(id));
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- docs/grpc/docs.md (4 hunks)
- docs/redoc/master/openapi.json (3 hunks)
- lib/api/src/grpc/proto/points.proto (4 hunks)
- lib/api/src/grpc/qdrant.rs (4 hunks)
- lib/api/src/rest/schema.rs (3 hunks)
- lib/collection/src/collection_manager/segments_updater.rs (5 hunks)
- lib/collection/src/operations/mod.rs (1 hunks)
- lib/collection/src/operations/operation_effect.rs (1 hunks)
- lib/collection/src/operations/point_ops.rs (15 hunks)
- lib/collection/src/operations/vector_ops.rs (3 hunks)
- lib/collection/src/shards/conversions.rs (5 hunks)
- lib/collection/src/shards/remote_shard.rs (3 hunks)
- lib/collection/src/shards/shard_holder/resharding.rs (2 hunks)
- lib/collection/src/tests/sparse_vectors_validation_tests.rs (1 hunks)
- lib/segment/src/types.rs (1 hunks)
- lib/storage/src/rbac/ops_checks.rs (4 hunks)
- src/common/update.rs (2 hunks)
- src/tonic/api/update_common.rs (7 hunks)
- tests/openapi/test_conditional_update.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/collection/src/operations/operation_effect.rs (1)
lib/collection/src/collection_manager/segments_updater.rs (1)
`conditional_upsert` (619-641)
tests/openapi/test_conditional_update.py (2)
tests/openapi/helpers/helpers.py (1)
`request_with_validation` (41-116)

tests/openapi/conftest.py (2)

`on_disk_vectors` (11-12), `on_disk_payload` (16-17)
🪛 Ruff (0.12.2)
tests/openapi/test_conditional_update.py
1-1: operator.itemgetter imported but unused
Remove unused import: operator.itemgetter
(F401)
🔇 Additional comments (21)
lib/collection/src/tests/sparse_vectors_validation_tests.rs (1)
72-73: Test update aligns with new API shape

Including `update_filter: None` keeps this test aligned with the new optional field and preserves existing validation behavior.

lib/collection/src/operations/operation_effect.rs (1)
53-55: Handle conditional upsert in effect-area estimation (correct)

Delegating to `conditional_upsert.points_op.estimate_effect_area()` is the right choice; the impacted IDs are exactly those in the inner points op.

lib/api/src/rest/schema.rs (1)
1169-1172: Add clarity to `update_filter` docs and alias for backward compatibility

Updated the doc comment to clearly state how `update_filter` behaves, and added a Serde alias so that incoming JSON using the old `update_if` field is still accepted:

```diff
@@ lib/api/src/rest/schema.rs:1169
-    /// If specified, only points that match this filter will be updated, others will be inserted
-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, existing points that match this filter will be updated.
+    /// Existing points that do not match are left unchanged.
+    /// Non-existing points in the request will be inserted.
+    #[serde(
+        default,
+        skip_serializing_if = "Option::is_none",
+        alias = "update_if"
+    )]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,
```

No other occurrences of `update_if` or `updateFilter` were found in the codebase, so no further changes are needed.

lib/api/src/grpc/proto/points.proto (2)
166-167: LGTM: update_filter for vector updates

Adding an optional filter to gate vector updates is consistent and backward compatible in proto3 (unknown fields are ignored by older clients/servers).
821-822: LGTM: update_filter on batch UpdateVectors

The field placement and numbering look correct and consistent with the standalone UpdatePointVectors message.
lib/collection/src/shards/remote_shard.rs (1)
44-46: LGTM: imports updated for conditional upsert conversion

The new conversion is cleanly integrated alongside existing helpers.
lib/collection/src/shards/shard_holder/resharding.rs (2)
11-14: LGTM: imports for new conditional insert types

Imports are minimal and scoped.
522-557: Correct handling of conditional upsert in update-only mode

Destructuring `ConditionalInsertOperationInternal` and passing the condition into `into_update_only(Some(condition))` is the right abstraction and preserves semantics during resharding.
lib/storage/src/rbac/ops_checks.rs (2)
454-456: RBAC parity for conditional upsert

Treating `UpsertPointsConditional` the same as `UpsertPoints` (whole-write access) is consistent and avoids privilege escalation via filters.
1186-1203: Test adds coverage for conditional upsert RBAC

Good addition to assert whole-write access for the new discriminant.
src/common/update.rs (1)
350-355: Destructure UpdateVectors incl. update_filter: LGTM

Clean extraction of `update_filter` alongside points/shard_key.
src/tonic/api/update_common.rs (3)
54-55: gRPC Upsert: new update_filter plumbed through: LGTM

Field extraction looks correct.
140-141: gRPC UpdateVectors: new update_filter plumbed through: LGTM

Extraction aligns with proto.
426-430: No action needed: Filter TryFrom error type converts to tonic::Status
Verified that in `lib/api/src/grpc/conversions.rs` the `impl TryFrom<Filter> for segment::types::Filter` defines `type Error = Status`, ensuring any conversion error is already a `tonic::Status`. The upsert pass-through at lines 426-430 (and similarly at 439-440) correctly propagates `update_filter`, so no changes are required here.
83-86: Explicitly setting update_filter: None is good

Clear and future-proof in case defaults change on the proto side. LGTM.
177-181: Vector update: all `grpc::Filter::from` usages are intentional

A repo-wide search shows all occurrences of `grpc::Filter::from` live in either:

- Auto-generated gRPC/REST schema code
- RBAC/operation checks
- The two conversion sites in `lib/collection/src/shards/conversions.rs` (lines ~116 & ~200)

Those two spots correctly map the internal filter into the outgoing proto. No other `grpc::Filter::from` calls should be changed.
571-583: Propagating update_filter into UpdateVectorsOp is correct

Nice job threading the optional filter through the vector updates for both batch and list paths.
Also applies to: 669-673
741-770: Sharding support for ConditionalInsertOperationInternal looks solid

Condition is cloned per shard; points are split correctly. LGTM.
787-798: Enum wiring and trait impls updated comprehensively
- New `UpsertPointsConditional` variant added and handled in is_write_operation/point_ids/retain_point_ids.
- split_by_shard routing added.
All consistent. LGTM.
Also applies to: 801-835, 1017-1039
lib/collection/src/collection_manager/segments_updater.rs (2)
105-128: Vector updates: conditionally gate updates by excluding non-matching IDs

This is a clean approach: compute excluded IDs and reuse existing strict `update_vectors` (which enforces all-processed checks). Behavior when all are excluded returns 0 without error. LGTM.
654-660: Operation routing updated to use conditional paths

New `UpsertPointsConditional` and conditional vector updates are correctly wired into processing. LGTM.

Also applies to: 685-687
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/collection/src/operations/verification/update.rs(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: e2e-tests
- GitHub Check: lint
- GitHub Check: storage-compat-test
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (1)
lib/collection/src/operations/verification/update.rs (1)
2-3: Imports align with new conditional update support

Adding `PointsBatch` and `PointsList` is appropriate given the new pattern matches below. No issues.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- lib/api/build.rs (1 hunks)
- lib/api/src/grpc/qdrant.rs (4 hunks)
- lib/collection/src/shards/conversions.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/api/src/grpc/qdrant.rs
- lib/collection/src/shards/conversions.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
📚 Learning: 2025-08-10T18:26:12.421Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
Applied to files:
lib/api/build.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: e2e-tests
- GitHub Check: test-consistency
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
How about naming it I understand that I find Thoughts? Edit: or (I've yet to review the PR)
The condition can be not only on payload, but also on point id or has_vector. I would also be ok with 'update_condition'.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
lib/api/src/grpc/qdrant.rs (2)
6021-6035: Batched upsert: add explicit per-id semantics to update_filter doc

Good call adding Validate + nested. The doc mirrors the ambiguity from the non-batch variant. Clarify behavior for existing-but-non-matching points.
Regenerate with this proto doc:
```diff
-    /// If specified, only points that match this filter will be updated, others will be inserted
+    /// Conditional upsert per point id:
+    /// - If an existing point with the same id matches this filter, it will be updated.
+    /// - If no point with the id exists, it will be inserted.
+    /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+    /// An empty filter matches all points.
```
6088-6102: Batched vector update: specify non-matching and non-existing behavior

Validation looks good. The doc should explicitly state that non-matching existing points are untouched and non-existing points are not inserted.
Update proto comment so the generated docs read:
```diff
-    /// If specified, only points that match this filter will be updated
+    /// Conditional vector update. For each provided point:
+    /// - If an existing point with the same id matches this filter, its vectors will be updated.
+    /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+    /// - Points that do not exist will not be inserted.
+    /// An empty filter matches all points.
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- lib/api/build.rs (2 hunks)
- lib/api/src/grpc/qdrant.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/api/build.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.
📚 Learning: 2025-08-10T18:26:12.421Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
Applied to files:
lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:30:02.965Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.965Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.
Applied to files:
lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:22:33.034Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.
Applied to files:
lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:20:42.203Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.
Applied to files:
lib/api/src/grpc/qdrant.rs
🧬 Code Graph Analysis (1)
lib/api/src/grpc/qdrant.rs (2)
lib/api/src/rest/schema.rs (3)
validate (49-58), validate (131-140), validate (1248-1253)
lib/api/src/grpc/validate.rs (8)
validate (14-14), validate (19-21), validate (29-31), validate (39-41), validate (49-54), validate (58-64), validate (68-74), validate (78-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: e2e-tests
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: lint
- GitHub Check: storage-compat-test
/// If specified, only points that match this filter will be updated, others will be inserted
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
💡 Verification agent
❓ Verification inconclusive
Clarify conditional upsert semantics; current doc is ambiguous
Doc says “only points that match this filter will be updated, others will be inserted,” which is misleading for existing-but-non-matching points. Explicitly document per-id behavior.
Apply this change (in the proto so it regenerates here):
- /// If specified, only points that match this filter will be updated, others will be inserted
+ /// Conditional upsert per point id:
+ /// - If an existing point with the same id matches this filter, it will be updated.
+ /// - If no point with the id exists, it will be inserted.
+ /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+ /// An empty filter matches all points.

Note: Validation is correctly wired via #[validate(nested)] — no change needed there.
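The per-id semantics proposed above can be sketched with a toy in-memory store (hypothetical names, not Qdrant's actual segment code; the closure stands in for update_filter):

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Outcome {
    Inserted, // no point with this id existed
    Updated,  // existing point matched the filter
    Skipped,  // existing point did not match: left unchanged
}

// Toy per-id conditional upsert over a map of id -> payload.
fn conditional_upsert(
    store: &mut HashMap<u64, i64>,
    id: u64,
    new_payload: i64,
    filter: impl Fn(&i64) -> bool,
) -> Outcome {
    match store.get(&id) {
        None => {
            store.insert(id, new_payload);
            Outcome::Inserted
        }
        Some(existing) if filter(existing) => {
            store.insert(id, new_payload);
            Outcome::Updated
        }
        Some(_) => Outcome::Skipped,
    }
}

fn main() {
    let mut store = HashMap::from([(1, 10)]);
    let below_100 = |p: &i64| *p < 100;
    assert_eq!(conditional_upsert(&mut store, 1, 20, below_100), Outcome::Updated);
    assert_eq!(conditional_upsert(&mut store, 2, 5, below_100), Outcome::Inserted);
    assert_eq!(conditional_upsert(&mut store, 1, 999, |p: &i64| *p < 0), Outcome::Skipped);
    assert_eq!(store[&1], 20); // non-matching point left unchanged
    println!("ok");
}
```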
To confirm the rename is complete and no stale names remain, you can run:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Ensure we don't have legacy names lingering
rg -n 'update_if|updateIf'

Length of output: 48
Enhance conditional upsert documentation for all update_filter fields
The nested validation is correctly applied via #[validate(nested)], but the doc comments should clearly spell out per-ID semantics to avoid ambiguity. Please update the update_filter descriptions in your .proto definitions (so they propagate into lib/api/src/grpc/qdrant.rs) as follows:
• Affected locations
- UpsertPoints.update_filter
- UpdatePointVectors.update_filter
- PointsUpdateOperation.PointStructList.update_filter
- PointsUpdateOperation.UpdateVectors.update_filter
• Proposed diff for each update_filter field:
- /// If specified, only points that match this filter will be updated, others will be inserted
+ /// Conditional upsert per point id:
+ /// - If an existing point with the same id matches this filter, it will be updated.
+ /// - If no point with that id exists, it will be inserted.
+ /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+ /// An empty filter matches all points.

• Stale-name check confirmed:
rg -n 'update_if|updateIf'
→ no matches found
🤖 Prompt for AI Agents
In lib/api/src/grpc/qdrant.rs around lines 4367 to 4370, the generated doc
comment for the update_filter field is ambiguous about per-ID semantics; update
the source .proto definitions for the following
fields—UpsertPoints.update_filter, UpdatePointVectors.update_filter,
PointsUpdateOperation.PointStructList.update_filter, and
PointsUpdateOperation.UpdateVectors.update_filter—to include a clear description
that the filter is applied per point ID on update (i.e., when an incoming point
matches an existing point ID, the filter decides whether that specific point is
updated; if no point with that ID exists it is inserted, and an existing point
that does not match the filter is left unchanged), keep the
nested validation attribute, and regenerate the Rust bindings so the clarified
doc comments propagate into lib/api/src/grpc/qdrant.rs.
/// If specified, only points that match this filter will be updated
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
🛠️ Refactor suggestion
Make vector-update behavior explicit for non-matching and non-existing points
The brief doc doesn’t state what happens if a point doesn’t match the filter or doesn’t exist. Please make it explicit to avoid misuse.
Update the source proto comment so it regenerates here:
- /// If specified, only points that match this filter will be updated
+ /// Conditional vector update. For each provided point:
+ /// - If an existing point with the same id matches this filter, its vectors will be updated.
+ /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+ /// - Points that do not exist will not be inserted.
+ /// An empty filter matches all points.

Validation via #[validate(nested)] is correct.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
/// Conditional vector update. For each provided point:
/// - If an existing point with the same id matches this filter, its vectors will be updated.
/// - If an existing point with the same id does not match this filter, it will be left unchanged.
/// - Points that do not exist will not be inserted.
/// An empty filter matches all points.
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
🤖 Prompt for AI Agents
In lib/api/src/grpc/qdrant.rs around lines 4453 to 4456, the protobuf field
comment for update_filter is ambiguous about behavior when a point does not
match the filter or when a point id does not exist; update the source proto
comment to explicitly state the semantics (e.g., "If a point does not match the
filter, it will not be updated; non-existing point IDs are silently ignored /
result in no-op" — pick the actual intended behavior), keep the
#[validate(nested)] attribute, and then regenerate the Rust code from the
updated proto so the clarified comment appears here.
timvisee
left a comment
Nice! Did quite a bit of testing, everything looks good.
Only have some minor style remarks, no logic bugs:
let point_ids: Vec<_> = points.iter().map(|point| point.id).collect();

let points_to_exclude =
    select_excluded_by_filter_ids(segments, point_ids, filter_condition, hw_counter)?;
This means that there is no feedback to the user in case some update was rejected. IMO, one of the main benefits of this feature is to let the user choose what to do when there is a collision.
For instance, with the If-Match HTTP header, a 412 Precondition Failed response is the suggested way to signal a rejected update.
Without this error, a user who wants to verify that the request was actually applied (assuming wait=true) would need to poll for the points and check the condition themselves. And even then, it would be difficult to implement something like a counter whose new value depends on its current value.
The problem with returning an error is that a batch update would need to be rejected entirely if the condition is not met (even partially). So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.
I agree it would be nice to have some form of feedback. And I like the idea of HTTP 412 (well, for certain use cases).
However, we currently have no way to reliably check the filter as a precondition. We don't have a transactional system to make this happen, and we should not lock all segments during a check-and-apply phase.
The problem with returning an error is that a batch update would need to be rejected entirely if the condition is not met (even partially). So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.
Yes. This is already a problem with an upsert that contains just two points.
We could implement some sort of feedback for wait=true requests, where it would return the point IDs that were updated. But that remains quite lacking.
So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.
This breaks the use-case. IMO points should be partially updated.
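Partial-update semantics, where only the non-matching points in a batch are skipped (and could be reported back as feedback), might be sketched like this (hypothetical helper, not the PR's actual code):

```rust
use std::collections::HashMap;

// Hypothetical batch conditional upsert with partial-update semantics:
// each point is checked independently; non-matching existing points are
// skipped and their ids returned, while the rest of the batch is applied.
fn batch_conditional_upsert(
    store: &mut HashMap<u64, i64>,
    points: &[(u64, i64)],
    filter: impl Fn(&i64) -> bool,
) -> Vec<u64> {
    let mut skipped = Vec::new();
    for &(id, payload) in points {
        match store.get(&id) {
            // Existing point that fails the filter: leave unchanged, record id.
            Some(existing) if !filter(existing) => skipped.push(id),
            // New point, or existing point that matches: apply the upsert.
            _ => {
                store.insert(id, payload);
            }
        }
    }
    skipped
}

fn main() {
    let mut store = HashMap::from([(1, 10), (2, 200)]);
    // Only points with payload < 100 may be overwritten.
    let skipped =
        batch_conditional_upsert(&mut store, &[(1, 11), (2, 22), (3, 33)], |p| *p < 100);
    assert_eq!(skipped, vec![2]); // point 2 was rejected...
    assert_eq!(store[&2], 200); // ...and left unchanged
    assert_eq!(store[&1], 11); // the rest of the batch was still applied
    assert_eq!(store[&3], 33);
    println!("ok");
}
```

Returning the skipped ids is one possible middle ground between failing the whole batch and giving no feedback at all.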
…ons`... Include changes from #7006
* implement condition parameter for upsert operation
* fmt
* update api schema
* implement conditional update for update-vectors op
* rename to update_filter for consistency
* add tests
* explicitly ignore strict mode for conditional updates
* rabbit review fix
* Also assert vector element before, confirm we don't normalize
* Simplify filter creation using new helper
* Rename merge_with_ids to with_point_ids
---------
Co-authored-by: timvisee <tim@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
This PR introduces an optional condition for the update operations (upsert and update vectors).
The condition might be useful if updates of the same point are coming from multiple sources and we need to make sure that there are no data races between updates. For example, a conditional update may be used to update only those points whose `last_updated` timestamp is earlier than `now`.
Example:
ToDo: rename `update_if` into `update_filter`, as it looks more consistent with the rest of the API
ToDo: tests
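The `last_updated` pattern described above can be illustrated with the filter reduced to a plain timestamp comparison (hypothetical types, not the actual API surface):

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Point {
    last_updated: u64, // e.g. a unix timestamp carried in the payload
    value: i64,
}

// Conditional upsert: apply `incoming` only if the stored point is older.
// A late write from a slow source can no longer clobber a newer one.
fn upsert_if_older(store: &mut HashMap<u64, Point>, id: u64, incoming: Point) -> bool {
    match store.get(&id) {
        Some(existing) if existing.last_updated >= incoming.last_updated => false,
        _ => {
            store.insert(id, incoming);
            true
        }
    }
}

fn main() {
    let mut store = HashMap::new();
    assert!(upsert_if_older(&mut store, 1, Point { last_updated: 100, value: 1 }));
    // A stale update (older timestamp) is rejected:
    assert!(!upsert_if_older(&mut store, 1, Point { last_updated: 90, value: 2 }));
    assert_eq!(store[&1].value, 1);
    // A newer update goes through:
    assert!(upsert_if_older(&mut store, 1, Point { last_updated: 110, value: 3 }));
    assert_eq!(store[&1].value, 3);
    println!("ok");
}
```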