introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert#7963
introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert#7963
Conversation
… want to insert, update, or upsert
| pub fn into_update_only( | ||
| self, | ||
| update_filter: Option<Filter>, | ||
| ) -> Vec<CollectionUpdateOperations> { | ||
| let mut operations = Vec::new(); | ||
|
|
||
| match self { | ||
| Self::PointsBatch(batch) => { | ||
| let mut update_vectors = UpdateVectorsOp { | ||
| points: Vec::new(), | ||
| update_filter: update_filter.clone(), |
There was a problem hiding this comment.
seems like this function is no longer needed for good. cc @ffuugoo
There was a problem hiding this comment.
Is it possible for a partially upgraded cluster to be in process of resharding? E.g., some nodes already use update_mode, but some nodes don't yet "understand" it?
There was a problem hiding this comment.
Good question.
Resharding should not progress (or be triggered) if nodes are down. I'll verify this on my end.
Also, resharding is currently aborted when a node restarts.
There was a problem hiding this comment.
There was a problem hiding this comment.
Yeah, but
- is it possible to trigger node upgrade while resharding is in progress?
- can we guarantee that we will notice a quick node restart?
- do we care?
There was a problem hiding this comment.
By itself, resharding won't be cancelled if you restart a node (e.g., if the node is not currently doing resharding transfer or smth). Only if cluster-manager observed node being unavailable.
Also you don't even need restarting nodes during resharding. It's enough to upgrade one node right before starting a resharding. And then we would have nodes with (silently) incompatible protocols.
I'm just not sure if there's already logic to prevent this in CM. If there is, then maybe we don't care.
There was a problem hiding this comment.
If I understand correctly, worst case it will casue transfer error and cancellation of resharding?
In this case we don't care.
There was a problem hiding this comment.
I think if we send operations with UpdateOnly to old node it will just ignore it? 🤔
There was a problem hiding this comment.
you are right, added version gate
| /// - Removes found IDs from consideration for subsequent segments | ||
| /// - Early terminates when all IDs are found | ||
| /// - Pre-allocates result set | ||
| pub fn select_existing_points(&self, ids: Vec<PointIdType>) -> AHashSet<PointIdType> { |
There was a problem hiding this comment.
this is slightly optimized function, assumed to be working better if we have 10+ segments
| match update_mode { | ||
| UpdateMode::Upsert => { | ||
| // Default behavior: insert new points, update existing points that match the condition | ||
| let points_to_exclude = | ||
| select_excluded_by_filter_ids(segments, point_ids, condition, hw_counter)?; | ||
| points_op.retain_point_ids(|idx| !points_to_exclude.contains(idx)); | ||
| } | ||
| UpdateMode::InsertOnly => { | ||
| // Only insert new points, skip all existing points entirely | ||
| let existing_points = segments.select_existing_points(point_ids); | ||
| points_op.retain_point_ids(|idx| !existing_points.contains(idx)); | ||
| } | ||
| UpdateMode::UpdateOnly => { | ||
| // Only update existing points that match the condition, don't insert new points | ||
| let points_to_exclude = | ||
| select_excluded_by_filter_ids(segments, point_ids.clone(), condition, hw_counter)?; | ||
| let existing_points = segments.select_existing_points(point_ids); | ||
| // Keep only points that exist AND are not excluded by the condition | ||
| points_op.retain_point_ids(|idx| { | ||
| existing_points.contains(idx) && !points_to_exclude.contains(idx) | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
I like that this logic is now centralized here 👌
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/redoc/master/openapi.json (1)
10852-10871: Document update_filter vs update_mode interactionThe current text says new points are inserted “regardless of the filter,” but that’s false when
update_mode=update_only. Please align the description with the new mode semantics so users don’t assume inserts always happen.✏️ Suggested doc tweak
- "description": "Filter to apply when updating existing points. Only points matching this filter will be updated. Points that don't match will keep their current state. New points will be inserted regardless of the filter.", + "description": "Filter to apply when updating existing points. Only points matching this filter will be updated when update_mode allows updates. Points that don't match will keep their current state. New points are inserted only when update_mode is `upsert` or `insert_only`.",Apply the same wording in both
PointsBatch.update_filterandPointsList.update_filter.Also applies to: 11200-11219
🤖 Fix all issues with AI agents
In `@lib/api/src/grpc/proto/points.proto`:
- Around line 175-182: Update the comment for the optional Filter update_filter
(and the duplicate comment where update_filter is documented) to qualify
behavior based on the UpdateMode named by update_mode: state that when
update_mode is set to UpdateOnly matching points will be updated and
non-matching points will NOT be inserted, whereas for Upsert (default) new
points are inserted regardless of the filter and for InsertOnly the filter is
ignored (or only inserts occur). Reference the update_filter field and the
UpdateMode/update_mode enum in the text so readers know the behavior varies by
mode.
♻️ Duplicate comments (1)
lib/api/src/rest/schema.rs (1)
1378-1386: Update filter docs still imply inserts even for UpdateOnly.Also applies to: 1417-1424
🧹 Nitpick comments (1)
docs/redoc/master/openapi.json (1)
11168-11176: Consider making the default explicit in UpdateModeOptional: add a
default: "upsert"(and/or set a property default onupdate_mode) so generated clients and docs clearly reflect behavior when the field is omitted.
| // Filter to apply when updating existing points. Only points matching this filter will be updated. | ||
| // Points that don't match will keep their current state. New points will be inserted regardless of the filter. | ||
| optional Filter update_filter = 6; | ||
| // Timeout for the request in seconds | ||
| optional uint64 timeout = 7; | ||
| // Mode of the upsert operation: insert_only, upsert (default), update_only | ||
| optional UpdateMode update_mode = 8; | ||
| } |
There was a problem hiding this comment.
Clarify update_filter docs for update_mode=UpdateOnly.
The comment says new points are inserted regardless of the filter, which is misleading for UpdateOnly (no inserts). Please qualify the behavior by update_mode to avoid incorrect gRPC docs.
📝 Suggested doc tweak
- // Filter to apply when updating existing points. Only points matching this filter will be updated.
- // Points that don't match will keep their current state. New points will be inserted regardless of the filter.
+ // Filter to apply when updating existing points. Only points matching this filter will be updated.
+ // Points that don't match will keep their current state.
+ // New points are inserted only when update_mode is Upsert or InsertOnly.Also applies to: 1180-1184
🤖 Prompt for AI Agents
In `@lib/api/src/grpc/proto/points.proto` around lines 175 - 182, Update the
comment for the optional Filter update_filter (and the duplicate comment where
update_filter is documented) to qualify behavior based on the UpdateMode named
by update_mode: state that when update_mode is set to UpdateOnly matching points
will be updated and non-matching points will NOT be inserted, whereas for Upsert
(default) new points are inserted regardless of the filter and for InsertOnly
the filter is ignored (or only inserts occur). Reference the update_filter field
and the UpdateMode/update_mode enum in the text so readers know the behavior
varies by mode.
| } | ||
| ] | ||
| }, | ||
| "update_mode": { |
There was a problem hiding this comment.
In validation: shall we make update_mode: insert_only and update_filter mutually exclusive?
The update filter is completely ignored in that case. It might help prevent accidental misuse.
There was a problem hiding this comment.
I was thinking about it, and decided not to.
|
Note that the comment about multi-versioned cluster still stands. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/redoc/master/openapi.json (1)
10851-10872: Updateupdate_filterdocs to reflectupdate_modebehavior.The description says new points are inserted regardless of the filter, which is only true for
update_mode=upsert. Forupdate_only, inserts are skipped; forinsert_only, updates (and thus the filter) are ignored. Please adjust the wording accordingly in the REST schema so the generated OpenAPI stays accurate.✏️ Suggested wording update (apply in REST schema source, not the generated JSON)
- "description": "Filter to apply when updating existing points. Only points matching this filter will be updated. Points that don't match will keep their current state. New points will be inserted regardless of the filter.", + "description": "Filter to apply when updating existing points. Only points matching this filter will be updated; others keep their current state. When update_mode is upsert (default), new points are inserted regardless of the filter. When update_mode is update_only, new points are not inserted. When update_mode is insert_only, the filter is ignored.",Based on learnings, please update the REST schema source (e.g., lib/api/src/rest/schema.rs) instead of editing this generated OpenAPI file.
Also applies to: 11199-11220
🤖 Fix all issues with AI agents
In `@lib/api/src/grpc/qdrant.rs`:
- Around line 4953-4966: Update the proto source comment for the update_filter
field to clarify that new points are only inserted when the chosen update_mode
permits insertion (i.e., insertion does NOT occur when UpdateMode=UpdateOnly);
update the proto's comment near the Filter field (referencing update_filter and
update_mode / UpdateOnly) to state this behavior explicitly, then regenerate the
Rust gRPC bindings so lib/api/src/grpc/qdrant.rs is updated (do not edit the
generated file directly).
♻️ Duplicate comments (2)
lib/api/src/grpc/proto/points.proto (1)
175-182: Clarify update_filter docs for update_mode semantics (still misleading).The comment still states “New points will be inserted regardless of the filter,” which is false for
UpdateOnlyand misleading forInsertOnly. This duplicates a prior comment.📝 Suggested doc tweak
- // Filter to apply when updating existing points. Only points matching this filter will be updated. - // Points that don't match will keep their current state. New points will be inserted regardless of the filter. + // Filter to apply when updating existing points. Only points matching this filter will be updated. + // Points that don't match will keep their current state. + // New points are inserted only when update_mode is Upsert or InsertOnly; + // with UpdateOnly no inserts occur, and with InsertOnly this filter is ignored.Also applies to: 1180-1184
lib/api/src/rest/schema.rs (1)
1378-1386: Clarify update_filter semantics when update_mode ≠ upsert.The doc comment states "New points will be inserted regardless of the filter" which is misleading when
update_modeis set toUpdateOnly(no inserts occur) orInsertOnly(no updates occur). Consider clarifying the interaction between these two fields.
🧹 Nitpick comments (3)
tests/openapi/test_update_mode.py (1)
18-90: Tests look good, consider adding coverage for the default upsert mode.The tests comprehensively verify
update_onlyandinsert_onlysemantics. For completeness, consider adding a test that verifies the defaultupsertmode (or explicit"update_mode": "upsert") both inserts new points and updates existing ones.lib/shard/src/update.rs (1)
305-314: Minor: Consider optimizingpoint_ids.clone()in UpdateOnly branch.The
point_ids.clone()on line 308 allocates a new vector. Since bothselect_excluded_by_filter_idsandselect_existing_pointsneed the point IDs, consider whether one of these functions could accept a reference/slice instead to avoid the clone. However, this is a minor optimization and can be deferred.lib/edge/python/src/update.rs (1)
41-68: Avoid unnecessary conditional path for defaultUpdateMode::Upsertwhen no condition is provided.When callers pass
UpdateMode::Upsertwith no condition, the current code routes throughUpsertPointsConditionalwithFilter::default(). SinceUpsertis the default mode, this adds filter processing overhead without behavioral benefit. Consider routing toUpsertPointsdirectly:♻️ Proposed adjustment
let operation = match (condition, update_mode) { // If condition or non-default update_mode is provided, use conditional upsert (Some(condition), mode) => point_ops::PointOperations::UpsertPointsConditional( point_ops::ConditionalInsertOperationInternal { points_op: points, condition: Filter::from(condition), update_mode: mode, }, ), + (None, Some(UpdateMode::Upsert)) => point_ops::PointOperations::UpsertPoints(points), (None, Some(mode)) => point_ops::PointOperations::UpsertPointsConditional( point_ops::ConditionalInsertOperationInternal { points_op: points, condition: Filter::default(), update_mode: Some(mode), }, ), // Default case: regular upsert (None, None) => point_ops::PointOperations::UpsertPoints(points), };
… want to insert, update, or upsert (#7963) * introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert * add test * upd dockstring * require resharding once all peers have updated version * use service error * fix clippy again * wait for same version before resharding in tests
… want to insert, update, or upsert (#7963) * introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert * add test * upd dockstring * require resharding once all peers have updated version * use service error * fix clippy again * wait for same version before resharding in tests
… want to insert, update, or upsert