Skip to content

introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert#7963

Merged
generall merged 7 commits intodevfrom
update-only
Jan 26, 2026
Merged

introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert#7963
generall merged 7 commits intodevfrom
update-only

Conversation

@generall
Copy link
Member

… want to insert, update, or upsert

Comment on lines -228 to -238
pub fn into_update_only(
self,
update_filter: Option<Filter>,
) -> Vec<CollectionUpdateOperations> {
let mut operations = Vec::new();

match self {
Self::PointsBatch(batch) => {
let mut update_vectors = UpdateVectorsOp {
points: Vec::new(),
update_filter: update_filter.clone(),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like this function is no longer needed for good. cc @ffuugoo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a partially upgraded cluster to be in process of resharding? E.g., some nodes already use update_mode, but some nodes don't yet "understand" it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question.

Resharding should not progress (or be triggered) if nodes are down. I'll verify this on my end.

Also, resharding is currently aborted when a node restarts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but

  1. is it possible to trigger node upgrade while resharding is in progress?
  2. can we guarantee that we will notice a quick node restart?
  3. do we care?

Copy link
Contributor

@ffuugoo ffuugoo Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By itself, resharding won't be cancelled if you restart a node (e.g., if the node is not currently doing resharding transfer or smth). Only if cluster-manager observed node being unavailable.

Also you don't even need restarting nodes during resharding. It's enough to upgrade one node right before starting a resharding. And then we would have nodes with (silently) incompatible protocols.

I'm just not sure if there's already logic to prevent this in CM. If there is, then maybe we don't care.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, worst case it will casue transfer error and cancellation of resharding?
In this case we don't care.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we send operations with UpdateOnly to old node it will just ignore it? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, added version gate

/// - Removes found IDs from consideration for subsequent segments
/// - Early terminates when all IDs are found
/// - Pre-allocates result set
pub fn select_existing_points(&self, ids: Vec<PointIdType>) -> AHashSet<PointIdType> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is slightly optimized function, assumed to be working better if we have 10+ segments

Comment on lines +293 to +315
match update_mode {
UpdateMode::Upsert => {
// Default behavior: insert new points, update existing points that match the condition
let points_to_exclude =
select_excluded_by_filter_ids(segments, point_ids, condition, hw_counter)?;
points_op.retain_point_ids(|idx| !points_to_exclude.contains(idx));
}
UpdateMode::InsertOnly => {
// Only insert new points, skip all existing points entirely
let existing_points = segments.select_existing_points(point_ids);
points_op.retain_point_ids(|idx| !existing_points.contains(idx));
}
UpdateMode::UpdateOnly => {
// Only update existing points that match the condition, don't insert new points
let points_to_exclude =
select_excluded_by_filter_ids(segments, point_ids.clone(), condition, hw_counter)?;
let existing_points = segments.select_existing_points(point_ids);
// Keep only points that exist AND are not excluded by the condition
points_op.retain_point_ids(|idx| {
existing_points.contains(idx) && !points_to_exclude.contains(idx)
});
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main change is here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this logic is now centralized here 👌

coderabbitai[bot]

This comment was marked as resolved.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/redoc/master/openapi.json (1)

10852-10871: Document update_filter vs update_mode interaction

The current text says new points are inserted “regardless of the filter,” but that’s false when update_mode=update_only. Please align the description with the new mode semantics so users don’t assume inserts always happen.

✏️ Suggested doc tweak
-            "description": "Filter to apply when updating existing points. Only points matching this filter will be updated. Points that don't match will keep their current state. New points will be inserted regardless of the filter.",
+            "description": "Filter to apply when updating existing points. Only points matching this filter will be updated when update_mode allows updates. Points that don't match will keep their current state. New points are inserted only when update_mode is `upsert` or `insert_only`.",

Apply the same wording in both PointsBatch.update_filter and PointsList.update_filter.

Also applies to: 11200-11219

🤖 Fix all issues with AI agents
In `@lib/api/src/grpc/proto/points.proto`:
- Around line 175-182: Update the comment for the optional Filter update_filter
(and the duplicate comment where update_filter is documented) to qualify
behavior based on the UpdateMode named by update_mode: state that when
update_mode is set to UpdateOnly matching points will be updated and
non-matching points will NOT be inserted, whereas for Upsert (default) new
points are inserted regardless of the filter and for InsertOnly the filter is
ignored (or only inserts occur). Reference the update_filter field and the
UpdateMode/update_mode enum in the text so readers know the behavior varies by
mode.
♻️ Duplicate comments (1)
lib/api/src/rest/schema.rs (1)

1378-1386: Update filter docs still imply inserts even for UpdateOnly.

Also applies to: 1417-1424

🧹 Nitpick comments (1)
docs/redoc/master/openapi.json (1)

11168-11176: Consider making the default explicit in UpdateMode

Optional: add a default: "upsert" (and/or set a property default on update_mode) so generated clients and docs clearly reflect behavior when the field is omitted.

Comment on lines +175 to 182
// Filter to apply when updating existing points. Only points matching this filter will be updated.
// Points that don't match will keep their current state. New points will be inserted regardless of the filter.
optional Filter update_filter = 6;
// Timeout for the request in seconds
optional uint64 timeout = 7;
// Mode of the upsert operation: insert_only, upsert (default), update_only
optional UpdateMode update_mode = 8;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify update_filter docs for update_mode=UpdateOnly.

The comment says new points are inserted regardless of the filter, which is misleading for UpdateOnly (no inserts). Please qualify the behavior by update_mode to avoid incorrect gRPC docs.

📝 Suggested doc tweak
-  // Filter to apply when updating existing points. Only points matching this filter will be updated.
-  // Points that don't match will keep their current state. New points will be inserted regardless of the filter.
+  // Filter to apply when updating existing points. Only points matching this filter will be updated.
+  // Points that don't match will keep their current state.
+  // New points are inserted only when update_mode is Upsert or InsertOnly.

Also applies to: 1180-1184

🤖 Prompt for AI Agents
In `@lib/api/src/grpc/proto/points.proto` around lines 175 - 182, Update the
comment for the optional Filter update_filter (and the duplicate comment where
update_filter is documented) to qualify behavior based on the UpdateMode named
by update_mode: state that when update_mode is set to UpdateOnly matching points
will be updated and non-matching points will NOT be inserted, whereas for Upsert
(default) new points are inserted regardless of the filter and for InsertOnly
the filter is ignored (or only inserts occur). Reference the update_filter field
and the UpdateMode/update_mode enum in the text so readers know the behavior
varies by mode.

@qdrant qdrant deleted a comment from coderabbitai bot Jan 22, 2026
}
]
},
"update_mode": {
Copy link
Member

@timvisee timvisee Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In validation: shall we make update_mode: insert_only and update_filter mutually exclusive?

The update filter is completely ignored in that case. It might help prevent accidental misuse.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it, and decided not to.

@ffuugoo
Copy link
Contributor

ffuugoo commented Jan 23, 2026

Note that the comment about multi-versioned cluster still stands.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/redoc/master/openapi.json (1)

10851-10872: Update update_filter docs to reflect update_mode behavior.

The description says new points are inserted regardless of the filter, which is only true for update_mode=upsert. For update_only, inserts are skipped; for insert_only, updates (and thus the filter) are ignored. Please adjust the wording accordingly in the REST schema so the generated OpenAPI stays accurate.

✏️ Suggested wording update (apply in REST schema source, not the generated JSON)
- "description": "Filter to apply when updating existing points. Only points matching this filter will be updated. Points that don't match will keep their current state. New points will be inserted regardless of the filter.",
+ "description": "Filter to apply when updating existing points. Only points matching this filter will be updated; others keep their current state. When update_mode is upsert (default), new points are inserted regardless of the filter. When update_mode is update_only, new points are not inserted. When update_mode is insert_only, the filter is ignored.",

Based on learnings, please update the REST schema source (e.g., lib/api/src/rest/schema.rs) instead of editing this generated OpenAPI file.

Also applies to: 11199-11220

🤖 Fix all issues with AI agents
In `@lib/api/src/grpc/qdrant.rs`:
- Around line 4953-4966: Update the proto source comment for the update_filter
field to clarify that new points are only inserted when the chosen update_mode
permits insertion (i.e., insertion does NOT occur when UpdateMode=UpdateOnly);
update the proto's comment near the Filter field (referencing update_filter and
update_mode / UpdateOnly) to state this behavior explicitly, then regenerate the
Rust gRPC bindings so lib/api/src/grpc/qdrant.rs is updated (do not edit the
generated file directly).
♻️ Duplicate comments (2)
lib/api/src/grpc/proto/points.proto (1)

175-182: Clarify update_filter docs for update_mode semantics (still misleading).

The comment still states “New points will be inserted regardless of the filter,” which is false for UpdateOnly and misleading for InsertOnly. This duplicates a prior comment.

📝 Suggested doc tweak
-  // Filter to apply when updating existing points. Only points matching this filter will be updated.
-  // Points that don't match will keep their current state. New points will be inserted regardless of the filter.
+  // Filter to apply when updating existing points. Only points matching this filter will be updated.
+  // Points that don't match will keep their current state.
+  // New points are inserted only when update_mode is Upsert or InsertOnly;
+  // with UpdateOnly no inserts occur, and with InsertOnly this filter is ignored.

Also applies to: 1180-1184

lib/api/src/rest/schema.rs (1)

1378-1386: Clarify update_filter semantics when update_mode ≠ upsert.

The doc comment states "New points will be inserted regardless of the filter" which is misleading when update_mode is set to UpdateOnly (no inserts occur) or InsertOnly (no updates occur). Consider clarifying the interaction between these two fields.

🧹 Nitpick comments (3)
tests/openapi/test_update_mode.py (1)

18-90: Tests look good, consider adding coverage for the default upsert mode.

The tests comprehensively verify update_only and insert_only semantics. For completeness, consider adding a test that verifies the default upsert mode (or explicit "update_mode": "upsert") both inserts new points and updates existing ones.

lib/shard/src/update.rs (1)

305-314: Minor: Consider optimizing point_ids.clone() in UpdateOnly branch.

The point_ids.clone() on line 308 allocates a new vector. Since both select_excluded_by_filter_ids and select_existing_points need the point IDs, consider whether one of these functions could accept a reference/slice instead to avoid the clone. However, this is a minor optimization and can be deferred.

lib/edge/python/src/update.rs (1)

41-68: Avoid unnecessary conditional path for default UpdateMode::Upsert when no condition is provided.

When callers pass UpdateMode::Upsert with no condition, the current code routes through UpsertPointsConditional with Filter::default(). Since Upsert is the default mode, this adds filter processing overhead without behavioral benefit. Consider routing to UpsertPoints directly:

♻️ Proposed adjustment
         let operation = match (condition, update_mode) {
             // If condition or non-default update_mode is provided, use conditional upsert
             (Some(condition), mode) => point_ops::PointOperations::UpsertPointsConditional(
                 point_ops::ConditionalInsertOperationInternal {
                     points_op: points,
                     condition: Filter::from(condition),
                     update_mode: mode,
                 },
             ),
+            (None, Some(UpdateMode::Upsert)) => point_ops::PointOperations::UpsertPoints(points),
             (None, Some(mode)) => point_ops::PointOperations::UpsertPointsConditional(
                 point_ops::ConditionalInsertOperationInternal {
                     points_op: points,
                     condition: Filter::default(),
                     update_mode: Some(mode),
                 },
             ),
             // Default case: regular upsert
             (None, None) => point_ops::PointOperations::UpsertPoints(points),
         };

coderabbitai[bot]

This comment was marked as resolved.

coderabbitai[bot]

This comment was marked as resolved.

@qdrant qdrant deleted a comment from coderabbitai bot Jan 26, 2026
@generall generall merged commit 81e7ab7 into dev Jan 26, 2026
15 checks passed
@generall generall deleted the update-only branch January 26, 2026 23:39
IvanPleshkov pushed a commit that referenced this pull request Jan 27, 2026
… want to insert, update, or upsert (#7963)

* introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert

* add test

* upd dockstring

* require resharding once all peers have updated version

* use service error

* fix clippy again

* wait for same version before resharding in tests
generall added a commit that referenced this pull request Feb 9, 2026
… want to insert, update, or upsert (#7963)

* introduce update_mode parameter for upsert operation to control if we want to insert, update, or upsert

* add test

* upd dockstring

* require resharding once all peers have updated version

* use service error

* fix clippy again

* wait for same version before resharding in tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants