
Update if #7006

Merged
generall merged 11 commits into dev from update-if
Aug 15, 2025

Conversation


@generall generall commented Aug 9, 2025

This PR introduces an optional condition for the update operations (upsert and update vectors).

The condition is useful when updates to the same point arrive from multiple sources and we need to make sure there are no data races between them. For example, a conditional update may be used to update only those points whose last_updated timestamp is earlier than now.

Example:

PUT collection/my-collection/points
{
    "points": [
        {
            "id": 6,
            "vector": [1.1, 1.2, 1.3, 1.4],
            "payload": {
                "version": 4,
                "test": "test_value"
            }
        }
    ],
    "update_filter": {
        "must": {
            "key": "version",
            "range": {
                "lt": 4
            }
        }
    }
}

ToDo: rename update_if to update_filter, which looks more consistent with the rest of the API
ToDo: tests
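The intended semantics can be sketched as a small in-memory model (illustrative only — `conditional_upsert` here is a hypothetical stand-in for the new code path, not Qdrant's implementation):

```python
# Toy model of conditional upsert over an in-memory point store.
# `store` maps point id -> payload; `condition` is a payload predicate
# standing in for `update_filter` (e.g. "version" < 4 in the request above).

def conditional_upsert(store, incoming, condition):
    for point_id, payload in incoming.items():
        # Existing points are overwritten only if their *current* payload
        # satisfies the condition; new points are always inserted.
        if point_id not in store or condition(store[point_id]):
            store[point_id] = payload
    return store

store = {6: {"version": 3}}
conditional_upsert(
    store,
    {6: {"version": 4, "test": "test_value"}, 7: {"version": 1}},
    lambda payload: payload.get("version", 0) < 4,
)
# Point 6 had version 3 (< 4), so it is updated; point 7 did not exist,
# so it is inserted regardless of the filter.
```

A writer whose stored version already reached 4 would leave the point untouched, which is exactly the data-race guard described above.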

@generall generall changed the base branch from master to dev August 9, 2025 20:33
@generall generall marked this pull request as ready for review August 10, 2025 17:24
@generall generall requested a review from timvisee August 10, 2025 17:24

coderabbitai bot commented Aug 10, 2025

📝 Walkthrough

Walkthrough

Adds an optional update_filter: Filter to point upsert and vector-update message/schema types across gRPC proto, OpenAPI, and Rust REST types. The filter is converted and propagated through tonic gRPC converters, API handlers, operation construction, shard conversions, remote shard forwarding, and collection manager logic. Introduces ConditionalInsertOperationInternal and PointOperations::UpsertPointsConditional, new conditional_upsert and update_vectors_conditional code paths, a helper to compute excluded IDs by filter, updated sharding/splitting logic, RBAC checks, validation entries, tests (unit and OpenAPI) and small test updates to include the new field.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • agourlay
  • coszio



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (32)
lib/collection/src/operations/mod.rs (1)

443-446: Exercise both presence/absence of update_filter in proptests

Right now, the arbitrary UpdateVectors case always uses update_filter: None. Generating cases with Some(Filter) too will better cover serialization/deserialization and downstream logic.

-            let update = Self::UpdateVectors(UpdateVectorsOp {
-                points: Vec::new(),
-                update_filter: None,
-            });
+            let update = any::<Option<Filter>>().prop_map(|update_filter| {
+                Self::UpdateVectors(UpdateVectorsOp {
+                    points: Vec::new(),
+                    update_filter,
+                })
+            });
@@
-            prop_oneof![Just(update), Just(delete), Just(delete_by_filter),].boxed()
+            prop_oneof![update, Just(delete), Just(delete_by_filter),].boxed()
lib/api/src/rest/schema.rs (2)

1192-1195: Document behavior and accept update_if alias for UpdateVectors

For vector updates, non-matching points must be skipped (no inserts happen for vector ops). Add docs and alias = "update_if" for a smooth transition.

-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, only points that match this filter will be updated.
+    /// Points that do not match will be skipped (vector updates never insert).
+    #[serde(default, skip_serializing_if = "Option::is_none", alias = "update_if")]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,

1203-1206: Clarify upsert wording and add update_if alias for PointsList

Refine the wording to avoid implying inserts for existing non-matching IDs and add a serde alias for backward compatibility.

-    /// If specified, only points that match this filter will be updated, others will be inserted
-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, existing points that match this filter will be updated.
+    /// Existing points that do not match are left unchanged.
+    /// Non-existing points in the request will be inserted.
+    #[serde(default, skip_serializing_if = "Option::is_none", alias = "update_if")]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,
lib/collection/src/operations/vector_ops.rs (2)

42-45: Add rustdoc clarifying vector-update semantics

Helpful to spell out that non-matching points are skipped (no inserts for vector updates).

 pub struct UpdateVectorsOp {
     /// Points with named vectors
     pub points: Vec<PointVectorsPersisted>,
-    /// Condition to check before updating vectors
+    /// Condition to check before updating vectors.
+    /// Points that do not match the filter will be skipped (vector updates never insert).
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub update_filter: Option<Filter>,
 }

97-125: Preserve update_filter (correct) and simplify sharding logic

Current logic correctly clones update_filter per shard. You can reduce duplication by reusing split_iter_by_shard for PointVectorsPersisted.

Add this impl:

+impl SplitByShard for Vec<PointVectorsPersisted> {
+    fn split_by_shard(self, ring: &HashRingRouter) -> OperationToShard<Self> {
+        split_iter_by_shard(self, |point| point.id, ring)
+    }
+}

Then simplify the match arm:

-            VectorOperations::UpdateVectors(UpdateVectorsOp {
-                points,
-                update_filter,
-            }) => {
-                let shard_points = points
-                    .into_iter()
-                    .flat_map(|point| {
-                        point_to_shards(&point.id, ring)
-                            .into_iter()
-                            .map(move |shard_id| (shard_id, point.clone()))
-                    })
-                    .fold(
-                        AHashMap::new(),
-                        |mut map: AHashMap<u32, Vec<PointVectorsPersisted>>, (shard_id, points)| {
-                            map.entry(shard_id).or_default().push(points);
-                            map
-                        },
-                    );
-                let shard_ops = shard_points.into_iter().map(|(shard_id, points)| {
-                    (
-                        shard_id,
-                        VectorOperations::UpdateVectors(UpdateVectorsOp {
-                            points,
-                            update_filter: update_filter.clone(),
-                        }),
-                    )
-                });
-                OperationToShard::by_shard(shard_ops)
-            }
+            VectorOperations::UpdateVectors(UpdateVectorsOp { points, update_filter }) => {
+                points
+                    .split_by_shard(ring)
+                    .map(|points| {
+                        VectorOperations::UpdateVectors(UpdateVectorsOp {
+                            points,
+                            update_filter: update_filter.clone(),
+                        })
+                    })
+            }
lib/segment/src/types.rs (3)

3298-3324: Guard empty ids and make the builder harder to misuse

  • If ids is empty, this produces an unsatisfiable filter. Prefer a no-op early return to avoid surprising behavior.
  • Add #[must_use] so callers don't accidentally drop the returned Filter.

Apply this diff:

-    /// Create an extended filtering condition, which would also include filter by given list of IDs.
-    pub fn merge_with_ids(self, ids: impl IntoIterator<Item = PointIdType>) -> Filter {
+    /// Create an extended filtering condition, which would also include filter by given list of IDs.
+    #[must_use]
+    pub fn merge_with_ids(self, ids: impl IntoIterator<Item = PointIdType>) -> Filter {
         let has_id_condition: HasIdCondition = ids.into_iter().collect();
+        // No IDs supplied — leave the filter unchanged
+        if has_id_condition.has_id.is_empty() {
+            return self;
+        }
 
         let Filter {
             should,
             min_should,
             must,
             must_not,
         } = self;

3298-3324: Avoid duplicating HasId in must by intersecting with existing one (optional)

If must already contains a Condition::HasId, consider intersecting the existing set with the new ids instead of pushing another condition. This:

  • keeps condition count smaller (helps filter_max_conditions limits),
  • reduces planner work, and
  • makes intent explicit (restrict to IDs ∩ new_ids).

If implementing, ensure:

  • Intersection is computed on the underlying AHashSet,
  • Large sets retain Arc via MaybeArc threshold,
  • If the intersection is empty, the filter becomes unsatisfiable (or consider returning a known-empty sentinel if you have one).

Would you like me to draft a concrete patch that performs the intersection while preserving MaybeArc semantics?
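In plain set terms, the proposed intersection behaves like this (a hypothetical helper, independent of the actual Filter/MaybeArc types):

```python
# Set-level sketch of intersecting an existing HasId condition with the
# newly merged ids instead of pushing a second HasId condition.
# An empty result means the combined filter is unsatisfiable.

def intersect_has_id(existing_ids, new_ids):
    merged = frozenset(existing_ids) & frozenset(new_ids)
    return merged  # empty set => filter can never match any point
```

The one-condition result keeps the condition count low regardless of how many times the filter is merged.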


3298-3324: Add unit tests for merge_with_ids semantics

Cover at least:

  • Adds HasId to must while preserving should/min_should/must_not.
  • No-op when ids is empty (after the early-return change).
  • Optional: Intersects with existing HasId if you adopt the refactor.

Here is a test scaffold you can drop into the existing tests module in this file:

#[test]
fn test_merge_with_ids_adds_must() {
    let base = Filter {
        should: Some(vec![Condition::Field(FieldCondition::new_match(
            JsonPath::new("k"),
            Match::new_value(ValueVariants::String("v".into())),
        ))]),
        min_should: None,
        must: None,
        must_not: Some(vec![Condition::IsNull(JsonPath::new("n").into())]),
    };

    let merged = base.clone().merge_with_ids([ExtendedPointId::from(1_u64), ExtendedPointId::from(2_u64)]);
    assert!(merged.must.is_some());
    assert_eq!(merged.must.as_ref().unwrap().len(), 1);
    assert!(matches!(merged.must.as_ref().unwrap()[0], Condition::HasId(_)));

    // Preserve others
    assert_eq!(merged.should.as_ref().unwrap().len(), base.should.as_ref().unwrap().len());
    assert_eq!(merged.must_not.as_ref().unwrap().len(), base.must_not.as_ref().unwrap().len());
}

#[test]
fn test_merge_with_ids_empty_is_noop() {
    let base = Filter::new_must(Condition::Field(FieldCondition::new_is_empty(JsonPath::new("f"), true)));
    let merged = base.clone().merge_with_ids(std::iter::empty());
    assert_eq!(merged, base);
}
lib/api/src/grpc/proto/points.proto (2)

138-139: Clarify semantics for existing IDs that fail the filter

The comment says “others will be inserted”, which can be misread when the point already exists but doesn’t match the filter. Consider clarifying that:

  • existing points not matching the filter are left unchanged (not re-inserted),
  • non-existing points are inserted.

This helps avoid ambiguity across clients.


799-800: Mirror the upsert comment clarification

Same nit as above: explicitly document behavior for existing IDs that don’t satisfy the filter to avoid misinterpretation by client implementers.

tests/openapi/test_conditional_update.py (3)

1-1: Remove unused import

itemgetter is imported but unused. Please remove to keep tests clean.

-from operator import itemgetter

19-20: Fix misleading comments

Comments say “Upsert and delete points”, but these blocks perform conditional upserts/updates. Reword to something like “Conditionally upsert point” / “Conditionally update vectors”.

Also applies to: 34-35, 77-78, 119-121


144-155: Make vector presence explicit in GET

To avoid relying on default behavior, consider requesting vectors explicitly in the GET (e.g., query param to include vectors) so the assertion on result["vector"] is robust across configs.

lib/collection/src/shards/remote_shard.rs (1)

246-257: Optional: de-duplicate upsert code paths

Upsert and conditional upsert branches are identical except for the converter. Consider extracting a small helper that builds the request from either variant, reducing duplication and potential drift.

Also applies to: 498-511

lib/storage/src/rbac/ops_checks.rs (1)

1278-1279: Vector Update test updated for new field

Initializing update_filter: None keeps tests compiling against the expanded struct. Consider adding a positive test with a non-None filter later.

src/common/update.rs (2)

248-269: Propagate update_filter: OK; consider renaming locals to reduce shadowing

Logic looks correct and preserves shard_key/inference usage. Minor readability tweak: avoid reusing the name operation for multiple different types in this function.

Suggested rename across these lines and the subsequent construction below:

-    let (operation, shard_key, usage, update_filter) = match operation {
+    let (points_op, shard_key, usage, update_filter) = match operation {
@@
-            let operation = PointInsertOperationsInternal::PointsBatch(batch);
-            (operation, shard_key, usage, update_filter)
+            let points_op = PointInsertOperationsInternal::PointsBatch(batch);
+            (points_op, shard_key, usage, update_filter)
@@
-            let operation = PointInsertOperationsInternal::PointsList(list);
-            (operation, shard_key, usage, update_filter)
+            let points_op = PointInsertOperationsInternal::PointsList(list);
+            (points_op, shard_key, usage, update_filter)

And in the block below (Lines 272-281):

-    let operation = if let Some(condition) = update_filter {
+    let collection_op = if let Some(condition) = update_filter {
         CollectionUpdateOperations::PointOperation(PointOperations::UpsertPointsConditional(
             ConditionalInsertOperationInternal {
-                points_op: operation,
+                points_op,
                 condition,
             },
         ))
     } else {
-        CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(operation))
+        CollectionUpdateOperations::PointOperation(PointOperations::UpsertPoints(points_op))
     };

Then pass collection_op to update().


360-364: Carry update_filter into UpdateVectorsOp: LGTM; optional naming nit

Correct propagation into UpdateVectorsOp. Optional: rename local operation to vector_op or collection_op for clarity when reading alongside other functions.

-    let operation = CollectionUpdateOperations::VectorOperation(VectorOperations::UpdateVectors(
+    let collection_op = CollectionUpdateOperations::VectorOperation(VectorOperations::UpdateVectors(
         UpdateVectorsOp {
             points,
             update_filter,
         },
     ));
@@
-    let result = update(
+    let result = update(
         toc,
         &collection_name,
-        operation,
+        collection_op,
         internal_params,
         params,
         shard_key,
         access,
         hw_measurement_acc,
     )
docs/grpc/docs.md (4)

3880-3881: Add explicit semantics example for upsert update_filter

Text is clear. Consider adding a short example clarifying that for upsert:

  • existing points matching the filter are updated,
  • existing points not matching the filter are left unchanged,
  • non-existing points are inserted regardless of filter.

3931-3932: Clarify interaction between UpdateVectors points list and update_filter

State explicitly that update_filter is ANDed with the provided point IDs, i.e., only those point IDs in the request that also match the filter will be updated; others are skipped.


4910-4910: Mirror UpdateVectors semantics on the RPC request

Repeat the AND semantics with provided point IDs here to avoid ambiguity at the service layer. This helps client authors avoid assuming the filter is applied collection-wide.
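The AND semantics reduce to an intersection, which can be modeled schematically (not the service code):

```python
# The update set is: ids named in the request, intersected with ids whose
# stored payload matches update_filter. Listed points that miss the filter
# are skipped, and the filter is never applied collection-wide.

def ids_to_update(requested_ids, stored, matches):
    return [pid for pid in requested_ids if pid in stored and matches(stored[pid])]
```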


4946-4946: Mention strict-mode/indexing implications for update_filter

Consider a short note: when strict mode enforces unindexed_filtering_update=false, update_filter conditions should target indexed fields; otherwise, the request may be rejected. This helps set user expectations.

src/tonic/api/update_common.rs (3)

62-65: Filter conversion duplicated across handlers — extract helper to reduce repetition

The TryFrom + transpose pattern is correct. It’s duplicated in multiple places (upsert, update_vectors, batch branches). Consider a small helper to centralize conversion.

Add helper (outside the changed range):

fn try_convert_filter(
    f: Option<grpc::qdrant::Filter>,
) -> Result<Option<segment::types::Filter>, Status> {
    f.map(segment::types::Filter::try_from).transpose()
}

Then here replace with:

-        update_filter: update_filter
-            .map(segment::types::Filter::try_from)
-            .transpose()?,
+        update_filter: try_convert_filter(update_filter)?,

Repeat similarly in update_vectors and update_batch branches below.


160-163: Use the same helper for update_filter conversion

Same duplication here; apply the try_convert_filter helper if you adopt it.

-        update_filter: update_filter
-            .map(segment::types::Filter::try_from)
-            .transpose()?,
+        update_filter: try_convert_filter(update_filter)?,

558-562: Batch UpdateVectors branch: correct pass-through of update_filter

Looks good and consistent with the non-batch handler. Consider adding an integration test that:

  • sends multiple UpdateVectors ops where only a subset meets the filter,
  • verifies only those are updated.

I can draft these tests if helpful.

Also applies to: 572-573

lib/api/src/grpc/qdrant.rs (2)

6028-6031: Clarify nested upsert semantics; consider enabling validation here too

  • Suggest revising the doc to clarify behavior for existing non-matching points (left unchanged) vs inserts.
  • For consistency with top-level messages, consider deriving validator::Validate for this struct in the proto and annotating update_filter with #[validate(nested)].

Example doc tweak:

-        /// If specified, only points that match this filter will be updated, others will be inserted
+        /// Conditional upsert. Update existing points that match this filter.
+        /// Insert points that don't exist. Existing non-matching points are left unchanged.

If you decide to validate here as well, update the proto so this message derives validator::Validate and the field is marked for nested validation (then re-generate).


6093-6096: Make nested vector-update semantics explicit; optionally enable nested validation

  • Clarify that non-matching points are ignored (no insert).
  • For parity with top-level UpdatePointVectors, consider deriving validator::Validate for this struct in the proto and adding #[validate(nested)] to the field.

Suggested doc tweak:

-        /// If specified, only points that match this filter will be updated
+        /// Conditional vector update. Only existing points matching this filter will be updated; others are ignored.
lib/collection/src/shards/conversions.rs (1)

88-119: Reduce duplication between upsert builders

internal_conditional_upsert_points largely duplicates internal_upsert_points construction logic except for update_filter. Consider factoring a small helper that builds UpsertPoints from common parts and takes an Option<Filter>.

This will centralize conversions and reduce maintenance cost.

lib/collection/src/operations/point_ops.rs (3)

486-492: New ConditionalInsertOperationInternal: aligns with PR goals

Structure is clear and minimal. Consider adding a brief doc comment clarifying semantics: inserts always; updates only if condition matches existing point.


639-656: Payload updates: filter + id merge is the right semantics

Using update_filter.merge_with_ids([id]) ensures the payload update applies only when the condition holds for that specific point. This matches the conditional update intent.

Optional micro-optimization: avoid cloning the whole Filter per point by holding let base = &update_filter; and cloning only once inside merge_with_ids.

Also applies to: 681-695


1128-1151: Tests updated for new field: good baseline

Including update_filter: None in REST tests keeps schema alignment. Consider adding tests that exercise non-None update_filter in the appropriate suite (as planned).

I can add a focused unit/integration test that validates conditional upsert behavior for both batch and list variants. Want me to draft it?

lib/collection/src/collection_manager/segments_updater.rs (2)

600-618: Filter inversion to find excluded IDs is correct and efficient

must = HasId(ids) combined with must_not = Filter(condition) yields exactly the set of provided IDs that do not satisfy the condition. Good reuse of points_by_filter.

Minor nit: rename closure argument in later usage from idx to id for readability.


619-642: Conditional upsert semantics implemented as intended

  • Only existing points are tested against the condition.
  • Non-matching existing points are dropped from the op.
  • Non-existing points are inserted unconditionally.

This aligns with the described behavior in the PR summary. One small readability tweak:

-    points_op.retain_point_ids(|idx| !points_to_exclude.contains(idx));
+    points_op.retain_point_ids(|id| !points_to_exclude.contains(id));
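The inversion can be mirrored with a small in-memory model (a hypothetical stand-in for points_by_filter and retain_point_ids):

```python
# must = HasId(ids) AND must_not = condition selects exactly the provided
# ids whose stored payload fails the condition -- i.e. the set to exclude.
# Ids absent from the store are never excluded: they get inserted
# unconditionally by the upsert.

def excluded_by_filter(ids, stored, condition):
    return {pid for pid in ids if pid in stored and not condition(stored[pid])}

def retain_matching(ids, stored, condition):
    # Counterpart of retain_point_ids(|id| !points_to_exclude.contains(id))
    excluded = excluded_by_filter(ids, stored, condition)
    return [pid for pid in ids if pid not in excluded]
```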
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d748f4c and 8206510.

📒 Files selected for processing (19)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (3 hunks)
  • lib/api/src/grpc/proto/points.proto (4 hunks)
  • lib/api/src/grpc/qdrant.rs (4 hunks)
  • lib/api/src/rest/schema.rs (3 hunks)
  • lib/collection/src/collection_manager/segments_updater.rs (5 hunks)
  • lib/collection/src/operations/mod.rs (1 hunks)
  • lib/collection/src/operations/operation_effect.rs (1 hunks)
  • lib/collection/src/operations/point_ops.rs (15 hunks)
  • lib/collection/src/operations/vector_ops.rs (3 hunks)
  • lib/collection/src/shards/conversions.rs (5 hunks)
  • lib/collection/src/shards/remote_shard.rs (3 hunks)
  • lib/collection/src/shards/shard_holder/resharding.rs (2 hunks)
  • lib/collection/src/tests/sparse_vectors_validation_tests.rs (1 hunks)
  • lib/segment/src/types.rs (1 hunks)
  • lib/storage/src/rbac/ops_checks.rs (4 hunks)
  • src/common/update.rs (2 hunks)
  • src/tonic/api/update_common.rs (7 hunks)
  • tests/openapi/test_conditional_update.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/collection/src/operations/operation_effect.rs (1)
lib/collection/src/collection_manager/segments_updater.rs (1)
  • conditional_upsert (619-641)
tests/openapi/test_conditional_update.py (2)
tests/openapi/helpers/helpers.py (1)
  • request_with_validation (41-116)
tests/openapi/conftest.py (2)
  • on_disk_vectors (11-12)
  • on_disk_payload (16-17)
🪛 Ruff (0.12.2)
tests/openapi/test_conditional_update.py

1-1: operator.itemgetter imported but unused

Remove unused import: operator.itemgetter

(F401)

🔇 Additional comments (21)
lib/collection/src/tests/sparse_vectors_validation_tests.rs (1)

72-73: Test update aligns with new API shape

Including update_filter: None keeps this test aligned with the new optional field and preserves existing validation behavior.

lib/collection/src/operations/operation_effect.rs (1)

53-55: Handle conditional upsert in effect-area estimation (correct)

Delegating to conditional_upsert.points_op.estimate_effect_area() is the right choice; the impacted IDs are exactly those in the inner points op.

lib/api/src/rest/schema.rs (1)

1169-1172: Add clarity to update_filter docs and alias for backward compatibility

Updated the doc comment to clearly state how update_filter behaves, and added a Serde alias so that incoming JSON using the old update_if field is still accepted:

@@ lib/api/src/rest/schema.rs:1169
-    /// If specified, only points that match this filter will be updated, others will be inserted
-    #[serde(default, skip_serializing_if = "Option::is_none")]
-    #[validate(nested)]
-    pub update_filter: Option<Filter>,
+    /// If specified, existing points that match this filter will be updated.
+    /// Existing points that do not match are left unchanged.
+    /// Non-existing points in the request will be inserted.
+    #[serde(
+        default,
+        skip_serializing_if = "Option::is_none",
+        alias = "update_if"
+    )]
+    #[validate(nested)]
+    pub update_filter: Option<Filter>,

No other occurrences of update_if or updateFilter were found in the codebase, so no further changes are needed.

lib/api/src/grpc/proto/points.proto (2)

166-167: LGTM: update_filter for vector updates

Adding an optional filter to gate vector updates is consistent and backward compatible in proto3 (unknown fields are ignored by older clients/servers).


821-822: LGTM: update_filter on batch UpdateVectors

The field placement and numbering look correct and consistent with the standalone UpdatePointVectors message.

lib/collection/src/shards/remote_shard.rs (1)

44-46: LGTM: imports updated for conditional upsert conversion

The new conversion is cleanly integrated alongside existing helpers.

lib/collection/src/shards/shard_holder/resharding.rs (2)

11-14: LGTM: imports for new conditional insert types

Imports are minimal and scoped.


522-557: Correct handling of conditional upsert in update-only mode

Destructuring ConditionalInsertOperationInternal and passing the condition into into_update_only(Some(condition)) is the right abstraction and preserves semantics during resharding.

lib/storage/src/rbac/ops_checks.rs (2)

454-456: RBAC parity for conditional upsert

Treating UpsertPointsConditional the same as UpsertPoints (whole-write access) is consistent and avoids privilege escalation via filters.


1186-1203: Test adds coverage for conditional upsert RBAC

Good addition to assert whole-write access for the new discriminant.

src/common/update.rs (1)

350-355: Destructure UpdateVectors incl. update_filter: LGTM

Clean extraction of update_filter alongside points/shard_key.

src/tonic/api/update_common.rs (3)

54-55: gRPC Upsert: new update_filter plumbed through: LGTM

Field extraction looks correct.


140-141: gRPC UpdateVectors: new update_filter plumbed through: LGTM

Extraction aligns with proto.


426-430: No action needed: Filter TryFrom error type converts to tonic::Status
Verified that in lib/api/src/grpc/conversions.rs the impl TryFrom<Filter> for segment::types::Filter defines type Error = Status, ensuring any conversion error is already a tonic::Status. The upsert pass-through at lines 426–430 (and similarly at 439–440) correctly propagates update_filter, so no changes are required here.

lib/collection/src/shards/conversions.rs (2)

83-86: Explicitly setting update_filter: None is good

Clear and future-proof in case defaults change on the proto side. LGTM.


177-181: Vector update: all grpc::Filter::from usages are intentional

A repo-wide search shows all occurrences of grpc::Filter::from live in either:

  • Auto‐generated GRPC/REST schema code
  • RBAC/operation checks
  • The two conversion sites in lib/collection/src/shards/conversions.rs (lines ~116 & ~200)

Those two spots correctly map the internal filter into the outgoing proto. No other grpc::Filter::from calls should be changed.

lib/collection/src/operations/point_ops.rs (3)

571-583: Propagating update_filter into UpdateVectorsOp is correct

Nice job threading the optional filter through the vector updates for both batch and list paths.

Also applies to: 669-673


741-770: Sharding support for ConditionalInsertOperationInternal looks solid

Condition is cloned per shard; points are split correctly. LGTM.


787-798: Enum wiring and trait impls updated comprehensively

  • New UpsertPointsConditional variant added and handled in is_write_operation/point_ids/retain_point_ids.
  • split_by_shard routing added.

All consistent. LGTM.

Also applies to: 801-835, 1017-1039

lib/collection/src/collection_manager/segments_updater.rs (2)

105-128: Vector updates: conditionally gate updates by excluding non-matching IDs

This is a clean approach: compute excluded IDs and reuse existing strict update_vectors (which enforces all-processed checks). Behavior when all are excluded returns 0 without error. LGTM.


654-660: Operation routing updated to use conditional paths

New UpsertPointsConditional and conditional vector updates are correctly wired into processing. LGTM.

Also applies to: 685-687


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8206510 and 41110c2.

📒 Files selected for processing (1)
  • lib/collection/src/operations/verification/update.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: e2e-tests
  • GitHub Check: lint
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
🔇 Additional comments (1)
lib/collection/src/operations/verification/update.rs (1)

2-3: Imports align with new conditional update support

Adding PointsBatch and PointsList is appropriate given the new pattern matches below. No issues.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41110c2 and 014d351.

📒 Files selected for processing (3)
  • lib/api/build.rs (1 hunks)
  • lib/api/src/grpc/qdrant.rs (4 hunks)
  • lib/collection/src/shards/conversions.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • lib/api/src/grpc/qdrant.rs
  • lib/collection/src/shards/conversions.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."
📚 Learning: 2025-08-10T18:26:12.421Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

  • lib/api/build.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: e2e-tests
  • GitHub Check: test-consistency
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint

@timvisee
Member

timvisee commented Aug 11, 2025

How about naming it payload_condition, meaning it'll only update points if the payload condition is met? It refers to this being a 'conditional update'.

I understand that update_filter tries to be similar to other places we support filtering. But I don't think we should consider them to be the same. In this case, the filter isn't our primary point selector.

I find update_filter a little bit confusing. It makes me think it is the primary point selector.

Thoughts?

Edit: or update_condition, because this only affects present points.

(I've yet to review the PR)

@generall
Member Author

The condition can apply not only to the payload, but also to the point id or has_vector.

I would also be ok with 'update_condition'.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🔭 Outside diff range comments (2)
lib/api/src/grpc/qdrant.rs (2)

6021-6035: Batched upsert: add explicit per-id semantics to update_filter doc

Good call adding Validate + nested. The doc mirrors the ambiguity from the non-batch variant. Clarify behavior for existing-but-non-matching points.

Regenerate with this proto doc:

-        /// If specified, only points that match this filter will be updated, others will be inserted
+        /// Conditional upsert per point id:
+        /// - If an existing point with the same id matches this filter, it will be updated.
+        /// - If no point with the id exists, it will be inserted.
+        /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+        /// An empty filter matches all points.

6088-6102: Batched vector update: specify non-matching and non-existing behavior

Validation looks good. The doc should explicitly state that non-matching existing points are untouched and non-existing points are not inserted.

Update proto comment so the generated docs read:

-        /// If specified, only points that match this filter will be updated
+        /// Conditional vector update. For each provided point:
+        /// - If an existing point with the same id matches this filter, its vectors will be updated.
+        /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+        /// - Points that do not exist will not be inserted.
+        /// An empty filter matches all points.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 014d351 and 2c18268.

📒 Files selected for processing (2)
  • lib/api/build.rs (2 hunks)
  • lib/api/src/grpc/qdrant.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/api/build.rs
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.
📚 Learning: 2025-08-10T18:26:12.421Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: docs/redoc/master/openapi.json:13645-13652
Timestamp: 2025-08-10T18:26:12.421Z
Learning: In qdrant, docs/redoc/master/openapi.json is auto-generated from the REST schemas. To change field docs, edit lib/api/src/rest/schema.rs (e.g., add doc comments or #[schemars(description = ...)]). Specifically, UpdateVectors.update_filter lacked a description and should state: "If specified, only update vectors for points that match this filter; points not matching the filter are left unchanged."

Applied to files:

  • lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:30:02.965Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/operations/verification/update.rs:158-174
Timestamp: 2025-08-10T18:30:02.965Z
Learning: In Qdrant's strict mode verification code (lib/collection/src/operations/verification/update.rs), exhaustive pattern matching without `..` is intentionally used for structs like PointsBatch and PointsList. This design pattern ensures compilation fails when new fields are added, forcing developers to explicitly consider how new fields should be handled in the indexed_filter_write method. This provides visibility and compile-time safety for struct evolution.

Applied to files:

  • lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:22:33.034Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: lib/collection/src/shards/remote_shard.rs:258-268
Timestamp: 2025-08-10T18:22:33.034Z
Learning: In Qdrant, the conditional upsert feature with `update_filter` is designed to be rolled out only after a full cluster upgrade to the next minor release. Version compatibility guards in the code are not required because the feature deployment strategy ensures all nodes will support the feature before it's used, and it doesn't change existing functionality for users not using the conditional updates.

Applied to files:

  • lib/api/src/grpc/qdrant.rs
📚 Learning: 2025-08-10T18:20:42.203Z
Learnt from: generall
PR: qdrant/qdrant#7006
File: src/common/update.rs:272-281
Timestamp: 2025-08-10T18:20:42.203Z
Learning: In Qdrant, conditional updates with `update_filter` always apply the condition check to a small, pre-determined set of point IDs (those being updated) and never perform a full collection scan. Therefore, strict mode validation for unindexed filtering is not necessary for conditional update operations, as the performance concerns that strict mode addresses don't apply to these targeted operations.

Applied to files:

  • lib/api/src/grpc/qdrant.rs
🧬 Code Graph Analysis (1)
lib/api/src/grpc/qdrant.rs (2)
lib/api/src/rest/schema.rs (3)
  • validate (49-58)
  • validate (131-140)
  • validate (1248-1253)
lib/api/src/grpc/validate.rs (8)
  • validate (14-14)
  • validate (19-21)
  • validate (29-31)
  • validate (39-41)
  • validate (49-54)
  • validate (58-64)
  • validate (68-74)
  • validate (78-85)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: e2e-tests
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: lint
  • GitHub Check: storage-compat-test

Comment on lines +4367 to +4372
/// If specified, only points that match this filter will be updated, others will be inserted
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
Contributor

💡 Verification agent

❓ Verification inconclusive

Clarify conditional upsert semantics; current doc is ambiguous

Doc says “only points that match this filter will be updated, others will be inserted,” which is misleading for existing-but-non-matching points. Explicitly document per-id behavior.

Apply this change (in the proto so it regenerates here):

-    /// If specified, only points that match this filter will be updated, others will be inserted
+    /// Conditional upsert per point id:
+    /// - If an existing point with the same id matches this filter, it will be updated.
+    /// - If no point with the id exists, it will be inserted.
+    /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+    /// An empty filter matches all points.

Note: Validation is correctly wired via #[validate(nested)] — no change needed there.

To confirm the rename is complete and no stale names remain, you can run:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Ensure we don't have legacy names lingering
rg -n 'update_if|updateIf'

Length of output: 48


Enhance conditional upsert documentation for all update_filter fields

The nested validation is correctly applied via #[validate(nested)], but the doc comments should clearly spell out per‐ID semantics to avoid ambiguity. Please update the update_filter descriptions in your .proto definitions (so they propagate into lib/api/src/grpc/qdrant.rs) as follows:

• Affected locations

  • UpsertPoints.update_filter
  • UpdatePointVectors.update_filter
  • PointsUpdateOperation.PointStructList.update_filter
  • PointsUpdateOperation.UpdateVectors.update_filter

• Proposed diff for each update_filter field:

-    /// If specified, only points that match this filter will be updated, others will be inserted
+    /// Conditional upsert per point id:
+    /// - If an existing point with the same id matches this filter, it will be updated.
+    /// - If no point with that id exists, it will be inserted.
+    /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+    /// An empty filter matches all points.

• Stale‐name check confirmed:

rg -n 'update_if|updateIf'

→ no matches found

🤖 Prompt for AI Agents
In lib/api/src/grpc/qdrant.rs around lines 4367 to 4370, the generated doc
comment for the update_filter field is ambiguous about per-ID semantics; update
the source .proto definitions for the following
fields—UpsertPoints.update_filter, UpdatePointVectors.update_filter,
PointsUpdateOperation.PointStructList.update_filter, and
PointsUpdateOperation.UpdateVectors.update_filter—to include a clear description
that the filter is applied per-point ID on update (i.e., when an incoming point
matches an existing point ID the filter decides whether that specific point is
updated; if it does not match the filter the point will be inserted), keep the
nested validation attribute, and regenerate the Rust bindings so the clarified
doc comments propagate into lib/api/src/grpc/qdrant.rs.
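The per-id semantics requested above can be captured in a tiny decision table. `Outcome` and `upsert_outcome` are illustrative names only; for the update-vectors variant, the missing-id row becomes a no-op instead of an insert:

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Inserted,
    Updated,
    Skipped,
}

/// Per-id conditional upsert, as the clarified doc spells out:
/// missing id -> insert; existing + matching -> update;
/// existing + non-matching -> left unchanged (skipped).
fn upsert_outcome(exists: bool, matches_filter: bool) -> Outcome {
    match (exists, matches_filter) {
        (false, _) => Outcome::Inserted,
        (true, true) => Outcome::Updated,
        (true, false) => Outcome::Skipped,
    }
}

fn main() {
    assert_eq!(upsert_outcome(false, false), Outcome::Inserted);
    assert_eq!(upsert_outcome(true, true), Outcome::Updated);
    assert_eq!(upsert_outcome(true, false), Outcome::Skipped);
    println!("ok");
}
```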

Comment on lines +4453 to +4458
/// If specified, only points that match this filter will be updated
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
Contributor

🛠️ Refactor suggestion

Make vector-update behavior explicit for non-matching and non-existing points

The brief doc doesn’t state what happens if a point doesn’t match the filter or doesn’t exist. Please make it explicit to avoid misuse.

Update the source proto comment so it regenerates here:

-    /// If specified, only points that match this filter will be updated
+    /// Conditional vector update. For each provided point:
+    /// - If an existing point with the same id matches this filter, its vectors will be updated.
+    /// - If an existing point with the same id does not match this filter, it will be left unchanged.
+    /// - Points that do not exist will not be inserted.
+    /// An empty filter matches all points.

Validation via #[validate(nested)] is correct.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// If specified, only points that match this filter will be updated
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
/// Conditional vector update. For each provided point:
/// - If an existing point with the same id matches this filter, its vectors will be updated.
/// - If an existing point with the same id does not match this filter, it will be left unchanged.
/// - Points that do not exist will not be inserted.
/// An empty filter matches all points.
#[prost(message, optional, tag = "6")]
#[validate(nested)]
pub update_filter: ::core::option::Option<Filter>,
🤖 Prompt for AI Agents
In lib/api/src/grpc/qdrant.rs around lines 4453 to 4456, the protobuf field
comment for update_filter is ambiguous about behavior when a point does not
match the filter or when a point id does not exist; update the source proto
comment to explicitly state the semantics (e.g., "If a point does not match the
filter, it will not be updated; non-existing point IDs are silently ignored /
result in no-op" — pick the actual intended behavior), keep the
#[validate(nested)] attribute, and then regenerate the Rust code from the
updated proto so the clarified comment appears here.

Member

@timvisee timvisee left a comment

Nice! Did quite a bit of testing, everything looks good.

Only have some minor style remarks, no logic bugs:

coderabbitai[bot]

This comment was marked as resolved.

@timvisee timvisee added the do not merge Pull requests blocked on an external event label Aug 11, 2025
Comment on lines +120 to +123
let point_ids: Vec<_> = points.iter().map(|point| point.id).collect();

let points_to_exclude =
select_excluded_by_filter_ids(segments, point_ids, filter_condition, hw_counter)?;
Contributor

@coszio coszio Aug 14, 2025

This means that there is no feedback to the user in case some update was rejected. IMO, one of the main benefits of this feature is to let the user choose what to do when there is a collision.

For instance, in If-Match http header, a 412 Precondition Failed error is the suggested error.

Without this error, if a user would want to verify that the request was actually applied, assuming wait=True, they would need to poll for the points and verify the condition themselves. And even in this scenario, it would be difficult to implement something like a counter whose update condition depends on the counter's own value.

The problem with returning an error is that a batch update would need to be rejected entirely if the condition is not met (even partially). So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.

Member

@timvisee timvisee Aug 15, 2025

I agree it would be nice to have some form of feedback. And I like the idea of HTTP 412 (well, for certain use cases).

However, we currently have no way to reliably check the filter as precondition. We don't have a transactional system to make this happen, and we should not lock all segments during a check-and-apply phase.

The problem with returning an error is that a batch update would need to be rejected entirely if the condition is not met (even partially). So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.

Yes. This is already a problem with an upsert that contains just two points.

We could implement some sort of feedback for wait=true requests, where it would return the point IDs that were updated. But that remains quite lacking.

Member Author

So it would change the semantic to: "perform this update if all of the points fulfill this condition", which I think is fine.

This breaks the use-case. IMO points should be partially updated.
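Even without a 412-style error, the counter use-case from this thread can be built client-side on top of partial conditional updates: read the current version, send an upsert gated on that version, then re-read to learn whether the write landed. A toy in-memory Rust simulation of that compare-and-set loop (no real client calls; `conditional_upsert` only mimics the server-side semantics):

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Payload {
    version: u64,
    value: &'static str,
}

/// Toy stand-in for an upsert gated on `version == expected`.
/// The real API returns no per-point feedback, so a client must re-read instead.
fn conditional_upsert(
    store: &mut HashMap<u64, Payload>,
    id: u64,
    new: Payload,
    expected_version: u64,
) {
    match store.get(&id) {
        None => {
            store.insert(id, new); // missing point: plain insert
        }
        Some(p) if p.version == expected_version => {
            store.insert(id, new); // condition met: update
        }
        Some(_) => {} // condition not met: point left unchanged, silently
    }
}

/// Compare-and-set style increment built on top: read, conditional write, re-read.
fn cas_increment(store: &mut HashMap<u64, Payload>, id: u64, retries: usize) -> Option<u64> {
    for _ in 0..retries {
        let seen = store.get(&id)?.version;
        conditional_upsert(store, id, Payload { version: seen + 1, value: "new" }, seen);
        let now = store.get(&id)?.version; // re-read to learn whether the write landed
        if now == seen + 1 {
            return Some(now);
        }
    }
    None
}

fn main() {
    let mut store = HashMap::new();
    store.insert(6, Payload { version: 3, value: "old" });
    println!("{:?}", cas_increment(&mut store, 6, 5)); // Some(4)
}
```

The retry loop is only needed under contention; with wait=true and a single writer, one round trip plus one re-read suffices.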

coderabbitai[bot]

This comment was marked as resolved.

@generall generall merged commit c188f5b into dev Aug 15, 2025
16 checks passed
@generall generall deleted the update-if branch August 15, 2025 13:47
ffuugoo added a commit that referenced this pull request Aug 18, 2025
timvisee added a commit that referenced this pull request Aug 26, 2025
* implement condition parameter for upsert operation

* fmt

* update api schema

* implement conditional update for update-vectors op

* rename to update_filter for consistency

* add tests

* explicitly ignore strict mode for conditional updates

* rabbit review fix

* Also assert vector element before, confirm we don't normalize

* Simplify filter creation using new helper

* Rename merge_with_ids to with_point_ids

---------

Co-authored-by: timvisee <tim@visee.me>
Co-authored-by: Tim Visée <tim+github@visee.me>
@timvisee timvisee removed the do not merge Pull requests blocked on an external event label Nov 14, 2025
@timvisee timvisee mentioned this pull request Nov 14, 2025