Skip to content

[MMR] expose as query variant in rest and grpc#6797

Merged
coszio merged 14 commits intodevfrom
expose-mmr
Jul 11, 2025
Merged

[MMR] expose as query variant in rest and grpc#6797
coszio merged 14 commits intodevfrom
expose-mmr

Conversation

@coszio
Copy link
Contributor

@coszio coszio commented Jul 2, 2025

Builds on top of #6786

Exposes MMR reranking as a query variant in REST and gRPC interfaces:

Looks like this:

POST /collections/{collection_name}/points/query
{
  "query": { 
    "nearest": "<vector input>", // any vector input (inference, id, raw vector)
    // ⬇️ new optional parameter! 
    "mmr": {
      "diversity": 0.7 // coefficient 0-1
      "candidates_limit": 50 // limit for nearest neighbors search before reranking
    } 
  },
  "limit": 10,
}

This PR also includes a very needed refactor to the way we convert from ShardQueryRequest to PlannedQuery. This was done here since I was already fixing a bug introduced during latest MMR interface refactoring 🐛 🔨 .

@coszio coszio added this to the MMR milestone Jul 3, 2025
@coszio coszio force-pushed the mmr-in-collection branch from 941c1fa to 1c2145a Compare July 4, 2025 21:31
@coszio coszio force-pushed the mmr-in-collection branch from 988089e to 94248b1 Compare July 7, 2025 20:53
Base automatically changed from mmr-in-collection to dev July 7, 2025 21:18
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 8, 2025

📝 Walkthrough
## Walkthrough

This change introduces support for Maximal Marginal Relevance (MMR) re-ranking in nearest neighbor search queries across the API, protobuf definitions, validation, REST and gRPC schemas, and internal query handling. New message types and fields for MMR are added to the API schemas and protobufs, including optional parameters for diversity (trade-off between relevance and diversity) and candidates_limit. The query planning and collection logic are updated to handle the new MMR-based search variant, including rescoring and result filtering with score thresholding integrated into the MMR algorithm. Documentation and comments are updated to use "Maximal" instead of "Maximum" Marginal Relevance. No existing fields or message definitions are removed; the changes are additive.

## Possibly related issues

- qdrant/qdrant-client#1017: This issue requests MMR support in the qdrant-client library; while this PR focuses on server-side MMR integration, it addresses complementary objectives related to MMR query handling.

## Possibly related PRs

- qdrant/qdrant#6786: Adds internal support for MMR scoring at the collection level with new internal protobuf messages and query handling, complementing this PR's API and query structure changes for MMR.
- qdrant/qdrant#6768: Implements the internal MMR algorithm and rescoring logic at the collection and shard levels, directly related to this PR’s API and query structure additions for exposing MMR functionality.

## Suggested reviewers

- timvisee
- generall

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d2050db and 03a4ed3.

📒 Files selected for processing (13)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/build.rs (1 hunks)
  • lib/api/src/grpc/proto/points_internal_service.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (4 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/collection/mmr.rs (16 hunks)
  • lib/collection/src/collection/query.rs (4 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (10 hunks)
  • lib/collection/src/operations/universal_query/planned_query.rs (4 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (7 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (3 hunks)
  • tests/openapi/test_query.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • lib/api/src/grpc/proto/points_internal_service.proto
  • lib/api/build.rs
  • src/common/inference/query_requests_grpc.rs
  • tests/openapi/test_query.py
  • lib/api/src/rest/schema.rs
  • src/common/inference/query_requests_rest.rs
  • lib/api/src/grpc/qdrant.rs
  • lib/collection/src/collection/query.rs
  • lib/collection/src/operations/universal_query/shard_query.rs
  • lib/collection/src/collection/mmr.rs
  • docs/redoc/master/openapi.json
  • lib/collection/src/operations/universal_query/collection_query.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-low-resources
  • GitHub Check: lint
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: storage-compat-test
🔇 Additional comments (7)
lib/collection/src/operations/universal_query/planned_query.rs (7)

89-93: Good improvement: Early validation of prefetch depth.

Moving the depth check to the beginning of the method provides early validation and prevents unnecessary processing for invalid requests.


110-119: Correct MMR vector handling.

The with_vector adjustment for MMR queries correctly merges the existing with_vector setting with the vector used in the MMR query, ensuring the required vectors are available for the MMR algorithm.


121-142: Excellent refactoring: Separation of concerns.

The split into root_plan_without_prefetches and root_plan_with_prefetches methods improves code organization and readability by clearly separating the logic for different query types.


149-181: Clean implementation of non-prefetch query handling.

The root_plan_without_prefetches method properly handles simple queries by creating a single source without rescore parameters, which is appropriate for queries that don't require merging multiple sources.


203-218: Correct MMR conversion to nearest neighbor rescoring.

The MMR handling correctly converts the MMR query to a nearest neighbor rescoring query with the candidates_limit, which aligns with the MMR algorithm's two-stage process: first retrieving candidates, then applying diversity reranking.


306-416: Well-structured centralization of leaf source creation.

The leaf_source_from_scoring_query function effectively centralizes the logic for creating leaf sources from different ScoringQuery variants. The function handles all query types appropriately:

  • Vector queries create CoreSearchRequest entries
  • MMR queries convert to nearest neighbor searches with candidates_limit
  • OrderBy and Sample queries create scroll requests
  • Fusion and Formula queries correctly return errors when used without prefetches

375-398: Consistent MMR leaf source implementation.

The MMR handling in leaf_source_from_scoring_query correctly creates a nearest neighbor search with the candidates_limit, maintaining consistency with the MMR conversion logic in the prefetch handling path.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (12)
lib/collection/src/collection/mmr.rs (1)

237-249: Doc/function naming mismatch – consider aligning terminology

Great to see the wording updated to “Maximal Marginal Relevance”.
However, the public function is still named maximum_marginal_relevance, which now contradicts the docs and the rest of the project (all proto/REST updates use maximal). Renaming the fn (plus its callers & tests) will avoid future confusion.

-pub fn maximum_marginal_relevance(
+pub fn maximal_marginal_relevance(

If the rename is too noisy for this PR, at least add a #[deprecated(note = "use maximal_marginal_relevance")] wrapper and introduce the correctly-named alias.

src/common/inference/batch_processing.rs (1)

316-320: Unnecessary field initialisation

NearestQuery::mmr defaults to None, so the explicit mmr: None is redundant in this test construction.
Feel free to keep it for clarity though.

tools/generate_grpc_docs.sh (1)

3-10: Shell script portability & temp-dir location

  1. set -ex will dump every command into CI logs; if logs contain secrets (e.g., Docker creds) consider gating -x behind a verbose flag.
  2. mktemp -d ./qdrant_docs.XXXXXXXXXX creates the dir in the repo root. Parallel runs (e.g., CI matrix) could still collide. Using $TMPDIR is safer.
- TEMPD=$(mktemp -d ./qdrant_docs.XXXXXXXXXX)
+ TEMPD=$(mktemp -d "${TMPDIR:-/tmp}/qdrant_docs.XXXXXXXXXX")
src/common/inference/query_requests_rest.rs (1)

199-216: REST MMR conversion logic is correct but could be simplified.

The logic correctly branches between NearestWithMmr and Nearest variants based on the presence of the mmr field. However, the long module path on line 208 could be simplified.

Consider adding an import alias to simplify the code:

 use collection::operations::universal_query::collection_query::{
-    CollectionPrefetch, CollectionQueryGroupsRequest, CollectionQueryRequest, Mmr, Query,
+    CollectionPrefetch, CollectionQueryGroupsRequest, CollectionQueryRequest, Mmr, NearestWithMmr, Query,
     VectorInputInternal, VectorQuery,
 };

Then update line 208:

-                Ok(Query::Vector(VectorQuery::NearestWithMmr(
-                    collection::operations::universal_query::collection_query::NearestWithMmr {
-                        nearest: vector,
-                        mmr,
-                    },
-                )))
+                Ok(Query::Vector(VectorQuery::NearestWithMmr(NearestWithMmr {
+                    nearest: vector,
+                    mmr,
+                })))
docs/redoc/master/openapi.json (1)

14670-14677: Missing default & questionable nullable on lambda

lambda is documented as “default 0.5”, yet no default field is present.
Also, making the value nullable contradicts the statement that it “must be in the range [0, 1]”.

           "lambda": {
             ...
-            "nullable": true
+            "default": 0.5
           },

If null is truly acceptable, adjust the docstring to say so and keep the constraint logic consistent.

lib/api/src/grpc/proto/points.proto (1)

643-650: Clarify optionality & presence semantics of mmr field

The top-level comment says the MMR block is optional, but the field is declared as a plain Mmr mmr = 2; (no optional).
While message-typed fields are implicitly optional in proto3, readers may read the comment and look for the optional keyword.
Consider adding the keyword for consistency with the scalar fields directly below, e.g.

-    Mmr mmr = 2;
+    optional Mmr mmr = 2;

This keeps the public schema self-documenting and avoids the “is it required?” question.

lib/api/src/grpc/qdrant.rs (1)

5597-5599: Fix inconsistent comment formatting.

The comment formatting uses /// / instead of the standard /// format used elsewhere in the codebase.

-    /// / The lambda parameter for the MMR algorithm.
-    /// / Determines the balance between diversity and relevance.
-    /// /
-    /// / A higher value favors relevance (similarity to the query vector), while a lower value favors diversity.
+    /// The lambda parameter for the MMR algorithm.
+    /// Determines the balance between diversity and relevance.
+    ///
+    /// A higher value favors relevance (similarity to the query vector), while a lower value favors diversity.
docs/grpc/docs.md (4)

194-202: Fix unordered-list indentation in TOC

The two newly-added lines are indented with four spaces, whereas the surrounding list items use two. This is what triggers the markdown-lint MD007 warnings.

-    - [Mmr](#qdrant-Mmr)
-    - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)
+  - [Mmr](#qdrant-Mmr)
+  - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)

3390-3401: Table row is split → broken pipes & linter error (MD055)

The candidate_limit row is rendered as two separate lines, so the table loses its trailing pipe and breaks:

| candidate_limit | [uint32](#uint32) | optional | The maximum number of candidates to consider for re-ranking.
If not specified, the `limit` value is used. |

Join the description onto one line (or insert <br>), then keep the closing |:

-| candidate_limit | [uint32](#uint32) | optional | The maximum number of candidates to consider for re-ranking.
-If not specified, the `limit` value is used. |
+| candidate_limit | [uint32](#uint32) | optional | The maximum number of candidates to consider for re-ranking.<br/>If not specified, the `limit` value is used. |

3395-3399: Remove stray “/” characters in the lambda description

The slashes render literally in Markdown and confuse the sentence:

-...parameter for the MMR algorithm. / Determines the balance between diversity and relevance. / / A higher value...
+...parameter for the MMR algorithm. Determines the balance between diversity and relevance. A higher value...

3514-3524: Minor wording tweak for NearestInputWithMmr description

Consider “re-rank using the same vector” → “re-rank using that vector” to avoid ambiguity and keep wording tight.

No functional issues; purely editorial.

lib/collection/src/operations/universal_query/collection_query.rs (1)

129-139: Consider using copied() instead of map(|x| **x) for clarity.

The implementation is correct, but you can make it slightly more idiomatic.

 pub fn get_referenced_ids(&self) -> Vec<PointIdType> {
     match self {
         Self::Vector(vector_query) => vector_query
             .get_referenced_ids()
             .iter()
-            .map(|x| **x)
+            .copied()
             .collect(),
         Self::Fusion(_) | Self::OrderBy(_) | Self::Formula(_) | Self::Sample(_) => Vec::new(),
     }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 137d71a and 3a256e9.

📒 Files selected for processing (22)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/build.rs (1 hunks)
  • lib/api/src/grpc/proto/points.proto (2 hunks)
  • lib/api/src/grpc/proto/points_internal_service.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (3 hunks)
  • lib/api/src/grpc/validate.rs (1 hunks)
  • lib/api/src/rest/conversions.rs (1 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/collection/mmr.rs (1 hunks)
  • lib/collection/src/collection/query.rs (1 hunks)
  • lib/collection/src/common/fetch_vectors.rs (2 hunks)
  • lib/collection/src/common/retrieve_request_trait.rs (1 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (10 hunks)
  • lib/collection/src/operations/universal_query/planned_query.rs (4 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (3 hunks)
  • lib/collection/src/shards/local_shard/query.rs (2 hunks)
  • src/common/inference/batch_processing.rs (1 hunks)
  • src/common/inference/batch_processing_grpc.rs (1 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (3 hunks)
  • tools/generate_grpc_docs.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/api/src/grpc/validate.rs (2)
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/index/hnsw_index/point_scorer.rs:116-121
Timestamp: 2025-04-22T23:17:41.734Z
Learning: The method `is_none_or` exists in the Qdrant codebase and compiles correctly, despite not being part of standard Rust's Option type. Code reviews should verify compilation issues before reporting them.
🧬 Code Graph Analysis (3)
lib/collection/src/common/retrieve_request_trait.rs (1)
lib/collection/src/operations/universal_query/collection_query.rs (2)
  • get_lookup_collection (426-428)
  • get_lookup_collection (535-537)
src/common/inference/query_requests_grpc.rs (1)
src/common/inference/query_requests_rest.rs (1)
  • convert_vector_input_with_inferred (148-191)
lib/collection/src/operations/universal_query/collection_query.rs (1)
lib/collection/src/common/retrieve_request_trait.rs (1)
  • get_referenced_ids (135-139)
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

194-194: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


195-195: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


196-196: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


197-197: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


198-198: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


199-199: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


200-200: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


201-201: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


202-202: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


3398-3398: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: lint
🔇 Additional comments (38)
lib/api/src/grpc/proto/points_internal_service.proto (1)

275-283: 👍 Terminology fix is consistent with API docs
No further action required.

lib/api/src/rest/conversions.rs (1)

146-150: Loss of future MMR data

QueryInterface::Nearest can’t currently carry an MMR payload, so forcing mmr: None is fine.
If a NearestWithMmr variant is added later, this arm will silently drop the data. Consider a separate enum variant or:

QueryInterface::Nearest(nearest @ VectorInput { .., mmr }) => {
    Query::Nearest(NearestQuery { nearest, mmr })
}

Not blocking, just a heads-up for forward compatibility.

lib/api/src/grpc/validate.rs (1)

342-342: LGTM! Validation pattern is consistent.

The new match arm for NearestWithMmr follows the established validation pattern correctly.

src/common/inference/batch_processing_grpc.rs (1)

165-171: LGTM! Proper handling of optional vector input.

The implementation correctly handles the new NearestWithMmr variant by extracting the optional nearest field and processing it through the existing vector input collection logic. The use of transpose()? properly handles the error propagation.

lib/collection/src/common/retrieve_request_trait.rs (1)

108-115: LGTM! Refactoring improves efficiency.

The simplification to directly clone self.referenced_ids instead of reconstructing them from the vector query is a good optimization. This suggests the referenced IDs are now pre-computed and stored, which is more efficient.

lib/collection/src/shards/local_shard/query.rs (2)

427-427: LGTM! Correct terminology used.

The change from "Maximum" to "Maximal" Marginal Relevance is correct terminology.


471-472: Confirm intentional change: MMR results use retain like in collection/query.rs

The change in lib/collection/src/shards/local_shard/query.rs now mirrors the pattern in lib/collection/src/collection/query.rs, where:

  • Fused results (which are already sorted) use .take_while(|p| p.score >= score_threshold)
  • MMR results use .retain(|p| p.score >= score_threshold)

This ensures we only assume sorted input for the fused branch and apply a full‐scan filter for MMR rescoring. No further action needed.

lib/api/build.rs (1)

293-296: LGTM! Appropriate validation constraints for MMR parameters.

The validation rules are well-designed:

  • lambda range [0.0, 1.0] is correct for MMR's relevance/diversity trade-off parameter
  • candidate_limit max of 16,384 provides reasonable resource protection
  • Empty constraints for complex fields correctly defer to nested validation
lib/collection/src/operations/universal_query/shard_query.rs (2)

69-69: Documentation terminology updated correctly.

Good change to use "Maximal" instead of "Maximum" for consistency with standard MMR terminology.


123-124: MMR intermediate-results logic – validated
Returning false for ScoringQuery::Mmr(_) in needs_intermediate_results() is correct. The plan in collection_query.rs (lines 387–392) first gathers the nearest‐neighbor lists across all shards, then performs a single mmr_rescore on the aggregated results. No intermediate‐results merging per shard is needed.

lib/api/src/rest/schema.rs (2)

478-484: MMR integration into NearestQuery looks good.

The optional mmr field maintains backward compatibility while enabling MMR reranking functionality. The documentation clearly explains the purpose.


537-556: Mmr struct validation is well-designed.

The validation constraints are appropriate:

  • lambda range [0,1] matches MMR algorithm requirements
  • candidate_limit maximum of 16,384 prevents expensive queries
  • Optional fields with defaults provide good UX

The documentation clearly explains the parameters' effects on relevance vs diversity trade-off.

src/common/inference/query_requests_grpc.rs (2)

8-9: Import additions are correct.

The new imports Mmr and NearestWithMmr are properly added to support the MMR functionality.


295-311: gRPC MMR conversion implementation is correct.

The implementation properly:

  • Validates required fields with clear error messages
  • Reuses existing vector conversion logic
  • Safely converts candidate_limit from u32 to usize
  • Creates the appropriate NearestWithMmr internal variant

The error handling is consistent with other variants in the same function.

src/common/inference/query_requests_rest.rs (2)

5-5: Import addition is correct.

Adding the Mmr import to support the new MMR functionality.


449-449: Test update is appropriate.

The test correctly includes the new mmr: None field to match the updated NearestQuery struct schema.

lib/collection/src/common/fetch_vectors.rs (3)

411-413: LGTM: Function signature simplified

The removal of lifetime annotations from the function signature is a good simplification that makes the function easier to work with by using owned data instead of references.


417-421: LGTM: Improved query handling generalization

The change from specifically checking for Query::Vector variant to using the generic Query::get_referenced_ids method is a good design improvement that:

  • Makes the code more extensible to support new query variants (like MMR)
  • Follows the principle of polymorphism by using a common interface
  • Reduces coupling between this function and specific query types

425-425: LGTM: Consistent with owned data pattern

The change to use owned referenced_ids in CollectionQueryResolveRequest is consistent with the function signature changes and aligns with the overall refactoring to use owned data instead of references.

lib/collection/src/collection/query.rs (1)

408-412: LGTM: Efficient in-place filtering optimization

The refactoring from conditional vector reconstruction to using retain is a good performance optimization that:

  • Avoids unnecessary memory allocations by filtering in-place
  • Maintains the same filtering semantics (removing points below score threshold)
  • Follows idiomatic Rust patterns for efficient vector filtering
  • Improves code readability by eliminating conditional logic

This change aligns well with the overall MMR integration while optimizing performance.

lib/api/src/grpc/proto/points.proto (2)

652-664: Fill the missing tag #1 and reserve it to avoid future collisions

The Mmr message jumps straight to tags 2 and 3, leaving tag 1 unused and un-reserved:

optional float  lambda          = 2;
optional uint32 candidate_limit = 3;

Even though Protobuf allows gaps, forgetting to reserve the gap opens a foot-gun: someone can later reuse =1, changing the wire format.
Add an explicit reservation (or re-number before the message ships):

 message Mmr {
-    optional float  lambda          = 2;
-    optional uint32 candidate_limit = 3;
+    // tag 1 intentionally unused – reserved for backward compatibility
+    reserved 1;
+
+    optional float  lambda          = 2; // 0.0–1.0
+    optional uint32 candidate_limit = 3; // defaults to `limit`
 }

Would you like a follow-up patch across all generated .rs code to reflect the new reservation?


676-677: Confirm back-compatibility of new Query.variant value = 9

Adding nearest_with_mmr = 9 looks good (no collision).
Please double-check that every Protobuf consumer (REST ↔ gRPC bridges, SDKs, etc.) handles the unknown variant gracefully to avoid silent failures when older clients parse the enum.

lib/api/src/grpc/qdrant.rs (5)

5580-5590: LGTM! Well-structured MMR input definition.

The NearestInputWithMmr struct is properly designed with appropriate validation annotations and clear documentation. The combination of nearest vector input with optional MMR parameters provides the expected flexibility for the feature.


5601-5609: LGTM! Appropriate validation constraints for MMR parameters.

The validation ranges are well-chosen:

  • Lambda range [0.0, 1.0] correctly constrains the relevance-diversity trade-off parameter
  • Candidate limit max of 16,384 provides reasonable upper bounds while preventing excessive resource usage

5650-5652: LGTM! Proper variant addition with consistent naming.

The new NearestWithMmr variant is correctly added with:

  • Proper protobuf tag (9)
  • Consistent naming convention
  • Correct reference to the NearestInputWithMmr struct
  • Accurate documentation using "Maximal Marginal Relevance"

10162-10162: LGTM! Consistent terminology update.

The comment correctly uses "Maximal Marginal Relevance" instead of "Maximum Marginal Relevance", aligning with the PR's objective to standardize the terminology.


5616-5616: Protobuf tag consistency confirmed for NearestWithMmr

The #[prost(oneof…, tags = "1, 2, 3, 4, 5, 6, 7, 8, 9")] list correctly includes tag 9, and the NearestWithMmr variant is defined with #[prost(message, tag = "9")]. No changes are needed.

docs/grpc/docs.md (1)

4004-4005: Good addition – new query variant clearly documented

The new field is correctly marked as optional and its purpose is clear. 👍

lib/collection/src/operations/universal_query/collection_query.rs (5)

26-26: LGTM!

The default lambda value of 0.5 is a reasonable choice for MMR, providing a balanced trade-off between relevance and diversity.


61-63: Good refactoring for better ownership semantics.

Storing owned referenced_ids instead of query references improves memory management and avoids potential lifetime issues.


189-192: Good API design with optional fields.

Making lambda and candidate_limit optional allows for sensible defaults while giving users flexibility to override when needed.


387-392: Sensible default values for MMR parameters.

Using request_limit as the default candidate_limit ensures sufficient candidates for the final result, while the lambda default of 0.5 provides balanced relevance/diversity trade-off.


509-515: Clean refactoring using the new helper method.

Good use of get_referenced_ids() method, making the code more maintainable and consistent.

lib/collection/src/operations/universal_query/planned_query.rs (5)

89-94: Good practice: Early validation with clear error message.

Moving the depth check to the beginning follows the fail-fast principle and provides immediate feedback for invalid inputs.


110-119: Correct handling of with_vector for MMR queries.

The code properly merges the with_vector flag with MMR's using field to ensure vectors are included when needed.


121-147: Excellent refactoring for improved readability.

Splitting the logic into root_plan_without_prefetches and root_plan_with_prefetches makes the code much clearer and easier to maintain.


202-219: Correct MMR implementation pattern.

The code properly converts MMR queries to nearest neighbor searches for candidate selection, which aligns with the standard MMR algorithm flow. The comment helpfully explains that final MMR computation happens at the collection level.


306-416: Well-structured helper function with comprehensive query type handling.

The leaf_source_from_scoring_query function provides a clean abstraction for creating leaf sources from different query types. The MMR handling correctly creates a nearest neighbor search with the appropriate candidate limit.

Comment on lines +14653 to +14663
"mmr": {
"description": "Perform MMR (Maximal Marginal Relevance) reranking after search, using the same vector in this query to calculate relevance.",
"anyOf": [
{
"$ref": "#/components/schemas/Mmr"
},
{
"nullable": true
}
]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

mmr definition risks OpenAPI-schema validation errors

Using

"anyOf": [
  { "$ref": "#/components/schemas/Mmr" },
  { "nullable": true }
]

is non-standard:

  1. { "nullable": true } is not a valid schema on its own – it lacks a type.
  2. The typical pattern (OAS 3.0.x) is either
    mmr:
      $ref: '#/components/schemas/Mmr'
      nullable: true   # ← sibling of $ref
    or (OAS 3.1)
    oneOf:
      - $ref: '#/components/schemas/Mmr'
      - type: 'null'
  3. Several generators (e.g. openapi-generator, swagger-codegen) will fail on the current construct.

Please switch to one of the two canonical forms above, depending on the targeted OAS version.

🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 14653 to 14663, the "mmr" schema
uses an invalid "anyOf" with a nullable object lacking a type, which risks
OpenAPI validation errors. To fix this, replace the "anyOf" array with either a
direct $ref and a sibling "nullable: true" property if targeting OpenAPI 3.0.x,
or use "oneOf" with the $ref and a "type: 'null'" schema if targeting OpenAPI
3.1. This change aligns with standard OpenAPI schema patterns and ensures
compatibility with code generators.

Comment on lines +14678 to +14683
"candidate_limit": {
"description": "The maximum number of candidates to consider for re-ranking.\n\nIf not specified, the `limit` value is used.",
"type": "integer",
"format": "uint",
"maximum": 16384,
"minimum": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

format: "uint" is not part of the OpenAPI spec

OpenAPI officially recognises int32, int64, float, and double.
A custom uint format will confuse tooling and code generators.

-            "format": "uint",
+            "format": "int64",   # or omit `format` entirely

If unsigned semantics are important, document it in the description and enforce it with minimum: 0.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 14678 to 14683, the
"candidate_limit" property uses a non-standard format "uint" which is not
recognized by OpenAPI and may cause issues with tooling. Remove the "format":
"uint" line entirely, keep the "type": "integer" and "minimum": 0 to enforce
unsigned semantics, and update the description to clearly state that the value
must be non-negative.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/grpc/docs.md (2)

194-202: Fix unordered-list indentation for new TOC entries

The inserted Mmr and NearestInputWithMmr bullet points are indented 4 spaces instead of the 2-space level used by the rest of the list, tripping markdownlint (MD007).

-    - [Mmr](#qdrant-Mmr)
-    - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)
+  - [Mmr](#qdrant-Mmr)
+  - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)

3519-3528: Clarify optionality & semantics of mmr field

The prose states “Perform MMR …”, which reads like a mandatory action, yet the proto defines mmr as an optional message (so nearest_with_mmr without an mmr block would still compile). Suggest wording:

-| mmr | [Mmr](#qdrant-Mmr) |  | Perform MMR (Maximal Marginal Relevance) reranking after search, using the same vector in this query to calculate relevance. |
+| mmr | [Mmr](#qdrant-Mmr) | optional | If supplied, reranks the `nearest` results with Maximal Marginal Relevance using this query vector for relevance computation. |

This also aligns the description column with other optional fields.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b347365 and e67a1de.

📒 Files selected for processing (9)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/build.rs (1 hunks)
  • lib/api/src/grpc/proto/points.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (3 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (10 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • lib/api/build.rs
  • src/common/inference/query_requests_grpc.rs
  • src/common/inference/query_requests_rest.rs
  • lib/api/src/rest/schema.rs
  • lib/api/src/grpc/proto/points.proto
  • lib/api/src/grpc/qdrant.rs
  • lib/collection/src/operations/universal_query/collection_query.rs
  • docs/redoc/master/openapi.json
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

194-194: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


195-195: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


196-196: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


197-197: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


198-198: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


199-199: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


200-200: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


201-201: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


202-202: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


3397-3397: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests-consensus
  • GitHub Check: integration-tests
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
🔇 Additional comments (1)
docs/grpc/docs.md (1)

4008-4009: Verify mutual-exclusion logic among query variants

nearest_with_mmr co-exists alongside nearest, recommend, etc. At the API level these search variants appear to be mutually exclusive, but the proto does not enforce oneof. Ensure the backend rejects payloads that set both nearest and nearest_with_mmr, or—better—move the related fields into a oneof to obtain compile-time safety.

Please confirm the planner conversion added in the Rust code returns a clear validation error for such mixed queries.

Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Can you also add a simple integration test?

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/openapi/test_query.py (1)

485-544: Consider adding more comprehensive MMR test coverage

The current test validates basic MMR functionality but could benefit from additional test cases:

  1. Testing different diversity values (0.0, 1.0) to validate parameter boundaries
  2. Testing different candidate_limit values to ensure the parameter works correctly
  3. Testing error cases (invalid diversity values, negative candidate_limit)
  4. Testing with payload/vector retrieval to ensure MMR works with full response data

Consider adding additional test cases:

def test_mmr_parameters_validation(collection_name):
    # Test boundary values for diversity
    for diversity in [0.0, 1.0]:
        response = request_with_validation(
            api="/collections/{collection_name}/points/query",
            method="POST",
            path_params={"collection_name": collection_name},
            body={
                "query": {
                    "nearest": [0.35, 0.08, 0.11, 0.47],
                    "mmr": {"diversity": diversity}
                },
            },
        )
        assert response.ok, f"Diversity {diversity} should be valid"
    
    # Test different candidate_limit values
    for candidate_limit in [5, 20, 100]:
        response = request_with_validation(
            api="/collections/{collection_name}/points/query",
            method="POST",
            path_params={"collection_name": collection_name},
            body={
                "query": {
                    "nearest": [0.35, 0.08, 0.11, 0.47],
                    "mmr": {"candidate_limit": candidate_limit}
                },
            },
        )
        assert response.ok, f"Candidate limit {candidate_limit} should be valid"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e67a1de and 6ac7de5.

📒 Files selected for processing (4)
  • lib/collection/src/collection/mmr.rs (14 hunks)
  • lib/collection/src/collection/query.rs (2 hunks)
  • lib/collection/src/shards/local_shard/query.rs (2 hunks)
  • tests/openapi/test_query.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • lib/collection/src/collection/query.rs
  • lib/collection/src/shards/local_shard/query.rs
  • lib/collection/src/collection/mmr.rs
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/openapi/test_query.py (2)
tests/openapi/helpers/helpers.py (3)
  • reciprocal_rank_fusion (120-145)
  • distribution_based_score_fusion (147-174)
  • request_with_validation (41-116)
tests/openapi/test_query_full.py (1)
  • get_id (996-997)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: integration-tests-consensus
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
🔇 Additional comments (3)
tests/openapi/test_query.py (3)

397-397: LGTM: Whitespace cleanup

The whitespace changes improve code readability and consistency.

Also applies to: 409-409, 411-411, 426-426, 429-429, 435-435, 449-449, 461-461, 463-463, 478-478, 484-484


522-526: LGTM: Payload and vector validation

Good validation that ensures the response structure is correct when payload and vector data are not requested.


527-544: LGTM: Default parameters validation

Good test coverage for ensuring default MMR parameters work correctly and produce consistent results.

This was referenced Jul 10, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
docs/grpc/docs.md (1)

3390-3405: Table row for Mmr fields is split across multiple lines – breaks GitHub rendering
diversity and candidates_limit descriptions are wrapped onto new lines, so the parser thinks each continuation line is a brand-new table row → the whole table is malformed (MD055).

-| diversity | [float](#float) | optional | Tunable parameter for the MMR algorithm. Determines the balance between diversity and relevance.
-
-A higher value favors diversity (dissimilarity to selected results), while a lower value favors relevance (similarity to the query vector).
-
-Must be in the range [0, 1]. Default value is 0.5. |
-| candidates_limit | [uint32](#uint32) | optional | The maximum number of candidates to consider for re-ranking.
-
-If not specified, the `limit` value is used. |
+| diversity | [float](#float) | optional | Tunable parameter ∈ [0, 1] (default 0.5). Higher → diversity, lower → relevance. |
+| candidates_limit | [uint32](#uint32) | optional | Max neighbours considered before re-ranking (defaults to outer `limit`). |
🧹 Nitpick comments (2)
docs/grpc/docs.md (2)

194-202: Fix list-indentation to silence markdownlint (MD007)
The two newly-added bullet points are indented 4 spaces instead of the expected 2, tripping the linter.

-    - [Mmr](#qdrant-Mmr)
-    - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)
+  - [Mmr](#qdrant-Mmr)
+  - [NearestInputWithMmr](#qdrant-NearestInputWithMmr)

3524-3528: Mark mmr field as optional for consistency
In NearestInputWithMmr, mmr is in fact optional in the protobuf – document it as such so the table stays truthful.

-| mmr | [Mmr](#qdrant-Mmr) |  | Perform MMR (Maximal Marginal Relevance) reranking after search, using the same vector in this query to calculate relevance. |
+| mmr | [Mmr](#qdrant-Mmr) | optional | Perform MMR (Maximal Marginal Relevance) re-ranking after the nearest-neighbour search. |
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 450411d and d2050db.

📒 Files selected for processing (7)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/src/grpc/proto/points.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (3 hunks)
  • lib/api/src/rest/schema.rs (2 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • src/common/inference/query_requests_rest.rs
  • src/common/inference/query_requests_grpc.rs
  • lib/api/src/grpc/proto/points.proto
  • lib/api/src/grpc/qdrant.rs
  • lib/api/src/rest/schema.rs
  • docs/redoc/master/openapi.json
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

194-194: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


195-195: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


196-196: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


197-197: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


198-198: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


199-199: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


200-200: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


201-201: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


202-202: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


3397-3397: Table pipe style
Expected: leading_and_trailing; Actual: leading_only; Missing trailing pipe

(MD055, table-pipe-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consistency
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: lint
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: storage-compat-test
🔇 Additional comments (1)
docs/grpc/docs.md (1)

4008-4009: LGTM – new nearest_with_mmr query variant is documented clearly
Field is named consistently with the message definition and has a concise description. No action required.

@coszio coszio merged commit 6307510 into dev Jul 11, 2025
18 checks passed
@coszio coszio deleted the expose-mmr branch July 11, 2025 14:39
generall added a commit that referenced this pull request Jul 17, 2025
* expose mmr in rest and grpc

* fix referenced vectors in mmr queries

* rename to MmrInput in grpc for consistency

* mmr is now an optional parameter of nearest query

* fix score_threshold at local shard

* fix conversion of rest -> collection

* fix and refactor planned query

* nits

* rename `lambda` to `diversity` in interface

* handle score_threshold within mmr calculation

* add openapi test

* fix candidate limit at collection level

* candidate_limit -> candidates_limit

* finish rename to candidates_limit

---------

Co-authored-by: generall <andrey@vasnetsov.com>
@coderabbitai coderabbitai bot mentioned this pull request Jul 21, 2025
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Aug 18, 2025
9 tasks
@coderabbitai coderabbitai bot mentioned this pull request Sep 6, 2025
@coderabbitai coderabbitai bot mentioned this pull request Oct 3, 2025
9 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants