[MMR] resolve at collection level#6786

Merged
coszio merged 9 commits into dev from mmr-in-collection on Jul 7, 2025

Conversation


@coszio coszio commented Jul 1, 2025

Builds on top of #6768

  • Adds MMR as a scoring query in ShardQueryRequest and CollectionQueryRequest.
  • Plans the query to include the MMR vector if MMR is the top query, so that it gets resolved at the collection level without re-fetching
  • Handles MMR at collection level
    • MMR vector gets removed afterwards if it wasn't requested

MMR is still not exposed at this point.
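For context, Maximum Marginal Relevance re-ranks a candidate set by trading relevance against diversity with a lambda parameter. A minimal sketch of the greedy selection loop follows; all names are illustrative, not the PR's actual code:

```rust
/// Greedy MMR selection: repeatedly pick the candidate maximizing
/// `lambda * relevance - (1 - lambda) * max_similarity_to_selected`.
/// `sim` is any similarity function over vectors (e.g. a dot product).
fn mmr_select(
    candidates: &[(Vec<f32>, f32)], // (vector, relevance score)
    lambda: f32,
    limit: usize,
    sim: impl Fn(&[f32], &[f32]) -> f32,
) -> Vec<usize> {
    let mut selected: Vec<usize> = Vec::new();
    let mut remaining: Vec<usize> = (0..candidates.len()).collect();

    while selected.len() < limit && !remaining.is_empty() {
        // Score each remaining candidate against the already-selected set.
        let (pos, _) = remaining
            .iter()
            .enumerate()
            .map(|(pos, &i)| {
                let max_sim = selected
                    .iter()
                    .map(|&j| sim(&candidates[i].0, &candidates[j].0))
                    .fold(f32::NEG_INFINITY, f32::max);
                // With nothing selected yet, fall back to pure relevance.
                let diversity = if selected.is_empty() { 0.0 } else { max_sim };
                (pos, lambda * candidates[i].1 - (1.0 - lambda) * diversity)
            })
            .max_by(|a, b| a.1.total_cmp(&b.1))
            .unwrap();
        selected.push(remaining.swap_remove(pos));
    }
    selected
}
```

With lambda near 1.0 this degenerates to plain nearest-neighbor ordering; lower values penalize candidates similar to what was already picked.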

@coszio coszio requested review from generall and timvisee July 1, 2025 21:00
Base automatically changed from mmr to dev July 7, 2025 20:51
@coszio coszio force-pushed the mmr-in-collection branch from 988089e to 94248b1 Compare July 7, 2025 20:53

coderabbitai bot commented Jul 7, 2025

📝 Walkthrough

This change introduces comprehensive support for Maximum Marginal Relevance (MMR) scoring across the codebase. It adds a new MmrInternal message and related fields to the gRPC protocol, updates internal query representations to include MMR parameters, and extends query processing pipelines to handle MMR queries. The collection and shard query logic is updated to support asynchronous MMR result merging, rescoring, and candidate limit handling. New query variants and structures are introduced for MMR in the universal query modules, with corresponding updates to access checks and vector extraction utilities. Test cases are updated to include the new candidate limit parameter for MMR queries.

Possibly related PRs

  • [MMR] MMR internals #6768: Implements the core internal MMR algorithm and utilities, which the current PR integrates and builds upon to provide system-wide MMR support.

Suggested reviewers

  • timvisee
  • generall

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
lib/api/src/grpc/proto/points_internal_service.proto (1)

267-271: Consider widening candidate_limit to uint64 and document parameter boundaries
limit fields elsewhere in the proto (CoreSearchPoints.limit, QueryShardPoints.limit, …) are uint64. Using uint32 here silently caps the upper bound at 4,294,967,295 and introduces an inconsistent public contract. Unless there is a hard implementation cap, switch to uint64 for symmetry, or add an explicit comment clarifying the intentional 32-bit restriction.
Also, add a brief note that lambda must be in [0.0, 1.0] to avoid undefined behaviour.

lib/api/src/grpc/qdrant.rs (1)

10058-10067: LGTM! Well-structured protobuf message for MMR parameters.

The MmrInternal struct is properly defined with appropriate field types and protobuf tags. The optional vector field, lambda parameter for diversity/relevance trade-off, and candidate_limit for performance tuning are all sensible for MMR implementation.

Consider adding validation for the lambda parameter (typically should be in [0.0, 1.0] range) and candidate_limit (should be > 0) at the application layer if not already handled elsewhere.
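A lightweight application-layer check along the lines suggested above could look like this; the struct is a hypothetical stand-in mirroring the MmrInternal fields discussed, not the generated type:

```rust
/// Illustrative MMR parameters, mirroring the fields under review.
struct MmrParams {
    lambda: f32,
    candidate_limit: u32,
}

/// Validate MMR parameters before executing the query.
/// Returns a human-readable error message on failure.
fn validate_mmr(params: &MmrParams) -> Result<(), String> {
    // Rejects NaN as well, since `contains` is false for NaN.
    if !(0.0..=1.0).contains(&params.lambda) {
        return Err(format!(
            "lambda must be in [0.0, 1.0], got {}",
            params.lambda
        ));
    }
    if params.candidate_limit == 0 {
        return Err("candidate_limit must be greater than 0".to_string());
    }
    Ok(())
}
```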

lib/collection/src/operations/universal_query/planned_query.rs (1)

87-87: Enhance the TODO comment to be more specific.

The current TODO is too vague. Consider being more specific about what needs simplification.

-    // ToDo: This function needs serious simplification: split and refactor
+    // TODO: This function is complex due to multiple query types and prefetch handling. Consider:
+    // 1. Extract query type handling into separate methods
+    // 2. Separate prefetch logic from main query logic
+    // 3. Create a query builder pattern for different query types
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 10de8cc and 94248b1.

📒 Files selected for processing (12)
  • lib/api/src/grpc/proto/points_internal_service.proto (2 hunks)
  • lib/api/src/grpc/qdrant.rs (3 hunks)
  • lib/collection/src/collection/mmr.rs (7 hunks)
  • lib/collection/src/collection/query.rs (5 hunks)
  • lib/collection/src/operations/query_enum.rs (1 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (8 hunks)
  • lib/collection/src/operations/universal_query/planned_query.rs (6 hunks)
  • lib/collection/src/operations/universal_query/shard_query.rs (7 hunks)
  • lib/collection/src/shards/local_shard/query.rs (1 hunks)
  • lib/segment/src/data_types/vectors.rs (1 hunks)
  • lib/segment/src/types.rs (2 hunks)
  • lib/storage/src/rbac/ops_checks.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
lib/segment/src/types.rs (1)

<retrieved_learning>
Learnt from: timvisee
PR: #6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
</retrieved_learning>

lib/collection/src/operations/universal_query/shard_query.rs (5)
Learnt from: xzfc
PR: qdrant/qdrant#6501
File: lib/segment/src/index/hnsw_index/links_container.rs:158-158
Timestamp: 2025-05-08T16:39:07.893Z
Learning: In the Qdrant codebase, allocation failures are generally not handled (allowed to panic) except in very few specific cases. Most `unwrap()` calls on allocation operations are intentional.
Learnt from: timvisee
PR: qdrant/qdrant#6546
File: lib/segment/src/index/field_index/full_text_index/text_index.rs:497-518
Timestamp: 2025-05-19T09:25:54.285Z
Learning: In the Qdrant project, using `.expect()` and `.unwrap()` is considered acceptable in test code since test failures should be loud and obvious, and proper error propagation adds unnecessary complexity to test code.
Learnt from: timvisee
PR: qdrant/qdrant#6644
File: lib/gridstore/src/blob.rs:41-57
Timestamp: 2025-06-06T07:52:38.478Z
Learning: In the Blob trait implementations for gridstore (lib/gridstore/src/blob.rs), the maintainer prefers using assert! for input validation over Result-based error handling, as they don't expect invalid inputs in real scenarios and changing to Result would require updating the entire trait interface.
Learnt from: IvanPleshkov
PR: qdrant/qdrant#6663
File: lib/quantization/src/encoded_vectors_binary.rs:270-276
Timestamp: 2025-06-18T09:23:14.977Z
Learning: Pattern matching on `&Option<T>` with `Some(x)` is valid and idiomatic in Rust. The bound variable `x` will have type `&T` and can be used to access fields through automatic dereferencing. No need to use `.as_ref()` in such cases.
Learnt from: generall
PR: qdrant/qdrant#6274
File: lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs:233-233
Timestamp: 2025-03-29T17:43:28.174Z
Learning: In the qdrant codebase, `debug_assert!` is preferred over runtime checks that would panic in release builds to ensure production stability, even when validating conditions like vector sorting.
lib/collection/src/operations/universal_query/planned_query.rs (1)
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/iterator.rs:0-0
Timestamp: 2025-05-26T14:50:13.183Z
Learning: The `advance_until_greater_or_equal` method in `PostingIterator` in `lib/posting_list/src/iterator.rs` is intentionally designed with different semantics than `next()` and can output the same value multiple times. This is intended behavior for a search method that finds the first element with ID >= target_id.
lib/collection/src/collection/query.rs (2)
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In the AsyncRawScorerImpl implementation, the unwrap() call on read_vectors_async results is intentional, with an explanatory comment noting that this experimental feature is meant to crash rather than silently fall back to synchronous implementation.
Learnt from: xzfc
PR: qdrant/qdrant#6245
File: lib/segment/src/vector_storage/async_raw_scorer.rs:48-56
Timestamp: 2025-04-22T23:19:51.232Z
Learning: In AsyncRawScorerImpl, the unwrap() call after read_vectors_async is intentional. The io_uring feature is experimental, and the code is designed to panic rather than silently fall back to a synchronous implementation if it fails, directing users to use the default IO implementation instead.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests-no-rocksdb (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
🔇 Additional comments (21)
lib/collection/src/operations/query_enum.rs (1)

69-69: Good documentation improvement!

The added documentation comment clearly explains the purpose of the QueryEnum and will help developers understand its role in segment-level vector queries.

lib/storage/src/rbac/ops_checks.rs (2)

17-18: Correct import additions for MMR support.

The imports properly bring in the necessary types for handling the new NearestWithMmr variant.


154-156: Proper access control implementation for MMR queries.

The implementation correctly handles the new NearestWithMmr variant by:

  • Extracting the underlying nearest query for validation
  • Ignoring MMR parameters (which don't affect access control)
  • Following the same pattern as other query types

This ensures MMR queries are subject to the same access control rules as regular nearest neighbor queries.

lib/collection/src/collection/mmr.rs (1)

512-512: Consistent test updates for new candidate_limit parameter.

All test cases have been properly updated to include the candidate_limit: 100 parameter in MmrInternal struct initializations. The value 100 is appropriate for test scenarios and maintains consistency across all test functions.

Also applies to: 556-556, 614-614, 655-655, 694-694, 757-757, 804-804

lib/segment/src/types.rs (2)

17-17: LGTM: Import addition supports the new merge functionality.

The itertools::Itertools import is appropriately added to support the unique() method used in the new WithVector::merge implementation.


3004-3023: Well-implemented additive merge logic for WithVector enum.

The merge method correctly implements additive merging semantics:

  1. Boolean precedence: Bool(true) takes precedence over any other variant, which is correct for "include all vectors" semantics
  2. Selector combination: When merging two selectors, it properly chains them, removes duplicates with unique(), and collects into a new vector
  3. Mixed cases: Appropriately handles mixed scenarios by preferring selectors over Bool(false)
  4. Complete coverage: All possible enum variant combinations are handled

The implementation aligns well with the MMR functionality described in the AI summary, where vector selection flags from different query stages need to be merged during query planning.
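The merging semantics described above can be sketched standalone. This is a simplified stand-in for the collection's actual WithVector type: it sorts before deduplicating, whereas the real implementation uses itertools' order-preserving unique():

```rust
/// Simplified stand-in for the collection's `WithVector` selector.
#[derive(Debug, Clone, PartialEq)]
enum WithVector {
    Bool(bool),
    Selector(Vec<String>),
}

impl WithVector {
    /// Additive merge: `Bool(true)` wins over anything, two selectors
    /// are unioned, and a selector wins over `Bool(false)`.
    fn merge(self, other: WithVector) -> WithVector {
        use WithVector::*;
        match (self, other) {
            (Bool(true), _) | (_, Bool(true)) => Bool(true),
            (Selector(mut a), Selector(b)) => {
                // Union while dropping duplicates (sorted, unlike `unique()`).
                a.extend(b);
                a.sort();
                a.dedup();
                Selector(a)
            }
            (Selector(a), Bool(false)) | (Bool(false), Selector(a)) => Selector(a),
            (Bool(false), Bool(false)) => Bool(false),
        }
    }
}
```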

lib/api/src/grpc/proto/points_internal_service.proto (1)

275-282: Verify downstream handling of the new mmr score variant
Adding MmrInternal mmr = 6 extends the oneof score. Double-check that every match arm / match / switch on score in query-planning, shard execution, and collection merging now covers the Mmr case to avoid unreachable-branch panics.

lib/api/src/grpc/qdrant.rs (2)

10100-10101: Proper extension of protobuf oneof tags.

The addition of tag "6" to the existing oneof tags is correctly implemented, maintaining backward compatibility while extending the query scoring options.


10124-10126: MMR scoring variant correctly integrated.

The new Mmr variant is properly added to the Score enum with the correct tag (6) and references the MmrInternal struct appropriately. The integration follows the established pattern of other scoring variants.

lib/segment/src/data_types/vectors.rs (1)

524-542: No changes needed: Single/MultiDense only apply to the default vector name

Verified across the codebase that VectorStructInternal::Single and MultiDense are only ever used with DEFAULT_VECTOR_NAME, so consuming the entire Option and returning None for non-default names cannot occur in practice. This behavior is consistent with other usages (e.g., in point_ops.rs and operations/types.rs), and no preservation logic is required.

lib/collection/src/shards/local_shard/query.rs (2)

386-398: LGTM!

The MMR rescoring integration is clean and follows the existing pattern for other scoring types.


428-477: Well-structured MMR rescoring implementation.

The method properly handles timeout management, vector retrieval, and score threshold filtering. Good implementation!

lib/collection/src/operations/universal_query/planned_query.rs (2)

117-148: Excellent MMR query planning implementation.

The approach of converting MMR queries to nearest neighbor queries at the shard level while deferring actual MMR calculation to the collection level is well-designed. The vector merging logic ensures proper vector retrieval.


398-421: Consistent MMR handling in prefetch recursion.

Good to see the same MMR handling pattern applied here as in the main query planning logic.

lib/collection/src/operations/universal_query/shard_query.rs (3)

69-80: Well-documented MMR configuration struct.

The MmrInternal struct is properly designed with clear documentation for each field. The lambda parameter documentation is particularly helpful in explaining the diversity vs relevance trade-off.


123-126: Correct identification of MMR needing intermediate results.

MMR indeed requires intermediate results from all shards to properly balance relevance and diversity across the entire result set.


596-611: Proper gRPC to internal MMR conversion.

The conversion correctly validates the required vector field and properly converts all MMR parameters.

lib/collection/src/operations/universal_query/collection_query.rs (2)

168-179: Clean MMR query structure design.

The separation of the nearest vector query and MMR parameters into distinct structs provides good modularity and clarity.


363-376: Correct MMR query conversion.

The conversion properly extracts all MMR parameters and creates the appropriate ScoringQuery::Mmr variant.

lib/collection/src/collection/query.rs (2)

11-11: LGTM! Necessary imports for MMR functionality.

The imports for VectorStructInternal and mmr_from_points_with_vector are required for the new MMR scoring implementation.

Also applies to: 18-18


334-431: Well-structured refactoring to support async MMR processing.

The refactoring properly transforms the method to async and cleanly integrates MMR support. The implementation correctly:

  • Handles vector stripping based on the with_vector configuration
  • Applies score thresholds appropriately
  • Defers offset/limit application until after fusion/MMR processing

Note: The score threshold handling differs between fusion (using take_while) and MMR (using filter). This appears intentional since fusion results are pre-sorted by score while MMR results may not maintain score ordering.
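The distinction called out in the note, take_while on pre-sorted fusion scores versus filter on unordered MMR scores, can be illustrated in isolation (illustrative data, not the PR's types):

```rust
/// Apply a score threshold to results sorted descending by score: once a
/// score drops below the threshold, everything after it can be skipped.
fn threshold_sorted(scores: &[f32], min: f32) -> Vec<f32> {
    scores.iter().copied().take_while(|&s| s >= min).collect()
}

/// Apply a score threshold to results in arbitrary order (e.g. after MMR
/// re-ranking): every element must be checked individually.
fn threshold_unsorted(scores: &[f32], min: f32) -> Vec<f32> {
    scores.iter().copied().filter(|&s| s >= min).collect()
}
```

On sorted input the two agree; on MMR output, take_while would stop at the first sub-threshold score and wrongly drop later high-scoring points.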

@coszio coszio merged commit 137d71a into dev Jul 7, 2025
18 checks passed
@coszio coszio deleted the mmr-in-collection branch July 7, 2025 21:18
generall added a commit that referenced this pull request Jul 17, 2025
* add mmr as a ScoringQuery

* add mmr to CollectionQueryRequest

* handle mmr after collecting from shards

* use unwrap_or_else

* include mmr as a vector query

* ordering for MMR is None

* fix score threshold

* fmt + todo comment

* remove unused function

---------

Co-authored-by: generall <andrey@vasnetsov.com>
@coderabbitai coderabbitai bot mentioned this pull request Sep 6, 2025