Add inference usage tracking to REST API#6563

Merged
n0x29a merged 15 commits into dev from inference-usage-rest on May 27, 2025

Conversation

n0x29a (Contributor) commented May 20, 2025

No description provided.

Copilot AI (Contributor) left a comment


Pull Request Overview

Adds tracking and reporting of inference usage throughout the REST API, helper functions, and data models.

  • Batch inference routines now return optional InferenceUsage
  • All Actix handlers and helpers updated to include inference_usage in API responses
  • REST and gRPC model, conversion, and proto definitions extended to support InferenceUsage

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated no comments.

Show a summary per file

  • src/common/inference/infer_processing.rs: Return InferenceUsage alongside batched inference results
  • src/actix/helpers.rs: Accept and serialize inference_usage in helper response builders
  • src/actix/api/update_api.rs: Propagate inference usage through all update endpoints
  • src/actix/api/shards_api.rs: Include inference_usage in shard-key endpoints
  • src/actix/api/service_api.rs: Report inference usage in service-level endpoints
  • src/actix/api/search_api.rs: Attach inference usage to search responses
  • src/actix/api/retrieve_api.rs: Add inference usage support to retrieval endpoints
  • src/actix/api/recommend_api.rs: Pass inference usage through recommendation endpoints
  • src/actix/api/query_api.rs: Integrate inference usage in query and batch-query handlers
  • src/actix/api/local_shard_api.rs: Report inference usage in local shard point operations
  • src/actix/api/facet_api.rs: Include inference usage in facet endpoint
  • src/actix/api/discovery_api.rs: Add inference usage to discovery endpoints
  • src/actix/api/count_api.rs: Serialize inference usage in count endpoints
  • src/actix/api/collections_api.rs: Extend collection management endpoints to include inference usage
  • lib/api/src/rest/models.rs: Define InferenceUsage and include it in Usage struct
  • lib/api/src/rest/conversions.rs: Implement From<InferenceUsage> conversion for gRPC usage
  • lib/api/src/grpc/qdrant.rs: Add gRPC InferenceUsage and ModelUsage message definitions
  • lib/api/src/grpc/proto/points.proto: Extend protobuf Usage and add InferenceUsage messages
  • lib/api/src/grpc/ops.rs: Update Usage struct and usage_or_none to handle inference usage
  • lib/api/src/grpc/conversions.rs: Initialize inference field in generated Usage conversions
Comments suppressed due to low confidence (3)

src/actix/api/query_api.rs:147

  • Currently selecting only the first inference usage in batch queries may underreport total usage; consider aggregating usages across all requests (e.g., summing tokens per model) instead of taking only the first.
let first_inference_usage: Option<InferenceUsage> = inference_usages.into_iter().flatten().next();
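The aggregation the comment suggests could look like the following hedged sketch; `InferenceUsage` and `ModelUsage` here are illustrative stand-ins for the crate's types, not the exact definitions:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the crate's types.
#[derive(Default, Debug, PartialEq)]
pub struct ModelUsage {
    pub tokens: u64,
}

#[derive(Default, Debug, PartialEq)]
pub struct InferenceUsage {
    pub models: HashMap<String, ModelUsage>,
}

/// Sum token counts per model across all batch requests instead of
/// keeping only the first usage, so totals are not underreported.
pub fn merge_usages(usages: Vec<Option<InferenceUsage>>) -> Option<InferenceUsage> {
    let mut merged = InferenceUsage::default();
    for usage in usages.into_iter().flatten() {
        for (model, u) in usage.models {
            merged.models.entry(model).or_default().tokens += u.tokens;
        }
    }
    if merged.models.is_empty() {
        None
    } else {
        Some(merged)
    }
}
```

Returning `None` when no request produced usage preserves the existing "omit the field when empty" response behavior.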

lib/api/src/grpc/conversions.rs:2247

  • [nitpick] The variable name usage here shadows the struct field and can be misleading; consider renaming the parameter to hardware_usage for clarity.
hardware: usage,

src/common/inference/infer_processing.rs:26

  • New inference usage is now returned from from_objects; consider adding unit tests to verify that InferenceUsage is correctly propagated for both non-empty and empty inference responses.
) -> Result<(Self, Option<InferenceUsage>), StorageError> {

coderabbitai bot commented May 20, 2025

📝 Walkthrough

Walkthrough

This change introduces a new mechanism for tracking inference usage metrics alongside existing hardware usage metrics throughout the codebase. The Usage struct and related API models, gRPC messages, REST models, and OpenAPI schemas are extended to include an optional inference field, which records per-model token usage via new InferenceUsage and ModelUsage types. Response handling, conversion logic, and API endpoints are updated to propagate and merge inference usage data as needed. Conversion and update functions now return both their primary result and optional inference usage, and documentation is updated to reflect these protocol and schema changes.

Possibly related PRs

  • Refactor: move HardwareUsage to the Usage map #6532: Refactored the protobuf and code to wrap the previously standalone HardwareUsage inside a new Usage struct; this PR is directly related as the current changes build upon that structural update by further extending Usage to include inference usage support.

Suggested reviewers

  • timvisee
  • agourlay

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0ecfc5f and 9eef544.

📒 Files selected for processing (2)
  • src/actix/api/update_api.rs (6 hunks)
  • src/common/update.rs (12 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/actix/api/update_api.rs
  • src/common/update.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests-consensus
  • GitHub Check: lint
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

🧹 Nitpick comments (2)
lib/api/src/grpc/qdrant.rs (1)

6573-6592: Well-structured changes for inference usage tracking.

The implementation extends the Usage struct with a new optional inference field and adds two supporting structures (InferenceUsage and ModelUsage) that follow protobuf conventions and Rust best practices. The model-based token tracking provides granular insight into inference resource consumption.

Consider adding documentation comments to these new structures to explain their purpose and usage patterns to future developers.

 #[derive(serde::Serialize)]
 #[allow(clippy::derive_partial_eq_without_eq)]
 #[derive(Clone, PartialEq, ::prost::Message)]
+/// Tracks inference usage metrics across different models
 pub struct InferenceUsage {
     #[prost(map = "string, message", tag = "1")]
+    /// Maps model names to their respective usage metrics
     pub model: ::std::collections::HashMap<::prost::alloc::string::String, ModelUsage>,
 }
 #[derive(serde::Serialize)]
 #[allow(clippy::derive_partial_eq_without_eq)]
 #[derive(Clone, PartialEq, ::prost::Message)]
+/// Represents usage metrics for a specific model
 pub struct ModelUsage {
     #[prost(uint64, tag = "1")]
+    /// Count of tokens processed by the model
     pub tokens: u64,
 }
src/common/inference/infer_processing.rs (1)

3-4: API models leaked into the core crate – consider reversing dependency

Importing InferenceUsage from api::rest::models creates an upward dependency from a low-level “common” module to a high-level REST layer.
This breaks the usual layering (core ➔ transport) and may cause cyclic-dependency headaches when another transport (e.g., gRPC) needs the same struct.

Recommend introducing a transport-agnostic InferenceUsage definition in a shared crate (or move it to common::types) and add thin converter traits in the REST/grpc layers instead.
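The recommended direction could be sketched as below; both type names (`CoreInferenceUsage`, `RestInferenceUsage`) are hypothetical, since an actual refactor would reuse the crate's existing types:

```rust
use std::collections::HashMap;

// Hypothetical transport-agnostic type that could live in a shared
// crate such as `common`, with no knowledge of REST or gRPC.
#[derive(Clone, Debug, PartialEq)]
pub struct CoreInferenceUsage {
    pub models: HashMap<String, u64>, // model name -> token count
}

// The REST layer then owns a thin converter, so `common` never has
// to import REST models and the core -> transport layering holds.
#[derive(Debug, PartialEq)]
pub struct RestInferenceUsage {
    pub models: HashMap<String, u64>,
}

impl From<CoreInferenceUsage> for RestInferenceUsage {
    fn from(core: CoreInferenceUsage) -> Self {
        RestInferenceUsage {
            models: core.models,
        }
    }
}
```

A matching converter in the gRPC layer would keep both transports depending on the core type, never the other way around.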

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0903e92 and 5b3b7d7.

📒 Files selected for processing (29)
  • .gitignore (1 hunks)
  • lib/api/src/grpc/conversions.rs (1 hunks)
  • lib/api/src/grpc/ops.rs (2 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (1 hunks)
  • lib/api/src/rest/conversions.rs (1 hunks)
  • lib/api/src/rest/models.rs (3 hunks)
  • src/actix/api/collections_api.rs (6 hunks)
  • src/actix/api/count_api.rs (2 hunks)
  • src/actix/api/discovery_api.rs (4 hunks)
  • src/actix/api/facet_api.rs (2 hunks)
  • src/actix/api/local_shard_api.rs (5 hunks)
  • src/actix/api/query_api.rs (8 hunks)
  • src/actix/api/recommend_api.rs (6 hunks)
  • src/actix/api/retrieve_api.rs (6 hunks)
  • src/actix/api/search_api.rs (10 hunks)
  • src/actix/api/service_api.rs (3 hunks)
  • src/actix/api/shards_api.rs (2 hunks)
  • src/actix/api/update_api.rs (25 hunks)
  • src/actix/helpers.rs (6 hunks)
  • src/common/inference/infer_processing.rs (3 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (10 hunks)
  • src/common/inference/service.rs (4 hunks)
  • src/common/inference/update_requests.rs (12 hunks)
  • src/common/update.rs (10 hunks)
  • src/tonic/api/points_api.rs (2 hunks)
  • src/tonic/api/query_common.rs (15 hunks)
  • src/tonic/api/update_common.rs (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
src/actix/api/discovery_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/actix/api/count_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/actix/api/local_shard_api.rs (2)
src/actix/api/update_api.rs (1)
  • process_response (448-448)
src/actix/helpers.rs (2)
  • process_response (47-68)
  • process_response_error (70-96)
src/actix/api/collections_api.rs (1)
src/actix/helpers.rs (1)
  • process_response (47-68)
src/actix/api/facet_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/actix/api/search_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/tonic/api/points_api.rs (1)
lib/api/src/grpc/ops.rs (1)
  • usage_or_none (36-45)
src/actix/api/recommend_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/tonic/api/query_common.rs (1)
lib/api/src/grpc/ops.rs (1)
  • usage_or_none (36-45)
src/actix/api/retrieve_api.rs (1)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/actix/api/update_api.rs (3)
src/actix/helpers.rs (2)
  • process_response_error (70-96)
  • process_response (47-68)
src/common/update.rs (3)
  • do_upsert_points (232-281)
  • check_strict_mode (125-172)
  • do_batch_update_points (567-700)
lib/storage/src/content_manager/collection_verification.rs (1)
  • check_strict_mode (65-80)
src/common/update.rs (1)
src/common/inference/update_requests.rs (3)
  • convert_batch (158-209)
  • convert_point_struct (17-156)
  • convert_point_vectors (211-293)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: lint
  • GitHub Check: test-low-resources
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: storage-compat-test
  • GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (86)
.gitignore (1)

19-19: Adding .env to gitignore is good practice.

This change correctly excludes environment configuration files from version control, which is important for keeping sensitive or environment-specific configuration private.

src/actix/api/facet_api.rs (2)

43-43: Correctly updated error handling to support inference usage tracking.

The additional None parameter aligns with the updated signature of process_response_error which now accepts an optional InferenceUsage parameter for tracking inference metrics in API responses.


74-74: Properly extended process_response to support inference usage tracking.

The added None parameter correctly implements the updated function signature that now includes an optional InferenceUsage parameter, consistent with the broader changes for inference usage tracking in API responses.

src/actix/api/count_api.rs (2)

40-40: Correctly updated error handling to support inference usage tracking.

The additional None parameter properly implements the updated signature of process_response_error which now accepts an optional InferenceUsage parameter.


69-69: Properly extended process_response to support inference usage tracking.

The added None parameter correctly implements the updated function signature that now includes an optional InferenceUsage parameter, consistent with the system-wide inference usage tracking enhancements.

src/tonic/api/points_api.rs (2)

705-705: Correctly updated usage_or_none call to support inference metrics.

The additional None parameter properly implements the updated signature of usage_or_none which now accepts an optional InferenceUsage parameter for the search_matrix_pairs response.


731-731: Correctly updated usage_or_none call to support inference metrics.

The additional None parameter properly implements the updated signature of usage_or_none which now accepts an optional InferenceUsage parameter for the search_matrix_offsets response.

src/common/inference/query_requests_grpc.rs (4)

68-68: Code updated to extract inner BatchAccumInferred data from tuple return type

The code now passes &inferred.0 instead of &inferred, accessing the first element of the tuple. This aligns with the broader changes that wrap inference results with usage information.


75-75: Code updated to extract inner BatchAccumInferred data from tuple return type

This line has been updated to access the inner batch data using &inferred.0, matching the pattern in the other modified calls in this file.


148-148: Code updated to extract inner BatchAccumInferred data from tuple return type

Similar update to the previous ones, accessing the first element of the tuple return type to get the batch data.


152-152: Code updated to extract inner BatchAccumInferred data from tuple return type

Code consistently updated to access the first element of the tuple return value.

src/actix/api/shards_api.rs (2)

45-45: Parameter added for inference usage tracking

The process_response call has been updated to include a fourth parameter None for the optional inference usage. This matches the updated function signature in src/actix/helpers.rs and aligns with the broader inference usage tracking implementation.


75-75: Parameter added for inference usage tracking

Similar to line 45, this process_response call has been updated to include the additional None parameter for inference usage.

src/actix/api/discovery_api.rs (4)

43-43: Parameter added for inference usage tracking

The call to process_response_error has been updated to include the fourth parameter None for inference usage, matching the updated function signature.


79-79: Parameter added for inference usage tracking

This process_response call now includes the fourth parameter None for inference usage tracking, maintaining consistency with the rest of the API layer changes.


103-103: Parameter added for inference usage tracking

Similar to line 43, this process_response_error call has been updated with the additional None parameter for inference usage.


136-136: Parameter added for inference usage tracking

This process_response call has been updated to include the fourth parameter None for inference usage, consistent with other API endpoints.

lib/api/src/rest/conversions.rs (2)

10-11: Added imports for inference usage types

Necessary imports added for the new inference usage conversion implementation.


14-27: Added conversion from REST to gRPC InferenceUsage

This implementation provides a clean translation from the REST API's InferenceUsage type to the gRPC equivalent, properly mapping the model usage data between the two formats. The implementation correctly iterates through each model in the source and constructs the corresponding entry in the target.
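A hedged sketch of the kind of per-model translation described above; the struct and field names approximate the description, not the exact generated code:

```rust
use std::collections::HashMap;

// Illustrative REST-side shapes.
pub struct RestModelUsage {
    pub tokens: u64,
}
pub struct RestInferenceUsage {
    pub models: HashMap<String, RestModelUsage>,
}

// Illustrative gRPC-side shapes.
#[derive(Debug, PartialEq)]
pub struct GrpcModelUsage {
    pub tokens: u64,
}
#[derive(Debug, PartialEq)]
pub struct GrpcInferenceUsage {
    pub model: HashMap<String, GrpcModelUsage>,
}

// Translate each per-model entry from the REST type into the gRPC
// equivalent, preserving the token counts.
impl From<RestInferenceUsage> for GrpcInferenceUsage {
    fn from(rest: RestInferenceUsage) -> Self {
        GrpcInferenceUsage {
            model: rest
                .models
                .into_iter()
                .map(|(name, usage)| (name, GrpcModelUsage { tokens: usage.tokens }))
                .collect(),
        }
    }
}
```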

lib/api/src/grpc/conversions.rs (1)

2246-2249: Correctly implementing inference usage tracking in gRPC conversion.

The addition of the inference field to the Usage struct in the conversion from PointsOperationResponseInternal to PointsOperationResponse properly implements the inference usage tracking feature, with a default value of None when no inference data is available.

src/actix/api/local_shard_api.rs (5)

75-75: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new inference_usage parameter as None, which aligns with the updated function signature in the helpers module.


104-104: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes None for the new inference usage parameter, maintaining consistency with the updated function signature.


151-151: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new inference_usage parameter as None, which aligns with the updated function signature in the helpers module.


178-178: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes None for the new inference usage parameter, maintaining consistency with the updated function signature.


222-222: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new inference_usage parameter as None, which aligns with the updated function signature in the helpers module.

src/actix/api/service_api.rs (3)

73-73: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes None for the new inference usage parameter, maintaining consistency with the updated function signature.


185-185: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new inference_usage parameter as None, which aligns with the updated function signature in the helpers module.


201-201: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new inference_usage parameter as None, which aligns with the updated function signature in the helpers module.

lib/api/src/grpc/proto/points.proto (2)

1174-1174: Appropriate extension to Usage message for inference tracking

The addition of the optional inference field of type InferenceUsage to the Usage message properly extends the usage tracking capability to include inference metrics alongside hardware metrics.


1178-1188: Well-structured message definitions for inference tracking

The new message definitions provide a clean structure for tracking inference usage:

  • InferenceUsage with a map from model names to usage details
  • ModelUsage with a tokens count field

This approach allows for granular tracking per model while maintaining a simple data structure.

src/actix/api/recommend_api.rs (6)

52-52: Updated error processing to support inference usage

Function call correctly updated to match the new signature of process_response_error which now takes an additional parameter for optional inference usage.


88-88: Updated response processing to support inference usage

The call to helpers::process_response has been properly extended with a None parameter for inference usage, aligning with the updated function signature while maintaining the existing behavior.


143-143: Updated error handling in batch processing to include inference usage parameter

Function call correctly updated to match the new signature of process_response_error which now takes an additional parameter for optional inference usage.


176-176: Updated batch response processing to support inference usage

The call to helpers::process_response correctly includes the additional None parameter for inference usage data in batch operations.


203-203: Updated error handling in group recommendations to include inference usage parameter

Error handling for group recommendations now correctly includes the inference usage parameter in the process_response_error call.


231-231: Updated group recommendation response processing to support inference usage

The call to helpers::process_response for group recommendations properly includes the new inference usage parameter.

src/actix/api/collections_api.rs (6)

121-121: Updated collection creation error handling with inference usage support

The early return error case for collection creation now correctly passes the required None parameter for inference usage.


131-131: Updated collection creation response handling to support inference tracking

The successful path for collection creation now properly includes the inference usage parameter in the response processing.


154-154: Updated collection update response handling to support inference tracking

The response processing for the collection update operation correctly includes the new inference usage parameter.


174-174: Updated collection deletion response handling to support inference tracking

The response processing for collection deletion now properly includes the inference usage parameter.


192-192: Updated alias update response handling to support inference tracking

The alias update operation's response processing now correctly includes the inference usage parameter.


230-230: Updated cluster operation response handling to support inference tracking

The collection cluster update operation's response processing now properly includes the inference usage parameter.

lib/api/src/grpc/ops.rs (3)

1-1: Added import for InferenceUsage type

Correctly imported the InferenceUsage type alongside the existing HardwareUsage and Usage types.


27-32: Enhanced emptiness check to include inference usage

The is_empty method now correctly checks both hardware and inference fields, ensuring that a Usage is only considered empty when both fields are None. This is important for maintaining consistent behavior when determining whether to include usage data in responses.


36-43: Updated usage_or_none function to handle inference metrics

The function signature now properly accepts an optional InferenceUsage parameter and correctly constructs the Usage struct with both hardware and inference fields. This allows the function to create usage objects that can include inference metrics when available.
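The semantics described for `is_empty` and `usage_or_none` can be sketched as follows; the structs are simplified stand-ins, since the real definitions live in lib/api/src/grpc:

```rust
// Illustrative stand-ins for the crate's usage types.
#[derive(Debug, PartialEq)]
pub struct HardwareUsage {
    pub cpu: u64,
}
#[derive(Debug, PartialEq)]
pub struct InferenceUsage {
    pub tokens: u64,
}

#[derive(Debug, PartialEq)]
pub struct Usage {
    pub hardware: Option<HardwareUsage>,
    pub inference: Option<InferenceUsage>,
}

impl Usage {
    // Empty only when *both* kinds of usage are absent.
    pub fn is_empty(&self) -> bool {
        self.hardware.is_none() && self.inference.is_none()
    }
}

// Build a Usage only when there is something to report, so callers
// can omit the field entirely from responses otherwise.
pub fn usage_or_none(
    hardware: Option<HardwareUsage>,
    inference: Option<InferenceUsage>,
) -> Option<Usage> {
    let usage = Usage { hardware, inference };
    if usage.is_empty() {
        None
    } else {
        Some(usage)
    }
}
```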

src/tonic/api/query_common.rs (2)

135-135: Support for inference usage tracking added

The change adds a second parameter (None) to the usage_or_none function call to accommodate the new inference usage tracking capability while maintaining backward compatibility.


183-183: Consistent inference usage parameter addition

The changes consistently add None as the inference usage parameter across all API response handling functions, ensuring uniform behavior throughout the codebase.

Also applies to: 241-241, 296-296, 349-349, 406-406, 460-460, 506-506, 562-562, 641-641, 699-699, 764-764, 819-819, 875-875

src/actix/api/retrieve_api.rs (2)

89-89: Error handling updated for inference tracking

The error handling paths now include the additional None parameter for inference usage in process_response_error calls, maintaining consistency with the new function signature.

Also applies to: 96-96, 145-145, 211-211


124-124: Response processing updated for inference tracking

All successful response paths now properly include the additional None parameter for inference usage in process_response calls, which will allow for future inclusion of inference metrics when available.

Also applies to: 184-184, 240-240

src/actix/api/search_api.rs (3)

51-51: Consistent error handling with inference usage support.

The additional None parameter for inference usage ensures proper alignment with the updated signature of process_response_error to support inference usage tracking.


86-86: Consistent response handling with inference usage support.

The search functionality now correctly passes None for inference usage to the updated process_response function, maintaining API consistency while preparing for potential future inference capabilities in search operations.


127-127: Uniformly updated response processing.

All search endpoints now consistently include the inference usage parameter (as None) in their response processing, ensuring uniform API behavior across the codebase.

Also applies to: 161-161, 188-188, 216-216, 243-243, 272-272, 299-299, 328-328

src/tonic/api/update_common.rs (3)

79-80: Updated tuple destructuring for inference usage support.

The code now correctly extracts the first element of the tuple returned by the update operation, containing the actual update result, while separating it from the inference usage data.


179-180: Consistent pattern for extracting update results.

The .0 accessor pattern is consistently applied to extract the primary result from the tuple returned by the update vectors operation.


826-827: Extracting points from inference results tuple.

The code now correctly accesses the points data from the tuple returned by convert_point_struct, so the inference usage component remains available for use elsewhere.

lib/api/src/rest/models.rs (4)

3-3: Added necessary imports for inference usage tracking.

Imported HashMap from ahash and expanded serde imports to include Deserialize, required for the new inference tracking structures.

Also applies to: 7-7


59-59: Added inference usage tracking to the Usage struct.

The Usage struct now includes an optional inference field to track model usage statistics alongside hardware usage.


63-70: Updated is_empty method to consider inference usage.

The is_empty method now correctly checks both hardware and inference fields to determine if usage data is present, ensuring consistent behavior when deciding whether to include usage data in responses.


91-101: Added new structures for inference usage tracking.

The new InferenceUsage and ModelUsage structs provide a standardized way to represent and serialize inference model usage statistics, such as token counts, in API responses.

src/actix/api/update_api.rs (8)

45-46: Updated error response to include inference usage parameter.

The error handling for strict mode verification now properly includes the None inference usage parameter when calling process_response_error.


48-48: Improved variable naming for clarity.

Using operation_inner instead of overwriting the original operation variable improves code clarity by clearly distinguishing between the original JSON wrapper and its extracted contents.


58-62: Updated function naming to reflect new return value.

Renaming the variable to result_with_usage clearly indicates that the function now returns both the update result and inference usage information.


70-80: Proper handling of inference usage in responses.

The code now correctly destructures the tuple returned by do_upsert_points to extract both the update result and inference usage, propagating this information through to the API response.


427-444: Improved batch update implementation with inference support.

The batch update operation now consistently handles inference usage tracking while maintaining clear variable naming with the _inner suffix pattern, ensuring that both the original request data and its extracted contents are clearly distinguished.


460-482: Proper aggregation of inference usage across batch operations.

The batch update function now correctly propagates inference usage information from individual operations, allowing clients to track and monitor model usage across multiple operations in a single request.


515-519: Consistent handling for operations without inference usage.

Operations like index creation that don't produce inference usage metrics consistently pass None for the inference usage parameter, ensuring API response consistency.


542-542: Consistent response handling for index operations.

The index deletion endpoint now properly passes None for inference usage, aligning with the updated function signature.

src/common/inference/service.rs (4)

7-7: Appropriate import for inference usage tracking.

The new import correctly brings in the InferenceUsage model required for this feature.


53-57: Good design for tracking inference usage.

Making InferenceResponse public and adding an optional usage field properly encapsulates both the embeddings result and usage metrics without requiring them to always be present.


172-177: Correctly updated return type for inference usage.

The return type now includes the full InferenceResponse which contains both embeddings and optional usage information.


217-228: Simplified deserialization logic.

The implementation now directly deserializes into InferenceResponse without additional extraction or validation steps, while maintaining the same error handling for HTTP status codes.

src/common/update.rs (3)

241-262: Correctly propagates inference usage through the upsert operation.

The function now properly extracts and returns optional inference usage from conversion functions. The pattern matching with destructuring is clean and effective.


328-352: Consistent propagation of inference usage in vector updates.

The vector update function follows the same pattern as upsert operations, maintaining consistency in how inference usage is propagated through the system.


576-698: Well-implemented batch operation inference usage collection.

This implementation:

  1. Correctly handles multiple operations that might return different usage data
  2. Takes the first non-None inference usage it encounters
  3. Properly handles operations that don't produce usage data

The approach ensures inference usage is collected without duplicating or merging potentially incompatible usage metrics.
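The "take the first non-None usage" policy can be sketched in isolation; `u64` stands in for the real usage type here, and the helper name is illustrative:

```rust
// Keep the first Some(_) usage across a batch of operation results,
// mirroring the accumulation policy described in the review.
fn first_non_none<T>(results: Vec<Option<T>>) -> Option<T> {
    results.into_iter().flatten().next()
}

fn main() {
    let per_op: Vec<Option<u64>> = vec![None, Some(42), Some(7), None];
    assert_eq!(first_non_none(per_op), Some(42)); // 7 is intentionally ignored

    let no_usage: Vec<Option<u64>> = vec![None, None];
    assert_eq!(first_non_none(no_usage), None);
}
```

The same effect can be written inline as a fold with `acc.or(current)`, which avoids collecting intermediate options.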

src/common/inference/query_requests_rest.rs (6)

1-1: Added appropriate import for the new feature.

The import of InferenceUsage from the REST models is required for the usage tracking functionality.


22-30: Well-designed wrapper structs for inference usage.

These wrapper structures provide a clean way to associate inference usage with query requests while maintaining backward compatibility with existing code.


32-91: Proper extraction and propagation of inference usage in query groups.

The function now correctly extracts inference usage from batch processing and properly wraps it with the query request in the return value.


93-146: Consistent handling of inference usage in query requests.

This implementation follows the same pattern as the query groups function, maintaining consistency across the codebase.


245-246: Good variable rename for clarity.

Renaming from context to context_pairs improves code clarity by avoiding variable shadowing and making the variable's purpose more explicit.


277-277: Improved variable naming.

Renaming from query to converted_query more accurately reflects the transformed state of the variable.

src/common/inference/update_requests.rs (6)

3-3: Added required import for inference usage tracking.

The import of InferenceUsage is necessary for the feature implementation.


17-21: Proper function signature update for propagating usage data.

The return type now includes the optional inference usage alongside the converted point structures.


45-52: Effective pattern for extracting and propagating inference usage.

The code extracts both the inferred data and usage from batch accumulation, ensuring that usage metrics are properly propagated through the conversion process.


158-209: Comprehensive approach to batch conversion with usage tracking.

This implementation:

  1. Initializes an optional inference usage
  2. For named vectors, correctly extracts usage from each conversion
  3. Takes the first non-None usage encountered
  4. Returns both the converted batch and accumulated usage

The error handling for unsupported vector types is maintained.


211-293: Consistent handling of optional inference usage in point vectors conversion.

The function continues the established pattern of extracting and propagating inference usage, maintaining consistency with other conversion functions.


372-414: Clean implementation for vectors conversion with usage tracking.

The function properly:

  1. Accumulates inference objects
  2. Performs inference to get vectors and usage
  3. Converts each vector based on whether inference was needed
  4. Returns both converted vectors and optional usage

The map/collect pattern combined with the final mapping to include usage is elegant.

Comment on lines +53 to +62
if inference_response.embeddings.is_empty() {
return Err(StorageError::service_error(
"Inference service returned no vectors. Check if models are properly loaded.",
));
}

let objects = objects_serialized.into_iter().zip(vectors).collect();
let objects = objects_serialized
.into_iter()
.zip(inference_response.embeddings)
.collect();
Contributor

⚠️ Potential issue

Ensure embeddings count matches objects to avoid silent data loss

zip truncates to the shorter iterator.
If the inference service returns fewer vectors than requested objects, the current logic will silently drop the tail of objects_serialized, leading to missing vectors without an explicit error.

-if inference_response.embeddings.is_empty() {
+let embeddings_len = inference_response.embeddings.len();
+if embeddings_len == 0 {
     return Err(StorageError::service_error(
         "Inference service returned no vectors. Check if models are properly loaded.",
     ));
 }
+
+if embeddings_len != objects_serialized.len() {
+    return Err(StorageError::service_error(format!(
+        "Inference service returned {embeddings_len} vectors for {} inputs",
+        objects_serialized.len()
+    )));
+}

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/common/inference/infer_processing.rs around lines 53 to 62, the code uses
zip to combine objects_serialized and embeddings, which can silently drop
objects if embeddings are fewer. Add a check to compare the length of embeddings
and objects_serialized before zipping, and return an explicit error if they do
not match to prevent silent data loss.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
docs/redoc/master/openapi.json (2)

15255-15268: Add descriptive metadata to InferenceUsage schema
Consider enriching the InferenceUsage definition with description and example fields so API consumers understand its purpose:

"InferenceUsage": {
  "type": "object",
  "description": "Maps model names to their token usage counts",
  "required": ["models"],
  "properties": {
    "models": {
      "type": "object",
      "additionalProperties": {
        "$ref": "#/components/schemas/ModelUsage"
      }
    }
  },
  "example": {
    "models": {
      "gpt-4": { "tokens": 1200 },
      "gpt-3.5-turbo": { "tokens": 800 }
    }
  }
}

Adding these fields makes the OpenAPI docs and generated client code more self-explanatory.


15269-15281: Include examples or format clarifications in ModelUsage
The ModelUsage.tokens property is well defined, but adding an example or clarifying "type": "integer" usage will aid in client generation:

"ModelUsage": {
  "type": "object",
  "description": "Token usage metrics for a model",
  "required": ["tokens"],
  "properties": {
    "tokens": {
      "type": "integer",
      "format": "uint64",
      "minimum": 0,
      "example": 1024
    }
  }
}

This makes the schema more descriptive for downstream tools.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b3b7d7 and b16ed76.

📒 Files selected for processing (1)
  • docs/redoc/master/openapi.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-low-resources
  • GitHub Check: storage-compat-test
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consistency
  • GitHub Check: lint
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)

Comment on lines +15242 to +15287
},
"inference": {
"anyOf": [
{
"$ref": "#/components/schemas/InferenceUsage"
},
{
"nullable": true
}
]
}
Contributor

⚠️ Potential issue

Fix nullability declaration for the inference field
The anyOf branch { "nullable": true } is not a valid standalone schema in JSON Schema. To explicitly allow null, replace it with a { "type": "null" } schema or move nullable: true to the parent object in OpenAPI 3.0. For example:

-"inference": {
-  "anyOf": [
-    { "$ref": "#/components/schemas/InferenceUsage" },
-    { "nullable": true }
-  ]
-}
+"inference": {
+  "anyOf": [
+    { "$ref": "#/components/schemas/InferenceUsage" },
+    { "type": "null" }
+  ]
+}

This ensures proper schema validation across clients and tools.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
},
"inference": {
"anyOf": [
{
"$ref": "#/components/schemas/InferenceUsage"
},
{
"nullable": true
}
]
}
},
"inference": {
"anyOf": [
{
"$ref": "#/components/schemas/InferenceUsage"
},
{
"type": "null"
}
]
}
🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 15242 to 15252, the `anyOf` array
for the `inference` field incorrectly uses `{ "nullable": true }` as a
standalone schema, which is invalid. To fix this, replace `{ "nullable": true }`
with `{ "type": "null" }` or alternatively move the `nullable: true` property to
the parent `inference` object to correctly indicate that the field can be null,
ensuring proper schema validation.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/grpc/docs.md (1)

181-182: Fix markdown list indentation for nested ToC entries
The newly added ToC bullets for InferenceUsage and InferenceUsage.ModelEntry are indented with 4 spaces, but our markdownlint config expects 2-space indents for nested lists.

-    - [InferenceUsage](#qdrant-InferenceUsage)
-    - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
+  - [InferenceUsage](#qdrant-InferenceUsage)
+  - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


182-182: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b16ed76 and 6f708a6.

📒 Files selected for processing (1)
  • docs/grpc/docs.md (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


182-182: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


183-183: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


184-184: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


185-185: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


186-186: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


187-187: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


188-188: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consistency
  • GitHub Check: test-low-resources
  • GitHub Check: integration-tests-consensus
  • GitHub Check: lint
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: storage-compat-test
🔇 Additional comments (4)
docs/grpc/docs.md (4)

3171-3180: InferenceUsage type definition is correct
The InferenceUsage message is documented with the appropriate header and table schema. It clearly specifies the repeated model field of type InferenceUsage.ModelEntry.


3187-3195: InferenceUsage.ModelEntry documentation looks good
The InferenceUsage.ModelEntry map-entry is accurately defined with key and value columns.


3287-3295: ModelUsage schema is properly documented
The ModelUsage message and its single tokens field are correctly presented.


4814-4814: Include inference in Usage message
The Usage table is updated to include the optional inference field of type InferenceUsage, matching the design in gRPC and REST models.

@n0x29a n0x29a requested a review from generall May 20, 2025 16:44
coszio
coszio previously requested changes May 20, 2025
Contributor

@coszio coszio left a comment

I didn't notice before for a lack of being in the loop, but I believe there is no need to duplicate the inference handling twice (one for grpc and one for rest) since we already have CollectionQueryRequest, which acts as a direct converging type from both interfaces, exactly with this kind of purpose.

Previously, it was introduced to handle ID -> vector resolution. I think it is the same case for Object -> vector inference.

I'm fine with dismissing this change request if there is a reason it is being handled directly on the API types; I just want to make it explicit.

Comment on lines +1186 to +1188
message ModelUsage {
uint64 tokens = 1;
}
Contributor

Is this only intended for text? How will we represent measurements for other modalities, like images?

Contributor Author

For images it will be the same format:

"models": {
  "openai/clip-v1": { "tokens": 50 }
},

Currently supported: text, images, and objects (which could be anything, like sound or video).

let CollectionQueryRequestWithUsage { request, usage } =
convert_query_request_from_rest(query_request, &inference_token).await?;

let pass = check_strict_mode(
Member

Strict mode should be checked before inference.

Member

It will require implementing the StrictModeVerification trait for an external type.

Contributor Author

We've decided to implement this in a separate PR to keep the current one simple

.collect_vec();

Ok(QueryResponse { points })
Ok((QueryResponse { points }, usage))
Member

We also want to return usage even in cases where the operation fails at a later stage.

Contributor Author

We've decided not to implement it in this PR.

@coszio coszio dismissed their stale review May 21, 2025 17:30

After discussion in chat, it makes sense to keep current approach of handling at API layer with the purpose of not introducing inference responsibility into Collection level. This dismisses my previous review 👍

@@ -55,12 +56,17 @@ pub struct ApiResponse<D> {
pub struct Usage {
#[serde(skip_serializing_if = "Option::is_none")]
Contributor

Why skipping only for hardware?

Contributor Author

It's not enough to add the same condition for inference, because there is no guarantee that the value will be exactly None rather than Some. Because of that, there is a check here.

.await;

helpers::process_response(result, timing, request_hw_counter.to_rest_api())
let (result, inference_usage) = match result {
Contributor

This statement seems to be used a lot. Can we maybe extract it into a function?

Contributor Author

These two lines don't justify extracting a standalone function we'll never reuse, I think

hardware_usage: Option<HardwareUsage>,
inference_usage: Option<InferenceUsage>,
) -> HttpResponse {
let usage = {
Contributor

You created a function that seems to do exactly the same: usage_or_none
Can we call this one here?

Contributor Author

It can't be used directly because usage_or_none expects gRPC types.

))
})?;

if inference_response.embeddings.is_empty() {
Contributor

Why don't we need this error anymore?

Contributor Author

We still need it, but now it's here.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/common/inference/query_requests_rest.rs (1)

56-59: Consider standardizing error handling for usage data

There's some inconsistency in how usage data is handled in errors. Low-level functions like convert_vector_input_with_inferred don't include usage in their errors (lines 187, 196, 205), but these errors are later wrapped with usage data in parent functions. Consider standardizing this approach by either:

  1. Passing usage data to all error-generating functions, or
  2. Creating a helper function for consistent error mapping with usage data

This would reduce code duplication and ensure consistent error handling throughout the file.

Also applies to: 69-72, 128-133, 142-145, 222-228, 241-248, 253-261, 278-284, 289-295, 307-315

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c70dfe and d043cd1.

📒 Files selected for processing (7)
  • lib/collection/src/operations/types.rs (1 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (3 hunks)
  • lib/collection/src/operations/verification/query.rs (2 hunks)
  • lib/storage/src/content_manager/errors.rs (5 hunks)
  • src/actix/api/query_api.rs (8 hunks)
  • src/common/inference/query_requests_rest.rs (15 hunks)
  • src/common/inference/update_requests.rs (13 hunks)
✅ Files skipped from review due to trivial changes (1)
  • lib/collection/src/operations/universal_query/collection_query.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/actix/api/query_api.rs
🔇 Additional comments (37)
lib/collection/src/operations/types.rs (1)

1040-1043: Excellent addition of inference usage tracking to error reporting

This change adds an optional usage field to the InferenceError variant, allowing the system to track and propagate inference usage metrics alongside error descriptions. This is valuable for comprehensive error reporting, accurate billing, and resource monitoring, even when operations fail.

lib/storage/src/content_manager/errors.rs (4)

5-5: LGTM: Import added for InferenceUsage

Adding this import properly supports the changes to the error types to include inference usage data.


40-43: Good extension of the InferenceError variant

This change maintains consistency with the corresponding change to CollectionError in the first file, ensuring that inference usage data can be propagated through the error handling chain.


52-59: Appropriate update to the inference_error constructor

The constructor method is properly updated to accept and store the new usage parameter, providing a clean API for creating inference errors with associated usage data.


159-161: Correct propagation of usage data in error mapping

This change ensures that inference usage data is not lost when errors are translated from CollectionError to StorageError, maintaining data integrity across the error handling chain.

lib/collection/src/operations/verification/query.rs (2)

7-9: LGTM: Import added for CollectionQueryRequestWithUsage

This import properly supports the new trait implementation added in this file.


139-187: Clean implementation of StrictModeVerification for the wrapper type

This implementation elegantly uses delegation to the nested request object, minimizing code duplication while ensuring consistent behavior between the two implementations. The wrapper pattern allows the system to carry inference usage data without disrupting existing logic.
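The wrapper-delegation pattern praised here can be sketched with simplified stand-ins; the trait and type names below are illustrative, not the actual Qdrant definitions:

```rust
// The outer type carries extra data (usage) while forwarding the
// verification trait to the inner request it wraps.
trait StrictModeCheck {
    fn unbounded_limit(&self) -> Option<usize>;
}

struct QueryRequest {
    limit: Option<usize>,
}

impl StrictModeCheck for QueryRequest {
    fn unbounded_limit(&self) -> Option<usize> {
        self.limit
    }
}

struct QueryRequestWithUsage {
    request: QueryRequest,
    usage: Option<u64>,
}

impl StrictModeCheck for QueryRequestWithUsage {
    fn unbounded_limit(&self) -> Option<usize> {
        self.request.unbounded_limit() // pure delegation to the inner request
    }
}

fn main() {
    let wrapped = QueryRequestWithUsage {
        request: QueryRequest { limit: Some(10) },
        usage: Some(123),
    };
    assert_eq!(wrapped.unbounded_limit(), Some(10));
    assert_eq!(wrapped.usage, Some(123));
}
```

Delegation keeps the strict-mode logic in one place: the wrapper adds usage tracking without re-implementing any verification rules.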

src/common/inference/query_requests_rest.rs (4)

4-6: LGTM: Imports added for wrapped request types

These imports properly support the changes to return types in the conversion functions.


25-25: Appropriate return type updates for conversion functions

Changing the return types to include the wrapped request types with usage data ensures that inference usage metrics are propagated through the request processing pipeline.

Also applies to: 104-104


40-51: Good extraction of inference usage data

The code now properly extracts both the inference results and usage data from the batch processing, making this information available for further processing and error reporting.

Also applies to: 106-107


95-98: Correct wrapping of requests with usage data

The code properly wraps the original request objects with usage data at the end of the conversion functions, preserving the existing logic flow while adding the new functionality.

Also applies to: 163-166

src/common/inference/update_requests.rs (26)

3-3: LGTM: Import added for the new InferenceUsage type.

Good addition of the necessary import for the new InferenceUsage type from the REST models.


21-21: Function signature updated to return inference usage data.

The return type has been appropriately updated to include the optional InferenceUsage alongside the converted point structs.


45-52: LGTM: Properly destructuring inference results into data and usage.

Good refactoring to destructure the tuple returned from the batch inference operation into (inferred_data, usage), properly capturing both the inference results and usage metrics.


67-67: Better variable naming for clarity.

Changing the variable name from vector to vector_data improves code clarity by making it explicit that this is the data being processed, not just a reference.


76-77: Error message improvements.

The error message has been enhanced to provide more specific context about the failure scenario, and properly includes the None parameter for the optional inference usage.


87-98: Improved variable naming and error handling.

Good renaming of vector to vector_data for clarity and proper error messaging with context about document inference.


109-118: Consistent variable naming pattern applied to image processing.

Maintaining the consistent pattern of using vector_data for clarity in the image processing branch.


129-138: Consistent variable naming pattern applied to object processing.

The same pattern of using vector_data is applied to the object processing branch, maintaining consistency throughout the code.


159-159: LGTM: Return type matches updated function signature.

The return statement now correctly includes both the converted points and the optional inference usage data.


165-165: Function signature updated for batch conversion.

The return type has been appropriately updated to include the optional InferenceUsage alongside the converted batch data.


172-173: Initialize inference usage tracking.

Good initialization of the optional inference usage tracking variable that will be populated during the batch processing.


174-207: Comprehensive batch processing with inference tracking.

The batch processing logic has been significantly enhanced to track inference usage while handling different vector types. The error messages for unsupported direct processing operations provide clear guidance.


209-215: LGTM: Clean batch construction and result return.

The batch construction and result return are correctly implemented to include the inference usage data.


222-222: Function signature updated for point vectors conversion.

The return type has been appropriately updated to include the optional InferenceUsage alongside the converted point vectors.


238-245: Consistent pattern for handling inference results and usage.

The same pattern used in other functions is applied here to destructure the tuple into inference results and usage data.


266-267: Improved error message for named vectors.

The error message has been enhanced to provide more specific context about failures in named vectors, with proper inclusion of the optional inference usage parameter.


279-281: Consistent error message improvements for direct processing.

Error messages have been enhanced consistently across different vector types to provide more specific context about unsupported direct processing operations.

Also applies to: 285-287, 291-293


297-301: Clean construction of persisted point vectors.

The construction of persisted point vectors is clean and unchanged, as expected since it doesn't directly involve inference usage.


303-303: LGTM: Return type matches updated function signature.

The return statement now correctly includes both the converted point vectors and the optional inference usage data.


323-324: Consistent variable naming in internal function.

The internal function convert_point_struct_with_inferred also benefits from the improved variable naming from vector to vector_data.


330-332: Improved code formatting for vector data processing.

The code formatting for vector data processing has been improved for better readability and consistency across different vector types.

Also applies to: 345-346, 359-360


387-387: Function signature updated for vectors conversion.

The return type has been appropriately updated to include the optional InferenceUsage alongside the converted vectors.


398-405: Consistent pattern for handling inference results and usage.

The same pattern used in other functions is applied here to destructure the tuple into inference results and usage data.


407-424: Improved vector conversion with inference usage tracking.

The vector conversion logic has been enhanced to include inference usage tracking while maintaining the same overall structure and error handling.


425-425: LGTM: Return type matches updated function signature.

The return statement now correctly maps the result to include both the converted vectors and the optional inference usage data.


439-440: Consistent error handling in vector conversion.

Error handling in the vector conversion function is consistent and provides clear context about missing inferred vectors for different data types.

Also applies to: 444-446, 450-452

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 12

♻️ Duplicate comments (3)
lib/api/src/rest/models.rs (1)

57-59: 🛠️ Refactor suggestion

Add skip_serializing_if attribute to inference field for consistency.

The hardware field has #[serde(skip_serializing_if = "Option::is_none")] to avoid serializing null values, but the inference field is missing this attribute. This inconsistency could lead to different serialization behavior.

Apply this diff to maintain consistency:

 #[serde(skip_serializing_if = "Option::is_none")]
 pub hardware: Option<HardwareUsage>,
+#[serde(skip_serializing_if = "Option::is_none")]
 pub inference: Option<InferenceUsage>,
lib/storage/src/content_manager/toc/telemetry.rs (1)

23-35: Apply the same error handling improvement here.

This method has the same silent error handling pattern as get_telemetry_data.

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)

114-117: Performance consideration for cloning sparse vectors.

Similar to the previous comment, cloning sparse vectors here could impact performance for large vectors.

🧹 Nitpick comments (25)
tests/consensus_tests/test_consensus_compaction.py (1)

132-132: Good improvement for test reliability.

Adding ?wait=true ensures the metadata operation completes before proceeding, which improves test reliability in consensus scenarios.

Consider whether the sleep(CONSENSUS_WAIT_SECONDS) on line 134 is still necessary now that we're explicitly waiting for the operation to complete with wait=true.

lib/segment/src/segment_constructor/segment_builder.rs (2)

100-109: Consider cleaner conditional compilation approach.

The conditional compilation for RocksDB-specific parameters works but could be more readable. The placement of #[cfg(feature = "rocksdb")] on individual parameters makes the function call harder to read.

Consider using conditional blocks or feature-specific wrapper functions for better readability:

-            let vector_storage = open_vector_storage(
-                #[cfg(feature = "rocksdb")]
-                &mut db_builder,
-                vector_config,
-                #[cfg(feature = "rocksdb")]
-                &Default::default(),
-                &vector_storage_path,
-                #[cfg(feature = "rocksdb")]
-                vector_name,
-            )?;
+            let vector_storage = {
+                #[cfg(feature = "rocksdb")]
+                {
+                    open_vector_storage(
+                        &mut db_builder,
+                        vector_config,
+                        &Default::default(),
+                        &vector_storage_path,
+                        vector_name,
+                    )?
+                }
+                #[cfg(not(feature = "rocksdb"))]
+                {
+                    open_vector_storage(vector_config, &vector_storage_path)?
+                }
+            };

123-132: Same readability concern with conditional compilation.

Similar to the open_vector_storage call, the conditional compilation here could be more readable with block-level conditionals rather than parameter-level attributes.

lib/storage/src/content_manager/snapshots/recover.rs (1)

172-189:

❓ Verification inconclusive

Consider batch operation for payload index registration.

The current implementation registers each payload index field via separate consensus operations. While functionally correct, this could lead to performance issues with large schemas.

Consider if the consensus system supports batch operations for payload index registration:

-            for (field_name, field_schema) in schema.iter() {
-                let consensus_op =
-                    CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
-                        collection_name: collection_pass.to_string(),
-                        field_name: field_name.clone(),
-                        field_schema: field_schema.clone(),
-                    });
-
-                dispatcher
-                    .submit_collection_meta_op(consensus_op, access.clone(), None)
-                    .await?;
-            }
+            // If batch operations are supported:
+            let batch_ops: Vec<_> = schema.iter().map(|(field_name, field_schema)| {
+                CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
+                    collection_name: collection_pass.to_string(),
+                    field_name: field_name.clone(),
+                    field_schema: field_schema.clone(),
+                })
+            }).collect();
+            
+            dispatcher.submit_batch_collection_meta_ops(batch_ops, access.clone(), None).await?;

Let me check if batch consensus operations are available:


🏁 Script executed:

#!/bin/bash
# Check if batch consensus operations are supported in the dispatcher
ast-grep --pattern 'impl $_ for Dispatcher {
  $$$
  submit_batch_$_($$$) {
    $$$
  }
  $$$
}'

Length of output: 98


Dispatcher Lacks Batch Meta-Op API

I didn’t find any submit_batch_* method in Dispatcher, so batch‐submitting payload index operations isn’t currently supported. You have two paths forward:

  • Verify consensus support: Confirm whether the underlying consensus service can handle batch index‐creation requests.
  • Add batch API if needed: If it does, consider extending Dispatcher with a submit_batch_collection_meta_ops method before applying the diff below. Otherwise, retaining the per‐field loop is the only option today.

File: lib/storage/src/content_manager/snapshots/recover.rs
Lines: 172–189

-            for (field_name, field_schema) in schema.iter() {
-                let consensus_op =
-                    CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
-                        collection_name: collection_pass.to_string(),
-                        field_name: field_name.clone(),
-                        field_schema: field_schema.clone(),
-                    });
-
-                dispatcher
-                    .submit_collection_meta_op(consensus_op, access.clone(), None)
-                    .await?;
-            }
+            // Batch operation isn’t available in Dispatcher today.
+            // If you add `submit_batch_collection_meta_ops` to Dispatcher,
+            // you can replace this loop with a single batch call:
+            //
+            // let batch_ops: Vec<_> = schema.iter().map(|(field_name, field_schema)| {
+            //     CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
+            //         collection_name: collection_pass.to_string(),
+            //         field_name: field_name.clone(),
+            //         field_schema: field_schema.clone(),
+            //     })
+            // }).collect();
+            // dispatcher
+            //     .submit_batch_collection_meta_ops(batch_ops, access.clone(), None)
+            //     .await?;
+            for (field_name, field_schema) in schema.iter() {
+                dispatcher
+                    .submit_collection_meta_op(
+                        CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
+                            collection_name: collection_pass.to_string(),
+                            field_name: field_name.clone(),
+                            field_schema: field_schema.clone(),
+                        }),
+                        access.clone(),
+                        None,
+                    )
+                    .await?;
+            }

Please verify consensus capabilities or implement the batch API before adopting this change.

lib/segment/Cargo.toml (1)

26-28: Important: Document the partial RocksDB disabling

The comment correctly notes that disabling the RocksDB feature only partially disables RocksDB usage. This is valuable documentation for future work.

Would you like me to help identify remaining RocksDB dependencies that need to be gated behind this feature flag?

tests/consensus_tests/test_force_delete_peer.py (1)

39-42: Simplified test flow is more maintainable.

The direct use of replicate_shard helper function makes the test more focused and easier to understand compared to the previous complex flow involving process management.

Note: The static analysis tool flags replicate_shard as potentially undefined, but based on the relevant code snippets, it's correctly defined in utils.py and imported via the star import on line 2. Consider replacing the star import with explicit imports for better code clarity and to avoid static analysis warnings:

-from .utils import *
+from .utils import start_cluster, wait_collection_exists_and_active_on_all_peers, wait_for_collection_shard_transfers_count, assert_project_root, replicate_shard
🧰 Tools
🪛 Ruff (0.11.9)

42-42: replicate_shard may be undefined, or defined from star imports

(F405)

lib/storage/src/content_manager/toc/mod.rs (1)

10-10: Good modularization of telemetry functionality.

Adding the telemetry module improves code organization by separating telemetry-related functionality into a dedicated module. This aligns with the refactoring described in the AI summary where get_telemetry_data and get_aggregated_telemetry_data methods were moved out of the main TableOfContent implementation.

The module is appropriately placed in alphabetical order within the module declarations.

Note: While the AI summary indicates that telemetry methods were removed from this file, those removals are not visible in the provided diff, so I cannot review the actual method removals or verify the refactoring is complete.

lib/segment/src/segment_constructor/rocksdb_builder.rs (1)

34-46: Clean up unused variable warning in conditional compilation.

The conditional compilation pattern correctly gates RocksDB-specific logic, but the fallback path creates unused variables that will trigger compiler warnings.

Apply this diff to eliminate the unused variable warning:

                .filter_map(|(vector_name, config)| {
                    #[cfg(feature = "rocksdb")]
                    if matches!(
                        config.storage_type,
                        crate::types::SparseVectorStorageType::OnDisk
                    ) {
                        return Some(get_vector_name_with_prefix(
                            rocksdb_wrapper::DB_VECTOR_CF,
                            vector_name,
                        ));
                    }
-                    let (_, _) = (vector_name, config);
+                    #[cfg(not(feature = "rocksdb"))]
+                    let _ = (vector_name, config);
                    None
                });
lib/storage/src/content_manager/toc/telemetry.rs (1)

8-21: Consider logging errors when collection retrieval fails.

The current implementation silently skips collections that fail to load. This could hide important issues like corrupted collections or permission problems.

Consider adding error logging to help with debugging:

 for collection_pass in &all_collections {
-    if let Ok(collection) = self.get_collection(collection_pass).await {
+    match self.get_collection(collection_pass).await {
+        Ok(collection) => {
         result.push(collection.get_telemetry_data(detail).await);
+        }
+        Err(e) => {
+            log::warn!("Failed to get telemetry data for collection {}: {}", collection_pass.name(), e);
+        }
     }
 }
tests/consensus_tests/test_partial_snapshot.py (1)

46-62: Test logic is sound but consider making it more robust.

The test correctly validates that search requests are rejected during recovery, but the polling approach could be improved for reliability.

Consider adding a small delay and timeout mechanism to make the test more robust:

 def test_partial_snapshot_read_lock(tmp_path: pathlib.Path):
     assert_project_root()
 
     write_peer, read_peer = bootstrap_peers(tmp_path, 100_000)
 
     executor = concurrent.futures.ThreadPoolExecutor(max_workers = 1)
     recover_future = executor.submit(recover_partial_snapshot_from, read_peer, write_peer)
 
     is_search_rejected = False
+    attempts = 0
+    max_attempts = 50  # Prevent infinite loop
-    while not recover_future.done():
+    while not recover_future.done() and attempts < max_attempts:
         response = try_search_random(read_peer)
         if response.status_code == 500:
             is_search_rejected = True
             break
+        attempts += 1
+        time.sleep(0.1)  # Small delay between attempts
 
     assert is_search_rejected, "Search requests have to be rejected during partial snapshot recovery"
🧰 Tools
🪛 Ruff (0.11.9)

47-47: assert_project_root may be undefined, or defined from star imports

(F405)

lib/collection/src/shards/shard_holder/mod.rs (1)

1020-1023: Consider removing the underscore prefix from the lock variable.

The _partial_snapshot_search_lock variable holds an important lock guard that ensures exclusive access during recovery. Consider removing the underscore prefix to make it clear that this variable serves an important purpose through its RAII behavior.

-        let _partial_snapshot_search_lock = self
+        let partial_snapshot_search_lock = self
             .take_partial_snapshot_search_write_lock(shard_id, recovery_type)
             .await?;
lib/posting_list/src/iterator.rs (1)

45-49: Consider using the visitor's public API instead of accessing internal fields.

The implementation is correct, but accessing visitor.list directly breaks encapsulation. Consider using self.visitor.len() instead.

 impl<H: ValueHandler> ExactSizeIterator for PostingIterator<'_, H> {
     fn len(&self) -> usize {
-        self.visitor.list.len().saturating_sub(self.offset)
+        self.visitor.len().saturating_sub(self.offset)
     }
 }
lib/posting_list/src/visitor.rs (1)

67-108: Consider adding bounds check optimization

The get_by_offset method performs bounds checking, but you could optimize the chunk boundary cases by caching the total chunk count.

lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)

60-70: Consider more specific error types

The error returned when failing to acquire the search read lock uses a generic ServiceError. Consider creating a more specific error variant for this case to help with error handling and debugging.

-            .map_err(|_| CollectionError::ServiceError {
-                error: "shard unavailable, partial snapshot recovery is in progress".into(),
-                backtrace: None,
-            })
+            .map_err(|_| CollectionError::ShardUnavailable {
+                reason: "partial snapshot recovery in progress".into(),
+            })

107-112: Potential integer overflow in request counter

The request counter uses fetch_add without checking for overflow. While wraparound is unlikely in practice, note that fetch_add alone cannot detect it; a checked or saturating increment (for example via fetch_update with checked_add) would make the behavior explicit if the counter could ever wrap around.
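A saturating increment via fetch_update is one way to guard against wraparound. A minimal sketch; the increment_saturating helper is hypothetical and not part of the codebase:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical sketch: a request counter that saturates instead of wrapping.
// `fetch_update` retries on contention and skips the increment when the
// counter is already at its maximum; it returns the previous value either way.
fn increment_saturating(counter: &AtomicUsize) -> usize {
    counter
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |n| n.checked_add(1))
        .unwrap_or(usize::MAX)
}

fn main() {
    let counter = AtomicUsize::new(0);
    assert_eq!(increment_saturating(&counter), 0); // returns previous value
    assert_eq!(counter.load(Ordering::Relaxed), 1);

    let maxed = AtomicUsize::new(usize::MAX);
    // At the maximum, the closure returns None and the counter stays put.
    increment_saturating(&maxed);
    assert_eq!(maxed.load(Ordering::Relaxed), usize::MAX);
}
```

The trade-off is an extra compare-and-swap loop under contention, which is usually acceptable for a diagnostics counter.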

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (1)

501-638: Well-implemented healing search algorithm with room for optimization.

The search_shortcuts_on_level method effectively implements the dual-graph search for finding border points between deleted and non-deleted regions. The implementation is correct and well-documented.

Consider addressing the TODO comments (lines 548-550) about using a priority queue instead of a stack for potentially better search navigation in the deleted sub-graph.
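As a rough standalone sketch of the distinction the TODO points at: a max-heap always pops the best-scoring candidate next, whereas a stack pops the most recently pushed one regardless of score. The (score, point_id) tuples below are illustrative, not taken from the HNSW code:

```rust
use std::collections::BinaryHeap;

fn main() {
    // Candidates as (score, point_id); BinaryHeap is a max-heap on the tuple,
    // so the highest-scoring candidate is visited first (best-first order).
    let mut heap = BinaryHeap::new();
    for cand in [(10, 1u32), (42, 2), (7, 3)] {
        heap.push(cand);
    }
    assert_eq!(heap.pop(), Some((42, 2))); // best candidate first

    // A stack over the same candidates visits in LIFO order instead,
    // ignoring scores entirely.
    let mut stack = vec![(10, 1u32), (42, 2), (7, 3)];
    assert_eq!(stack.pop(), Some((7, 3))); // most recently pushed first
}
```

Best-first traversal tends to reach good border points sooner at the cost of heap maintenance per push/pop.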

lib/segment/src/index/hnsw_index/hnsw.rs (1)

1582-1657: Complex but well-structured point migration with healing.

The migrate_point method correctly implements the healing logic for migrating points with missing neighbors. The use of search_shortcuts_on_level to find replacement links is appropriate.

Consider adding more detailed documentation about the healing heuristics and when they're applied.

lib/posting_list/src/posting_list.rs (1)

32-33: Document the purpose of #[repr(C)] on PostingChunk

The #[repr(C)] attribute suggests this struct might be used for memory-mapped files, FFI, or requires a stable memory layout. Consider adding a comment explaining why this representation is needed.
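For illustration, a self-contained sketch of what #[repr(C)] guarantees: fields are laid out in declaration order with C-compatible padding, which is what memory-mapped or on-disk access relies on. The field names below are hypothetical and not taken from the actual PostingChunk definition:

```rust
// Sketch: #[repr(C)] fixes field order and layout. Without it, the Rust
// compiler is free to reorder fields, which would break any code that maps
// bytes from disk directly onto the struct.
#[repr(C)]
struct PostingChunk {
    initial_id: u32,
    offset: u32,
}

fn main() {
    // With repr(C), fields sit at predictable offsets in declaration order.
    assert_eq!(std::mem::offset_of!(PostingChunk, initial_id), 0);
    assert_eq!(std::mem::offset_of!(PostingChunk, offset), 4);
    assert_eq!(std::mem::size_of::<PostingChunk>(), 8);
}
```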

lib/posting_list/src/builder.rs (1)

41-42: Consider using stable sort for deterministic behavior

Using sort_unstable_by_key means elements with duplicate IDs may be reordered non-deterministically. If deterministic ordering is important for reproducibility or testing, consider using sort_by_key instead.
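A minimal standalone illustration of the difference: with duplicate keys, a stable sort preserves the original relative order of the tied elements, while sort_unstable_by_key makes no such guarantee.

```rust
fn main() {
    // Two entries share id 2 and two share id 1.
    let mut items = vec![(2u32, "b"), (1, "a"), (2, "a"), (1, "b")];

    // Stable sort: ties keep their original insertion order.
    items.sort_by_key(|&(id, _)| id);
    assert_eq!(items, vec![(1, "a"), (1, "b"), (2, "b"), (2, "a")]);

    // `sort_unstable_by_key` would also group by id, but the relative
    // order within each id group is unspecified.
}
```

Stable sorting costs some extra allocation, which is usually negligible next to deterministic builder output.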

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)

60-73: Consider clarifying the early return logic for non-existent vectors.

The condition at line 61 returns false when trying to undelete a non-existent vector, which might be confusing since false typically indicates the vector was not previously deleted. Consider either documenting this edge case or returning a more explicit error/option type.


158-161: Clarify the deletion logic for better readability.

The negation in line 159 is correct but confusing. Consider using a more descriptive variable name or adding a comment to clarify that we're inverting the previous state to indicate whether the deletion was successful.

     fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-        let is_deleted = !self.set_deleted(key, true);
-        Ok(is_deleted)
+        let was_previously_deleted = self.set_deleted(key, true);
+        let deletion_was_successful = !was_previously_deleted;
+        Ok(deletion_was_successful)
     }
lib/posting_list/src/view.rs (1)

124-126: Document safety requirements for unchecked access.

Since this method provides unchecked access, consider adding a safety comment documenting the preconditions that callers must ensure.

+    /// Returns sized values for the given chunk without bounds checking.
+    /// 
+    /// # Safety
+    /// Caller must ensure that chunk_idx is within bounds of self.chunks.
     pub(crate) fn sized_values_unchecked(&self, chunk_idx: usize) -> &[H::Sized] {
         &self.chunks[chunk_idx].sized_values
     }
lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)

269-284: Clean conditional compilation for storage creation!

The RocksDB-specific imports are properly scoped within the feature flag.

Consider adding a compile-time error or panic for unsupported configurations:

 match storage_type {
     #[cfg(feature = "rocksdb")]
     MultiDenseStorageType::RocksDbFloat => {
         // existing code...
     }
+    #[cfg(not(feature = "rocksdb"))]
+    MultiDenseStorageType::RocksDbFloat => {
+        panic!("RocksDB storage requested but rocksdb feature not enabled");
+    }
     MultiDenseStorageType::AppendableMmapFloat => // existing code...
 }
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)

86-86: Consider documenting the performance implications of cloning.

The cloned() operation on sparse vectors could be expensive for vectors with many non-zero elements. Consider adding a comment explaining why cloning is necessary here for ownership in volatile storage.

         } else {
+            // Clone is necessary as we need to own the vector in volatile storage
             *entry = vector.cloned();
         }
lib/segment/src/segment_constructor/segment_constructor_base.rs (1)

720-745: Consider documenting the migration strategy.

The logic for choosing between RocksDB and file-based mutable trackers is crucial for backward compatibility. Consider adding a comment explaining the migration path and when the RocksDB check can be removed.

     #[cfg(feature = "rocksdb")]
     {
         use crate::common::rocksdb_wrapper::DB_MAPPING_CF;
 
+        // Migration logic: Use RocksDB tracker if there are existing mappings in the database.
+        // This ensures backward compatibility with segments created before the file-based tracker.
+        // TODO(1.15 or later): remove this check and use new mutable ID tracker unconditionally
         let use_rocksdb_mutable_tracker = if let Some(db) = db_builder.read() {
-            // New ID tracker is enabled by default, but we still use the old tracker if we have
-            // any mappings stored in RocksDB
-            //
-            // TODO(1.15 or later): remove this check and use new mutable ID tracker unconditionally
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d043cd1 and 0f109ff.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (100)
  • Cargo.toml (7 hunks)
  • Dockerfile (1 hunks)
  • config/config.yaml (1 hunks)
  • docs/redoc/master/openapi.json (5 hunks)
  • lib/api/src/rest/conversions.rs (2 hunks)
  • lib/api/src/rest/models.rs (3 hunks)
  • lib/collection/Cargo.toml (1 hunks)
  • lib/collection/src/collection/snapshots.rs (1 hunks)
  • lib/collection/src/collection_manager/holders/segment_holder.rs (8 hunks)
  • lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (8 hunks)
  • lib/collection/src/operations/types.rs (3 hunks)
  • lib/collection/src/operations/verification/query.rs (7 hunks)
  • lib/collection/src/problems/unindexed_field.rs (11 hunks)
  • lib/collection/src/shards/local_shard/mod.rs (2 hunks)
  • lib/collection/src/shards/replica_set/execute_read_operation.rs (2 hunks)
  • lib/collection/src/shards/replica_set/mod.rs (5 hunks)
  • lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (1 hunks)
  • lib/collection/src/shards/replica_set/snapshots.rs (3 hunks)
  • lib/collection/src/shards/replica_set/telemetry.rs (2 hunks)
  • lib/collection/src/shards/shard_holder/mod.rs (2 hunks)
  • lib/collection/src/shards/telemetry.rs (2 hunks)
  • lib/collection/src/tests/mod.rs (5 hunks)
  • lib/collection/src/update_handler.rs (2 hunks)
  • lib/common/common/src/fixed_length_priority_queue.rs (2 hunks)
  • lib/common/common/src/flags.rs (3 hunks)
  • lib/gridstore/benches/bitmask_bench.rs (1 hunks)
  • lib/gridstore/benches/real_data_bench.rs (1 hunks)
  • lib/gridstore/src/bitmask/gaps.rs (1 hunks)
  • lib/posting_list/Cargo.toml (1 hunks)
  • lib/posting_list/src/builder.rs (1 hunks)
  • lib/posting_list/src/iterator.rs (1 hunks)
  • lib/posting_list/src/lib.rs (1 hunks)
  • lib/posting_list/src/posting_list.rs (1 hunks)
  • lib/posting_list/src/tests.rs (1 hunks)
  • lib/posting_list/src/value_handler.rs (1 hunks)
  • lib/posting_list/src/view.rs (1 hunks)
  • lib/posting_list/src/visitor.rs (1 hunks)
  • lib/segment/Cargo.toml (4 hunks)
  • lib/segment/benches/dynamic_mmap_flags.rs (1 hunks)
  • lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs (1 hunks)
  • lib/segment/src/fixtures/payload_fixtures.rs (1 hunks)
  • lib/segment/src/id_tracker/id_tracker_base.rs (22 hunks)
  • lib/segment/src/id_tracker/mod.rs (2 hunks)
  • lib/segment/src/id_tracker/simple_id_tracker.rs (2 hunks)
  • lib/segment/src/index/field_index/full_text_index/text_index.rs (1 hunks)
  • lib/segment/src/index/field_index/mod.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/gpu/mod.rs (2 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers.rs (1 hunks)
  • lib/segment/src/index/hnsw_index/graph_layers_builder.rs (6 hunks)
  • lib/segment/src/index/hnsw_index/hnsw.rs (11 hunks)
  • lib/segment/src/index/hnsw_index/search_context.rs (1 hunks)
  • lib/segment/src/segment_constructor/rocksdb_builder.rs (3 hunks)
  • lib/segment/src/segment_constructor/segment_builder.rs (3 hunks)
  • lib/segment/src/segment_constructor/segment_constructor_base.rs (10 hunks)
  • lib/segment/src/spaces/simple_avx.rs (1 hunks)
  • lib/segment/src/spaces/simple_sse.rs (1 hunks)
  • lib/segment/src/types.rs (5 hunks)
  • lib/segment/src/vector_storage/common.rs (2 hunks)
  • lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (7 hunks)
  • lib/segment/src/vector_storage/dense/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/multi_dense/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/quantized/quantized_vectors.rs (2 hunks)
  • lib/segment/src/vector_storage/raw_scorer.rs (2 hunks)
  • lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/mod.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2 hunks)
  • lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (3 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (12 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (8 hunks)
  • lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2 hunks)
  • lib/segment/src/vector_storage/vector_storage_base.rs (44 hunks)
  • lib/storage/src/content_manager/conversions.rs (1 hunks)
  • lib/storage/src/content_manager/errors.rs (6 hunks)
  • lib/storage/src/content_manager/snapshots/recover.rs (3 hunks)
  • lib/storage/src/content_manager/toc/create_collection.rs (1 hunks)
  • lib/storage/src/content_manager/toc/mod.rs (1 hunks)
  • lib/storage/src/content_manager/toc/telemetry.rs (1 hunks)
  • lib/storage/src/types.rs (1 hunks)
  • lib/storage/tests/integration/alias_tests.rs (1 hunks)
  • src/actix/api/query_api.rs (7 hunks)
  • src/actix/api/snapshot_api.rs (3 hunks)
  • src/actix/helpers.rs (7 hunks)
  • src/common/inference/update_requests.rs (13 hunks)
  • src/common/snapshots.rs (2 hunks)
  • src/common/telemetry_ops/collections_telemetry.rs (2 hunks)
  • src/common/update.rs (10 hunks)
  • tests/consensus_tests/test_consensus_compaction.py (1 hunks)
  • tests/consensus_tests/test_force_delete_peer.py (1 hunks)
  • tests/consensus_tests/test_partial_snapshot.py (3 hunks)
  • tests/consensus_tests/test_shard_wal_delta_transfer.py (4 hunks)
  • tests/consensus_tests/utils.py (0 hunks)
  • tests/openapi/test_service.py (1 hunks)
  • tests/openapi/test_strictmode.py (12 hunks)
💤 Files with no reviewable changes (1)
  • tests/consensus_tests/utils.py
✅ Files skipped from review due to trivial changes (15)
  • lib/storage/tests/integration/alias_tests.rs
  • lib/segment/benches/dynamic_mmap_flags.rs
  • Dockerfile
  • lib/segment/src/vector_storage/mod.rs
  • lib/gridstore/src/bitmask/gaps.rs
  • lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs
  • lib/collection/src/shards/replica_set/mod.rs
  • lib/segment/src/index/field_index/mod.rs
  • lib/segment/src/vector_storage/dense/mod.rs
  • lib/posting_list/Cargo.toml
  • lib/segment/src/vector_storage/sparse/mod.rs
  • lib/segment/src/vector_storage/multi_dense/mod.rs
  • lib/segment/src/id_tracker/id_tracker_base.rs
  • lib/collection/Cargo.toml
  • lib/common/common/src/fixed_length_priority_queue.rs
🚧 Files skipped from review as they are similar to previous changes (9)
  • lib/collection/src/operations/types.rs
  • lib/api/src/rest/conversions.rs
  • src/actix/api/query_api.rs
  • lib/storage/src/content_manager/errors.rs
  • src/actix/helpers.rs
  • src/common/update.rs
  • docs/redoc/master/openapi.json
  • src/common/inference/update_requests.rs
  • lib/collection/src/operations/verification/query.rs
🧰 Additional context used
🧠 Learnings (1)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.
🧬 Code Graph Analysis (17)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
  • new_volatile_sparse_vector_storage (34-36)
lib/collection/src/shards/replica_set/telemetry.rs (1)
src/actix/api/service_api.rs (1)
  • telemetry (36-59)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/index/hnsw_index/search_context.rs (1)
  • new (16-21)
tests/consensus_tests/test_force_delete_peer.py (2)
tests/consensus_tests/fixtures.py (1)
  • upsert_random_points (83-129)
tests/consensus_tests/utils.py (1)
  • replicate_shard (793-802)
tests/consensus_tests/test_shard_wal_delta_transfer.py (1)
tests/consensus_tests/utils.py (1)
  • wait_for_collection_shard_transfers_count (616-622)
lib/collection/src/collection/snapshots.rs (3)
lib/collection/src/shards/replica_set/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (70-74)
lib/collection/src/shards/shard_holder/mod.rs (1)
  • try_take_partial_snapshot_recovery_lock (1072-1088)
src/common/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (318-337)
lib/collection/src/shards/telemetry.rs (1)
lib/segment/src/common/anonymize.rs (13)
  • anonymize (45-45)
  • anonymize (49-51)
  • anonymize (55-57)
  • anonymize (61-63)
  • anonymize (67-71)
  • anonymize (75-79)
  • anonymize (115-119)
  • anonymize (123-127)
  • anonymize (131-135)
  • anonymize (139-148)
  • anonymize (152-154)
  • anonymize (158-162)
  • anonymize (166-181)
src/actix/api/snapshot_api.rs (4)
lib/collection/src/shards/replica_set/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (70-74)
lib/collection/src/shards/shard_holder/mod.rs (1)
  • try_take_partial_snapshot_recovery_lock (1072-1088)
src/common/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (318-337)
src/actix/helpers.rs (2)
  • process_response_error (75-101)
  • time (108-114)
lib/posting_list/src/value_handler.rs (2)
lib/posting_list/src/lib.rs (1)
  • from_bytes (29-29)
lib/posting_list/src/tests.rs (1)
  • from_bytes (22-25)
lib/storage/src/content_manager/toc/mod.rs (1)
src/actix/api/service_api.rs (1)
  • telemetry (36-59)
lib/collection/src/shards/replica_set/snapshots.rs (4)
lib/collection/src/collection/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (340-349)
lib/collection/src/shards/shard_holder/mod.rs (1)
  • try_take_partial_snapshot_recovery_lock (1072-1088)
src/common/snapshots.rs (1)
  • try_take_partial_snapshot_recovery_lock (318-337)
lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)
  • take_search_write_lock (56-58)
  • try_take_search_read_lock (60-70)
lib/posting_list/src/iterator.rs (3)
lib/posting_list/src/posting_list.rs (1)
  • visitor (82-85)
lib/posting_list/src/view.rs (2)
  • visitor (68-70)
  • len (200-202)
lib/posting_list/src/visitor.rs (2)
  • new (22-28)
  • len (30-32)
lib/api/src/rest/models.rs (2)
lib/segment/src/common/anonymize.rs (13)
  • anonymize (45-45)
  • anonymize (49-51)
  • anonymize (55-57)
  • anonymize (61-63)
  • anonymize (67-71)
  • anonymize (75-79)
  • anonymize (115-119)
  • anonymize (123-127)
  • anonymize (131-135)
  • anonymize (139-148)
  • anonymize (152-154)
  • anonymize (158-162)
  • anonymize (166-181)
lib/api/src/grpc/ops.rs (1)
  • is_empty (26-33)
lib/posting_list/src/posting_list.rs (4)
lib/posting_list/src/view.rs (1)
  • visitor (68-70)
lib/posting_list/src/iterator.rs (1)
  • new (16-22)
lib/posting_list/src/visitor.rs (1)
  • new (22-28)
lib/posting_list/src/builder.rs (1)
  • new (25-27)
tests/consensus_tests/test_partial_snapshot.py (4)
tests/consensus_tests/utils.py (1)
  • assert_project_root (110-113)
src/actix/api/snapshot_api.rs (1)
  • recover_partial_snapshot_from (676-779)
tests/consensus_tests/assertions.py (1)
  • assert_http_ok (6-13)
tests/consensus_tests/fixtures.py (1)
  • random_dense_vector (19-20)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)
lib/segment/src/vector_storage/vector_storage_base.rs (29)
  • size_of_available_vectors_in_bytes (152-154)
  • size_of_available_vectors_in_bytes (181-181)
  • size_of_available_vectors_in_bytes (469-540)
  • distance (47-47)
  • distance (644-691)
  • datatype (49-49)
  • datatype (693-740)
  • total_vector_count (56-56)
  • total_vector_count (793-840)
  • get_vector (71-71)
  • get_vector (842-889)
  • get_vector_sequential (74-74)
  • get_vector_sequential (891-938)
  • get_vector_opt (77-77)
  • get_vector_opt (940-987)
  • insert_vector (79-84)
  • insert_vector (989-1065)
  • update_from (92-96)
  • update_from (1067-1134)
  • flusher (98-98)
  • flusher (1136-1183)
  • files (100-100)
  • files (1185-1232)
  • delete_vector (109-109)
  • delete_vector (1234-1281)
  • is_deleted_vector (112-112)
  • is_deleted_vector (1283-1330)
  • deleted_vector_bitslice (134-134)
  • deleted_vector_bitslice (1381-1428)
lib/segment/src/common/operation_error.rs (1)
  • check_process_stopped (242-247)
lib/segment/src/data_types/named_vectors.rs (1)
  • default_sparse (99-101)
lib/segment/src/segment_constructor/segment_constructor_base.rs (3)
lib/segment/src/common/operation_error.rs (2)
  • check_process_stopped (242-247)
  • service_error (84-89)
lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)
  • open_simple_sparse_vector_storage (41-77)
lib/segment/src/id_tracker/immutable_id_tracker.rs (1)
  • mappings_file_path (345-347)
🪛 Ruff (0.11.9)
tests/consensus_tests/test_force_delete_peer.py

42-42: replicate_shard may be undefined, or defined from star imports

(F405)

tests/consensus_tests/test_shard_wal_delta_transfer.py

205-205: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


304-304: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


325-325: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


453-453: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)

tests/consensus_tests/test_partial_snapshot.py

36-36: assert_project_root may be undefined, or defined from star imports

(F405)


47-47: assert_project_root may be undefined, or defined from star imports

(F405)


128-128: assert_http_ok may be undefined, or defined from star imports

(F405)


142-142: assert_http_ok may be undefined, or defined from star imports

(F405)


148-148: random_dense_vector may be undefined, or defined from star imports

(F405)

🔇 Additional comments (160)
lib/gridstore/benches/real_data_bench.rs (1)

2-2: LGTM! Good modernization of black_box import.

The change from criterion::black_box to std::hint::black_box is a positive modernization that reduces dependency on the criterion crate. The black_box function became part of the standard library in Rust 1.66, making this the preferred approach for benchmarking.

Also applies to: 7-7
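A minimal sketch of the pattern the benches now use, assuming nothing beyond the standard library: `std::hint::black_box` is an identity function that is opaque to the optimizer, so wrapping benchmark inputs and outputs keeps the measured work from being constant-folded away.

```rust
use std::hint::black_box;

// Example workload; the function name and data are illustrative only.
fn sum_of_squares(values: &[u64]) -> u64 {
    values.iter().map(|v| v * v).sum()
}

fn main() {
    let data: Vec<u64> = (0..1000).collect();
    // Wrapping both the input and the result prevents the compiler from
    // precomputing the answer or discarding the unused computation.
    let result = black_box(sum_of_squares(black_box(&data)));
    println!("{result}");
}
```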

lib/gridstore/benches/bitmask_bench.rs (1)

1-1: LGTM! Consistent modernization across benchmark files.

The import change maintains consistency with the other benchmark files mentioned in the AI summary. Using std::hint::black_box is the modern standard approach and reduces external dependencies.

Also applies to: 4-4

lib/segment/src/spaces/simple_avx.rs (1)

12-15: LGTM! Clean removal of redundant unsafe blocks.

The horizontal sum implementation logic is correct and follows the standard pattern for reducing AVX vectors to scalars. Removing the redundant inner unsafe blocks improves code clarity since the entire function is already marked as unsafe.

lib/segment/src/spaces/simple_sse.rs (1)

14-16: LGTM! Consistent cleanup matching the AVX version.

The SSE horizontal sum implementation is correct and the removal of redundant inner unsafe blocks is consistent with the similar cleanup performed in the AVX version. This improves code consistency across SIMD implementations.

lib/segment/src/index/field_index/full_text_index/text_index.rs (1)

448-787: Test infrastructure looks comprehensive but unrelated to PR objectives.

This test module addition provides thorough testing for full-text index implementations across different storage backends. However, these changes appear unrelated to the PR's stated objective of "Add inference usage tracking to REST API".

The test implementation is well-structured with good coverage of different index types and scenarios.

Likely an incorrect or invalid review comment.

config/config.yaml (1)

241-244: New collection limit configuration is unrelated to PR objectives.

The max_collections parameter addition is a useful feature for limiting collection creation, but it doesn't align with the PR's stated goal of adding inference usage tracking to the REST API.

The configuration follows the established pattern and is properly documented.

Likely an incorrect or invalid review comment.

lib/storage/src/content_manager/conversions.rs (1)

48-48: Error handling addition is correct but unrelated to PR objectives.

The handling of StorageError::ShardUnavailable with tonic::Code::Unavailable is the appropriate gRPC status code for this error type. However, this change doesn't relate to the PR's stated purpose of adding inference usage tracking.

The implementation follows the existing pattern correctly.

Likely an incorrect or invalid review comment.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)

219-219: LGTM! Good refactoring to consolidate shared constants.

Moving SPARSE_VECTOR_DISTANCE to the parent module eliminates duplication and provides a single source of truth for this constant across sparse vector storage implementations.

lib/storage/src/types.rs (1)

107-109: Well-implemented configuration addition.

The new max_collections field follows best practices with proper serde annotations, clear documentation, and appropriate typing using Option<usize> for optional configuration.

lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)

177-177: Consistent with the sparse vector storage refactoring.

This change aligns with the consolidation of SPARSE_VECTOR_DISTANCE to the parent module, maintaining consistency across sparse vector storage implementations.

tests/openapi/test_service.py (2)

87-87: LGTM! Test correctly updated for new telemetry data.

The addition of 'partial_snapshot' to the expected shard keys appropriately reflects the new partial snapshot telemetry functionality mentioned in the AI summary.


87-87: Inconsistency between PR objectives and file changes.

The PR title mentions "Add inference usage tracking to REST API" but this file contains changes related to partial snapshot telemetry, not inference usage tracking. Please verify this change belongs in this PR.

Likely an incorrect or invalid review comment.

lib/common/common/src/flags.rs (3)

25-26: LGTM! Feature flag properly implemented.

The new hnsw_healing feature flag follows the established pattern with clear documentation and appropriate conservative default.


25-26: Inconsistency with PR objectives.

This change adds HNSW healing functionality, which doesn't align with the stated PR objective of "Add inference usage tracking to REST API". Please verify this change belongs in this PR.

Likely an incorrect or invalid review comment.


35-35: Implementation correctly integrated across all required locations.

The feature flag is properly:

  • Given a conservative default value of false
  • Included in the destructuring pattern for initialization
  • Enabled when the all flag is set

Also applies to: 54-54, 61-61
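The flag pattern described above can be sketched as follows. This is an illustrative stand-in, not the real struct in `lib/common/common/src/flags.rs`: the destructuring in `resolved` is what ensures a newly added flag cannot be forgotten when `all` is applied, because adding a field breaks the pattern until it is handled.

```rust
/// Simplified sketch; the real FeatureFlags has more fields.
#[derive(Debug, Clone, Copy, Default)]
struct FeatureFlags {
    /// Master switch that turns every feature on.
    all: bool,
    /// Conservative default: false unless explicitly enabled.
    hnsw_healing: bool,
}

impl FeatureFlags {
    /// Resolve the effective flags: `all` enables every feature.
    fn resolved(self) -> Self {
        // Destructure so the compiler flags any field missed here.
        let FeatureFlags { all, hnsw_healing } = self;
        FeatureFlags {
            all,
            hnsw_healing: hnsw_healing || all,
        }
    }
}

fn main() {
    assert!(!FeatureFlags::default().resolved().hnsw_healing);
    let everything = FeatureFlags { all: true, ..Default::default() };
    assert!(everything.resolved().hnsw_healing);
    println!("ok");
}
```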

lib/segment/src/vector_storage/common.rs (3)

3-4: LGTM! Proper conditional compilation for RocksDB features.

The serde imports are correctly gated behind the "rocksdb" feature flag, ensuring serialization traits are only available when needed.


17-18: Appropriate feature gating for RocksDB-specific types.

The StoredRecord<T> struct is correctly made conditional on the "rocksdb" feature, aligning with the modular build approach for RocksDB dependencies.


3-4: Inconsistency with PR objectives.

These changes relate to RocksDB feature gating, not inference usage tracking as stated in the PR objectives. Please verify this change belongs in this PR.

Also applies to: 17-18

Likely an incorrect or invalid review comment.

lib/collection/src/shards/replica_set/execute_read_operation.rs (3)

125-125: LGTM! Proper concurrency control for partial snapshot operations.

The addition of the partial snapshot search read lock appropriately prevents read operations during snapshot recovery phases, following RAII pattern with the lock guard.


160-160: Consistent locking implementation across read operation paths.

The partial snapshot search read lock is correctly applied in both local and cluster read operation code paths, ensuring consistent concurrency control.


125-125: Inconsistency with PR objectives.

These changes implement partial snapshot locking mechanisms, which don't align with the stated PR objective of "Add inference usage tracking to REST API". Please verify this change belongs in this PR.

Also applies to: 160-160

Likely an incorrect or invalid review comment.

lib/collection/src/collection/snapshots.rs (1)

340-349: Well-implemented delegation pattern.

The new method correctly delegates to the shards_holder and follows the established pattern seen in other snapshot-related methods in this file. The async signature and return type are consistent with the broader snapshot recovery locking mechanism.

lib/segment/src/segment_constructor/segment_builder.rs (1)

486-487: Proper feature flag usage for RocksDB ID tracker.

The conditional compilation for the RocksDB ID tracker match arm is correctly implemented and aligns with the broader effort to gate RocksDB-dependent code behind feature flags.

lib/segment/src/index/hnsw_index/search_context.rs (1)

16-21: Good simplification that improves separation of concerns.

Removing the entry_point parameter from the constructor and initializing empty collections is a clean improvement. This separates the creation of the search context from the insertion of the initial candidate, making the API more intuitive and flexible.
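The shape of the refactored API can be sketched like this. The internals are heavily simplified (a plain score heap stands in for the real scored-point structures), but it shows the key point: `new` takes only `ef`, starts with empty collections, and the entry point is inserted later through the same candidate path as everything else.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Simplified sketch of the constructor split; field types are illustrative.
struct SearchContext {
    // Reverse ordering makes the heap's top the *worst* retained candidate.
    nearest: BinaryHeap<Reverse<u32>>,
    ef: usize,
}

impl SearchContext {
    /// No entry point is required at creation time; collections start empty.
    fn new(ef: usize) -> Self {
        SearchContext { nearest: BinaryHeap::with_capacity(ef), ef }
    }

    /// The initial candidate (formerly a constructor argument) is inserted
    /// through the same path as every later candidate.
    fn process_candidate(&mut self, score: u32) {
        self.nearest.push(Reverse(score));
        if self.nearest.len() > self.ef {
            self.nearest.pop(); // evict the worst candidate beyond ef
        }
    }
}

fn main() {
    let mut ctx = SearchContext::new(2);
    for score in [10, 5, 20] {
        ctx.process_candidate(score);
    }
    // Only the best `ef` candidates survive; the worst (5) was evicted.
    assert_eq!(ctx.nearest.len(), 2);
    assert_eq!(ctx.nearest.iter().map(|r| r.0).min(), Some(10));
    println!("ok");
}
```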

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)

19-19: LGTM: Import addition for volatile storage

The import correctly adds the function needed for volatile sparse vector storage.


136-136: LGTM: Transition to volatile storage for testing

This change replaces the RocksDB-backed storage with volatile in-memory storage, which improves test isolation and removes external dependencies. The volatile storage provides the same interface, so the test logic remains unaffected.

lib/collection/src/tests/mod.rs (4)

40-40: LGTM: Import addition for disk-backed storage

The import correctly adds SaveOnDisk for the new payload schema handling approach.


243-245: LGTM: Transition to disk-backed schema with shared ownership

This change improves the architecture by using a thread-safe, disk-backed payload schema wrapped in Arc<SaveOnDisk<PayloadIndexSchema>>. This provides better persistence and shared access compared to the previous in-memory approach.


266-266: LGTM: Consistent usage of cloned Arc references

The calls correctly use .clone() to share the Arc-wrapped schema instead of passing references, which aligns with the new shared ownership model.

Also applies to: 277-277, 322-322
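The ownership model can be illustrated with a stdlib-only sketch, using `Arc<RwLock<_>>` as a stand-in for `Arc<SaveOnDisk<PayloadIndexSchema>>` (the real wrapper additionally persists to disk on write). Each caller receives a cheap `Arc` clone rather than a borrowed reference, so the schema can outlive any single caller and be shared across threads.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the real payload schema type.
#[derive(Default, Debug)]
struct PayloadIndexSchema {
    fields: Vec<String>,
}

/// Callers take shared ownership via an Arc clone, not a `&` reference.
fn indexed_field_count(schema: Arc<RwLock<PayloadIndexSchema>>) -> usize {
    schema.read().unwrap().fields.len()
}

fn main() {
    let schema = Arc::new(RwLock::new(PayloadIndexSchema::default()));
    schema.write().unwrap().fields.push("city".to_string());
    // `.clone()` bumps the refcount; both calls observe the same schema.
    assert_eq!(indexed_field_count(schema.clone()), 1);
    assert_eq!(indexed_field_count(schema), 1);
    println!("ok");
}
```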


243-322: Note: Changes appear unrelated to PR objectives

These changes focus on payload schema handling refactoring, but the PR objectives mention "Add inference usage tracking to REST API". This might indicate that the AI summary or PR objectives don't accurately reflect the actual changes being made.

Likely an incorrect or invalid review comment.

lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2)

493-493: LGTM: Import addition for volatile storage

The import correctly adds the function needed for volatile dense vector storage in tests.


528-528: LGTM: Transition to volatile storage for GPU tests

This change replaces RocksDB-backed storage with volatile in-memory storage for GPU testing, which simplifies the test setup and removes external dependencies. The parameters (dimension and distance metric) are correctly preserved.

lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)

318-318: LGTM: Import addition for volatile storage

The import correctly adds the function needed for volatile dense vector storage in tests.


349-349: LGTM: Consistent transition to volatile storage across all tests

All test functions now use new_volatile_dense_vector_storage(4, Distance::Dot) instead of RocksDB-backed storage. This change:

  • Improves test isolation by removing external dependencies
  • Maintains the same test functionality with simplified setup
  • Uses consistent parameters (dimension=4, Distance::Dot) matching the test data
  • Follows the same refactoring pattern applied across the codebase

The test logic remains unchanged, ensuring the same coverage while improving test performance and reliability.

Also applies to: 380-380, 442-442, 558-558, 631-631, 705-705

lib/segment/src/fixtures/payload_fixtures.rs (1)

112-128: LGTM! Well-implemented test fixture function.

The random_full_text_payload function follows established patterns in the file and correctly generates random alphabetic strings for testing purposes. The implementation properly uses the random generator, respects the specified ranges, and returns the expected Vec format.

lib/collection/src/shards/telemetry.rs (2)

24-25: LGTM! Proper integration of partial snapshot telemetry.

The optional partial_snapshot field is correctly added with appropriate serialization attributes. The skip_serializing_if ensures clean JSON output when the field is not present.


78-84: LGTM! Well-designed telemetry struct.

The PartialSnapshotTelemetry struct is properly designed with:

  • Meaningful field names that clearly indicate their purpose
  • Appropriate types (usize for counts, u64 for timestamps)
  • Correct anonymization settings for telemetry data
  • All necessary derive macros for serialization and schema generation
lib/segment/src/index/hnsw_index/graph_layers.rs (1)

106-107: LGTM! Clean refactoring of SearchContext initialization.

The separation of SearchContext creation from initial candidate insertion improves code clarity. This change aligns with the updated SearchContext::new constructor that now only accepts the ef parameter, as shown in the relevant code snippet from search_context.rs.

tests/consensus_tests/test_shard_wal_delta_transfer.py (4)

205-206: LGTM! Consistent API migration to count-based waiting.

The replacement of wait_for_collection_shard_transfer_progress with wait_for_collection_shard_transfers_count follows a clear pattern: first wait for the transfer to start (count=1), then wait for completion (count=0). This approach is more reliable for testing transfer lifecycle.

🧰 Tools
🪛 Ruff (0.11.9)

205-205: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


206-206: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


304-305: LGTM! Consistent with the established pattern.

Same clean API migration pattern as seen in previous test functions.

🧰 Tools
🪛 Ruff (0.11.9)

304-304: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


305-305: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


325-326: LGTM! Maintains consistency across chain recovery test.

The API migration is consistently applied in the chain recovery scenario.

🧰 Tools
🪛 Ruff (0.11.9)

325-325: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


326-326: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


453-454: LGTM! Final consistent application of the API change.

The abort and retry test also follows the same reliable pattern for monitoring transfer progress.

Note: The static analysis warnings about wait_for_collection_shard_transfers_count being undefined are false positives - the function is properly imported via the star import from .utils and is defined in the utils module as shown in the relevant code snippets.

🧰 Tools
🪛 Ruff (0.11.9)

453-453: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)


454-454: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports

(F405)

lib/segment/src/id_tracker/mod.rs (2)

7-8: LGTM! Proper feature gating for RocksDB dependency.

The conditional compilation correctly gates the simple_id_tracker module behind the "rocksdb" feature, which aligns with the broader pattern of isolating RocksDB-dependent code.


74-81: LGTM! Efficient conversion implementation.

The From<&ExtendedPointId> for PointIdType implementation is well-designed:

  • Takes a reference to avoid unnecessary cloning
  • Properly handles both NumId and Uuid variants
  • Follows Rust conversion conventions
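A minimal sketch of the conversion shape, with the `uuid` crate's type modeled as a `u128` and the target treated as the same enum (in the real code `PointIdType` is closely tied to `ExtendedPointId`). Taking `&ExtendedPointId` means call sites holding only a reference never need to clone first.

```rust
/// Simplified model; the real Uuid variant wraps `uuid::Uuid`.
#[derive(Debug, PartialEq)]
enum ExtendedPointId {
    NumId(u64),
    Uuid(u128),
}

// Reference-based conversion: both payloads are Copy, so no clone of the
// enum itself is required at the call site.
impl From<&ExtendedPointId> for ExtendedPointId {
    fn from(id: &ExtendedPointId) -> Self {
        match id {
            ExtendedPointId::NumId(n) => ExtendedPointId::NumId(*n),
            ExtendedPointId::Uuid(u) => ExtendedPointId::Uuid(*u),
        }
    }
}

fn main() {
    let id = ExtendedPointId::NumId(42);
    // The original `id` is still usable afterwards.
    assert_eq!(ExtendedPointId::from(&id), ExtendedPointId::NumId(42));
    assert_eq!(id, ExtendedPointId::NumId(42));
    println!("ok");
}
```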
lib/collection/src/shards/replica_set/telemetry.rs (1)

30-35: LGTM! Well-structured partial snapshot telemetry integration.

The implementation correctly:

  • Wraps the telemetry data in Some() to match the optional field type
  • Accesses the appropriate fields from partial_snapshot_meta
  • Follows the existing telemetry collection pattern
src/common/telemetry_ops/collections_telemetry.rs (2)

20-22: LGTM! Proper field configuration for telemetry.

The max_collections field is correctly configured:

  • Option<usize> type allows for absent values
  • skip_serializing_if = "Option::is_none" prevents null values in JSON output
  • #[anonymize(false)] appropriately excludes this configuration value from anonymization

50-54: LGTM! Clean integration of max collections telemetry.

The implementation correctly retrieves the max collections value from the TOC and includes it in the telemetry struct, following the established pattern for other telemetry fields.

lib/storage/src/content_manager/snapshots/recover.rs (1)

151-160: LGTM! Proper payload index schema loading with error handling.

The implementation correctly:

  • Uses the established SaveOnDisk pattern for loading configuration
  • Provides descriptive error messages with the file path
  • Follows the existing error handling patterns in the codebase
lib/collection/src/shards/local_shard/mod.rs (1)

409-409: LGTM: Simplified payload index schema usage

The change from explicit read-lock acquisition and schema cloning to cloning the entire Arc<SaveOnDisk<PayloadIndexSchema>> wrapper simplifies the API usage. This aligns with the broader refactoring mentioned in the summary.

lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2)

16-16: Good improvement: Using volatile storage for tests

Adding the import for volatile dense vector storage improves test isolation by removing RocksDB dependencies.


62-70: Excellent test improvement: Replaced RocksDB with volatile storage

The changes to use volatile in-memory storage instead of RocksDB-backed storage provide several benefits:

  • Faster test execution (no disk I/O)
  • Better test isolation (no temporary directory management)
  • Reduced external dependencies

The test logic remains unchanged while improving reliability and performance.

lib/segment/Cargo.toml (3)

15-15: Good: Added RocksDB to default features

Including rocksdb in the default feature set maintains backward compatibility while allowing selective disabling.


138-138: Consistent benchmark feature gating

Properly requiring the rocksdb feature for benchmarks that depend on RocksDB ensures they only run when the feature is enabled. This prevents build failures when RocksDB is disabled.

Also applies to: 187-187, 192-192


133-133:

Details

❓ Verification inconclusive

Minor dependency update: io-uring version bump

Updated from 0.7.6 to 0.7.7. This appears to be a patch version update.


🌐 Web query:

What are the changes between io-uring 0.7.6 and 0.7.7? Are there any breaking changes or security fixes?

💡 Result:

Based on the search results, there is no specific information about changes between io-uring versions 0.7.6 and 0.7.7. The search results contain information about the io_uring Linux API and various other software projects, but none directly address version changes for an io-uring library at those specific versions.

The search results mention:

  1. A low-level io_uring userspace interface for Rust on GitHub[1], but without version-specific changelog information.
  2. General documentation about the io_uring Linux API[2][6].
  3. Changelogs from other projects like axum[3], netaddr[5], and iodine[8], but not for the io-uring library specifically.
  4. Performance characteristics of io_uring in Proxmox environments[4].
  5. A security issue with io_uring reported in April 2025 about a security blindspot allowing rootkit attacks[7], but this isn't tied to specific library versions.

Without specific changelog information for io-uring versions 0.7.6 and 0.7.7, I cannot provide details about changes, breaking changes, or security fixes between these versions. The search results don't contain this information.


Verify upstream CHANGELOG for io-uring 0.7.7

This bump from 0.7.6 → 0.7.7 appears to be a patch-level update with no documented breaking changes in publicly available sources. However, I couldn’t locate a dedicated changelog entry for 0.7.7. Please review the io-uring repository’s CHANGELOG or release notes (e.g. https://github.com/tokio-rs/io-uring/blob/main/CHANGELOG.md) to confirm there are no unexpected breaking changes or security fixes before merging.

lib/segment/src/vector_storage/raw_scorer.rs (4)

57-67: Excellent: Consistent feature gating for dense vector storage

The conditional compilation properly gates:

  • RocksDB-backed storage variants (DenseSimple*) behind the rocksdb feature
  • Volatile storage variants (DenseVolatile*) behind test configuration

This pattern enables partial RocksDB disabling while maintaining test coverage.


106-109: Consistent sparse vector storage gating

The same feature gating pattern is applied to sparse vector storage, maintaining consistency across the codebase.


111-122: Complete multi-dense vector storage feature gating

All multi-dense vector storage variants are properly gated with the same consistent pattern:

  • MultiDenseSimple* variants require the rocksdb feature
  • MultiDenseVolatile* variants are test-only

This completes the feature gating for all vector storage types.


51-142: Flag inconsistency with PR objectives

The changes in this file (and the entire PR) appear to be about RocksDB feature gating and payload schema improvements, but the PR title indicates "Add inference usage tracking to REST API". There seems to be a mismatch between the stated objectives and the actual changes.

Likely an incorrect or invalid review comment.

src/actix/api/snapshot_api.rs (5)

40-40: LGTM! Import addition supports the new locking mechanism.

The import for try_take_partial_snapshot_recovery_lock is correctly added to support the partial snapshot recovery concurrency control.


613-621: Excellent concurrency control implementation.

The early acquisition of the partial snapshot recovery lock prevents concurrent partial snapshot recoveries on the same shard. The error handling immediately returns an HTTP error response if lock acquisition fails, which is appropriate for the REST API context.


624-624: Good RAII pattern for lock management.

Using _recovery_lock with an underscore prefix correctly indicates the variable is intentionally unused but held for its RAII properties. This ensures the lock is held for the entire duration of the async recovery operation.


691-704: Consistent lock acquisition pattern.

The same lock acquisition and error handling pattern is correctly applied to the recover_partial_snapshot_from endpoint, maintaining consistency across both partial snapshot recovery endpoints.


707-707: Proper lock holding during recovery.

The lock is correctly held during the async recovery operation using the same RAII pattern as the other endpoint.

tests/consensus_tests/test_force_delete_peer.py (2)

31-34: Improved mapping for convenience.

The change from peer_id_to_url to peer_url_to_id makes the subsequent code more readable by directly mapping URLs to IDs, which is how they're used in the test.


47-47: Updated comment reflects the simplified logic.

The comment correctly describes the expected behavior after the test simplification.

src/common/snapshots.rs (2)

9-9: Appropriate imports for the new functionality.

The added imports support the new try_take_partial_snapshot_recovery_lock function:

  • VerificationPass for access verification
  • Dispatcher for TOC access
  • OwnedMutexGuard for the return type

Also applies to: 17-17, 19-19


318-337: Well-implemented recovery lock acquisition function.

The function follows excellent patterns:

  1. Proper access control: Checks global manage access before proceeding
  2. Clear separation of concerns: Uses dispatcher → TOC → collection hierarchy
  3. Appropriate error propagation: Returns StorageError for consistent error handling
  4. Good return type: Option<OwnedMutexGuard<()>> allows callers to handle lock unavailability

The implementation integrates cleanly with the existing codebase architecture and provides the necessary concurrency control for partial snapshot recovery operations.
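The fail-fast semantics of the recovery lock can be sketched with a stdlib `Mutex<()>` standing in for the tokio mutex behind `OwnedMutexGuard` (the real code is async; this is a synchronous approximation). A competing request gets an immediate error instead of queuing, and the underscore-prefixed guard is held purely for its RAII effect.

```rust
use std::sync::{Arc, Mutex, TryLockError};

/// Sketch: at most one recovery per shard; concurrent attempts fail fast.
fn try_recover(lock: &Arc<Mutex<()>>) -> Result<(), String> {
    match lock.try_lock() {
        // Held only for RAII: the lock releases when the guard drops.
        Ok(_recovery_lock) => {
            // ... long-running recovery would happen here ...
            Ok(())
        }
        Err(TryLockError::WouldBlock) => Err("recovery already in progress".into()),
        Err(TryLockError::Poisoned(_)) => Err("recovery lock poisoned".into()),
    }
}

fn main() {
    let lock = Arc::new(Mutex::new(()));
    // While a recovery holds the lock, a second attempt is rejected.
    let guard = lock.lock().unwrap();
    assert!(try_recover(&lock).is_err());
    drop(guard);
    // Once released, a new recovery can proceed.
    assert!(try_recover(&lock).is_ok());
    println!("ok");
}
```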

lib/segment/src/index/hnsw_index/gpu/mod.rs (2)

111-111: LGTM! Improved test isolation.

The switch from RocksDB-backed storage to volatile storage in tests improves test isolation and reduces external dependencies while maintaining the same interface for testing GPU index functionality.


134-134: LGTM! Consistent with test storage simplification.

The use of volatile dense vector storage aligns with the import change and maintains test functionality while removing RocksDB dependency.

lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (3)

27-27: LGTM! Appropriate import change for test simplification.

Switching to volatile dense vector storage import aligns with the goal of removing RocksDB dependencies from tests.


56-58: LGTM! Simplified test storage creation.

The ram_storage function is now much cleaner and directly creates volatile storage without needing temporary directories or RocksDB setup. The unused _dir parameter maintains the function signature for compatibility.


118-118: LGTM! Consistent with storage simplification.

Using volatile storage in the main test function aligns with the other changes and maintains scoring equivalency validation while removing external dependencies.

lib/segment/src/vector_storage/quantized/quantized_vectors.rs (4)

242-265: LGTM! Proper conditional compilation for RocksDB storage variants.

The conditional compilation correctly gates RocksDB-dependent storage variants behind the rocksdb feature flag, maintaining clean separation of storage backends.


254-265: LGTM! Appropriate test-only compilation for volatile storage.

The volatile storage variants are correctly gated behind #[cfg(test)], making them available only during testing which aligns with their intended purpose.


234-341: Flag inconsistency between PR objectives and actual changes.

The PR is titled "Add inference usage tracking to REST API" but the actual changes implement conditional compilation for RocksDB features and vector storage modularization. This appears to be either a mislabeled PR or the wrong changeset.

Likely an incorrect or invalid review comment.


293-321:

Details

✅ Verification successful

Verify exhaustive pattern matching with conditional compilation.

The systematic conditional compilation of vector storage variants looks correct, but ensure that all possible enum variants are handled when different feature combinations are enabled/disabled.


🏁 Script executed:

#!/bin/bash
# Check if VectorStorageEnum definition matches the conditional compilation patterns
ast-grep --pattern 'enum VectorStorageEnum {
  $$$
}'

# Also check for any compilation warnings about unreachable patterns
rg "unreachable_patterns" lib/segment/src/vector_storage/

Length of output: 113


🏁 Script executed:

#!/bin/bash
# Locate the file defining VectorStorageEnum
echo "Enum definition file:"
rg -l "enum VectorStorageEnum" -n lib/segment/src/vector_storage

# Display enum definition and its variants
enum_file=$(rg -l "enum VectorStorageEnum" -n lib/segment/src/vector_storage)
echo -e "\nEnum variants in $enum_file:"
awk '/enum VectorStorageEnum/,/}/' "$enum_file"

# Display all match arms for VectorStorageEnum in quantized_vectors.rs
echo -e "\nMatch arms in quantized_vectors.rs:"
rg -n "VectorStorageEnum::" -A0 lib/segment/src/vector_storage/quantized/quantized_vectors.rs

Length of output: 7402


🏁 Script executed:

#!/bin/bash
# Show match block in quantized_vectors.rs for exhaustive matching
echo "Lines 200-400 of quantized_vectors.rs:"
sed -n '200,400p' lib/segment/src/vector_storage/quantized/quantized_vectors.rs

Length of output: 8888


Exhaustive matching confirmed for VectorStorageEnum
All enum variants in VectorStorageEnum are covered by the create() match—each with the correct #[cfg] guard corresponding to its definition. Rust’s compiler enforces exhaustiveness, and there are no missing or unreachable arms under any feature combination. No further changes required.

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (4)

16-19: LGTM! Appropriate feature gating for RocksDB dependency.

The conditional import and new volatile storage import properly support testing without RocksDB dependency.


125-125: Good change to use volatile storage in the test helper.

Using volatile storage in this shared test helper function is appropriate since it's now used by both RocksDB-backed and volatile storage tests. This makes the test lighter weight and removes unnecessary RocksDB dependency from the helper.


333-333: Correct feature gating for RocksDB-dependent tests.

All tests using open_simple_dense_vector_storage are properly gated with the rocksdb feature flag.

Also applies to: 363-363, 392-392, 421-421


451-474: Well-structured volatile storage tests.

The new test functions properly mirror the RocksDB tests, ensuring equivalent test coverage for volatile storage implementation. Good reuse of existing test helpers.

lib/collection/src/update_handler.rs (1)

428-428:

Details

✅ Verification successful

API change looks good - ownership semantics updated for better concurrency.

The parameter type change from &PayloadIndexSchema to Arc<SaveOnDisk<PayloadIndexSchema>> enables shared ownership which is beneficial for concurrent access patterns. The Arc clone on line 613 is appropriate.

Let me verify that all callers of this method have been updated:

Also applies to: 613-613


🏁 Script executed:

#!/bin/bash
# Description: Verify all callers of ensure_appendable_segment_with_capacity have been updated to pass Arc

# Search for calls to ensure_appendable_segment_with_capacity
rg -A 5 "ensure_appendable_segment_with_capacity" --type rust

Length of output: 2270


API change approved: ownership semantics and callers updated

All callers of ensure_appendable_segment_with_capacity now pass an Arc<SaveOnDisk<PayloadIndexSchema>>—using .clone() where the schema is reused and the original Arc in the final invocation. No further changes are required.

• lib/collection/src/update_handler.rs: function signature and internal call updated
• lib/collection/src/tests/mod.rs: test invocations updated with .clone() or original Arc as appropriate

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)

336-359: Proper conditional compilation for dense vector storage variants.

The RocksDB variants are correctly gated with the rocksdb feature flag, and the new volatile variants are appropriately restricted to test builds with #[cfg(test)]. The implementation maintains consistency across all element types.


399-402: Consistent error handling for sparse volatile storage.

The sparse volatile variant correctly returns the same "not supported" error as other sparse variants, maintaining API consistency.


406-435: Well-structured conditional compilation for multi-dense variants.

The multi-dense storage variants follow the same conditional compilation pattern as dense variants, ensuring consistency across the codebase.

lib/collection/src/shards/replica_set/snapshots.rs (3)

49-53: Good addition of concurrent request tracking for partial snapshots.

The guard properly tracks concurrent create_partial_snapshot requests only when needed (manifest is Some), enabling load balancing in the cluster manager. The guard is automatically dropped when the function exits.


70-84: Well-designed lock management methods for partial snapshot recovery.

The three new methods provide appropriate synchronization primitives:

  • try_take_partial_snapshot_recovery_lock: Prevents concurrent recoveries
  • take_search_write_lock: Blocks searches during recovery
  • try_take_search_read_lock: Allows searches to fail fast during recovery

The methods correctly delegate to the partial_snapshot_meta for actual lock management.


400-402: Proper notification of partial snapshot recovery completion.

The snapshot_recovered() call is correctly placed after successful recovery and only for partial recovery types. This allows the system to update internal state and release any recovery-related resources.

lib/api/src/rest/models.rs (3)

63-71: LGTM!

The is_empty method correctly checks both hardware and inference fields, maintaining consistency with the gRPC implementation.


73-83: LGTM!

The merge method efficiently combines inference usage data using the entry API pattern. The token counts are correctly summed for models that appear in both instances.
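The entry-API merge can be sketched as below, with `ModelUsage` reduced to a single token counter (the real struct lives in `lib/api/src/rest/models.rs`). Models present in both maps have their counts summed; models unique to the other map are inserted as-is.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the real ModelUsage.
#[derive(Debug, Clone, PartialEq)]
struct ModelUsage {
    tokens: u64,
}

fn merge(into: &mut HashMap<String, ModelUsage>, other: HashMap<String, ModelUsage>) {
    for (model, usage) in other {
        into.entry(model)
            // Model in both maps: sum the token counts.
            .and_modify(|existing| existing.tokens += usage.tokens)
            // Model only in `other`: move it in unchanged.
            .or_insert(usage);
    }
}

fn main() {
    let mut a = HashMap::from([("bm25".to_string(), ModelUsage { tokens: 10 })]);
    let b = HashMap::from([
        ("bm25".to_string(), ModelUsage { tokens: 5 }),
        ("clip".to_string(), ModelUsage { tokens: 7 }),
    ]);
    merge(&mut a, b);
    assert_eq!(a["bm25"].tokens, 15);
    assert_eq!(a["clip"].tokens, 7);
    println!("ok");
}
```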


103-113: LGTM!

The new InferenceUsage and ModelUsage structs are well-designed with appropriate derives for serialization, comparison, and schema generation. The use of HashMap<String, ModelUsage> provides an efficient way to track usage per model.

lib/storage/src/content_manager/toc/telemetry.rs (1)

37-39: LGTM!

Simple and correct getter method for the max collections configuration.

lib/posting_list/src/value_handler.rs (3)

1-31: LGTM!

The ValueHandler trait provides a clean abstraction for handling both fixed-size and variable-size values in posting lists. The documentation clearly explains the purpose and behavior of each method.


33-50: LGTM!

The SizedHandler implementation correctly handles fixed-size values by returning them unchanged. The empty vector for var_sized_data is appropriate since no additional storage is needed.


52-102: LGTM!

The UnsizedHandler implementation is well-designed with:

  • Proper overflow checks for both individual values and total size
  • Efficient offset calculation using the windows pattern
  • Correct range handling for the last element
  • Clear error messages when limits are exceeded

The 4GB limit is reasonable for posting list data.
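
The offset bookkeeping described above can be sketched as follows. This is an illustration of the windows pattern and the last-element handling, not the crate's actual `UnsizedHandler` API; the function names and the flat byte-buffer layout are assumptions:

```rust
// ~4GB limit on variable-sized posting data, so offsets fit in u32.
const MAX_TOTAL_SIZE: usize = u32::MAX as usize;

// Store variable-sized values contiguously and record each value's start offset.
fn flatten(values: &[Vec<u8>]) -> Result<(Vec<u8>, Vec<u32>), String> {
    let mut data = Vec::new();
    let mut offsets = Vec::with_capacity(values.len());
    for v in values {
        if data.len() + v.len() > MAX_TOTAL_SIZE {
            return Err(format!("posting data exceeds {MAX_TOTAL_SIZE} bytes"));
        }
        offsets.push(data.len() as u32);
        data.extend_from_slice(v);
    }
    Ok((data, offsets))
}

// Adjacent offset pairs (the windows pattern) give every range except the
// last one, whose end is the total data length.
fn ranges(offsets: &[u32], total_len: usize) -> Vec<std::ops::Range<usize>> {
    let mut out: Vec<_> = offsets
        .windows(2)
        .map(|w| w[0] as usize..w[1] as usize)
        .collect();
    if let Some(&last) = offsets.last() {
        out.push(last as usize..total_len);
    }
    out
}

fn main() {
    let values = vec![b"ab".to_vec(), Vec::new(), b"cde".to_vec()];
    let (data, offsets) = flatten(&values).unwrap();
    let ranges = ranges(&offsets, data.len());
    assert_eq!(&data[ranges[0].clone()], b"ab");
    assert!(ranges[1].is_empty()); // zero-length values are representable
    assert_eq!(&data[ranges[2].clone()], b"cde");
    println!("ok");
}
```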

tests/consensus_tests/test_partial_snapshot.py (3)

35-45: LGTM! Well-designed concurrency test for recovery lock.

The test logic correctly verifies that multiple concurrent recovery requests are properly rejected during an ongoing recovery operation. The use of ThreadPoolExecutor with multiple workers and the assertion on status code 400 appropriately validates the locking mechanism.

🧰 Tools
🪛 Ruff (0.11.9)

36-36: assert_project_root may be undefined, or defined from star imports

(F405)


126-139: Good refactoring to support both blocking and non-blocking calls.

The separation of concerns between recover_partial_snapshot_from (blocking with assertion) and try_recover_partial_snapshot_from (non-blocking returning response) is well-designed and maintains backward compatibility.

🧰 Tools
🪛 Ruff (0.11.9)

128-128: assert_http_ok may be undefined, or defined from star imports

(F405)


146-155: Simple and effective helper function.

The try_search_random function appropriately uses the imported random_dense_vector() function and follows the same pattern as other HTTP request helpers in the module.

🧰 Tools
🪛 Ruff (0.11.9)

148-148: random_dense_vector may be undefined, or defined from star imports

(F405)

lib/segment/src/types.rs (4)

384-386: Correct conditional compilation for RocksDB dependency.

The feature gating of SparseVectorStorageType::OnDisk properly restricts RocksDB-dependent functionality to builds with the "rocksdb" feature enabled. This aligns with the modularization effort mentioned in the AI summary.

Also applies to: 398-399


418-427: Well-implemented feature-conditional default function.

The conditional compilation correctly handles the default sparse vector storage type based on feature availability. When RocksDB is available, it defaults to OnDisk; otherwise, it falls back to the standard default (Mmap).


2250-2260: Appropriate test-only convenience implementation.

The From<std::ops::Range<usize>> implementation for ValuesCount is properly restricted to test builds and provides a convenient way to create range-based value count filters for testing purposes.


2797-2806: Comprehensive and well-designed method for key extraction.

The targeted_key method correctly handles all Condition variants and provides a useful utility for query analysis. The implementation properly:

  • Extracts keys from field-based conditions (Field, IsEmpty, IsNull)
  • Handles nested conditions by converting to array keys
  • Recursively searches filter conditions
  • Returns None for non-key-based conditions (HasId, HasVector, CustomIdChecker)

This supports the enhanced strict mode verification mentioned in the AI summary.
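
The variant dispatch described above can be sketched with a simplified mirror of the `Condition` enum (the variant names follow the review; their shapes, and the function signature, are assumptions):

```rust
// Simplified stand-in for segment's Condition enum.
#[derive(Clone)]
enum Condition {
    Field { key: String },
    IsEmpty { key: String },
    IsNull { key: String },
    HasId,
    HasVector,
}

// Return the payload key a condition targets, if it targets one at all.
fn targeted_key(condition: &Condition) -> Option<&str> {
    match condition {
        // field-based conditions target a concrete payload key
        Condition::Field { key }
        | Condition::IsEmpty { key }
        | Condition::IsNull { key } => Some(key.as_str()),
        // not tied to any payload key
        Condition::HasId | Condition::HasVector => None,
    }
}

fn main() {
    let field = Condition::Field { key: "price".into() };
    assert_eq!(targeted_key(&field), Some("price"));
    assert_eq!(targeted_key(&Condition::HasId), None);
    println!("ok");
}
```

The real method additionally recurses into nested and filter conditions; that recursion is omitted here for brevity.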

lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (3)

345-373: LGTM! Clean cancellation handling.

The method properly restores the original state by unwrapping proxies and handling the temp segment based on whether it contains data.


653-673: Well-structured refactoring that minimizes lock duration.

The extraction of the slow optimization part into optimize_segment_propagate_changes effectively reduces the time the write lock is held. The error handling properly restores state on failure.


718-823: Excellent separation of slow and fast optimization phases.

The method effectively isolates the CPU-intensive segment building from the critical section that requires the write lock. The optimization of tracking already removed points (lines 762-769) is a nice touch to minimize work in the critical section.

lib/collection/src/shards/shard_holder/mod.rs (2)

1072-1088: Clean conditional locking implementation.

The method correctly differentiates between full and partial recovery types, only acquiring locks when necessary for partial recovery.


1090-1107: Consistent implementation with recovery lock method.

The search write lock acquisition follows the same pattern, maintaining code consistency.

lib/posting_list/src/iterator.rs (1)

25-43: Well-implemented Iterator trait.

The use of inspect for state updates is clean, and the size_hint implementation correctly provides exact bounds.
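
The `inspect`/`size_hint` pattern praised above can be sketched like this (the type names are illustrative, not the crate's actual iterator):

```rust
// A wrapper over a slice iterator that tracks the last yielded id.
struct PostingIterator<'a> {
    ids: std::slice::Iter<'a, u32>,
    last_id: Option<u32>, // iterator-internal state updated on each `next`
}

impl Iterator for PostingIterator<'_> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let next = self.ids.next().copied();
        // `inspect` records the id we are about to yield without consuming it
        next.inspect(|&id| self.last_id = Some(id))
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // slice iterators know their exact remaining length,
        // so lower and upper bounds coincide
        self.ids.size_hint()
    }
}

fn main() {
    let ids = [3u32, 7, 9];
    let mut it = PostingIterator { ids: ids.iter(), last_id: None };
    assert_eq!(it.size_hint(), (3, Some(3)));
    assert_eq!(it.next(), Some(3));
    assert_eq!(it.last_id, Some(3));
    assert_eq!(it.count(), 2); // remaining items after one `next`
    println!("ok");
}
```

Exact `size_hint` bounds let adapters such as `collect` preallocate correctly.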

Cargo.toml (2)

1-267: ⚠️ Potential issue

Critical: Changes don't match PR objectives

The PR is titled "Add inference usage tracking to REST API" but this Cargo.toml only shows changes related to posting lists, RocksDB features, and dependency updates. None of these changes appear to be related to inference usage tracking.

Likely an incorrect or invalid review comment.


14-14:

Details

❓ Verification inconclusive

Verify Rust 1.87 requirement

The Rust version requirement was bumped from 1.85 to 1.87. Please ensure this version is necessary and document what specific Rust 1.87 features are being used.


🌐 Web query:

What are the key features and improvements in Rust 1.87 compared to Rust 1.85?

💡 Result:

Rust 1.87, released on May 15, 2025, introduced several key features and improvements over Rust 1.85, which was released on February 20, 2025. Notable changes include:

Language Enhancements:

  • Stabilization of asm_goto Feature: The asm_goto feature has been stabilized, allowing for more flexible inline assembly in Rust code. (github.com)

  • Support for Unbounded Shift Operations: The language now supports unbounded left and right shift operations, enhancing bitwise manipulation capabilities. (github.com)

Compiler Updates:

  • SSE2 Requirement for i686 Targets: The compiler now requires SSE2 instructions for i686 targets, aligning with modern hardware capabilities. (github.com)

Platform Support:

  • Deprecation of i586-pc-windows-msvc Target: The i586-pc-windows-msvc target has been deprecated and will be removed in the next release, encouraging users to migrate to more widely supported targets. (github.com)

Standard Library Improvements:

  • Stabilization of Anonymous Pipe API: The anonymous pipe API has been stabilized, providing a more robust and efficient way to handle inter-process communication. (github.com)

  • Undeprecation of env::home_dir: The previously deprecated env::home_dir function has been undeprecated, restoring its availability for retrieving the user's home directory. (github.com)

Cargo and Tooling:

  • Terminal Integration via ANSI OSC 9;4 Sequences: Cargo now supports terminal integration through ANSI OSC 9;4 sequences, enhancing the terminal experience. (github.com)

  • Bump to OpenSSL v3: Cargo has updated its OpenSSL dependency to version 3, ensuring compatibility with the latest security standards. (github.com)

These updates in Rust 1.87 build upon the advancements introduced in Rust 1.85, including the stabilization of async closures and the 2024 Edition, further enhancing the language's performance, safety, and developer experience. (blog.rust-lang.org)

Citations:


Document and Confirm Rust 1.87 Features

The Rust MSRV was bumped from 1.85 to 1.87 in Cargo.toml (line 14). Please verify which of the following 1.87 enhancements your code now depends on and document them in your upgrade notes or changelog:

  • Stabilized asm_goto feature for inline assembly
  • Unbounded left/right shift operations
  • SSE2 requirement for i686 targets
  • Stabilized anonymous pipe API in the standard library
  • Undeprecation of env::home_dir
  • ANSI OSC 9;4 terminal integration via Cargo
  • Bump of OpenSSL dependency to v3

If none of these features are actually used, consider reverting the MSRV back to 1.85.

lib/posting_list/src/visitor.rs (2)

37-45: Good optimization with chunk caching

The lazy decompression and caching of chunks is a good performance optimization that avoids redundant decompression operations.


1-118: ⚠️ Potential issue

File unrelated to PR objectives

This new posting list visitor implementation is unrelated to the PR objective of "Add inference usage tracking to REST API". The code appears to be part of a different feature.

Likely an incorrect or invalid review comment.

lib/collection/src/collection_manager/holders/segment_holder.rs (3)

36-2116: ⚠️ Potential issue

Architectural change unrelated to PR objectives

This file updates the payload index schema handling to use Arc<SaveOnDisk<PayloadIndexSchema>> instead of direct references. While this is a good architectural improvement for concurrency and persistence, it's completely unrelated to the PR objective of adding inference usage tracking.

Likely an incorrect or invalid review comment.


998-999: Consistent pattern for schema parameter updates

The updates to use Arc<SaveOnDisk<PayloadIndexSchema>> are consistently applied across all relevant methods. The pattern of acquiring a read lock before accessing the schema is properly implemented.

Also applies to: 1075-1076, 1108-1109, 1153-1154


1139-1142:

Details

❓ Verification inconclusive

Verify read lock scope

The read lock on payload_index_schema is held while iterating and creating field indexes. Ensure this doesn't cause contention issues if the schema has many entries.


🏁 Script executed:

#!/bin/bash
# Check for other places where payload_index_schema read lock might be held for extended periods
rg -A 10 "payload_index_schema\.read\(\)" 

Length of output: 5809


Narrow RwLock guard scope on payload_index_schema

Currently in
lib/collection/src/collection_manager/holders/segment_holder.rs:L1139-1142

let payload_schema_lock = payload_index_schema.read();
for (key, schema) in payload_schema_lock.schema.iter() {
    segment.create_field_index(0, key, Some(schema), &hw_counter)?;
}

the read lock is held across every create_field_index call, which may block writers if the schema is large or index creation is slow. By contrast, other modules (e.g. search.rs, sharding_keys.rs) do:

let schema = self.payload_index_schema.read().schema.clone();

dropping the guard immediately after cloning.

To reduce contention, consider narrowing the lock’s scope, for example:

// clone the entire schema, then drop the guard
let schemas = payload_index_schema.read().schema.clone();
for (key, schema) in schemas.iter() {
    segment.create_field_index(0, key, Some(schema), &hw_counter)?;
}

or explicitly dropping payload_schema_lock before calling create_field_index.

lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)

7-27: Excellent API flow documentation

The ASCII diagram clearly illustrates the locking sequence and state transitions during recovery. This is very helpful for understanding the concurrency model.


1-118: ⚠️ Potential issue

New concurrency control structure unrelated to PR objectives

This new partial snapshot metadata management is unrelated to the PR objective of "Add inference usage tracking to REST API". The implementation appears to be for a different feature entirely.

Likely an incorrect or invalid review comment.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (2)

354-358: Good parameter naming improvement!

The rename from levels to links_by_level better describes the data structure - a vector containing links organized by level.


387-388: Clean separation of construction and initialization.

The change to create SearchContext without an initial entry point and then add it via process_candidate provides better API flexibility.

lib/segment/src/index/hnsw_index/hnsw.rs (3)

1440-1442: Verify the feature flag logic for healing.

The code returns None when hnsw_healing is enabled in multiple places. This seems counterintuitive - shouldn't healing be performed when the flag is enabled?

Please verify that this logic is correct, or if it should be inverted.

Also applies to: 1488-1490, 1501-1503, 1516-1518


387-414: Efficient parallel point migration implementation.

The migration logic correctly partitions points and processes them in parallel using the thread pool. Good use of HwMeasurementAcc::disposable() for internal operations.


1527-1535: Reasonable threshold for healing decision.

The 10% missing points threshold is a good heuristic to decide between healing and rebuilding from scratch. The debug logging provides useful information for monitoring the decision.

lib/segment/src/id_tracker/simple_id_tracker.rs (2)

132-133: Good simplification using standard trait conversions.

Replacing custom helper methods with direct From trait usage improves code consistency and leverages Rust's standard conversion patterns.

Also applies to: 137-138


304-341: Complete set of conversion implementations.

The From trait implementations provide bidirectional conversions between ExtendedPointId and StoredPointId, handling both owned and borrowed variants properly. The panic for string conversions is appropriate since strings are internal-only.
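
A hedged sketch of the bidirectional conversions described above, using simplified stand-ins for `ExtendedPointId` and `StoredPointId` (the real enums carry more variants and borrowed forms):

```rust
#[derive(Debug, Clone, PartialEq)]
enum ExtendedPointId {
    Num(u64),
    Uuid(u128),
}

#[derive(Debug, Clone, PartialEq)]
enum StoredPointId {
    Num(u64),
    Uuid(u128),
    Str(String), // internal-only representation
}

impl From<ExtendedPointId> for StoredPointId {
    fn from(id: ExtendedPointId) -> Self {
        match id {
            ExtendedPointId::Num(n) => StoredPointId::Num(n),
            ExtendedPointId::Uuid(u) => StoredPointId::Uuid(u),
        }
    }
}

impl From<StoredPointId> for ExtendedPointId {
    fn from(id: StoredPointId) -> Self {
        match id {
            StoredPointId::Num(n) => ExtendedPointId::Num(n),
            StoredPointId::Uuid(u) => ExtendedPointId::Uuid(u),
            // string ids never cross this boundary, so a panic flags a bug
            StoredPointId::Str(s) => panic!("unexpected string point id: {s}"),
        }
    }
}

fn main() {
    let stored: StoredPointId = ExtendedPointId::Num(42).into();
    assert_eq!(stored, StoredPointId::Num(42));
    let back: ExtendedPointId = stored.into();
    assert_eq!(back, ExtendedPointId::Num(42));
    println!("ok");
}
```

Implementing `From` gives `Into` for free at call sites, which is what replaces the custom helper methods.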

lib/posting_list/src/lib.rs (1)

1-54: PR objectives mismatch: This posting list implementation doesn't align with "Add inference usage tracking to REST API"

The PR is titled "Add inference usage tracking to REST API" (#6563), but these files implement a compressed posting list data structure, which appears to be unrelated to the stated objectives. The AI summary mentions adding InferenceUsage fields to REST/gRPC APIs, but the actual code changes are about posting lists.

Please clarify if:

  1. This is the correct PR for these changes
  2. The PR title/objectives need to be updated
  3. These files were accidentally included

Likely an incorrect or invalid review comment.

lib/posting_list/src/tests.rs (1)

87-105: Excellent test coverage with edge cases!

The test suite comprehensively covers various lengths including boundaries around CHUNK_LEN, empty lists, and large lists. This property-based testing approach with the reference model validation is well-designed.

lib/posting_list/src/posting_list.rs (1)

45-59: Well-implemented edge case handling

The get_compressed_size method correctly handles the last chunk by using the total data length. This is a clean implementation.

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (1)

22-56: LGTM! Well-structured volatile storage implementation.

The struct definition and constructors are properly designed for in-memory volatile storage with appropriate fields for vector data, deletion tracking, and distance metrics.

lib/posting_list/src/view.rs (2)

1-209: Well-designed posting list view implementation.

The overall design with lifetimes, phantom types, and separation of concerns between view and storage is excellent. The binary search implementations and iteration support are correctly implemented.


153-191:

Details

✅ Verification successful

Consider validating assumptions in release builds.

The method relies on debug_assert for range validation (lines 174-175), which won't run in release builds. Since the method is public within the crate, consider either:

  1. Promoting these to regular assertions
  2. Clearly documenting these as safety preconditions
  3. Returning None if the assumptions are violated

🏁 Script executed:

#!/bin/bash
# Show the definition of the public view struct to inspect its fields
rg -n '^pub struct' -A5 lib/posting_list/src/view.rs

# Show the find_chunk method with surrounding context
rg -n 'pub fn find_chunk' -C5 lib/posting_list/src/view.rs

Length of output: 1176


🏁 Script executed:

#!/bin/bash
# Locate debug_assert calls in find_chunk
rg -n 'debug_assert' -C3 lib/posting_list/src/view.rs

Length of output: 488


Precondition is already documented; no further checks needed

The doc comment on find_chunk (line 152) explicitly states that callers must supply an id within the posting list’s range, and the debug_assert!s enforce this during development without impacting release-build performance. Keeping the current design is acceptable.

lib/segment/src/vector_storage/vector_storage_base.rs (3)

13-26: Excellent feature-gated integration of volatile storage variants.

The conditional compilation directives are properly applied, ensuring:

  • RocksDB dependencies are optional via the "rocksdb" feature
  • Volatile storage is only available in test configurations
  • All storage types follow a consistent naming pattern

Also applies to: 186-265


542-590: Correct handling of populate() for volatile storage.

The volatile storage variants correctly skip population since they are not memory-mapped. The empty blocks with comments clearly indicate this intentional no-op behavior.


13-1429:

Details

❓ Verification inconclusive

Comprehensive and consistent integration of volatile storage throughout the codebase.

All trait methods and helper functions have been systematically updated to handle the new volatile storage variants. The feature gating pattern is consistently applied throughout.

Run the following script to verify all match expressions are complete:


🏁 Script executed:

#!/bin/bash
# Description: Verify that all match expressions on VectorStorageEnum handle all variants

# Search for match expressions on VectorStorageEnum and check if they handle volatile variants
ast-grep --pattern 'match $_ {
  $$$
  VectorStorageEnum::$_($_) => $_,
  $$$
}'

Length of output: 82


Manual verification required for complete match coverage

I attempted to confirm via AST grep that every match on VectorStorageEnum handles all variants, but the command yielded no results. To ensure no storage variant is left unhandled, please perform one of the following checks:

• List all enum variants and verify they appear in each match arm:

rg -o 'VectorStorageEnum::[A-Za-z][A-Za-z0-9_]+' -h lib/segment/src/vector_storage/vector_storage_base.rs | sort -u

• Inspect each impl VectorStorage for VectorStorageEnum and its helper methods to ensure every variant above is matched:

  • try_multi_vector_config
  • default_vector
  • size_of_available_vectors_in_bytes
  • populate
  • clear_cache
  • all trait methods in the VectorStorage impl

All other changes look consistent and correctly include the new volatile and feature-gated variants. Please verify coverage of the VectorStorageEnum match expressions.

tests/openapi/test_strictmode.py (8)

165-165: Good naming improvement!

The function rename to test_strict_mode_unindexed_filter_keyword_read_validation better describes the test's specific focus on keyword filters.


220-303: Comprehensive test coverage for integer filter strict mode validation!

The test thoroughly covers all scenarios including index creation with different capabilities (lookup vs range), which is essential for validating the strict mode behavior with integer payloads.


1456-1475: Proper validation of formula expression with index requirement!

The addition of index creation for discount_price field correctly validates that formula expressions respect strict mode indexing requirements.


1796-1797: Consistent error message format updates.

The removal of the "Forbidden:" prefix from error messages makes them more direct and concise.

Also applies to: 1811-1811, 1841-1841


1967-1967: Improved error messages with actionable guidance!

The new error message format provides clear help text, making it easier for users to understand what index type is needed.

Also applies to: 1982-1982, 2011-2011, 2026-2026


2121-2121: Comprehensive error messages for full scan restrictions!

The error messages clearly explain why the request is forbidden and provide multiple resolution options.

Also applies to: 2175-2175


2178-2235: Good test coverage for full scan with HNSW index toggling!

The test effectively validates that disabling HNSW indexing (m=0) triggers strict mode restrictions when search_allow_exact is false.


2236-2349: Excellent comprehensive test for multitenant indexing scenarios!

The test thoroughly validates the interaction between strict mode, various index types (keyword, multitenant, HNSW payload-based), and filtered queries. The progression through different configurations ensures proper behavior in complex multitenant setups.

lib/collection/src/problems/unindexed_field.rs (5)

17-17: Good use of strum for enum iteration!

The addition of strum and the rename to all_indexes() returning FieldIndexType better reflect the function's purpose.

Also applies to: 164-166


168-255: Excellent refactoring for more precise index capability representation!

The change from returning schemas to returning specific FieldIndexType values provides better granularity and makes the code's intent clearer. The distinction between capabilities (e.g., IntMatch vs IntRange) is particularly valuable.


529-542: Well-designed enum for index capabilities!

The FieldIndexType enum provides excellent granularity in representing different index capabilities, particularly the separation of match and range operations for numeric types.


544-591: Robust capability mapping implementation!

The schema_capabilities function correctly handles all schema types and their parametrized variants. The special handling for integer indexes with configurable lookup/range capabilities is particularly well implemented, including the debug assertion to ensure at least one capability is enabled.


610-628: Good test coverage for integer index capabilities!

The test properly validates that integer indexes with both lookup and range enabled correctly map to both IntMatch and IntRange capabilities.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)

26-30: Appropriate use of conditional compilation for RocksDB!

Making the RocksDB variant conditional on the feature flag allows for cleaner test builds when RocksDB is not needed.


78-124: Comprehensive match coverage for all storage variants!

The match statement properly handles all storage variants with appropriate conditional compilation directives. The consistent iteration logic across multi-dense variants ensures uniform behavior.


296-297: Proper use of cfg_attr for conditional test cases!

The conditional inclusion of RocksDB test cases using cfg_attr ensures tests adapt correctly to the build configuration.

Also applies to: 322-323, 350-351


375-434: Comprehensive test coverage for volatile storage!

The three test functions provide complete coverage for volatile multi-dense vector storage, mirroring the functionality tested for persistent storage types. The direct instantiation approach is appropriate for in-memory storage.

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4)

38-54: LGTM! Deletion flag management is well-implemented.

The method correctly prevents marking non-existent vectors as undeleted and properly maintains the deleted count with appropriate bounds checking.


157-169: Good validation with debug assertion.

The debug assertion for sorted sparse vector indices helps catch potential bugs during development.


171-187: Batch update implementation looks good.

Proper use of cancellation checking and error propagation throughout the update process.


197-204: ⚠️ Potential issue

Fix inverted deletion logic.

The logic for determining is_deleted is inverted. The set_deleted method returns the previous deleted state, so if it returns false (wasn't deleted), then we successfully deleted it and is_deleted should be true.

     fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-        let is_deleted = !self.set_deleted(key, true);
+        let was_deleted = self.set_deleted(key, true);
+        let is_deleted = !was_deleted;
         if is_deleted {
             let old_vector = self.get_sparse_opt(key).ok().flatten();
             self.update_stored(key, true, old_vector.as_ref());
         }
         Ok(is_deleted)
     }

Likely an incorrect or invalid review comment.

lib/segment/src/segment_constructor/segment_constructor_base.rs (3)

13-28: Well-structured conditional imports.

The RocksDB-specific imports are properly feature-gated, maintaining clean separation of concerns.


104-122: Good error handling for disabled RocksDB.

The function properly handles the case when Memory storage type is requested but RocksDB is disabled, providing a clear error message.


730-737: Good error handling for RocksDB operations.

Proper error conversion with descriptive context and safe fallback to default value.

lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4)

23-29: Clear metadata structure with helpful documentation.

The comment clarifying that fields count vectors (not dimensions) is valuable for understanding the data structure.


107-122: Consistent implementation with sparse storage.

The deletion flag management follows the same pattern as the sparse storage implementation, which is good for code consistency.


135-140: Good validation with descriptive error message.

The check for oversized vectors provides a clear, actionable error message including both the attempted size and the maximum allowed size.


302-305: ⚠️ Potential issue

Fix inverted deletion logic.

Same issue as in sparse storage - the logic for determining is_deleted is inverted.

     fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-        let is_deleted = !self.set_deleted(key, true);
+        let was_deleted = self.set_deleted(key, true);
+        let is_deleted = !was_deleted;
         Ok(is_deleted)
     }

Likely an incorrect or invalid review comment.

@n0x29a n0x29a force-pushed the inference-usage-rest branch from 0f109ff to 83df404 Compare May 23, 2025 09:17
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (5)
docs/grpc/docs.md (5)

181-182: Fix list indentation in the Table of Contents
The new list items for InferenceUsage and InferenceUsage.ModelEntry are indented with 4 spaces, which violates the project's markdown lint rule (MD007 expects 2 spaces).
Apply this diff:

-    - [InferenceUsage](#qdrant-InferenceUsage)
-    - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
+  - [InferenceUsage](#qdrant-InferenceUsage)
+  - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


182-182: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


3177-3181: Add description for InferenceUsage.model field
The table entry for model in InferenceUsage is missing a description. Consider adding a brief explanation such as "Mapping of model name to its usage metrics".


3193-3196: Add descriptions for InferenceUsage.ModelEntry fields
Both key and value entries lack descriptions. Suggested descriptions:

  • key: "Model name"
  • value: "Usage metrics for the model"

3293-3296: Add description for ModelUsage.tokens
The tokens field is missing a description. Consider: "Number of tokens processed by the model."


4814-4815: Add description for Usage.inference field
The new inference field in the Usage message should include a description. For example: "Inference usage metrics including per-model token counts".

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0f109ff and 83df404.

📒 Files selected for processing (36)
  • .gitignore (1 hunks)
  • docs/grpc/docs.md (4 hunks)
  • docs/redoc/master/openapi.json (1 hunks)
  • lib/api/src/grpc/conversions.rs (1 hunks)
  • lib/api/src/grpc/ops.rs (2 hunks)
  • lib/api/src/grpc/proto/points.proto (1 hunks)
  • lib/api/src/grpc/qdrant.rs (1 hunks)
  • lib/api/src/rest/conversions.rs (2 hunks)
  • lib/api/src/rest/models.rs (3 hunks)
  • lib/collection/src/operations/types.rs (1 hunks)
  • lib/collection/src/operations/universal_query/collection_query.rs (3 hunks)
  • lib/collection/src/operations/verification/query.rs (2 hunks)
  • lib/storage/src/content_manager/errors.rs (5 hunks)
  • src/actix/api/collections_api.rs (6 hunks)
  • src/actix/api/count_api.rs (2 hunks)
  • src/actix/api/discovery_api.rs (4 hunks)
  • src/actix/api/facet_api.rs (2 hunks)
  • src/actix/api/local_shard_api.rs (5 hunks)
  • src/actix/api/query_api.rs (7 hunks)
  • src/actix/api/recommend_api.rs (6 hunks)
  • src/actix/api/retrieve_api.rs (6 hunks)
  • src/actix/api/search_api.rs (10 hunks)
  • src/actix/api/service_api.rs (3 hunks)
  • src/actix/api/shards_api.rs (2 hunks)
  • src/actix/api/snapshot_api.rs (2 hunks)
  • src/actix/api/update_api.rs (25 hunks)
  • src/actix/helpers.rs (5 hunks)
  • src/common/inference/infer_processing.rs (3 hunks)
  • src/common/inference/query_requests_grpc.rs (2 hunks)
  • src/common/inference/query_requests_rest.rs (15 hunks)
  • src/common/inference/service.rs (4 hunks)
  • src/common/inference/update_requests.rs (13 hunks)
  • src/common/update.rs (10 hunks)
  • src/tonic/api/points_api.rs (2 hunks)
  • src/tonic/api/query_common.rs (15 hunks)
  • src/tonic/api/update_common.rs (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/actix/api/snapshot_api.rs
🚧 Files skipped from review as they are similar to previous changes (33)
  • .gitignore
  • src/actix/api/facet_api.rs
  • src/common/inference/query_requests_grpc.rs
  • src/actix/api/local_shard_api.rs
  • src/actix/api/count_api.rs
  • src/actix/api/discovery_api.rs
  • lib/collection/src/operations/universal_query/collection_query.rs
  • src/tonic/api/points_api.rs
  • src/actix/api/shards_api.rs
  • src/actix/api/retrieve_api.rs
  • src/actix/api/recommend_api.rs
  • src/actix/api/service_api.rs
  • lib/api/src/grpc/ops.rs
  • lib/api/src/rest/conversions.rs
  • lib/collection/src/operations/types.rs
  • src/actix/api/collections_api.rs
  • src/tonic/api/query_common.rs
  • src/actix/api/search_api.rs
  • lib/api/src/grpc/conversions.rs
  • src/actix/api/query_api.rs
  • src/actix/api/update_api.rs
  • lib/storage/src/content_manager/errors.rs
  • src/tonic/api/update_common.rs
  • lib/api/src/rest/models.rs
  • lib/collection/src/operations/verification/query.rs
  • lib/api/src/grpc/qdrant.rs
  • src/common/inference/infer_processing.rs
  • src/common/inference/service.rs
  • src/common/update.rs
  • src/common/inference/query_requests_rest.rs
  • src/actix/helpers.rs
  • lib/api/src/grpc/proto/points.proto
  • docs/redoc/master/openapi.json
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md

181-181: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


182-182: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


183-183: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


184-184: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


185-185: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


186-186: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


187-187: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)


188-188: Unordered list indentation
Expected: 2; Actual: 4

(MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: integration-tests-consensus
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: storage-compat-test
  • GitHub Check: lint
🔇 Additional comments (8)
docs/grpc/docs.md (3)

3171-3174: Confirm anchor and heading formatting
The anchor and heading for InferenceUsage are correctly added following established conventions.


3187-3192: Confirm anchor and heading formatting
The anchor and heading for InferenceUsage.ModelEntry follow the existing pattern and are correctly placed.


3288-3291: Confirm anchor and heading formatting for ModelUsage
The ModelUsage section is correctly added and consistent with other message definitions.

src/common/inference/update_requests.rs (5)

182-187: LGTM! Usage aggregation issue has been properly addressed.

The inference usage merging logic correctly accumulates usage data from multiple named vector batches, addressing the concern raised in the previous review. The implementation properly merges each batch_usage into the accumulator instead of only capturing the first non-None value.
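The accumulation pattern described above can be sketched as follows. The type and field names (`InferenceUsage`, `ModelUsage`, `tokens`) are illustrative stand-ins, not the actual Qdrant definitions; the point is merging every `batch_usage` into an accumulator rather than keeping only the first non-`None` value.

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the real types; field names are assumptions.
#[derive(Debug, Default, Clone, PartialEq)]
struct ModelUsage {
    tokens: u64,
}

#[derive(Debug, Default, Clone, PartialEq)]
struct InferenceUsage {
    models: HashMap<String, ModelUsage>,
}

impl InferenceUsage {
    // Accumulate another batch's usage into self, summing per-model counts
    // instead of keeping only the first non-None value.
    fn merge(&mut self, other: InferenceUsage) {
        for (model, usage) in other.models {
            self.models
                .entry(model)
                .and_modify(|u| u.tokens += usage.tokens)
                .or_insert(usage);
        }
    }
}

fn merge_batches(batches: Vec<Option<InferenceUsage>>) -> Option<InferenceUsage> {
    let mut acc: Option<InferenceUsage> = None;
    for batch_usage in batches.into_iter().flatten() {
        acc.get_or_insert_with(InferenceUsage::default)
            .merge(batch_usage);
    }
    acc
}

fn main() {
    let a = InferenceUsage {
        models: HashMap::from([("bm25".to_string(), ModelUsage { tokens: 3 })]),
    };
    let b = InferenceUsage {
        models: HashMap::from([("bm25".to_string(), ModelUsage { tokens: 4 })]),
    };
    let merged = merge_batches(vec![Some(a), None, Some(b)]).unwrap();
    println!("{merged:?}"); // per-model token counts summed across batches
}
```

With this shape, a `None` result is only returned when no batch produced any usage at all.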


67-67: Improved variable naming for clarity.

The renaming from vector to vector_data in named vector processing provides better clarity about what the variable represents, especially when dealing with nested vector structures.

Also applies to: 326-327, 412-414


76-78: Enhanced error messages with explicit context.

The error messages now provide specific context about where the inference failure occurred (e.g., "for named vector", "for document", "for image", "for object"), making debugging easier. The explicit None parameter for the optional error detail is also consistent with the error constructor signature.

Also applies to: 93-95, 113-115, 133-135


21-21: Consistent inference usage tracking across all conversion functions.

All conversion functions now properly return tuples containing both the converted data and optional InferenceUsage, enabling comprehensive usage tracking throughout the conversion pipeline.

Also applies to: 165-165, 225-225, 390-390


192-210: Appropriate error handling for unsupported direct processing.

The explicit error handling for direct Document/Image/Object processing in top-level batch vectors correctly guides users to use named vectors instead, preventing potential confusion about supported usage patterns.

@n0x29a n0x29a force-pushed the inference-usage-rest branch from 83df404 to 1dc2bc7 on May 23, 2025 10:14
@timvisee
Member

In a call we agreed that the current approach isn't very sustainable: returning usage in all Ok/Err returns adds a lot of boilerplate.

We agreed to try a different approach instead, one that follows the design of our current hardware usage counters; both concepts are quite similar. That way we don't need to return usage everywhere, and we handle Ok/Err statuses in exactly the same way.
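The counter-style approach suggested here can be sketched as a shared atomic accumulator. The names (`InferenceUsageAcc`, `add_tokens`) are hypothetical, modeled loosely on how hardware usage counters work: handlers share one accumulator and read it once when building the response, so Ok and Err paths need no extra return values.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical accumulator in the style of hardware usage counters.
#[derive(Default)]
struct InferenceUsageAcc {
    tokens: AtomicU64,
}

impl InferenceUsageAcc {
    fn add_tokens(&self, n: u64) {
        // Relaxed suffices: the counter is only read after the work completes.
        self.tokens.fetch_add(n, Ordering::Relaxed);
    }

    fn snapshot(&self) -> u64 {
        self.tokens.load(Ordering::Relaxed)
    }
}

// Inference records usage as a side effect instead of returning it.
fn run_inference(acc: &InferenceUsageAcc, inputs: &[&str]) {
    for input in inputs {
        acc.add_tokens(input.split_whitespace().count() as u64);
    }
}

fn main() {
    let acc = InferenceUsageAcc::default();
    run_inference(&acc, &["hello world", "three tokens here"]);
    println!("tokens used: {}", acc.snapshot());
}
```

Because the accumulator is passed down by reference, deeply nested call chains record usage without threading an `Option<InferenceUsage>` through every return type.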

@n0x29a n0x29a force-pushed the inference-usage-rest branch from 1dc2bc7 to 7eb2ef3 on May 26, 2025 10:11
Contributor

Copilot AI left a comment


Pull Request Overview

This PR adds inference usage tracking to the REST API endpoints by updating the calls to process_response and process_response_error throughout the codebase, and by integrating new inference-usage types into the gRPC and REST models.

  • Updated process_response and process_response_error calls to accept an extra argument for inference usage.
  • Added new types and conversion routines to support inference usage in both the REST and gRPC APIs.
  • Extended the OpenAPI and gRPC schemas and documentation to include inference usage data.
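The signature change in the first bullet can be sketched as follows. `ApiResponse` and the helper bodies here are simplified stand-ins for the actix-web helpers in src/actix/helpers.rs (the real functions build an `HttpResponse`); a plain struct keeps the sketch runnable.

```rust
// Simplified stand-ins; the real helpers serialize into an actix-web response.
#[derive(Debug, PartialEq)]
struct InferenceUsage {
    tokens: u64,
}

#[derive(Debug, PartialEq)]
struct ApiResponse<T> {
    result: Option<T>,
    status: String,
    time: f64,
    usage: Option<InferenceUsage>,
}

// Every handler passes an extra `usage` argument; `None` leaves
// endpoints without inference unchanged.
fn process_response<T>(result: T, time: f64, usage: Option<InferenceUsage>) -> ApiResponse<T> {
    ApiResponse {
        result: Some(result),
        status: "ok".to_string(),
        time,
        usage,
    }
}

fn process_response_error<T>(
    error: &str,
    time: f64,
    usage: Option<InferenceUsage>,
) -> ApiResponse<T> {
    // Usage is attached even on errors, since inference may have run
    // before the operation failed.
    ApiResponse {
        result: None,
        status: format!("error: {error}"),
        time,
        usage,
    }
}

fn main() {
    let ok = process_response(42u32, 0.01, Some(InferenceUsage { tokens: 9 }));
    println!("{ok:?}");
    let err: ApiResponse<u32> = process_response_error("timeout", 0.02, None);
    println!("{err:?}");
}
```

Serializing `usage` as an optional field means existing clients that ignore unknown response keys are unaffected.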

Reviewed Changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.

File Description
src/actix/api/*.rs Updated API endpoints to pass an extra inference usage parameter (defaulted to None).
lib/api/src/rest/models.rs Introduced new InferenceUsage and ModelUsage types along with corresponding changes.
lib/api/src/grpc/* Extended gRPC models, conversions, and proto definitions to include inference usage.
docs/redoc/master/openapi.json Updated documentation and schema definitions to document the new inference tracking.
Comments suppressed due to low confidence (1)

src/actix/api/query_api.rs:90

  • Verify that the inference_usage value is extracted and passed consistently across all API endpoints to avoid unexpected behavior.
let (result, inference_usage) = match result {

@n0x29a n0x29a merged commit 1dfebb5 into dev May 27, 2025
17 checks passed
@n0x29a n0x29a deleted the inference-usage-rest branch May 27, 2025 09:48
generall added a commit that referenced this pull request Jul 17, 2025
* Add inference usage tracking to REST API

---------

Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>