Add inference usage tracking to REST API #6563
Conversation
Pull Request Overview
Adds tracking and reporting of inference usage throughout the REST API, helper functions, and data models.
- Batch inference routines now return optional `InferenceUsage`
- All Actix handlers and helpers updated to include `inference_usage` in API responses
- REST and gRPC model, conversion, and proto definitions extended to support `InferenceUsage`
Reviewed Changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/common/inference/infer_processing.rs | Return InferenceUsage alongside batched inference results |
| src/actix/helpers.rs | Accept and serialize inference_usage in helper response builders |
| src/actix/api/update_api.rs | Propagate inference usage through all update endpoints |
| src/actix/api/shards_api.rs | Include inference_usage in shard-key endpoints |
| src/actix/api/service_api.rs | Report inference usage in service-level endpoints |
| src/actix/api/search_api.rs | Attach inference usage to search responses |
| src/actix/api/retrieve_api.rs | Add inference usage support to retrieval endpoints |
| src/actix/api/recommend_api.rs | Pass inference usage through recommendation endpoints |
| src/actix/api/query_api.rs | Integrate inference usage in query and batch-query handlers |
| src/actix/api/local_shard_api.rs | Report inference usage in local shard point operations |
| src/actix/api/facet_api.rs | Include inference usage in facet endpoint |
| src/actix/api/discovery_api.rs | Add inference usage to discovery endpoints |
| src/actix/api/count_api.rs | Serialize inference usage in count endpoints |
| src/actix/api/collections_api.rs | Extend collection management endpoints to include inference usage |
| lib/api/src/rest/models.rs | Define InferenceUsage and include it in Usage struct |
| lib/api/src/rest/conversions.rs | Implement From<InferenceUsage> conversion for gRPC usage |
| lib/api/src/grpc/qdrant.rs | Add gRPC InferenceUsage and ModelUsage message definitions |
| lib/api/src/grpc/proto/points.proto | Extend protobuf Usage and add InferenceUsage messages |
| lib/api/src/grpc/ops.rs | Update Usage struct and usage_or_none to handle inference usage |
| lib/api/src/grpc/conversions.rs | Initialize inference field in generated Usage conversions |
Comments suppressed due to low confidence (3)
src/actix/api/query_api.rs:147
- Currently selecting only the first inference usage in batch queries may underreport total usage; consider aggregating usages across all requests (e.g., summing tokens per model) instead of taking only the first.
let first_inference_usage: Option<InferenceUsage> = inference_usages.into_iter().flatten().next();
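The aggregation suggested above could look roughly like the following sketch. `InferenceUsage` and `ModelUsage` here are simplified stand-ins for the PR's REST types, and the `merge` helper is an illustrative addition, not code from the PR:

```rust
use std::collections::HashMap;

// Simplified stand-ins for the PR's REST usage types (illustrative only).
#[derive(Debug, Default, PartialEq)]
pub struct ModelUsage {
    pub tokens: u64,
}

#[derive(Debug, Default, PartialEq)]
pub struct InferenceUsage {
    pub models: HashMap<String, ModelUsage>,
}

impl InferenceUsage {
    // Merge another usage report into this one, summing tokens per model.
    pub fn merge(&mut self, other: InferenceUsage) {
        for (model, usage) in other.models {
            self.models.entry(model).or_default().tokens += usage.tokens;
        }
    }
}

// Aggregate across all batch sub-requests instead of taking only the first.
pub fn aggregate(usages: Vec<Option<InferenceUsage>>) -> Option<InferenceUsage> {
    let mut total = InferenceUsage::default();
    let mut saw_any = false;
    for usage in usages.into_iter().flatten() {
        total.merge(usage);
        saw_any = true;
    }
    saw_any.then_some(total)
}
```

With this shape, `aggregate` replaces the `flatten().next()` pattern and returns `None` only when no sub-request produced any usage at all.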
lib/api/src/grpc/conversions.rs:2247
- [nitpick] The variable name `usage` here shadows the struct field and can be misleading; consider renaming the parameter to `hardware_usage` for clarity.
hardware: usage,
src/common/inference/infer_processing.rs:26
- New inference usage is now returned from `from_objects`; consider adding unit tests to verify that `InferenceUsage` is correctly propagated for both non-empty and empty inference responses.
) -> Result<(Self, Option<InferenceUsage>), StorageError> {
📝 Walkthrough

This change introduces a new mechanism for tracking inference usage metrics alongside existing hardware usage metrics throughout the codebase.
Actionable comments posted: 4
🧹 Nitpick comments (2)
lib/api/src/grpc/qdrant.rs (1)
6573-6592: Well-structured changes for inference usage tracking.

The implementation extends the `Usage` struct with a new optional `inference` field and adds two supporting structures (`InferenceUsage` and `ModelUsage`) that follow protobuf conventions and Rust best practices. The model-based token tracking provides granular insight into inference resource consumption. Consider adding documentation comments to these new structures to explain their purpose and usage patterns to future developers.
```diff
 #[derive(serde::Serialize)]
 #[allow(clippy::derive_partial_eq_without_eq)]
 #[derive(Clone, PartialEq, ::prost::Message)]
+/// Tracks inference usage metrics across different models
 pub struct InferenceUsage {
     #[prost(map = "string, message", tag = "1")]
+    /// Maps model names to their respective usage metrics
     pub model: ::std::collections::HashMap<::prost::alloc::string::String, ModelUsage>,
 }
 #[derive(serde::Serialize)]
 #[allow(clippy::derive_partial_eq_without_eq)]
 #[derive(Clone, PartialEq, ::prost::Message)]
+/// Represents usage metrics for a specific model
 pub struct ModelUsage {
     #[prost(uint64, tag = "1")]
+    /// Count of tokens processed by the model
     pub tokens: u64,
```

src/common/inference/infer_processing.rs (1)
3-4: API models leaked into the core crate – consider reversing dependency

Importing `InferenceUsage` from `api::rest::models` creates an upward dependency from a low-level "common" module to a high-level REST layer. This breaks the usual layering (core ➔ transport) and may cause cyclic-compile headaches when another transport (e.g., gRPC) needs the same struct. Recommend introducing a transport-agnostic `InferenceUsage` definition in a shared crate (or move it to `common::types`) and add thin converter traits in the REST/gRPC layers instead.
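The suggested decoupling could look roughly like this sketch. `CoreInferenceUsage`, `RestInferenceUsage`, and the flat model-to-tokens map are illustrative names and shapes, not the PR's actual types:

```rust
use std::collections::HashMap;

// Hypothetical transport-agnostic type that could live in a shared crate
// (e.g. common::types); name and shape are assumptions for illustration.
#[derive(Debug, Clone, Default)]
pub struct CoreInferenceUsage {
    pub models: HashMap<String, u64>, // model name -> tokens consumed
}

// REST-layer mirror of the same data. The conversion lives in the REST
// layer, so the core crate never has to import REST models: the dependency
// now points from transport down to core, not the other way around.
#[derive(Debug)]
pub struct RestInferenceUsage {
    pub models: HashMap<String, u64>,
}

impl From<CoreInferenceUsage> for RestInferenceUsage {
    fn from(core: CoreInferenceUsage) -> Self {
        Self { models: core.models }
    }
}
```

A gRPC layer would add its own `From<CoreInferenceUsage>` impl the same way, keeping each transport's representation private to that transport.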
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- `.gitignore` (1 hunks)
- `lib/api/src/grpc/conversions.rs` (1 hunks)
- `lib/api/src/grpc/ops.rs` (2 hunks)
- `lib/api/src/grpc/proto/points.proto` (1 hunks)
- `lib/api/src/grpc/qdrant.rs` (1 hunks)
- `lib/api/src/rest/conversions.rs` (1 hunks)
- `lib/api/src/rest/models.rs` (3 hunks)
- `src/actix/api/collections_api.rs` (6 hunks)
- `src/actix/api/count_api.rs` (2 hunks)
- `src/actix/api/discovery_api.rs` (4 hunks)
- `src/actix/api/facet_api.rs` (2 hunks)
- `src/actix/api/local_shard_api.rs` (5 hunks)
- `src/actix/api/query_api.rs` (8 hunks)
- `src/actix/api/recommend_api.rs` (6 hunks)
- `src/actix/api/retrieve_api.rs` (6 hunks)
- `src/actix/api/search_api.rs` (10 hunks)
- `src/actix/api/service_api.rs` (3 hunks)
- `src/actix/api/shards_api.rs` (2 hunks)
- `src/actix/api/update_api.rs` (25 hunks)
- `src/actix/helpers.rs` (6 hunks)
- `src/common/inference/infer_processing.rs` (3 hunks)
- `src/common/inference/query_requests_grpc.rs` (2 hunks)
- `src/common/inference/query_requests_rest.rs` (10 hunks)
- `src/common/inference/service.rs` (4 hunks)
- `src/common/inference/update_requests.rs` (12 hunks)
- `src/common/update.rs` (10 hunks)
- `src/tonic/api/points_api.rs` (2 hunks)
- `src/tonic/api/query_common.rs` (15 hunks)
- `src/tonic/api/update_common.rs` (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (12)
src/actix/api/discovery_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/actix/api/count_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/actix/api/local_shard_api.rs (2)
- src/actix/api/update_api.rs (1): `process_response` (448-448)
- src/actix/helpers.rs (2): `process_response` (47-68), `process_response_error` (70-96)
src/actix/api/collections_api.rs (1)
- src/actix/helpers.rs (1): `process_response` (47-68)
src/actix/api/facet_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/actix/api/search_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/tonic/api/points_api.rs (1)
- lib/api/src/grpc/ops.rs (1): `usage_or_none` (36-45)
src/actix/api/recommend_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/tonic/api/query_common.rs (1)
- lib/api/src/grpc/ops.rs (1): `usage_or_none` (36-45)
src/actix/api/retrieve_api.rs (1)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
src/actix/api/update_api.rs (3)
- src/actix/helpers.rs (2): `process_response_error` (70-96), `process_response` (47-68)
- src/common/update.rs (3): `do_upsert_points` (232-281), `check_strict_mode` (125-172), `do_batch_update_points` (567-700)
- lib/storage/src/content_manager/collection_verification.rs (1): `check_strict_mode` (65-80)
src/common/update.rs (1)
- src/common/inference/update_requests.rs (3): `convert_batch` (158-209), `convert_point_struct` (17-156), `convert_point_vectors` (211-293)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: lint
- GitHub Check: test-low-resources
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: storage-compat-test
- GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (86)
.gitignore (1)
19-19: Adding `.env` to gitignore is good practice.

This change correctly excludes environment configuration files from version control, which is important for keeping sensitive or environment-specific configuration private.
src/actix/api/facet_api.rs (2)
43-43: Correctly updated error handling to support inference usage tracking.

The additional `None` parameter aligns with the updated signature of `process_response_error`, which now accepts an optional `InferenceUsage` parameter for tracking inference metrics in API responses.

74-74: Properly extended process_response to support inference usage tracking.

The added `None` parameter correctly implements the updated function signature that now includes an optional `InferenceUsage` parameter, consistent with the broader changes for inference usage tracking in API responses.

src/actix/api/count_api.rs (2)
40-40: Correctly updated error handling to support inference usage tracking.

The additional `None` parameter properly implements the updated signature of `process_response_error`, which now accepts an optional `InferenceUsage` parameter.

69-69: Properly extended process_response to support inference usage tracking.

The added `None` parameter correctly implements the updated function signature that now includes an optional `InferenceUsage` parameter, consistent with the system-wide inference usage tracking enhancements.

src/tonic/api/points_api.rs (2)
705-705: Correctly updated usage_or_none call to support inference metrics.

The additional `None` parameter properly implements the updated signature of `usage_or_none`, which now accepts an optional `InferenceUsage` parameter for the search_matrix_pairs response.

731-731: Correctly updated usage_or_none call to support inference metrics.

The additional `None` parameter properly implements the updated signature of `usage_or_none`, which now accepts an optional `InferenceUsage` parameter for the search_matrix_offsets response.

src/common/inference/query_requests_grpc.rs (4)
68-68: Code updated to extract inner `BatchAccumInferred` data from tuple return type

The code now passes `&inferred.0` instead of `&inferred`, accessing the first element of the tuple. This aligns with the broader changes that wrap inference results with usage information.

75-75: Code updated to extract inner `BatchAccumInferred` data from tuple return type

This line has been updated to access the inner batch data using `&inferred.0`, matching the pattern in the other modified calls in this file.

148-148: Code updated to extract inner `BatchAccumInferred` data from tuple return type

Similar update to the previous ones, accessing the first element of the tuple return type to get the batch data.

152-152: Code updated to extract inner `BatchAccumInferred` data from tuple return type

Code consistently updated to access the first element of the tuple return value.
src/actix/api/shards_api.rs (2)
45-45: Parameter added for inference usage tracking

The `process_response` call has been updated to include a fourth parameter `None` for the optional inference usage. This matches the updated function signature in `src/actix/helpers.rs` and aligns with the broader inference usage tracking implementation.

75-75: Parameter added for inference usage tracking

Similar to line 45, this `process_response` call has been updated to include the additional `None` parameter for inference usage.

src/actix/api/discovery_api.rs (4)
43-43: Parameter added for inference usage tracking

The call to `process_response_error` has been updated to include the fourth parameter `None` for inference usage, matching the updated function signature.

79-79: Parameter added for inference usage tracking

This `process_response` call now includes the fourth parameter `None` for inference usage tracking, maintaining consistency with the rest of the API layer changes.

103-103: Parameter added for inference usage tracking

Similar to line 43, this `process_response_error` call has been updated with the additional `None` parameter for inference usage.

136-136: Parameter added for inference usage tracking

This `process_response` call has been updated to include the fourth parameter `None` for inference usage, consistent with other API endpoints.

lib/api/src/rest/conversions.rs (2)
10-11: Added imports for inference usage types

Necessary imports added for the new inference usage conversion implementation.

14-27: Added conversion from REST to gRPC InferenceUsage

This implementation provides a clean translation from the REST API's `InferenceUsage` type to the gRPC equivalent, properly mapping the model usage data between the two formats. The implementation correctly iterates through each model in the source and constructs the corresponding entry in the target.

lib/api/src/grpc/conversions.rs (1)
2246-2249: Correctly implementing inference usage tracking in gRPC conversion.

The addition of the `inference` field to the `Usage` struct in the conversion from `PointsOperationResponseInternal` to `PointsOperationResponse` properly implements the inference usage tracking feature, with a default value of `None` when no inference data is available.

src/actix/api/local_shard_api.rs (5)
75-75: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new `inference_usage` parameter as `None`, which aligns with the updated function signature in the helpers module.

104-104: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes `None` for the new inference usage parameter, maintaining consistency with the updated function signature.

151-151: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new `inference_usage` parameter as `None`, which aligns with the updated function signature in the helpers module.

178-178: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes `None` for the new inference usage parameter, maintaining consistency with the updated function signature.

222-222: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new `inference_usage` parameter as `None`, which aligns with the updated function signature in the helpers module.

src/actix/api/service_api.rs (3)
73-73: Updated process_response_error call to support inference usage tracking.

The error handler correctly includes `None` for the new inference usage parameter, maintaining consistency with the updated function signature.

185-185: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new `inference_usage` parameter as `None`, which aligns with the updated function signature in the helpers module.

201-201: Updated process_response call to support inference usage tracking.

The call has been properly updated to include the new `inference_usage` parameter as `None`, which aligns with the updated function signature in the helpers module.

lib/api/src/grpc/proto/points.proto (2)
1174-1174: Appropriate extension to Usage message for inference tracking

The addition of the optional `inference` field of type `InferenceUsage` to the `Usage` message properly extends the usage tracking capability to include inference metrics alongside hardware metrics.

1178-1188: Well-structured message definitions for inference tracking

The new message definitions provide a clean structure for tracking inference usage:
- `InferenceUsage` with a map from model names to usage details
- `ModelUsage` with a tokens count field

This approach allows for granular tracking per model while maintaining a simple data structure.
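In Rust terms, the two messages map onto a nested structure like the following sketch (prost attributes omitted; the `record` and `total_tokens` helpers are illustrative additions, not part of the PR):

```rust
use std::collections::HashMap;

// Rust mirror of the proto ModelUsage message: per-model token count.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelUsage {
    pub tokens: u64,
}

// Rust mirror of the proto InferenceUsage message:
// map<string, ModelUsage>, keyed by model name.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct InferenceUsage {
    pub model: HashMap<String, ModelUsage>,
}

impl InferenceUsage {
    // Add tokens for one model, creating the entry on first use.
    pub fn record(&mut self, model: &str, tokens: u64) {
        self.model
            .entry(model.to_string())
            .and_modify(|u| u.tokens += tokens)
            .or_insert(ModelUsage { tokens });
    }

    // Sum token counts across all models.
    pub fn total_tokens(&self) -> u64 {
        self.model.values().map(|u| u.tokens).sum()
    }
}
```

The map keyed by model name is what makes per-model reporting possible while keeping the wire format a single message.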
src/actix/api/recommend_api.rs (6)
52-52: Updated error processing to support inference usage

Function call correctly updated to match the new signature of `process_response_error`, which now takes an additional parameter for optional inference usage.

88-88: Updated response processing to support inference usage

The call to `helpers::process_response` has been properly extended with a `None` parameter for inference usage, aligning with the updated function signature while maintaining the existing behavior.

143-143: Updated error handling in batch processing to include inference usage parameter

Function call correctly updated to match the new signature of `process_response_error`, which now takes an additional parameter for optional inference usage.

176-176: Updated batch response processing to support inference usage

The call to `helpers::process_response` correctly includes the additional `None` parameter for inference usage data in batch operations.

203-203: Updated error handling in group recommendations to include inference usage parameter

Error handling for group recommendations now correctly includes the inference usage parameter in the `process_response_error` call.

231-231: Updated group recommendation response processing to support inference usage

The call to `helpers::process_response` for group recommendations properly includes the new inference usage parameter.

src/actix/api/collections_api.rs (6)
121-121: Updated collection creation error handling with inference usage support

The early return error case for collection creation now correctly passes the required `None` parameter for inference usage.

131-131: Updated collection creation response handling to support inference tracking

The successful path for collection creation now properly includes the inference usage parameter in the response processing.

154-154: Updated collection update response handling to support inference tracking

The response processing for the collection update operation correctly includes the new inference usage parameter.

174-174: Updated collection deletion response handling to support inference tracking

The response processing for collection deletion now properly includes the inference usage parameter.

192-192: Updated alias update response handling to support inference tracking

The alias update operation's response processing now correctly includes the inference usage parameter.

230-230: Updated cluster operation response handling to support inference tracking

The collection cluster update operation's response processing now properly includes the inference usage parameter.
lib/api/src/grpc/ops.rs (3)
1-1: Added import for InferenceUsage type

Correctly imported the `InferenceUsage` type alongside the existing `HardwareUsage` and `Usage` types.

27-32: Enhanced emptiness check to include inference usage

The `is_empty` method now correctly checks both hardware and inference fields, ensuring that a `Usage` is only considered empty when both fields are `None`. This is important for maintaining consistent behavior when determining whether to include usage data in responses.
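The emptiness rule described here can be sketched as follows; the struct shapes (`HardwareUsage` with a single `cpu` field, `InferenceUsage` with a flat `tokens` count) are simplified assumptions, not the real types:

```rust
// Minimal stand-ins; the real types live under lib/api/src/grpc.
#[derive(Debug, PartialEq)]
pub struct HardwareUsage {
    pub cpu: u64,
}

#[derive(Debug, PartialEq)]
pub struct InferenceUsage {
    pub tokens: u64,
}

#[derive(Debug, PartialEq)]
pub struct Usage {
    pub hardware: Option<HardwareUsage>,
    pub inference: Option<InferenceUsage>,
}

impl Usage {
    // Empty only when *both* metric kinds are absent.
    pub fn is_empty(&self) -> bool {
        self.hardware.is_none() && self.inference.is_none()
    }
}

// Build a Usage, but collapse to None when there is nothing to report,
// so empty usage objects never appear in responses.
pub fn usage_or_none(
    hardware: Option<HardwareUsage>,
    inference: Option<InferenceUsage>,
) -> Option<Usage> {
    let usage = Usage { hardware, inference };
    if usage.is_empty() { None } else { Some(usage) }
}
```

This is why extending `is_empty` matters: without the inference check, a response carrying only inference metrics would be dropped as "empty".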
36-43: Updated usage_or_none function to handle inference metrics

The function signature now properly accepts an optional `InferenceUsage` parameter and correctly constructs the `Usage` struct with both hardware and inference fields. This allows the function to create usage objects that can include inference metrics when available.

src/tonic/api/query_common.rs (2)
135-135: Support for inference usage tracking added

The change adds a second parameter (`None`) to the `usage_or_none` function call to accommodate the new inference usage tracking capability while maintaining backward compatibility.

183-183: Consistent inference usage parameter addition

The changes consistently add `None` as the inference usage parameter across all API response handling functions, ensuring uniform behavior throughout the codebase.

Also applies to: 241-241, 296-296, 349-349, 406-406, 460-460, 506-506, 562-562, 641-641, 699-699, 764-764, 819-819, 875-875
src/actix/api/retrieve_api.rs (2)
89-89: Error handling updated for inference tracking

The error handling paths now include the additional `None` parameter for inference usage in `process_response_error` calls, maintaining consistency with the new function signature.

Also applies to: 96-96, 145-145, 211-211

124-124: Response processing updated for inference tracking

All successful response paths now properly include the additional `None` parameter for inference usage in `process_response` calls, which will allow for future inclusion of inference metrics when available.

Also applies to: 184-184, 240-240
src/actix/api/search_api.rs (3)
51-51: Consistent error handling with inference usage support.

The additional `None` parameter for inference usage ensures proper alignment with the updated signature of `process_response_error` to support inference usage tracking.

86-86: Consistent response handling with inference usage support.

The search functionality now correctly passes `None` for inference usage to the updated `process_response` function, maintaining API consistency while preparing for potential future inference capabilities in search operations.

127-127: Uniformly updated response processing.

All search endpoints now consistently include the inference usage parameter (as `None`) in their response processing, ensuring uniform API behavior across the codebase.

Also applies to: 161-161, 188-188, 216-216, 243-243, 272-272, 299-299, 328-328
src/tonic/api/update_common.rs (3)
79-80: Updated tuple destructuring for inference usage support.

The code now correctly extracts the first element of the tuple returned by the update operation, containing the actual update result, while separating it from the inference usage data.

179-180: Consistent pattern for extracting update results.

The `.0` accessor pattern is consistently applied to extract the primary result from the tuple returned by the update vectors operation.

826-827: Extracting points from inference results tuple.

The code now correctly accesses the points data from the tuple returned by `convert_point_struct`, allowing the inference usage part to be potentially used elsewhere.

lib/api/src/rest/models.rs (4)
3-3: Added necessary imports for inference usage tracking.

Imported `HashMap` from ahash and expanded serde imports to include `Deserialize`, required for the new inference tracking structures.

Also applies to: 7-7

59-59: Added inference usage tracking to the Usage struct.

The `Usage` struct now includes an optional `inference` field to track model usage statistics alongside hardware usage.

63-70: Updated is_empty method to consider inference usage.

The `is_empty` method now correctly checks both hardware and inference fields to determine if usage data is present, ensuring consistent behavior when deciding whether to include usage data in responses.

91-101: Added new structures for inference usage tracking.

The new `InferenceUsage` and `ModelUsage` structs provide a standardized way to represent and serialize inference model usage statistics, such as token counts, in API responses.

src/actix/api/update_api.rs (8)
45-46: Updated error response to include inference usage parameter.

The error handling for strict mode verification now properly includes the `None` inference usage parameter when calling `process_response_error`.

48-48: Improved variable naming for clarity.

Using `operation_inner` instead of overwriting the original `operation` variable improves code clarity by clearly distinguishing between the original JSON wrapper and its extracted contents.

58-62: Updated function naming to reflect new return value.

Renaming the variable to `result_with_usage` clearly indicates that the function now returns both the update result and inference usage information.

70-80: Proper handling of inference usage in responses.

The code now correctly destructures the tuple returned by `do_upsert_points` to extract both the update result and inference usage, propagating this information through to the API response.

427-444: Improved batch update implementation with inference support.

The batch update operation now consistently handles inference usage tracking while maintaining clear variable naming with the `_inner` suffix pattern, ensuring that both the original request data and its extracted contents are clearly distinguished.

460-482: Proper aggregation of inference usage across batch operations.

The batch update function now correctly propagates inference usage information from individual operations, allowing clients to track and monitor model usage across multiple operations in a single request.

515-519: Consistent handling for operations without inference usage.

Operations like index creation that don't produce inference usage metrics consistently pass `None` for the inference usage parameter, ensuring API response consistency.

542-542: Consistent response handling for index operations.

The index deletion endpoint now properly passes `None` for inference usage, aligning with the updated function signature.

src/common/inference/service.rs (4)
7-7: Appropriate import for inference usage tracking.

The new import correctly brings in the `InferenceUsage` model required for this feature.

53-57: Good design for tracking inference usage.

Making `InferenceResponse` public and adding an optional `usage` field properly encapsulates both the embeddings result and usage metrics without requiring them to always be present.

172-177: Correctly updated return type for inference usage.

The return type now includes the full `InferenceResponse`, which contains both embeddings and optional usage information.

217-228: Simplified deserialization logic.

The implementation now directly deserializes into `InferenceResponse` without additional extraction or validation steps, while maintaining the same error handling for HTTP status codes.

src/common/update.rs (3)
241-262: Correctly propagates inference usage through the upsert operation.

The function now properly extracts and returns optional inference usage from conversion functions. The pattern matching with destructuring is clean and effective.

328-352: Consistent propagation of inference usage in vector updates.

The vector update function follows the same pattern as upsert operations, maintaining consistency in how inference usage is propagated through the system.

576-698: Well-implemented batch operation inference usage collection.

This implementation:
- Correctly handles multiple operations that might return different usage data
- Takes the first non-None inference usage it encounters
- Properly handles operations that don't produce usage data

The approach ensures inference usage is collected without duplicating or merging potentially incompatible usage metrics.
src/common/inference/query_requests_rest.rs (6)
1-1: Added appropriate import for the new feature.The import of
InferenceUsagefrom the REST models is required for the usage tracking functionality.
22-30: Well-designed wrapper structs for inference usage.These wrapper structures provide a clean way to associate inference usage with query requests while maintaining backward compatibility with existing code.
32-91: Proper extraction and propagation of inference usage in query groups.The function now correctly extracts inference usage from batch processing and properly wraps it with the query request in the return value.
93-146: Consistent handling of inference usage in query requests.This implementation follows the same pattern as the query groups function, maintaining consistency across the codebase.
245-246: Good variable rename for clarity.Renaming from
contexttocontext_pairsimproves code clarity by avoiding variable shadowing and making the variable's purpose more explicit.
277-277: Improved variable naming.Renaming from
querytoconverted_querymore accurately reflects the transformed state of the variable.src/common/inference/update_requests.rs (6)
3-3: Added required import for inference usage tracking.

The import of `InferenceUsage` is necessary for the feature implementation.

17-21: Proper function signature update for propagating usage data.

The return type now includes the optional inference usage alongside the converted point structures.

45-52: Effective pattern for extracting and propagating inference usage.

The code extracts both the inferred data and usage from batch accumulation, ensuring that usage metrics are properly propagated through the conversion process.

158-209: Comprehensive approach to batch conversion with usage tracking.

This implementation:
- Initializes an optional inference usage
- For named vectors, correctly extracts usage from each conversion
- Takes the first non-None usage encountered
- Returns both the converted batch and accumulated usage

The error handling for unsupported vector types is maintained.

211-293: Consistent handling of optional inference usage in point vectors conversion.

The function continues the established pattern of extracting and propagating inference usage, maintaining consistency with other conversion functions.

372-414: Clean implementation for vectors conversion with usage tracking.

The function properly:
- Accumulates inference objects
- Performs inference to get vectors and usage
- Converts each vector based on whether inference was needed
- Returns both converted vectors and optional usage

The map/collect pattern combined with the final mapping to include usage is elegant.
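The accumulate-infer-reassemble flow described above can be sketched generically; every name here (`VectorInput`, `convert_vectors`, the `infer` callback) is illustrative and not the PR's actual signatures:

```rust
// A batch entry is either a ready-made vector or a document needing inference.
pub enum VectorInput {
    Raw(Vec<f32>),
    Document(String),
}

// Convert a mixed batch: collect the documents, run one inference call for
// all of them, then rebuild the batch in order. Returns the vectors plus
// whatever usage (here: an optional token count) the inference step reported.
pub fn convert_vectors(
    inputs: Vec<VectorInput>,
    infer: impl Fn(&[String]) -> (Vec<Vec<f32>>, Option<u64>),
) -> (Vec<Vec<f32>>, Option<u64>) {
    // Accumulate the entries that need inference.
    let docs: Vec<String> = inputs
        .iter()
        .filter_map(|i| match i {
            VectorInput::Document(d) => Some(d.clone()),
            _ => None,
        })
        .collect();

    // One batched inference call; skip it entirely when nothing needs it.
    let (mut embeddings, usage) = if docs.is_empty() {
        (Vec::new(), None)
    } else {
        infer(&docs)
    };

    // Rebuild the batch in its original order, pulling inferred vectors
    // where a document sat and passing raw vectors through untouched.
    let mut embedding_iter = embeddings.drain(..);
    let vectors = inputs
        .into_iter()
        .map(|i| match i {
            VectorInput::Raw(v) => v,
            VectorInput::Document(_) => embedding_iter
                .next()
                .expect("inference returned too few vectors"),
        })
        .collect();
    (vectors, usage)
}
```

The key property is that usage is produced once per batch by the inference call and threaded back alongside the converted vectors, which mirrors the return-tuple pattern the review describes.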
```diff
 if inference_response.embeddings.is_empty() {
     return Err(StorageError::service_error(
         "Inference service returned no vectors. Check if models are properly loaded.",
     ));
 }

-let objects = objects_serialized.into_iter().zip(vectors).collect();
+let objects = objects_serialized
+    .into_iter()
+    .zip(inference_response.embeddings)
+    .collect();
```
Ensure embeddings count matches objects to avoid silent data loss
zip truncates to the shorter iterator.
If the inference service returns fewer vectors than requested objects, the current logic will silently drop the tail of objects_serialized, leading to missing vectors without an explicit error.
```diff
-if inference_response.embeddings.is_empty() {
+let embeddings_len = inference_response.embeddings.len();
+if embeddings_len == 0 {
     return Err(StorageError::service_error(
         "Inference service returned no vectors. Check if models are properly loaded.",
     ));
 }
+
+if embeddings_len != objects_serialized.len() {
+    return Err(StorageError::service_error(format!(
+        "Inference service returned {embeddings_len} vectors for {} inputs",
+        objects_serialized.len()
+    )));
+}
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/common/inference/infer_processing.rs around lines 53 to 62, the code uses
zip to combine objects_serialized and embeddings, which can silently drop
objects if embeddings are fewer. Add a check to compare the length of embeddings
and objects_serialized before zipping, and return an explicit error if they do
not match to prevent silent data loss.
Actionable comments posted: 1
🧹 Nitpick comments (2)
docs/redoc/master/openapi.json (2)
15255-15268: Add descriptive metadata to `InferenceUsage` schema
Consider enriching the `InferenceUsage` definition with `description` and `example` fields so API consumers understand its purpose:

```json
"InferenceUsage": {
  "type": "object",
  "description": "Maps model names to their token usage counts",
  "required": ["models"],
  "properties": {
    "models": {
      "type": "object",
      "additionalProperties": {
        "$ref": "#/components/schemas/ModelUsage"
      }
    }
  },
  "example": {
    "models": {
      "gpt-4": { "tokens": 1200 },
      "gpt-3.5-turbo": { "tokens": 800 }
    }
  }
}
```

Adding these fields makes the OpenAPI docs and generated client code more self-explanatory.
15269-15281: Include examples or format clarifications in `ModelUsage`
The `ModelUsage.tokens` property is well defined, but adding an `example` or clarifying `"type": "integer"` usage will aid in client generation:

```json
"ModelUsage": {
  "type": "object",
  "description": "Token usage metrics for a model",
  "required": ["tokens"],
  "properties": {
    "tokens": {
      "type": "integer",
      "format": "uint64",
      "minimum": 0,
      "example": 1024
    }
  }
}
```

This makes the schema more descriptive for downstream tools.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/redoc/master/openapi.json (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test-low-resources
- GitHub Check: storage-compat-test
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-consistency
- GitHub Check: lint
- GitHub Check: integration-tests
- GitHub Check: integration-tests-consensus
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
```json
"inference": {
  "anyOf": [
    {
      "$ref": "#/components/schemas/InferenceUsage"
    },
    {
      "nullable": true
    }
  ]
}
```
Fix nullability declaration for the inference field
The anyOf branch { "nullable": true } is not a valid standalone schema in JSON Schema. To explicitly allow null, replace it with a { "type": "null" } schema or move nullable: true to the parent object in OpenAPI 3.0. For example:
```diff
-"inference": {
-  "anyOf": [
-    { "$ref": "#/components/schemas/InferenceUsage" },
-    { "nullable": true }
-  ]
-}
+"inference": {
+  "anyOf": [
+    { "$ref": "#/components/schemas/InferenceUsage" },
+    { "type": "null" }
+  ]
+}
```

This ensures proper schema validation across clients and tools.
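If the spec must stay on OpenAPI 3.0 (where `"type": "null"` is unavailable), a sketch of the other option mentioned above — wrapping the `$ref` in `allOf` so a sibling `nullable: true` is not ignored:

```json
"inference": {
  "nullable": true,
  "allOf": [
    { "$ref": "#/components/schemas/InferenceUsage" }
  ]
}
```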
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 },
 "inference": {
   "anyOf": [
     {
       "$ref": "#/components/schemas/InferenceUsage"
     },
     {
-      "nullable": true
+      "type": "null"
     }
   ]
 }
```
🤖 Prompt for AI Agents
In docs/redoc/master/openapi.json around lines 15242 to 15252, the `anyOf` array
for the `inference` field incorrectly uses `{ "nullable": true }` as a
standalone schema, which is invalid. To fix this, replace `{ "nullable": true }`
with `{ "type": "null" }` or alternatively move the `nullable: true` property to
the parent `inference` object to correctly indicate that the field can be null,
ensuring proper schema validation.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/grpc/docs.md (1)
181-182: Fix markdown list indentation for nested ToC entries
The newly added ToC bullets for `InferenceUsage` and `InferenceUsage.ModelEntry` are indented with 4 spaces, but our markdownlint config expects 2-space indents for nested lists.

```diff
-    - [InferenceUsage](#qdrant-InferenceUsage)
-    - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
+  - [InferenceUsage](#qdrant-InferenceUsage)
+  - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
```
🪛 markdownlint-cli2 (0.17.2)
181-181: Unordered list indentation
Expected: 2; Actual: 4 (MD007, ul-indent)
182-182: Unordered list indentation
Expected: 2; Actual: 4 (MD007, ul-indent)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/grpc/docs.md (4 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
181-181: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
182-182: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
183-183: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
184-184: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
185-185: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
186-186: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
187-187: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
188-188: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consistency
- GitHub Check: test-low-resources
- GitHub Check: integration-tests-consensus
- GitHub Check: lint
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: storage-compat-test
🔇 Additional comments (4)
docs/grpc/docs.md (4)
3171-3180: InferenceUsage type definition is correct
The `InferenceUsage` message is documented with the appropriate header and table schema. It clearly specifies the repeated `model` field of type `InferenceUsage.ModelEntry`.
3187-3195: InferenceUsage.ModelEntry documentation looks good
The `InferenceUsage.ModelEntry` map-entry is accurately defined with `key` and `value` columns.
3287-3295: ModelUsage schema is properly documented
The `ModelUsage` message and its single `tokens` field are correctly presented.
4814-4814: Include `inference` in Usage message
The `Usage` table is updated to include the optional `inference` field of type `InferenceUsage`, matching the design in gRPC and REST models.
coszio
left a comment
I didn't notice before, for lack of being in the loop, but I believe there is no need to duplicate the inference handling twice (once for gRPC and once for REST), since we already have CollectionQueryRequest, which acts as a direct converging type from both interfaces, exactly for this kind of purpose.
Previously, it was introduced to handle ID -> vector resolution. I think it is the same case for Object -> vector inference.
I'm fine with dismissing this change request if there is a reason it is being handled directly on the API types, I just want to make it explicit.
```proto
message ModelUsage {
  uint64 tokens = 1;
}
```
Is this only intended for text? How will we represent measurements for other modalities like images?
For images it will be the same format:

```json
"models": {
  "openai/clip-v1": { "tokens": 50 }
}
```

Currently supported: text, images, object (it could be anything, like sound, video, etc.).
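For context, a hypothetical full REST response fragment showing where this usage surfaces, following the `Usage` → `InferenceUsage` → `ModelUsage` layout this PR adds (field values invented):

```json
{
  "result": {},
  "status": "ok",
  "time": 0.002,
  "usage": {
    "inference": {
      "models": {
        "openai/clip-v1": { "tokens": 50 }
      }
    }
  }
}
```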
```rust
let CollectionQueryRequestWithUsage { request, usage } =
    convert_query_request_from_rest(query_request, &inference_token).await?;

let pass = check_strict_mode(
```
Strict mode should be checked before inference.
It would require implementing the `StrictModeVerification` trait for an external type.
We've decided to implement this in a separate PR to keep the current one simple
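For illustration, the wrapper-delegation shape such an implementation would take (placeholder trait and types, not the real qdrant definitions):

```rust
// Placeholder stand-ins for the query request and its usage wrapper.
trait StrictModeCheck {
    fn query_limit(&self) -> Option<usize>;
}

struct QueryRequest {
    limit: Option<usize>,
}

impl StrictModeCheck for QueryRequest {
    fn query_limit(&self) -> Option<usize> {
        self.limit
    }
}

struct QueryRequestWithUsage {
    request: QueryRequest,
    usage: Option<u64>,
}

impl StrictModeCheck for QueryRequestWithUsage {
    // Delegate to the inner request, so strict-mode checks need no
    // knowledge of the usage data carried alongside it.
    fn query_limit(&self) -> Option<usize> {
        self.request.query_limit()
    }
}

fn main() {
    let wrapped = QueryRequestWithUsage {
        request: QueryRequest { limit: Some(10) },
        usage: None,
    };
    assert_eq!(wrapped.query_limit(), Some(10));
    assert!(wrapped.usage.is_none());
}
```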
src/actix/api/query_api.rs
Outdated
```diff
     .collect_vec();

-    Ok(QueryResponse { points })
+    Ok((QueryResponse { points }, usage))
```
We also want to return usage even in cases where the operation fails at later stages.
We've decided not to implement it in this PR.
After discussion in chat, it makes sense to keep the current approach of handling this at the API layer, with the purpose of not introducing inference responsibility into the Collection level. This dismisses my previous review 👍
```diff
@@ -55,12 +56,17 @@ pub struct ApiResponse<D> {
 pub struct Usage {
     #[serde(skip_serializing_if = "Option::is_none")]
```
Why skipping only for hardware?
It would not be enough to add the same condition for inference, because there is no guarantee that it will be exactly `None` rather than `Some`. Because of that, there is a check here.
src/actix/api/query_api.rs
Outdated
```diff
     .await;

-    helpers::process_response(result, timing, request_hw_counter.to_rest_api())
+    let (result, inference_usage) = match result {
```
This statement seems to be used a lot. Can we maybe extract it into a function?
These two lines don't justify extracting a standalone function we'll never reuse, I think
```rust
    hardware_usage: Option<HardwareUsage>,
    inference_usage: Option<InferenceUsage>,
) -> HttpResponse {
    let usage = {
```
You created a function that seems to do exactly the same: `usage_or_none`. Can we call that one here?
It can't be used directly because `usage_or_none` expects gRPC types.
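A sketch of the collapse-to-`None` pattern both helpers implement, with simplified placeholder types standing in for the REST and gRPC usage structs:

```rust
#[derive(Debug, PartialEq)]
struct Usage {
    hardware: Option<u64>,
    inference: Option<u64>,
}

// Collapse to None when there is nothing to report, so the `usage`
// field can be skipped entirely in the serialized response.
fn usage_or_none(hardware: Option<u64>, inference: Option<u64>) -> Option<Usage> {
    if hardware.is_none() && inference.is_none() {
        None
    } else {
        Some(Usage { hardware, inference })
    }
}

fn main() {
    assert_eq!(usage_or_none(None, None), None);
    assert_eq!(
        usage_or_none(Some(3), None),
        Some(Usage { hardware: Some(3), inference: None })
    );
}
```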
```rust
    ))
})?;

if inference_response.embeddings.is_empty() {
```
Why don't we need this error anymore?
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/common/inference/query_requests_rest.rs (1)
56-59: Consider standardizing error handling for usage data
There's some inconsistency in how usage data is handled in errors. Low-level functions like `convert_vector_input_with_inferred` don't include usage in their errors (lines 187, 196, 205), but these errors are later wrapped with usage data in parent functions. Consider standardizing this approach by either:
- Passing usage data to all error-generating functions, or
- Creating a helper function for consistent error mapping with usage data
This would reduce code duplication and ensure consistent error handling throughout the file.
Also applies to: 69-72, 128-133, 142-145, 222-228, 241-248, 253-261, 278-284, 289-295, 307-315
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- lib/collection/src/operations/types.rs (1 hunks)
- lib/collection/src/operations/universal_query/collection_query.rs (3 hunks)
- lib/collection/src/operations/verification/query.rs (2 hunks)
- lib/storage/src/content_manager/errors.rs (5 hunks)
- src/actix/api/query_api.rs (8 hunks)
- src/common/inference/query_requests_rest.rs (15 hunks)
- src/common/inference/update_requests.rs (13 hunks)
✅ Files skipped from review due to trivial changes (1)
- lib/collection/src/operations/universal_query/collection_query.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- src/actix/api/query_api.rs
🔇 Additional comments (37)
lib/collection/src/operations/types.rs (1)
1040-1043: Excellent addition of inference usage tracking to error reporting
This change adds an optional `usage` field to the `InferenceError` variant, allowing the system to track and propagate inference usage metrics alongside error descriptions. This is valuable for comprehensive error reporting, accurate billing, and resource monitoring, even when operations fail.
lib/storage/src/content_manager/errors.rs (4)
5-5: LGTM: Import added for InferenceUsage
Adding this import properly supports the changes to the error types to include inference usage data.
40-43: Good extension of the InferenceError variant
This change maintains consistency with the corresponding change to `CollectionError` in the first file, ensuring that inference usage data can be propagated through the error handling chain.
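A simplified sketch of carrying usage in an error variant and preserving it across conversion (a placeholder `u64` stands in for the real `InferenceUsage`):

```rust
#[derive(Debug)]
enum CollectionError {
    Inference { description: String, usage: Option<u64> },
}

#[derive(Debug)]
enum StorageError {
    Inference { description: String, usage: Option<u64> },
}

impl From<CollectionError> for StorageError {
    fn from(err: CollectionError) -> Self {
        match err {
            // Carry usage across the conversion so metrics survive failures.
            CollectionError::Inference { description, usage } => {
                StorageError::Inference { description, usage }
            }
        }
    }
}

fn main() {
    let err = CollectionError::Inference {
        description: "model not loaded".to_string(),
        usage: Some(42),
    };
    let storage_err: StorageError = err.into();
    match storage_err {
        StorageError::Inference { usage, .. } => assert_eq!(usage, Some(42)),
    }
}
```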
52-59: Appropriate update to the inference_error constructor
The constructor method is properly updated to accept and store the new usage parameter, providing a clean API for creating inference errors with associated usage data.
159-161: Correct propagation of usage data in error mapping
This change ensures that inference usage data is not lost when errors are translated from `CollectionError` to `StorageError`, maintaining data integrity across the error handling chain.
lib/collection/src/operations/verification/query.rs (2)
7-9: LGTM: Import added for CollectionQueryRequestWithUsage
This import properly supports the new trait implementation added in this file.
139-187: Clean implementation of StrictModeVerification for the wrapper type
This implementation elegantly uses delegation to the nested `request` object, minimizing code duplication while ensuring consistent behavior between the two implementations. The wrapper pattern allows the system to carry inference usage data without disrupting existing logic.
src/common/inference/query_requests_rest.rs (4)
4-6: LGTM: Imports added for wrapped request types
These imports properly support the changes to return types in the conversion functions.
25-25: Appropriate return type updates for conversion functions
Changing the return types to include the wrapped request types with usage data ensures that inference usage metrics are propagated through the request processing pipeline.
Also applies to: 104-104
40-51: Good extraction of inference usage data
The code now properly extracts both the inference results and usage data from the batch processing, making this information available for further processing and error reporting.
Also applies to: 106-107
95-98: Correct wrapping of requests with usage data
The code properly wraps the original request objects with usage data at the end of the conversion functions, preserving the existing logic flow while adding the new functionality.
Also applies to: 163-166
src/common/inference/update_requests.rs (26)
3-3: LGTM: Import added for the new InferenceUsage type.
Good addition of the necessary import for the new `InferenceUsage` type from the REST models.
21-21: Function signature updated to return inference usage data.
The return type has been appropriately updated to include the optional `InferenceUsage` alongside the converted point structs.
45-52: LGTM: Properly destructuring inference results into data and usage.
Good refactoring to destructure the tuple returned from the batch inference operation into `(inferred_data, usage)`, properly capturing both the inference results and usage metrics.
67-67: Better variable naming for clarity.
Changing the variable name from `vector` to `vector_data` improves code clarity by making it explicit that this is the data being processed, not just a reference.
76-77: Error message improvements.
The error message has been enhanced to provide more specific context about the failure scenario, and properly includes the `None` parameter for the optional inference usage.
87-98: Improved variable naming and error handling.
Good renaming of `vector` to `vector_data` for clarity and proper error messaging with context about document inference.
109-118: Consistent variable naming pattern applied to image processing.
Maintaining the consistent pattern of using `vector_data` for clarity in the image processing branch.
129-138: Consistent variable naming pattern applied to object processing.
The same pattern of using `vector_data` is applied to the object processing branch, maintaining consistency throughout the code.
159-159: LGTM: Return type matches updated function signature.
The return statement now correctly includes both the converted points and the optional inference usage data.
165-165: Function signature updated for batch conversion.
The return type has been appropriately updated to include the optional `InferenceUsage` alongside the converted batch data.
172-173: Initialize inference usage tracking.
Good initialization of the optional inference usage tracking variable that will be populated during the batch processing.
174-207: Comprehensive batch processing with inference tracking.
The batch processing logic has been significantly enhanced to track inference usage while handling different vector types. The error messages for unsupported direct processing operations provide clear guidance.
209-215: LGTM: Clean batch construction and result return.
The batch construction and result return are correctly implemented to include the inference usage data.
222-222: Function signature updated for point vectors conversion.
The return type has been appropriately updated to include the optional `InferenceUsage` alongside the converted point vectors.
238-245: Consistent pattern for handling inference results and usage.
The same pattern used in other functions is applied here to destructure the tuple into inference results and usage data.
266-267: Improved error message for named vectors.
The error message has been enhanced to provide more specific context about failures in named vectors, with proper inclusion of the optional inference usage parameter.
279-281: Consistent error message improvements for direct processing.
Error messages have been enhanced consistently across different vector types to provide more specific context about unsupported direct processing operations.
Also applies to: 285-287, 291-293
297-301: Clean construction of persisted point vectors.
The construction of persisted point vectors is clean and unchanged, as expected since it doesn't directly involve inference usage.
303-303: LGTM: Return type matches updated function signature.
The return statement now correctly includes both the converted point vectors and the optional inference usage data.
323-324: Consistent variable naming in internal function.
The internal function `convert_point_struct_with_inferred` also benefits from the improved variable naming from `vector` to `vector_data`.
330-332: Improved code formatting for vector data processing.
The code formatting for vector data processing has been improved for better readability and consistency across different vector types.
Also applies to: 345-346, 359-360
387-387: Function signature updated for vectors conversion.
The return type has been appropriately updated to include the optional `InferenceUsage` alongside the converted vectors.
398-405: Consistent pattern for handling inference results and usage.
The same pattern used in other functions is applied here to destructure the tuple into inference results and usage data.
407-424: Improved vector conversion with inference usage tracking.
The vector conversion logic has been enhanced to include inference usage tracking while maintaining the same overall structure and error handling.
425-425: LGTM: Return type matches updated function signature.
The return statement now correctly maps the result to include both the converted vectors and the optional inference usage data.
439-440: Consistent error handling in vector conversion.
Error handling in the vector conversion function is consistent and provides clear context about missing inferred vectors for different data types.
Also applies to: 444-446, 450-452
Actionable comments posted: 12
♻️ Duplicate comments (3)
lib/api/src/rest/models.rs (1)
57-59: 🛠️ Refactor suggestion
Add `skip_serializing_if` attribute to `inference` field for consistency.
The `hardware` field has `#[serde(skip_serializing_if = "Option::is_none")]` to avoid serializing `null` values, but the `inference` field is missing this attribute. This inconsistency could lead to different serialization behavior.
Apply this diff to maintain consistency:

```diff
 #[serde(skip_serializing_if = "Option::is_none")]
 pub hardware: Option<HardwareUsage>,
+#[serde(skip_serializing_if = "Option::is_none")]
 pub inference: Option<InferenceUsage>,
```

lib/storage/src/content_manager/toc/telemetry.rs (1)
23-35: Apply the same error handling improvement here.
This method has the same silent error handling pattern as `get_telemetry_data`.
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
114-117: Performance consideration for cloning sparse vectors.
Similar to the previous comment, cloning sparse vectors here could impact performance for large vectors.
🧹 Nitpick comments (25)
tests/consensus_tests/test_consensus_compaction.py (1)
132-132: Good improvement for test reliability.
Adding `?wait=true` ensures the metadata operation completes before proceeding, which improves test reliability in consensus scenarios.
Consider whether the `sleep(CONSENSUS_WAIT_SECONDS)` on line 134 is still necessary now that we're explicitly waiting for the operation to complete with `wait=true`.
100-109: Consider cleaner conditional compilation approach.The conditional compilation for RocksDB-specific parameters works but could be more readable. The placement of
#[cfg(feature = "rocksdb")]on individual parameters makes the function call harder to read.Consider using conditional blocks or feature-specific wrapper functions for better readability:
- let vector_storage = open_vector_storage( - #[cfg(feature = "rocksdb")] - &mut db_builder, - vector_config, - #[cfg(feature = "rocksdb")] - &Default::default(), - &vector_storage_path, - #[cfg(feature = "rocksdb")] - vector_name, - )?; + let vector_storage = { + #[cfg(feature = "rocksdb")] + { + open_vector_storage( + &mut db_builder, + vector_config, + &Default::default(), + &vector_storage_path, + vector_name, + )? + } + #[cfg(not(feature = "rocksdb"))] + { + open_vector_storage(vector_config, &vector_storage_path)? + } + };
123-132: Same readability concern with conditional compilation.
Similar to the `open_vector_storage` call, the conditional compilation here could be more readable with block-level conditionals rather than parameter-level attributes.
172-189:Details
❓ Verification inconclusive
Consider batch operation for payload index registration.
The current implementation registers each payload index field via separate consensus operations. While functionally correct, this could lead to performance issues with large schemas.
Consider if the consensus system supports batch operations for payload index registration:
- for (field_name, field_schema) in schema.iter() { - let consensus_op = - CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex { - collection_name: collection_pass.to_string(), - field_name: field_name.clone(), - field_schema: field_schema.clone(), - }); - - dispatcher - .submit_collection_meta_op(consensus_op, access.clone(), None) - .await?; - } + // If batch operations are supported: + let batch_ops: Vec<_> = schema.iter().map(|(field_name, field_schema)| { + CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex { + collection_name: collection_pass.to_string(), + field_name: field_name.clone(), + field_schema: field_schema.clone(), + }) + }).collect(); + + dispatcher.submit_batch_collection_meta_ops(batch_ops, access.clone(), None).await?;Let me check if batch consensus operations are available:
🏁 Script executed:
```shell
#!/bin/bash
# Check if batch consensus operations are supported in the dispatcher
ast-grep --pattern 'impl $_ for Dispatcher { $$$ submit_batch_$_($$$) { $$$ } $$$ }'
```

Length of output: 98
Dispatcher Lacks Batch Meta-Op API
I didn’t find any `submit_batch_*` method in `Dispatcher`, so batch-submitting payload index operations isn’t currently supported. You have two paths forward:
- Verify consensus support: Confirm whether the underlying consensus service can handle batch index‐creation requests.
- Add batch API if needed: If it does, consider extending `Dispatcher` with a `submit_batch_collection_meta_ops` method before applying the diff below. Otherwise, retaining the per-field loop is the only option today.
File: lib/storage/src/content_manager/snapshots/recover.rs
Lines: 172–189

```diff
-        for (field_name, field_schema) in schema.iter() {
-            let consensus_op =
-                CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
-                    collection_name: collection_pass.to_string(),
-                    field_name: field_name.clone(),
-                    field_schema: field_schema.clone(),
-                });
-
-            dispatcher
-                .submit_collection_meta_op(consensus_op, access.clone(), None)
-                .await?;
-        }
+        // Batch operation isn’t available in Dispatcher today.
+        // If you add `submit_batch_collection_meta_ops` to Dispatcher,
+        // you can replace this loop with a single batch call.
+        for (field_name, field_schema) in schema.iter() {
+            dispatcher
+                .submit_collection_meta_op(
+                    CollectionMetaOperations::CreatePayloadIndex(CreatePayloadIndex {
+                        collection_name: collection_pass.to_string(),
+                        field_name: field_name.clone(),
+                        field_schema: field_schema.clone(),
+                    }),
+                    access.clone(),
+                    None,
+                )
+                .await?;
+        }
```

Please verify consensus capabilities or implement the batch API before adopting this change.
lib/segment/Cargo.toml (1)
26-28: Important: Document the partial RocksDB disabling
The comment correctly notes that disabling the RocksDB feature only partially disables RocksDB usage. This is valuable documentation for future work.
Would you like me to help identify remaining RocksDB dependencies that need to be gated behind this feature flag?
tests/consensus_tests/test_force_delete_peer.py (1)
39-42: Simplified test flow is more maintainable.
The direct use of the `replicate_shard` helper function makes the test more focused and easier to understand compared to the previous complex flow involving process management.
Note: The static analysis tool flags `replicate_shard` as potentially undefined, but based on the relevant code snippets, it's correctly defined in `utils.py` and imported via the star import on line 2. Consider replacing the star import with explicit imports for better code clarity and to avoid static analysis warnings:

```diff
-from .utils import *
+from .utils import start_cluster, wait_collection_exists_and_active_on_all_peers, wait_for_collection_shard_transfers_count, assert_project_root, replicate_shard
```
🪛 Ruff (0.11.9)
42-42:
replicate_shardmay be undefined, or defined from star imports(F405)
lib/storage/src/content_manager/toc/mod.rs (1)
10-10: Good modularization of telemetry functionality.
Adding the `telemetry` module improves code organization by separating telemetry-related functionality into a dedicated module. This aligns with the refactoring described in the AI summary where `get_telemetry_data` and `get_aggregated_telemetry_data` methods were moved out of the main `TableOfContent` implementation.
Note: While the AI summary indicates that telemetry methods were removed from this file, those removals are not visible in the provided diff, so I cannot review the actual method removals or verify the refactoring is complete.
lib/segment/src/segment_constructor/rocksdb_builder.rs (1)
34-46: Clean up unused variable warning in conditional compilation.
The conditional compilation pattern correctly gates RocksDB-specific logic, but the fallback path creates unused variables that will trigger compiler warnings.
Apply this diff to eliminate the unused variable warning:

```diff
 .filter_map(|(vector_name, config)| {
     #[cfg(feature = "rocksdb")]
     if matches!(
         config.storage_type,
         crate::types::SparseVectorStorageType::OnDisk
     ) {
         return Some(get_vector_name_with_prefix(
             rocksdb_wrapper::DB_VECTOR_CF,
             vector_name,
         ));
     }
-    let (_, _) = (vector_name, config);
+    #[cfg(not(feature = "rocksdb"))]
+    let _ = (vector_name, config);
     None
 });
```
8-21: Consider logging errors when collection retrieval fails.The current implementation silently skips collections that fail to load. This could hide important issues like corrupted collections or permission problems.
Consider adding error logging to help with debugging:
for collection_pass in &all_collections { - if let Ok(collection) = self.get_collection(collection_pass).await { + match self.get_collection(collection_pass).await { + Ok(collection) => { result.push(collection.get_telemetry_data(detail).await); + } + Err(e) => { + log::warn!("Failed to get telemetry data for collection {}: {}", collection_pass.name(), e); + } } }tests/consensus_tests/test_partial_snapshot.py (1)
46-62: Test logic is sound but consider making it more robust.The test correctly validates that search requests are rejected during recovery, but the polling approach could be improved for reliability.
Consider adding a small delay and timeout mechanism to make the test more robust:
def test_partial_snapshot_read_lock(tmp_path: pathlib.Path): assert_project_root() write_peer, read_peer = bootstrap_peers(tmp_path, 100_000) executor = concurrent.futures.ThreadPoolExecutor(max_workers = 1) recover_future = executor.submit(recover_partial_snapshot_from, read_peer, write_peer) is_search_rejected = False + attempts = 0 + max_attempts = 50 # Prevent infinite loop - while not recover_future.done(): + while not recover_future.done() and attempts < max_attempts: response = try_search_random(read_peer) if response.status_code == 500: is_search_rejected = True break + attempts += 1 + time.sleep(0.1) # Small delay between attempts assert is_search_rejected, "Search requests have to be rejected during partial snapshot recovery"🧰 Tools
🪛 Ruff (0.11.9)
47-47:
assert_project_rootmay be undefined, or defined from star imports(F405)
lib/collection/src/shards/shard_holder/mod.rs (1)
1020-1023: Consider removing the underscore prefix from the lock variable.
The `_partial_snapshot_search_lock` variable holds an important lock guard that ensures exclusive access during recovery. Consider removing the underscore prefix to make it clear that this variable serves an important purpose through its RAII behavior.

```diff
-    let _partial_snapshot_search_lock = self
+    let partial_snapshot_search_lock = self
         .take_partial_snapshot_search_write_lock(shard_id, recovery_type)
         .await?;
```
45-49: Consider using the visitor's public API instead of accessing internal fields.The implementation is correct, but accessing
visitor.listdirectly breaks encapsulation. Consider usingself.visitor.len()instead.impl<H: ValueHandler> ExactSizeIterator for PostingIterator<'_, H> { fn len(&self) -> usize { - self.visitor.list.len().saturating_sub(self.offset) + self.visitor.len().saturating_sub(self.offset) } }lib/posting_list/src/visitor.rs (1)
67-108: Consider adding bounds check optimizationThe
get_by_offsetmethod performs bounds checking, but you could optimize the chunk boundary cases by caching the total chunk count.lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)
60-70: Consider more specific error typesThe error returned when failing to acquire the search read lock uses a generic
ServiceError. Consider creating a more specific error variant for this case to help with error handling and debugging.- .map_err(|_| CollectionError::ServiceError { - error: "shard unavailable, partial snapshot recovery is in progress".into(), - backtrace: None, - }) + .map_err(|_| CollectionError::ShardUnavailable { + reason: "partial snapshot recovery in progress".into(), + })
107-112: Potential integer overflow in request counter.

The request counter uses `fetch_add` without checking for overflow. While overflow is unlikely in practice, consider a checked or saturating increment (for example, a `compare_exchange` loop) if the counter could ever wrap around.

lib/segment/src/index/hnsw_index/graph_layers_builder.rs (1)
501-638: Well-implemented healing search algorithm with room for optimization.

The `search_shortcuts_on_level` method effectively implements the dual-graph search for finding border points between deleted and non-deleted regions. The implementation is correct and well-documented.

Consider addressing the TODO comments (lines 548-550) about using a priority queue instead of a stack for potentially better search navigation in the deleted sub-graph.
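The stack-vs-priority-queue trade-off mentioned in that TODO can be seen with the standard library alone. A hedged sketch with toy integer scores (not the actual HNSW scorer):

```rust
use std::collections::BinaryHeap;

/// Drain candidates best-score-first, as a priority queue would during
/// graph search; a stack (`Vec::pop`) would instead return them in
/// reverse push order, regardless of score.
fn best_first(candidates: &[(u32, &'static str)]) -> Vec<(u32, &'static str)> {
    let mut heap: BinaryHeap<_> = candidates.iter().copied().collect();
    let mut ordered = Vec::new();
    while let Some(best) = heap.pop() {
        ordered.push(best);
    }
    ordered
}

fn main() {
    let candidates = [(2, "far"), (9, "near"), (5, "mid")];

    // Stack behavior: last pushed comes back first, score ignored.
    let mut stack = candidates.to_vec();
    assert_eq!(stack.pop(), Some((5, "mid")));

    // Priority queue: the highest-scored candidate is always expanded first.
    assert_eq!(
        best_first(&candidates),
        vec![(9, "near"), (5, "mid"), (2, "far")]
    );
    println!("ok");
}
```

Expanding the best candidate first tends to reach the border region sooner, at the cost of `O(log n)` per push/pop instead of `O(1)`.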
lib/segment/src/index/hnsw_index/hnsw.rs (1)
1582-1657: Complex but well-structured point migration with healing.

The `migrate_point` method correctly implements the healing logic for migrating points with missing neighbors. The use of `search_shortcuts_on_level` to find replacement links is appropriate.

Consider adding more detailed documentation about the healing heuristics and when they're applied.
lib/posting_list/src/posting_list.rs (1)
32-33: Document the purpose of `#[repr(C)]` on `PostingChunk`.

The `#[repr(C)]` attribute suggests this struct might be used for memory-mapped files, FFI, or requires a stable memory layout. Consider adding a comment explaining why this representation is needed.

lib/posting_list/src/builder.rs (1)
41-42: Consider using a stable sort for deterministic behavior.

Using `sort_unstable_by_key` means elements with duplicate IDs may be reordered non-deterministically. If deterministic ordering is important for reproducibility or testing, consider using `sort_by_key` instead.

lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs (2)
60-73: Consider clarifying the early return logic for non-existent vectors.

The condition at line 61 returns `false` when trying to undelete a non-existent vector, which might be confusing since `false` typically indicates the vector was not previously deleted. Consider either documenting this edge case or returning a more explicit error/option type.

158-161: Clarify the deletion logic for better readability.

The negation in line 159 is correct but confusing. Consider using a more descriptive variable name or adding a comment to clarify that we're inverting the previous state to indicate whether the deletion was successful.

```diff
 fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-    let is_deleted = !self.set_deleted(key, true);
-    Ok(is_deleted)
+    let was_previously_deleted = self.set_deleted(key, true);
+    let deletion_was_successful = !was_previously_deleted;
+    Ok(deletion_was_successful)
 }
```

lib/posting_list/src/view.rs (1)
124-126: Document safety requirements for unchecked access.

Since this method provides unchecked access, consider adding a safety comment documenting the preconditions that callers must ensure.

```diff
+    /// Returns sized values for the given chunk without bounds checking.
+    ///
+    /// # Safety
+    /// Caller must ensure that `chunk_idx` is within bounds of `self.chunks`.
     pub(crate) fn sized_values_unchecked(&self, chunk_idx: usize) -> &[H::Sized] {
         &self.chunks[chunk_idx].sized_values
     }
```

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (1)
269-284: Clean conditional compilation for storage creation!

The RocksDB-specific imports are properly scoped within the feature flag.

Consider adding a compile-time error or panic for unsupported configurations:

```diff
 match storage_type {
     #[cfg(feature = "rocksdb")]
     MultiDenseStorageType::RocksDbFloat => {
         // existing code...
     }
+    #[cfg(not(feature = "rocksdb"))]
+    MultiDenseStorageType::RocksDbFloat => {
+        panic!("RocksDB storage requested but rocksdb feature not enabled");
+    }
     MultiDenseStorageType::AppendableMmapFloat =>
         // existing code...
 }
```

lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
86-86: Consider documenting the performance implications of cloning.

The `cloned()` operation on sparse vectors could be expensive for vectors with many non-zero elements. Consider adding a comment explaining why cloning is necessary here for ownership in volatile storage.

```diff
 } else {
+    // Clone is necessary as we need to own the vector in volatile storage
     *entry = vector.cloned();
 }
```

lib/segment/src/segment_constructor/segment_constructor_base.rs (1)
720-745: Consider documenting the migration strategy.

The logic for choosing between RocksDB and file-based mutable trackers is crucial for backward compatibility. Consider adding a comment explaining the migration path and when the RocksDB check can be removed.

```diff
 #[cfg(feature = "rocksdb")]
 {
     use crate::common::rocksdb_wrapper::DB_MAPPING_CF;

+    // Migration logic: Use RocksDB tracker if there are existing mappings in the database.
+    // This ensures backward compatibility with segments created before the file-based tracker.
+    // TODO(1.15 or later): remove this check and use new mutable ID tracker unconditionally
     let use_rocksdb_mutable_tracker = if let Some(db) = db_builder.read() {
-        // New ID tracker is enabled by default, but we still use the old tracker if we have
-        // any mappings stored in RocksDB
-        //
-        // TODO(1.15 or later): remove this check and use new mutable ID tracker unconditionally
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (100)
Cargo.toml(7 hunks)Dockerfile(1 hunks)config/config.yaml(1 hunks)docs/redoc/master/openapi.json(5 hunks)lib/api/src/rest/conversions.rs(2 hunks)lib/api/src/rest/models.rs(3 hunks)lib/collection/Cargo.toml(1 hunks)lib/collection/src/collection/snapshots.rs(1 hunks)lib/collection/src/collection_manager/holders/segment_holder.rs(8 hunks)lib/collection/src/collection_manager/optimizers/segment_optimizer.rs(8 hunks)lib/collection/src/operations/types.rs(3 hunks)lib/collection/src/operations/verification/query.rs(7 hunks)lib/collection/src/problems/unindexed_field.rs(11 hunks)lib/collection/src/shards/local_shard/mod.rs(2 hunks)lib/collection/src/shards/replica_set/execute_read_operation.rs(2 hunks)lib/collection/src/shards/replica_set/mod.rs(5 hunks)lib/collection/src/shards/replica_set/partial_snapshot_meta.rs(1 hunks)lib/collection/src/shards/replica_set/snapshots.rs(3 hunks)lib/collection/src/shards/replica_set/telemetry.rs(2 hunks)lib/collection/src/shards/shard_holder/mod.rs(2 hunks)lib/collection/src/shards/telemetry.rs(2 hunks)lib/collection/src/tests/mod.rs(5 hunks)lib/collection/src/update_handler.rs(2 hunks)lib/common/common/src/fixed_length_priority_queue.rs(2 hunks)lib/common/common/src/flags.rs(3 hunks)lib/gridstore/benches/bitmask_bench.rs(1 hunks)lib/gridstore/benches/real_data_bench.rs(1 hunks)lib/gridstore/src/bitmask/gaps.rs(1 hunks)lib/posting_list/Cargo.toml(1 hunks)lib/posting_list/src/builder.rs(1 hunks)lib/posting_list/src/iterator.rs(1 hunks)lib/posting_list/src/lib.rs(1 hunks)lib/posting_list/src/posting_list.rs(1 hunks)lib/posting_list/src/tests.rs(1 hunks)lib/posting_list/src/value_handler.rs(1 hunks)lib/posting_list/src/view.rs(1 hunks)lib/posting_list/src/visitor.rs(1 hunks)lib/segment/Cargo.toml(4 hunks)lib/segment/benches/dynamic_mmap_flags.rs(1 hunks)lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs(1 hunks)lib/segment/src/fixtures/payload_fixtures.rs(1 hunks)lib/segment/src/id_tracker/id_tracker_base.rs(22 
hunks)lib/segment/src/id_tracker/mod.rs(2 hunks)lib/segment/src/id_tracker/simple_id_tracker.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs(1 hunks)lib/segment/src/index/field_index/mod.rs(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/gpu/mod.rs(2 hunks)lib/segment/src/index/hnsw_index/graph_layers.rs(1 hunks)lib/segment/src/index/hnsw_index/graph_layers_builder.rs(6 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(11 hunks)lib/segment/src/index/hnsw_index/search_context.rs(1 hunks)lib/segment/src/segment_constructor/rocksdb_builder.rs(3 hunks)lib/segment/src/segment_constructor/segment_builder.rs(3 hunks)lib/segment/src/segment_constructor/segment_constructor_base.rs(10 hunks)lib/segment/src/spaces/simple_avx.rs(1 hunks)lib/segment/src/spaces/simple_sse.rs(1 hunks)lib/segment/src/types.rs(5 hunks)lib/segment/src/vector_storage/common.rs(2 hunks)lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs(7 hunks)lib/segment/src/vector_storage/dense/mod.rs(1 hunks)lib/segment/src/vector_storage/dense/volatile_dense_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/mod.rs(1 hunks)lib/segment/src/vector_storage/multi_dense/mod.rs(1 hunks)lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/quantized/quantized_vectors.rs(2 hunks)lib/segment/src/vector_storage/raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/sparse/mod.rs(1 hunks)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs(1 hunks)lib/segment/src/vector_storage/tests/async_raw_scorer.rs(2 hunks)lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs(3 
hunks)lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs(12 hunks)lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs(8 hunks)lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs(2 hunks)lib/segment/src/vector_storage/vector_storage_base.rs(44 hunks)lib/storage/src/content_manager/conversions.rs(1 hunks)lib/storage/src/content_manager/errors.rs(6 hunks)lib/storage/src/content_manager/snapshots/recover.rs(3 hunks)lib/storage/src/content_manager/toc/create_collection.rs(1 hunks)lib/storage/src/content_manager/toc/mod.rs(1 hunks)lib/storage/src/content_manager/toc/telemetry.rs(1 hunks)lib/storage/src/types.rs(1 hunks)lib/storage/tests/integration/alias_tests.rs(1 hunks)src/actix/api/query_api.rs(7 hunks)src/actix/api/snapshot_api.rs(3 hunks)src/actix/helpers.rs(7 hunks)src/common/inference/update_requests.rs(13 hunks)src/common/snapshots.rs(2 hunks)src/common/telemetry_ops/collections_telemetry.rs(2 hunks)src/common/update.rs(10 hunks)tests/consensus_tests/test_consensus_compaction.py(1 hunks)tests/consensus_tests/test_force_delete_peer.py(1 hunks)tests/consensus_tests/test_partial_snapshot.py(3 hunks)tests/consensus_tests/test_shard_wal_delta_transfer.py(4 hunks)tests/consensus_tests/utils.py(0 hunks)tests/openapi/test_service.py(1 hunks)tests/openapi/test_strictmode.py(12 hunks)
💤 Files with no reviewable changes (1)
- tests/consensus_tests/utils.py
✅ Files skipped from review due to trivial changes (15)
- lib/storage/tests/integration/alias_tests.rs
- lib/segment/benches/dynamic_mmap_flags.rs
- Dockerfile
- lib/segment/src/vector_storage/mod.rs
- lib/gridstore/src/bitmask/gaps.rs
- lib/segment/benches/mmap_bitslice_buffered_update_wrapper.rs
- lib/collection/src/shards/replica_set/mod.rs
- lib/segment/src/index/field_index/mod.rs
- lib/segment/src/vector_storage/dense/mod.rs
- lib/posting_list/Cargo.toml
- lib/segment/src/vector_storage/sparse/mod.rs
- lib/segment/src/vector_storage/multi_dense/mod.rs
- lib/segment/src/id_tracker/id_tracker_base.rs
- lib/collection/Cargo.toml
- lib/common/common/src/fixed_length_priority_queue.rs
🚧 Files skipped from review as they are similar to previous changes (9)
- lib/collection/src/operations/types.rs
- lib/api/src/rest/conversions.rs
- src/actix/api/query_api.rs
- lib/storage/src/content_manager/errors.rs
- src/actix/helpers.rs
- src/common/update.rs
- docs/redoc/master/openapi.json
- src/common/inference/update_requests.rs
- lib/collection/src/operations/verification/query.rs
🧰 Additional context used
🧠 Learnings (1)
lib/segment/src/vector_storage/vector_storage_base.rs (1)
Learnt from: generall
PR: qdrant/qdrant#6323
File: lib/segment/src/vector_storage/quantized/quantized_mmap_storage.rs:12-16
Timestamp: 2025-04-07T23:31:22.515Z
Learning: The `populate()` method in `QuantizedMmapStorage` correctly returns void instead of a `Result` because it directly implements the `Madviseable` trait which defines `populate(&self)` without a return type. Adding an unnecessary `Ok(())` return would trigger Clippy warnings about unnecessary Result wrapping.
🧬 Code Graph Analysis (17)
lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (1)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (1)
new_volatile_sparse_vector_storage(34-36)
lib/collection/src/shards/replica_set/telemetry.rs (1)
src/actix/api/service_api.rs (1)
telemetry(36-59)
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
lib/segment/src/index/hnsw_index/search_context.rs (1)
new(16-21)
tests/consensus_tests/test_force_delete_peer.py (2)
tests/consensus_tests/fixtures.py (1)
upsert_random_points(83-129)tests/consensus_tests/utils.py (1)
replicate_shard(793-802)
tests/consensus_tests/test_shard_wal_delta_transfer.py (1)
tests/consensus_tests/utils.py (1)
wait_for_collection_shard_transfers_count(616-622)
lib/collection/src/collection/snapshots.rs (3)
lib/collection/src/shards/replica_set/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(70-74)lib/collection/src/shards/shard_holder/mod.rs (1)
try_take_partial_snapshot_recovery_lock(1072-1088)src/common/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(318-337)
lib/collection/src/shards/telemetry.rs (1)
lib/segment/src/common/anonymize.rs (13)
anonymize(45-45)anonymize(49-51)anonymize(55-57)anonymize(61-63)anonymize(67-71)anonymize(75-79)anonymize(115-119)anonymize(123-127)anonymize(131-135)anonymize(139-148)anonymize(152-154)anonymize(158-162)anonymize(166-181)
src/actix/api/snapshot_api.rs (4)
lib/collection/src/shards/replica_set/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(70-74)lib/collection/src/shards/shard_holder/mod.rs (1)
try_take_partial_snapshot_recovery_lock(1072-1088)src/common/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(318-337)src/actix/helpers.rs (2)
process_response_error(75-101)time(108-114)
lib/posting_list/src/value_handler.rs (2)
lib/posting_list/src/lib.rs (1)
from_bytes(29-29)lib/posting_list/src/tests.rs (1)
from_bytes(22-25)
lib/storage/src/content_manager/toc/mod.rs (1)
src/actix/api/service_api.rs (1)
telemetry(36-59)
lib/collection/src/shards/replica_set/snapshots.rs (4)
lib/collection/src/collection/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(340-349)lib/collection/src/shards/shard_holder/mod.rs (1)
try_take_partial_snapshot_recovery_lock(1072-1088)src/common/snapshots.rs (1)
try_take_partial_snapshot_recovery_lock(318-337)lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)
take_search_write_lock(56-58)try_take_search_read_lock(60-70)
lib/posting_list/src/iterator.rs (3)
lib/posting_list/src/posting_list.rs (1)
visitor(82-85)lib/posting_list/src/view.rs (2)
visitor(68-70)len(200-202)lib/posting_list/src/visitor.rs (2)
new(22-28)len(30-32)
lib/api/src/rest/models.rs (2)
lib/segment/src/common/anonymize.rs (13)
anonymize(45-45)anonymize(49-51)anonymize(55-57)anonymize(61-63)anonymize(67-71)anonymize(75-79)anonymize(115-119)anonymize(123-127)anonymize(131-135)anonymize(139-148)anonymize(152-154)anonymize(158-162)anonymize(166-181)lib/api/src/grpc/ops.rs (1)
is_empty(26-33)
lib/posting_list/src/posting_list.rs (4)
lib/posting_list/src/view.rs (1)
visitor(68-70)lib/posting_list/src/iterator.rs (1)
new(16-22)lib/posting_list/src/visitor.rs (1)
new(22-28)lib/posting_list/src/builder.rs (1)
new(25-27)
tests/consensus_tests/test_partial_snapshot.py (4)
tests/consensus_tests/utils.py (1)
assert_project_root(110-113)src/actix/api/snapshot_api.rs (1)
recover_partial_snapshot_from(676-779)tests/consensus_tests/assertions.py (1)
assert_http_ok(6-13)tests/consensus_tests/fixtures.py (1)
random_dense_vector(19-20)
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (3)
lib/segment/src/vector_storage/vector_storage_base.rs (29)
size_of_available_vectors_in_bytes(152-154)size_of_available_vectors_in_bytes(181-181)size_of_available_vectors_in_bytes(469-540)distance(47-47)distance(644-691)datatype(49-49)datatype(693-740)total_vector_count(56-56)total_vector_count(793-840)get_vector(71-71)get_vector(842-889)get_vector_sequential(74-74)get_vector_sequential(891-938)get_vector_opt(77-77)get_vector_opt(940-987)insert_vector(79-84)insert_vector(989-1065)update_from(92-96)update_from(1067-1134)flusher(98-98)flusher(1136-1183)files(100-100)files(1185-1232)delete_vector(109-109)delete_vector(1234-1281)is_deleted_vector(112-112)is_deleted_vector(1283-1330)deleted_vector_bitslice(134-134)deleted_vector_bitslice(1381-1428)lib/segment/src/common/operation_error.rs (1)
check_process_stopped(242-247)lib/segment/src/data_types/named_vectors.rs (1)
default_sparse(99-101)
lib/segment/src/segment_constructor/segment_constructor_base.rs (3)
lib/segment/src/common/operation_error.rs (2)
check_process_stopped(242-247)service_error(84-89)lib/segment/src/vector_storage/sparse/simple_sparse_vector_storage.rs (1)
open_simple_sparse_vector_storage(41-77)lib/segment/src/id_tracker/immutable_id_tracker.rs (1)
mappings_file_path(345-347)
🪛 Ruff (0.11.9)
tests/consensus_tests/test_force_delete_peer.py
42-42: replicate_shard may be undefined, or defined from star imports
(F405)
tests/consensus_tests/test_shard_wal_delta_transfer.py
205-205: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports
(F405)
304-304: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports
(F405)
325-325: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports
(F405)
453-453: wait_for_collection_shard_transfers_count may be undefined, or defined from star imports
(F405)
tests/consensus_tests/test_partial_snapshot.py
36-36: assert_project_root may be undefined, or defined from star imports
(F405)
47-47: assert_project_root may be undefined, or defined from star imports
(F405)
128-128: assert_http_ok may be undefined, or defined from star imports
(F405)
142-142: assert_http_ok may be undefined, or defined from star imports
(F405)
148-148: random_dense_vector may be undefined, or defined from star imports
(F405)
🔇 Additional comments (160)
lib/gridstore/benches/real_data_bench.rs (1)
2-2: LGTM! Good modernization of the `black_box` import.

The change from `criterion::black_box` to `std::hint::black_box` is a positive modernization that reduces the dependency on the criterion crate. The `black_box` function became part of the standard library in Rust 1.66, making this the preferred approach for benchmarking.

Also applies to: 7-7
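For illustration, a minimal sketch of why `black_box` matters in benchmarks (the workload here is hypothetical, not from the Qdrant benches):

```rust
use std::hint::black_box;

/// A small workload the optimizer could otherwise evaluate at compile time.
fn sum_of_squares(n: u64) -> u64 {
    (0..n).map(|i| i * i).sum()
}

fn main() {
    // `black_box` on the input prevents constant-folding the call;
    // `black_box` on the output keeps the result from being discarded
    // as dead code. Both matter for meaningful benchmark timings.
    let result = black_box(sum_of_squares(black_box(10)));
    assert_eq!(result, 285);
    println!("ok");
}
```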
lib/gridstore/benches/bitmask_bench.rs (1)
1-1: LGTM! Consistent modernization across benchmark files.

The import change maintains consistency with the other benchmark files mentioned in the AI summary. Using `std::hint::black_box` is the modern standard approach and reduces external dependencies.

Also applies to: 4-4
lib/segment/src/spaces/simple_avx.rs (1)
12-15: LGTM! Clean removal of redundant unsafe blocks.

The horizontal sum implementation logic is correct and follows the standard pattern for reducing AVX vectors to scalars. Removing the redundant inner `unsafe` blocks improves code clarity since the entire function is already marked as `unsafe`.

lib/segment/src/spaces/simple_sse.rs (1)
14-16: LGTM! Consistent cleanup matching the AVX version.

The SSE horizontal sum implementation is correct and the removal of redundant inner `unsafe` blocks is consistent with the similar cleanup performed in the AVX version. This improves code consistency across SIMD implementations.

lib/segment/src/index/field_index/full_text_index/text_index.rs (1)
448-787: Test infrastructure looks comprehensive but unrelated to PR objectives.

This test module addition provides thorough testing for full-text index implementations across different storage backends. However, these changes appear unrelated to the PR's stated objective of "Add inference usage tracking to REST API".
The test implementation is well-structured with good coverage of different index types and scenarios.
Likely an incorrect or invalid review comment.
config/config.yaml (1)
241-244: New collection limit configuration is unrelated to PR objectives.

The `max_collections` parameter addition is a useful feature for limiting collection creation, but it doesn't align with the PR's stated goal of adding inference usage tracking to the REST API.

The configuration follows the established pattern and is properly documented.
Likely an incorrect or invalid review comment.
lib/storage/src/content_manager/conversions.rs (1)
48-48: Error handling addition is correct but unrelated to PR objectives.

The handling of `StorageError::ShardUnavailable` with `tonic::Code::Unavailable` uses the appropriate gRPC status code for this error type. However, this change doesn't relate to the PR's stated purpose of adding inference usage tracking.

The implementation follows the existing pattern correctly.
Likely an incorrect or invalid review comment.
lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (1)
219-219: LGTM! Good refactoring to consolidate shared constants.

Moving `SPARSE_VECTOR_DISTANCE` to the parent module eliminates duplication and provides a single source of truth for this constant across sparse vector storage implementations.

lib/storage/src/types.rs (1)
107-109: Well-implemented configuration addition.

The new `max_collections` field follows best practices with proper serde annotations, clear documentation, and appropriate typing using `Option<usize>` for optional configuration.
177-177: Consistent with the sparse vector storage refactoring.

This change aligns with the consolidation of `SPARSE_VECTOR_DISTANCE` to the parent module, maintaining consistency across sparse vector storage implementations.

tests/openapi/test_service.py (2)
87-87: LGTM! Test correctly updated for new telemetry data.

The addition of `'partial_snapshot'` to the expected shard keys appropriately reflects the new partial snapshot telemetry functionality mentioned in the AI summary.

87-87: Inconsistency between PR objectives and file changes.

The PR title mentions "Add inference usage tracking to REST API" but this file contains changes related to partial snapshot telemetry, not inference usage tracking. Please verify this change belongs in this PR.
Likely an incorrect or invalid review comment.
lib/common/common/src/flags.rs (3)
25-26: LGTM! Feature flag properly implemented.

The new `hnsw_healing` feature flag follows the established pattern with clear documentation and an appropriate conservative default.

25-26: Inconsistency with PR objectives.

This change adds HNSW healing functionality, which doesn't align with the stated PR objective of "Add inference usage tracking to REST API". Please verify this change belongs in this PR.
Likely an incorrect or invalid review comment.
35-35: Implementation correctly integrated across all required locations.

The feature flag is properly:

- Given a conservative default value of `false`
- Included in the destructuring pattern for initialization
- Enabled when the `all` flag is set
lib/segment/src/vector_storage/common.rs (3)
3-4: LGTM! Proper conditional compilation for RocksDB features.

The serde imports are correctly gated behind the "rocksdb" feature flag, ensuring serialization traits are only available when needed.

17-18: Appropriate feature gating for RocksDB-specific types.

The `StoredRecord<T>` struct is correctly made conditional on the "rocksdb" feature, aligning with the modular build approach for RocksDB dependencies.

3-4: Inconsistency with PR objectives.

These changes relate to RocksDB feature gating, not inference usage tracking as stated in the PR objectives. Please verify this change belongs in this PR.
Also applies to: 17-18
Likely an incorrect or invalid review comment.
lib/collection/src/shards/replica_set/execute_read_operation.rs (3)
125-125: LGTM! Proper concurrency control for partial snapshot operations.

The addition of the partial snapshot search read lock appropriately prevents read operations during snapshot recovery phases, following the RAII pattern with the lock guard.

160-160: Consistent locking implementation across read operation paths.

The partial snapshot search read lock is correctly applied in both local and cluster read operation code paths, ensuring consistent concurrency control.

125-125: Inconsistency with PR objectives.

These changes implement partial snapshot locking mechanisms, which don't align with the stated PR objective of "Add inference usage tracking to REST API". Please verify this change belongs in this PR.
Also applies to: 160-160
Likely an incorrect or invalid review comment.
lib/collection/src/collection/snapshots.rs (1)
340-349: Well-implemented delegation pattern.

The new method correctly delegates to the `shards_holder` and follows the established pattern seen in other snapshot-related methods in this file. The async signature and return type are consistent with the broader snapshot recovery locking mechanism.

lib/segment/src/segment_constructor/segment_builder.rs (1)
486-487: Proper feature flag usage for RocksDB ID tracker.

The conditional compilation for the RocksDB ID tracker match arm is correctly implemented and aligns with the broader effort to gate RocksDB-dependent code behind feature flags.
lib/segment/src/index/hnsw_index/search_context.rs (1)
16-21: Good simplification that improves separation of concerns.

Removing the `entry_point` parameter from the constructor and initializing empty collections is a clean improvement. This separates the creation of the search context from the insertion of the initial candidate, making the API more intuitive and flexible.

lib/segment/src/vector_storage/tests/test_appendable_sparse_vector_storage.rs (2)
19-19: LGTM: Import addition for volatile storage.

The import correctly adds the function needed for volatile sparse vector storage.

136-136: LGTM: Transition to volatile storage for testing.

This change replaces the RocksDB-backed storage with volatile in-memory storage, which improves test isolation and removes external dependencies. The volatile storage provides the same interface, so the test logic remains unaffected.
lib/collection/src/tests/mod.rs (4)
40-40: LGTM: Import addition for disk-backed storage.

The import correctly adds `SaveOnDisk` for the new payload schema handling approach.

243-245: LGTM: Transition to disk-backed schema with shared ownership.

This change improves the architecture by using a thread-safe, disk-backed payload schema wrapped in `Arc<SaveOnDisk<PayloadIndexSchema>>`. This provides better persistence and shared access compared to the previous in-memory approach.

266-266: LGTM: Consistent usage of cloned Arc references.

The calls correctly use `.clone()` to share the Arc-wrapped schema instead of passing references, which aligns with the new shared ownership model.

Also applies to: 277-277, 322-322

243-322: Note: Changes appear unrelated to PR objectives.

These changes focus on payload schema handling refactoring, but the PR objectives mention "Add inference usage tracking to REST API". This might indicate that the AI summary or PR objectives don't accurately reflect the actual changes being made.
Likely an incorrect or invalid review comment.
lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2)
493-493: LGTM: Import addition for volatile storage.

The import correctly adds the function needed for volatile dense vector storage in tests.

528-528: LGTM: Transition to volatile storage for GPU tests.

This change replaces RocksDB-backed storage with volatile in-memory storage for GPU testing, which simplifies the test setup and removes external dependencies. The parameters (dimension and distance metric) are correctly preserved.
lib/segment/src/vector_storage/dense/memmap_dense_vector_storage.rs (2)
318-318: LGTM: Import addition for volatile storage.

The import correctly adds the function needed for volatile dense vector storage in tests.

349-349: LGTM: Consistent transition to volatile storage across all tests.

All test functions now use `new_volatile_dense_vector_storage(4, Distance::Dot)` instead of RocksDB-backed storage. This change:
- Improves test isolation by removing external dependencies
- Maintains the same test functionality with simplified setup
- Uses consistent parameters (dimension=4, Distance::Dot) matching the test data
- Follows the same refactoring pattern applied across the codebase
The test logic remains unchanged, ensuring the same coverage while improving test performance and reliability.
Also applies to: 380-380, 442-442, 558-558, 631-631, 705-705
lib/segment/src/fixtures/payload_fixtures.rs (1)
112-128: LGTM! Well-implemented test fixture function.

The `random_full_text_payload` function follows established patterns in the file and correctly generates random alphabetic strings for testing purposes. The implementation properly uses the random generator, respects the specified ranges, and returns the expected Vec format.

lib/collection/src/shards/telemetry.rs (2)
24-25: LGTM! Proper integration of partial snapshot telemetry.

The optional `partial_snapshot` field is correctly added with appropriate serialization attributes. The `skip_serializing_if` ensures clean JSON output when the field is not present.
78-84: LGTM! Well-designed telemetry struct.

The `PartialSnapshotTelemetry` struct is properly designed with:
- Meaningful field names that clearly indicate their purpose
- Appropriate types (usize for counts, u64 for timestamps)
- Correct anonymization settings for telemetry data
- All necessary derive macros for serialization and schema generation
lib/segment/src/index/hnsw_index/graph_layers.rs (1)
106-107: LGTM! Clean refactoring of SearchContext initialization.

The separation of SearchContext creation from initial candidate insertion improves code clarity. This change aligns with the updated `SearchContext::new` constructor that now only accepts the `ef` parameter, as shown in the relevant code snippet from `search_context.rs`.
SearchContext::newconstructor that now only accepts theefparameter, as shown in the relevant code snippet fromsearch_context.rs.tests/consensus_tests/test_shard_wal_delta_transfer.py (4)
205-206: LGTM! Consistent API migration to count-based waiting.

The replacement of `wait_for_collection_shard_transfer_progress` with `wait_for_collection_shard_transfers_count` follows a clear pattern: first wait for the transfer to start (count=1), then wait for completion (count=0). This approach is more reliable for testing the transfer lifecycle.
304-305: LGTM! Consistent with the established pattern.

Same clean API migration pattern as seen in previous test functions.
325-326: LGTM! Maintains consistency across chain recovery test.

The API migration is consistently applied in the chain recovery scenario.
453-454: LGTM! Final consistent application of the API change.

The abort and retry test also follows the same reliable pattern for monitoring transfer progress.

Note: The static analysis warnings about `wait_for_collection_shard_transfers_count` being undefined are false positives - the function is properly imported via the star import from `.utils` and is defined in the utils module as shown in the relevant code snippets.
🪛 Ruff (0.11.9)
453-453:
wait_for_collection_shard_transfers_countmay be undefined, or defined from star imports(F405)
454-454:
wait_for_collection_shard_transfers_countmay be undefined, or defined from star imports(F405)
lib/segment/src/id_tracker/mod.rs (2)
7-8: LGTM! Proper feature gating for the RocksDB dependency.

The conditional compilation correctly gates the `simple_id_tracker` module behind the "rocksdb" feature, which aligns with the broader pattern of isolating RocksDB-dependent code.
74-81: LGTM! Efficient conversion implementation.

The `From<&ExtendedPointId> for PointIdType` implementation is well-designed:
- Takes a reference to avoid unnecessary cloning
- Properly handles both `NumId` and `Uuid` variants
- Follows Rust conversion conventions
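The borrowed-to-owned conversion pattern can be sketched as follows. The variant names echo the review, but the types here are simplified stand-ins (a `u128` replaces the real UUID type to stay dependency-free):

```rust
// Hypothetical, simplified versions of the two point-id types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExtendedPointId {
    NumId(u64),
    Uuid(u128),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum PointIdType {
    Num(u64),
    Uuid(u128),
}

impl From<&ExtendedPointId> for PointIdType {
    // Taking a reference avoids moving or cloning the source; the Copy
    // variant payloads are cheap to read through the borrow.
    fn from(id: &ExtendedPointId) -> Self {
        match *id {
            ExtendedPointId::NumId(n) => PointIdType::Num(n),
            ExtendedPointId::Uuid(u) => PointIdType::Uuid(u),
        }
    }
}
```

Implementing `From<&T>` rather than only `From<T>` lets callers convert without giving up ownership, which is exactly the "avoid unnecessary cloning" benefit the review calls out.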
lib/collection/src/shards/replica_set/telemetry.rs (1)
30-35: LGTM! Well-structured partial snapshot telemetry integration.

The implementation correctly:
- Wraps the telemetry data in `Some()` to match the optional field type
- Accesses the appropriate fields from `partial_snapshot_meta`
- Follows the existing telemetry collection pattern
src/common/telemetry_ops/collections_telemetry.rs (2)
20-22: LGTM! Proper field configuration for telemetry.

The `max_collections` field is correctly configured:
- The `Option<usize>` type allows for absent values
- `skip_serializing_if = "Option::is_none"` prevents null values in JSON output
- `#[anonymize(false)]` appropriately excludes this configuration value from anonymization

50-54: LGTM! Clean integration of max collections telemetry.

The implementation correctly retrieves the max collections value from the TOC and includes it in the telemetry struct, following the established pattern for other telemetry fields.
lib/storage/src/content_manager/snapshots/recover.rs (1)
151-160: LGTM! Proper payload index schema loading with error handling.

The implementation correctly:
- Uses the established `SaveOnDisk` pattern for loading configuration
- Provides descriptive error messages with the file path
- Follows the existing error handling patterns in the codebase

lib/collection/src/shards/local_shard/mod.rs (1)

409-409: LGTM: Simplified payload index schema usage

The change from explicit read-lock acquisition and schema cloning to cloning the entire `Arc<SaveOnDisk<PayloadIndexSchema>>` wrapper simplifies the API usage. This aligns with the broader refactoring mentioned in the summary.

lib/segment/src/vector_storage/tests/async_raw_scorer.rs (2)
16-16: Good improvement: Using volatile storage for tests

Adding the import for volatile dense vector storage improves test isolation by removing RocksDB dependencies.

62-70: Excellent test improvement: Replaced RocksDB with volatile storage

The changes to use volatile in-memory storage instead of RocksDB-backed storage provide several benefits:
- Faster test execution (no disk I/O)
- Better test isolation (no temporary directory management)
- Reduced external dependencies
The test logic remains unchanged while improving reliability and performance.

lib/segment/Cargo.toml (3)

15-15: Good: Added RocksDB to default features

Including `rocksdb` in the default feature set maintains backward compatibility while allowing selective disabling.

138-138: Consistent benchmark feature gating

Properly requiring the `rocksdb` feature for benchmarks that depend on RocksDB ensures they only run when the feature is enabled. This prevents build failures when RocksDB is disabled.

Also applies to: 187-187, 192-192

133-133: Details
❓ Verification inconclusive
Minor dependency update: io-uring version bump
Updated from 0.7.6 to 0.7.7. This appears to be a patch version update.
🌐 Web query:
What are the changes between io-uring 0.7.6 and 0.7.7? Are there any breaking changes or security fixes?

💡 Result:
Based on the search results, there is no specific information about changes between io-uring versions 0.7.6 and 0.7.7. The search results contain information about the io_uring Linux API and various other software projects, but none directly address version changes for an io-uring library at those specific versions.
The search results mention:
- A low-level io_uring userspace interface for Rust on GitHub[1], but without version-specific changelog information.
- General documentation about the io_uring Linux API[2][6].
- Changelogs from other projects like axum[3], netaddr[5], and iodine[8], but not for the io-uring library specifically.
- Performance characteristics of io_uring in Proxmox environments[4].
- A security issue with io_uring reported in April 2025 about a security blindspot allowing rootkit attacks[7], but this isn't tied to specific library versions.
Without specific changelog information for io-uring versions 0.7.6 and 0.7.7, I cannot provide details about changes, breaking changes, or security fixes between these versions. The search results don't contain this information.
Citations:
- 1: https://github.com/tokio-rs/io-uring
- 2: https://man.archlinux.org/man/io_uring.7.en
- 3: https://github.com/tokio-rs/axum/blob/main/axum/CHANGELOG.md
- 4: https://kb.blockbridge.com/technote/proxmox-aio-vs-iouring/index.html
- 5: https://netaddr.readthedocs.io/en/latest/changes.html
- 6: https://man7.org/linux/man-pages/man7/io_uring.7.html
- 7: https://www.bleepingcomputer.com/news/security/linux-io-uring-security-blindspot-allows-stealthy-rootkit-attacks/
- 8: https://github.com/boazsegev/iodine/blob/master/CHANGELOG.md
- 9: https://compomics.github.io/projects/reporter/wiki/ReleaseNotes
- 10: https://unixism.net/loti/ref-iouring/io_uring_enter.html
Verify upstream CHANGELOG for io-uring 0.7.7
This bump from 0.7.6 → 0.7.7 appears to be a patch-level update with no documented breaking changes in publicly available sources. However, I couldn’t locate a dedicated changelog entry for 0.7.7. Please review the io-uring repository’s CHANGELOG or release notes (e.g. https://github.com/tokio-rs/io-uring/blob/main/CHANGELOG.md) to confirm there are no unexpected breaking changes or security fixes before merging.
lib/segment/src/vector_storage/raw_scorer.rs (4)
57-67: Excellent: Consistent feature gating for dense vector storage

The conditional compilation properly gates:
- RocksDB-backed storage variants (`DenseSimple*`) behind the `rocksdb` feature
- Volatile storage variants (`DenseVolatile*`) behind test configuration

This pattern enables partial RocksDB disabling while maintaining test coverage.
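The gating pattern can be sketched with a minimal enum. The names here are illustrative, not the real `VectorStorageEnum`; the point is that a variant behind `#[cfg(feature = "rocksdb")]` simply does not exist when the feature is off, so matches stay exhaustive without dead arms:

```rust
// Illustrative sketch of feature-gated storage variants.
enum VectorStorage {
    Volatile(Vec<f32>),
    // Compiled in only when the `rocksdb` cargo feature is enabled.
    #[cfg(feature = "rocksdb")]
    RocksDbBacked(std::path::PathBuf),
}

fn backend_name(storage: &VectorStorage) -> &'static str {
    match storage {
        VectorStorage::Volatile(_) => "volatile",
        // The arm carries the same cfg as the variant: with the feature off,
        // both disappear together and the match remains exhaustive.
        #[cfg(feature = "rocksdb")]
        VectorStorage::RocksDbBacked(_) => "rocksdb",
    }
}
```

Pairing each `#[cfg]` variant with an identically gated match arm is what lets the compiler verify exhaustiveness under every feature combination, as the verification script below confirms for the real enum.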
106-109: Consistent sparse vector storage gating

The same feature gating pattern is applied to sparse vector storage, maintaining consistency across the codebase.

111-122: Complete multi-dense vector storage feature gating

All multi-dense vector storage variants are properly gated with the same consistent pattern:
- `MultiDenseSimple*` variants require the `rocksdb` feature
- `MultiDenseVolatile*` variants are test-only

This completes the feature gating for all vector storage types.

51-142: Flag inconsistency with PR objectives

The changes in this file (and the entire PR) appear to be about RocksDB feature gating and payload schema improvements, but the PR title indicates "Add inference usage tracking to REST API". There seems to be a mismatch between the stated objectives and the actual changes.

Likely an incorrect or invalid review comment.
src/actix/api/snapshot_api.rs (5)
40-40: LGTM! Import addition supports the new locking mechanism.

The import for `try_take_partial_snapshot_recovery_lock` is correctly added to support the partial snapshot recovery concurrency control.

613-621: Excellent concurrency control implementation.

The early acquisition of the partial snapshot recovery lock prevents concurrent partial snapshot recoveries on the same shard. The error handling immediately returns an HTTP error response if lock acquisition fails, which is appropriate for the REST API context.

624-624: Good RAII pattern for lock management.

Using `_recovery_lock` with an underscore prefix correctly indicates the variable is intentionally unused but held for its RAII properties. This ensures the lock is held for the entire duration of the async recovery operation.
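The `_recovery_lock` idiom matters because `let _guard = ...` and `let _ = ...` behave differently: the first keeps the guard (and the lock) alive until end of scope, the second drops it immediately. A small stdlib sketch of the distinction, unrelated to the actual handler code:

```rust
use std::sync::Mutex;

// Returns (was the lock held inside the inner scope?, was it held after?).
fn lock_is_held_for_scope(m: &Mutex<()>) -> (bool, bool) {
    let locked_inside;
    {
        let _guard = m.lock().unwrap(); // bound to a name: held for the whole block
        locked_inside = m.try_lock().is_err(); // second acquisition fails while held
    } // `_guard` dropped here, releasing the lock (RAII)
    let locked_after = m.try_lock().is_err();
    (locked_inside, locked_after)
}
```

Had the guard been bound as `let _ = m.lock().unwrap();`, it would be dropped on that same line and `locked_inside` would be `false`, silently defeating the mutual exclusion.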
691-704: Consistent lock acquisition pattern.

The same lock acquisition and error handling pattern is correctly applied to the `recover_partial_snapshot_from` endpoint, maintaining consistency across both partial snapshot recovery endpoints.

707-707: Proper lock holding during recovery.

The lock is correctly held during the async recovery operation using the same RAII pattern as the other endpoint.
tests/consensus_tests/test_force_delete_peer.py (2)
31-34: Improved mapping for convenience.

The change from `peer_id_to_url` to `peer_url_to_id` makes the subsequent code more readable by directly mapping URLs to IDs, which is how they're used in the test.

47-47: Updated comment reflects the simplified logic.

The comment correctly describes the expected behavior after the test simplification.
src/common/snapshots.rs (2)
9-9: Appropriate imports for the new functionality.

The added imports support the new `try_take_partial_snapshot_recovery_lock` function:
- `VerificationPass` for access verification
- `Dispatcher` for TOC access
- `OwnedMutexGuard` for the return type

Also applies to: 17-17, 19-19
318-337: Well-implemented recovery lock acquisition function.

The function follows excellent patterns:
- Proper access control: checks global manage access before proceeding
- Clear separation of concerns: uses the dispatcher → TOC → collection hierarchy
- Appropriate error propagation: returns `StorageError` for consistent error handling
- Good return type: `Option<OwnedMutexGuard<()>>` allows callers to handle lock unavailability

The implementation integrates cleanly with the existing codebase architecture and provides the necessary concurrency control for partial snapshot recovery operations.
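The `Option<guard>` return shape can be sketched with the standard library. The review mentions tokio's `OwnedMutexGuard`; this sketch uses std's lifetime-bound `MutexGuard` instead, and the function name is only an echo of the real one:

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

// Some(guard): recovery lock acquired; caller holds it for the operation.
// None: another recovery is already in flight; caller should return an error.
fn try_take_recovery_lock(lock: &Mutex<()>) -> Option<MutexGuard<'_, ()>> {
    match lock.try_lock() {
        Ok(guard) => Some(guard),
        Err(TryLockError::WouldBlock) => None,
        // A poisoned lock only means a holder panicked; the () state is fine.
        Err(TryLockError::Poisoned(poisoned)) => Some(poisoned.into_inner()),
    }
}
```

Returning the guard itself (rather than a bool) is what makes the RAII pattern in the handlers work: the lock is released exactly when the caller drops the guard, never earlier.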
lib/segment/src/index/hnsw_index/gpu/mod.rs (2)
111-111: LGTM! Improved test isolation.

The switch from RocksDB-backed storage to volatile storage in tests improves test isolation and reduces external dependencies while maintaining the same interface for testing GPU index functionality.

134-134: LGTM! Consistent with test storage simplification.

The use of volatile dense vector storage aligns with the import change and maintains test functionality while removing the RocksDB dependency.
lib/segment/src/vector_storage/tests/custom_query_scorer_equivalency.rs (3)
27-27: LGTM! Appropriate import change for test simplification.

Switching to the volatile dense vector storage import aligns with the goal of removing RocksDB dependencies from tests.

56-58: LGTM! Simplified test storage creation.

The `ram_storage` function is now much cleaner and directly creates volatile storage without needing temporary directories or RocksDB setup. The unused `_dir` parameter maintains the function signature for compatibility.

118-118: LGTM! Consistent with storage simplification.

Using volatile storage in the main test function aligns with the other changes and maintains scoring equivalency validation while removing external dependencies.
lib/segment/src/vector_storage/quantized/quantized_vectors.rs (4)
242-265: LGTM! Proper conditional compilation for RocksDB storage variants.

The conditional compilation correctly gates RocksDB-dependent storage variants behind the `rocksdb` feature flag, maintaining clean separation of storage backends.

254-265: LGTM! Appropriate test-only compilation for volatile storage.

The volatile storage variants are correctly gated behind `#[cfg(test)]`, making them available only during testing, which aligns with their intended purpose.

234-341: Flag inconsistency between PR objectives and actual changes.

The PR is titled "Add inference usage tracking to REST API" but the actual changes implement conditional compilation for RocksDB features and vector storage modularization. This appears to be either a mislabeled PR or the wrong changeset.

Likely an incorrect or invalid review comment.
293-321: Details
✅ Verification successful
Verify exhaustive pattern matching with conditional compilation.
The systematic conditional compilation of vector storage variants looks correct, but ensure that all possible enum variants are handled when different feature combinations are enabled/disabled.
🏁 Script executed:

```bash
#!/bin/bash
# Check if VectorStorageEnum definition matches the conditional compilation patterns
ast-grep --pattern 'enum VectorStorageEnum { $$$ }'

# Also check for any compilation warnings about unreachable patterns
rg "unreachable_patterns" lib/segment/src/vector_storage/
```

Length of output: 113
🏁 Script executed:

```bash
#!/bin/bash
# Locate the file defining VectorStorageEnum
echo "Enum definition file:"
rg -l "enum VectorStorageEnum" -n lib/segment/src/vector_storage

# Display enum definition and its variants
enum_file=$(rg -l "enum VectorStorageEnum" -n lib/segment/src/vector_storage)
echo -e "\nEnum variants in $enum_file:"
awk '/enum VectorStorageEnum/,/}/' "$enum_file"

# Display all match arms for VectorStorageEnum in quantized_vectors.rs
echo -e "\nMatch arms in quantized_vectors.rs:"
rg -n "VectorStorageEnum::" -A0 lib/segment/src/vector_storage/quantized/quantized_vectors.rs
```

Length of output: 7402
🏁 Script executed:

```bash
#!/bin/bash
# Show match block in quantized_vectors.rs for exhaustive matching
echo "Lines 200-400 of quantized_vectors.rs:"
sed -n '200,400p' lib/segment/src/vector_storage/quantized/quantized_vectors.rs
```

Length of output: 8888
Exhaustive matching confirmed for VectorStorageEnum

All enum variants in `VectorStorageEnum` are covered by the `create()` match, each with the correct `#[cfg]` guard corresponding to its definition. Rust's compiler enforces exhaustiveness, and there are no missing or unreachable arms under any feature combination. No further changes required.

lib/segment/src/vector_storage/tests/test_appendable_dense_vector_storage.rs (4)

16-19: LGTM! Appropriate feature gating for the RocksDB dependency.

The conditional import and new volatile storage import properly support testing without the RocksDB dependency.

125-125: Good change to use volatile storage in the test helper.

Using volatile storage in this shared test helper function is appropriate since it's now used by both RocksDB-backed and volatile storage tests. This makes the test lighter weight and removes an unnecessary RocksDB dependency from the helper.

333-333: Correct feature gating for RocksDB-dependent tests.

All tests using `open_simple_dense_vector_storage` are properly gated with the `rocksdb` feature flag.

Also applies to: 363-363, 392-392, 421-421

451-474: Well-structured volatile storage tests.

The new test functions properly mirror the RocksDB tests, ensuring equivalent test coverage for the volatile storage implementation. Good reuse of existing test helpers.
lib/collection/src/update_handler.rs (1)

428-428: Details

✅ Verification successful

API change looks good - ownership semantics updated for better concurrency.

The parameter type change from `&PayloadIndexSchema` to `Arc<SaveOnDisk<PayloadIndexSchema>>` enables shared ownership, which is beneficial for concurrent access patterns. The `Arc` clone on line 613 is appropriate.

Let me verify that all callers of this method have been updated:
Also applies to: 613-613
🏁 Script executed:

```bash
#!/bin/bash
# Description: Verify all callers of ensure_appendable_segment_with_capacity have been updated to pass Arc

# Search for calls to ensure_appendable_segment_with_capacity
rg -A 5 "ensure_appendable_segment_with_capacity" --type rust
```

Length of output: 2270
API change approved: ownership semantics and callers updated

All callers of `ensure_appendable_segment_with_capacity` now pass an `Arc<SaveOnDisk<PayloadIndexSchema>>`, using `.clone()` where the schema is reused and the original `Arc` in the final invocation. No further changes are required.

• lib/collection/src/update_handler.rs: function signature and internal call updated
• lib/collection/src/tests/mod.rs: test invocations updated with `.clone()` or the original `Arc` as appropriate

lib/segment/src/index/hnsw_index/gpu/gpu_vector_storage/mod.rs (3)
336-359: Proper conditional compilation for dense vector storage variants.

The RocksDB variants are correctly gated with the `rocksdb` feature flag, and the new volatile variants are appropriately restricted to test builds with `#[cfg(test)]`. The implementation maintains consistency across all element types.

399-402: Consistent error handling for sparse volatile storage.

The sparse volatile variant correctly returns the same "not supported" error as the other sparse variants, maintaining API consistency.

406-435: Well-structured conditional compilation for multi-dense variants.

The multi-dense storage variants follow the same conditional compilation pattern as the dense variants, ensuring consistency across the codebase.
lib/collection/src/shards/replica_set/snapshots.rs (3)
49-53: Good addition of concurrent request tracking for partial snapshots.

The guard properly tracks concurrent `create_partial_snapshot` requests only when needed (manifest is `Some`), enabling load balancing in the cluster manager. The guard is automatically dropped when the function exits.

70-84: Well-designed lock management methods for partial snapshot recovery.

The three new methods provide appropriate synchronization primitives:
- `try_take_partial_snapshot_recovery_lock`: prevents concurrent recoveries
- `take_search_write_lock`: blocks searches during recovery
- `try_take_search_read_lock`: allows searches to fail fast during recovery

The methods correctly delegate to the `partial_snapshot_meta` for the actual lock management.

400-402: Proper notification of partial snapshot recovery completion.

The `snapshot_recovered()` call is correctly placed after successful recovery and only for partial recovery types. This allows the system to update internal state and release any recovery-related resources.

lib/api/src/rest/models.rs (3)
63-71: LGTM!

The `is_empty` method correctly checks both the `hardware` and `inference` fields, maintaining consistency with the gRPC implementation.

73-83: LGTM!

The `merge` method efficiently combines inference usage data using the entry API pattern. The token counts are correctly summed for models that appear in both instances.
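The entry-API merge pattern can be sketched as follows. The struct and field names are illustrative stand-ins for the real `ModelUsage`, which may track more than one counter:

```rust
use std::collections::HashMap;

// Hypothetical, simplified per-model usage record.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ModelUsage {
    tokens: u64,
}

// Merge `other` into `into`: models present in both get their token counts
// summed; models only in `other` are inserted as-is.
fn merge_usage(into: &mut HashMap<String, ModelUsage>, other: HashMap<String, ModelUsage>) {
    for (model, usage) in other {
        into.entry(model)
            .and_modify(|existing| existing.tokens += usage.tokens)
            .or_insert(usage);
    }
}
```

The `entry(...).and_modify(...).or_insert(...)` chain does one hash lookup per key instead of the two a `contains_key` + `insert` pair would need, which is why the review calls the pattern efficient.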
103-113: LGTM!

The new `InferenceUsage` and `ModelUsage` structs are well-designed, with appropriate derives for serialization, comparison, and schema generation. The use of `HashMap<String, ModelUsage>` provides an efficient way to track usage per model.
37-39: LGTM!

A simple and correct getter method for the max collections configuration.
lib/posting_list/src/value_handler.rs (3)
1-31: LGTM!

The `ValueHandler` trait provides a clean abstraction for handling both fixed-size and variable-size values in posting lists. The documentation clearly explains the purpose and behavior of each method.

33-50: LGTM!

The `SizedHandler` implementation correctly handles fixed-size values by returning them unchanged. The empty vector for `var_sized_data` is appropriate since no additional storage is needed.

52-102: LGTM!

The `UnsizedHandler` implementation is well-designed with:
- Proper overflow checks for both individual values and total size
- Efficient offset calculation using the windows pattern
- Correct range handling for the last element
- Clear error messages when limits are exceeded
The 4GB limit is reasonable for posting list data.
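The offset-calculation-with-windows idea can be sketched independently of the real handler. This is a minimal, hypothetical version: a prefix-sum offset table over value lengths, with `windows(2)` yielding each value's byte range and the running total checked against overflow and a 4 GiB-style cap (`u32::MAX` here, as an assumed stand-in for the real limit):

```rust
// Returns the byte range of each value, or None if a size limit is exceeded.
fn value_ranges(lengths: &[usize]) -> Option<Vec<std::ops::Range<usize>>> {
    let mut offsets = Vec::with_capacity(lengths.len() + 1);
    let mut total = 0usize;
    offsets.push(0);
    for &len in lengths {
        total = total.checked_add(len)?; // overflow check on the running total
        if total > u32::MAX as usize {
            return None; // total size exceeds the ~4 GiB addressing limit
        }
        offsets.push(total);
    }
    // Each adjacent offset pair [start, end) is one value's range; the final
    // pair naturally covers the last element up to the total length.
    Some(offsets.windows(2).map(|w| w[0]..w[1]).collect())
}
```

Storing n+1 offsets and walking them with `windows(2)` avoids special-casing every element; only the cap and overflow checks need explicit handling.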
tests/consensus_tests/test_partial_snapshot.py (3)
35-45: LGTM! Well-designed concurrency test for the recovery lock.

The test logic correctly verifies that multiple concurrent recovery requests are properly rejected during an ongoing recovery operation. The use of ThreadPoolExecutor with multiple workers and the assertion on status code 400 appropriately validates the locking mechanism.

🧰 Tools
🪛 Ruff (0.11.9)
36-36: `assert_project_root` may be undefined, or defined from star imports (F405)
126-139: Good refactoring to support both blocking and non-blocking calls.

The separation of concerns between `recover_partial_snapshot_from` (blocking with assertion) and `try_recover_partial_snapshot_from` (non-blocking, returning the response) is well-designed and maintains backward compatibility.

🧰 Tools
🪛 Ruff (0.11.9)
128-128: `assert_http_ok` may be undefined, or defined from star imports (F405)

146-155: Simple and effective helper function.

The `try_search_random` function appropriately uses the imported `random_dense_vector()` function and follows the same pattern as other HTTP request helpers in the module.

🧰 Tools
🪛 Ruff (0.11.9)
148-148: `random_dense_vector` may be undefined, or defined from star imports (F405)
lib/segment/src/types.rs (4)
384-386: Correct conditional compilation for the RocksDB dependency.

The feature gating of `SparseVectorStorageType::OnDisk` properly restricts RocksDB-dependent functionality to builds with the "rocksdb" feature enabled. This aligns with the modularization effort mentioned in the AI summary.

Also applies to: 398-399

418-427: Well-implemented feature-conditional default function.

The conditional compilation correctly handles the default sparse vector storage type based on feature availability. When RocksDB is available, it defaults to `OnDisk`; otherwise, it falls back to the standard default (`Mmap`).

2250-2260: Appropriate test-only convenience implementation.

The `From<std::ops::Range<usize>>` implementation for `ValuesCount` is properly restricted to test builds and provides a convenient way to create range-based value count filters for testing purposes.

2797-2806: Comprehensive and well-designed method for key extraction.

The `targeted_key` method correctly handles all `Condition` variants and provides a useful utility for query analysis. The implementation properly:
- Extracts keys from field-based conditions (`Field`, `IsEmpty`, `IsNull`)
- Handles nested conditions by converting to array keys
- Recursively searches filter conditions
- Returns `None` for non-key-based conditions (`HasId`, `HasVector`, `CustomIdChecker`)

This supports the enhanced strict mode verification mentioned in the AI summary.
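The key-extraction shape can be sketched over a deliberately reduced condition enum. These variants are a hypothetical subset of the real `Condition` type, without the nested/recursive cases:

```rust
// Simplified, illustrative condition variants: field-based ones carry a
// payload key; ID/vector-based ones do not.
enum Condition {
    Field { key: String },
    IsEmpty { key: String },
    IsNull { key: String },
    HasId,
    HasVector,
}

// Some(key) when the condition targets a specific payload key, None otherwise.
fn targeted_key(condition: &Condition) -> Option<&str> {
    match condition {
        Condition::Field { key }
        | Condition::IsEmpty { key }
        | Condition::IsNull { key } => Some(key),
        Condition::HasId | Condition::HasVector => None,
    }
}
```

Returning `Option<&str>` (rather than panicking or returning an empty string) is what lets strict-mode checks treat "no targeted key" as a legitimate outcome for ID- and vector-based conditions.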
lib/collection/src/collection_manager/optimizers/segment_optimizer.rs (3)
345-373: LGTM! Clean cancellation handling.

The method properly restores the original state by unwrapping proxies and handling the temp segment based on whether it contains data.

653-673: Well-structured refactoring that minimizes lock duration.

The extraction of the slow optimization part into `optimize_segment_propagate_changes` effectively reduces the time the write lock is held. The error handling properly restores state on failure.

718-823: Excellent separation of slow and fast optimization phases.

The method effectively isolates the CPU-intensive segment building from the critical section that requires the write lock. The optimization of tracking already-removed points (lines 762-769) is a nice touch to minimize work in the critical section.
lib/collection/src/shards/shard_holder/mod.rs (2)
1072-1088: Clean conditional locking implementation.

The method correctly differentiates between full and partial recovery types, only acquiring locks when necessary for partial recovery.

1090-1107: Consistent implementation with the recovery lock method.

The search write lock acquisition follows the same pattern, maintaining code consistency.

lib/posting_list/src/iterator.rs (1)

25-43: Well-implemented Iterator trait.

The use of `inspect` for state updates is clean, and the `size_hint` implementation correctly provides exact bounds.

Cargo.toml (2)
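The `inspect`-plus-`size_hint` pattern can be sketched with a minimal wrapper iterator. The struct is hypothetical, not the real posting-list iterator; it just shows `Option::inspect` updating state on each yielded item and `size_hint` returning exact bounds:

```rust
// Illustrative wrapper that tracks how many items are left to yield.
struct PostingIter<I> {
    inner: I,
    remaining: usize, // caller supplies the exact element count up front
}

impl<I: Iterator> Iterator for PostingIter<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        // `Option::inspect` runs the closure only when an item is actually
        // produced, so the counter stays correct at exhaustion.
        self.inner.next().inspect(|_| self.remaining -= 1)
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Lower and upper bound coincide: the hint is exact.
        (self.remaining, Some(self.remaining))
    }
}
```

An exact `size_hint` lets consumers like `collect` pre-allocate precisely, which matters for hot paths that materialize posting lists. (`Option::inspect` is stable since Rust 1.76.)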
1-267: ⚠️ Potential issue

Critical: Changes don't match PR objectives

The PR is titled "Add inference usage tracking to REST API" but this Cargo.toml only shows changes related to posting lists, RocksDB features, and dependency updates. None of these changes appear to be related to inference usage tracking.

Likely an incorrect or invalid review comment.

14-14: Details

❓ Verification inconclusive

Verify the Rust 1.87 requirement

The Rust version requirement was bumped from 1.85 to 1.87. Please ensure this version is necessary and document which specific Rust 1.87 features are being used.
🌐 Web query:
What are the key features and improvements in Rust 1.87 compared to Rust 1.85?

💡 Result:
Rust 1.87, released on May 15, 2025, introduced several key features and improvements over Rust 1.85, which was released on February 20, 2025. Notable changes include:
Language Enhancements:
Stabilization of the `asm_goto` feature: the `asm_goto` feature has been stabilized, allowing for more flexible inline assembly in Rust code. (github.com)

Support for unbounded shift operations: the language now supports unbounded left and right shift operations, enhancing bitwise manipulation capabilities. (github.com)
Compiler Updates:
- SSE2 Requirement for i686 Targets: The compiler now requires SSE2 instructions for i686 targets, aligning with modern hardware capabilities. (github.com)
Platform Support:
- Deprecation of the `i586-pc-windows-msvc` target: the `i586-pc-windows-msvc` target has been deprecated and will be removed in the next release, encouraging users to migrate to more widely supported targets. (github.com)

Standard Library Improvements:

Stabilization of the anonymous pipe API: the anonymous pipe API has been stabilized, providing a more robust and efficient way to handle inter-process communication. (github.com)

Undeprecation of `env::home_dir`: the previously deprecated `env::home_dir` function has been undeprecated, restoring its availability for retrieving the user's home directory. (github.com)

Cargo and Tooling:
Terminal Integration via ANSI OSC 9;4 Sequences: Cargo now supports terminal integration through ANSI OSC 9;4 sequences, enhancing the terminal experience. (github.com)
Bump to OpenSSL v3: Cargo has updated its OpenSSL dependency to version 3, ensuring compatibility with the latest security standards. (github.com)
These updates in Rust 1.87 build upon the advancements introduced in Rust 1.85, including the stabilization of async closures and the 2024 Edition, further enhancing the language's performance, safety, and developer experience. (blog.rust-lang.org)
Citations:
- 1: rust-lang/rust#140133
- 2: rust-lang/rust#140133
- 3: rust-lang/rust#140133
- 4: rust-lang/rust#140133
- 5: rust-lang/rust#140133
- 6: rust-lang/rust#140133
- 7: rust-lang/rust#140133
- 8: rust-lang/rust#140133
- 9: https://blog.rust-lang.org/2025/02/20/Rust-1.85.0.html?utm_source=openai
Document and Confirm Rust 1.87 Features

The Rust MSRV was bumped from 1.85 to 1.87 in Cargo.toml (line 14). Please verify which of the following 1.87 enhancements your code now depends on and document them in your upgrade notes or changelog:

- Stabilized `asm_goto` feature for inline assembly
- Unbounded left/right shift operations
- SSE2 requirement for i686 targets
- Stabilized anonymous pipe API in the standard library
- Undeprecation of `env::home_dir`
- ANSI OSC 9;4 terminal integration via Cargo
- Bump of the OpenSSL dependency to v3

If none of these features are actually used, consider reverting the MSRV back to 1.85.
lib/posting_list/src/visitor.rs (2)
37-45: Good optimization with chunk caching

The lazy decompression and caching of chunks is a good performance optimization that avoids redundant decompression operations.

1-118: ⚠️ Potential issue

File unrelated to PR objectives

This new posting list visitor implementation is unrelated to the PR objective of "Add inference usage tracking to REST API". The code appears to be part of a different feature.

Likely an incorrect or invalid review comment.
lib/collection/src/collection_manager/holders/segment_holder.rs (3)
36-2116: ⚠️ Potential issue

Architectural change unrelated to PR objectives

This file updates the payload index schema handling to use `Arc<SaveOnDisk<PayloadIndexSchema>>` instead of direct references. While this is a good architectural improvement for concurrency and persistence, it's completely unrelated to the PR objective of adding inference usage tracking.

Likely an incorrect or invalid review comment.

998-999: Consistent pattern for schema parameter updates

The updates to use `Arc<SaveOnDisk<PayloadIndexSchema>>` are consistently applied across all relevant methods. The pattern of acquiring a read lock before accessing the schema is properly implemented.

Also applies to: 1075-1076, 1108-1109, 1153-1154
1139-1142: Details

❓ Verification inconclusive

Verify read lock scope

The read lock on `payload_index_schema` is held while iterating and creating field indexes. Ensure this doesn't cause contention issues if the schema has many entries.

🏁 Script executed:

```bash
#!/bin/bash
# Check for other places where payload_index_schema read lock might be held for extended periods
rg -A 10 "payload_index_schema\.read\(\)"
```

Length of output: 5809
Narrow RwLock guard scope on payload_index_schema

Currently in lib/collection/src/collection_manager/holders/segment_holder.rs:L1139-1142

```rust
let payload_schema_lock = payload_index_schema.read();
for (key, schema) in payload_schema_lock.schema.iter() {
    segment.create_field_index(0, key, Some(schema), &hw_counter)?;
}
```

the read lock is held across every `create_field_index` call, which may block writers if the schema is large or index creation is slow. By contrast, other modules (e.g. search.rs, sharding_keys.rs) do:

```rust
let schema = self.payload_index_schema.read().schema.clone();
```

dropping the guard immediately after cloning.

To reduce contention, consider narrowing the lock's scope, for example:

```rust
// clone the entire schema, then drop the guard
let schemas = payload_index_schema.read().schema.clone();
for (key, schema) in schemas.iter() {
    segment.create_field_index(0, key, Some(schema), &hw_counter)?;
}
```

or explicitly dropping `payload_schema_lock` before calling `create_field_index`.

lib/collection/src/shards/replica_set/partial_snapshot_meta.rs (2)
7-27: Excellent API flow documentation

The ASCII diagram clearly illustrates the locking sequence and state transitions during recovery. This is very helpful for understanding the concurrency model.

1-118: ⚠️ Potential issue

New concurrency control structure unrelated to PR objectives

This new partial snapshot metadata management is unrelated to the PR objective of "Add inference usage tracking to REST API". The implementation appears to be for a different feature entirely.

Likely an incorrect or invalid review comment.
lib/segment/src/index/hnsw_index/graph_layers_builder.rs (2)
354-358: Good parameter naming improvement!

The rename from `levels` to `links_by_level` better describes the data structure: a vector containing links organized by level.

387-388: Clean separation of construction and initialization.

The change to create `SearchContext` without an initial entry point and then add it via `process_candidate` provides better API flexibility.

lib/segment/src/index/hnsw_index/hnsw.rs (3)
1440-1442: Verify the feature flag logic for healing.

The code returns `None` when `hnsw_healing` is enabled in multiple places. This seems counterintuitive: shouldn't healing be performed when the flag is enabled?

Please verify that this logic is correct, or whether it should be inverted.

Also applies to: 1488-1490, 1501-1503, 1516-1518

387-414: Efficient parallel point migration implementation.

The migration logic correctly partitions points and processes them in parallel using the thread pool. Good use of `HwMeasurementAcc::disposable()` for internal operations.

1527-1535: Reasonable threshold for the healing decision.

The 10% missing-points threshold is a good heuristic for deciding between healing and rebuilding from scratch. The debug logging provides useful information for monitoring the decision.
lib/segment/src/id_tracker/simple_id_tracker.rs (2)
132-133: Good simplification using standard trait conversions.

Replacing custom helper methods with direct `From` trait usage improves code consistency and leverages Rust's standard conversion patterns.

Also applies to: 137-138

304-341: Complete set of conversion implementations.

The `From` trait implementations provide bidirectional conversions between `ExtendedPointId` and `StoredPointId`, handling both owned and borrowed variants properly. The panic for string conversions is appropriate since strings are internal-only.
1-54: PR objectives mismatch: this posting list implementation doesn't align with "Add inference usage tracking to REST API"

The PR is titled "Add inference usage tracking to REST API" (#6563), but these files implement a compressed posting list data structure, which appears to be unrelated to the stated objectives. The AI summary mentions adding `InferenceUsage` fields to the REST/gRPC APIs, but the actual code changes are about posting lists.

Please clarify whether:
- This is the correct PR for these changes
- The PR title/objectives need to be updated
- These files were accidentally included

Likely an incorrect or invalid review comment.
lib/posting_list/src/tests.rs (1)
87-105: Excellent test coverage with edge cases! The test suite comprehensively covers various lengths, including boundaries around `CHUNK_LEN`, empty lists, and large lists. This property-based testing approach with reference model validation is well designed.
45-59: Well-implemented edge case handling. The `get_compressed_size` method correctly handles the last chunk by using the total data length. This is a clean implementation.
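The last-chunk arithmetic can be illustrated with a minimal sketch. The names (`chunk_offsets`, `data_len`) are hypothetical, not the actual fields; the idea is simply that each chunk ends where the next one starts, except the last, which ends at the total data length:

```rust
/// Byte size of chunk `i`, given the start offsets of all chunks and the
/// total length of the compressed data buffer. (Illustrative sketch.)
fn get_compressed_size(chunk_offsets: &[usize], data_len: usize, i: usize) -> usize {
    let start = chunk_offsets[i];
    // For every chunk but the last, the end is the next chunk's start;
    // the last chunk ends at the total data length.
    let end = chunk_offsets.get(i + 1).copied().unwrap_or(data_len);
    end - start
}

fn main() {
    let offsets = [0, 40, 90]; // three chunks starting at these byte offsets
    assert_eq!(get_compressed_size(&offsets, 100, 0), 40);
    assert_eq!(get_compressed_size(&offsets, 100, 1), 50);
    // Last chunk: bounded by total data length, not a following offset.
    assert_eq!(get_compressed_size(&offsets, 100, 2), 10);
}
```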
22-56: LGTM! Well-structured volatile storage implementation. The struct definition and constructors are properly designed for in-memory volatile storage with appropriate fields for vector data, deletion tracking, and distance metrics.
lib/posting_list/src/view.rs (2)
1-209: Well-designed posting list view implementation. The overall design with lifetimes, phantom types, and separation of concerns between view and storage is excellent. The binary search implementations and iteration support are correctly implemented.
153-191: Details
✅ Verification successful
Consider validating assumptions in release builds.
The method relies on `debug_assert` for range validation (lines 174-175), which won't run in release builds. Since the method is public within the crate, consider one of the following:
- Promoting these to regular assertions
- Clearly documenting these as safety preconditions
- Returning `None` if the assumptions are violated
🏁 Script executed:
```shell
#!/bin/bash
# Show the definition of the public view struct to inspect its fields
rg -n '^pub struct' -A5 lib/posting_list/src/view.rs
# Show the find_chunk method with surrounding context
rg -n 'pub fn find_chunk' -C5 lib/posting_list/src/view.rs
```

Length of output: 1176
🏁 Script executed:
```shell
#!/bin/bash
# Locate debug_assert calls in find_chunk
rg -n 'debug_assert' -C3 lib/posting_list/src/view.rs
```

Length of output: 488
Precondition is already documented; no further checks needed
The doc comment on `find_chunk` (line 152) explicitly states that callers must supply an `id` within the posting list's range, and the `debug_assert!`s enforce this during development without impacting release-build performance. Keeping the current design is acceptable.

lib/segment/src/vector_storage/vector_storage_base.rs (3)
13-26: Excellent feature-gated integration of volatile storage variants. The conditional compilation directives are properly applied, ensuring:
- RocksDB dependencies are optional via the "rocksdb" feature
- Volatile storage is only available in test configurations
- All storage types follow a consistent naming pattern
Also applies to: 186-265
542-590: Correct handling of `populate()` for volatile storage. The volatile storage variants correctly skip population since they are not memory-mapped. The empty blocks with comments clearly indicate this intentional no-op behavior.
13-1429: Details
❓ Verification inconclusive
Comprehensive and consistent integration of volatile storage throughout the codebase.
All trait methods and helper functions have been systematically updated to handle the new volatile storage variants. The feature gating pattern is consistently applied throughout.
Run the following script to verify all match expressions are complete:
🏁 Script executed:
```shell
#!/bin/bash
# Description: Verify that all match expressions on VectorStorageEnum handle all variants
# Search for match expressions on VectorStorageEnum and check if they handle volatile variants
ast-grep --pattern 'match $_ {
  $$$
  VectorStorageEnum::$_($_) => $_,
  $$$
}'
```

Length of output: 82
Manual verification required for complete `match` coverage

I attempted to confirm via AST grep that every `match` on `VectorStorageEnum` handles all variants, but the command yielded no results. To ensure no storage variant is left unhandled, please perform one of the following checks:

• List all enum variants and verify they appear in each `match` arm:

```shell
rg -o 'VectorStorageEnum::[A-Za-z][A-Za-z0-9_]+' -h lib/segment/src/vector_storage/vector_storage_base.rs | sort -u
```

• Inspect each `impl VectorStorage for VectorStorageEnum` and the helper methods:

- `try_multi_vector_config`
- `default_vector`
- `size_of_available_vectors_in_bytes`
- `populate`
- `clear_cache`
- all trait methods in the `VectorStorage` impl

to ensure every variant above is matched.

All other changes look consistent and correctly include the new volatile and feature-gated variants. Please verify coverage of the `VectorStorageEnum` match expressions.

tests/openapi/test_strictmode.py (8)
165-165: Good naming improvement! The function rename to `test_strict_mode_unindexed_filter_keyword_read_validation` better describes the test's specific focus on keyword filters.
220-303: Comprehensive test coverage for integer filter strict mode validation! The test thoroughly covers all scenarios, including index creation with different capabilities (lookup vs range), which is essential for validating the strict mode behavior with integer payloads.
1456-1475: Proper validation of formula expression with index requirement! The addition of index creation for the `discount_price` field correctly validates that formula expressions respect strict mode indexing requirements.
1796-1797: Consistent error message format updates. The removal of the "Forbidden:" prefix from error messages makes them more direct and concise.
Also applies to: 1811-1811, 1841-1841
1967-1967: Improved error messages with actionable guidance! The new error message format provides clear help text, making it easier for users to understand what index type is needed.
Also applies to: 1982-1982, 2011-2011, 2026-2026
2121-2121: Comprehensive error messages for full scan restrictions! The error messages clearly explain why the request is forbidden and provide multiple resolution options.
Also applies to: 2175-2175
2178-2235: Good test coverage for full scan with HNSW index toggling! The test effectively validates that disabling HNSW indexing (m=0) triggers strict mode restrictions when `search_allow_exact` is false.
2236-2349: Excellent comprehensive test for multitenant indexing scenarios! The test thoroughly validates the interaction between strict mode, various index types (keyword, multitenant, HNSW payload-based), and filtered queries. The progression through different configurations ensures proper behavior in complex multitenant setups.
lib/collection/src/problems/unindexed_field.rs (5)
17-17: Good use of strum for enum iteration! The addition of strum and the rename to `all_indexes()` returning `FieldIndexType` better reflects the function's purpose.

Also applies to: 164-166
168-255: Excellent refactoring for more precise index capability representation! The change from returning schemas to returning specific `FieldIndexType` values provides better granularity and makes the code's intent clearer. The distinction between capabilities (e.g., `IntMatch` vs `IntRange`) is particularly valuable.
529-542: Well-designed enum for index capabilities! The `FieldIndexType` enum provides excellent granularity in representing different index capabilities, particularly the separation of match and range operations for numeric types.
544-591: Robust capability mapping implementation! The `schema_capabilities` function correctly handles all schema types and their parametrized variants. The special handling for integer indexes with configurable lookup/range capabilities is particularly well implemented, including the debug assertion to ensure at least one capability is enabled.
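A minimal sketch of such a capability mapping, with hypothetical simplified types (the real `schema_capabilities` handles many more schema variants than shown here):

```rust
// Illustrative stand-ins; the real FieldIndexType and schema types in
// lib/collection cover keywords, floats, geo, text, and more.
#[derive(Debug, PartialEq, Clone, Copy)]
enum FieldIndexType {
    IntMatch,
    IntRange,
    Keyword,
}

enum Schema {
    Keyword,
    Integer { lookup: bool, range: bool },
}

fn schema_capabilities(schema: &Schema) -> Vec<FieldIndexType> {
    match schema {
        Schema::Keyword => vec![FieldIndexType::Keyword],
        Schema::Integer { lookup, range } => {
            // An integer index with neither capability would be useless;
            // mirror the debug assertion described above.
            debug_assert!(*lookup || *range);
            let mut caps = Vec::new();
            if *lookup {
                caps.push(FieldIndexType::IntMatch);
            }
            if *range {
                caps.push(FieldIndexType::IntRange);
            }
            caps
        }
    }
}

fn main() {
    let caps = schema_capabilities(&Schema::Integer { lookup: true, range: true });
    assert_eq!(caps, vec![FieldIndexType::IntMatch, FieldIndexType::IntRange]);
}
```

Splitting match and range into separate capabilities lets strict-mode checks reject, say, a range filter on an index configured for lookup only.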
610-628: Good test coverage for integer index capabilities! The test properly validates that integer indexes with both lookup and range enabled correctly map to both `IntMatch` and `IntRange` capabilities.

lib/segment/src/vector_storage/tests/test_appendable_multi_dense_vector_storage.rs (4)
26-30: Appropriate use of conditional compilation for RocksDB! Making the RocksDB variant conditional on the feature flag allows for cleaner test builds when RocksDB is not needed.
78-124: Comprehensive match coverage for all storage variants! The match statement properly handles all storage variants with appropriate conditional compilation directives. The consistent iteration logic across multi-dense variants ensures uniform behavior.
296-297: Proper use of cfg_attr for conditional test cases! The conditional inclusion of RocksDB test cases using `cfg_attr` ensures tests adapt correctly to the build configuration.

Also applies to: 322-323, 350-351
375-434: Comprehensive test coverage for volatile storage! The three test functions provide complete coverage for volatile multi-dense vector storage, mirroring the functionality tested for persistent storage types. The direct instantiation approach is appropriate for in-memory storage.
lib/segment/src/vector_storage/sparse/volatile_sparse_vector_storage.rs (4)
38-54: LGTM! Deletion flag management is well-implemented. The method correctly prevents marking non-existent vectors as undeleted and properly maintains the deleted count with appropriate bounds checking.
157-169: Good validation with debug assertion. The debug assertion for sorted sparse vector indices helps catch potential bugs during development.
171-187: Batch update implementation looks good. Proper use of cancellation checking and error propagation throughout the update process.
197-204: ⚠️ Potential issue: Fix inverted deletion logic.

The logic for determining `is_deleted` is inverted. The `set_deleted` method returns the previous deleted state, so if it returns `false` (wasn't deleted), then we successfully deleted it and `is_deleted` should be `true`.

```diff
 fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-    let is_deleted = !self.set_deleted(key, true);
+    let was_deleted = self.set_deleted(key, true);
+    let is_deleted = !was_deleted;
     if is_deleted {
         let old_vector = self.get_sparse_opt(key).ok().flatten();
         self.update_stored(key, true, old_vector.as_ref());
     }
     Ok(is_deleted)
 }
```

Likely an incorrect or invalid review comment.
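The `set_deleted` semantics can be checked with a minimal stand-in model (types and fields are illustrative, not the actual storage structs). Since `set_deleted` returns the previous flag, `!set_deleted(key, true)` is `true` exactly when the vector was live, i.e. the original code already computes "newly deleted" correctly:

```rust
// Minimal model of the set_deleted / delete_vector interplay.
struct Storage {
    deleted: Vec<bool>,
}

impl Storage {
    /// Sets the deleted flag and returns the *previous* value,
    /// matching the semantics described in the review comment.
    fn set_deleted(&mut self, key: usize, value: bool) -> bool {
        std::mem::replace(&mut self.deleted[key], value)
    }

    /// Returns true if this call actually deleted the vector,
    /// i.e. it was not already deleted.
    fn delete_vector(&mut self, key: usize) -> bool {
        let was_deleted = self.set_deleted(key, true);
        !was_deleted
    }
}

fn main() {
    let mut s = Storage { deleted: vec![false, true] };
    assert!(s.delete_vector(0));  // was live: newly deleted
    assert!(!s.delete_vector(0)); // already deleted: no-op
    assert!(!s.delete_vector(1)); // was deleted from the start
}
```

This is why the bot's own conclusion ("likely an incorrect review comment") holds: the suggested diff only renames the intermediate value without changing behavior.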
lib/segment/src/segment_constructor/segment_constructor_base.rs (3)
13-28: Well-structured conditional imports. The RocksDB-specific imports are properly feature-gated, maintaining clean separation of concerns.

104-122: Good error handling for disabled RocksDB. The function properly handles the case when the Memory storage type is requested but RocksDB is disabled, providing a clear error message.

730-737: Good error handling for RocksDB operations. Proper error conversion with descriptive context and safe fallback to a default value.
lib/segment/src/vector_storage/multi_dense/volatile_multi_dense_vector_storage.rs (4)
23-29: Clear metadata structure with helpful documentation. The comment clarifying that fields count vectors (not dimensions) is valuable for understanding the data structure.

107-122: Consistent implementation with sparse storage. The deletion flag management follows the same pattern as the sparse storage implementation, which is good for code consistency.

135-140: Good validation with descriptive error message. The check for oversized vectors provides a clear, actionable error message including both the attempted size and the maximum allowed size.
302-305: ⚠️ Potential issue: Fix inverted deletion logic.

Same issue as in sparse storage: the logic for determining `is_deleted` is inverted.

```diff
 fn delete_vector(&mut self, key: PointOffsetType) -> OperationResult<bool> {
-    let is_deleted = !self.set_deleted(key, true);
+    let was_deleted = self.set_deleted(key, true);
+    let is_deleted = !was_deleted;
     Ok(is_deleted)
 }
```

Likely an incorrect or invalid review comment.
Force-pushed from 0f109ff to 83df404 (compare)
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
docs/grpc/docs.md (5)
181-182: Fix list indentation in the Table of Contents
The new list items for `InferenceUsage` and `InferenceUsage.ModelEntry` are indented with 4 spaces, which violates the project's markdown lint rule (MD007 expects 2 spaces).

Apply this diff:

```diff
-    - [InferenceUsage](#qdrant-InferenceUsage)
-    - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
+  - [InferenceUsage](#qdrant-InferenceUsage)
+  - [InferenceUsage.ModelEntry](#qdrant-InferenceUsage-ModelEntry)
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
181-181: Unordered list indentation
Expected: 2; Actual: 4 (MD007, ul-indent)
182-182: Unordered list indentation
Expected: 2; Actual: 4 (MD007, ul-indent)
3177-3181: Add description for `InferenceUsage.model` field

The table entry for `model` in `InferenceUsage` is missing a description. Consider adding a brief explanation such as "Mapping of model name to its usage metrics".
3193-3196: Add descriptions for `InferenceUsage.ModelEntry` fields

Both `key` and `value` entries lack descriptions. Suggested descriptions:

- `key`: "Model name"
- `value`: "Usage metrics for the model"
3293-3296: Add description for `ModelUsage.tokens`

The `tokens` field is missing a description. Consider: "Number of tokens processed by the model."
4814-4815: Add description for `Usage.inference` field

The new `inference` field in the `Usage` message should include a description. For example: "Inference usage metrics including per-model token counts".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (36)
- `.gitignore` (1 hunks)
- `docs/grpc/docs.md` (4 hunks)
- `docs/redoc/master/openapi.json` (1 hunks)
- `lib/api/src/grpc/conversions.rs` (1 hunks)
- `lib/api/src/grpc/ops.rs` (2 hunks)
- `lib/api/src/grpc/proto/points.proto` (1 hunks)
- `lib/api/src/grpc/qdrant.rs` (1 hunks)
- `lib/api/src/rest/conversions.rs` (2 hunks)
- `lib/api/src/rest/models.rs` (3 hunks)
- `lib/collection/src/operations/types.rs` (1 hunks)
- `lib/collection/src/operations/universal_query/collection_query.rs` (3 hunks)
- `lib/collection/src/operations/verification/query.rs` (2 hunks)
- `lib/storage/src/content_manager/errors.rs` (5 hunks)
- `src/actix/api/collections_api.rs` (6 hunks)
- `src/actix/api/count_api.rs` (2 hunks)
- `src/actix/api/discovery_api.rs` (4 hunks)
- `src/actix/api/facet_api.rs` (2 hunks)
- `src/actix/api/local_shard_api.rs` (5 hunks)
- `src/actix/api/query_api.rs` (7 hunks)
- `src/actix/api/recommend_api.rs` (6 hunks)
- `src/actix/api/retrieve_api.rs` (6 hunks)
- `src/actix/api/search_api.rs` (10 hunks)
- `src/actix/api/service_api.rs` (3 hunks)
- `src/actix/api/shards_api.rs` (2 hunks)
- `src/actix/api/snapshot_api.rs` (2 hunks)
- `src/actix/api/update_api.rs` (25 hunks)
- `src/actix/helpers.rs` (5 hunks)
- `src/common/inference/infer_processing.rs` (3 hunks)
- `src/common/inference/query_requests_grpc.rs` (2 hunks)
- `src/common/inference/query_requests_rest.rs` (15 hunks)
- `src/common/inference/service.rs` (4 hunks)
- `src/common/inference/update_requests.rs` (13 hunks)
- `src/common/update.rs` (10 hunks)
- `src/tonic/api/points_api.rs` (2 hunks)
- `src/tonic/api/query_common.rs` (15 hunks)
- `src/tonic/api/update_common.rs` (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/actix/api/snapshot_api.rs
🚧 Files skipped from review as they are similar to previous changes (33)
- .gitignore
- src/actix/api/facet_api.rs
- src/common/inference/query_requests_grpc.rs
- src/actix/api/local_shard_api.rs
- src/actix/api/count_api.rs
- src/actix/api/discovery_api.rs
- lib/collection/src/operations/universal_query/collection_query.rs
- src/tonic/api/points_api.rs
- src/actix/api/shards_api.rs
- src/actix/api/retrieve_api.rs
- src/actix/api/recommend_api.rs
- src/actix/api/service_api.rs
- lib/api/src/grpc/ops.rs
- lib/api/src/rest/conversions.rs
- lib/collection/src/operations/types.rs
- src/actix/api/collections_api.rs
- src/tonic/api/query_common.rs
- src/actix/api/search_api.rs
- lib/api/src/grpc/conversions.rs
- src/actix/api/query_api.rs
- src/actix/api/update_api.rs
- lib/storage/src/content_manager/errors.rs
- src/tonic/api/update_common.rs
- lib/api/src/rest/models.rs
- lib/collection/src/operations/verification/query.rs
- lib/api/src/grpc/qdrant.rs
- src/common/inference/infer_processing.rs
- src/common/inference/service.rs
- src/common/update.rs
- src/common/inference/query_requests_rest.rs
- src/actix/helpers.rs
- lib/api/src/grpc/proto/points.proto
- docs/redoc/master/openapi.json
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
docs/grpc/docs.md
181-181: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
182-182: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
183-183: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
184-184: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
185-185: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
186-186: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
187-187: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
188-188: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: rust-tests (macos-latest)
- GitHub Check: integration-tests-consensus
- GitHub Check: test-consensus-compose
- GitHub Check: integration-tests
- GitHub Check: rust-tests (ubuntu-latest)
- GitHub Check: rust-tests (windows-latest)
- GitHub Check: storage-compat-test
- GitHub Check: lint
🔇 Additional comments (8)
docs/grpc/docs.md (3)
3171-3174: Confirm anchor and heading formatting
The anchor and heading for `InferenceUsage` are correctly added following established conventions.
3187-3192: Confirm anchor and heading formatting
The anchor and heading for `InferenceUsage.ModelEntry` follow the existing pattern and are correctly placed.
3288-3291: Confirm anchor and heading formatting for `ModelUsage`

The `ModelUsage` section is correctly added and consistent with other message definitions.

src/common/inference/update_requests.rs (5)
182-187: LGTM! Usage aggregation issue has been properly addressed. The inference usage merging logic correctly accumulates usage data from multiple named vector batches, addressing the concern raised in the previous review. The implementation properly merges each `batch_usage` into the accumulator instead of only capturing the first non-None value.
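The merge-into-accumulator pattern can be sketched with simplified stand-ins for the real `InferenceUsage`/`ModelUsage` types (the field names here are illustrative):

```rust
use std::collections::HashMap;

// Simplified stand-in: the real InferenceUsage in lib/api maps model
// names to a ModelUsage struct; here we track token counts directly.
#[derive(Default, Debug, PartialEq)]
struct InferenceUsage {
    models: HashMap<String, u64>, // model name -> accumulated token count
}

impl InferenceUsage {
    /// Fold another batch's usage into this accumulator, summing per model.
    fn merge(&mut self, other: InferenceUsage) {
        for (model, tokens) in other.models {
            *self.models.entry(model).or_insert(0) += tokens;
        }
    }
}

fn main() {
    let mut acc = InferenceUsage::default();
    // Each named-vector batch contributes its own usage; merging all of
    // them avoids the earlier bug of keeping only the first non-None value.
    for batch_usage in [
        InferenceUsage { models: HashMap::from([("bm25".into(), 10)]) },
        InferenceUsage { models: HashMap::from([("bm25".into(), 5), ("clip".into(), 7)]) },
    ] {
        acc.merge(batch_usage);
    }
    assert_eq!(acc.models["bm25"], 15);
    assert_eq!(acc.models["clip"], 7);
}
```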
67-67: Improved variable naming for clarity. The renaming from `vector` to `vector_data` in named vector processing provides better clarity about what the variable represents, especially when dealing with nested vector structures.

Also applies to: 326-327, 412-414
76-78: Enhanced error messages with explicit context. The error messages now provide specific context about where the inference failure occurred (e.g., "for named vector", "for document", "for image", "for object"), making debugging easier. The explicit `None` parameter for the optional error detail is also consistent with the error constructor signature.

Also applies to: 93-95, 113-115, 133-135
21-21: Consistent inference usage tracking across all conversion functions. All conversion functions now properly return tuples containing both the converted data and an optional `InferenceUsage`, enabling comprehensive usage tracking throughout the conversion pipeline.

Also applies to: 165-165, 225-225, 390-390
192-210: Appropriate error handling for unsupported direct processing. The explicit error handling for direct Document/Image/Object processing in top-level batch vectors correctly guides users to use named vectors instead, preventing potential confusion about supported usage patterns.
Force-pushed from 83df404 to 1dc2bc7 (compare)
> In a call we agreed that the current approach isn't very sustainable. Returning usage in all ok/err returns adds a lot of boilerplate. We agreed to try a different approach instead, one that follows the design of our current hardware usage counters. Both concepts seem quite similar. That way we don't need to return usages everywhere, and we handle ok/error statuses in exactly the same way.
Force-pushed from 1dc2bc7 to 7eb2ef3 (compare)
There was a problem hiding this comment.
Pull Request Overview
This PR adds inference usage tracking to the REST API endpoints by updating the calls to process_response and process_response_error throughout the codebase, as well as integrating new models for inference usage into the GRPC and REST models.
- Updated process_response and process_response_error calls to accept an extra argument for inference usage.
- Added new types and conversion routines to support inference usage in both REST and GRPC APIs.
- Extended the OpenAPI and GRPC schemas and documentation to include inference tracking data.
Reviewed Changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/actix/api/*.rs | Updated API endpoints to pass an extra inference usage parameter (defaulted to None). |
| lib/api/src/rest/models.rs | Introduced new InferenceUsage and ModelUsage types along with corresponding changes. |
| lib/api/src/grpc/* | Extended GRPC models, conversions, and proto definitions to include inference usage. |
| docs/redoc/master/openapi.json | Updated documentation and schema definitions to document the new inference tracking. |
Comments suppressed due to low confidence (1)
src/actix/api/query_api.rs:90
- Verify that the extraction and consistent passing of the `inference_usage` value is handled uniformly across all API endpoints to avoid unexpected behavior.

```rust
let (result, inference_usage) = match result {
```
* Add inference usage tracking to REST API

---------

Co-authored-by: Andrey Vasnetsov <andrey@vasnetsov.com>
No description provided.