Measure hardware IO for update operations#5922
Conversation
f7e453a to
657951b
Compare
0d4e450 to
e234949
Compare
|
Please include an integration test for this changes |
98f2eaa to
d73fe4f
Compare
src/actix/helpers.rs
Outdated
| dispatcher: &Dispatcher, | ||
| collection_name: String, | ||
| report_to_api: bool, | ||
| wait: Option<bool>, // Whether the client is awaiting the operation or `None` if function always waits. |
There was a problem hiding this comment.
I find this parameter a bit confusing. Do we need it if we also have report_to_api?
I'm confused by it because it doesn't seem to match the wait parameters we have in our API requests:
| wait parameter: | wait in API | report in this function |
|---|---|---|
| None | false | true |
| Some(false) | false | false |
| Some(true) | true | true |
There was a problem hiding this comment.
On one hand, I agree that this parameter is confusing and probably an abstraction leak (hardware counter should not care if the request is wait or not).
On the other hand, if we decide that the caller should provide correct report_to_api value, which accounts for service_config.hardware_reporting() && params.wait it might be easy to forget in one of many places to do correctly
There was a problem hiding this comment.
I will push the second version, but I am on the fence with this change
There was a problem hiding this comment.
I agree with this parameter being quite confusing but I think we shouldn't make the caller decide the logic because, as @generall already said, it's very easy to make a mistake here.
An alternative could be to make this an enum with variants like Blocking/NonBlocking to make it more clear and report HW if it's not Some(BlockingMode::NonBlocking). And maybe even don't make the parameter an option but always state the type of operation.
For context, the reason we need this is because if we don't wait, we get empty or wrong hardware metrics in the API response which confuses users and may make implementation errors because we assume those values to be correct.
| ); | ||
|
|
||
| RequestHwCounter::new(counter, self.service_config.hardware_reporting()) | ||
| let waiting = wait != Some(false); |
There was a problem hiding this comment.
Maybe rename this to report_to_api?
d73fe4f to
353f5f7
Compare
|
It seems that payload operations created during resharding are still accounted |
7b8d7e6 to
d57cc2f
Compare
|
I've rebased this on |
d57cc2f to
845549e
Compare
📝 WalkthroughWalkthroughThe updates integrate hardware measurement tracking throughout the system. A new hardware usage field is added to various response models and Protocol Buffers messages, ensuring that telemetry data is captured during operations. Conversion implementations, API service functions, and internal update methods now pass a hardware measurement accumulator parameter, which standardizes the collection of hardware usage metrics across different workflows. Changes are applied across documentation, GRPC, Actix, and Tonic interfaces, with method signatures updated to include additional hardware measurement arguments. The modifications also extend to shard operations, replica set functions, and content management processes, thereby embedding hardware measurement handling into both local and remote operations. Additionally, new test functions and assertions have been incorporated into integration and consensus tests to capture, verify, and compare hardware telemetry, and metric types have been updated to counters to better reflect cumulative hardware activity. ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (20)
lib/api/src/grpc/qdrant.rs (1)
9183-9184: Document the purpose of theunmeasuredflag.The new boolean flag
unmeasuredhas been added, but its purpose and usage conditions aren't clear from the code. Consider adding a code comment explaining when and why this flag would be set to true.#[prost(message, optional, tag = "3")] pub clock_tag: ::core::option::Option<ClockTag>, + /// Flag indicating if hardware measurements were skipped for this operation #[prost(bool, optional, tag = "4")] pub unmeasured: ::core::option::Option<bool>,docs/grpc/docs.md (1)
3502-3502: Clarify theusageField DescriptionThe new optional
usagefield is correctly added to the PointsOperationResponse documentation, in line with the hardware usage measurement update. However, the description column is currently empty. It would be beneficial to provide a brief explanation of what hardware metrics are tracked (e.g., CPU usage, payload I/O read/write, vector I/O read/write) so users understand its purpose.For example, consider changing the row diff as follows:
-| usage | [HardwareUsage](#qdrant-HardwareUsage) | optional | | +| usage | [HardwareUsage](#qdrant-HardwareUsage) | optional | Aggregated hardware usage metrics (CPU, payload_io_read, payload_io_write, vector_io_read, vector_io_write). |lib/collection/src/collection/payload_index_schema.rs (1)
86-92: Consider preserving hardware measurements for index operationsWhile using
HwMeasurementAcc::disposable()is technically correct, index operations can be computationally expensive. It might be valuable to track these in the future to give users insight into resource consumption during index management.Consider passing through an actual measurement accumulator from higher-level methods rather than using a disposable one:
- .update_all_local( - delete_index_operation, - false, - HwMeasurementAcc::disposable(), // Unmeasured API - ) + .update_all_local( + delete_index_operation, + false, + hw_measurement_acc, // Propagate measurements from caller + )And update the method signature to accept this parameter:
- pub async fn drop_payload_index( - &self, - field_name: JsonPath, - ) -> CollectionResult<Option<UpdateResult>> { + pub async fn drop_payload_index( + &self, + field_name: JsonPath, + hw_measurement_acc: HwMeasurementAcc, + ) -> CollectionResult<Option<UpdateResult>> {src/actix/api/search_api.rs (1)
1-339: Verify integration test for hardware reportingThe PR comment mentioned that an integration test is needed. While the code changes look good, I'd recommend adding at least one test that verifies the hardware usage is properly reported in the API response.
Would you like me to suggest an integration test implementation to verify that hardware usage metrics are properly included in API responses?
lib/collection/src/shards/forward_proxy_shard.rs (1)
314-325: Consider updating this block to use update_remote_unmeasured for consistency.While the current operation at line 186 has been updated to use
update_remote_unmeasured, this similar block still uses the standardupdatemethod. For consistency and to prevent redundant hardware measurements for internal operations, consider updating this to useupdate_remote_unmeasuredas well.if let Some(operation) = forward_operation { let remote_result = self .remote_shard - .update(operation, false, hw_measurement_acc) + .update_remote_unmeasured(operation, false, hw_measurement_acc) .await .map_err(|err| { CollectionError::forward_proxy_error(self.remote_shard.peer_id, err) })?; // Merge `result` and `remote_result`: // // - Pick `clock_tag` with *newer* `clock_tick` let tick = result.clock_tag.map(|tag| tag.clock_tick); let remote_tick = remote_result.clock_tag.map(|tag| tag.clock_tick);lib/collection/tests/integration/collection_test.rs (1)
252-255: Consider reusing the hardware counter.A new hardware counter is being created for each operation. Consider reusing the existing counter if you want to accumulate measurements across multiple operations.
- let hw_counter = HwMeasurementAcc::new(); collection - .update_from_client_simple(assign_payload, true, WriteOrdering::default(), hw_counter) + .update_from_client_simple(assign_payload, true, WriteOrdering::default(), hw_counter.clone())lib/collection/src/shards/queue_proxy_shard.rs (1)
505-506: Consider measuring hardware usage during transfer operationsCurrently, a disposable hardware measurement accumulator is used for transfer operations, indicated as "Internal operation". Since the PR aims to add measurements for update operations, you might want to consider whether these transfer operations should also be measured and included in the metrics rather than using disposable counters.
If these operations can be significant in terms of hardware usage, tracking them could provide more comprehensive resource utilization insights.
- let disposed_hw = HwMeasurementAcc::disposable(); // Internal operation + // Use a real accumulator to track transfer operation resource usage + let transfer_hw = HwMeasurementAcc::new(); // Track transfer operationssrc/tonic/api/points_api.rs (1)
55-56: Consider renaming the variable for clarity.The variable naming could be improved to better distinguish between the parameter and its processed value.
- let waiting = wait != Some(false); + let should_wait = wait != Some(false); - RequestHwCounter::new(counter, self.service_config.hardware_reporting() && waiting) + RequestHwCounter::new(counter, self.service_config.hardware_reporting() && should_wait)tests/consensus_tests/test_hw_measurement.py (6)
28-28: Add tests for vector writes when implemented.The TODO comment indicates the need for tests for vector writes. This aligns with the PR description mentioning that vector storage measurements will be included in a future PR.
Would you like assistance in preparing a placeholder structure for these vector write tests to make them easier to implement in the future PR?
40-40: Consider adjusting assertion to ensure minimum expected change.The current assertion only checks if the payload I/O has increased by any amount (
>=), but it would be better to assert that the increase meets the minimum expected value.-assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= expected_delta +assert_with_upper_bound_error( + abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]), + expected_delta +)
63-63: Verify equality with error margin for hardware metrics.The strict equality check may be too rigid for hardware metrics that can have small variations. Consider using the
assert_with_upper_bound_errorfunction you defined for more flexibility.-assert update_hw_data['payload_io_write'] == total_payload_io_write - total_payload_io_write_old +delta = total_payload_io_write - total_payload_io_write_old +assert_with_upper_bound_error(update_hw_data['payload_io_write'], delta)
92-101: Use raised exceptions instead ofassert False.Using
assert Falsecan be problematic because assertions can be disabled with the Python-Oflag. Instead, raise an explicitAssertionError.def assert_with_upper_bound_error(inp: int, min_value: int, upper_bound_error_percent: float = 0.05): """Asserts `inp` being equal to `min_value` with a max upperbound error given in percent.""" if inp < min_value: - assert False, f"Assertion {inp} >= {min_value} failed" + raise AssertionError(f"Assertion {inp} >= {min_value} failed") max_error = ceil(float(min_value) + float(min_value) * upper_bound_error_percent) upper_bound = inp + max_error if inp > upper_bound: - assert False, f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed." + raise AssertionError(f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed.")🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
101-101: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
97-98: Simplify the upper bound calculation.The current calculation for the upper bound is a bit confusing. A simpler approach would make the code more readable.
- max_error = ceil(float(min_value) + float(min_value) * upper_bound_error_percent) - upper_bound = inp + max_error + # Calculate maximum allowed value based on the minimum value plus error margin + upper_bound = ceil(float(min_value) * (1 + upper_bound_error_percent))
88-88: Increase readability with a separate assertion for each check.The test is combining two assertions (minimum payload write and upper bound) in one line. Consider separating them for clarity.
-assert_with_upper_bound_error(peer_hw["payload_io_write"], upsert_vectors * 5) # 50 vectors on this node with payload of ~5 bytes +# Assert minimum payload I/O write (50 vectors on this node with payload of ~5 bytes) +min_expected_payload = upsert_vectors * 5 +assert_with_upper_bound_error(peer_hw["payload_io_write"], min_expected_payload)lib/collection/src/shards/remote_shard.rs (1)
245-245: Reduce argument list if possible.You have annotated this function to allow many arguments. While this works, consider grouping related parameters into a struct for better readability and maintainability.
lib/collection/src/shards/replica_set/update.rs (1)
197-204: Consider adding a descriptive comment about hardware measurement cloningThe code clones the
hw_measurement_accwhen passing it toupdate_impl. While this is correct behavior to maintain separate tracking for each attempt, it would be helpful to add a comment explaining why cloning is necessary here.- hw_measurement_acc.clone(), + // Clone the accumulator to maintain separate tracking for each retry attempt + hw_measurement_acc.clone(),src/common/update.rs (3)
230-231: Consider adding doc comments explaining the purpose of clippy attributesThe
#[expect(clippy::too_many_arguments)]attribute is used correctly, but it would be helpful to add a brief comment explaining why this exception is necessary.-#[expect(clippy::too_many_arguments)] +// This function has many arguments due to its central role in coordinating +// point operations with various configurations and measurement capabilities +#[expect(clippy::too_many_arguments)]Also applies to: 272-273
379-380: Consider using a separate variable for cloned hardware counterTo improve code readability, consider storing the cloned hardware measurement in a variable before passing it to the update function.
- hw_measurement_acc.clone(), + let hw_acc_cloned = hw_measurement_acc.clone(); + hw_acc_cloned,
818-828: Consider refactoring to reduce parameter countWhile the
#[expect(clippy::too_many_arguments)]attribute is in place, consider refactoring to use a parameter object or struct to group related parameters, which would improve maintainability.-pub async fn update( - toc: &TableOfContent, - collection_name: &str, - operation: CollectionUpdateOperations, - internal_params: InternalUpdateParams, - params: UpdateParams, - shard_key: Option<ShardKeySelector>, - access: Access, - hw_measurement_acc: HwMeasurementAcc, -) -> Result<UpdateResult, StorageError> { +pub struct UpdateContext<'a> { + pub toc: &'a TableOfContent, + pub collection_name: &'a str, + pub operation: CollectionUpdateOperations, + pub internal_params: InternalUpdateParams, + pub params: UpdateParams, + pub shard_key: Option<ShardKeySelector>, + pub access: Access, + pub hw_measurement_acc: HwMeasurementAcc, +} + +pub async fn update( + ctx: UpdateContext<'_>, +) -> Result<UpdateResult, StorageError> {src/tonic/api/points_internal_api.rs (1)
258-262: Consider adding a comment explaining the purpose of hardware measurementFor clarity, consider adding a comment explaining why hardware measurement is being tracked and how it's used in the response.
+ // Create a hardware measurement accumulator for this collection to track resource usage + // during the update operation. This will be included in the response for monitoring. let hw_metrics = self.get_request_collection_hw_usage_counter_for_internal( update_point_vectors.collection_name.clone(), );
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
docs/grpc/docs.md(1 hunks)lib/api/src/grpc/conversions.rs(1 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/proto/points_internal_service.proto(3 hunks)lib/api/src/grpc/qdrant.rs(4 hunks)lib/collection/src/collection/clean.rs(1 hunks)lib/collection/src/collection/payload_index_schema.rs(3 hunks)lib/collection/src/collection/point_ops.rs(9 hunks)lib/collection/src/collection/sharding_keys.rs(3 hunks)lib/collection/src/shards/conversions.rs(2 hunks)lib/collection/src/shards/forward_proxy_shard.rs(1 hunks)lib/collection/src/shards/local_shard/shard_ops.rs(2 hunks)lib/collection/src/shards/queue_proxy_shard.rs(3 hunks)lib/collection/src/shards/remote_shard.rs(9 hunks)lib/collection/src/shards/replica_set/mod.rs(1 hunks)lib/collection/src/shards/replica_set/update.rs(8 hunks)lib/collection/src/tests/points_dedup.rs(1 hunks)lib/collection/src/update_handler.rs(4 hunks)lib/collection/tests/integration/collection_restore_test.rs(3 hunks)lib/collection/tests/integration/collection_test.rs(9 hunks)lib/collection/tests/integration/distance_matrix_test.rs(1 hunks)lib/collection/tests/integration/grouping_test.rs(4 hunks)lib/collection/tests/integration/lookup_test.rs(1 hunks)lib/collection/tests/integration/multi_vec_test.rs(1 hunks)lib/collection/tests/integration/pagination_test.rs(1 hunks)lib/collection/tests/integration/snapshot_recovery_test.rs(1 hunks)lib/storage/src/content_manager/data_transfer.rs(2 hunks)lib/storage/src/content_manager/toc/point_ops.rs(6 hunks)lib/storage/src/content_manager/toc/request_hw_counter.rs(1 hunks)src/actix/api/count_api.rs(1 hunks)src/actix/api/discovery_api.rs(2 hunks)src/actix/api/facet_api.rs(1 hunks)src/actix/api/local_shard_api.rs(3 hunks)src/actix/api/query_api.rs(3 hunks)src/actix/api/recommend_api.rs(3 hunks)src/actix/api/retrieve_api.rs(3 hunks)src/actix/api/search_api.rs(5 hunks)src/actix/api/update_api.rs(16 hunks)src/actix/helpers.rs(1 hunks)src/common/metrics.rs(1 hunks)src/common/update.rs(38 hunks)src/tonic/api/points_api.rs(28 hunks)src/tonic/api/points_internal_api.rs(9 hunks)src/tonic/api/update_common.rs(35 hunks)tests/consensus_tests/fixtures.py(2 hunks)tests/consensus_tests/test_hw_measurement.py(1 hunks)tests/consensus_tests/utils.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/test_hw_measurement.py
5-5: from .utils import * used; unable to detect undefined names
(F403)
18-18: start_cluster may be undefined, or defined from star imports
(F405)
20-20: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
68-68: start_cluster may be undefined, or defined from star imports
(F405)
70-70: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
72-72: check_collection_points_count may be undefined, or defined from star imports
(F405)
80-80: wait_collection_points_count may be undefined, or defined from star imports
(F405)
95-95: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
101-101: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (151)
lib/api/src/grpc/qdrant.rs (3)
5803-5804: Good addition of hardware usage tracking to the response structure.The addition of an optional
usagefield of typeHardwareUsageto the response structure aligns perfectly with the PR's objectives of implementing hardware IO measurements for update operations. This will provide valuable metrics for users monitoring resource consumption.
9304-9305: Consistent implementation of hardware usage tracking.Good addition of the
usagefield here, maintaining consistency with other response structures. The field being optional ensures backward compatibility with existing clients, as noted in the comment below about remaining compatible withUpdateResult.
9794-9795: Hardware usage tracking is consistently implemented across APIs.The hardware usage field has been added consistently to all relevant response structures, including this internal client response. This ensures uniformity in how resource consumption is reported throughout both public and internal APIs.
tests/consensus_tests/utils.py (1)
82-82: Hardware reporting enabled for consensus testsThis change enables hardware usage reporting in the testing environment, which aligns with the PR objective of adding measurements for all internal update operations.
src/actix/api/facet_api.rs (1)
57-57: Hardware tracking parameter properly addedThe
Noneparameter has been added to theget_request_hardware_counterfunction call, properly implementing hardware usage tracking for the facet API endpoint. This is consistent with the PR's goal of adding hardware measurement capabilities throughout the system.src/actix/api/retrieve_api.rs (3)
103-103: Hardware tracking parameter properly added to get_pointThe
Noneparameter has been added to theget_request_hardware_counterfunction call, properly implementing hardware usage tracking for the point retrieval endpoint.
162-162: Hardware tracking parameter properly added to get_pointsThe
Noneparameter has been added to theget_request_hardware_counterfunction call, properly implementing hardware usage tracking for the batch point retrieval endpoint.
223-223: Hardware tracking parameter properly added to scroll_pointsThe
Noneparameter has been added to theget_request_hardware_counterfunction call, properly implementing hardware usage tracking for the scroll points endpoint.src/actix/api/count_api.rs (1)
52-52: This field is correctly set to None for read APIs.The hardware counter doesn't need the
waitparameter for read operations likecount_points, so addingNonehere is appropriate.lib/collection/tests/integration/pagination_test.rs (1)
40-43: Hardware measurement correctly integrated into the test.The hardware measurement accumulator is properly initialized and passed to the update operation. This change aligns with the PR objective to add measurements for all update functions.
lib/collection/src/collection/sharding_keys.rs (3)
3-3: Import added appropriately.The import for the hardware measurement accumulator is properly added.
64-65: Address the TODO comment in a future PR.The code correctly uses
HwMeasurementAcc::disposable()but there's an outstanding TODO comment about propagating the value.This hardware counter is created but not returned from the function. According to the PR objectives, you're tracking measurements for update operations, but the current implementation with
disposable()means these measurements aren't being propagated outside of this function.Consider whether these measurements should be captured and propagated in a subsequent PR as indicated by the TODO comment.
127-131: Hardware counter correctly passed to update_local.The hardware measurement accumulator is correctly passed to the
update_localmethod, enabling the tracking of hardware usage during index creation operations.lib/api/src/grpc/proto/points.proto (1)
813-813: Hardware usage field properly added to the response message.The optional
HardwareUsage usagefield is correctly added to thePointsOperationResponsemessage, enabling the API to report hardware metrics to clients.This aligns perfectly with the PR objective to add reporting capabilities for hardware measurements in both REST and gRPC APIs.
lib/collection/tests/integration/snapshot_recovery_test.rs (3)
109-112: Hardware counter correctly added to update operationThe hardware measurement counter is properly initialized and passed to the update operation. This change aligns with the PR objective of adding measurements for update operations.
161-169: Consistent use of hardware measurement for search operationsThe code correctly initializes and passes the
HwMeasurementAccinstance to the search operation, maintaining consistency across the codebase for hardware measurement tracking.
173-181: Separate hardware counter used for recovery testingGood practice to use a separate hardware counter instance for the recovered collection search, which allows for potential independent measurement of original vs recovered collection performance.
lib/collection/src/collection/payload_index_schema.rs (2)
4-4: Import for hardware measurement correctly addedThe import is properly added to support the hardware measurement feature.
70-71: Explicit comment about unmeasured API usageThe code correctly uses
HwMeasurementAcc::disposable()with a clear comment indicating that this is an unmeasured API. This is appropriate for internal operations that don't need to be tracked in user-facing measurement results.lib/collection/tests/integration/distance_matrix_test.rs (2)
69-72: Hardware counter correctly integrated with update operationThe hardware measurement counter is properly initialized and passed to the update operation, consistent with the implementation in other tests.
75-87: Hardware counter correctly used for matrix searchGood to see the consistent use of hardware measurements for different operations, including the matrix search in this test.
lib/collection/tests/integration/multi_vec_test.rs (4)
110-113: Hardware counter correctly added to multi-vector update operationThe hardware measurement counter is properly initialized and passed to the update operation in this multi-vector context.
133-141: Consistent hardware measurement for vector search operationsThe code correctly initializes and passes the hardware measurement accumulator to the search operation, maintaining consistency with other search implementations.
223-236: Hardware measurement correctly applied to point retrievalThe code properly integrates hardware measurement with the point retrieval operation, ensuring that all operations have consistent measurement capabilities.
248-263: Hardware measurement applied to recommendation operationsThe recommendation functionality correctly includes hardware measurements, completing the integration across all major operation types.
src/actix/api/discovery_api.rs (2)
51-56: Consistent API parameter update for hardware reportingThis change adds the
Noneparameter toget_request_hardware_counterfunction to align with hardware measurement tracking, supporting the new monitoring capability being added system-wide.
106-111: Consistent API parameter update for hardware reportingThis change adds the
Noneparameter toget_request_hardware_counterfunction to align with hardware measurement tracking, supporting the new monitoring capability being added system-wide.lib/collection/src/tests/points_dedup.rs (1)
135-135:Details
❓ Verification inconclusive
Update to include hardware measurement in local update operations
The
update_localmethod now accepts a hardware measurement accumulator, which aligns with the PR objective of tracking I/O operations for updates.
🏁 Script executed:
#!/bin/bash # Verify the signature change for update_local across the codebase rg -t rust "fn update_local.*HwMeasurementAcc" --no-filename -A 1Length of output: 67
Review Verification: Confirm
update_localSignature Change Consistency
The change to include a hardware measurement accumulator appears in the test file (lib/collection/src/tests/points_dedup.rs), as shown by the call:.update_local(op, true, HwMeasurementAcc::new())However, our automated search for function definitions (using
rg -t rust "fn update_local.*HwMeasurementAcc") returned no results. This suggests that while the test call has been updated, it’s unclear whether all implementations ofupdate_localhave been updated accordingly.Please manually verify that:
- The signature of
update_localin all relevant modules now includes the hardware measurement accumulator parameter.- All calls to
update_localthroughout the codebase align with this updated signature.Thank you for checking these details to ensure consistency with the PR objective of tracking I/O operations.
lib/collection/tests/integration/collection_restore_test.rs (3)
47-50: Added hardware measurement tracking to update operationsAdded hardware measurement tracking via
HwMeasurementAccas required by the updatedupdate_from_client_simplesignature. This change is consistent with the PR goal of measuring hardware I/O for update operations.
92-95: Added hardware measurement tracking to update operationsAdded hardware measurement tracking via
HwMeasurementAccas required by the updatedupdate_from_client_simplesignature. This change is consistent with the PR goal of measuring hardware I/O for update operations.
173-176: Added hardware measurement tracking to update operationsAdded hardware measurement tracking via
HwMeasurementAccas required by the updatedupdate_from_client_simplesignature. This change is consistent with the PR goal of measuring hardware I/O for update operations.src/actix/api/search_api.rs (5)
59-64: Hardware reporting parameter added consistentlyAdded
Noneparameter toget_request_hardware_counterto support hardware usage tracking while maintaining backward compatibility. This change is part of the system-wide implementation of hardware measurement for operations.
130-135: Hardware reporting parameter added consistentlyAdded
Noneparameter toget_request_hardware_counterto support hardware usage tracking while maintaining backward compatibility. This change is part of the system-wide implementation of hardware measurement for operations.
196-201: Hardware reporting parameter added consistentlyAdded
Noneparameter toget_request_hardware_counterto support hardware usage tracking while maintaining backward compatibility. This change is part of the system-wide implementation of hardware measurement for operations.
251-256: Hardware reporting parameter added consistentlyAdded
Noneparameter toget_request_hardware_counterto support hardware usage tracking while maintaining backward compatibility. This change is part of the system-wide implementation of hardware measurement for operations.
307-312: Hardware reporting parameter added consistentlyAdded
Noneparameter toget_request_hardware_counterto support hardware usage tracking while maintaining backward compatibility. This change is part of the system-wide implementation of hardware measurement for operations.lib/collection/src/shards/replica_set/mod.rs (1)
948-956: Hardware measurement parameter correctly integrated in update_local callThe additional parameter
hw_measurement_accis now properly passed to theupdate_localmethod, ensuring hardware I/O metrics are collected during point deletion operations.lib/collection/src/shards/conversions.rs (2)
65-65: Function signature updated to include unmeasured_on_remote parameterThis boolean parameter has been added to control whether hardware measurements should be tracked for remote operations, supporting the PR's goal of adding hardware measurement capabilities.
83-83: Properly setting unmeasured field based on parameterThe new parameter is correctly used to set the
unmeasuredfield in theUpsertPointsInternalstruct, allowing the caller to control whether hardware metrics should be collected for this operation when executed remotely.src/actix/helpers.rs (1)
20-22: Conditional hardware reporting based on wait parameterThe addition of the
waitparameter and the logic to setreport_to_api = report_to_api && wait != Some(false)ensures hardware metrics are only reported when appropriate. This prevents reporting empty or incorrect hardware metrics in non-blocking API responses.The parameter might seem confusing at first glance, but it serves an important purpose in ensuring accurate hardware metrics reporting. As discussed in previous reviews, making this an explicit parameter helps avoid implementation errors across the codebase.
lib/collection/tests/integration/lookup_test.rs (1)
72-75: Test updated to include hardware measurement parameterThe test now correctly creates and passes a
HwMeasurementAccinstance to theupdate_from_client_simplemethod, aligning with the API changes. This ensures tests remain compatible with the enhanced hardware measurement tracking.src/actix/api/local_shard_api.rs (3)
111-111: Added parameter for hardware monitoring configuration.The addition of
Noneas the last argument toget_request_hardware_countermaintains consistency with other API calls in this file. This is part of the systematic implementation of hardware monitoring across all endpoints.
185-185: Added parameter for hardware monitoring configuration.The addition of
Noneas the last argument toget_request_hardware_countercompletes the implementation of hardware monitoring for all endpoint functions in this file.
53-53:Details
✅ Verification successful
Added parameter for hardware monitoring configuration.
The addition of
Noneas the last argument toget_request_hardware_counteraligns with the PR's objective to implement hardware IO measurements. This parameter likely controls waiting behavior for hardware measurements.
🏁 Script executed:
#!/bin/bash # Verify the signature of get_request_hardware_counter to understand the parameter addition # Find the function definition in the codebase rg -n "fn get_request_hardware_counter" --type rustLength of output: 114
Action: Approve the parameter addition for hardware monitoring
The addition of
Noneas the final argument to theget_request_hardware_countercall insrc/actix/api/local_shard_api.rsproperly aligns with the PR’s objective. The search confirms thatget_request_hardware_counteris defined insrc/actix/helpers.rsand now accepts this optional parameter to adjust waiting behavior for hardware measurements.
- Confirmed the function’s location:
src/actix/helpers.rs- Verified that the passed
Noneis consistent with the intended design for hardware IO measurementlib/collection/src/collection/clean.rs (1)
336-337:Details
✅ Verification successful
Proper implementation of hardware monitoring for internal operations.
The use of
HwMeasurementAcc::disposable()for theupdate_localoperation is appropriate since this is an internal cleaning operation where we don't need to track or report hardware usage metrics to clients.
🏁 Script executed:
#!/bin/bash # Verify how disposable HwMeasurementAcc is used in similar internal operations # Look for other instances of disposable hardware measurement accumulators rg "HwMeasurementAcc::disposable" --type rustLength of output: 3001
Hardware Measurement Metrics Usage Confirmed
The usage of
HwMeasurementAcc::disposable()in the call to.update_local(delete_operation, last_batch, HwMeasurementAcc::disposable())is consistent with similar internal operations across the codebase. As verified by its consistent usage in other modules (e.g., internal cleaning operations inlib/collection/src/collection/clean.rsand elsewhere), it's appropriate for operations where hardware measurement tracking isn’t needed.lib/storage/src/content_manager/toc/request_hw_counter.rs (1)
14-14:Details
✅ Verification successful
Added Clone derivation for better hardware counter management.
Adding
#[derive(Clone)]to theRequestHwCounterstruct is essential for propagating hardware measurements through the API chain. This allows the counter to be duplicated and passed to multiple operations without ownership conflicts, which is particularly important for accumulating hardware metrics across different components.
🏁 Script executed:
#!/bin/bash # Check for instances where RequestHwCounter is cloned to understand the usage pattern rg "request_hw_counter.*clone" --type rustLength of output: 834
Clone Derivation Confirmed for RequestHwCounter
The addition of
#[derive(Clone)]inlib/storage/src/content_manager/toc/request_hw_counter.rsis validated by its active usage insrc/tonic/api/update_common.rs, where multiple calls torequest_hw_counter.clone()are observed. This confirms that the change is essential for correctly propagating hardware measurements through the API chain. No further modifications are required.lib/collection/src/shards/forward_proxy_shard.rs (1)
186-186:Details
✅ Verification successful
Updated to use the specialized unmeasured update method.
The change from
updatetoupdate_remote_unmeasuredis appropriate for internal shard transfer operations. This specialized method likely avoids redundant hardware measurements for internal operations while maintaining the functionality of the update.
🏁 Script executed:
#!/bin/bash # Verify the implementation of update_remote_unmeasured to understand its behavior # Look for the definition of update_remote_unmeasured rg "fn update_remote_unmeasured" --type rust # Check if there are other internal operations using this method rg "\.update_remote_unmeasured" --type rustLength of output: 265
Shard Transfer Method – Confirmed Correct Usage
The specialized method
update_remote_unmeasuredis implemented inlib/collection/src/shards/remote_shard.rsand is correctly invoked withinlib/collection/src/shards/forward_proxy_shard.rsfor internal shard transfer operations. This change avoids redundant hardware measurements and preserves the intended update functionality.src/actix/api/query_api.rs (3)
119-124: LGTM: Consistent parameter addition.The
Noneparameter addition matches the updated function signature, maintaining consistency with the implementation in thequery_pointsendpoint.
199-204: LGTM: Consistent parameter addition.The
Noneparameter addition matches the updated function signature, maintaining consistency with other query endpoints in this file.
52-57:Details
✅ Verification successful
Confirm hardware reporting parameter usage is consistent.
The
Noneparameter added toget_request_hardware_counterensures the function properly handles the new hardware measurement tracking feature. This appears to be part of a system-wide update to add measurement capabilities for operations.
🏁 Script executed:
#!/bin/bash # Verify consistency of get_request_hardware_counter calls in API endpoints echo "Checking for consistent usage of get_request_hardware_counter across API files:" rg --type rust "get_request_hardware_counter\(" src/actix/api/ -A 5Length of output: 11038
Hardware Reporting Parameter Usage Confirmed
The verification shows that the usage of
get_request_hardware_counteris consistent across the API endpoints. In all endpoints (includingquery_api.rs,count_api.rs,facet_api.rs, etc.), the function is called withservice_config.hardware_reporting()paired withNone. The exception is inupdate_api.rs, whereSome(params.wait)is intentionally provided for its specific operational requirements.No changes are required here.
lib/collection/tests/integration/collection_test.rs (8)
61-64: LGTM: Hardware measurement integrates properly with test operation.The addition of the hardware counter parameter allows for tracking I/O operations during test execution. The implementation is clean and straightforward.
132-135: LGTM: Consistent hardware counter implementation.The hardware counter is properly initialized and passed to the update method, consistent with the pattern established in the first test.
236-239: LGTM: Proper hardware counter usage.The hardware counter is correctly initialized and passed to the update method.
379-386: LGTM: Good use of clone for hardware counter reuse.Properly cloning the hardware counter allows it to be reused in both the update operation and the recommend operation, allowing for cumulative measurement if needed.
444-447: LGTM: Consistent hardware counter implementation.The hardware counter is properly initialized and passed to the update method.
542-545: LGTM: Consistent hardware counter implementation.The hardware counter is properly initialized and passed to the update method.
796-803: LGTM: Good use of clone for hardware counter reuse.Properly cloning the hardware counter allows it to be reused in both operations.
822-824: LGTM: Consistent hardware counter reuse.The same hardware counter is used for the delete operation, which is appropriate for tracking cumulative measurements across operations.
tests/consensus_tests/fixtures.py (2)
50-65: LGTM: New helper for payload updates with telemetry tracking.This new function will be helpful for testing hardware measurement during payload update operations. It properly handles the HTTP request and returns the JSON response for further verification.
209-215: LGTM: Clean implementation for retrieving hardware telemetry data.This function provides a simple way to get hardware telemetry information for a specific collection, enabling tests to verify that measurements are being properly tracked and reported.
lib/api/src/grpc/conversions.rs (3)
2117-2128: LGTM: Added hardware usage field to response conversion.The
usagefield is now properly included in the conversion from internal to external response format, ensuring hardware measurements are preserved across API boundaries.
2134-2144: LGTM: Properly implemented bi-directional conversion with hardware usage.The reverse conversion (external to internal) also correctly handles the hardware usage field, maintaining consistency in both directions.
2596-2606:Details
✅ Verification successful
Verify hardware measurements are accurately converted to protobuf format.
The implementation of
From<HwMeasurementAcc> for HardwareUsagelooks good, converting all relevant measurement fields. Make sure these are collected and reported correctly in all contexts where they're used.
🏁 Script executed:
#!/bin/bash # Check where HardwareUsage is used and if any fields might be missing echo "Checking protobuf definition of HardwareUsage:" rg -A 10 "message HardwareUsage" --glob "*.proto" echo "\nChecking where HardwareUsage is accumulated and reported:" rg "HardwareUsage" --type rust lib/api/src/Length of output: 2745
Verified Conversion Logic: The conversion function from
HwMeasurementAcctoHardwareUsageinlib/api/src/grpc/conversions.rscorrectly maps all fields. The protobuf definition (found inlib/api/src/grpc/proto/points.proto) includescpu,payload_io_read,payload_io_write,vector_io_read, andvector_io_write, and all of these are being accurately converted. Additionally, the usage ofHardwareUsageacross the codebase has been confirmed, with no missing or additional fields detected.lib/collection/src/update_handler.rs (2)
59-59: Addition of hardware measurement tracking to OperationDataThe new field
hw_measurementscorrectly stores hardware measurement data in the operation data structure, which will help track resource usage during operations.
702-707: Hardware measurement integration for update operationsGood change - replacing the disposable counter with the measurement accumulator from the operation data allows for tracking hardware usage throughout the update process.
lib/collection/src/shards/local_shard/shard_ops.rs (2)
41-41: Parameter renamed to indicate active usageGood change - removing the underscore prefix indicates that the parameter is now being actively used rather than ignored.
91-91: Hardware measurement accumulator passed to operation dataThe hardware measurement accumulator is now correctly passed to the update operation, allowing resource usage to be tracked during shard operations.
lib/collection/src/shards/queue_proxy_shard.rs (2)
698-699: Hardware measurement parameter added to transfer_operations_batchThe function signature has been properly updated to include hardware measurement tracking capabilities.
715-715: Hardware measurement propagated to remote updateThe hardware measurement accumulator is correctly passed to the forward_update method, ensuring consistent tracking across local and remote operations.
lib/collection/tests/integration/grouping_test.rs (3)
83-90: Hardware measurement integration in test setupThe test has been correctly updated to include hardware measurement when calling update_from_client_simple.
523-552: Hardware counter reused across multiple test operationsGood approach - creating a single hardware counter and reusing it with clone() for multiple operations in the test setup.
581-586: Hardware measurement passed to update operation in lookup collectionThe hardware measurement accumulator is correctly passed to the update operation, ensuring consistent implementation across all test cases.
src/common/metrics.rs (1)
341-341: Appropriate change from gauge to counter for hardware metrics.The metric type has been correctly changed from
GAUGEtoCOUNTERfor all hardware metrics, as these represent cumulative values that only increase over time. This change aligns with the comment from timvisee in a previous review who pointed out that a total should be represented as a counter.Also applies to: 348-348, 355-355, 362-362, 369-369
src/tonic/api/points_api.rs (3)
46-57: Improved function signature with optional waiting configuration.The addition of the
waitparameter to control whether hardware tracking should wait for operations to complete is a good enhancement. The logic for determining thewaitingvalue is also clear and defaults correctly totruewhenwaitisNone.
72-73: Consistent extraction of wait parameter from update requests.The code correctly extracts the wait parameter from requests and passes it to the hardware counter initialization for update operations.
Also applies to: 97-98, 144-145
119-120: Consistent use of None for read operations.Read operations correctly pass
Noneas the wait parameter, which will default to waiting behavior. This is consistent across all read operations.Also applies to: 167-170, 237-240
lib/api/src/grpc/proto/points_internal_service.proto (2)
49-49: New unmeasured flag for internal upsert operations.The addition of the
unmeasuredflag allows for internal operations to opt out of hardware usage measurement. This is useful for performance optimization in cases where measurement isn't needed.Consider documenting the purpose of this flag with a comment to enhance code readability:
- optional bool unmeasured = 4; + optional bool unmeasured = 4; // When true, hardware usage will not be measured for this operation
104-104: Added hardware usage field to response messages.The addition of the
HardwareUsagefield to response messages is consistent with the PR objective to report hardware usage metrics through the API. This implementation allows clients to receive detailed information about resource consumption.Also applies to: 323-323
src/actix/api/recommend_api.rs (2)
60-65: Consistently passing None for hardware counter in recommend APIs.The addition of the
Noneparameter to allget_request_hardware_countercalls is consistent with the changes in other API endpoints. This ensures uniform behavior across the entire codebase.Also applies to: 146-151, 211-216
1-239:Details
❓ Verification inconclusive
Consider adding integration test for hardware measurement in recommendation API.
While the code changes look correct, it would be valuable to add integration tests that verify the hardware measurements are properly captured and reported for recommendation operations.
🏁 Script executed:
#!/bin/bash # Description: Check for integration tests related to hardware measurements in recommendation API # Look for tests that verify hardware measurements in recommendation operations rg -A 3 "recommend.*hardware|hardware.*recommend" --type=rust lib/collection/tests/Length of output: 85
Integration Test for Hardware Measurement:
The recommendation API endpoints correctly incorporate hardware measurement viaget_request_hardware_counter, but our search did not reveal any integration tests verifying that these measurements are captured and reported as expected. Please verify manually whether such tests exist in other parts of the repository; if not, consider adding integration tests to ensure that hardware measurements in the recommendation operations are fully validated.lib/storage/src/content_manager/toc/point_ops.rs (9)
448-448: Parameter addition for hardware measurement tracking.Adding
hw_measurement_accparameter to the_update_shard_keysmethod allows for tracking hardware resource usage during shard key updates. This is consistent with the overall PR objective of measuring hardware IO for update operations.
459-461: Properly propagating hardware measurement to collection operations.The hardware measurement accumulator is correctly passed to the
update_from_clientmethod, with cloning to ensure each update operation gets its own copy of the accumulator.
477-486: Function signature updated with hardware measurement parameter.The
#[allow(clippy::too_many_arguments)]is necessary as the function now exceeds the default limit for parameter count. The addition ofhw_measurement_accparameter to the public update method ensures consistency with the internal implementation.
548-550: Hardware measurement propagation for non-empty shard selector.The hardware measurement accumulator is correctly propagated to the client update operation, ensuring resource usage is tracked across all code paths.
562-564: Hardware measurement propagation for all shards.Hardware measurement is correctly passed to the client update when no specific shard keys are provided, ensuring comprehensive tracking of IO operations.
572-574: Hardware measurement for multi-shard updates.The hardware measurement accumulator is properly passed to the
_update_shard_keysmethod, ensuring consistent measurement across all update code paths.
585-587: Hardware measurement for specific shard key updates.Hardware usage is properly tracked for shard key-specific updates, maintaining consistent measurement across all update pathways.
597-599: Hardware measurement for multiple shard key updates.The hardware measurement accumulator is correctly propagated for multi-shard key operations, ensuring comprehensive tracking across all update paths.
609-611: Hardware measurement for peer updates.Hardware usage is properly tracked for shard-id specific update operations coming from peers, completing the coverage of all update code paths.
lib/storage/src/content_manager/data_transfer.rs (2)
134-137: Proper handling of internal operations.Using
HwMeasurementAcc::disposable()is appropriate for internal operations that shouldn't affect user-facing hardware metrics. This ensures data replication operations don't skew hardware usage reporting.
237-240: Consistent approach to internal operations.Similar to the replication function, using a disposable hardware measurement accumulator for index transfers prevents these internal operations from affecting user-facing metrics. This maintains the distinction between user-initiated operations and system activities.
tests/consensus_tests/test_hw_measurement.py (1)
5-5:Details
✅ Verification successful
Consider explicit imports instead of star imports.
Using wildcard imports (
from .utils import *) makes it difficult to determine which symbols are imported and can lead to namespace pollution.-from .utils import * +from .utils import ( + start_cluster, wait_collection_exists_and_active_on_all_peers, + check_collection_points_count, wait_collection_points_count +)
🏁 Script executed:
#!/bin/bash # Check which functions from utils are actually used in this file grep -o "start_cluster\|wait_collection_exists_and_active_on_all_peers\|check_collection_points_count\|wait_collection_points_count" tests/consensus_tests/test_hw_measurement.pyLength of output: 360
Explicit Imports for Utils Module – Replace Wildcard Import
The verification confirms that all four functions—
start_cluster,wait_collection_exists_and_active_on_all_peers,check_collection_points_count, andwait_collection_points_count—are actually in use in this test file. To prevent potential namespace pollution and improve code clarity, please replace the star import with explicit imports. The diff snippet below illustrates the necessary change:
- File: tests/consensus_tests/test_hw_measurement.py (Line 5)
- from .utils import * + from .utils import ( + start_cluster, + wait_collection_exists_and_active_on_all_peers, + check_collection_points_count, + wait_collection_points_count + )🧰 Tools
🪛 Ruff (0.8.2)
5-5:
from .utils import *used; unable to detect undefined names(F403)
lib/collection/src/collection/point_ops.rs (10)
31-31: Parameter addition for hardware measurement tracking.Adding the
hw_measurement_accparameter to theupdate_all_localmethod is consistent with the PR's objective of measuring hardware IO for update operations. This enables tracking hardware resource usage during local updates.
53-57: Properly propagating hardware measurements to local updates.The hardware measurement accumulator is correctly cloned and passed to the
update_localmethod for each shard, ensuring accurate tracking of resource usage.
93-93: Parameter addition for peer update operations.Adding the
hw_measurement_accparameter to theupdate_from_peermethod extends hardware usage tracking to peer-to-peer updates, aligning with the PR objectives.
106-106: Hardware measurement for weak ordering updates.The hardware measurement accumulator is correctly propagated to the
update_localmethod in the weak ordering case, ensuring consistent measurement across all ordering types.
116-118: Hardware measurement for medium/strong ordering updates.The hardware measurement accumulator is correctly passed to the
update_with_consistencymethod for medium and strong ordering, completing coverage for all ordering types.
145-145: Parameter addition for client update operations.Adding the
hw_measurement_accparameter to theupdate_from_clientmethod extends hardware usage tracking to client-initiated updates, ensuring comprehensive coverage of update operations.
159-159: Local clone for thread-safe hardware measurement in async context.Creating a local clone of the hardware measurement accumulator ensures that the async block has its own copy, avoiding potential thread-safety issues in the FuturesUnordered execution.
169-175: Hardware measurement for update_all operations.The hardware measurement accumulator is correctly passed to the
update_with_consistencymethod for "update_all" operations, ensuring resource tracking for all parts of the update process.
181-187: Hardware measurement for update_only_existing operations.The hardware measurement accumulator is correctly propagated to the
update_with_consistencymethod for "update_only_existing" operations, completing the coverage for all update operation types.
251-254: Parameter addition for simplified client update method.Adding the
hw_measurement_accparameter to theupdate_from_client_simplemethod and properly forwarding it toupdate_from_clientensures that the simplified update interface also tracks hardware usage.lib/collection/src/shards/remote_shard.rs (8)
226-227: PassHwMeasurementAccparameter consistently.Adding the
hw_measurement_accparameter toforward_updateis consistent with similar refactoring across the codebase and ensures hardware usage can be tracked here as well.
236-238: Initialize measurement logic carefully.By passing
hw_measurement_accandfalsetoexecute_update_operation, you ensure that measurement is enabled for remote operations by default. Confirm that this matches the intended logic (sincefalseindicates measuring on the remote).
253-255: Extend usage toggles for hardware measurement.The
hw_measurement_accandunmeasured_on_remoteparameters expand the function signature. Validate that these new flags are always used as intended to avoid inconsistent measurement states.
272-273: Propagate 'unmeasured_on_remote' correctly.Forwarding this boolean ensures the remote logic can correctly decide to skip or enable hardware measurement. This addition appears consistent with the rest of the update code.
507-515: Accumulate remote usage metrics.Accumulating hardware usage from
point_operation_response.usageintohw_measurement_accis correct. Additionally, ensure that the response usage only reflects newly introduced operations (i.e., no double counting across multiple calls).
674-699: Introduce specialized unmeasured update.The
update_remote_unmeasuredmethod clearly documents that it should only be used for internal operations. Confirm the call sites are truly internal to prevent unintended skipping of hardware measurement.
712-726: Add measurement parameter toupdateroutine.Taking
hw_measurement_accensures consistent metric tracking at the top-level update function. The introduction offalseforunmeasured_on_remotehere aligns with the default logic for end-user updates.
1120-1127: Record usage infacetmethod.Collecting CPU and I/O usage after
facetcalls is consistent with other endpoints. Make sure that the concurrency model (if any parallel calls exist) does not inadvertently mix usage data from multiple calls.src/actix/api/update_api.rs (13)
10-10: AddHwMeasurementAccimport.Introducing this import aligns with the hardware measurement tracking refactor.
54-55: Propagatewaitinto measurement logic.Passing
Some(params.wait)intoget_request_hardware_counterensures that synchronous or asynchronous waiting is included in the usage reporting logic.
66-66: Initiate measurement on upsert.Calling
request_hw_counter.get_counter()to track hardware usage for upsert operations is consistent with the new design.
94-95: Consistent usage tracking in delete endpoint.Including
Some(params.wait)and reusingrequest_hw_counter.get_counter()ensures the delete operation also reports hardware usage.Also applies to: 106-107
135-136: Add hardware counter for vector updates.These changes ensure
update_vectorsconsistently reports usage, helping to gather performance metrics on vector operations.Also applies to: 147-148
176-177: Include measurement for vector deletion.Adopting the same pattern for the
delete_vectorsendpoint keeps hardware usage reporting consistent across vector CRUD operations.Also applies to: 187-188
215-216: Set payload usage reporting.By passing the request hardware counter, the
set_payloadfunction now collects CPU and I/O usage metrics for payload operations.Also applies to: 226-227
253-254: Enable usage measurement for overwrite payload.Just like
set_payload, the overwrite operation can now measure hardware usage, thanks to the newly added counter parameter.Also applies to: 264-265
291-292: Delete payload usage tracking.Ensuring measurement for both synchronous and asynchronous scenarios in
delete_payloadhelps capture resource usage across all payload removal paths.Also applies to: 302-303
329-330: Clear payload updates.Integrating the hardware counter ensures clearing payload data is also measured, consistent with the other routes.
Also applies to: 340-341
380-381: Batch updates include hardware usage.Adding the counter to the batch endpoint ensures aggregated operations also feed into the global measurement logic.
Also applies to: 394-395
418-419: Use disposable counter for create field index.The disposable measurement indicates field index creation is not fully tracked as a user-heavy operation, which may be acceptable. Consider whether the overhead is minimal enough to measure as well.
441-442: Similarly skip measurement for delete field index.Following the same pattern as create, this also uses a disposable counter. Confirm it remains an acceptable approach from a performance perspective.
src/tonic/api/update_common.rs (14)
5-5: Introduce hardware usage types and request counter.Adding these imports (
HardwareUsage,HwMeasurementAcc,RequestHwCounter) lays the foundation for measuring hardware usage in the Tonic layer.Also applies to: 22-22, 28-28
45-46: Track usage on upsert.Injecting
request_hw_counterand capturing usage withinpoints_operation_response_internalensures that upsert calls accumulate hardware metrics.Also applies to: 75-76, 79-80
90-91: Delete operation usage reporting.These additions mirror the upsert approach, ensuring deletion is properly captured in usage metrics.
Also applies to: 118-119, 122-123
133-134: Integrate measurement for updating vectors.Consistent with the other changes, capturing usage in the Tonic layer for
update_vectorsensures coverage for all vector update paths.Also applies to: 175-176, 179-180
189-190: Delete vectors usage additions.Similar to the update logic, ensuring vector deletions feed into usage data helps maintain consistent hardware measurement coverage.
Also applies to: 225-226, 229-230
239-240: Record usage for set payload.Accepting the request counter in
set_payloadensures that memory, I/O, and CPU usage from payload operations is captured at the Tonic layer.Also applies to: 273-274, 278-279
287-288: Enable reporting on overwrite payload.Aligning with set payload, the overwrite operation now also updates hardware usage reporting.
Also applies to: 321-322, 326-327
335-336: Capture usage on delete payload.These lines complete the coverage of payload-related operations for Tonic’s usage tracking.
Also applies to: 367-368, 372-373
381-382: Support clear payload measurement.Ensuring the clear payload call is tracked closes the loop on all main payload operations.
Also applies to: 409-410, 412-413
423-424: Expanded batch updates coverage.Including
request_hw_counterin theupdate_batchoperation ensures that each sub-operation within the batch is accurately reflected in usage metrics.Also applies to: 457-458, 474-475, 487-488, 499-500, 525-526, 548-549, 567-568, 589-590, 612-613, 628-629
669-670: Create field index with usage.Although “unmeasured” in some contexts, accepting the request counter can help if the operation is elevated to full measurement later.
Also applies to: 696-697, 700-701
761-762: Delete field index also unmeasured.Code follows the same approach as index creation. If usage data becomes more relevant, it can be measured consistently in the future.
Also applies to: 764-765
840-841: Sync operation remains unmeasured.While it is designated “API unmeasured,” consider verifying that skipping usage measurement here is intentional, as it can involve potentially large data transfers.
Also applies to: 844-844
848-852: Include optional usage in the response structure.Adding
usage: Option<HardwareUsage>toPointsOperationResponseInternalallows the final response to include hardware metrics. This is consistent with the broader changes.Also applies to: 856-857
lib/collection/src/shards/replica_set/update.rs (6)
30-35: LGTM: Parameter added for hardware measurement trackingThe addition of the
hw_measurementparameter to theupdate_localmethod is consistent with the approach taken throughout the codebase to enable hardware usage tracking for update operations.
100-106: LGTM: Coherently passing hardware metrics to update operationThe hardware measurement accumulator is correctly propagated to the
updatemethod call in theupdate_with_consistencymethod to track hardware usage.
177-183: LGTM: Hardware metric tracking added to update signatureThe addition of the
hw_measurement_accparameter to theupdatemethod aligns with the broader changes to track hardware metrics during operations.
238-245: LGTM: Hardware measurement accumulator added to update_impl signatureThe hardware measurement accumulator parameter is consistently added to the
update_implmethod signature.
635-642: LGTM: Hardware metrics correctly added to forward_update methodThe
hw_measurement_accparameter is appropriately added to theforward_updatemethod signature, which ensures hardware usage is tracked even when updates are forwarded to other nodes.
305-306:Details
✅ Verification successful
Ensure hardware accumulator is properly tracked across execution contexts
The local update operation correctly clones and passes the hardware accumulator to the async block. Make sure the
HwMeasurementAcctype can safely track hardware metrics across asynchronous execution contexts.Also applies to: 322-323
🏁 Script executed:
#!/bin/bash # Checking if HwMeasurementAcc is designed to work safely with async operations # Look for synchronization primitives in the implementation rg -A 10 -B 10 "pub struct HwMeasurementAcc" --glob "*.rs" # Check if there are any explicit comments or code related to async safety rg "async|tokio::sync|parking_lot|std::sync" --glob "*hardware_accumulator*.rs"Length of output: 2224
Confirmed: Hardware accumulator is safely managed in async contexts
After reviewing the implementation in
lib/common/common/src/counter/hardware_accumulator.rs, theHwMeasurementAcctype is implemented using thread-safe primitives (ArcandAtomicUsize). This design ensures that it can safely track hardware metrics across asynchronous execution contexts. The cloning of the accumulator passed into the async block on lines 305-306 (and similarly on 322-323) is appropriate and maintains thread safety across execution contexts.src/common/update.rs (3)
14-14: LGTM: Import for hardware measurement trackingThe import of
HwMeasurementAccis correctly added to support the new hardware measurement functionality.
239-240: LGTM: Consistent addition of hardware measurement parameterThe hardware measurement accumulator parameter is consistently added to all operation functions. This ensures comprehensive tracking of hardware usage across all update operations.
Also applies to: 281-282, 316-317, 349-350, 414-415, 452-453, 491-492, 527-528, 562-563
267-268: LGTM: Hardware measurement propagated to update functionThe hardware measurement accumulator is correctly passed through to the underlying
updatefunction in all operation handlers.Also applies to: 302-303, 337-338, 398-399, 440-441, 479-480, 515-516, 548-549, 813-814
src/tonic/api/points_internal_api.rs (5)
151-152: LGTM: Hardware usage metrics added to facet responseThe
usagefield is now correctly populated with hardware usage data in theFacetResponseInternalstruct.
189-190: Good implementation of conditional hardware measurementThe code now supports an
unmeasuredflag to optionally disable hardware measurement tracking. This is useful for operations where tracking is not needed or for testing purposes.Also applies to: 192-200
227-231: LGTM: Consistent extraction pattern for hardware metricsThe pattern of extracting the internal request, then creating a hardware metrics counter specific to that collection is consistent and well-implemented.
287-291: Consistent hardware metrics implementation across all methodsThe hardware metrics collection is consistently implemented across all update methods, which is good for maintaining uniform behavior and tracking.
Also applies to: 315-319, 343-347, 371-375, 399-403
163-172:Details
✅ Verification successful
Ensure collection-specific hardware metrics are properly drained
The method
get_request_collection_hw_usage_counter_for_internalcreates a newRequestHwCounterwith a metrics drain. Verify that the metrics are properly drained to the collection's counter when operations complete.
🏁 Script executed:
#!/bin/bash # Check the implementation of HwMeasurementAcc's drain functionality # Find the implementation of new_with_metrics_drain rg -A 10 "new_with_metrics_drain" --glob "*hardware_accumulator*.rs" # Check how the metrics are drained rg "drain" --glob "*hardware_accumulator*.rs"Length of output: 3060
Review Update: Verified Proper Draining of Collection-Specific Metrics
After verifying the implementation in
src/tonic/api/points_internal_api.rsand reviewing the corresponding functionality inlib/common/common/src/counter/hardware_accumulator.rs, it is confirmed that theget_request_collection_hw_usage_counter_for_internalmethod correctly utilizesHwMeasurementAcc::new_with_metrics_drainby passing the collection-specific metrics drain fromself.toc.get_collection_hw_metrics(collection_name). The underlying implementation properly assigns the providedmetrics_drain(without creating a new counter) in accordance with the intended design, ensuring that hardware metrics are correctly drained after operations complete.No further modifications are required.
fe513cd to
2d0886a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/collection/src/shards/replica_set/update.rs (2)
49-52: Fix spelling error in comment.There's a typo in the comment - "Dont't" should be "Don't".
- // Dont't measure hw when resharding + // Don't measure hw when resharding🧰 Tools
🪛 GitHub Check: Check for spelling errors
[failure] 49-49:
Dont't ==> Don't🪛 GitHub Actions: Codespell
[error] 49-49: codespell: Dont't ==> Don't
121-125: Fix spelling error in comment.Same typo in this comment - "Dont't" should be "Don't".
- // Dont't measure hw when resharding + // Don't measure hw when resharding🧰 Tools
🪛 GitHub Check: Check for spelling errors
[failure] 121-121:
Dont't ==> Don'ttests/consensus_tests/assertions.py (1)
26-28: Fix variable shadowing in loop parameters.The loop control variables
leftandrightare shadowing the function parameters with the same names, which could lead to confusion in maintenance.-def assert_hw_measurements_equal_many(left: list[dict[str, int]], right: list[dict[str, int]]): - for left,right in zip(left, right): - assert_hw_measurements_equal(left, right) +def assert_hw_measurements_equal_many(left: list[dict[str, int]], right: list[dict[str, int]]): + for left_dict, right_dict in zip(left, right): + assert_hw_measurements_equal(left_dict, right_dict)🧰 Tools
🪛 Ruff (0.8.2)
27-27: Loop control variable
leftoverrides iterable it iterates(B020)
27-27: Loop control variable
rightoverrides iterable it iterates(B020)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/collection/src/shards/replica_set/mod.rs(2 hunks)lib/collection/src/shards/replica_set/update.rs(10 hunks)src/common/metrics.rs(1 hunks)tests/consensus_tests/assertions.py(2 hunks)tests/consensus_tests/test_resharding.py(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/common/metrics.rs
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/assertions.py
27-27: Loop control variable left overrides iterable it iterates
(B020)
27-27: Loop control variable right overrides iterable it iterates
(B020)
🪛 GitHub Check: Check for spelling errors
lib/collection/src/shards/replica_set/update.rs
[failure] 49-49:
Dont't ==> Don't
[failure] 121-121:
Dont't ==> Don't
🪛 GitHub Actions: Codespell
lib/collection/src/shards/replica_set/update.rs
[error] 49-49: codespell: Dont't ==> Don't
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test
🔇 Additional comments (9)
lib/collection/src/shards/replica_set/mod.rs (2)
1292-1305: Well-designed utility method for checking resharding state.This method provides a clean abstraction to check if a replica is in any resharding state, making the code more readable and maintainable. The implementation correctly handles all enum variants and follows the pattern of existing methods.
948-956: Hardware measurement parameter properly passed to update_local method.The inclusion of the hardware measurement accumulator in the call to
update_localis consistent with the broader changes across the codebase to track hardware usage during operations.lib/collection/src/shards/replica_set/update.rs (5)
34-34: Type consistency in parameter modification.The parameter change from
hw_measurementtomut hw_measurementis appropriate, as the method needs to potentially modify this value for resharding states.
49-52: Hardware measurement optimization for resharding states.Good optimization to use a disposable hardware measurement accumulator when in resharding state. This prevents unnecessary tracking of metrics during potentially resource-intensive resharding operations.
🧰 Tools
🪛 GitHub Check: Check for spelling errors
[failure] 49-49:
Dont't ==> Don't🪛 GitHub Actions: Codespell
[error] 49-49: codespell: Dont't ==> Don't
121-125: Consistent implementation of hardware measurement handling for resharding.This implementation correctly handles the hardware measurement for the resharding case, consistent with the approach in
update_local. The check for resharding state before deciding on measurement tracking follows good design principles.🧰 Tools
🪛 GitHub Check: Check for spelling errors
[failure] 121-121:
Dont't ==> Don't
652-653: Consistent parameter propagation to forward_update.The hardware measurement accumulator is properly passed through to the forwarded update, ensuring that metrics are collected consistently across remote operations.
665-671: Complete parameter set for forwarded update.All required parameters, including the hardware measurement accumulator, are correctly passed to the remote leader, maintaining the measurement tracking chain.
tests/consensus_tests/test_resharding.py (1)
241-249:Details
✅ Verification successful
Good addition of hardware measurement validation.
The new code effectively validates that resharding transfers don't produce unexpected hardware measurements by comparing telemetry data before and after the transfer process. This aligns well with the PR objectives of ensuring accurate hardware IO measurement reporting.
🏁 Script executed:
#!/bin/bash # Verify the existence of get_telemetry_hw_info function in fixtures.py # since it's imported but was newly added fd -e py fixtures.py -x cat {} | grep -A 5 "get_telemetry_hw_info"Length of output: 319
Hardware Measurement Validation Confirmed
The updated test in
tests/consensus_tests/test_resharding.py(lines 241-249) correctly validates that no unexpected hardware measurement changes occur during the resharding transfer. The functionget_telemetry_hw_infowas successfully verified infixtures.py, and its behavior is as expected. Great work ensuring that telemetry data remains consistent in line with the PR objectives.tests/consensus_tests/assertions.py (1)
16-24: Good implementation of hardware measurement comparisons.The function effectively compares two dictionaries of hardware measurements, with special handling for keys present in only one dictionary and values greater than zero. The symmetric approach ensures both dictionaries are properly validated against each other.
2d0886a to
c357963
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
lib/collection/src/shards/replica_set/update.rs (4)
49-52: Consider refactoring repeated 'resharding check' blocks.
Both here and in lines 121–125, the code sets the measurement to disposable if resharding is detected. You could factor this logic into a small helper to keep the code DRY.
121-125: Duplicate logic for skipping hardware measurement.
As noted for lines 49–52, consider extracting a helper method to avoid duplicating the resharding check logic.
209-215: Excessive cloning of the accumulator could be costly.
IfHwMeasurementAcccloning is expensive, consider passing references or reusing an existing clone.
666-671: Appropriate propagation of hardware measurement to remote forward update.
The comment about clock tags is clear; consider clarifying in the doc comment why the forwarded clock tag must beNone.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/collection/src/shards/replica_set/update.rs(10 hunks)src/common/metrics.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/common/metrics.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-consensus-compose
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test
🔇 Additional comments (7)
lib/collection/src/shards/replica_set/update.rs (7)
34-34: Parameters introduced for local hardware measurement look good.
The addition ofmut hw_measurement: HwMeasurementAccis consistent with the update logic.
110-110: Good to see the hardware accumulator carried forward.
The mutable accumulator parameter is added consistently for consistency-based updates.
137-138: Clean integration of hardware measurement into update call.
No issues found with passing the accumulator here.
141-141: Forwarding the hardware measurement is consistent.
Forwarding with the new parameter ensures hardware metrics aren't lost when designating a leader.
193-193: Additional parameter forupdatefunction.
This change seamlessly aligns with other methods addinghw_measurement_acc.
255-255: Extended signature retains clarity.
Including the accumulator parameter follows the established pattern in this file.
653-653: Param added for hardware tracking inforward_update.
This ensures uniform coverage of hardware metrics in all update paths.
timvisee
left a comment
There was a problem hiding this comment.
Happy to approve after handling the following review comments:
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/consensus_tests/test_hw_measurement.py (1)
92-102: 🛠️ Refactor suggestionImprove error handling in the
assert_with_upper_bound_errorfunction.The function uses
assert Falsewith a message, which can be problematic when Python is run with optimizations enabled (python -O) as it removes assertions.def assert_with_upper_bound_error(inp: int, min_value: int, upper_bound_error_percent: float = 0.05): """Asserts `inp` being equal to `min_value` with a max upperbound error given in percent.""" if inp < min_value: - assert False, f"Assertion {inp} >= {min_value} failed" + raise AssertionError(f"Assertion {inp} >= {min_value} failed") max_error = ceil(float(min_value) + float(min_value) * upper_bound_error_percent) upper_bound = inp + max_error if inp > upper_bound: - assert False, f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed." + raise AssertionError(f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed.")🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
101-101: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
🧹 Nitpick comments (2)
tests/consensus_tests/test_hw_measurement.py (2)
5-5: Consider replacing wildcard import with explicit imports.Wildcard imports (
from .utils import *) make it hard to determine which functions are being used and where they're defined. This affects code readability and maintainability.-from .utils import * +from .utils import ( + start_cluster, + wait_collection_exists_and_active_on_all_peers, + check_collection_points_count, + wait_collection_points_count +)🧰 Tools
🪛 Ruff (0.8.2)
5-5:
from .utils import *used; unable to detect undefined names(F403)
59-60: Reuse the expected_delta variable for consistency.The assertion uses a hardcoded value
(19 * 5)which should match the expected_delta defined earlier. Reusing the variable would improve maintainability.- assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= (19 * 5) + expected_delta = 20 * 5 # 20 vectors times avg. 5 bytes payload + assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= expected_delta
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
lib/collection/src/shards/replica_set/update.rs(10 hunks)lib/common/common/src/counter/hardware_data.rs(1 hunks)tests/consensus_tests/test_hw_measurement.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/collection/src/shards/replica_set/update.rs
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/test_hw_measurement.py
5-5: from .utils import * used; unable to detect undefined names
(F403)
18-18: start_cluster may be undefined, or defined from star imports
(F405)
20-20: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
68-68: start_cluster may be undefined, or defined from star imports
(F405)
70-70: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
72-72: check_collection_points_count may be undefined, or defined from star imports
(F405)
80-80: wait_collection_points_count may be undefined, or defined from star imports
(F405)
95-95: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
101-101: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (7)
lib/common/common/src/counter/hardware_data.rs (3)
1-9: Well-structured hardware metrics data container.The
HardwareDatastruct is clearly defined with appropriate fields for tracking various hardware metrics. The documentation is concise and explains the purpose of this structure.
11-13: Good wrapper type design for CPU measurements.The wrapper type with clear documentation explains its purpose of ensuring proper CPU multiplier application throughout the codebase.
15-25: Clean implementation with proper optimizations.The implementation correctly applies the CPU multiplier in the constructor and provides an inline getter method. Using
selfby value is appropriate since the struct implementsCopy.tests/consensus_tests/test_hw_measurement.py (4)
16-66: Thetest_measuring_hw_for_updatesfunction implements a comprehensive test for hardware I/O measurements.The test creates a collection, inserts points, and verifies that hardware telemetry data reflects the expected I/O operations. The implementation properly checks both the initial insertion and subsequent upsert operations.
Several TODOs indicate plans to add tests for vector operations in future PRs, which aligns with the PR objectives mentioning that vector storage measurements are planned for a subsequent PR.
🧰 Tools
🪛 Ruff (0.8.2)
18-18:
start_clustermay be undefined, or defined from star imports(F405)
20-20:
wait_collection_exists_and_active_on_all_peersmay be undefined, or defined from star imports(F405)
31-31: Make the number of points proportional to the shard count.This aligns with a previous review comment suggesting the use of
N_SHARDS * 20to ensure proper distribution across shards.
38-40: Consider asserting a range rather than using an absolute difference.The current assertion checks that the difference is at least a certain value, but it doesn't put an upper bound on the difference. This could allow for unexpectedly large increases to pass the test.
- expected_delta = (19 * 5) # ~20 vectors times avg. 5 bytes payload - assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= expected_delta + expected_delta = 20 * 5 # 20 vectors times avg. 5 bytes payload + assert_with_upper_bound_error( + abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]), + expected_delta + )
67-91: Thetest_measuring_hw_for_updates_without_waitingfunction provides good coverage for asynchronous updates.This test validates that hardware metrics are collected correctly even when using
wait=falsein the upsert operation. The test properly waits for points to be inserted before checking the telemetry.🧰 Tools
🪛 Ruff (0.8.2)
68-68:
start_clustermay be undefined, or defined from star imports(F405)
70-70:
wait_collection_exists_and_active_on_all_peersmay be undefined, or defined from star imports(F405)
72-72:
check_collection_points_countmay be undefined, or defined from star imports(F405)
80-80:
wait_collection_points_countmay be undefined, or defined from star imports(F405)
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/consensus_tests/test_hw_measurement.py (1)
92-100:⚠️ Potential issueFix the calculation of upper bound in
assert_with_upper_bound_error.The current implementation calculates the upper bound incorrectly. It adds the percentage-based error to the input value instead of properly calculating the maximum error and adding it to the minimum value.
- upper_bound = ceil(float(min_value) + float(min_value) * upper_bound_error_percent) + max_error = ceil(float(min_value) * upper_bound_error_percent) + upper_bound = min_value + max_error🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
100-100: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
🧹 Nitpick comments (6)
tests/consensus_tests/test_hw_measurement.py (6)
4-5: Consider replacing star imports with explicit imports.Using wildcard imports (
from .utils import *) makes it harder to track exactly what's being imported and used, which can lead to namespace pollution and unclear dependencies. Consider importing only the specific functions you need from the utils module.-from .utils import * +from .utils import ( + start_cluster, + wait_collection_exists_and_active_on_all_peers, + check_collection_points_count, + wait_collection_points_count +)🧰 Tools
🪛 Ruff (0.8.2)
5-5:
from .utils import *used; unable to detect undefined names(F403)
39-40: Consider documenting payload size assumptions.The test assumes an average payload size of 5 bytes per vector. If this is based on the implementation of
upsert_random_points, consider adding a comment explaining this assumption or parameterizing it for clarity and maintainability.
60-60: Consider extracting the expected payload size to a constant.The value "19 * 5" is repeated from line 39. Consider extracting this to a named constant at the top of the file to improve maintainability and make it easier to update if the payload size changes.
+# Average payload size in bytes per point +AVG_PAYLOAD_SIZE = 5 +# Expected minimum delta for payload writes per ~20 points +EXPECTED_PAYLOAD_DELTA = 19 * AVG_PAYLOAD_SIZE ... - expected_delta = (19 * 5) # ~20 vectors times avg. 5 bytes payload + expected_delta = EXPECTED_PAYLOAD_DELTA # ~20 vectors times avg. payload size ... - assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= (19 * 5) + assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= EXPECTED_PAYLOAD_DELTA
95-95: Replaceassert Falsewith raised exceptions.Using
assert Falsecan be problematic because assertions can be disabled with the-Oflag when running Python. Replace these with explicit exception raising for more reliable error handling.- assert False, f"Assertion {inp} >= {min_value} failed" + raise AssertionError(f"Assertion {inp} >= {min_value} failed") ... - assert False, f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed." + raise AssertionError(f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed.")Also applies to: 100-100
🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
88-88: Consider usingassert_with_upper_bound_errorhere for consistency.Line 88 makes a simple greater-than-or-equal assertion, but you could use the
assert_with_upper_bound_errorhelper function for consistency with other parts of the codebase and to provide better error messages.- assert peer_hw["payload_io_write"] >= upsert_vectors * 5 # 50 vectors on this node with payload of ~5 bytes + assert_with_upper_bound_error( + peer_hw["payload_io_write"], + upsert_vectors * 5, # 50 vectors on this node with payload of ~5 bytes + upper_bound_error_percent=0.05 + )
28-28: Consider adding timeframes for implementing the TODOs.There are multiple TODO comments about implementing vector storage measurements, which matches the PR objective that mentioned vector storage measurements are planned for a future PR. Consider adding references to future work or tickets to make these TODOs more trackable.
Also applies to: 41-41, 64-64, 90-90
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/consensus_tests/test_hw_measurement.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/test_hw_measurement.py
5-5: from .utils import * used; unable to detect undefined names
(F403)
18-18: start_cluster may be undefined, or defined from star imports
(F405)
20-20: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
68-68: start_cluster may be undefined, or defined from star imports
(F405)
70-70: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
72-72: check_collection_points_count may be undefined, or defined from star imports
(F405)
80-80: wait_collection_points_count may be undefined, or defined from star imports
(F405)
95-95: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
100-100: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (1)
tests/consensus_tests/test_hw_measurement.py (1)
31-31: Ensure consistent point distribution across shards.This line seems to implement the suggestion from a previous review. The calculation of N_SHARDS * 20 ensures that points are distributed relatively evenly across the shards, creating a more consistent and predictable test environment.
Co-authored-by: Tim Visée <tim+github@visee.me>
e7ac5f1 to
0e6aa45
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tests/consensus_tests/test_hw_measurement.py (1)
92-100:⚠️ Potential issueFix the calculation of upper bound in
assert_with_upper_bound_error.The current implementation calculates the upper bound as
min_value + min_value * error_percent, which means the allowed error increases with the minimum value. The error percentage should be applied to the minimum value to get a fixed error margin.- upper_bound = ceil(float(min_value) + float(min_value) * upper_bound_error_percent) + max_error = ceil(float(min_value) * upper_bound_error_percent) + upper_bound = min_value + max_error🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
100-100: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
🧹 Nitpick comments (12)
lib/api/src/grpc/qdrant.rs (3)
5798-5799: Add documentation for theusagefield.The new field lacks documentation describing its purpose and usage. Adding comments similar to the existing ones for the
timefield would improve API usability and maintainability./// Time spent to process #[prost(double, tag = "2")] pub time: f64, + /// Hardware resources consumed during processing #[prost(message, optional, tag = "3")] pub usage: ::core::option::Option<HardwareUsage>,
9303-9304: Add documentation for theusagefield.The new field lacks documentation describing its purpose and usage. Adding comments similar to the existing ones for the
timefield would improve API usability and maintainability./// Time spent to process #[prost(double, tag = "2")] pub time: f64, + /// Hardware resources consumed during processing #[prost(message, optional, tag = "3")] pub usage: ::core::option::Option<HardwareUsage>,
9793-9794: Add documentation for theusagefield.The new field lacks documentation describing its purpose and usage. Adding comments similar to the existing ones for the
timefield would improve API usability and maintainability./// Time spent to process #[prost(double, tag = "2")] pub time: f64, + /// Hardware resources consumed during processing #[prost(message, optional, tag = "3")] pub usage: ::core::option::Option<HardwareUsage>,tests/consensus_tests/fixtures.py (2)
50-64: Add a docstring for clarityThis new function
update_points_payloadlacks a docstring explaining its purpose, parameters, and return value. This is inconsistent with other functions in the file.Consider adding a docstring like:
def update_points_payload( peer_url, points, collection_name="test_collection", wait="true", ): """ Update payload of specified points in a collection. Args: peer_url: URL of the peer to send request to points: List of point IDs to update collection_name: Name of the collection (default: "test_collection") wait: Whether to wait for the operation to complete (default: "true") Returns: JSON response from the server """
209-214: Add docstring for telemetry functionThis new function
get_telemetry_hw_infolacks a docstring explaining its purpose, parameters, and return value.Consider adding a docstring like:
def get_telemetry_hw_info(peer_url, collection): """ Retrieve hardware telemetry information for a specific collection. Args: peer_url: URL of the peer to send request to collection: Name of the collection to get telemetry for Returns: JSON response containing hardware telemetry data for the specified collection """lib/collection/src/shards/queue_proxy_shard.rs (1)
505-506: Consider reusing disposable hardware measurementA new disposable hardware measurement is created for each retry attempt, which is inefficient. Consider creating it once before the loop and reusing it.
- for remaining_attempts in (0..BATCH_RETRIES).rev() { - let disposed_hw = HwMeasurementAcc::disposable(); // Internal operation - match transfer_operations_batch(&batch, &self.remote_shard, disposed_hw).await { + let disposed_hw = HwMeasurementAcc::disposable(); // Internal operation + for remaining_attempts in (0..BATCH_RETRIES).rev() { + match transfer_operations_batch(&batch, &self.remote_shard, disposed_hw.clone()).await {src/tonic/api/points_api.rs (1)
46-57: Add descriptive type documentation to thewaitparameter.The new
waitparameter is added to control hardware reporting behavior, but its purpose isn't immediately clear from the code. Consider adding a documentation comment that explains what this parameter does and its default behavior.- fn get_request_collection_hw_usage_counter( - &self, - collection_name: String, - wait: Option<bool>, - ) -> RequestHwCounter { + /// Returns a hardware usage counter for the given collection + /// + /// # Parameters + /// * `collection_name` - Name of the collection to get metrics for + /// * `wait` - Controls whether hardware metrics are reported to API. + /// If `None` or not `Some(false)`, hardware reporting is enabled. + fn get_request_collection_hw_usage_counter( + &self, + collection_name: String, + wait: Option<bool>, + ) -> RequestHwCounter {lib/api/src/grpc/conversions.rs (1)
2617-2635: Consider adding a test for the HardwareUsage to HardwareData conversion.The conversion from
HardwareUsagetoHardwareDatalooks correct, but it's important to ensure that the CPU multiplier logic is properly tested since it involves a fixed multiplier of 1.Add unit tests to verify that the conversion correctly handles all hardware usage metrics, especially the CPU measurement which uses a special type.
tests/consensus_tests/test_hw_measurement.py (4)
5-5: Avoid wildcard imports.Using wildcard imports (
from .utils import *) makes it difficult to know which symbols are imported and can lead to name collisions. Consider explicitly importing only the needed functions.-from .utils import * +from .utils import ( + start_cluster, + wait_collection_exists_and_active_on_all_peers, + check_collection_points_count, + wait_collection_points_count +)🧰 Tools
🪛 Ruff (0.8.2)
5-5:
from .utils import *used; unable to detect undefined names(F403)
39-40: Improve assertion to validate minimum and maximum values.The assertion only checks that the change is at least a certain value, but it doesn't set an upper bound. Consider using the
assert_with_upper_bound_errorfunction to validate within a reasonable range.- expected_delta = (19 * 5) # ~20 vectors times avg. 5 bytes payload - assert abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) >= expected_delta + expected_delta = (19 * 5) # ~20 vectors times avg. 5 bytes payload + delta = abs(peer_hw["payload_io_write"] - peer_hw_infos[peer_idx]["payload_io_write"]) + assert_with_upper_bound_error(delta, expected_delta)
95-95: Replaceassert Falsewithraise AssertionError.Using
assert Falsecan be problematic because assertions are skipped when Python is run with the-Ooptimization flag. Useraise AssertionErrorto ensure the error is always raised.- assert False, f"Assertion {inp} >= {min_value} failed" + raise AssertionError(f"Assertion {inp} >= {min_value} failed")🧰 Tools
🪛 Ruff (0.8.2)
95-95: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
100-100: Replaceassert Falsewithraise AssertionError.Similar to the previous comment, use
raise AssertionErrorinstead ofassert Falseto ensure the error is always raised regardless of Python's optimization settings.- assert False, f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed." + raise AssertionError(f"Assertion {inp} being below upperbound error of {upper_bound_error_percent}(={upper_bound}) failed.")🧰 Tools
🪛 Ruff (0.8.2)
100-100: Do not
assert False(python -Oremoves these calls), raiseAssertionError()Replace
assert False(B011)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (51)
docs/grpc/docs.md(1 hunks)lib/api/src/grpc/conversions.rs(3 hunks)lib/api/src/grpc/proto/points.proto(1 hunks)lib/api/src/grpc/proto/points_internal_service.proto(2 hunks)lib/api/src/grpc/qdrant.rs(3 hunks)lib/collection/src/collection/clean.rs(2 hunks)lib/collection/src/collection/payload_index_schema.rs(3 hunks)lib/collection/src/collection/point_ops.rs(9 hunks)lib/collection/src/collection/sharding_keys.rs(3 hunks)lib/collection/src/shards/local_shard/shard_ops.rs(2 hunks)lib/collection/src/shards/queue_proxy_shard.rs(3 hunks)lib/collection/src/shards/remote_shard.rs(13 hunks)lib/collection/src/shards/replica_set/mod.rs(2 hunks)lib/collection/src/shards/replica_set/update.rs(10 hunks)lib/collection/src/tests/points_dedup.rs(1 hunks)lib/collection/src/update_handler.rs(4 hunks)lib/collection/tests/integration/collection_restore_test.rs(3 hunks)lib/collection/tests/integration/collection_test.rs(9 hunks)lib/collection/tests/integration/distance_matrix_test.rs(1 hunks)lib/collection/tests/integration/grouping_test.rs(4 hunks)lib/collection/tests/integration/lookup_test.rs(1 hunks)lib/collection/tests/integration/multi_vec_test.rs(1 hunks)lib/collection/tests/integration/pagination_test.rs(1 hunks)lib/collection/tests/integration/snapshot_recovery_test.rs(1 hunks)lib/common/common/src/counter/hardware_accumulator.rs(3 hunks)lib/common/common/src/counter/hardware_counter.rs(4 hunks)lib/common/common/src/counter/hardware_data.rs(1 hunks)lib/common/common/src/counter/mod.rs(1 hunks)lib/storage/src/content_manager/data_transfer.rs(2 hunks)lib/storage/src/content_manager/toc/point_ops.rs(6 hunks)lib/storage/src/content_manager/toc/request_hw_counter.rs(1 hunks)src/actix/api/count_api.rs(1 hunks)src/actix/api/discovery_api.rs(2 hunks)src/actix/api/facet_api.rs(1 hunks)src/actix/api/local_shard_api.rs(3 hunks)src/actix/api/query_api.rs(3 hunks)src/actix/api/recommend_api.rs(3 hunks)src/actix/api/retrieve_api.rs(3 hunks)src/actix/api/search_api.rs(5 hunks)src/actix/api/update_api.rs(16 hunks)src/actix/helpers.rs(1 hunks)src/common/metrics.rs(1 hunks)src/common/update.rs(38 hunks)src/tonic/api/points_api.rs(28 hunks)src/tonic/api/points_internal_api.rs(9 hunks)src/tonic/api/update_common.rs(35 hunks)tests/consensus_tests/assertions.py(2 hunks)tests/consensus_tests/fixtures.py(2 hunks)tests/consensus_tests/test_hw_measurement.py(1 hunks)tests/consensus_tests/test_resharding.py(2 hunks)tests/consensus_tests/utils.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (38)
- lib/common/common/src/counter/mod.rs
- lib/collection/src/collection/sharding_keys.rs
- src/actix/api/facet_api.rs
- lib/collection/tests/integration/snapshot_recovery_test.rs
- lib/collection/tests/integration/distance_matrix_test.rs
- lib/api/src/grpc/proto/points.proto
- lib/collection/tests/integration/multi_vec_test.rs
- src/actix/api/discovery_api.rs
- lib/storage/src/content_manager/toc/request_hw_counter.rs
- lib/collection/tests/integration/pagination_test.rs
- src/actix/api/count_api.rs
- lib/collection/src/collection/payload_index_schema.rs
- lib/collection/src/tests/points_dedup.rs
- src/actix/api/retrieve_api.rs
- tests/consensus_tests/test_resharding.py
- tests/consensus_tests/utils.py
- lib/collection/tests/integration/lookup_test.rs
- src/actix/api/query_api.rs
- lib/api/src/grpc/proto/points_internal_service.proto
- lib/collection/tests/integration/collection_test.rs
- lib/collection/tests/integration/grouping_test.rs
- src/actix/api/search_api.rs
- docs/grpc/docs.md
- tests/consensus_tests/assertions.py
- src/actix/api/local_shard_api.rs
- src/actix/helpers.rs
- lib/collection/src/shards/replica_set/mod.rs
- lib/collection/tests/integration/collection_restore_test.rs
- lib/collection/src/shards/local_shard/shard_ops.rs
- src/actix/api/recommend_api.rs
- lib/storage/src/content_manager/data_transfer.rs
- lib/storage/src/content_manager/toc/point_ops.rs
- lib/common/common/src/counter/hardware_data.rs
- lib/collection/src/collection/clean.rs
- src/actix/api/update_api.rs
- src/common/metrics.rs
- lib/collection/src/collection/point_ops.rs
- lib/collection/src/shards/replica_set/update.rs
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/test_hw_measurement.py
5-5: from .utils import * used; unable to detect undefined names
(F403)
18-18: start_cluster may be undefined, or defined from star imports
(F405)
20-20: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
68-68: start_cluster may be undefined, or defined from star imports
(F405)
70-70: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports
(F405)
72-72: check_collection_points_count may be undefined, or defined from star imports
(F405)
80-80: wait_collection_points_count may be undefined, or defined from star imports
(F405)
95-95: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
100-100: Do not assert False (python -O removes these calls), raise AssertionError()
Replace assert False
(B011)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-consistency
- GitHub Check: test
- GitHub Check: test (macos-latest)
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (69)
lib/api/src/grpc/qdrant.rs (1)
5798-5799: Changes appropriately track hardware usage metrics in response objects.These changes align well with the PR objective of adding hardware usage measurements to operations. The optional field ensures backward compatibility with existing clients.
Also applies to: 9303-9304, 9793-9794
lib/collection/src/update_handler.rs (3)
59-59: LGTM: Hardware measurements field additionThe addition of the
hw_measurementsfield toOperationDatalooks good and follows the project's pattern for tracking hardware metrics.
689-690: LGTM: Field extraction in pattern matchingPattern matching updated correctly to extract the hardware measurements field from the operation data.
702-707: LGTM: Proper passing of hardware measurementThe hardware measurement counter is correctly passed to the update function, replacing the disposable counter with the one from the operation data.
lib/collection/src/shards/queue_proxy_shard.rs (2)
692-693: LGTM: Function signature updateThe function signature is properly updated to include the hardware measurement accumulator parameter.
704-711: LGTM: Hardware measurement in forward updateThe hardware measurement accumulator is correctly passed to the
forward_updatemethod.lib/common/common/src/counter/hardware_counter.rs (3)
101-103: LGTM: New getter method for CPU measurementThis method correctly returns the real CPU value with the multiplier applied, encapsulating the calculation logic.
160-178: LGTM: Implementation of get_hw_data methodThe new method provides a clean way to get all hardware measurements as a single data structure. This improves code organization and maintainability.
180-182: LGTM: Simplified accumulator mergingThe refactored
merge_to_accumulatormethod is much cleaner, using the newget_hw_datamethod to obtain all measurements at once.src/tonic/api/points_api.rs (1)
55-56: Consider renaming the variable for clarity.The variable name
waitingdoesn't clearly convey its purpose. Since it's used to determine whether to report metrics to the API, a name likereport_to_apiwould better describe its function.- let waiting = wait != Some(false); - RequestHwCounter::new(counter, self.service_config.hardware_reporting() && waiting) + let report_to_api = wait != Some(false); + RequestHwCounter::new(counter, self.service_config.hardware_reporting() && report_to_api)lib/common/common/src/counter/hardware_accumulator.rs (3)
56-71: Good abstraction of hardware data accumulation.This implementation nicely centralizes the hardware data accumulation logic, making the code more maintainable and reducing duplication. The struct destructuring provides good readability.
136-140: Improved API with generic parameter.The refactoring to accept a generic type parameter that implements
Into<HardwareData>makes the API more flexible and extensible, while reducing code duplication. This is a good design choice.
143-148: Consistent implementation with the accumulate method.The consistent implementation between
accumulateandaccumulate_requestmaintains a clean API while providing specialized functionality.lib/api/src/grpc/conversions.rs (2)
2127-2137: Hardware usage tracking integration for response types.The inclusion of the
usagefield in conversions between internal and external response types ensures hardware telemetry data is preserved during transformations.
2142-2153: Consistent implementation for the reverse conversion.The bidirectional conversions consistently handle the
usagefield, ensuring data integrity when converting between different response types.tests/consensus_tests/test_hw_measurement.py (1)
31-31: Good test design for shard-awareness.Using
N_SHARDS * 20ensures that you test a proportional number of points for each shard, which is a good approach for cluster-wide testing.src/tonic/api/update_common.rs (16)
5-5: Added imports for hardware usage trackingThese import statements correctly bring in the new
HardwareUsage,HwMeasurementAcc, andRequestHwCountertypes for usage reporting. No issues found.Also applies to: 22-22, 28-28
39-46: Added hardware usage parameter & final usage reporting inupsertThe added
request_hw_counterparameter, retrieval viarequest_hw_counter.get_counter(), and final usage reporting throughrequest_hw_counter.to_grpc_api()seem consistent and correct. This approach ensures the function captures hardware usage metrics accurately.Also applies to: 75-80
84-91: Introducedrequest_hw_counterfor thedeletefunctionSimilar to
upsert, these additions correctly pass the hardware counter through the call chain, culminating in the usage included in the response. Implementation appears sound, and no issues are found regarding concurrency or error handling.Also applies to: 118-123
127-134: Hardware usage integration inupdate_vectorsThe
request_hw_counterparameter and usage reporting fit well into the function flow. The usage is properly collected and surfaced in the response object with no visible logic or concurrency issues.Also applies to: 175-180
184-190:delete_vectorsfunction extended withrequest_hw_counterThis function consistently follows the established pattern of requesting a counter, performing operations, and then embedding usage results in the response. The logic is clear and maintains code consistency.
Also applies to: 225-230
234-240: Integrated hardware usage intoset_payloadThe newly added parameter and final usage reporting lines appear correct and follow the same design pattern as other update functions. No concerns to raise.
Also applies to: 273-279
282-288:overwrite_payloadnow includes hardware usageAdopting the same pattern of obtaining the hardware accumulator and returning it in the final response. This addition is consistent with the rest of the update methods.
Also applies to: 321-327
330-336: Extendeddelete_payloadwith usage trackingExtends usage tracking to payload deletion as well. The logic is straightforward, and no issues with concurrency or error handling are apparent.
Also applies to: 367-373
376-382:clear_payloadhardware usage parameter propagationIn line with the other functions, the hardware usage is fetched, utilized, and returned. The design remains consistent.
Also applies to: 408-414
417-424: Comprehensive hardware usage injection inupdate_batchMultiple lines reflect the propagation of the
request_hw_counterthrough the entire batch update process for each operation variant. The logic is coherent, bridging the usage data to each individual operation call. Ensure thorough testing to verify correct accumulation of usage metrics across batched updates.Also applies to: 457-458, 474-475, 499-500, 525-526, 548-549, 567-568, 589-590, 612-613, 628-629, 648-649
664-670: Extendedcreate_field_indexwith hardware metricsPassing in
request_hw_counterand finalizing usage withto_grpc_api()is consistent. No immediate concerns.Also applies to: 696-697, 701-702
705-706:create_field_index_internalusage logicEmploys a disposable accumulator in line 730 and omits usage in the response, consistent with the inline comment indicating "API unmeasured". This approach aligns with the existing pattern where internal calls may not measure usage if not needed.
Also applies to: 730-731, 734-735
738-767: Added disposable hardware usage fordelete_field_indexLines 761-762 and 765-766 use a disposable accumulator again to mark unmeasured operations. The approach is correct for indexing concerns, and no concurrency or correctness issues stand out.
769-796: Disposable hardware usage indelete_field_index_internalSimilar to
create_field_index_internal, the usage is intentionally omitted from the final response. The change is consistent with other internal index operations.
798-846: Extendedsyncmethod with disposable hardware usageLines 840-841 and 844-845 show usage injection as “API unmeasured.” This design choice is aligned with the existing pattern for internal synchronization. No issues found.
848-859:points_operation_response_internalextended to accommodateusageThe final structure now includes an optional
HardwareUsagefield. This is central to presenting collected metrics from the addedrequest_hw_counter. Implementation looks correct.src/tonic/api/points_internal_api.rs (18)
62-62: Added optional hardware usage field inquery_batch_internalThe
request_hw_dataparameter andusage: request_hw_data.to_grpc_api()insertion are correctly implemented to measure hardware activity for batch queries. Looks consistent with the rest of the codebase.Also applies to: 103-103
112-113:facet_counts_internalupdated to report usageThe new request parameter
request_hw_dataand the final usage assignment are properly placed. This ensures facet count operations also capture hardware usage.Also applies to: 151-151
191-196:upsertfunction changesExtraction of the internal request, retrieval of collection usage counter, and final call to
upsertwith hardware metrics are consistently structured. No issues spotted.Also applies to: 203-203
222-227: Hardware usage indeletefunctionThe newly introduced
hw_metricsusage aligns with the pattern of internal request extraction and usage accumulation. Implementation matches the design established in the other methods.Also applies to: 234-235
253-258:update_vectorsusage injectionExtracts the request, obtains the hardware counter, then calls the standard method with usage as an argument. The approach is sound.
Also applies to: 265-266
282-287: Added hardware usage todelete_vectorsAdheres to the same pattern: an internal request is extracted, and hardware usage is tracked. The changes look good.
Also applies to: 294-295
310-315:set_payloadchangesConsistently uses
get_request_collection_hw_usage_counter_for_internalto gather metrics and passes them downstream. No concerns found.Also applies to: 321-322
338-343:overwrite_payloadmethod instrumentationSimilarly, these lines incorporate hardware usage retrieval and injection into the overwrite pipeline. Implementation is uniform with the rest of the code.
Also applies to: 349-350
366-371: Enabling usage reporting fordelete_payloadNo discrepancies in usage injection for payload deletion. The approach is entirely consistent with the other internal methods.
Also applies to: 377-378
394-399:clear_payloadusage trackingClear payload now includes hardware usage. The logic remains in sync with established patterns.
Also applies to: 405-406
471-482: Core search batch usageThis block obtains a hardware counter for the collection and uses it while performing a
core_search_list. Capturing usage for these internal search calls enables consistent telemetry.
507-513: Adding usage tracking forrecommendHardware usage retrieval and injection are now present for internal recommendation operations. This is aligned with the other newly instrumented calls.
535-545: Extendedscrollfunctionality with HW counterGrabs the hardware usage counter for scrolling and includes it in the subsequent call. The pattern is uniform and correct.
565-575: Included usage measurement ingetThis fetch route is now consistent with other paths, capturing hardware usage data properly.
592-604:countfunction updated for usage metricsAcquires the new usage counter, then calls the underlying method. Straightforward, with no detected issues.
611-627:syncremains uninstrumented via disposable usageThe sync logic does not seem to incorporate final usage results in the response, which matches the pattern established in other places for “unmeasured” internal calls. This choice is consistent and documented.
645-655:query_batchcapturing usageMimics the pattern of obtaining hardware usage within the collection context for batch queries. Implementation is uniform.
665-669:facetroute usage injectionThis matches the previously changed
facet_counts_internal, ensuring that internal calls also capture usage metrics. No issues found.src/common/update.rs (15)
14-14: AddedHwMeasurementAccimportBrings in the hardware accumulator to support hardware usage tracking. Implementation detail looks standard.
230-240:do_upsert_pointsnow includes hardware measurementsReceives a
hw_measurement_accand passes it into the coreupdateflow. The design is consistent and should reliably capture usage data.
272-282:do_delete_pointsusage accumulationThis deletion logic parallels the upsert approach by carrying
hw_measurement_accand forwarding it toupdate. No issues found.
307-317: Propagated accumulator intodo_update_vectorsAgain, the changes integrate
hw_measurement_accinto the existing flow, with consistent usage. Implementation looks correct.
342-405:do_delete_vectorsupdated for hardware usageEnsures that both filter-based and point-based vector deletions track usage properly. Multiple calls to
updateare made, each receiving the hardware accumulator. Good structure.
407-443:do_set_payloadgains hardware usage parameterThis addition is straightforward, matching the existing pattern of passing
hw_measurement_accthroughupdate. No concerns.
445-482:do_overwrite_payloadwith usage trackingSame design approach for overwriting payload. The logic for capturing usage remains uniform.
484-518:do_delete_payloadinstrumentationNo special edge cases appear unaddressed. This method reliably collects usage metrics through the
hw_measurement_acc.
520-551:do_clear_payloadaligned with usage accumulationMaintains consistency with other payload operations, passing usage upward into
update.
553-570: Hardware usage indo_batch_update_pointsEach sub-operation within the batch references the cloned
hw_measurement_acc, preventing usage data loss. This design is coherent for batched updates.Also applies to: 576-580
672-722:do_create_indexflow includes usage metricsAfter the consensus operation, the
do_create_index_internalcall withhw_measurement_accensures indexing also tracks hardware usage.
724-751: Usage integrated indo_create_index_internalAs with other methods, it routes the accumulator to
update. Straightforward addition with no red flags.
753-791:do_delete_indexinstrumentationThe changes ensure index deletion is also measured, following the same pattern as create index.
793-816:do_delete_index_internaltracking usageContinues the pattern of passing
hw_measurement_accinto theupdatecall for internal index deletions.
818-878:updatefunction now acceptshw_measurement_accThe main entry point for updates. The approach of passing the accumulator to
toc.updatecentralizes usage tracking. Implementation is coherent, with no concurrency or performance concerns evident.lib/collection/src/shards/remote_shard.rs (4)
226-227: Function signature enhancements for hardware usage trackingThe additional
hw_measurement_accparameter has been consistently added to all necessary operation methods. This will allow for comprehensive hardware usage tracking across remote shard operations.Also applies to: 251-252, 676-677, 1016-1017
503-505: Clean hardware usage accumulationGood implementation of hardware usage accumulation from point operation responses. The code properly checks for the presence of usage data before accumulating it.
682-690: Proper parameter passing to execute_update_operationThe method call has been properly updated to include the hardware measurement accumulator, maintaining consistent parameter order. The formatting change with each parameter on a new line also improves readability.
737-739: Consistent hardware usage accumulation across operation typesHardware usage accumulation has been consistently implemented across all operation types (scroll, search, count, retrieve, query_batch, facet). The pattern of checking for usage data before accumulation is maintained throughout.
Also applies to: 811-813, 876-878, 925-927, 982-984, 1052-1054
* Measure update operations hardware IO * Add support for distributed setups * also measure update_local * Add consensus tests for HW metrics of update operations * add test for upserting without waiting * Disable HW usage reporting when not waiting for update API * Review remarks * Fix resharding collecting hw measurements * Fix metric type * New struct HardwareData for better accumulation * Ensure we always apply CPU multiplier * Apply suggestions from code review * Update src/actix/api/update_api.rs Co-authored-by: Tim Visée <tim+github@visee.me> * Fix assert_with_upper_bound_error threshold calculation. * Clarifying why we don't measure shard cleanup --------- Co-authored-by: Tim Visée <tim+github@visee.me>
Depends on #5912
Adds measurements for all internal
updatefunctions, traits and reporting in rest+grpc API.This doesn't include vector storage yet, since measurements for this will be done in the next PR.
Payload measurements are already fully implemented and are now available in the API too.