Cardinality estimation IO measurements#6117
Conversation
6216639 to
570cfd0
Compare
c58bbae to
00c3999
Compare
570cfd0 to
e5cd057
Compare
00c3999 to
57412ea
Compare
e5cd057 to
8de3c8a
Compare
57412ea to
874eb0e
Compare
df104ed to
ed264d0
Compare
874eb0e to
d57c503
Compare
ed264d0 to
6f3ecd4
Compare
d57c503 to
eabc66d
Compare
6f3ecd4 to
6eb4428
Compare
| let len = self.postings.get(token_id as usize).map(|p| p.len()); | ||
| hw_counter | ||
| .payload_index_io_read_counter() | ||
| .incr_delta(size_of::<Option<CompressedPostingList>>() + len.unwrap_or(0)); |
There was a problem hiding this comment.
CompressedPostingList::len() only reads the len of two vecs, does that still count toward io reads?
| .map(|x| x.len()); | ||
| hw_counter | ||
| .payload_index_io_read_counter() | ||
| .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0)); |
There was a problem hiding this comment.
Same here, this len only reads the len of 1 vec
3d2b048 to
c766539
Compare
📝 WalkthroughWalkthroughThe changes consistently add a new parameter—a reference to a hardware counter—to numerous method signatures across the codebase. Methods related to cardinality estimation, point counting, and filtering now require an extra argument of type Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (13)
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs (1)
288-301:⚠️ Potential issueHardware counter integration for values_of_hash method.
Similar to the previous method, this correctly implements hardware counter tracking for IO operations.
The same issue with missing
size_ofimport applies here.Make sure to add the import as suggested in the previous comment.
lib/segment/src/index/field_index/map_index/mmap_map_index.rs (1)
274-288: 💡 Verification agent🧩 Analysis chain
Added placeholder for hardware measurement in iter_values_map
The TODO comment indicates this is a placeholder for future implementation. Consider defining a timeline or creating an issue to track the implementation of value collection in the hardware measurement.
🏁 Script executed:
#!/bin/bash # Check if there are other similar TODO comments echo "Searching for similar TODO comments..." rg "TODO\(io_measurement\): Collect values" --statsLength of output: 1027
Action Required: Establish Tracking for Hardware Measurement Implementation
The placeholder for hardware measurement in
lib/segment/src/index/field_index/map_index/mmap_map_index.rs(lines 274–288) is consistently used as a temporary stub. Note that similarTODO(io_measurement): Collect valuesmarkers exist in the following files:
lib/segment/src/index/field_index/null_index/mmap_null_index.rslib/segment/src/index/field_index/map_index/mutable_map_index.rslib/segment/src/index/field_index/map_index/immutable_map_index.rsPlease consider defining a timeline or creating a dedicated issue to track the implementation of value collection for hardware measurements across these modules.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
lib/collection/src/collection_manager/holders/proxy_segment.rs(1 hunks)lib/collection/src/collection_manager/segments_searcher.rs(2 hunks)lib/collection/src/shards/forward_proxy_shard.rs(2 hunks)lib/collection/src/shards/local_shard/mod.rs(1 hunks)lib/collection/src/shards/local_shard/shard_ops.rs(1 hunks)lib/collection/src/shards/proxy_shard.rs(3 hunks)lib/collection/src/shards/queue_proxy_shard.rs(2 hunks)lib/collection/src/shards/replica_set/update.rs(1 hunks)lib/collection/src/shards/shard.rs(2 hunks)lib/quantization/src/encoded_vectors.rs(1 hunks)lib/segment/benches/conditional_search.rs(1 hunks)lib/segment/src/entry/entry_point.rs(1 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs(1 hunks)lib/segment/src/index/field_index/bool_index/mod.rs(4 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs(1 hunks)lib/segment/src/index/field_index/facet_index.rs(3 hunks)lib/segment/src/index/field_index/field_index_base.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs(1 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs(1 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs(1 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs(2 hunks)lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs(1 hunks)lib/segment/src/index/field_index/geo_index/mmap_geo_index.rs(1 hunks)lib/segment/src/index/field_index/geo_index/mod.rs(16 hunks)lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs(3 hunks)lib/segment/src/index/field_index/map_index/immutable_map_index.rs(3 hunks)lib/segment/src/index/field_index/map_index/mmap_map_index.rs(3 hunks)lib/segment/src/index/field_index/map_index/mod.rs(23 hunks)lib/segment/src/index/field_index/map_index/mutable_map_index.rs(2 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs(3 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs(1 hunks)lib/segment/src/index/hnsw_index/hnsw.rs(2 hunks)lib/segment/src/index/payload_index_base.rs(1 hunks)lib/segment/src/index/plain_payload_index.rs(2 hunks)lib/segment/src/index/plain_vector_index.rs(3 hunks)lib/segment/src/index/query_optimization/optimizer.rs(1 hunks)lib/segment/src/index/sparse_index/sparse_vector_index.rs(2 hunks)lib/segment/src/index/struct_payload_index.rs(6 hunks)lib/segment/src/segment/entry.rs(5 hunks)lib/segment/src/segment/facet.rs(3 hunks)lib/segment/src/segment/order_by.rs(1 hunks)lib/segment/src/segment/sampling.rs(1 hunks)lib/segment/src/segment/scroll.rs(2 hunks)lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs(1 hunks)lib/segment/tests/integration/payload_index_test.rs(9 hunks)src/actix/api/update_api.rs(0 hunks)
💤 Files with no reviewable changes (1)
- src/actix/api/update_api.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus-compose
- GitHub Check: test
🔇 Additional comments (119)
lib/segment/src/segment/sampling.rs (1)
22-22: Changes maintain functionality while integrating hardware counterThe modification correctly updates the call to
estimate_cardinalityby passing thehw_counterparameter, ensuring hardware resource usage is tracked during cardinality estimation without altering the core logic.lib/segment/src/vector_storage/dense/simple_dense_vector_storage.rs (1)
281-281: Now measuring deletions with disposable hardware counterThe code now properly passes a disposable hardware counter to the
update_storedmethod when deleting a vector, which enables gathering metrics for deletion operations.lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (1)
85-94: Hardware counter integration properly tracks posting list I/OThe implementation correctly measures I/O operations by incrementing the payload index counter with both the size of the option type and the actual size of the posting list content.
CompressedPostingList::len()only reads the length of vectors, but this still counts toward I/O reads since accessing metadata involves memory operations that should be measured.lib/collection/src/shards/proxy_shard.rs (2)
149-155: Method signature correctly updated to include hardware counterThe
estimate_cardinalitymethod now properly accepts and forwards the hardware counter parameter to the wrapped shard, ensuring consistent measurement across the delegation chain.
180-181: Hardware counter correctly passed from accumulatorThe code now properly obtains the hardware counter from the measurement accumulator and passes it to the estimate_cardinality method, ensuring metrics are collected during filter-based cardinality estimation.
lib/collection/src/shards/forward_proxy_shard.rs (2)
7-7: Adding hardware counter import for tracking resource usageThe addition of the hardware counter import is necessary to support the new parameter in the method signature below.
236-242: Added hardware counter parameter to estimate_cardinality methodThis change consistently adds the hardware counter parameter to the
estimate_cardinalitymethod and passes it to the wrapped shard implementation. This aligns with the broader pattern of changes in this PR to track hardware resource usage during cardinality estimation operations.lib/collection/src/collection_manager/segments_searcher.rs (5)
775-775: Adding hardware counter import for test codeThe import is needed to support the usage of hardware counter in the test module.
804-805: Added hardware counter instantiation for testsCreating a new hardware counter instance to be used in the test cases.
808-809: Updated method call to include hardware counterThis change adds the hardware counter parameter to the method call, ensuring that tests reflect the updated API for performance tracking.
811-813: Updated second method call to include hardware counterSame pattern of change - adding the hardware counter to the method call to ensure test compliance with updated API.
819-824: Updated third method call to include hardware counterThis change completes the pattern of updating all calls to the
is_small_enough_for_unindexed_searchmethod to include the hardware counter parameter, ensuring all test cases properly reflect the updated API.lib/segment/src/segment/order_by.rs (1)
35-35: Updated cardinality estimation to use hardware counterThe change properly passes the hardware counter to the
estimate_cardinalitymethod, ensuring resource usage is tracked during this operation. This is consistent with the overall PR objective of adding hardware tracking to cardinality estimation functions.lib/segment/src/index/query_optimization/optimizer.rs (1)
129-129: Updated condition_cardinality method call to include hardware counterThis change ensures the hardware counter is passed through to the
condition_cardinalitymethod when converting conditions in query optimization. The update is consistent with the overall pattern of adding hardware tracking to cardinality-related functions across the codebase.lib/segment/src/entry/entry_point.rs (1)
206-210: Method signature change looks goodThe addition of the
hw_counterparameter to theestimate_point_countmethod is consistent with the project's goal of tracking hardware metrics during cardinality estimation operations.lib/segment/benches/conditional_search.rs (1)
145-145: Updated call correctly uses the hardware counterThe
hw_counterparameter has been correctly added to theestimate_cardinalitymethod call, using the variable that was already defined at line 136.lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
164-179: Potential issue with hardware counter calculationThe implementation correctly adds the
hw_counterparameter, but there may be an issue with how the counter is being updated.The calculation
size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>()might not accurately represent the actual memory used by the posting list.This calculation only considers the length of one vector as mentioned in a past review, and it multiplies by
size_of::<u8>()which may not be the correct size for each element in the posting list. Consider using the actual size of the elements stored in the posting list.lib/collection/src/shards/queue_proxy_shard.rs (1)
223-231: Method signature and implementation updated correctlyThe
estimate_cardinalitymethod has been properly updated to include thehw_counterparameter and correctly forwards it to the wrapped shard's implementation.lib/segment/src/index/sparse_index/sparse_vector_index.rs (3)
266-270: Method signature updated to include hardware counter parameter.The method signature now includes a
hw_counterparameter which allows tracking hardware performance metrics during cardinality estimation.
275-275: Hardware counter correctly propagated to payload index.The
hw_counterparameter is now passed to theestimate_cardinalitymethod of the payload index, ensuring consistent hardware performance tracking throughout the cardinality estimation process.
438-439: Hardware counter correctly passed to cardinality estimation.The call to
get_query_cardinalitynow includes the hardware counter from the vector query context, ensuring performance metrics are tracked throughout the query execution path.lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs (2)
204-209: Method signature updated to include hardware counter parameter.The
get_posting_lenmethod now accepts and utilizes a hardware counter parameter, properly propagating it to the underlying postings implementation.
213-213: Helpful comment added explaining hardware counter usage.The comment clarifies that hardware counter propagation is not needed in this specific context since the function is only used for building the HNSW index, which is helpful for future maintenance.
lib/collection/src/shards/local_shard/mod.rs (2)
990-990: Method signature updated to include hardware counter parameter.Added
hw_counterparameter to theestimate_cardinalitymethod to enable hardware performance tracking during cardinality estimation.
995-1000: Hardware counter correctly propagated to segment estimations.The hardware counter is now properly passed to each segment's
estimate_point_countmethod, ensuring consistent performance tracking across all segments during cardinality estimation operations.lib/quantization/src/encoded_vectors.rs (2)
34-34: Fixed typo in parameter name.Corrected the parameter name from
hw_coutertohw_counterin thescore_pointmethod for consistency and correctness.
36-36: Fixed typo in parameter name.Corrected the parameter name from
hw_coutertohw_counterin thescore_internalmethod for consistency and correctness.lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (1)
358-362: New hardware counter parameter with TODO for future implementation.The method signature has been updated to include
_hw_counter: &HardwareCounterCellbut it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.Make sure the hardware counter collection is implemented in a follow-up PR as noted in the comment.
Are there other places in this file where hardware counters could be utilized for measurements? For example, the
filtermethod on line 339 has an_hw_accparameter that seems related.lib/segment/tests/integration/payload_index_test.rs (8)
467-473: Hardware counter integration for cardinality estimation.The
HardwareCounterCellis properly initialized and passed to theestimate_cardinalitymethod.
503-503: Consistent use of hardware counter for estimation.The hardware counter is correctly passed to the second cardinality estimation call.
565-571: Hardware counter for is_empty condition tests.The hardware counter is properly initialized and consistently used for both struct and plain segments.
Also applies to: 577-577
678-684: Hardware counter for cardinality estimation tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
724-729: Hardware counter for nested array filter tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
788-793: Hardware counter for nested array nesting tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
879-885: Hardware counter for struct payload index tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
1087-1092: Hardware counter for nested fields tests.The hardware counter is properly initialized and passed to the cardinality estimation method.
lib/collection/src/shards/local_shard/shard_ops.rs (1)
216-220: Add hardware counter to cardinality estimation.The change correctly passes the hardware counter to the
estimate_cardinalitymethod from the existinghw_measurement_accparameter, maintaining functionality while enabling performance monitoring.lib/segment/src/index/field_index/null_index/mmap_null_index.rs (3)
4-4: Add import for HardwareCounterCell.Import added to support the new parameter in the
estimate_cardinalitymethod.
255-259: New hardware counter parameter with TODO for future implementation.The method signature has been updated to include
_hw_counter: &HardwareCounterCellbut it's not used yet, as indicated by the underscore prefix and the TODO comment. This is consistent with the PR objectives which indicate that some areas have TODOs for future improvements.Make sure the hardware counter collection is implemented in a follow-up PR. Consider if similar measurement could be added to other methods like
filteron line 209 which already has a related_hw_counterparameter.
434-440: Update test to use hardware counter.The test has been properly updated to create a
HardwareCounterCellinstance and pass it to theestimate_cardinalitycalls.lib/segment/src/index/field_index/facet_index.rs (4)
1-1: LGTM - Proper hardware counter importThe import for
HwMeasurementAccis correctly added to support the new parameter in the trait and implementation.
20-23: LGTM - Trait interface updated correctlyThe
FacetIndextrait'siter_values_mapmethod signature is properly updated to include the hardware measurement accumulator parameter.
60-63: LGTM - Enum implementation updated consistentlyThe implementation of
iter_values_mapforFacetIndexEnumis properly updated to include the hardware measurement accumulator parameter.
65-68: LGTM - Parameter correctly propagatedThe hardware measurement accumulator parameter is correctly passed to all variant implementations, maintaining consistent behavior across all enum variants.
lib/segment/src/index/field_index/full_text_index/text_index.rs (4)
131-136: LGTM - Method signature updated properlyThe
estimate_cardinalitymethod inFullTextIndexcorrectly adds the hardware counter parameter.
138-146: LGTM - Implementation consistently updatedThe hardware counter parameter is correctly passed to all the inverted index implementations in each match arm.
355-359: LGTM - PayloadFieldIndex trait implementation updatedThe
estimate_cardinalitymethod implementation forPayloadFieldIndexis properly updated to include the hardware counter parameter.
361-362: LGTM - Hardware counter propagationThe hardware counter is correctly passed to the parse_query method and then used in the estimate_cardinality call.
lib/collection/src/shards/replica_set/update.rs (1)
548-551: LGTM - Properly passes hardware counter to cardinality estimationThe hardware counter is correctly passed to the
estimate_request_cardinalitymethod, allowing for proper measurement of hardware usage during the cardinality estimation process.lib/segment/src/segment/entry.rs (5)
361-361: LGTM - Hardware counter added to filtering checkThe
should_pre_filtermethod call is properly updated to include the hardware counter parameter.
385-385: LGTM - Consistent hardware counter usage in ordered filteringThe hardware counter is correctly propagated to the
should_pre_filtermethod in the ordered filtering path.
412-412: LGTM - Hardware counter for random filteringThe hardware counter parameter is consistently passed to the
should_pre_filtermethod in the random filtering path.
456-460: LGTM - Method signature updated for hardware trackingThe
estimate_point_countmethod signature is properly updated to include the hardware counter parameter, enabling resource usage tracking during point count estimation.
473-473: LGTM - Hardware counter correctly propagated to underlying estimationThe hardware counter is properly passed to the payload index's cardinality estimation method.
lib/segment/src/index/hnsw_index/hnsw.rs (2)
539-541: Implementation of hardware counter for cardinality estimationThe disposable hardware counter is correctly created and passed to the
estimate_cardinalitymethod, ensuring that hardware performance monitoring is properly integrated for internal operations.
1223-1226: TODO for hardware counter propagationThe code is currently using a disposable hardware counter that doesn't propagate measurements. The TODO comment correctly identifies this as an area for future improvement.
Based on the PR objectives, this is likely one of the TODOs mentioned as areas for improvement that will be addressed in a future PR (mentioned as depending on PR #5985).
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
687-691: Method signature updated with hardware counter parameterThe
estimate_cardinalitymethod has been updated to accept a hardware counter parameter, consistent with other similar method signature updates across the codebase. The TODO comment correctly indicates that actual hardware counter collection is planned for future implementation.lib/segment/src/index/plain_payload_index.rs (2)
112-116: Added hardware counter parameter with helpful commentThe method signature has been properly updated to include the hardware counter parameter, with a clear comment indicating that no measurements are needed for this particular implementation.
126-134: Updated nested cardinality estimation to pass hardware counterThe
estimate_nested_cardinalitymethod correctly forwards the hardware counter to theestimate_cardinalitymethod, maintaining consistency in the API.lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)
397-414: Implemented hardware counter tracking for boolean index operationsThis change properly integrates the hardware counter to track I/O operations during cardinality estimation. The code correctly increments the counter by the slice length before counting ones in the slice.
lib/segment/src/index/payload_index_base.rs (2)
61-65: LGTM: Added hardware counter parameter to estimate_cardinality.The addition of the
hw_counterparameter to the trait method signature is appropriate and consistent with the hardware counter integration pattern being implemented throughout the codebase.
68-73: LGTM: Added hardware counter parameter to estimate_nested_cardinality.Consistent with the changes to
estimate_cardinality, this change correctly adds the hardware counter parameter to the nested cardinality estimation method.lib/collection/src/collection_manager/holders/proxy_segment.rs (1)
974-994: LGTM: Hardware counter parameter added to estimate_point_count method.The implementation properly passes the hardware counter to both the wrapped segment and write segment estimations, ensuring consistent performance tracking throughout the cardinality estimation process.
lib/collection/src/shards/shard.rs (1)
286-297: LGTM: Hardware counter added to estimate_request_cardinality method.The implementation correctly passes the hardware counter to the underlying
estimate_cardinalitymethod when dealing with filter-based effect areas.lib/segment/src/index/field_index/field_index_base.rs (2)
56-62: Integration of hardware counter for cardinality estimationThe
estimate_cardinalitymethod in thePayloadFieldIndextrait has been updated to include a hardware counter parameter, which will help track performance metrics during cardinality estimation operations.
261-268: Consistent parameter passing to underlying implementationThe
estimate_cardinalityimplementation inFieldIndexcorrectly passes the new hardware counter parameter to the payload field index, maintaining proper delegation.lib/segment/src/index/field_index/geo_index/mutable_geo_index.rs (4)
6-6: Added hardware counter importAdded import for
HardwareCounterCellto support the new parameter in method signatures.
183-184: Updated delegate macro with hardware counter parameterThe delegate macro has been updated to include the hardware counter parameter in the method signatures for
points_of_hashandvalues_of_hash.
236-240: Added hardware counter instrumentation for GeoHash readsThe method now increments the payload index IO read counter by the size of a GeoHash before performing the read operation, providing valuable metrics on IO usage.
243-247: Added hardware counter instrumentation for GeoHash values readsSimilarly to
points_of_hash, this method now tracks IO reads when retrieving values associated with a GeoHash.lib/segment/src/segment/facet.rs (3)
39-39: Updated cardinality estimation with hardware counterAdded the hardware counter parameter to the
estimate_cardinalitycall to track performance metrics during filter cardinality estimation.
79-79: Added hardware counter accumulator for value iterationUpdated the
iter_values_mapcall to include a new accumulator from the hardware counter, enabling performance tracking during value iteration.
127-127: Consistent hardware counter usage for facet values filteringAdded the hardware counter parameter to another instance of
estimate_cardinality, maintaining consistency in performance tracking across different code paths.lib/segment/src/segment/scroll.rs (3)
18-23: Updated method signature to include hardware counterThe
should_pre_filtermethod now accepts a hardware counter parameter, allowing for performance tracking during the pre-filtering decision process.
26-26: Added hardware counter to cardinality estimation in filtering decisionUpdated the
estimate_cardinalitycall to include the hardware counter parameter, enabling accurate tracking of performance metrics.
104-104: Consistent hardware counter usage in filtered readsAdded the hardware counter parameter to the
estimate_cardinalitycall in thefiltered_read_by_indexmethod, ensuring consistent performance tracking throughout the filtering process.lib/segment/src/index/field_index/geo_index/immutable_geo_index.rs (2)
104-117: Good implementation of hardware counter for binary search complexityThe addition of the hardware counter parameter and the implementation of I/O read estimation based on binary search complexity is well-designed. The logarithmic calculation for the complexity is mathematically accurate and the multiplication by the size of
Countsproperly accounts for the memory footprint.
119-130: Consistent implementation for values_of_hash methodThis implementation follows the same pattern as
points_of_hash, properly applying the hardware counter to track I/O read operations. The only difference is that it doesn't multiply by the size ofCounts, which is appropriate since we're only counting the complexity of the search itself, not the data access.lib/segment/src/index/field_index/map_index/mmap_map_index.rs (2)
8-9: New import for hardware counter accumulatorThe addition of the hardware accumulator import supports the changes in the
iter_values_mapmethod.
210-227: Updated get_count_for_value to use hardware counter for metricsThe method signature has been properly updated to accept a hardware counter parameter which is now passed to the
getmethod. This allows for consistent tracking of I/O operations throughout the call chain.lib/segment/src/index/field_index/map_index/mutable_map_index.rs (3)
6-7: New import for hardware counter accumulatorThe addition of the hardware accumulator import supports the changes in the
iter_values_mapmethod.
176-182: Added hardware counter parameter with TODO commentThe method has been properly modified to accept a hardware counter parameter, although it's currently unused as indicated by the underscore prefix. The TODO comment clarifies the intention for future implementation.
188-195: Added hardware measurement accumulator with TODO commentSimilar to the previous change, this method has been modified to accept a hardware measurement accumulator parameter, currently unused. The TODO comment indicates the same future implementation intention as in the
get_count_for_valuemethod.lib/segment/src/index/plain_vector_index.rs (3)
47-52: Added hardware counter parameter to is_small_enough_for_unindexed_searchThe method signature has been properly updated to accept a hardware counter parameter which will be used for cardinality estimation.
62-63: Updated estimate_cardinality call to pass hardware counterThe call to
estimate_cardinalityhas been properly updated to pass the hardware counter parameter, ensuring consistent hardware metrics tracking throughout the cardinality estimation process.
86-91: Updated method call to pass the hardware counter from query contextThe call to
is_small_enough_for_unindexed_searchnow correctly passes the hardware counter from the query context, ensuring proper propagation of hardware measurement capabilities.lib/segment/src/index/field_index/bool_index/mod.rs (3)
131-139: LGTM: Successfully updated method signature and implementationsThe
estimate_cardinalitymethod has been updated to include thehw_counterparameter, and the corresponding implementations in the match arms correctly pass this parameter to the underlying index implementations.
168-171: Implementation needed for TODO commentThe
iter_values_mapmethod signature has been updated to include hardware measurement accumulation, but the parameter is currently unused as indicated by the TODO comment.Consider implementing the hardware measurement functionality according to the TODO comment or provide a timeline for when this will be addressed.
472-481: LGTM: Tests updated correctly with hardware counterThe test code has been properly updated to instantiate a
HardwareCounterCelland pass it to theestimate_cardinalitymethod.lib/segment/src/index/field_index/map_index/immutable_map_index.rs (2)
293-301: Implementation needed for TODO commentThe
get_count_for_valuemethod signature has been updated to include a_hw_counterparameter, but it's currently unused as indicated by the TODO comment.Consider implementing the hardware measurement functionality according to the TODO comment. The underscore prefix indicates it's intentionally unused for now, but this should be addressed in the future.
309-314: LGTM: Successfully updated method signature with hardware counterThe
iter_values_mapmethod has been correctly updated to use the hardware counter from thehw_accparameter, and it's being properly passed to theget_iteratormethod.lib/segment/src/index/field_index/geo_index/mod.rs (6)
110-116: LGTM: Successfully updated method signature and implementationsThe
points_of_hashmethod has been updated to include thehw_counterparameter, and the implementations in all three variants properly pass this parameter.
118-124: LGTM: Successfully updated method signature and implementationsThe
values_of_hashmethod has been updated to include thehw_counterparameter, and the implementations in all three variants properly pass this parameter.
217-268: LGTM: Successfully updated method signature and internal callsThe
match_cardinalitymethod has been updated to include thehw_counterparameter, and all internal calls topoints_of_hashandvalues_of_hashnow correctly pass this parameter along.
551-601: LGTM: Successfully updated method signature and internal callsThe
estimate_cardinalitymethod has been updated to include thehw_counterparameter, and all internal calls tomatch_cardinalitynow correctly pass this parameter along, ensuring the hardware counter is used consistently throughout the cardinality estimation flow.
1055-1057: LGTM: Tests updated correctly with hardware counterThe test code has been properly updated to instantiate a
HardwareCounterCelland pass it to thepoints_of_hashmethod.
947-950: LGTM: Tests updated correctly with hardware counterThe
match_cardinalitytest has been updated to instantiate aHardwareCounterCelland pass it to the updated method calls correctly.lib/segment/src/index/field_index/full_text_index/inverted_index.rs (3)
96-97: LGTM: Updated method signature with hardware counterThe
get_posting_lenmethod signature has been updated to include thehw_counterparameter, aligning with the hardware counter integration pattern used throughout the codebase.
99-114: LGTM: Successfully updated method signature and internal callsThe
estimate_cardinalitymethod has been updated to include thehw_counterparameter, and the internal call toget_posting_lennow correctly passes this parameter along.
184-188: Fixed parameter name typoCorrected the parameter name from
hw_coutnertohw_counter, ensuring consistent naming throughout the codebase.lib/segment/src/index/struct_payload_index.rs (5)
71-83: Add hardware counter parameter toestimate_field_conditionThe addition of
hw_counter: &HardwareCounterCellinto both the function signature (line 71) and the subsequent invocation ofindex.estimate_cardinality(&full_path_condition, hw_counter)(line 83) looks consistent. This clearly propagates the hardware counter to underlying field indexes.
246-296: Extendcondition_cardinalitywithhw_counterusageAll added parameter references (lines 246, 253, 259, 266, and 296) properly pass
hw_counterinto nested or field-level estimations. This consistently integrates hardware counter measurement for each condition type. The logic remains coherent, and there are no immediate concerns.
446-453: Incorporatehw_counterinestimate_cardinalityLines 446–450 introduce a new parameter in the function signature, while lines 452–453 successfully reference the parameter in a closure. Passing
hw_counterintoself.condition_cardinalityallows consistent tracking of hardware metrics.
461-466: Nested cardinality estimation with hardware counterLines 461–466 similarly extend the nested cardinality estimation logic to accept and utilize
hw_counter. This is consistent with the rest of the cardinality estimation methods.
473-476: Propagatehw_counterinquery_pointsThese lines (473, 476) ensure that
hw_counteris provided when callingself.estimate_cardinality(...). Maintaining consistency with the rest of the index logic, this extension helps accurately measure hardware performance during queries.lib/segment/src/index/field_index/map_index/mod.rs (14)
179-184: New parameter inget_count_for_valueAdding
hw_counter: &HardwareCounterCell(line 179) and passing it down for Mutable, Immutable, and Mmap indexes (lines 181–184) is aligned with the broader hardware counter integration. This extension maintains uniform coverage across all index types.
211-219: Hardware accumulator initer_values_mapLines 211–219 introduce
hw_acc: HwMeasurementAccand forward it to specific index variants. This consistency ensures measurement coverage while iterating values.
234-240: Extendmatch_cardinalitywith hardware counterIn lines 234–240, the new parameter
hw_counteraids cardinality estimation for matched values. The changes are minimal yet fully consistent with the existing design.
317-364: Usehw_counterinexcept_cardinalityWithin lines 317 and 361–364, the function calculates how many points are excluded/retained, now also factoring in hardware counter calls with
self.get_count_for_value(...). This is a coherent extension of metric tracking to the “except” logic.
421-427: IntegrateHardwareCounterCellintoexcept_setAdding
hw_acc(line 421) and retrievinghw_counter(line 427) ensures uniform measurement coverage for set-based exclusions. No issues spotted.
565-565: Leverageself.except_setwithhw_accfor string filteringLine 565 adds the hardware counter argument when calling
self.except_set(...)for string-based exclusions. This keeps the filter logic consistent.
578-624: Implementhw_counterinestimate_cardinalityforMapIndex<str>These additions unify cardinality estimation under the hardware counter approach (lines 578–582, 586, 599, 624). The pattern of retrieving and passing
hw_countertomatch_cardinalityorexcept_cardinalityis well-executed.
755-817: ExtendMapIndex<UuidIntType>cardinality with hardware counterWithin lines 755–759, 764, 784–785, and 816–817, the updated function signature and calls consistently pass the hardware counter. This parallels the other map index implementations, ensuring uniform coverage.
913-913: Filter except set forMapIndex<IntPayloadType>Line 913 references the
Match::Exceptvariant, with line 921 callingself.except_set(integers, hw_acc). This extension properly includes the hardware counter for integer-based filters.Also applies to: 921-921
927-983: Integrate hardware counter inestimate_cardinalityforMapIndex<IntPayloadType>From lines 927–931, 936–937, 957–958, and 982–983, hardware metrics are consistently passed for integer-based cardinality estimations. The approach matches the other map index types.
997-1001: Add disposableHardwareCounterCellinpayload_blocksThe code (lines 997–1001) calls
get_count_for_value(...)with a disposable hardware counter, noting thatpayload_blocksusage is not measured. This maintains consistency in the interface while avoiding unneeded overhead.
1262-1268: Initialize and usehw_counterin tests (int map index)Defining
let hw_counter = HardwareCounterCell::new();(lines 1262–1263) and passing it to.except_cardinality(...)(lines 1267–1268) ensures the test code fully exercises the new interface. Looks good.
1305-1311: Initialize and passhw_counterin tests (string map index)As with the int map index tests, lines 1305–1306 create the hardware counter, then lines 1310–1311 pass it to
.except_cardinality(...). The logic is consistent and verifies the updated code path.
1331-1332: Verify empty index scenario withhw_counterAt lines 1331–1332, passing
hw_counterto.except_cardinality(...)confirms that an empty index yields zero cardinality. This ensures test coverage aligns with the new parameter.
c766539 to
260d8db
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
165-180: The hardware counter logic could be simplifiedThe method calculates the length first and then uses it for both the hardware counter increment and the return value. This is good for code clarity, but there's a redundant
.as_ref()call in the chain that could be removed.fn get_posting_len( &self, token_id: TokenId, hw_counter: &HardwareCounterCell, ) -> Option<usize> { let len = self .postings .get(token_id as usize) .and_then(|posting| posting.as_ref()) - .as_ref() .map(|x| x.len()); hw_counter .payload_index_io_read_counter() .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u8>()); len }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-low-resources
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test-consistency
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test-consensus-compose
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
175-179: Consider whether counting onlysize_of::<u8>()per element is accurateSimilar to the issue in the
filtermethod, there's a concern about whethersize_of::<u8>()accurately represents the memory footprint of each element in the posting list.
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
Show resolved
Hide resolved
260d8db to
83fb468
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
165-180: Consider making hardware counter incrementation more consistent with the filter method.The implementation correctly calculates and increments the hardware counter based on the posting list access, but there's a minor inefficiency in first retrieving the length in a separate variable and then using it for the counter increment.
Consider refactoring to a more direct approach:
fn get_posting_len( &self, token_id: TokenId, hw_counter: &HardwareCounterCell, ) -> Option<usize> { - let len = self - .postings - .get(token_id as usize) - .and_then(|posting| posting.as_ref()) - .as_ref() - .map(|x| x.len()); - hw_counter - .payload_index_io_read_counter() - .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>()); - len + let postings = self + .postings + .get(token_id as usize) + .and_then(|posting| posting.as_ref()); + + let len = postings.map(|x| x.len()); + hw_counter + .payload_index_io_read_counter() + .incr_delta(size_of::<Option<PostingList>>() + len.unwrap_or(0) * size_of::<u32>()); + len }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (1)
lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs (1)
145-148: The calculation correctly accounts for posting list element size.The hardware counter incrementation for posting list size is correctly calculated using
size_of::<u32>()which matches thePointOffsetTypedefinition in the codebase.
83fb468 to
4f9a385
Compare
lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/mmap_point_to_values.rs (1)
300-302: Optimize the cloning operation for hardware counter calculation.The code clones the value to calculate its memory-mapped size and then passes the original value to the check function. This cloning operation could be unnecessary and adds overhead, especially for larger values.
Consider calculating the size without cloning by using a method that accepts a reference:
- let mmap_size = T::mmapped_size(value.clone()); - hw_acc.payload_index_io_read_counter().incr_delta(mmap_size); - if check_fn(value) { + let mmap_size = T::mmapped_size(value); + hw_acc.payload_index_io_read_counter().incr_delta(mmap_size); + if check_fn(value) {This would require updating the
MmapValuetrait to ensuremmapped_sizecan work with referenced values directly, potentially improving performance for large values.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
lib/segment/src/index/field_index/mmap_point_to_values.rs(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
🔇 Additional comments (2)
lib/segment/src/index/field_index/mmap_point_to_values.rs (2)
305-305: Good optimization: reusing calculated value.Reusing the pre-calculated
mmap_sizevariable is a good optimization that avoids recalculating the value's size.
287-288: Well-implemented hardware counter parameter.The addition of the
hw_accparameter aligns with the PR objective of adding hardware counter measurements for more accurate I/O tracking during cardinality estimation operations.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/map_index/mod.rs (1)
314-416: Consider splitting this large function for clarity.
The logic inexcept_cardinalityis correct and makes good use of saturating arithmetic to handle edge cases. However, it’s quite lengthy. Refactoring into smaller helper methods (e.g., to compute minimal or maximal scenarios separately) could improve readability and maintainability.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs(1 hunks)lib/segment/src/index/field_index/bool_index/mod.rs(7 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs(2 hunks)lib/segment/src/index/field_index/field_index_base.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs(6 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs(4 hunks)lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs(4 hunks)lib/segment/src/index/field_index/full_text_index/inverted_index.rs(10 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs(4 hunks)lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs(3 hunks)lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs(5 hunks)lib/segment/src/index/field_index/full_text_index/postings_iterator.rs(2 hunks)lib/segment/src/index/field_index/full_text_index/text_index.rs(4 hunks)lib/segment/src/index/field_index/geo_index/mod.rs(24 hunks)lib/segment/src/index/field_index/map_index/mod.rs(29 hunks)lib/segment/src/index/field_index/null_index/mmap_null_index.rs(10 hunks)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs(1 hunks)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs(1 hunks)lib/segment/src/index/field_index/numeric_index/mod.rs(4 hunks)lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs(2 hunks)lib/segment/src/index/field_index/numeric_index/tests.rs(4 hunks)lib/segment/src/index/struct_payload_index.rs(8 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
- lib/segment/src/index/field_index/full_text_index/mutable_inverted_index.rs
- lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
- lib/segment/src/index/struct_payload_index.rs
- lib/segment/src/index/field_index/null_index/mmap_null_index.rs
- lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mod.rs
- lib/segment/src/index/field_index/field_index_base.rs
- lib/segment/src/index/field_index/full_text_index/inverted_index.rs
- lib/segment/src/index/field_index/bool_index/mod.rs
- lib/segment/src/index/field_index/full_text_index/text_index.rs
- lib/segment/src/index/field_index/numeric_index/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (76)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (2)
95-99: Explicit lifetime annotations improve borrowing relationshipsThe added lifetime parameter
'aproperly ties the returnedChunkReader's lifetime to the references it depends on (self,header, andhw_counter). This change enhances the clarity and safety of the borrowing relationships, ensuring theChunkReadercan't outlive any of its source data.
128-132: Consistent lifetime annotations for method chainThe lifetime annotations here are consistent with those added to
get_reader, ensuring proper lifetime propagation through the method chain. This is particularly important sincegetcallsget_readeron line 139. This change maintains memory safety while making the borrowing relationships explicit.lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (2)
209-214: Good implementation of hardware counter trackingThe updated
values_rangemethod now properly tracks read operations using the hardware counter. The lifetime parameter'aensures that both&selfand&hw_countershare the same lifetime, which is correctly propagated to the returned iterator.
217-221:Details
✅ Verification successful
Ensure proper size calculation in counter increment
The code increments the payload IO read counter with
size_of::<Point<T>>()which measures the size of thePoint<T>type. This looks correct, but verify that this matches the actual byte count read during this operation.
🏁 Script executed:
#!/bin/bash # Verify consistent memory size approach across the codebase rg -A 2 "size_of::<Point<" --type rustLength of output: 1080
Ensure Correct Byte Calculation in Payload IO
Our search confirms that the usage of
size_of::<Point<T>>()is consistent across bothimmutable_numeric_index.rsandmmap_numeric_index.rs. The counter increment accurately reflects the byte size of aPoint<T>. Please double-check that this computed size fully represents the actual bytes read during the operation and that no additional overhead needs to be accounted for.lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
244-257:Details
❓ Verification inconclusive
Good implementation with correct counter usage
The updated
values_rangemethod correctly integrates the hardware counter with proper lifetime annotations. Nice use ofinspectto track the IO operations without modifying the data flow.Note that this implementation increments the
payload_index_io_read_counterwhile the implementation inimmutable_numeric_index.rsincrements thepayload_io_read_counter. Ensure this difference is intentional.
🏁 Script executed:
#!/bin/bash # Compare counter usage between implementations echo "Checking counter type differences between implementations:" rg -A 3 "hw_counter\s+.*\.(payload_.*_read_counter)" --type rustLength of output: 655
Minor Verification Required: Confirm Hardware Counter Differentiation
The implementation in
mmap_numeric_index.rsuses proper lifetime annotations and integrates the hardware counter correctly via theinspectmethod. However, as noted, it increments thepayload_index_io_read_counter—while, according to earlier context, the implementation inimmutable_numeric_index.rs(and similarly seen in other modules likemutable_inverted_index.rs) uses a different counter (payload_io_read_counter). Please verify that this discrepancy in counter usage is intentional and that the design rationale for using different counters across these modules is well documented.
- File:
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs– Usespayload_index_io_read_counter()- File:
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs– Confirm that the use ofpayload_io_read_counter()is deliberatelib/segment/src/index/field_index/numeric_index/tests.rs (4)
1-1: Good addition of necessary importThe import of
HwMeasurementAccis necessary for the changes in this file.
101-102: Properly extracting the counter cell from the accumulatorGood approach to get the counter cell from the accumulator. This improves the design by allowing the counter to be shared across multiple operations.
111-111: Consistent passing of hardware counter referenceThis change correctly passes the hardware counter reference to the filter method, maintaining consistency with other related changes.
439-441: Good test update for hardware counter usageThe test has been correctly updated to create a hardware counter and pass it to the filter method, ensuring consistent test coverage with the new parameter requirements.
lib/segment/src/index/field_index/numeric_index/mutable_numeric_index.rs (2)
108-122: Properly updated method signature and implementationThe
values_rangemethod has been correctly updated with appropriate lifetime parameters and to use the hardware counter.
291-343: Good refactoring from macro to inline methodsThe change from a delegate macro to explicit inline method implementations improves code readability and maintainability. Each method clearly delegates to the corresponding method in
in_memory_index, and the#[inline]annotations help maintain performance.lib/segment/src/index/field_index/geo_index/mod.rs (21)
109-115: Ensure consistent usage of hardware counter inpoints_of_hash
These changes correctly forwardhw_counteracross all index variants. This appears consistent and maintains clarity.
117-123: Mirror thehw_counterapproach invalues_of_hash
Similar topoints_of_hash, this method correctly delegates the hardware counter to underlying index variants.
216-220: Extended signature formatch_cardinality
Addinghw_counteraligns this function’s signature with the rest of the codebase’s performance tracking strategy.
230-231: Properly invokepoints_of_hash&values_of_hashwithhw_counter
This ensures that I/O readings are tracked when retrievingcommon_hashstatistics.
235-235: Inline call topoints_of_hashin map
Passinghw_counterhere maintains instrumentation consistency.
509-513: Updatedfiltermethod signature
Introducing the lifetime parameter andhw_counterreference is consistent with the pattern used elsewhere, enabling fine-grained performance monitoring.
518-518: Usehw_counterincheck_values_any
Continuing the consistent hardware counter pass-through helps with uniform tracing of I/O.
528-528: Repeatedcheck_values_anyusage
Same comment as above: passinghw_counteris properly aligned with the instrumentation approach.
538-538: Anothercheck_values_anycall
No issues; consistent with the rest of the code.
547-551: Expanded signature forestimate_cardinality
Welcomed consistency in providinghw_counterfor each cardinality estimation method.
554-554: Leveragematch_cardinalitywithhw_counter
Seamless alignment with the new signature.
573-579: Polygon interior and exterior cardinality
Calls tomatch_cardinalitynow incorporate the counter parameter, ensuring measurement is captured.
803-804: Test usage ofhw_counter
Creating and usingHardwareCounterCellin tests ensures coverage of the new parameter.
944-944: Checking cardinality in tests
Usingfield_index.estimate_cardinality(&field_condition, &hw_counter)here confirms consistent test coverage.
996-1013: Extended test flow with hardware counter
From counter creation to the filter calls, these lines confirm that the new signature is well-tested for standard filtering logic.
1053-1076:hw_counterin block-based test logic
This chunk ensures block-based filtering leverages consistent I/O measurement.
1100-1161: Multi-location cardinality checks
All references tohw_counterin this block—Berlin, NYC, Tokyo—confirm test coverage for multiple geographic scenarios.
1182-1205:hw_counterusage for close geo payload
Retains performance measurement logic across close proximity tests (Berlin <-> Potsdam).
1245-1263:hw_counterin disk-load test
Validates that loading from disk maintains the same instrumentation approach.
1367-1404: Empty index cardinality checking withhw_counter
Ensures hardware counter usage in edge cases with zero or partial data.
1406-1465: Boundaries across the antimeridian
Again, passinghw_counterthrough bounding box queries demonstrates well-rounded test coverage.lib/segment/src/index/field_index/full_text_index/postings_iterator.rs (3)
58-59: ImportedHardwareCounterCellin tests
Great addition to measure read operations while iterating compressed postings.
90-91: Initializehw_counterfor intersection tests
Allows verifying that I/O reads are tracked correctly when merging postings.
96-98: Passinghw_counterto compressed posting readers
Integrating instrumentation into each posting list ensures consistent performance metrics.lib/segment/src/index/field_index/full_text_index/mutable_text_index.rs (3)
151-153: Introducehw_accandhw_counter
Clear approach to obtain the counter cell from the hardware accumulator, avoiding duplicated counters.
156-174: Applyhw_counterin test filters
These lines confirm that full-text indexing logic with match queries now includes hardware metrics.
211-246: Persist and re-checkhw_counterusage
All subsequent filter calls maintain a consistent pattern of passing the same counter cell, ensuring reliable instrumentation data.lib/segment/src/index/field_index/full_text_index/immutable_inverted_index.rs (4)
47-51: Refined lifetime signature andhw_counter
Adoptingfn filter<'a>(…, hw_counter: &'a HardwareCounterCell) -> …clarifies ownership and ensures safe reference usage.
71-74: Passhw_countertoposting.reader
Good approach to accumulate read metrics for each token’s postings.
83-93: Introducehw_counteringet_posting_len
Incrementingpayload_index_io_read_counter()upon retrieving the posting length is a logical place for measuring I/O overhead.
123-124: Usehw_counterincheck_match
Ensures posting lookups also contribute to read metrics. No concurrency problems apparent.lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_visitor.rs (5)
96-96: Improved encapsulation by using a dedicated methodThe change from directly calling
remainder_postings.binary_search(&val).is_ok()to using the newsearch_in_remaindermethod improves encapsulation and allows for proper hardware counter tracking.
99-99: Improved encapsulation by using accessor methodUsing
get_chunk_indexinstead of direct array access provides better encapsulation and allows for tracking hardware metrics when accessing chunk data.
125-129: Improved encapsulation with dedicated accessorsUsing the new
chunks_len()method andget_remainder_postingimproves encapsulation and enables hardware counter tracking, which is essential for IO monitoring.
150-158: Added hardware counter support to test codeProperly updated test code to include hardware counter initialization and usage, ensuring that tests accurately reflect the production code behavior with IO measurements.
173-175: Added hardware counter in second testCorrectly added hardware counter initialization and usage in the second test, maintaining consistency across all test cases.
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (3)
376-380: Updated filter method signature to use HardwareCounterCellChanged from using
HwMeasurementAccto&'a HardwareCounterCellfor hardware counter tracking, which aligns with the broader changes across the codebase for consistent IO measurement.
388-388: Fixed bit calculation for hardware counter incrementUpdated counter increment to properly account for bits by dividing slice length by
u8::BITS, which provides more accurate measurement of IO operations.
395-412: Refactored estimate_cardinality to include hardware counterThe implementation now correctly tracks IO operations by incrementing the hardware counter after retrieving and processing the slice. The code is cleaner and more modular with better separation of concerns.
lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs (6)
2-2: Added import for HardwareCounterCellAdded the necessary import for the hardware counter functionality.
10-15: Improved encapsulation and added hardware counterChanged field visibility from public to private and added a new
hw_counterfield to track hardware metrics, enhancing the class's encapsulation and measurement capabilities.
18-31: Updated constructor to include hardware counterModified the constructor to accept and store the hardware counter, maintaining consistency with the new field added to the struct.
100-105: Added hardware counter tracking for chunk decompressionAdded IO tracking with hardware counter when decompressing chunks, providing visibility into the performance characteristics of this operation.
105-105: Used u8::BITS instead of magic number for bit calculationReplaced hardcoded value with the more semantic
u8::BITS, improving code readability and maintainability.
118-141: Added accessor methods with hardware counter trackingImplemented new helper methods for accessing chunks, remainder postings, and performing searches, each with appropriate hardware counter tracking. This improves encapsulation and enables detailed IO measurement.
lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_posting_list.rs (4)
2-2: Added import for HardwareCounterCellAdded the necessary import for hardware counter functionality.
38-46: Updated reader method to accept hardware counterModified the
readermethod signature to include a hardware counter parameter and pass it to theChunkReaderconstructor, enabling IO tracking in the reader.
53-61: Updated iter method to use hardware counterModified the
itermethod to accept and pass the hardware counter to the reader, ensuring consistent hardware metric tracking throughout the iteration process.
97-105: Updated test to use hardware counterProperly updated the test code to initialize a hardware counter and pass it to the reader, ensuring that tests reflect the production code behavior with IO measurements.
lib/segment/src/index/field_index/map_index/mod.rs (16)
179-185: Implementation looks correct.
This update toget_count_for_valueconsistently passes thehw_counterparameter to specialized indices, ensuring hardware metrics are properly accounted for.
211-219: No issues with the revised signature.
Theiter_values_mapmethod now takes anHwMeasurementAccand cleanly propagates it to each variant implementation.
234-242: Accurate pass-through of the hardware counter.
Thematch_cardinalitymethod correctly usesget_count_for_valueto compute the exact cardinality, properly reusing thehw_counter.
421-433: Consistent hardware counter usage.
Inexcept_set, the hardware counter is correctly passed toget_iterator, ensuring performance metrics can be tallied for exclusion logic.
530-569: Filter updates look solid.
The refactoredfiltermethod for string-based payloads correctly invokesget_iteratororexcept_setwithhw_counter. The logic aligns with the updated signatures.
571-631: Clean integration ofhw_counterinto cardinality estimation.
Withinestimate_cardinality, the additional parameter is properly utilized in calls tomatch_cardinalityandexcept_cardinality. Design remains consistent.
639-653: Correct usage ofHardwareCounterCell::disposable().
The approach of passing a disposable counter inpayload_blocksis valid, especially since measurement is not needed for HNSW construction.
680-742: Filter method changes appear proper.
ForUuidIntType, the updated lines consistently passhw_countertoget_iterator, preserving metrics collection.
744-819: Proper integration with cardinality estimation for UUID-based payloads.
All calls tomatch_cardinalityandexcept_cardinalityreflect the newhw_counterparameter. Implementation aligns well with the enhancements.
821-844: Payload blocks logic is consistent.
Using a disposable hardware counter inpayload_blocksavoids unnecessary measurements while enabling a unified interface. Implementation is coherent.
871-905: Improved filter logic for integer payloads.
Passinghw_counterintoget_iteratorandexcept_setensures consistent performance tracking for filtering operations.
910-970: Cardinality estimation flows well with the new parameter.
For integer payloads, the expandedestimate_cardinalitymethod handles all matching variants and passeshw_counteras needed.
980-992: Accurate block condition building.
Within integer-basedpayload_blocks, the reuse ofHardwareCounterCell::disposable()maintains a cohesive approach to measuring or ignoring costs as appropriate.
1245-1253: Unit test updates confirm new behavior.
The creation and usage ofhw_counterforexcept_cardinalityverify that the code handles empty exclusions properly.
1288-1296: Additional test coverage for string-based map index.
Similarly verifies thathw_counteris well-integrated for empty sets.
1309-1317: Ensures correct edge-case handling for an empty index.
Passing an empty iterator and checking that the result yields zero cardinality demonstrates robust test coverage with the new parameter.
| ) -> Option<ChunkReader<'_>> { | ||
| fn get_reader<'a>( | ||
| &'a self, | ||
| header: &'a PostingListHeader, |
There was a problem hiding this comment.
header doesn't need a lifetime here, since it is not part of the returned value.
| ) -> Option<Box<dyn Iterator<Item = PointOffsetType> + '_>> { | ||
| fn filter<'a>( | ||
| &'a self, | ||
| condition: &'a FieldCondition, |
There was a problem hiding this comment.
No lifetime needed for condition
There was a problem hiding this comment.
Should we be concerned about double-counting, where we count some operation twice during the cardinality estimation and also while doing the operation itself?
I did some basic test with this, but definitely was not able to test all possible paths. Since this covers so much, it's very hard to keep track of what branches there are, and which we do or do not count.
generall
left a comment
There was a problem hiding this comment.
I made test measurements on keyword and text indexes, after some fixes they now match my expectations. Also, found a potential place for optimization - we can try to exclude primary condition from second stage filtering.
I propose to merge this and proceed with further steps: create a checklist and some docs, which would allow external people to understand what to expect from measurement and how to test them.
* Cardinality estimation measurements * Apply hw measurements to latest changes from dev * Clippy * Also measure cardinality estimation for geo index * Make measured units 'bytes' * Use PointOffsetType instead of u32 for size calculation * fix memory cost for check_values_any in mmap index * fix double counting for value reading in mmap, remove hw_counter from mmap hashmap * fmt * fix hw measurement for text index * Remove non necessary lifetime annotations --------- Co-authored-by: generall <andrey@vasnetsov.com>
Depends on #5954.
There are places where we would benefit from #5985 which have been left out and marked with a TODO.