Only measure HW if on disk for bool index#6296
Conversation
| fn make_conditioned_counter<'a>( | ||
| &self, | ||
| hw_counter: &'a HardwareCounterCell, | ||
| ) -> ConditionedCounter<'a> { | ||
| let on_disk = !self.populated; // Measure if on disk. | ||
| ConditionedCounter::new(on_disk, hw_counter) | ||
| } |
There was a problem hiding this comment.
Can't we return a dummy counter here somehow, without a boolean condition inside the counter itself?
There was a problem hiding this comment.
We probably can but we'll use this in more places so I thought having a separate logic with known semantics makes the code more readable and easier to implement.
I just implemented it for bool-index for now to demonstrate the usage and to make reviewing easier, since it's not many changes.
There was a problem hiding this comment.
Also there are places where we would run into lifetime issues with local disposable hardware counters.
An example for this is the ChunkReader in #6309
5e13c4b to
2fe6d18
Compare
📝 WalkthroughWalkthroughThis pull request introduces several changes related to hardware measurement and boolean indexing. A new method, Possibly Related PRs
Suggested Reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2)
247-259: Consider removing hardware counter from iter_values_map for consistencyWhile other methods no longer use hardware counters, the
iter_values_mapmethod still takes and uses a hardware counter. For consistency with the other changes, consider removing the hardware counter usage from this method as well.pub fn iter_values_map<'a>( &'a self, - hw_counter: &'a HardwareCounterCell, ) -> impl Iterator<Item = (bool, IdIter<'a>)> + 'a { [ (false, Box::new(self.memory.iter_has_false()) as IdIter), (true, Box::new(self.memory.iter_has_true()) as IdIter), ] .into_iter() - .measure_hw_with_cell(hw_counter, size_of::<usize>(), |i| { - i.payload_index_io_read_counter() - }) }
345-346: Consider removing unused hardware counter parametersFor better clarity, consider completely removing the hardware counter parameters from methods that no longer use them, rather than just renaming them to
_. This would make the API cleaner and clearer about what parameters are actually used.fn filter<'a>( &'a self, condition: &'a crate::types::FieldCondition, - _: &'a HardwareCounterCell, ) -> Option<Box<dyn Iterator<Item = PointOffsetType> + 'a>> { // ... } fn estimate_cardinality( &self, condition: &FieldCondition, - _: &HardwareCounterCell, ) -> Option<CardinalityEstimation> { // ... } fn add_many( &mut self, id: PointOffsetType, values: Vec<bool>, - _: &HardwareCounterCell, ) -> OperationResult<()> { // ... }Also applies to: 364-365
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
lib/common/common/src/counter/iterator_hw_measurement.rs(1 hunks)lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs(13 hunks)lib/segment/src/index/field_index/bool_index/mod.rs(1 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs(4 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (5)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)
check_values_any(219-235)lib/segment/src/index/field_index/bool_index/mod.rs (1)
check_values_any(66-76)lib/segment/src/index/field_index/numeric_index/mod.rs (1)
check_values_any(245-258)lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (1)
check_values_any(170-188)lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
check_values_any(206-221)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-consensus
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus-compose
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (15)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
73-73: LGTM: Simplified parameter list for SimpleBoolIndex implementationThe hardware counter parameter is now omitted when calling the
check_values_anymethod onSimpleBoolIndex, which aligns with the simplified interface in theSimpleBoolIndeximplementation that no longer requires a hardware counter.lib/common/common/src/counter/iterator_hw_measurement.rs (1)
50-65: Good addition of hardware measurement method using accumulator and fractionThe new method
measure_hw_with_acc_and_fractionis a helpful counterpart to the existingmeasure_hw_with_cell_and_fractionmethod, allowing the use of aHwMeasurementAccinstead of directly using aHardwareCounterCell. This will enable cleaner code in implementations that use accumulators.lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (9)
33-33: LGTM: Adding populated flag to track memory/disk stateThis field will be used to determine whether hardware measurements should be taken (when on disk) or not (when in memory).
84-84: LGTM: Initialize populated fieldThe field is properly initialized based on the
populatevalue derived from!is_on_disk.
88-94: LGTM: Helper method to create conditioned hardware counterThis method creates a
ConditionedCounterthat only triggers measurements when the index is on disk (!populated), which aligns well with the PR objective to only measure HW I/O for boolean indexes when on disk.
103-104: LGTM: Using conditioned counterCorrectly creates and uses a conditioned counter that will only measure hardware I/O when the index is on disk.
225-226: LGTM: Consistent use of conditioned counterThe implementation consistently applies the conditioned counter pattern across all methods that previously used hardware counters directly.
245-246: LGTM: Consistent use of conditioned counterSame pattern consistently applied to the
iter_values_mapmethod.
440-454: LGTM: Using the new accumulator-based fraction measurementThe implementation now uses the newly added
measure_hw_with_acc_and_fractionmethod instead of the cell-based version, with the accumulator created from the conditioned counter.
466-467: LGTM: Consistent pattern applicationContinues the pattern of creating conditioned counters across all methods that use hardware measurements.
334-335: LGTM: Conditional hardware counter in add_pointEnsures that the MmapBoolIndexBuilder also follows the same pattern of using a conditioned counter.
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (4)
220-220: LGTM: Simplified method signatureThe hardware counter parameter has been removed from the
check_values_anymethod, as SimpleBoolIndex doesn't need to measure hardware I/O, matching the updated call in theBoolIndeximplementation.
345-346: LGTM: No longer using hardware counter for filteringThe method now ignores the hardware counter (renamed to
_) and directly returns the iterator without wrapping it in hardware measurement. This is consistent with the goal of avoiding unnecessary measurements.Also applies to: 352-355
364-365: LGTM: No longer using hardware counter for cardinality estimationSimilar to the filter method, the hardware counter is now ignored, which is appropriate for an in-memory index.
429-430: LGTM: No longer using hardware counter when adding valuesThe hardware counter parameter is now ignored in the
add_manymethod, consistent with the other changes.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
28-36: Inconsistent hardware counter usage may impact performance metricsWith these changes, hardware counters will only be used for the
Mmapvariant but not for theSimplevariant. This might lead to inconsistent performance metrics when comparing the two implementations, as one will include I/O operations in measurements while the other won't.Consider documenting this behavior change in comments to make it clear to users why hardware counters are only being used for the
Mmapvariant. This will help prevent confusion when analyzing performance metrics.pub fn iter_values_map<'a>( &'a self, hw_acc: &'a HardwareCounterCell, ) -> Box<dyn Iterator<Item = (bool, IdIter<'a>)> + 'a> { + // Hardware counters are only used for Mmap variant since Simple variant is in-memory match self { BoolIndex::Simple(index) => Box::new(index.iter_values_map()), BoolIndex::Mmap(index) => Box::new(index.iter_values_map(hw_acc)), } }And similarly for the
check_values_anymethod.Also applies to: 66-76
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
lib/segment/src/index/field_index/bool_index/mod.rs(2 hunks)lib/segment/src/index/field_index/bool_index/simple_bool_index.rs(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test-consensus
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
- GitHub Check: test
* Only measure HW if on disk for bool index * Remove missed measurement
Depends on #6295
Only measure bool index io if index is on disk