Skip to content

Only measure HW if on disk for bool index#6296

Merged
JojiiOfficial merged 2 commits intodevfrom
only_measure_on_disk_bool_index
Apr 3, 2025
Merged

Only measure HW if on disk for bool index#6296
JojiiOfficial merged 2 commits intodevfrom
only_measure_on_disk_bool_index

Conversation

@JojiiOfficial
Copy link
Contributor

@JojiiOfficial JojiiOfficial commented Apr 2, 2025

Depends on #6295

Only measure bool index io if index is on disk

Comment on lines +88 to +94
fn make_conditioned_counter<'a>(
&self,
hw_counter: &'a HardwareCounterCell,
) -> ConditionedCounter<'a> {
let on_disk = !self.populated; // Measure if on disk.
ConditionedCounter::new(on_disk, hw_counter)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we return a dummy counter here somehow, without a boolean condition inside the counter itself?

Copy link
Contributor Author

@JojiiOfficial JojiiOfficial Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably can but we'll use this in more places so I thought having a separate logic with known semantics makes the code more readable and easier to implement.

I just implemented it for bool-index for now to demonstrate the usage and to make reviewing easier, since it's not many changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also there are places where we would run into lifetime issues with local disposable hardware counters.
An example for this is the ChunkReader in #6309

Base automatically changed from conditioned_counter to dev April 3, 2025 08:13
@JojiiOfficial JojiiOfficial force-pushed the only_measure_on_disk_bool_index branch from 5e13c4b to 2fe6d18 Compare April 3, 2025 09:08
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Apr 3, 2025

📝 Walkthrough

Walkthrough

This pull request introduces several changes related to hardware measurement and boolean indexing. A new method, measure_hw_with_acc_and_fraction, is added to the HwMeasurementIteratorExt trait to allow hardware usage measurement with fractional representation. In the MmapBoolIndex struct, a new boolean field populated is introduced to indicate whether the index is in memory or on disk, along with a new method make_conditioned_counter to create a ConditionedCounter based on this state. Various methods in the Mmap and Simple boolean index implementations are updated to utilize the ConditionedCounter, while hardware counter parameters are removed from the Simple index methods, streamlining their signatures. These changes enhance the handling of hardware counters and simplify the interface for boolean indexing.

Possibly Related PRs

  • Add ConditionedCounter #6295: The changes in the main PR, which introduce a new method for measuring hardware usage in an iterator, are related to the retrieved PR that adds a ConditionedCounter struct, as both involve enhancements to hardware counter measurement mechanisms and utilize HardwareCounterCell.

Suggested Reviewers

  • generall
  • agourlay
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (2)

247-259: Consider removing hardware counter from iter_values_map for consistency

While other methods no longer use hardware counters, the iter_values_map method still takes and uses a hardware counter. For consistency with the other changes, consider removing the hardware counter usage from this method as well.

pub fn iter_values_map<'a>(
    &'a self,
-    hw_counter: &'a HardwareCounterCell,
) -> impl Iterator<Item = (bool, IdIter<'a>)> + 'a {
    [
        (false, Box::new(self.memory.iter_has_false()) as IdIter),
        (true, Box::new(self.memory.iter_has_true()) as IdIter),
    ]
    .into_iter()
-    .measure_hw_with_cell(hw_counter, size_of::<usize>(), |i| {
-        i.payload_index_io_read_counter()
-    })
}

345-346: Consider removing unused hardware counter parameters

For better clarity, consider completely removing the hardware counter parameters from methods that no longer use them, rather than just renaming them to _. This would make the API cleaner and clearer about what parameters are actually used.

fn filter<'a>(
    &'a self,
    condition: &'a crate::types::FieldCondition,
-    _: &'a HardwareCounterCell,
) -> Option<Box<dyn Iterator<Item = PointOffsetType> + 'a>> {
    // ...
}

fn estimate_cardinality(
    &self,
    condition: &FieldCondition,
-    _: &HardwareCounterCell,
) -> Option<CardinalityEstimation> {
    // ...
}

fn add_many(
    &mut self,
    id: PointOffsetType,
    values: Vec<bool>,
-    _: &HardwareCounterCell,
) -> OperationResult<()> {
    // ...
}

Also applies to: 364-365

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75e5635 and 2fe6d18.

📒 Files selected for processing (4)
  • lib/common/common/src/counter/iterator_hw_measurement.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (13 hunks)
  • lib/segment/src/index/field_index/bool_index/mod.rs (1 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (4 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (5)
lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (1)
  • check_values_any (219-235)
lib/segment/src/index/field_index/bool_index/mod.rs (1)
  • check_values_any (66-76)
lib/segment/src/index/field_index/numeric_index/mod.rs (1)
  • check_values_any (245-258)
lib/segment/src/index/field_index/numeric_index/immutable_numeric_index.rs (1)
  • check_values_any (170-188)
lib/segment/src/index/field_index/numeric_index/mmap_numeric_index.rs (1)
  • check_values_any (206-221)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-consensus
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (15)
lib/segment/src/index/field_index/bool_index/mod.rs (1)

73-73: LGTM: Simplified parameter list for SimpleBoolIndex implementation

The hardware counter parameter is now omitted when calling the check_values_any method on SimpleBoolIndex, which aligns with the simplified interface in the SimpleBoolIndex implementation that no longer requires a hardware counter.

lib/common/common/src/counter/iterator_hw_measurement.rs (1)

50-65: Good addition of hardware measurement method using accumulator and fraction

The new method measure_hw_with_acc_and_fraction is a helpful counterpart to the existing measure_hw_with_cell_and_fraction method, allowing the use of a HwMeasurementAcc instead of directly using a HardwareCounterCell. This will enable cleaner code in implementations that use accumulators.

lib/segment/src/index/field_index/bool_index/mmap_bool_index.rs (9)

33-33: LGTM: Adding populated flag to track memory/disk state

This field will be used to determine whether hardware measurements should be taken (when on disk) or not (when in memory).


84-84: LGTM: Initialize populated field

The field is properly initialized based on the populate value derived from !is_on_disk.


88-94: LGTM: Helper method to create conditioned hardware counter

This method creates a ConditionedCounter that only triggers measurements when the index is on disk (!populated), which aligns well with the PR objective to only measure HW I/O for boolean indexes when on disk.


103-104: LGTM: Using conditioned counter

Correctly creates and uses a conditioned counter that will only measure hardware I/O when the index is on disk.


225-226: LGTM: Consistent use of conditioned counter

The implementation consistently applies the conditioned counter pattern across all methods that previously used hardware counters directly.


245-246: LGTM: Consistent use of conditioned counter

Same pattern consistently applied to the iter_values_map method.


440-454: LGTM: Using the new accumulator-based fraction measurement

The implementation now uses the newly added measure_hw_with_acc_and_fraction method instead of the cell-based version, with the accumulator created from the conditioned counter.


466-467: LGTM: Consistent pattern application

Continues the pattern of creating conditioned counters across all methods that use hardware measurements.


334-335: LGTM: Conditional hardware counter in add_point

Ensures that the MmapBoolIndexBuilder also follows the same pattern of using a conditioned counter.

lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (4)

220-220: LGTM: Simplified method signature

The hardware counter parameter has been removed from the check_values_any method, as SimpleBoolIndex doesn't need to measure hardware I/O, matching the updated call in the BoolIndex implementation.


345-346: LGTM: No longer using hardware counter for filtering

The method now ignores the hardware counter (renamed to _) and directly returns the iterator without wrapping it in hardware measurement. This is consistent with the goal of avoiding unnecessary measurements.

Also applies to: 352-355


364-365: LGTM: No longer using hardware counter for cardinality estimation

Similar to the filter method, the hardware counter is now ignored, which is appropriate for an in-memory index.


429-430: LGTM: No longer using hardware counter when adding values

The hardware counter parameter is now ignored in the add_many method, consistent with the other changes.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
lib/segment/src/index/field_index/bool_index/mod.rs (1)

28-36: Inconsistent hardware counter usage may impact performance metrics

With these changes, hardware counters will only be used for the Mmap variant but not for the Simple variant. This might lead to inconsistent performance metrics when comparing the two implementations, as one will include I/O operations in measurements while the other won't.

Consider documenting this behavior change in comments to make it clear to users why hardware counters are only being used for the Mmap variant. This will help prevent confusion when analyzing performance metrics.

 pub fn iter_values_map<'a>(
     &'a self,
     hw_acc: &'a HardwareCounterCell,
 ) -> Box<dyn Iterator<Item = (bool, IdIter<'a>)> + 'a> {
+    // Hardware counters are only used for Mmap variant since Simple variant is in-memory
     match self {
         BoolIndex::Simple(index) => Box::new(index.iter_values_map()),
         BoolIndex::Mmap(index) => Box::new(index.iter_values_map(hw_acc)),
     }
 }

And similarly for the check_values_any method.

Also applies to: 66-76

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2fe6d18 and e5432fe.

📒 Files selected for processing (2)
  • lib/segment/src/index/field_index/bool_index/mod.rs (2 hunks)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/segment/src/index/field_index/bool_index/simple_bool_index.rs
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-consensus-compose
  • GitHub Check: test (macos-latest)
  • GitHub Check: test-consensus
  • GitHub Check: test (windows-latest)
  • GitHub Check: test
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test

@JojiiOfficial JojiiOfficial merged commit 821aa0a into dev Apr 3, 2025
18 checks passed
@JojiiOfficial JojiiOfficial deleted the only_measure_on_disk_bool_index branch April 3, 2025 17:04
pull bot pushed a commit to kp-forks/qdrant that referenced this pull request Apr 21, 2025
* Only measure HW if on disk for bool index

* Remove missed measurement
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants