Skip to content

Measure hardware in generic posting list#6583

Merged
coszio merged 1 commit intodevfrom
posting-list-with-hw-measurement
May 28, 2025
Merged

Measure hardware in generic posting list#6583
coszio merged 1 commit intodevfrom
posting-list-with-hw-measurement

Conversation

@coszio
Copy link
Contributor

@coszio coszio commented May 22, 2025

Builds on top of #6565.

Adds a ConditionedCounter to the generic posting list view, and takes measurements whenever the PostingListView's slices are read

@coszio coszio added this to the phrase match milestone May 22, 2025
@coszio coszio force-pushed the full-text-with-generic-posting-list branch from 726975b to 1b12a92 Compare May 26, 2025 19:33
@coszio coszio force-pushed the posting-list-with-hw-measurement branch from 25791c9 to 5c07860 Compare May 26, 2025 19:37
Base automatically changed from full-text-with-generic-posting-list to dev May 27, 2025 22:53
@coszio coszio force-pushed the posting-list-with-hw-measurement branch from 5c07860 to c6595e2 Compare May 27, 2025 22:55
@coszio coszio marked this pull request as ready for review May 27, 2025 22:57
@coderabbitai
Copy link
Contributor

coderabbitai bot commented May 27, 2025

📝 Walkthrough

Walkthrough

This set of changes introduces hardware counter instrumentation into the posting list subsystem. The PostingListView struct is extended to include a ConditionedCounter, which is now passed through all constructors and used to track I/O or memory access costs during chunk and remainder access, as well as decompression operations. The ValueHandler trait and its implementations are updated so that get_value receives a reference to the counter, allowing for instrumentation of variable-sized data reads. All direct accesses to chunks and remainders in the posting list view and visitor are replaced by accessor methods that increment the counter as appropriate.

Possibly related PRs

  • Don't measure in-memory hw for full text indices #6309: Introduces the ConditionedCounter::never() method, which is used in this PR for hardware counter instrumentation and to replace direct PhantomData usage.
  • Fix measuring in memory inverted index #6316: Also modifies lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs to integrate ConditionedCounter for hardware counter instrumentation, directly affecting how counters are passed and used in MmapPostings.
  • Generalize compressed posting list #6528: Provides the foundational implementation of compressed posting lists and value handling traits, which this PR extends by adding hardware counter instrumentation and modifying construction and access patterns.

Suggested reviewers

  • timvisee
  • generall

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a05502 and c6595e2.

📒 Files selected for processing (5)
  • lib/posting_list/src/posting_list.rs (2 hunks)
  • lib/posting_list/src/value_handler.rs (4 hunks)
  • lib/posting_list/src/view.rs (10 hunks)
  • lib/posting_list/src/visitor.rs (5 hunks)
  • lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
lib/posting_list/src/visitor.rs (1)
Learnt from: coszio
PR: qdrant/qdrant#6565
File: lib/posting_list/src/builder.rs:63-67
Timestamp: 2025-05-26T14:47:23.505Z
Learning: In the posting_list crate's PostingChunk struct, avoid adding extra fields like storing computed chunk_bits to prevent struct bloat and maintain mmap compatibility. The bit calculations from offsets are inexpensive compared to the memory and compatibility benefits of keeping the struct minimal.
🧬 Code Graph Analysis (2)
lib/posting_list/src/posting_list.rs (2)
lib/posting_list/src/view.rs (1)
  • from_components (124-141)
lib/common/common/src/counter/conditioned_counter.rs (1)
  • never (22-24)
lib/posting_list/src/view.rs (1)
lib/segment/src/index/field_index/full_text_index/compressed_posting/compressed_chunks_reader.rs (1)
  • chunks_len (128-130)
⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: storage-compat-test
  • GitHub Check: rust-tests (macos-latest)
  • GitHub Check: test-low-resources
  • GitHub Check: rust-tests (ubuntu-latest)
  • GitHub Check: test-consistency
  • GitHub Check: lint
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: rust-tests (windows-latest)
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-consensus-compose
  • GitHub Check: integration-tests
  • GitHub Check: integration-tests-consensus
  • GitHub Check: Basic TLS/HTTPS tests
🔇 Additional comments (10)
lib/segment/src/index/field_index/full_text_index/mmap_inverted_index/mmap_postings.rs (1)

122-128: LGTM! Proper propagation of hardware counter.

The change correctly passes the hw_counter to IdsPostingListView::from_ids_components, which aligns with the PR's objective of adding hardware counter instrumentation throughout the posting list subsystem.

lib/posting_list/src/posting_list.rs (1)

3-3: LGTM! Appropriate use of never() counter for regular views.

The change correctly adapts to the new PostingListView API by using from_components and passing ConditionedCounter::never(). This is the right choice for regular posting list views that don't require hardware counter instrumentation.

Also applies to: 93-100

lib/posting_list/src/visitor.rs (3)

100-100: LGTM! Consistent use of length accessor methods.

All direct field accesses for .remainders.len() and .chunks.len() have been properly replaced with the corresponding accessor methods remainders_len() and chunks_len().

Also applies to: 109-109, 146-146


123-124: LGTM! Proper encapsulation of chunk and remainder access.

All direct indexing into chunks and remainders has been replaced with appropriate accessor methods (get_chunk_unchecked, get_chunk, get_remainder). This encapsulation enables the hardware counter instrumentation while maintaining the same logic.

Also applies to: 148-148, 157-158, 161-161, 175-175, 177-177


164-169: LGTM! Updated ValueHandler calls with hardware counter.

Both H::get_value calls have been correctly updated to include the &self.list.hw_counter parameter, enabling instrumentation of variable-sized data reads.

Also applies to: 178-183

lib/posting_list/src/value_handler.rs (1)

30-38: Well-implemented hardware counter instrumentation for value retrieval.

The changes correctly extend the ValueHandler trait and its implementations to support hardware counter tracking:

  • SizedHandler appropriately ignores the counter since it doesn't perform I/O operations
  • UnsizedHandler correctly increments the counter by the exact byte range being read from var_data
  • The counter increment is properly placed before the actual data access

Also applies to: 52-62, 106-126

lib/posting_list/src/view.rs (4)

18-20: Good encapsulation improvement with visibility changes.

Changing chunks and remainders fields from public to private and providing controlled accessor methods (get_chunk, get_remainder, etc.) is a good design improvement that allows for proper instrumentation.


153-157: Correct placement of I/O measurement for decompression.

The hardware counter increment correctly measures the compressed data size before decompression, accurately tracking the I/O read cost.


177-183: Smart use of inspect for conditional counter increments.

Using inspect to increment the counter only on successful accesses (when Option::Some is returned) is a good pattern that ensures accurate measurement.

Also applies to: 193-199


238-242: Clever binary search complexity measurement.

The counter increment of chunks_slice.len().ilog2() * size_of::<PostingChunk> accurately models the expected I/O cost of binary search, accounting for the logarithmic number of accesses.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coszio coszio merged commit d1699c0 into dev May 28, 2025
17 checks passed
@coszio coszio deleted the posting-list-with-hw-measurement branch May 28, 2025 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants