Skip to content

Io measurement cleanup and tests#5949

Merged
JojiiOfficial merged 4 commits intodevfrom
io_measurement_cleanup_plus_tests
Mar 12, 2025
Merged

Io measurement cleanup and tests#5949
JojiiOfficial merged 4 commits intodevfrom
io_measurement_cleanup_plus_tests

Conversation

@JojiiOfficial
Copy link
Contributor

Depends on #5944

  • State unmeasured functions in comments
  • New consensus test for measuring on remote shard

@JojiiOfficial JojiiOfficial force-pushed the measure_vector_insert_io_writes branch from 58d5737 to 1b8c742 Compare March 5, 2025 10:52
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 778b532 to 8e0ffeb Compare March 5, 2025 11:14
@JojiiOfficial JojiiOfficial requested a review from timvisee March 5, 2025 13:24
@JojiiOfficial JojiiOfficial force-pushed the measure_vector_insert_io_writes branch from 1bc7f4f to 2b0c8bb Compare March 6, 2025 14:13
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 8263c73 to 04d84c3 Compare March 6, 2025 14:15
@JojiiOfficial JojiiOfficial requested a review from timvisee March 6, 2025 14:26
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from f7db5d0 to 5cef69a Compare March 7, 2025 12:08
Base automatically changed from measure_vector_insert_io_writes to dev March 10, 2025 14:27
@JojiiOfficial JojiiOfficial force-pushed the io_measurement_cleanup_plus_tests branch from 5cef69a to 2b7ec5f Compare March 10, 2025 14:29
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 10, 2025

📝 Walkthrough

Walkthrough

This update implements several changes related to hardware counter handling and telemetry measurement. A new module, referenced_counter, has been added to define the struct HwMetricRefCounter, which wraps a CounterCell and implements the Deref trait. Various parts of the code have been modified to use methods such as ref_payload_io_write_counter and ref_vector_io_write_counter instead of directly passing a HardwareCounterCell. The changes include updates to function signatures and method calls in storage, vector, and benchmarking modules, aligning them with the new counter reference approach. Comment modifications clarify that certain measurements are internal rather than externally tracked. Additionally, test fixtures and functions have been updated to support the changes, including an added parameter in the collection creation function for sparse vector configuration and a new test to verify the behavior when no local shard is present.

Suggested reviewers

  • timvisee
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/consensus_tests/test_hw_measurement.py (3)

27-31: Consider using a more efficient variable name for the unused parameter.

The loop control variable peer_info is not used within the loop body.

-    for peer_id, peer_info in collection_info['peers'].items():
+    for peer_id, _ in collection_info['peers'].items():
🧰 Tools
🪛 Ruff (0.8.2)

27-27: Loop control variable peer_info not used within loop body

Rename unused peer_info to _peer_info

(B007)


45-45: Improve readability with consistent spacing.

There's inconsistent spacing around the comma in the assert_with_upper_bound_error call.

-    assert_with_upper_bound_error(hw['vector_io_write'], 1000 * 4 * 4,upper_bound_error_percent=0.2)  # 1k vectors of dim 4 where each dim is 4 bytes
+    assert_with_upper_bound_error(hw['vector_io_write'], 1000 * 4 * 4, upper_bound_error_percent=0.2)  # 1k vectors of dim 4 where each dim is 4 bytes

165-165: Improve assertion failure handling.

Directly using assert False can be problematic when running Python with optimizations enabled (python -O), as assertion statements are removed.

-        assert False, f"Assertion {inp} < {upper_bound} (upperbound) failed. Allowed error = {upper_bound_error_percent}"
+        raise AssertionError(f"Assertion {inp} < {upper_bound} (upperbound) failed. Allowed error = {upper_bound_error_percent}")
🧰 Tools
🪛 Ruff (0.8.2)

165-165: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf47c97 and 2b7ec5f.

📒 Files selected for processing (12)
  • lib/collection/src/collection/sharding_keys.rs (1 hunks)
  • lib/common/common/src/counter/mod.rs (1 hunks)
  • lib/common/common/src/counter/referenced_counter.rs (1 hunks)
  • lib/gridstore/benches/bustle_bench/payload_storage.rs (2 hunks)
  • lib/gridstore/benches/random_data_bench.rs (1 hunks)
  • lib/gridstore/benches/real_data_bench.rs (2 hunks)
  • lib/gridstore/src/gridstore.rs (19 hunks)
  • lib/segment/src/payload_storage/mmap_payload_storage.rs (4 hunks)
  • lib/segment/src/payload_storage/simple_payload_storage.rs (1 hunks)
  • lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (3 hunks)
  • tests/consensus_tests/fixtures.py (2 hunks)
  • tests/consensus_tests/test_hw_measurement.py (6 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/consensus_tests/test_hw_measurement.py

19-19: start_cluster may be undefined, or defined from star imports

(F405)


23-23: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports

(F405)


25-25: get_cluster_info may be undefined, or defined from star imports

(F405)


27-27: Loop control variable peer_info not used within loop body

Rename unused peer_info to _peer_info

(B007)


34-34: create_shard_key may be undefined, or defined from star imports

(F405)


55-55: start_cluster may be undefined, or defined from star imports

(F405)


57-57: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports

(F405)


165-165: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

⏰ Context from checks skipped due to timeout of 90000ms (13)
  • GitHub Check: Basic TLS/HTTPS tests
  • GitHub Check: test-snapshot-operations-s3-minio
  • GitHub Check: test-shard-snapshot-api-s3-minio
  • GitHub Check: test-low-resources
  • GitHub Check: test-consistency
  • GitHub Check: test (macos-latest)
  • GitHub Check: test
  • GitHub Check: test (windows-latest)
  • GitHub Check: test-consensus-compose
  • GitHub Check: test-consensus
  • GitHub Check: test
  • GitHub Check: test (ubuntu-latest)
  • GitHub Check: test
🔇 Additional comments (42)
tests/consensus_tests/test_hw_measurement.py (4)

18-52: Good addition of a test for the no-local-shard scenario.

This test effectively verifies that hardware measurements are correctly captured by the targeted node when there is no local shard. The test demonstrates that the non-leader node captures telemetry data while the leader remains unaffected.

🧰 Tools
🪛 Ruff (0.8.2)

19-19: start_cluster may be undefined, or defined from star imports

(F405)


23-23: wait_collection_exists_and_active_on_all_peers may be undefined, or defined from star imports

(F405)


25-25: get_cluster_info may be undefined, or defined from star imports

(F405)


27-27: Loop control variable peer_info not used within loop body

Rename unused peer_info to _peer_info

(B007)


34-34: create_shard_key may be undefined, or defined from star imports

(F405)


56-56: Consistent parameter handling for sparse vectors across test functions.

The update to include sparse_vectors=False in create_collection calls is consistent with the changes in fixtures.py. This ensures that the tests operate with the intended vector configuration.

Also applies to: 123-123, 132-132


58-58: Consistent parameter handling for sparse vectors in point operations.

The update to include with_sparse_vector=False in upsert_random_points calls aligns with the requirement to test without sparse vectors. This maintains consistency with the collection configuration.

Also applies to: 68-68, 132-132


164-165: Clarified assertion message improves test error diagnosis.

The updated assertion message more explicitly states the failure condition, which helps with debugging test failures.

🧰 Tools
🪛 Ruff (0.8.2)

165-165: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

tests/consensus_tests/fixtures.py (3)

143-143: Good addition of sparse_vectors parameter with a default value.

Adding the sparse_vectors parameter with a default of True maintains backward compatibility while enabling more specific test configurations.


145-159: Well-structured payload construction with conditional logic.

The refactored payload construction using a dictionary variable improves code readability and maintainability. The conditional inclusion of sparse vectors is cleanly implemented.


235-239: Improved error handling in get_telemetry_hw_info.

The function now properly checks if the collection exists in the hardware data before accessing it, which prevents potential errors when the collection is not found.

lib/common/common/src/counter/mod.rs (1)

5-5: Module addition looks good.

Adding the referenced_counter module is a clean way to introduce the new hardware counter reference functionality. This aligns with the PR objective of improving measurement and testing processes.

lib/collection/src/collection/sharding_keys.rs (1)

64-64: Comment clarification is helpful.

The updated comment accurately reflects that this is an internal operation that doesn't require measurement, which helps clarify the intention behind using a disposable hardware counter.

lib/gridstore/benches/real_data_bench.rs (2)

16-16: Good addition of hardware counter reference.

Creating a reference to the payload IO write counter using the new ref_payload_io_write_counter() method is consistent with the broader changes to how hardware counters are handled across the codebase.


27-27: Using the reference counter is consistent.

Updating the put_value call to use the new hardware counter reference maintains consistency with the API changes elsewhere in the codebase.

lib/segment/src/payload_storage/simple_payload_storage.rs (1)

60-60: Counter type change is appropriate.

Changing to payload_io_write_counter() more accurately represents the type of operation being performed - writing payload data to storage. This makes the telemetry data more precise and meaningful.

lib/gridstore/benches/bustle_bench/payload_storage.rs (2)

38-44: LGTM: Refined I/O measurement for payload insertion

The change to use ref_payload_io_write_counter() instead of passing the HardwareCounterCell directly provides more specific measurement of payload I/O write operations.


52-57: LGTM: Consistent usage of I/O counter references

The update method now consistently uses the specific payload I/O write counter reference, aligning with the insertion method and providing more granular measurement.

lib/gridstore/benches/random_data_bench.rs (2)

13-13: LGTM: Improved counter reference handling

Creating a specific reference to the payload I/O write counter improves the clarity and specificity of what's being measured.


18-18: LGTM: Consistent usage of the payload counter reference

The bench iteration now uses the dedicated counter reference instead of the general hardware counter, providing more accurate measurement of payload I/O write operations.

lib/segment/src/vector_storage/sparse/mmap_sparse_vector_storage.rs (3)

154-159: LGTM: Improved specificity for vector I/O measurements

Using ref_vector_io_write_counter() makes it clear that this operation specifically measures vector I/O write operations, providing more detailed telemetry.


284-284: LGTM: Clarified measurement policy

The comment now explicitly states that deletions are not measured, which provides valuable documentation for anyone working with or analyzing the telemetry data.


236-236: LGTM: Documentation of measurement policy

Comment clarifies that this function is only used for internal operations and doesn't need to be measured, which is helpful context for future developers.

lib/segment/src/payload_storage/mmap_payload_storage.rs (6)

72-74: LGTM: Consistent use of payload I/O counter

Changed to use the specific payload I/O write counter reference, consistent with other modifications in the PR.


89-94: LGTM: Refactored counter reference in set method

The first branch of the set method now properly uses the dedicated payload I/O write counter reference.


98-99: LGTM: Consistent counter usage in set method

The second branch of the set method now uses the same payload I/O write counter reference approach, ensuring consistency across the codebase.


117-122: LGTM: Applied counter reference pattern to set_by_key

The first branch of the set_by_key method follows the same pattern of using the specific counter reference.


128-133: LGTM: Consistent counter reference in set_by_key

The second branch of the set_by_key method maintains consistency with the counter reference pattern.


162-167: LGTM: Applied counter reference pattern to delete method

The delete method now also uses the payload I/O write counter reference, completing the consistent application of this pattern throughout the file.

lib/common/common/src/counter/referenced_counter.rs (1)

1-39: Well-designed and clearly documented abstraction!

This is a good implementation of a wrapper type that restricts access to only specific metrics within a HardwareCounterCell. The use of Deref allows transparent usage at call sites, while hiding the constructor and only exposing factory methods enforces the correct pattern of usage.

I particularly like:

  • The clear documentation explaining the purpose and use cases
  • Making the constructor private to control instantiation
  • The use of #[inline] for the factory methods to avoid any performance penalty
lib/gridstore/src/gridstore.rs (16)

5-5: Added import for new counter type

The import is correctly added to support the parameter type change in put_value.


326-331: Improved parameter type specificity in put_value

The parameter type change from &HardwareCounterCell to HwMetricRefCounter makes the function signature more explicit about which metric is being tracked. This is a good change that helps with code clarity and correctness.


384-384: Simplified counter usage

The call to increment the counter is now more direct, which is a result of the more specific parameter type. This makes the code cleaner and less error-prone.


562-563: Updated test for new counter referencing pattern

The test has been properly updated to use the new referencing pattern.


587-590: Updated test code to use the new counter referencing pattern

Test updated correctly to match the new API.


615-617: More descriptive variable naming in test

The use of hw_counter_ref makes it clearer that this is a reference to a specific counter rather than the general hardware counter cell.


652-657: Test updated to use new counter referencing pattern

The test has been correctly updated to use the new ref_payload_io_write_counter() method.


683-685: Test updated to use new counter referencing pattern

This test update is consistent with the pattern used in other tests.


717-720: Test updated to use new counter referencing pattern

All tests have been consistently updated to match the new API.


740-742: Test updated to use new counter referencing pattern

All instances of the put_value call have been updated to use the new referencing pattern consistently.


819-820: Test updated to use new counter referencing pattern

The complex test case "behave_like_hashmap" is also correctly updated.


900-901: Test updated to use new counter referencing pattern

The "handle_huge_payload" test is also correctly updated.


942-945: Test updated to use new counter referencing pattern

The "storage_persistence_basic" test is also correctly updated.


981-997: Test helper function updated for new counter referencing pattern

The write_data helper function is correctly updated to use the new pattern.


1065-1085: Test updated to use new counter referencing pattern

Both instances of put_value in this test section have been properly updated.


1121-1128: Final test updated for new counter referencing pattern

All tests throughout the file have been consistently updated to the new pattern.

@JojiiOfficial JojiiOfficial merged commit c9a6a96 into dev Mar 12, 2025
18 checks passed
@JojiiOfficial JojiiOfficial deleted the io_measurement_cleanup_plus_tests branch March 12, 2025 08:59
timvisee pushed a commit that referenced this pull request Mar 21, 2025
* HW measurement cleanup and more tests

* Fix wrong measurements

* Use new assertion function

* Fix Payload Measurement bug. Fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants